Bradfield

SF Python: Working with Binary Data

Hi! Thanks for joining the workshop.

My objective is to increase your comfort level working with binary data in Python.

Many files are encoded in binary, including images, audio/video, PDFs and most executables. Similarly, many data transmission protocols are binary protocols, including TCP and IP.

Perhaps you have only worked so far with text files and text-based protocols like HTTP. That’s OK! This workshop is designed to help you take the leap to working with binary.
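To make the difference concrete, here is a small illustration of one value encoded as text and as binary. The number 8080 is an arbitrary choice; what is real is that TCP headers store port numbers as 16-bit unsigned integers in big-endian ("network") byte order, which Python's standard struct module expresses with the format string "!H".

```python
import struct

# The same number, 8080, in two encodings.
as_text = str(8080).encode("ascii")   # four ASCII digit characters: b'8080'
as_binary = struct.pack("!H", 8080)   # one 16-bit big-endian integer: b'\x1f\x90'

print(as_text, len(as_text))          # b'8080' 4
print(as_binary, len(as_binary))      # b'\x1f\x90' 2

# Each encoding has its own way back to the integer.
number_from_text = int(as_text.decode("ascii"))
(number_from_binary,) = struct.unpack("!H", as_binary)
```

The text form is readable in any editor but its length depends on the value; the binary form always takes exactly two bytes, which is why fixed-layout protocol headers favour it.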

We teach these areas in much more depth in our course Computer Architecture and the Hardware/Software Interface, which our students rate as one of the most surprisingly valuable courses they take with us. While ostensibly about how computers work, it’s actually a great way to understand how we typically represent and process data, which is important knowledge for higher-level programming, too.

The workshop itself is actually taken from our Computer Networking course: most networking protocols are binary, so we use this exercise early on to make sure that students are comfortable working with binary data.

Instructions

We have recorded a packet capture of an HTTP request and response for an image, performed over an imperfect network. The challenge for you is to parse the capture file, find and parse the packets constituting the image download, and reconstruct the image! It’s like a murder mystery, except with a trail of binary data and a hero rather than a villain at the end of it.
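As a feel for the first stage, here is a minimal sketch of reading a capture file, assuming the classic pcap format described in man pcap-savefile (not the newer pcapng format) and a file small enough to hold in memory. The layout is a 24-byte global header followed by packet records, each being a 16-byte record header and then the captured bytes. The names parse_pcap and PcapRecord are hypothetical, chosen for this illustration.

```python
import struct
from typing import NamedTuple

# Magic numbers from pcap-savefile(5): microsecond and nanosecond timestamps.
MAGIC_USEC = 0xA1B2C3D4
MAGIC_NSEC = 0xA1B23C4D


class PcapRecord(NamedTuple):
    ts_sec: int      # timestamp, seconds since the Unix epoch
    ts_frac: int     # microseconds or nanoseconds, depending on the magic number
    orig_len: int    # length of the packet as it appeared on the wire
    data: bytes      # captured bytes (possibly truncated to the snapshot length)


def parse_pcap(buf: bytes) -> tuple[int, list[PcapRecord]]:
    """Parse a classic pcap savefile held in memory; return (link_type, records)."""
    # The magic number is written in the capturing host's byte order,
    # so whichever interpretation matches tells us the file's endianness.
    if struct.unpack("<I", buf[:4])[0] in (MAGIC_USEC, MAGIC_NSEC):
        endian = "<"
    elif struct.unpack(">I", buf[:4])[0] in (MAGIC_USEC, MAGIC_NSEC):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    # Global header (24 bytes): magic, major/minor version, two reserved
    # 32-bit words, snapshot length, link-layer header type.
    _magic, _major, _minor, _res1, _res2, _snaplen, link_type = struct.unpack(
        endian + "IHHIIII", buf[:24]
    )

    records = []
    offset = 24
    while offset + 16 <= len(buf):
        # Record header (16 bytes): seconds, fraction, captured length, original length.
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", buf, offset)
        offset += 16
        records.append(PcapRecord(ts_sec, ts_frac, orig_len, buf[offset:offset + incl_len]))
        offset += incl_len
    return link_type, records
```

A possible usage, with a placeholder filename: link_type, records = parse_pcap(open("capture.pcap", "rb").read()). For an Ethernet capture, link_type should be 1 (LINKTYPE_ETHERNET), and len(records) should match the packet count given in the steps below.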

Steps:

  1. Download the pcap file
  2. Make sense of the file, using man pcap-savefile or the online version as a reference, and either Python code or a command-line tool like hexdump or xxd
  3. Figure out how to parse out the individual captured packets. There should be 99 in total.
  4. Figure out how to parse and make sense of the Ethernet frames.
  5. Figure out how to parse and make sense of the IP datagrams.
  6. Figure out how to parse and make sense of the TCP segments. Which ones will we need?
  7. Reassemble the correct TCP segments, in order, to recover the HTTP message.
  8. Write the HTTP body to an image and open it!
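Steps 4 through 8 can be sketched as a chain of small parsers, one per protocol layer. This is a simplified sketch under several assumptions that are not from the workshop itself: Ethernet frames without VLAN tags, IPv4 only, a single TCP connection, the server on port 80, no sequence-number wraparound, and retransmitted segments that repeat the same sequence number and boundaries (partially overlapping segments are not handled). All function names here are hypothetical.

```python
import struct
from typing import Iterable, NamedTuple


class TcpSegment(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    flags: int
    payload: bytes


def parse_ethernet(frame: bytes) -> tuple[int, bytes]:
    """Return (EtherType, payload). Assumes no 802.1Q VLAN tag."""
    # 6-byte destination MAC, 6-byte source MAC, 2-byte big-endian EtherType.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype, frame[14:]


def parse_ipv4(datagram: bytes) -> tuple[int, bytes]:
    """Return (protocol number, payload) for an IPv4 datagram."""
    version = datagram[0] >> 4
    if version != 4:
        raise ValueError(f"expected IPv4, got version {version}")
    ihl = (datagram[0] & 0x0F) * 4                   # header length: 32-bit words -> bytes
    (total_len,) = struct.unpack("!H", datagram[2:4])
    protocol = datagram[9]                           # 6 means TCP
    # Slice to total_len: short frames may carry Ethernet padding after the datagram.
    return protocol, datagram[ihl:total_len]


def parse_tcp(segment: bytes) -> TcpSegment:
    """Decode the fixed part of a TCP header and return the payload after any options."""
    src_port, dst_port, seq, _ack, offset_byte, flags = struct.unpack("!HHIIBB", segment[:14])
    data_offset = (offset_byte >> 4) * 4             # header length incl. options, in bytes
    return TcpSegment(src_port, dst_port, seq, flags, segment[data_offset:])


def reassemble(frames: Iterable[bytes], server_port: int = 80) -> bytes:
    """Join the payloads sent by the server, ordered by sequence number."""
    by_seq: dict[int, bytes] = {}
    for frame in frames:
        ethertype, ip_bytes = parse_ethernet(frame)
        if ethertype != 0x0800:                      # not IPv4
            continue
        protocol, tcp_bytes = parse_ipv4(ip_bytes)
        if protocol != 6:                            # not TCP
            continue
        seg = parse_tcp(tcp_bytes)
        if seg.src_port == server_port and seg.payload:
            by_seq[seg.seq] = seg.payload            # a duplicate overwrites its twin
    return b"".join(by_seq[seq] for seq in sorted(by_seq))


def http_body(message: bytes) -> bytes:
    """Return everything after the blank line that ends the HTTP headers."""
    _headers, sep, body = message.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker found")
    return body
```

Keying payloads by sequence number is what handles the "imperfect network": duplicates collapse into one entry and sorting restores the original order. The sketch also assumes the response body is sent as-is, not with chunked transfer encoding. Given a list of raw Ethernet frames from the capture, http_body(reassemble(frames)) should yield the image bytes, which can then be written to disk in binary mode.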

This is actually a long exercise… you are unlikely to complete it tonight 🙂. But! Every step will teach you a little more about working with binary.

Staying in touch

Feel free to email me directly with feedback or questions! I’m oz@bradfieldcs.com. If you’re interested in diving deeper into either the binary data or networking aspects of this exercise, you should consider joining our next Computer Architecture and/or Computer Networking courses.

If you’d like to generally stay in touch, and receive updates from us on workshops, courses, computer science learning resources and the tech news that matters, we have a mailing list for that.

Finally, if you would like to dive deeper into computer science generally, but don’t know the best resources or overall plan, check out our microsite Teach Yourself Computer Science.

hello@bradfieldcs.com
1141 Howard St
San Francisco, California
© 2016 Bradfield School of Computer Science