Processing Network Capture Data

In order to solidify our understanding of the “enveloping” nature of the layered approach to networking, you are going to parse a binary dump of network data. During this exercise you will:

By the time you are finished you will have written a program that interacts with critical sections of some very common and important network protocols including: Ethernet, TCP, IP, and HTTP.

Stage Zero: What Do We Know About The Provided Data?

In order to properly parse any data, we have to know what the format of that data is. The file we’ve provided to you represents data in the format that it was sent across the internet. The data was collected using a network capture tool, and represents the total data sent during a single HTTP request/response cycle. The HTTP request was for a particular jpg image which was delivered by a server. Your ultimate goal is to extract this image data and write it to a file on your computer.

As we have learned, as data travels down the network layer hierarchy, each layer accepts the data from the layer above and wraps it in its own format. In our capture we have 4 of the 5 layers of data represented:

In addition to our knowledge of the internet layers, you need to know that this data is saved in a specific file format, which is documented here. There are two notable facets to the .cap savefile: the global header and the per-packet header.

Even the most careful reading of the documentation will not tell you anything about which protocols are represented at each network layer. In fact, without reading the data at each layer all we can know is that we have data like this:

Here are some useful facts about this specific capture which can help you validate your findings as you go:

Some Advice: When doing binary parsing, it is incredibly helpful to program defensively by using lots of assertions to detect if something is wrong with our assumptions.

Stage One: Read The Pcap Headers

The Global Header

Before you start programming, try to manually parse the global Pcap header. This will give you some practice “thinking in binary” and it will force you to encounter and tackle the concept of “endian-ness”. Use the xxd command to turn the provided binary dump into a hex dump. Then, using the pcap documentation:
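Once you have worked through the header by hand, the same steps can be sketched in code. The following Python sketch assumes the classic 24-byte pcap global header (magic number, version major/minor, timezone offset, timestamp accuracy, snapshot length, link-layer type) and detects byte order from the magic number 0xa1b2c3d4. The function name parse_global_header and the dictionary keys are illustrative choices, not part of the exercise, and the nanosecond-resolution magic number is not handled.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_MAGIC = 0xA1B2C3D4  # magic number for microsecond-resolution pcap files


def parse_global_header(data: bytes) -> dict:
    """Parse the 24-byte pcap global header found at the start of `data`."""
    assert len(data) >= PCAP_GLOBAL_HEADER_LEN, "too short for a pcap global header"

    # Read the magic number both ways to learn which byte order the file uses.
    (magic_le,) = struct.unpack("<I", data[:4])
    (magic_be,) = struct.unpack(">I", data[:4])
    if magic_le == PCAP_MAGIC:
        endian = "<"  # little-endian file
    elif magic_be == PCAP_MAGIC:
        endian = ">"  # big-endian file
    else:
        raise ValueError("unrecognized pcap magic number")

    # The remaining 20 bytes: two 16-bit fields followed by four 32-bit fields.
    major, minor, thiszone, sigfigs, snaplen, network = struct.unpack(
        endian + "HHiIII", data[4:PCAP_GLOBAL_HEADER_LEN]
    )
    return {
        "endian": endian,
        "version": (major, minor),
        "thiszone": thiszone,
        "sigfigs": sigfigs,
        "snaplen": snaplen,
        "network": network,  # link-layer type; 1 means Ethernet
    }
```

Comparing the values this returns against the ones you decoded by hand from the xxd output is a quick way to check both your manual parsing and your code.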

The Per Packet Header

The bytes immediately following the global header will be the first per-packet header. Parse these values manually as well:

Now, you should start writing a program. You can use any language you want for this, provided the language has mechanisms for reading binary data (but even JavaScript has this so if you’re choosing a popular language it should be no problem). Your goal is to read every individual packet header. You should write a program that can:
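As one possible starting point, here is a minimal Python sketch of a loop over the per-packet headers. It assumes the standard 16-byte per-packet header (timestamp seconds, timestamp microseconds, captured length, original length) and that the byte order has already been determined from the global header. The name iter_packets and the tuple layout it yields are hypothetical.

```python
import struct

GLOBAL_HEADER_LEN = 24
PACKET_HEADER_LEN = 16


def iter_packets(data: bytes, endian: str = "<"):
    """Yield (ts_sec, ts_usec, orig_len, packet_bytes) for each captured packet.

    `data` is the whole capture file; `endian` is "<" or ">" as determined
    from the magic number in the global header.
    """
    offset = GLOBAL_HEADER_LEN
    while offset < len(data):
        assert offset + PACKET_HEADER_LEN <= len(data), "truncated per-packet header"
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += PACKET_HEADER_LEN

        # The captured length can never exceed the length on the wire.
        assert incl_len <= orig_len, "captured length exceeds original length"
        assert offset + incl_len <= len(data), "truncated packet data"

        yield ts_sec, ts_usec, orig_len, data[offset:offset + incl_len]
        offset += incl_len
```

Counting the packets this yields, and checking whether any packet was captured with fewer bytes than it had on the wire, gives you values to compare against what you know about the capture.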

Stage Two: Read The Ethernet Headers

Once you’ve read the per-packet header for each packet, you want to peel off one more layer. The next layer will be the Link Layer, in our case Ethernet. You should be able to verify from the global header the type of the link layer used in this capture. It’s valuable practice to first try and use Google to find the exact specification for this header.

If you spend more than 10 minutes trying to track it down and only encounter frustration, look ahead at the spoilers section for a direct link to the header.

Once you have determined the format of the header, extend your program to:
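A minimal Python sketch of the Ethernet step follows. It assumes plain Ethernet II framing with a 14-byte header (destination MAC, source MAC, EtherType) and no 802.1Q VLAN tag. Note that network protocol headers are always big-endian, regardless of the byte order of the pcap file itself. The function names are illustrative.

```python
import struct

ETHERNET_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800


def format_mac(raw: bytes) -> str:
    """Render a 6-byte MAC address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame: bytes):
    """Return (dst_mac, src_mac, ethertype, payload) for an Ethernet II frame."""
    assert len(frame) >= ETHERNET_HEADER_LEN, "frame too short for Ethernet header"
    # "!" selects network (big-endian) byte order with no padding.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:ETHERNET_HEADER_LEN])
    return format_mac(dst), format_mac(src), ethertype, frame[ETHERNET_HEADER_LEN:]
```

In a capture of a single conversation between two hosts, you would expect to see the same pair of MAC addresses on every frame, swapping between source and destination, and the same EtherType throughout. Assertions on those expectations catch offset mistakes early.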

Stage Three: Read The IP Headers

Once again, you should strive to find the specification of the IP header yourself, but it is linked below in the spoilers section. Once you’ve determined the format of the header you should be able to extend your program to:

Keep in mind, only one of the two hosts ever sends image data. If you want to build the image from the data here, you will need to filter the packets somehow…
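The IP step might look something like the Python sketch below, assuming IPv4. It reads the fixed 20-byte portion of the header, uses the IHL field to find where the payload starts, and uses the total length field to find where it ends, so that any link-layer padding on short frames is discarded. The name parse_ipv4 and the returned dictionary keys are hypothetical.

```python
import struct

PROTOCOL_TCP = 6


def format_ipv4(raw: bytes) -> str:
    """Render a 4-byte IPv4 address in dotted-decimal notation."""
    return ".".join(str(b) for b in raw)


def parse_ipv4(packet: bytes) -> dict:
    """Parse an IPv4 header and return its key fields plus the payload."""
    assert len(packet) >= 20, "packet too short for an IPv4 header"
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    version = version_ihl >> 4   # high 4 bits
    ihl = version_ihl & 0x0F     # low 4 bits: header length in 32-bit words
    assert version == 4, "not an IPv4 packet"
    assert ihl >= 5, "IHL below the minimum of 5 words"

    header_len = ihl * 4
    assert header_len <= total_length <= len(packet), "inconsistent length fields"
    return {
        "src": format_ipv4(src),
        "dst": format_ipv4(dst),
        "protocol": protocol,
        "header_len": header_len,
        # Slice to total_length so trailing link-layer padding is dropped.
        "payload": packet[header_len:total_length],
    }
```

One way to do the filtering is to keep only packets whose "src" equals the server's address, which you can identify as the host that sends the larger share of the data.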

Stage Four: Read The Transport Headers

Once you’ve parsed the IP headers, you’ll know where the transport headers start, and which protocol is being used. Once again, you may find the specification yourself, and it is linked in the spoilers section. Once you have this information you should extend your program to:

Keep in mind, packets can arrive out of order. If you want to build the image, you’ll need to reconstruct the original order somehow…
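Assuming the transport protocol turns out to be TCP, a minimal Python sketch of parsing the header and restoring the original order could look like this. It sorts segments by sequence number and keeps one payload per sequence number, which also discards exact retransmissions. It deliberately ignores sequence-number wraparound and partially overlapping segments, which is a simplification that is reasonable only for a short capture like this one. The function names are illustrative.

```python
import struct


def parse_tcp(segment: bytes):
    """Return (src_port, dst_port, seq, flags, payload) for a TCP segment."""
    assert len(segment) >= 20, "segment too short for a TCP header"
    src_port, dst_port, seq, _ack, offset_flags = struct.unpack(
        "!HHIIH", segment[:14]
    )
    data_offset = offset_flags >> 12   # header length in 32-bit words
    flags = offset_flags & 0x01FF      # low 9 bits hold the flag bits
    header_len = data_offset * 4
    assert 20 <= header_len <= len(segment), "invalid TCP data offset"
    return src_port, dst_port, seq, flags, segment[header_len:]


def reassemble(segments) -> bytes:
    """Join (seq, payload) pairs in sequence-number order.

    Empty payloads (pure ACKs, SYN, FIN) are skipped, and a repeated
    sequence number keeps only the first payload seen for it.
    """
    by_seq = {}
    for seq, payload in segments:
        if payload:
            by_seq.setdefault(seq, payload)
    return b"".join(by_seq[seq] for seq in sorted(by_seq))
```

A useful sanity check after reassembly is that each segment's sequence number plus its payload length equals the next segment's sequence number. A gap there means a segment is missing from what you collected.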

Stage Five: Parse The HTTP Data

You should have already stored all the data somewhere. Now you need to put it in order and parse it as an HTTP response, since the image is in the data the server sent back. Extend your code to:
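Here is a minimal Python sketch of this last step, assuming the reassembled server-to-client byte stream holds a single HTTP response whose body is sent as-is (no chunked transfer encoding and no compression). The headers end at the first blank line, and everything after it is the body. The name split_http_message is hypothetical.

```python
def split_http_message(stream: bytes):
    """Split a raw HTTP message into (start_line, headers, body).

    Header names are lowercased so lookups are case-insensitive.
    """
    head, separator, body = stream.partition(b"\r\n\r\n")
    assert separator, "no blank line found between headers and body"

    lines = head.split(b"\r\n")
    start_line = lines[0].decode("iso-8859-1")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("iso-8859-1")
        headers[key] = value.strip().decode("iso-8859-1")
    return start_line, headers, body
```

Two checks are worth making before writing the body to disk in binary mode, for example with open("image.jpg", "wb"), where the filename is only a placeholder: if a Content-Length header is present, its value should equal len(body), and a JPEG file should begin with the bytes FF D8.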

Spoilers:

© 2016 Bradfield School of Computer Science