Quick and dirty pcap slicing with tshark and friends
Network protocols are complex. Reconstructing data structures out of pcap-formatted datastreams manually is tough. Packet loss and fragmentation make things tougher. Analyzing anything above the transport layer is all but impossible with tcpdump. Wireshark is slick. Real slick. Wireshark comes with dissectors for a multitude of protocols, common and obscure, that remove a lot of the legwork from analysis and let you focus on the context of the data.
Sometimes, however, Wireshark can just be too cumbersome for the task at hand. Wireshark has great filters, but extracting packet data with any kind of precision can be extremely painful. The command-line tool 'tshark' comes bundled with Wireshark and gives you the power of Wireshark's dissectors along with your favorite Unix command-line tools. For large or complex projects, your best bet is to sit down and roll a custom tool using a libpcap interface, but if you're in a rush (or lazy), you might be able to bang out a few one-liners to overcome complex challenges.
I've got a simple pcap with some HTTP traffic. Let's look at some basic tshark output:
bnull@ubuntu:~$ tshark -r sample.cap -x
1 1 0x08a7 (2215) 0 0.000000 172.16.218.55 -> 74.125.91.104 HTTP GET / HTTP/1.1
0000 00 50 56 f7 59 da 00 0c 29 d7 46 4b 08 00 45 00 .PV.Y...).FK..E.
0010 02 c7 08 a7 40 00 40 06 03 5d ac 10 da 37 4a 7d ....@.@..]...7J}
0020 5b 68 e3 25 00 50 4e 82 9c 38 88 85 7d ff 50 18 [h.%.PN..8..}.P.
0030 f9 42 33 f5 00 00 47 45 54 20 2f 20 48 54 54 50 .B3...GET / HTTP
[... snip ...]
247 2617 0x08d0 (2256) 0 0.451780 172.16.218.55 -> 74.125.91.104 TCP 58149 > http [ACK] Seq=2617 Ack=88029 Win=63810 Len=0
0000 00 50 56 f7 59 da 00 0c 29 d7 46 4b 08 00 45 00 .PV.Y...).FK..E.
0010 00 28 08 d0 40 00 40 06 05 d3 ac 10 da 37 4a 7d .(..@.@......7J}
0020 5b 68 e3 25 00 50 4e 82 a6 70 88 86 d5 db 50 10 [h.%.PN..p....P.
0030 f9 42 53 99 00 00 .BS...
As you can see, this looks a lot like standard tcpdump output, with the addition of some HTTP-specific header info and frame numbers (like you'd see in Wireshark). Beyond the standard tcpdump features, we can pass in Wireshark display filters with the "-R" flag (on newer tshark releases, use "-Y" for this; there, "-R" only works together with the two-pass "-2" option):
bnull@ubuntu:~$ tshark -r sample.cap -x -R frame.number==3
3 1 0xdba4 (56228) 0 0.047745 74.125.91.104 -> 172.16.218.55 HTTP HTTP/1.1 200 OK 1F8B08000000000002FFCD7DE97ADB38B2E8FF3C05C34CDB...
0000 00 0c 29 d7 46 4b 00 50 56 f7 59 da 08 00 45 00 ..).FK.PV.Y...E.
0010 05 b2 db a4 00 00 80 06 2d 74 4a 7d 5b 68 ac 10 ........-tJ}[h..
0020 da 37 00 50 e3 25 88 85 7d ff 4e 82 9e d7 50 18 .7.P.%..}.N...P.
0030 fa f0 3c 93 00 00 48 54 54 50 2f 31 2e 31 20 32 ..<...HTTP/1.1 2
0040 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 54 75 65 00 OK..Date: Tue
[... snip ...]
We could just as easily filter based on HTTP data:
bnull@ubuntu:~$ tshark -r sample.cap -x -R 'http.request.uri contains ".png"'
25 1 0xefbd (61373) 0 0.063654 172.16.218.55 -> 74.125.91.104 HTTP GET /images/nav_logo72.png HTTP/1.1
0000 00 50 56 f7 59 da 00 0c 29 d7 46 4b 08 00 45 00 .PV.Y...).FK..E.
0010 02 df ef bd 40 00 40 06 1c 2e ac 10 da 37 4a 7d ....@.@......7J}
0020 5b 68 e3 24 00 50 4e 89 76 4a 19 1a f5 d4 50 18 [h.$.PN.vJ....P.
0030 f8 fa 2b 33 00 00 47 45 54 20 2f 69 6d 61 67 65 ..+3..GET /image
0040 73 2f 6e 61 76 5f 6c 6f 67 6f 37 32 2e 70 6e 67 s/nav_logo72.png
0050 20 48 54 54 50 2f 31 2e 31 0d 0a 48 6f 73 74 3a HTTP/1.1..Host:
[... snip ...]
If we're only interested in the data from a specific field, we can use the "-Tfields" flag along with "-e" to specify which field we want:
tshark -r sample.cap -Tfields -e http.request.uri
/
[... snip ...]
/images/nav_logo72.png
/images/logos/ps_logo2.png
/gb/images/b_5dae6e31.png
/extern_js/f/CgJlbhICdXMrMEU4ACwrMFo4ACwrMA44ACwrMBc4ACwrMDw4ACwrMFE4ACwrMFk4ACwrMAo4AEAvmgICcHMsKzAWOAAsKzAZOAAsKzAlOAAsKzAqOAAsKzArOAAsKzA1OAAsKzBAOAAsKzBBOAAsKzBNOAAsKzBOOAAsKzBTOACaAgZzZWFyY2gsKzBUOAAsKzBfOAAsKzBjOAAsKzBpOAAsKzB4OAAsKzB0OAAsKzAdOAAsKzBcOAAsKzAYOAAsKzAmOAAsgAJIkAI_/-WCyLULntH4.js
[... snip ...]
In the case of HTTP traffic, we can use this option to quickly rip out all of the cookies, URIs, host headers, file names, etc. There's plenty here to power some neat scripts when all we've got to work from is a pcap.
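Once the field values are on stdout, ordinary Unix tools take it from there. Below is a minimal sketch that assumes a POSIX shell with sort and uniq. The name tally is a hypothetical helper, and http.host is just one example of a field you might feed into it.

```shell
# tally: count how often each value appears in `tshark -T fields` output.
# Reads one value per line on stdin, skips empty lines (frames where the
# field is absent), and prints "count value" pairs, most frequent first.
tally() {
  grep -v '^$' | sort | uniq -c | sort -rn
}

# Example (assumes tshark and sample.cap as above):
#   tshark -r sample.cap -Tfields -e http.host | tally
```

The same pipeline works for any single field, such as http.request.uri or http.cookie.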
If you need to edit a packet for whatever reason (fixing up a mangled pcap, maybe), you may run into problems. Editing a pcap file directly in a hex editor is cumbersome. Each packet record in the file stores the packet's length, so inserting or removing bytes means fixing those fields up by hand. Instead, you can grep the hex out of an individual frame, make simple modifications with text-processing tools, and rebuild the frame with another Wireshark-bundled tool called 'text2pcap', which converts a hex dump back into pcap:
bnull@ubuntu:~$ tshark -r sample.cap -x -R frame.number==4 | grep '^0'
0000 00 50 56 f7 59 da 00 0c 29 d7 46 4b 08 00 45 00 .PV.Y...).FK..E.
0010 00 28 08 a8 40 00 40 06 05 fb ac 10 da 37 4a 7d .(..@.@......7J}
0020 5b 68 e3 25 00 50 4e 82 9e d7 88 85 83 89 50 10 [h.%.PN.......P.
0030 f9 42 ad 85 00 00 .B....
bnull@ubuntu:~$ tshark -r sample.cap -x -R frame.number==4 | grep '^0' | sed 's/85 00 00/85 85 85 85 85 85 85 85 85 85 85 85 00/' > frame4.txt
bnull@ubuntu:~$ text2pcap frame4.txt frame4.pcap
Input from: frame4.txt
Output to: frame4.pcap
Wrote packet of 64 bytes at 0
Read 1 potential packet, wrote 1 packet
bnull@ubuntu:~$ tshark -r frame4.pcap -x
1 1 0x08a8 (2216) 0 0.000000 172.16.218.55 -> 74.125.91.104 TCP 58149 > http [ACK] Seq=1 Ack=1 Win=63810 [TCP CHECKSUM INCORRECT] Len=0
0000 00 50 56 f7 59 da 00 0c 29 d7 46 4b 08 00 45 00 .PV.Y...).FK..E.
0010 00 28 08 a8 40 00 40 06 05 fb ac 10 da 37 4a 7d .(..@.@......7J}
0020 5b 68 e3 25 00 50 4e 82 9e d7 88 85 83 89 50 10 [h.%.PN.......P.
0030 f9 42 ad 85 85 85 85 85 85 85 85 85 85 85 85 00 .B..............
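The sed substitution in that transcript keys on byte values ("85 00 00"). It therefore edits the first place on each line where that pattern happens to appear, which is not necessarily the byte you meant. The sketch below is a position-based alternative. It assumes its input is the grep '^0' output of tshark -x: a 4-digit hex offset label followed by space-separated hex bytes. The name set_byte is a hypothetical helper, not part of the Wireshark suite. It overwrites one byte in place rather than inserting bytes, and it doesn't recompute any checksums either.

```shell
# set_byte OFFSET VALUE
#   OFFSET: byte offset within the frame, in hex without a 0x prefix (e.g. 34)
#   VALUE:  replacement byte, two hex digits (e.g. 85)
# Reads a `tshark -x | grep '^0'` dump on stdin and writes it to stdout with
# that single byte overwritten. Hypothetical helper, not a Wireshark tool.
set_byte() {
  local off line idx
  off=$(printf '%d' "0x$1")                 # hex offset -> decimal
  line=$(printf '%04x' $((off / 16 * 16)))  # label of the dump line holding it
  idx=$((off % 16))                         # position of the byte in that line
  # On the matching line only, skip the offset label plus $idx bytes counted
  # from the left, then replace the next two-hex-digit byte. Counting from the
  # left keeps the edit out of the trailing ASCII column.
  sed -E "/^${line} /s/^(${line} +([0-9a-fA-F]{2} +){${idx}})[0-9a-fA-F]{2}/\\1$2/"
}
```

For example, tshark -r sample.cap -x -R frame.number==4 | grep '^0' | set_byte 34 85 > frame4.txt would change only the byte at offset 0x34 and leave the frame length alone. Offsets must point at bytes that actually exist in the frame.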
The grep '^0' keeps only the hex lines and drops tshark's one-line packet summary. text2pcap also ignores the trailing ASCII column, so the now-misaligned ASCII text on that last line doesn't matter. Neat, right? (Do note the [TCP CHECKSUM INCORRECT] flag, though. The sed edit overwrote the TCP urgent pointer at offsets 0x34-0x35, and nothing recomputes checksums for you.) If we wanted to do this on a larger scale, we could even use mergecap (yet another Wireshark-bundled tool) to glue all of the packets back together.
In fact, let's bring this all together and run through a few quick one-liners to do something that would otherwise be sort of a pain in the ass. Say there was a lot of latency on the network and we had some sort of sequencing issue. I'd like to reorder only the return traffic (packets headed to 172.16.218.55), using each packet's IP identification number as the new sort order. Here are the syntax snippets we're going to use:
Find each frame's IP identification number, to be used for sorting:
tshark -Tfields -e ip.id -r sample.cap -R "frame.number==<each frame #> && ip.dst==172.16.218.55"
Pull the hex out of each frame that we're inspecting and save it off in a sortable fashion:
tshark -r sample.cap -x -R "frame.number==<each frame #> && ip.dst==172.16.218.55" | grep '^0' > tmp/<id #>.<frame #>
Build the text back into pcaps, creating an ordered list as we go:
text2pcap tmp/<file> tmp/<file>.pcap && export PKTORDER="$PKTORDER tmp/<file>.pcap"
Glue everything back together. By default, 'mergecap' merges packets in timestamp order. The "-a" flag instead concatenates the input files in the order you supply them:
mergecap -a -w reconstructed.pcap $PKTORDER
Here's what it looks like:
# Pulling out our hex:
bnull@ubuntu:~$ mkdir -p tmp; for FRAME in $(tshark -r sample.cap | awk '{print $1}'); do IDNUM=$(tshark -Tfields -e ip.id -r sample.cap -R "frame.number==$FRAME && ip.dst==172.16.218.55"); [ -n "$IDNUM" ] && tshark -r sample.cap -x -R "frame.number==$FRAME && ip.dst==172.16.218.55" | grep '^0' > tmp/$IDNUM.$FRAME; done
# Building the pcaps:
bnull@ubuntu:~$ for TXT in $(ls tmp | sort); do text2pcap tmp/$TXT tmp/$TXT.pcap && export PKTORDER="$PKTORDER tmp/$TXT.pcap"; done
Input from: tmp/0xdba3.2
Output to: tmp/0xdba3.2.pcap
Wrote packet of 60 bytes at 0
Read 1 potential packet, wrote 1 packet
Input from: tmp/0xdba4.3
Output to: tmp/0xdba4.3.pcap
Wrote packet of 1472 bytes at 0
Read 1 potential packet, wrote 1 packet
Input from: tmp/0xdba5.5
Output to: tmp/0xdba5.5.pcap
[... snip ... ]
Input from: tmp/0xdc2d.245
Output to: tmp/0xdc2d.245.pcap
Wrote packet of 60 bytes at 0
Read 1 potential packet, wrote 1 packet
Input from: tmp/0xdc2e.246
Output to: tmp/0xdc2e.246.pcap
Wrote packet of 269 bytes at 0
Read 1 potential packet, wrote 1 packet
# Merge it back together
bnull@ubuntu:~$ mergecap -a -w reconstructed.pcap $PKTORDER
# And let's just spot check that the data looks reasonable...
bnull@ubuntu:~$ tshark -r reconstructed.pcap
1 1 0xdba3 (56227) 0 0.000000 74.125.91.104 -> 172.16.218.55 TCP http > 58149 [ACK] Seq=1 Ack=1 Win=64240 Len=0
2 1 0xdba4 (56228) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP HTTP/1.1 200 OK 1F8B08000000000002FFCD7DE97ADB38B2E8FF3C05C34CDB...
3 1419 0xdba5 (56229) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic AF8004DC623E9B1CEE01E07D375FBA419E86E47A16C46CBC...
4 2471 0xdba6 (56230) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic E495499F257D454C660544D3E228EC518B1BA499EB4439E7...
5 3889 0xdba7 (56231) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic B7D1AEB137AD6AE2B85BBB274C44ECDA7AE3D48FBC0734E2...
6 5307 0xdba8 (56232) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic E1896511CD0690CE51543AA0349C63E3A41F02D8BD110CC6...
[... snip ... ]
137 33246 0xdc2b (56363) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic C45522DF56C2292020EEC2644FD44EF1F8EA83A7E5EEC349...
138 34664 0xdc2c (56364) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP Continuation or non-HTTP traffic 51AAA3D1CE1A53E3EA018643D6E97CD369E191EE3777F198...
139 87814 0xdc2d (56365) 0 0.000000 74.125.91.104 -> 172.16.218.55 TCP http > 58149 [ACK] Seq=87814 Ack=1946 Win=64240 Len=0
140 87814 0xdc2e (56366) 0 0.000000 74.125.91.104 -> 172.16.218.55 HTTP HTTP/1.1 204 No Content
bnull@ubuntu:~$
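No problem. Every packet now carries the same timestamp (note the 0.000000 column), since the hex dumps we fed to text2pcap contained no timing information. The ordering, though, is exactly what we asked for.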
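The loop above calls tshark twice for every frame in the capture, which gets slow on big files. A rough single-pass alternative is to ask for frame.number and ip.id in one tshark run and sort on the IP ID. This sketch assumes tshark's default tab-separated -Tfields output, and order_by_ipid is a hypothetical helper. printf converts the 0x-prefixed IDs to decimal so that sort -n orders them numerically rather than lexically. Like the original one-liners, it assumes the sender increments the IP ID by one per packet (as it does in the sample capture). It also assumes the capture doesn't span a wrap from 0xffff back to 0x0000.

```shell
# order_by_ipid: read "frame<TAB>ip.id" lines (tshark -Tfields output) on
# stdin and print the frame numbers sorted by numeric IP ID.
# Lines with an empty ip.id (frames the filter didn't match) are skipped.
order_by_ipid() {
  while IFS="$(printf '\t')" read -r frame ipid; do
    [ -n "$ipid" ] || continue
    # printf accepts 0x-prefixed hex or plain decimal and prints decimal
    printf '%d %s\n' "$ipid" "$frame"
  done | sort -k1,1n -k2,2n | awk '{ print $2 }'
}

# Example (assumes tshark and sample.cap as above):
#   tshark -r sample.cap -R "ip.dst==172.16.218.55" -Tfields -E occurrence=f \
#       -e frame.number -e ip.id | order_by_ipid
```

The resulting frame numbers can then drive the same tshark -x | grep '^0' | text2pcap steps, in order, before the list goes to mergecap -a.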

