Understanding TCP Three-Way Handshake Capture Points

Preface

In network packet analysis, choosing the right capture points for TCP three-way handshake data is crucial. Possible capture points include the client, the server, and the middle end (the network in between). If the middle end supports direct packet capture, several capture points can be used at once (for example, before and after a device), which allows precise processing and filtering when diagnosing network issues.

In practice, complete packet data from every capture point is often unavailable; sometimes only a client, server or middle-end capture exists. This raises two questions: how do the capture points differ, and can a capture file itself reveal whether it was taken on the client, the server, or the middle end?

I have always found capture points a subtle subject: even when the analysis reaches the same conclusion, the same packets can look completely different at different capture points.

This article briefly summarizes the essential content of TCP’s three-way handshake data packets at various capture points from a personal perspective.



Introduction to TCP Three-Way Handshake

The TCP three-way handshake is examined here from the following aspects:

  1. IRTT
  2. Length
  3. TTL
  4. Offload
  5. Other fields

Analyzing TCP three-way handshake packets touches on many topics. IRTT, Length, TTL and Offload were chosen because they are the ones most closely tied to the capture point and the most useful to analyze.

IRTT

What is IRTT? In Wireshark, IRTT (iRTT) is the initial round-trip time measured during the TCP three-way handshake. The corresponding display filter field and its description are as follows:

tcp.analysis.initial_rtt
How long it took for the SYN to ACK handshake (iRTT)

Example


The IRTT value here is [iRTT: 0.001654000 seconds], the time difference between the first handshake packet (SYN) and the third (ACK).

This calculation is the same at every capture point, whether client, server or middle end.
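As a minimal sketch of that arithmetic, the Python class below splits a handshake's iRTT into the two gaps discussed next, calling them a and b as the following sections do. The `Handshake` class and its fields are hypothetical, not part of Wireshark; the timestamps are assumed to be capture times in seconds, for example the `frame.time_epoch` values of the three packets.

```python
from dataclasses import dataclass


@dataclass
class Handshake:
    """Capture times (seconds) of the three handshake packets at one capture point.

    Hypothetical helper for illustration; not part of Wireshark.
    """
    t_syn: float
    t_synack: float
    t_ack: float

    @property
    def a(self) -> float:
        """Gap between the SYN and the SYN/ACK."""
        return self.t_synack - self.t_syn

    @property
    def b(self) -> float:
        """Gap between the SYN/ACK and the ACK."""
        return self.t_ack - self.t_synack

    @property
    def irtt(self) -> float:
        """Initial RTT as Wireshark computes it: ACK time minus SYN time, i.e. a + b."""
        return self.t_ack - self.t_syn


# Hypothetical timestamps chosen so that the total matches the iRTT in the example.
h = Handshake(t_syn=0.0, t_synack=0.001500, t_ack=0.001654)
print(f"a={h.a:.6f}s b={h.b:.6f}s iRTT={h.irtt:.6f}s")
```

Whatever the capture point, the iRTT is the same sum a + b; only how it splits between a and b changes.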

So how do the individual gaps that make up the iRTT differ between client, server and middle-end captures?

Client

When capturing on the client, the client sends a SYN, which crosses the intermediate network to the server; the server processes it and sends a SYN/ACK back. Call the time between the SYN and the SYN/ACK a. The client then replies with an ACK; call the time between the SYN/ACK and the ACK b.

Here a covers a full network round trip plus the server's processing time, while b covers only the client protocol stack's turnaround. Allowing for stack processing and delay at each end, propagation delay, transmission delay and so on, under normal circumstances a > b.

Therefore, in a client-side capture, the SYN to SYN/ACK gap is larger than the SYN/ACK to ACK gap.

Does the converse hold? If the SYN to SYN/ACK gap is larger than the SYN/ACK to ACK gap, was the capture definitely taken on the client? No. An obvious counterexample is a capture taken close to the client, which this article classifies as the middle end (see below).

Server

When capturing on the server, the server receives the client's SYN, processes it and returns a SYN/ACK. Call the time between the SYN and the SYN/ACK a. The SYN/ACK then crosses the intermediate network to the client, the client replies with an ACK, and the ACK travels back to the server. Call the time between the SYN/ACK and the ACK b.

Here a covers only the server protocol stack's turnaround, while b covers a full network round trip plus the client's processing time. Allowing for stack processing and delay at each end, propagation delay, transmission delay and so on, under normal circumstances a < b.

Therefore, in a server-side capture, the SYN to SYN/ACK gap is smaller than the SYN/ACK to ACK gap.

Likewise, does the converse hold? If the SYN to SYN/ACK gap is smaller than the SYN/ACK to ACK gap, was the capture definitely taken on the server? No. The corresponding counterexample is a capture taken close to the server, which this article also classifies as the middle end (see below).

Middle End

When capturing at the middle end, the device first sees the client's SYN and forwards it to the server; the server processes it and returns a SYN/ACK, which the middle end receives. Call the time between the SYN and the SYN/ACK a. The SYN/ACK is then forwarded to the client, the client replies with an ACK, and the middle end receives that ACK. Call the time between the SYN/ACK and the ACK b.

The relationship between a and b depends on where the middle-end capture point sits. As noted above, close to the client a > b; close to the server a < b; roughly midway along the path, a ≈ b is also possible.

Summary

The above describes the IRTT of the TCP three-way handshake at different capture points under normal circumstances, which assumes that the client and server protocol stacks process packets very quickly. If load, performance limits or other issues slow down processing on the client or server, the capture has to be judged in light of the actual scenario.

Likewise, how large the gap between a and b is depends on the network distance between client and server. If they are connected to the same access switch, a and b will differ only slightly in a client-side or server-side capture. With a low-latency switch keeping the intermediate network delay extremely low, the relationship between a and b becomes even harder to call, and patterns have to be found by comparing many handshakes captured at the same point. If client and server communicate across a WAN or the Internet, the difference between a and b in a client-side or server-side capture will be very large.

Based on the above, the IRTT behavior is summarized as follows:

| Capture point | IRTT gap relationship | Remark |
|---|---|---|
| Client | a (SYN to SYN/ACK) > b (SYN/ACK to ACK) | |
| Middle end | a (SYN to SYN/ACK) > b (SYN/ACK to ACK) | Close to the client |
| Middle end | a (SYN to SYN/ACK) < b (SYN/ACK to ACK) | Close to the server |
| Middle end | Relationship between a and b uncertain | Elsewhere on the path |
| Server | a (SYN to SYN/ACK) < b (SYN/ACK to ACK) | |
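The table can be expressed as a rough heuristic. The Python sketch below is hypothetical: the function names and the 20% `tolerance` for treating a and b as roughly equal are arbitrary choices, not established thresholds. Following the advice to compare many samples from the same capture point, the second function uses the median gaps over several handshakes.

```python
from statistics import median


def gap_hint(a: float, b: float, tolerance: float = 0.2) -> str:
    """Rough capture-point hint from the SYN->SYN/ACK gap (a) and SYN/ACK->ACK gap (b).

    tolerance: relative difference at or below which a and b count as roughly
    equal. The 0.2 default is an arbitrary, hypothetical choice.
    """
    larger = max(a, b)
    if larger <= 0 or abs(a - b) / larger <= tolerance:
        return "uncertain: a ~= b (middle end, or endpoints on a very low-latency LAN)"
    if a > b:
        return "client side: the client, or a middle end close to the client"
    return "server side: the server, or a middle end close to the server"


def gap_hint_from_samples(samples: list[tuple[float, float]], tolerance: float = 0.2) -> str:
    """Apply gap_hint to the median a and median b of many (a, b) pairs captured
    at the same point, so that a single delayed handshake does not decide the result."""
    a_med = median(a for a, _ in samples)
    b_med = median(b for _, b in samples)
    return gap_hint(a_med, b_med, tolerance)


print(gap_hint(0.0015, 0.00015))
print(gap_hint_from_samples([(0.0015, 0.0001), (0.0016, 0.0002), (0.0200, 0.0001)]))
```

Note that "client side" deliberately includes a middle end near the client: as the text explains, the gap relationship alone cannot separate the two.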

Length

What is Length? In Wireshark, Length is the length of the frame in bytes. The corresponding display filter field and its description are as follows:

frame.len
Frame length on the wire

Minimum Length

As is well known, the minimum Ethernet frame length is 64 bytes. Captures generally do not include the 4-byte FCS, so the minimum length of a captured frame is 60 bytes, and the minimum length of the data field (the Ethernet payload) is 46 bytes.

14 bytes (Ethernet II header length) + 46 bytes (minimum data field length requirement) + 4 bytes (FCS) = 64 bytes

So how long is a standard pure ACK (no data)? 54 bytes, which is less than the 60-byte minimum for a captured frame.

14 bytes (Ethernet II header length) + 20 bytes (IPv4 header length) + 20 bytes (TCP header length) = 54 bytes

Therefore, for Ethernet transmission the frame must be padded with zero bytes to reach the 60-byte minimum. In the example capture, packet No.7, an ACK, carries 6 bytes of zero padding.
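The padding arithmetic can be sketched in Python as follows, assuming untagged Ethernet II frames carrying IPv4 with no IP options (an 802.1Q VLAN tag would add 4 bytes). The constant and function names are illustrative only.

```python
ETH_HEADER = 14   # Ethernet II header: destination MAC + source MAC + EtherType
MIN_PAYLOAD = 46  # minimum Ethernet payload (data field)
FCS = 4           # frame check sequence, usually absent from captures


def wire_frame_len(ip_packet_len: int, include_fcs: bool = False) -> int:
    """Length of the Ethernet II frame carrying an IP packet of ip_packet_len bytes,
    after the sender pads a short payload with zeros up to MIN_PAYLOAD."""
    payload = max(ip_packet_len, MIN_PAYLOAD)
    return ETH_HEADER + payload + (FCS if include_fcs else 0)


def padding_bytes(ip_packet_len: int) -> int:
    """Number of zero bytes the sender adds to reach the minimum payload."""
    return max(0, MIN_PAYLOAD - ip_packet_len)


# A pure ACK: 20-byte IPv4 header + 20-byte TCP header, no data.
pure_ack_ip_len = 20 + 20
print(ETH_HEADER + pure_ack_ip_len)                       # 54: captured on the sender before padding
print(wire_frame_len(pure_ack_ip_len))                    # 60: after the sender's NIC pads it
print(padding_bytes(pure_ack_ip_len))                     # 6
print(wire_frame_len(pure_ack_ip_len, include_fcs=True))  # 64
```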

Why, then, does the same capture also contain frames with a Length of 54 bytes? No.3 (the third packet of the TCP three-way handshake), No.6 (which also carries FIN) and No.9 are all ACKs without data.

The reason lies in how and where Wireshark captures. When Wireshark captures on the local host, packets generated and sent locally are captured before they reach the network card, and padding and FCS are normally added by the NIC hardware or driver, so these 54-byte frames contain no padding. The 60-byte pure ACKs, by contrast, come from the other end: they were padded by the remote NIC before being received and captured locally.

Summary

Therefore, if the third packet of the TCP three-way handshake, the ACK, is 54 bytes long, the capture was taken locally on the client, not at the middle end or on the server. Once the frame leaves the client's NIC it has been padded, so any capture further along the path sees at least 60 bytes; the same holds for the server, which receives the client's ACK only after it has been padded.

How, then, can a server-side capture be identified? Not from the handshake packets alone. Only by examining subsequent packets and checking whether pure ACKs (or other packets) sent by the server itself are shorter than 60 bytes can a conclusion be drawn. And a middle-end capture? Examine all packets, the handshake included: if none is shorter than 60 bytes, the capture was generally taken at the middle end.
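The decision rule just described can be sketched as follows. This is a hypothetical helper, not a Wireshark feature: it assumes each frame of one TCP stream has already been labeled with its sender (for instance by comparing `ip.src` with the source address of the SYN) and its `frame.len`. It cannot see through cases where TCP options make every pure ACK at least 60 bytes, or where a device on the path strips the padding.

```python
MIN_CAPTURED_LEN = 60  # minimum Ethernet frame length, FCS excluded


def length_hint(frames: list[tuple[bool, int]]) -> str:
    """Guess the capture point from the frame lengths of one TCP stream.

    frames: (sent_by_client, frame_len) pairs; a hypothetical input format.
    A frame shorter than 60 bytes was captured on its sender before padding.
    """
    client_short = any(by_client and n < MIN_CAPTURED_LEN for by_client, n in frames)
    server_short = any(not by_client and n < MIN_CAPTURED_LEN for by_client, n in frames)
    if client_short and not server_short:
        return "client"
    if server_short and not client_short:
        return "server"
    if not client_short and not server_short:
        return "middle end (probably)"
    return "inconsistent: short frames from both sides, check the capture"


# SYN, SYN/ACK, then the client's unpadded 54-byte ACK.
print(length_hint([(True, 74), (False, 74), (True, 54)]))  # client
```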

Based on the above, the frame-length behavior is summarized as follows:

| Capture point | Frame length | Remark |
|---|---|---|
| Client | The third handshake packet (ACK) and other pure ACKs sent by the client are shorter than 60 bytes | Does not apply when the TCP Timestamps option is in use (pure ACKs are then 66 bytes) |
| Middle end | No packet, the handshake included, is shorter than 60 bytes | In special cases the zero padding may have been stripped |
| Server | Cannot be judged from the handshake alone; check whether other packets sent by the server are shorter than 60 bytes | |

Length has one more case: packets longer than the MTU. This is related to the offload feature and is covered in the Offload section below.

TTL

What is TTL? The TTL defined in Wireshark refers to the IPv4 TTL field. The corresponding display filter fields and their meanings are as follows:

ip.ttl
Time to Live (1 byte)

A timer field used to track the lifetime of the datagram. When the TTL field is decremented down to zero, the datagram is discarded.

Although TTL literally means time to live, in practice it is the maximum number of hops over which an IP packet can be forwarded. The sender sets the TTL, and every router along the path from source to destination decrements it by 1 before forwarding the packet. If the TTL reaches 0 before the packet arrives at the destination, the router discards the packet and sends an ICMP Time Exceeded message back to the sender.

Default initial TTL values differ between operating systems. A brief version:

| Device/OS | Default TTL |
|---|---|
| Linux | 64/255 |
| Windows | 128 |

Example

Because the TTL is set by the sender, packets sent by the capturing host itself carry that host's default value. So check the TTL values in the TCP three-way handshake packets: if neither direction shows a standard value such as 64, 128 or 255, the capture point can preliminarily be judged to be the middle end.

Conversely, if the TTL values are standard values such as 64, 128 or 255, can we tell whether the capture was taken on the client or the server? No. Devices such as Layer 2 switches do not decrement the TTL, so a capture taken on the access switch of the client or server also shows standard values. In that case TTL can only assist the judgment, combined with knowledge of the actual environment.
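The TTL reasoning can be sketched as a small Python heuristic. It assumes the sender used one of the default initial values listed above and that the path is shorter than the gap to the next default, the usual caveats of this kind of inference; the function names are hypothetical. It takes one TTL from a client-sent handshake packet and one from a server-sent packet, as seen at the capture point.

```python
STANDARD_INITIAL_TTLS = (64, 128, 255)  # common defaults, per the table above


def infer_initial_ttl(observed: int) -> int:
    """Smallest common default initial TTL that is >= the observed TTL."""
    if not 0 <= observed <= 255:
        raise ValueError("TTL must be between 0 and 255")
    return next(t for t in STANDARD_INITIAL_TTLS if observed <= t)


def routed_hops(observed: int) -> int:
    """Routers passed since the sender, assuming a standard initial TTL."""
    return infer_initial_ttl(observed) - observed


def ttl_hint(client_ttl: int, server_ttl: int) -> str:
    """client_ttl: TTL of a client-sent handshake packet (e.g. the SYN);
    server_ttl: TTL of a server-sent one (the SYN/ACK), both as captured."""
    client_hops, server_hops = routed_hops(client_ttl), routed_hops(server_ttl)
    if client_hops and server_hops:
        return "middle end: neither direction carries a standard TTL"
    if client_hops == 0 and server_hops == 0:
        return "inconclusive: no routed hop visible in either direction"
    if client_hops == 0:
        return "client side: the client, or a Layer 2 hop from it"
    return "server side: the server, or a Layer 2 hop from it"


print(ttl_hint(64, 52))  # client's TTL untouched, server's decremented 12 times
```

A Layer 2 hop is indistinguishable from the endpoint here, which is exactly the limitation described above.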

Offload

Regarding NIC offload: most operating systems support various forms of network offload, in which operations normally performed in the TCP/IP stack, such as IP fragmentation, TCP segmentation, reassembly, and checksum calculation and verification, are handed to the network card hardware instead of the CPU. This reduces CPU consumption and improves processing performance.

In packet capture and analysis, offload shows up in two places: the packet checksum and the packet length (mentioned above).

Strictly speaking, this goes beyond the TCP three-way handshake; the length effects, for example, concern large data segments during bulk transfer. Since they are another way the capture point reveals itself, they are briefly summarized here.

Packet Checksum

Checksum offload moves the TCP/UDP/IP checksum work to the NIC hardware to save CPU resources. A related example is the Ethernet CRC32 (the FCS): the sending NIC calculates it and the receiving NIC verifies it. If the received FCS is wrong, the NIC discards the frame, so Wireshark never even sees it.

TCP/UDP/IP Checksum

  • The TCP checksum is calculated over three parts: the TCP header, the TCP data and the TCP pseudo-header. The TCP checksum is mandatory.
  • The UDP checksum is calculated over three parts: the UDP header, the UDP data and the UDP pseudo-header. The UDP checksum is optional over IPv4.
  • The IP checksum covers only the header of the IP datagram, not its data.

With checksum offload enabled, the TCP/IP stack does not calculate the checksum itself; it simply passes a placeholder checksum field (zeros or seemingly random values) to the network card hardware, which fills in the real value.

So, assuming end-to-end transmission is otherwise normal, when Wireshark captures locally, packets generated and sent locally are captured before they reach the network card. If the Wireshark option "Validate the IPv4 checksum if possible" is enabled, their checksums are flagged as incorrect.

When the data packets sent by the other end are captured by this end, the IPv4 checksum verification is normal.

The TCP/UDP checksum case in Wireshark is essentially the same as the IP one. The only difference is that for TCP/UDP the stack leaves a value in the checksum field that looks random before handing the packet to the NIC, whereas the IP checksum field is filled with zeros.

The corresponding options are "Validate the TCP checksum if possible" for TCP and "Validate the UDP checksum if possible" for UDP.

To stress the point again: assuming normal end-to-end transmission (no genuine checksum problems), packets sent locally by the client or server (such as the SYN, SYN/ACK and ACK) are shown with checksum errors when Wireshark's IP, TCP or UDP checksum validation is enabled. The checksums are actually fine; the packets were captured before the NIC filled them in. The same packets captured at the middle end, after the NIC has filled them in, are shown as correct.
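For reference, the checksum being validated is the standard Internet checksum (RFC 1071): the one's complement of the one's-complement sum of 16-bit words. The Python sketch below computes it for an IPv4 header and for a TCP segment with its pseudo-header; the function names are illustrative, and the sample header is made up for the demonstration. A header captured before the NIC fills in the field (checksum still zero) fails validation, as described above, while the same header with the correct value passes.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_checksum_ok(header: bytes) -> bool:
    """A header whose checksum field is correct sums to 0xFFFF, so its checksum is 0."""
    return internet_checksum(header) == 0


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """TCP checksum over pseudo-header + TCP header + data.

    src_ip/dst_ip are 4-byte IPv4 addresses. Computed with the segment's checksum
    field (bytes 16-17) zeroed, this gives the value to store; computed over a
    segment that already holds a correct checksum, it gives 0.
    """
    # Pseudo-header: source, destination, zero byte, protocol 6 (TCP), TCP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)


# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) still zero,
# as it looks when captured before the NIC fills it in.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(hdr)))      # 0xb861
print(ipv4_header_checksum_ok(hdr))     # False
filled = hdr[:10] + bytes.fromhex("b861") + hdr[12:]
print(ipv4_header_checksum_ok(filled))  # True
```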

Packet Length

The TSO, GSO, LRO and GRO offload features defer fragmentation, segmentation and reassembly to the NIC or to the lowest layers of the stack (GSO and GRO are the software counterparts of TSO and LRO).

For the same reason (how and where Wireshark captures): when Wireshark captures on the local host, locally generated outbound packets are captured before they reach the network card, so on hosts with TSO, GSO or similar features enabled the capture may contain packets larger than the MTU (here, the standard 1500-byte MTU).

Conversely, locally received packets are captured after the network card, so with LRO, GRO or similar features enabled, merged packets larger than the MTU may also appear in the capture.

On the wire, however, packets are segmented (or fragmented) to fit the MTU, so a capture taken at the middle end will not contain packets larger than the MTU.
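A minimal check for MTU-exceeding frames, assuming untagged Ethernet II and the standard 1500-byte MTU, so that any frame above 14 + 1500 = 1514 bytes (FCS excluded) counts as oversized. The function names are hypothetical, and jumbo-frame networks would need a larger `mtu`.

```python
ETH_HEADER = 14      # untagged Ethernet II header
STANDARD_MTU = 1500  # standard Ethernet MTU


def oversized_frames(frame_lens: list[int], mtu: int = STANDARD_MTU) -> list[int]:
    """Indices of frames longer than header + MTU (1514 bytes by default, FCS excluded)."""
    limit = ETH_HEADER + mtu
    return [i for i, n in enumerate(frame_lens) if n > limit]


def offload_hint(frame_lens: list[int], mtu: int = STANDARD_MTU) -> str:
    """Interpret oversized frames as a sign of an endpoint capture with offload enabled."""
    big = oversized_frames(frame_lens, mtu)
    if big:
        return (f"{len(big)} frame(s) above {ETH_HEADER + mtu} bytes: likely an endpoint capture "
                "with TSO/GSO (outbound) or LRO/GRO (inbound) enabled")
    return "no frames above the MTU: consistent with a middle-end capture, or with offload disabled"


print(offload_hint([74, 74, 66, 28_974, 66]))
```

The absence of oversized frames proves little on its own, since offload may simply be disabled; their presence is the stronger signal.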

Other fields

Other very common fields may also change during network transmission, such as the MAC addresses in Ethernet II, the IP addresses in IPv4, and the ports in TCP/UDP. They can differ between scenarios and capture points, so they are not covered in detail here.

Summary

In a nutshell, “If you stand at a different angle, you will see different results.”
