Gary Kaiser About the Author

Gary is a Subject Matter Expert in Network Performance Analysis at Dynatrace. He has global field enablement responsibilities for performance monitoring and analysis solutions embracing emerging and strategic technologies, including WAN optimization, thin client infrastructures, network forensics, and a unique performance management maturity methodology. He is also a co-inventor of multiple analysis features, and continues to champion the value of software-enabled expert network analysis.

Understanding Application Performance on the Network – Part VII: TCP Window Size

In Part VI, we dove into the Nagle algorithm – perhaps (or hopefully) something you’ll never see. In Part VII, we get back to “pure” network and TCP roots as we examine how the TCP receive window interacts with WAN links.

TCP Window Size

Each node participating in a TCP connection advertises its available buffer space using the TCP window size field. This value identifies the maximum amount of data a sender can transmit without receiving a window update via a TCP acknowledgement; in other words, this is the maximum number of “bytes in flight” – bytes that have been sent, are traversing the network, but remain unacknowledged. Once the sender has reached this limit and exhausted the receive window, the sender must stop and wait for a window update.

The sender transmits a full window, then waits for window updates before continuing. As these window updates arrive, the sender advances the window and may transmit more data.

Long Fat Networks

High-speed, high-latency networks, sometimes referred to as Long Fat Networks (LFNs), can carry a lot of data. On these networks, small receive window sizes can limit throughput to a fraction of the available bandwidth. These two factors – bandwidth and latency – combine to influence the potential impact of a given TCP window size. LFNs make it possible – common, even – for a sender to transmit an entire TCP window’s worth of data very quickly (high bandwidth) and then have to wait while the packets reach the distant remote site (high latency) and the acknowledgements return, informing the sender of successful data delivery and available receive buffer space.

The math (and physics) concepts are straightforward. As the network speed increases, data can be clocked out onto the network medium more quickly; the bits are literally closer together. As latency increases, these bits take longer to traverse the network from sender to receiver. As a result, more bits can fit on the wire. As LFNs become more common, exhausting a receiver’s TCP window becomes increasingly problematic for some types of applications.

Bandwidth Delay Product

The Bandwidth Delay Product (BDP) is a simple formula used to calculate the maximum amount of data that can exist on the network (referred to as bits or bytes in flight) based on a link’s characteristics:

  • Bandwidth (bps) x RTT (seconds) = bits in flight
  • Divide the result by 8 for bytes in flight

If the BDP (in bytes) for a given network link exceeds the value of a session’s TCP window, then the TCP session will not be able to use all of the available bandwidth; instead, throughput will be limited by the receive window (assuming no other constraints, of course).
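As a minimal sketch of the calculation above, the following Python helpers compute the BDP in bytes and compare it with a receive window. The function names are illustrative, not part of any tool discussed here, and the check ignores every constraint other than the window.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth Delay Product in bytes: (bandwidth in bps x RTT in seconds) / 8."""
    return bandwidth_bps * rtt_seconds / 8


def is_window_limited(bandwidth_bps: float, rtt_seconds: float,
                      window_bytes: int) -> bool:
    """True when the BDP exceeds the receive window, so the window caps throughput."""
    return bdp_bytes(bandwidth_bps, rtt_seconds) > window_bytes


if __name__ == "__main__":
    # Hypothetical link: 20 Mbps with 85 ms of round-trip delay.
    print(bdp_bytes(20_000_000, 0.085))
    print(is_window_limited(20_000_000, 0.085, 65535))
```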

The BDP can also be used to calculate the maximum throughput (“bandwidth”) of a TCP connection given a fixed receive window size:

  • Bandwidth = (window size *8)/RTT
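The same formula can be turned around in code. This sketch (function names are assumptions for illustration) returns the throughput ceiling imposed by a fixed window and the fraction of a link that a single connection could then use, assuming no loss and no other bottleneck.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling for one TCP connection: (window size * 8) / RTT."""
    return window_bytes * 8 / rtt_seconds


def link_utilization(window_bytes: int, rtt_seconds: float,
                     bandwidth_bps: float) -> float:
    """Fraction of the link one connection can fill, capped at 1.0."""
    return min(1.0, max_throughput_bps(window_bytes, rtt_seconds) / bandwidth_bps)


if __name__ == "__main__":
    # A 65535-byte window over an 85 ms round trip on a 20 Mbps link.
    print(max_throughput_bps(65535, 0.085))
    print(link_utilization(65535, 0.085, 20_000_000))
```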

In the not-too-distant past, the TCP window had a maximum value of 65535 bytes. While today’s TCP implementations now generally include a TCP window scaling option that allows for negotiated window sizes to reach 1GB, many factors limit its practical utility. For example, firewalls, load balancers and server configurations may purposely disable the feature. So the reality is that we often still need to pay attention to the TCP window size when considering the performance of applications that transfer large amounts of data, particularly on enterprise LFNs.

As an example, consider a company with offices in New York and San Francisco; they need to replicate a large database each night, and have secured a 20Mbps network connection with 85 milliseconds of round-trip delay. Our BDP calculation tells us that the BDP is 212,500 bytes (20,000,000 x .085 / 8); in other words, a single TCP connection would require a window of roughly 212KB in order to take advantage of all of the bandwidth. The calculation also tells us that the configured TCP window size of 65535 will permit approximately 6Mbps throughput (65535*8/.085), less than 1/3 of the link’s capacity.
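Window scaling works by left-shifting the 16-bit window field by a shift count agreed at connection setup, with 14 as the largest permitted shift. The sketch below finds the smallest shift that would cover a target window, such as the 212,500 bytes in this example; the function name is hypothetical and real stacks choose the shift from their configured buffer sizes rather than from a BDP figure.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit window field can hold
MAX_SHIFT = 14               # largest shift count the window scale option allows


def min_shift_for_window(target_bytes: int) -> int:
    """Smallest window scale shift whose maximum window covers target_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= target_bytes:
            return shift
    raise ValueError("target exceeds the largest window that scaling can express")


if __name__ == "__main__":
    print(min_shift_for_window(212_500))
```

With the maximum shift of 14, the largest expressible window is 65535 x 2^14 bytes, which is where the "1GB" figure comes from.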

A link’s BDP and a receiver’s TCP window size are two factors that help us to identify the potential throughput of an operation. The remaining factor is the operation itself, specifically the size of individual request or reply flows. Only flows that exceed the receiver’s TCP window size will benefit from, or be impacted by, these TCP window size constraints. Two common scenarios help illustrate this. Let’s say a user needs to transfer a 1GB file:

  • Using FTP (in stream mode) will cause the entire file to be sent in a single flow; this operation could be severely limited by the receive window.
  • Using SMB (at least older versions of the protocol) will cause the file to be sent in many smaller write commands, as SMB used to limit write messages to under 64KB; this operation would not be able to take advantage of a TCP receive window of greater than 64KB. (Instead, the operation would more likely be limited by application turns and link latency; we discuss chattiness in Part VIII.)
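To make the contrast concrete, here is a rough model of the two scenarios. It is a sketch under simplifying assumptions, not a description of how FTP or SMB are implemented: the streaming case runs at the lower of link speed and window-limited rate, while the request/response case pays one round trip per block on top of serialization time. Slow start, protocol overhead, and packet loss are ignored, and the link figures and 60KB block size are illustrative values only.

```python
import math


def streaming_transfer_seconds(size_bytes: int, bandwidth_bps: float,
                               rtt_s: float, window_bytes: int) -> float:
    """Single-flow transfer time, limited by link speed or by window / RTT."""
    window_rate_bps = window_bytes * 8 / rtt_s
    return size_bytes * 8 / min(bandwidth_bps, window_rate_bps)


def request_response_transfer_seconds(size_bytes: int, bandwidth_bps: float,
                                      rtt_s: float, block_bytes: int) -> float:
    """Block-at-a-time transfer: serialization time plus one round trip per block."""
    blocks = math.ceil(size_bytes / block_bytes)
    return size_bytes * 8 / bandwidth_bps + blocks * rtt_s


if __name__ == "__main__":
    size = 1_000_000_000  # 1GB file (decimal), assumed link: 20 Mbps, 85 ms RTT
    print(streaming_transfer_seconds(size, 20_000_000, 0.085, 65535))
    print(streaming_transfer_seconds(size, 20_000_000, 0.085, 262_140))
    print(request_response_transfer_seconds(size, 20_000_000, 0.085, 60 * 1024))
```

In this model a larger window shortens the streaming transfer but does nothing for the block-at-a-time transfer, since each block is smaller than the window to begin with.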

Transaction Trace Illustration

To evaluate a trace for this window size constraint, use the Time Plot view. For Series 1, graph the sender’s payload in transit (i.e., bytes in flight); for Series 2, graph the receiver’s advertised TCP window, using a single y axis scale for reference. If the payload in transit reaches (or closely approaches) the receive window size, then it is likely that an increase in the window size will allow for improved throughput.

This Time Plot view shows the sender’s TCP Payload in Transit (blue) reaching the receiver’s advertised TCP window (brown); the window size is limiting throughput.
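The same comparison can be approximated outside a graphical tool. The sketch below assumes you have already exported two aligned series from a trace (bytes in flight and the receiver's advertised window, one sample per interval); the function name, the 95% threshold, and the sample data are all hypothetical.

```python
from typing import List, Sequence


def window_limited_samples(in_flight: Sequence[int],
                           advertised_window: Sequence[int],
                           threshold: float = 0.95) -> List[int]:
    """Indices of samples where bytes in flight reach threshold x advertised window."""
    return [
        i
        for i, (flight, window) in enumerate(zip(in_flight, advertised_window))
        if window > 0 and flight >= threshold * window
    ]


if __name__ == "__main__":
    # Made-up samples for illustration only.
    flight = [10_000, 40_000, 64_000, 65_535, 30_000]
    window = [65_535] * 5
    print(window_limited_samples(flight, window))
```

If most samples during a bulk transfer are flagged, the receive window is a plausible constraint; a handful of isolated hits is weaker evidence.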

The Bounce Diagram can also be used to illustrate the impact of a TCP window constraint, emphasizing the impact of latency on data delivery and subsequent TCP acknowledgements.

Illustration of a TCP window constraint; each cluster of blue frames represents a complete window’s worth of payload, and the sender must then wait for window updates.

Note that the TCP window scaling option is negotiated in the TCP three-way handshake as the connection is set up; without the SYN and SYN/ACK handshake packets in the trace file, there is no way of knowing whether window scaling is active, or more accurately, what the scaling value might be. (Hint: if you observe window sizes in a trace file that appear abnormally small – such as 500 bytes – then it is likely that window scaling is active; you may not know the actual window size, but it is likely to be greater than 64KB.)
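A small sketch of how the raw window field relates to the real window may help here. The shift count comes from the window scale option the advertising host sent during the handshake; when the handshake was not captured, the function below returns None rather than guessing. The function name is an assumption for illustration.

```python
from typing import Optional


def effective_window(raw_window: int, shift: Optional[int]) -> Optional[int]:
    """Actual receive window in bytes, or None when the shift count is unknown.

    raw_window is the 16-bit value seen in the TCP header; shift is the scale
    factor that the advertising host announced in its SYN or SYN/ACK.
    """
    if shift is None:
        return None  # handshake not in the trace, so the real size is unknowable
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return raw_window << shift


if __name__ == "__main__":
    print(effective_window(500, 7))     # raw 500 with a shift of 7
    print(effective_window(500, None))  # handshake missing
```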

Corrective actions

For a TCP window constraint on an LFN, assuming adequate available bandwidth, primary solution options focus on increasing the receiver’s TCP window or enabling TCP window scaling. Reducing latency – which in turn reduces the BDP – will allow greater throughput for a given TCP window; relocating a server or optimizing path selection are examples of how this reduction in latency might be accomplished.
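Where the application is under your control, one way to request a larger receive window is the standard SO_RCVBUF socket option. The sketch below uses Python's socket module; the helper name and the 262,140-byte request are illustrative. The operating system may clamp or adjust the value (Linux, for instance, reports a doubled figure), and on some systems setting the buffer explicitly turns off receive-buffer auto-tuning for that socket, so treat this as a starting point to test rather than a recommendation.

```python
import socket


def make_socket_with_receive_buffer(requested_bytes: int) -> socket.socket:
    """Create a TCP socket and request a receive buffer of requested_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask for the buffer before connect() or listen(), since the window
    # scale factor is fixed during the handshake.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    return sock


if __name__ == "__main__":
    sock = make_socket_with_receive_buffer(262_140)
    # Read back what the OS actually granted; it may differ from the request.
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    sock.close()
```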

Is TCP window scaling enabled for your key applications – especially those that serve users over LFNs? Are your file transfers and replications performing in harmony with the network they traverse?

In Part VIII, the final entry in this series, we’ll talk about application chattiness – the more common app turns kind, but also a behavior I call application windowing. Stay tuned and feel free to comment below.



  1. Nick Fiekowsky says:

    Optimal real world TCP window size can be far larger than bandwidth-delay product (BDP) – latency doesn’t end at the RJ-45. We had a 20 Mb/sec MPLS link between Japan and US east coast with 192 msec RTT. BDP would be just under 512 KBytes. Optimal sustained throughput achieved when receiving host advertised 2.7 MByte receive window.

    Back in the Windows XP & NetWare era we discovered that TCP tuning for larger window size significantly reduced boot time for a PC one flight up from the data center.

    Unless you’re running iPerf, additional latency stems from time taken for:
    – Application on the receiving host to be dispatched by OS
    – Time for receiving application to move data from receive buffer to disk or screen
    – Time for sending application to be dispatched by OS when send buffer empties
    – Time for sending application to marshal data to move into send buffer

    We found that larger TCP windowsize can measurably reduce host processing time while shrinking transfer time.

  2. Gary Kaiser says:

    Hi Nick,
    Thanks for your comment, and sharing your experience; often, theory and experience conflict, in which case the latter wins out.
    I like your point that latency may not end at the RJ-45; we (meaning I) often abstract the definition of end-to-end delay, even incorrectly referring to it as “NIC-to-NIC.” I think a better definition – pertinent to this discussion, at least – would be “TCP stack to TCP stack.”
    If we think about the BDP and a large TCP flow, then we really are concerned with the timings of TCP ACKs; application-specific delays at the sender (which I’ve previously referred to as “starved for data” conditions) and at the receiver (which reduce the advertised TCP window size due to delays reading from the buffer) shouldn’t affect the calculation itself. But – especially on the systems you mention – the TCP stacks themselves could be OS-bound (since they would run entirely in the OS), delaying the acknowledgement of data by the receiver and/or reading the window update at the sender. So the net effect would be a delay value (for the BDP) that could be significantly greater than the physical NIC-to-NIC RTT.
    Having said that, I struggled for a while trying to explain the faster user-perceived performance. A slow app is a slow app; increasing the receive buffer can allow the data to traverse the network faster (shrinking transfer time), but if the app were slow, delays reading the data from the buffer would still remain. What if the TCP stack were slow (delaying the ACKs and increasing the BDP), and the app fast? Then the theory would seem to match the experience.
    But I defer to the real world….

    • Nick Fiekowsky says:

      Hello Gary,

      My view is that big TCP buffers provide slack that allows many components to operate in efficient “stream” mode most of the time rather than “start and stop.”

      – The disk drive can stream big chunks of data into the transmitting app since there’s a big TCP buffer ready to catch the bytes.

      – The transmitting TCP stack has a big pile of bytes on hand to sustain a max speed stream, avoiding pauses and subsequent slow start.

      – The receiving TCP stack has lots of buffer space to hold the arriving bytes, likely eliminating window freezes.

      – The receiving application stays active for long stretches since it can work through MBytes of data at a time.

      – The receiving application can feed long streams of data to its storage. The writing disk head thus stays in one track, or quickly moves to an adjacent track, for rapid data storage.

      Confirming experience – some years back my ancient single-core, small-memory laptop with mild TCP tuning could outperform a colleague’s shiny new dual-core, 4 GByte memory laptop in downloads. My colleague didn’t believe it at first, but disk defragmentation made the difference. I regularly defragmented my hard drive; his had never been defragmented. His laptop outperformed mine once the disk was adequately defragmented.

      Conclusion – the network is one component of an end-to-end system. Poor tuning can cripple end-to-end performance, good tuning helps. Strong network tuning allows other components to deliver their best performance, too.

  3. Hello Gary, could you please advise on the following problem:

    We have a slow link with B=32 kbps (this is power line carrier communication); RTT is approx. 200 ms. The TCP window is 800 bytes, and IEC 60870-5-104 transmits data with a very small payload – 46 bytes. This means that in one TCP window we have 17 packets.

    Is there a way to reduce the number of ACKs? It is costly to wait for all 17 ACKs before sending again…
    Is it possible to use the Nagle algorithm to collect all ACKs in one packet?

    Also, could you please clarify: I found some articles describing how, after receiving a few TCP segments, the receiver sends an ACK only for the last one, with the highest sequence number. Is it possible to use such an approach, for example with TCP_DELAY mode: wait for 500 ms and send an ACK for only one segment?

    Best regards,


  4. Gary Kaiser says:

    Hi Anton,

    If your interest is to improve data transfer throughput, then it would appear that TCP is tuned quite well to your environment. The BDP = 32000*0.200 = 6400 bits in flight; divide by 8 = 800 bytes in flight. This is the maximum carrying capacity of the network, so in theory, a TCP window size of 800 bytes (or greater) would allow the link to be fully utilized.
    As the ACKs for earlier packets are received, the sender should be able to stream more data – without waiting until the ACKs are received for the remaining data; however, this is true only if the application is streaming data. The behavior you describe – send a block of data in 17 packets, then wait until all of these packets have been acknowledged, then send the next block of data – would appear to be what I call application windowing. In this case, the application (or perhaps the power line protocol you are using) is ensuring that each block of data has been successfully received before sending the next block.
    I discuss this behavior in more detail in Part VIII of this blog series.
    Hope this helps you narrow down the issue.

  5. Hi
    What is the optimal TCP window size for a 1Gbps NIC and 600ms RTT?

  6. Hi,
    Plugging your numbers into the formula, we get (1,000,000,000 x .6)/8, or 75,000,000; 75MB. But the formula requires the minimum (limiting) bandwidth between the client and server, which may be significantly different from your 1Gbps NIC speed. Also, remember that the formula describes the optimum TCP window size for a network path, and assumes the application is capable of filling the pipe with large network writes.

  7. Should we calculate latency based on the perceived average or on what we see at the high end? Or the top 10%, 20%, etc.? We have a 20Mbps link between sites but I’m only getting around 5Mbps max on large file transfer tests; our latency bounces from 50-70ms. Even at the top end I should be seeing over 7Mbps, although I can’t be 100% positive the link is totally quiet when I’m running my tests. We don’t have SolarWinds or PRTG set up to monitor yet, but I can test when there are no users and it’s the same.

  8. Gary Kaiser says:

    The calculation is intended to identify the “ideal” TCP window size, one which would allow a TCP session to use all of the bandwidth – in the absence of other traffic. So the latency value should be the minimum, measured on an idle link.
    For your 20Mbps link, a latency variance of 20 milliseconds is pretty significant; on a simple queued link, there would need to be 30 or more packets queued to add this delay, indicating quite high utilization. There may also be other reasons for the variance; packet shaping or QoS policies, or a poorly performing firewall are examples.
    Such congestion will of course reduce the throughput in your file transfer tests; you may be lucky to achieve the 5 – 7 Mbps.
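The queue-depth estimate in this reply can be reproduced with a short calculation. This is a sketch assuming full-size 1500-byte packets and a single FIFO queue; the function name is illustrative.

```python
def queued_packets_for_delay(delay_s: float, bandwidth_bps: float,
                             packet_bytes: int = 1500) -> float:
    """Number of packets that must sit in a queue to add delay_s of latency."""
    return delay_s * bandwidth_bps / (packet_bytes * 8)


if __name__ == "__main__":
    # 20 ms of added delay on a 20 Mbps link.
    print(queued_packets_for_delay(0.020, 20_000_000))
```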

  9. I see it differently – turn it up to 11! I would define very large maximum windowsize for the following reasons:

    1. We’re looking to deliver a robust business solution, not answer an exam question.

    2. The business wants to see maximum throughput rain or shine – low latency or high. Might as well configure to deliver.

    3. For sustained maximum throughput, use the TCP buffers to compensate for latency in the network and also in the devices at both ends.

    4. Modern TCP stacks use auto-tuning so TCP buffers are only as big as necessary. Large TCP windows do not waste resource as they would have 15 years ago.

    An extreme real-world example: Bandwidth-Delay product was 500 KBytes for a trans-Pacific connection. WindowSize was nearly 2.5 MByte to get full throughput. What makes the business more effective – data trickling through with the textbook solution or optimum throughput?

  10. Gary Kaiser says:

    Hi Nick,
    Thanks for your comments; your points are well-taken.
    I interpreted the question a little differently; given an existing production environment, why is throughput lower than expected? The Bandwidth Delay calculation can help determine if the TCP window may be the problem – or if it makes more sense to look elsewhere. There remain many environments where TCP window scaling can’t be applied – due to server configuration, firewalls, load balancers, etc., leaving us with the 65535 byte limit. For the example above (20Mbps, 70ms), a 65535 byte window would limit throughput to about 7Mbps, which to some degree matches the casual observation. Understanding the constraint allows for confidence in choosing a solution, which may not be simple to implement.
    But as modern networks more frequently permit window scaling, your approach will become the only correct answer; thanks again.

  11. Gary,

    Your point is also valid – reviewing actual network traffic is the first step in performance improvement. Many factors can make traffic slow; the packets show you the facts.

    On the other hand, too many people apply bandwidth-delay calculation and prematurely conclude TCP tuning is complete.

    Two wrongs don’t make a right, but two rights make a better answer.

  12. Igor Livshin says:

    Hi Gary,

    For a long time I have been trying to find a way to monitor a socket’s backlog queue (seeing how many requests sit there waiting for a WebContainer thread to become available). Any suggestion as to what object a dynaTrace sensor should monitor to get this info?

    Appreciate your help.


    • Hi Igor
      Dynatrace provides a JMX-based monitoring interface. Your Web Container most likely exposes this metric via JMX, so Dynatrace can simply pick it up. By default, Dynatrace automatically captures the total number of handles for your JVM/CLR process; that obviously includes more than just socket handles, but it is also a good health indicator.
      Let me know if you need to know more. Also, feel free to post questions like this on our community portal, where we have a Dynatrace open Q&A forum to discuss these sorts of product-related questions.


