Summary:
It is useful to be able to account for when we detect packets as lost but actually receive an ACK for them later. This accomplishes that by retaining the packets in the outstanding packet list for a certain amount of time (1 PTO).
For all other purposes these packets are ignored.
Reviewed By: yangchi
Differential Revision: D22421662
fbshipit-source-id: 98e3a3750c79e2bcb8fcadcae5207f0c6ffc2179
Summary:
Adds support for timestamping on TX (TX byte events). This allows the application to determine when a byte that it previously wrote to the transport was put onto the wire.
Callbacks are processed within a new function `QuicTransportBase::processCallbacksAfterWriteData`, which is invoked by `writeSocketDataAndCatch`.
Reviewed By: mjoras
Differential Revision: D22008855
fbshipit-source-id: 99c1697cb74bb2387dbad231611be58f9392c99f
Summary:
The problem with Ping being a simple frame:
(1) All SimpleFrames are in the same scheduler. So sending ping means we may
also send other frames which can be problematic if we send in Initial or
Handshake space
(2) Ping isn't retranmisttable. But other Simple frames are. So we are
certainly setting this wrong when we send pure Ping packet today.
That being said, there are cases where we need to treat Ping as retransmittable.
One is when it comes to update ack state: If peer sends us Ping, we may want to
Ack early rather than late. so it makes sense to treat Ping as retransmittable.
Another place is insertion into OutstandingPackets list. When our API user sends
Ping, then also add a Ping timeout. Without adding pure Ping packets into OP list,
we won't be able to track the acks to our Pings.
Reviewed By: mjoras
Differential Revision: D21763935
fbshipit-source-id: a04e97b50cf4dd4e3974320a4d2cc16eda48eef9
Summary:
On loss timer, currently we knock all handshake packets out of the OP
list and resend everything. This means miss RTT sampling opportunities during
handshake if loss timer fires, and given our initial loss timer is likely not a
good fit for many networks, it probably fires a lot.
This diff keeps handshake packets in the OP list, and add packet cloning
support to handshake packets so we can clone them and send as probes.
With this, the handshake alarm is finally removed. PTO will take care of all
packet number space.
The diff also fixes a bug in the CloningScheduler where we missed cipher
overhead setting. That broke a few unit tests once we started to clone
handshake packets.
The writeProbingDataToSocket API is also changed to support passing a token to
it so when we clone Initial, token is added correctly. This is because during
packet cloning, we only clone frames. Headers are fresh built.
The diff also changed the cloning behavior when there is only one outstanding
packet. Currently we clone it twice and send two packets. There is no point of
doing that. Now when loss timer fires and when there is only one outstanding
packet, we only clone once.
The PacketEvent, which was an alias of PacketNumber, is now a real type that
has both PacketNumber and PacketNumberSpace to support cloning of handshake
packets. I think in the long term we should refactor PacketNumber itself into a
real type.
Reviewed By: mjoras
Differential Revision: D19863693
fbshipit-source-id: e427bb392021445a9388c15e7ea807852ddcbd08
Summary: 0 is now a valid packet number, so we should make these optional. In cases where they are needed to construct packet builder, it should be safe to use 0 as default since it's only used for computing `twiceDistance` in PacketNumber.cpp.
Reviewed By: yangchi
Differential Revision: D21948454
fbshipit-source-id: af9fdc3e28ff85f1594296c4d436f24685a0acd6
Summary:
During PTO, when we try to write new data, we have been limiting to
only stream data and crypto data. This diff adds a few more types of data to
write during probes if they are available: Simple frame, Window updates, Rst
and Blocked frames.
Right now Acks won't be included here.
Reviewed By: mjoras
Differential Revision: D21460861
fbshipit-source-id: d75a296e96375690eee5e609efd7d1b5c103e8da
Summary: This is just in wrong place. There can be multiple frames per packet that we may be implicitly ACKing.
Reviewed By: lnicco
Differential Revision: D21493553
fbshipit-source-id: 89befab01c5a27d400089478717110bca8137d99
Summary: Without doing this the loss buffer can potentially carry data indefinitely.
Reviewed By: yangchi
Differential Revision: D21484470
fbshipit-source-id: 85ac9f274c9badb51eb431b85f67a654c399a018
Summary:
Now we won't have a zero PTO and we will properly clear out the outstanding packets.
Note that this cipher dropping is not what the draft prescribes, instead dropping both the initial and handshake ciphers when we know 1-rtt communication is functioning.
Reviewed By: yangchi
Differential Revision: D20388737
fbshipit-source-id: 0b89eb80c8faa796ab09eda3eaa10a00dcf7bae9
Summary:
Currently the packet builder contructor will encode the packet
builder. This is fine when the builder creates its own output buffer. If later
on we decides not to use this builder, or it fails to build packet, the buffer
will be thrown away. But once the builder uses a buffer provided by caller, and
will be reused, we can no longer just throw it away if we decide not to use
this builder. So we have to delay the header encoding until we know we will use
the builder.
This is still not enough to solve the case where we want to use this builder,
it builds, then it fails . For that, we will need to retreat the tail position
of the IOBuf.
Reviewed By: mjoras
Differential Revision: D21000658
fbshipit-source-id: 4d758b3e260463b17c870618ba68bd4b898a7d4c
Summary: This is useful when you want to ensure that the IOBuf you pass in is encrypted inplace, as opposed to potentially creating a new one.
Reviewed By: yangchi
Differential Revision: D21135253
fbshipit-source-id: 89b6e718fc8da1324685c390c721a564bb77d01d
Summary:
To prepare for another configuration of this chain. With this, now we
can land all the outstanding GSO optimization diffs without having to worry
about breaking the production code.
Reviewed By: mjoras
Differential Revision: D20838453
fbshipit-source-id: 807a0c546305864e0d70f8989f31d3de3b812278
Summary:
As it turns out, the extra indirection from storing a unique_ptr is not worse than the gain from using an `F14ValueMap` versus an `F14VectorMap`.
This reduces the `find` cost measurably in profiles, and doesn't appear to have any real negative effects otherwise.
Reviewed By: yangchi
Differential Revision: D20923854
fbshipit-source-id: a75c4649ea3dbf0e6c89ebfe0d31d082bbdc31fd
Summary:
Currently we return the exact writable bytes number from a real
congestion controller or Path Challenger. This diff round the number up to the
nearest multiple of packet length. Doing so can greatly reduce weird bytes
counting/checking bugs we have around packet writing.
Reviewed By: mjoras
Differential Revision: D20265678
fbshipit-source-id: 2973dde3acc4b2008337127482185f34e16efb43
Summary:
This eliminatees some tech debt by completely removing the notion of version from the core transport parameters structure and the app token for zero rtt.
Note that for the draft-27 changes we will need to temporarily re-introduce it, but to a different layer (the extension encoding itself).
Reviewed By: JunqiWang
Differential Revision: D20073578
fbshipit-source-id: 2b55af621566bf1c20e21dd17251116de1788fa0
Summary:
In a recent diff this was mistakenly changed to be the earliest sent
time, which makes the loss alarm duration much shorter, then we end up firing
them a lot.
Reviewed By: mjoras
Differential Revision: D19826034
fbshipit-source-id: e190d30655533fdb220e92eddc5754fb9abd007b
Summary:
This diff:
1) introduces `EnumArray` - effectively an `std::array` indexed by an enum
2) changes loss times and `lastRetransmittablePacketSentTime` inside `LossState` to be an `EnumArray` indexed by `PacketNumberSpace`
3) makes the method `isHandshakeDone()` available for both client and server handshakes.
4) uses all those inputs to determine PTO timers in `earliestTimeAndSpace()`
Reviewed By: yangchi
Differential Revision: D19650864
fbshipit-source-id: d72e4a0cf61d2dcb76f0a7f4037c36a7c8156942
Summary: PTO sending behavior has to be changed to implement per-number-packet-space PTO. This refactoring unifies the PTO sending code in order to avoid duplicating it in both functions.
Reviewed By: yangchi
Differential Revision: D19673324
fbshipit-source-id: 99b4ef5f86039ffec2b4330a6d890c5fc8df2cfd
Summary:
The fizz `encrypt` API will opportunistically use inplace encryption if the `IOBuf` you pass it is correctly sized. If it is not it will manually create the output buffer.
It turns out doing inplace versus encrypting with a given target output is a bit faster in microbenchmarks, and it will also allow experimenting with caching `IOBuf`s on the send path.
Reviewed By: knekritz
Differential Revision: D19506291
fbshipit-source-id: 3ef41290538ceac34a344114badbd167e2c25a50
Summary:
(1) The first change is the pacing rate calculation is simplified. It
removes the interval calculation and just uses the timer tick as the interval.
Then it calculates the burst size from there. For most cases these two
calculation should land at the same result, except when the
`cwnd < minBurstSize * tick / RTT`. In that case, the current calculation would
spread writes evenly across one RTT, assuming no new Ack arrives during the RTT;
while the new calculation uses the first a few ticks to finish the cwnd amount
of data.
(2) Then this diff changes how we compensate late timer. Now the pacer will
maintain a nextWriteTime_ and lastWriteTime_, which makes it easier to
calculate time elapsed since last write. Then each time writer tries to write,
it will be allowed to write timeElapsed * pacingRate. This is much more
intuitive than the current logic.
(3) The diff also adds pacing limited tracking into the pacer. An expected
pacing rate is cached when pacing rate is refreshed by congestion controller.
Then with packets sent out, Pacer keeps calculating the current send rate. When
the send rate is lower, Pacer sets pacingLimited_ to true. Otherwise false.
Only when the connection is not pacing limited, the lastWriteTime_ will be
packet sent time, otherwise it will be set to the last nextWriteTime_. In other
words: if the send rate is lower than expected, we use the expected send time
instead of real send time to calculate time elapsed, to allow higher late
timer compenstation, to give pacer a chance to catch up.
(4) Finally this diff removes the token collecting behavior in the pacer. I
think having tokens increaed, instead of reset, when an ack refreshes the pacing
rate or when we compensate late time, is quite confusing to some people. After
all the above changes, I found tperf can still sustain good throughput without
always increase tokens, and rally actualy gives even better results. So i think
we can remove this part of the pacer that's potentially very confusing to
people who don't know how we got there.
Reviewed By: mjoras
Differential Revision: D19252744
fbshipit-source-id: b83e4a01fc812fc52117f3ec0f5c3be1badf211f
Summary:
Previously we stored an `IntervalSet` in each `WriteAckFrame`. We don't need to do this, as by the time we are writing an `ACK` the `IntervalSet` is complete. Instead of bothering with the `IntervalSet` operations, we can simply serialize it to a reverse-sorted `vector.`
Additionally this has some micro-optimizations for filling the ACK frame, with a new function for getting varint size.
Reviewed By: yangchi
Differential Revision: D19397728
fbshipit-source-id: ba6958fb36a4681edaa8394b1bcbbec3472e177d
Summary:
Before we have pacing, we limit write function to loop at most 5
times. Then pacing came in, and pacing burst is used as the limit when pacing
is enabled. Then pacing token was introduced to increase send rate when pacing
is enabled. Then when cwnd is large enough, pacing burst size can be really
large, that means this write function can loop for long time. When use loopback
as network interface, for example, the write function can loop more than 1 RTT,
which delays receive time when peer packets already arrived, which leads to
both wrong RTT estimation and wrong BBR bandwidth estimation.
This diff limits the write to a fraction of RTT as well, and current default
value will be 1/25 SRTT.
Reviewed By: mjoras
Differential Revision: D18864699
fbshipit-source-id: 0b57ee4138e4788d132152a4aa363959065f6f7f
Summary: There's no reason for creating a temporary here just to then later emplace it into outstandingPackets.
Reviewed By: yangchi
Differential Revision: D19069802
fbshipit-source-id: 4505ed414a80b24ecd0111fedc21714412fa8c53
Summary: As it turns out these end up being hot functions. Most of the hotness is it taking the cache miss, but we're able to reduce it from .9% exclusive each to .1-.3% exclusive each by inlining it (which eliminates a `callq` instruction for each).
Reviewed By: yangchi
Differential Revision: D19069685
fbshipit-source-id: c971a4d1c26a7e48008c36a129e0a842a27ca87f
Summary:
Previously we track them since we thought we can get some additional
RTT samples. But these are bad RTT samples since peer can delays the acking of
pure acks. Now we no longer trust such RTT samples, there is no reason to keep
tracking pure ack packets.
Reviewed By: mjoras
Differential Revision: D18946081
fbshipit-source-id: 0a92d88e709edf8475d67791ba064c3e8b7f627a
Summary:
Don't use IOBufQueue for most operations in mvfst and use BufQueue instead. Since BufQueue did not support a splitAtMost, added it in instead.
The only place that we still use IOBufQueue is in crypto because fizz still requires it
Reviewed By: mjoras
Differential Revision: D18846960
fbshipit-source-id: 4320b7f8614f8d2c75f6de0e6b786d33650e9656
Summary: As a part of Draft 17, application close frame has been removed, we use connection close frame to represent both application close and connection close.
Reviewed By: mjoras
Differential Revision: D18580856
fbshipit-source-id: d274fa2d3dbc59b926bca5a2b8a20328ae582703
Summary:
As the first step of deprecating quic trace in favor of qlog, this
diff removes the packet_sent event
Reviewed By: sharma95
Differential Revision: D18708001
fbshipit-source-id: 5c12c13b284f00cc7d3075086e10f517ede904cb
Summary:
The retransmission buffer tracks stream frame data we have sent that is currently unacked. We keep this as a sorted `deque`. This isn't so bad for performance, but we can do better if we break ourselves of the requirement that it be sorted (removing a binary search on ACK).
To do this we make the buffer a map of offset -> `StreamBuffer`.
There were two places that were dependent on the sorted nature of the list.
1. For partial reliablity we call `shrinkBuffers` to remove all unacked buffers less than an offset. For this we now have to do it with a full traversal of the retransmission buffer instead of only having to do an O(offset) search. In the future we could make this better by only lazily deleting from the retransmission buffer on ACK or packet loss.
2. We used the start of the retransmission buffer to determine if a delivery callback could be fired for a given offset. We need some new state to track this. Instead of tracking unacked buffers, we now track acked ranges using the existing `IntervalSet`. This set should be small for the typical case, as we think most ACKs will come in order and just cause existing ranges to merge.
Reviewed By: yangchi
Differential Revision: D18609467
fbshipit-source-id: 13cd2164352f1183362be9f675c1bdc686426698
Summary:
Use the Path rate limiter introduced in the previous diff.
When we initialize path validation of an unvalidated peer address,
enable pathValidationRateLimit.
When we receive a proper PATH_RESPONSE frame, disable this limit.
If this limit is enabled, we will check the pathValidationLimiter for
the amount of bytes we are allowed to write.
Change the migration tests in QuicServerTransportTest to use this new limiter
instead of writableByteLimits.
Update shouldWriteData to directly use the new congestionControlWritableBytes
function.
Reviewed By: yangchi
Differential Revision: D18145774
fbshipit-source-id: 1fe4fd5be7486077c58b0d1285dfb03f6c62831c
Summary:
We maintain the invariant that a buffer cannot be in the loss buffer and retransmission buffers at the same time. As the retransmission buffer holds all unacknowledged data that isn't marked lost, it is very likely to be larger than the loss buffer. This makes the existing case to check for cloning very expensive.
Instead search the loss buffer first, change the search of the loss buffer to a binary search, and elide the double search.
Reviewed By: yangchi
Differential Revision: D18203444
fbshipit-source-id: 66a4e424d61c4b0e3cad12c7eca009ad3d6c5a0d
Summary:
Add a token value into the pacer. This is so that when there is not
enough application data to consume all the burst size in current event loop, we
can accumulate the unused sending credit and use the later when new data comes
in. Each time a packet is sent, we consume 1 token. On pakcet loss, we clear
all tokens. Each time there is an ack and we refresh pacing rate, token
increases by calculated burst size. It is also increased when timer drifts
during writes. When there is available tokens, there is no delay of writing out
packets, and the burst size is current token amount.
Reviewed By: siyengar
Differential Revision: D17670053
fbshipit-source-id: 6abc3acce39e0ece90248c52c3d73935a9878e02