Summary:
First step towards d6d. Semantically we need to separate the old `udpSendPacketLen` into `peerMaxPacketSize` and `currPMTU`. The former is directly tied to the peer's max_packet_size transport parameter, whereas the latter is controlled by d6d. To get the actual UDP MSS, call `conn_->getUdpSendPacketLen()`, which uses the minimum of the two if d6d is enabled and otherwise falls back to `peerMaxPacketSize` only.
During processClientInitialParams and processServerInitialParams, we no longer need to check whether `canIgnorePathMTU` is set because that logic is moved to `setUdpSendPacketLen`. If d6d is enabled, we set both `peerMaxPacketSize` and `currPMTU` to `packetSize` because receiving an initial packet of size x indicates both that the peer accepts x-sized packets and that the PMTU is at least x.
Many call sites and tests are changed.
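A minimal sketch of the selection rule described above. `ConnStateSketch`, its `d6dEnabled` flag, and the default sizes are hypothetical stand-ins for illustration, not the real mvfst connection state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the connection state; not the real mvfst type.
struct ConnStateSketch {
  uint64_t peerMaxPacketSize{1252}; // from the peer's transport parameter
  uint64_t currPMTU{1252};          // maintained by d6d probing
  bool d6dEnabled{false};           // assumed flag; see the note on naming

  // Effective UDP payload size to use when building packets.
  uint64_t getUdpSendPacketLen() const {
    assert(peerMaxPacketSize > 0);
    if (d6dEnabled) {
      // With d6d, neither the peer limit nor the path limit may be exceeded.
      return std::min(peerMaxPacketSize, currPMTU);
    }
    // Without d6d, only the peer's advertised limit applies.
    return peerMaxPacketSize;
  }
};
```

With `peerMaxPacketSize = 1400` and `currPMTU = 1300`, the sketch returns 1400 when d6d is off and 1300 when it is on.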
Facebook:
For now, d6d is considered enabled if `canIgnorePathMTU==false` and `turnoffPMTUD==true`. Down the road, from semantic & practical POV at least one of them should be renamed to something like `enableD6D`, since enabling d6d implies turning off PMTUD and that we should not ignore PMTU. We can keep one for the sake of testing.
Reviewed By: mjoras
Differential Revision: D22049806
fbshipit-source-id: 7a9b30b7e2519c132101509be56a9e63b803dc93
Summary: We have an API behavior where setReadCallback will issue a StopSending on behalf of the app. This is useful but has confusing semantics, as it always defaults to GenericApplicationError::NO_ERROR. Instead, let the error be specified as part of the API.
Reviewed By: yangchi, lnicco
Differential Revision: D23055196
fbshipit-source-id: 755f4122bf445016c9b5adb23c3090fc23173eb9
Summary:
LOG an error and fall back to no pacing. This diff also stops
automatically setting pacingEnabled to true when BBR is enabled.
Reviewed By: mjoras
Differential Revision: D22875904
fbshipit-source-id: f8c8c9ea252f6e5e86f83174309b159ce93b3919
Summary:
Our current timer compensation works as follows:
Pacer schedules write -> Pacer marks scheduled time T0 -> timer fires -> Pacer uses (now - T0 + writeInterval) to calculate burst size -> transport writes burst size amount of data at time T1 -> pacer schedules again.
This diff changes to:
Pacer schedules write -> timer fires -> Pacer uses (now - previous T1' + writeInterval) to calculate burst size -> transport writes burst size amount of data at T1 -> pacer schedules again.
Because T1' < T0 < T1, this compensates the timer more.
With higher compensation from timer interval calculation, the `tokens_ += batchSize_;` code inside refreshPacingRate is removed in this diff.
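To make the difference between the two reference points concrete, here is a small sketch. The scaling formula (batch size times the time window divided by the write interval) and the name `burstSize` are assumptions for illustration; the summary only says which time window is used.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Micros = std::chrono::microseconds;

// Assumed formula: scale the regular batch by
// (elapsed + writeInterval) / writeInterval, where `elapsed` is measured
// from whichever reference point the pacer keeps (T0 before, T1' after).
inline uint64_t burstSize(
    uint64_t batchSize, Micros writeInterval, Micros elapsed) {
  assert(writeInterval.count() > 0 && elapsed.count() >= 0);
  auto window = static_cast<uint64_t>((elapsed + writeInterval).count());
  return batchSize * window / static_cast<uint64_t>(writeInterval.count());
}
```

For a batch of 10 packets and a 1000us interval, with T1' = 0us, T0 = 200us, and now = 1500us, measuring from T0 yields 23 packets and measuring from T1' yields 25, so the earlier reference point compensates more.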
Reviewed By: yangchi
Differential Revision: D22532672
fbshipit-source-id: 6547298e933965ab412d944cfd65d5c60f4dced7
Summary: We don't know if this pointer remains valid after the callback.
Reviewed By: yangchi
Differential Revision: D22709445
fbshipit-source-id: 7802ab08052b06af0268652a44a080d9d9673122
Summary:
Adds support for timestamping on TX (TX byte events). This allows the application to determine when a byte that it previously wrote to the transport was put onto the wire.
Callbacks are processed within a new function `QuicTransportBase::processCallbacksAfterWriteData`, which is invoked by `writeSocketDataAndCatch`.
Reviewed By: mjoras
Differential Revision: D22008855
fbshipit-source-id: 99c1697cb74bb2387dbad231611be58f9392c99f
Summary:
Adds `QuicSocket::InstrumentationObserver`, an observer that can be registered to receive various transport events.
The ultimate goal of this class is to provide an interface similar to what we have through TCP tracepoints. This means we need to be able to register multiple callbacks.
- Initially, the first event exposed through the callback is app rate limited. In the future, we will expose retransmissions (which are loss + TLP), loss events (confirmed), spurious retransmits, RTT measurements, and raw ACK / send operations to enable throughput and goodput measurements.
- Multiple callbacks can be registered, but a `folly::small_vector` is used to minimize memory overhead in the common case of between 0 and 2 callbacks registered.
- We currently have a few different callback classes to support instrumentation, including `QuicTransportStatsCallback` and `QLogger`. However, neither of these meets our needs:
  - We only support installing a single transport stats callback and QLogger callback, and they're both specialized to specific use cases. TransportStats is about understanding in aggregate how often an event (like CWND limited) is occurring, and QLogger is about logging a specific event, instead of notifying a callback about an event and allowing it to decide how to proceed.
- Ideally, we can find a way to create a callback class that handles all three cases; we can start strategizing around that as we extend `InstrumentationObserver` and identify overlap.
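A rough sketch of the multi-observer registration described above. All names here are hypothetical, and `std::vector` stands in for the `folly::small_vector` mentioned in the summary so the example stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical observer interface with the one event exposed initially.
class InstrumentationObserverSketch {
 public:
  virtual ~InstrumentationObserverSketch() = default;
  virtual void appRateLimited() = 0;
};

// Hypothetical transport that fans one event out to every observer.
class TransportSketch {
 public:
  void addObserver(InstrumentationObserverSketch* observer) {
    assert(observer != nullptr);
    observers_.push_back(observer);
  }

  // Returns false if the observer was not registered.
  bool removeObserver(InstrumentationObserverSketch* observer) {
    auto it = std::find(observers_.begin(), observers_.end(), observer);
    if (it == observers_.end()) {
      return false;
    }
    observers_.erase(it);
    return true;
  }

  void notifyAppRateLimited() {
    for (auto* observer : observers_) {
      observer->appRateLimited();
    }
  }

 private:
  // The source uses folly::small_vector to avoid allocation for 0-2 entries.
  std::vector<InstrumentationObserverSketch*> observers_;
};

// Example observer that only counts notifications.
struct CountingObserver : InstrumentationObserverSketch {
  int count{0};
  void appRateLimited() override { ++count; }
};
```

Observers are held as non-owning pointers in this sketch, so the caller is assumed to remove an observer before destroying it.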
Differential Revision: D21923745
fbshipit-source-id: 9fb4337d55ba3e96a89dccf035f2f6978761583e
Summary:
Notify connection callback implementation when the transport becomes app rate limited
- The callback implementation is not required to define a handler for this callback (not pure virtual).
- D21923745 will add an instrumentation callback class that can be used to allow other components to receive this signal. I'm extending to the connection callback because the overhead seems low, and we already have the connection callback registered for `HQSession`.
Reviewed By: mjoras, lnicco
Differential Revision: D21923744
fbshipit-source-id: 153696aefeab82b7fd8a6bc299c011dcce479995
Summary: Splitting `TestQuicTransport` from `QuicTransportTest` into a separate header file so that it can be reused in other tests, including to support tests that are outside of mvfst.
Reviewed By: lnicco
Differential Revision: D22013646
fbshipit-source-id: 8b6198dca822a95133f517e055a9421b98fe5221
Summary:
On loss timer, currently we knock all handshake packets out of the OP
list and resend everything. This means we miss RTT sampling opportunities during
the handshake if the loss timer fires, and given that our initial loss timer is
likely not a good fit for many networks, it probably fires a lot.
This diff keeps handshake packets in the OP list, and adds packet cloning
support to handshake packets so we can clone them and send as probes.
With this, the handshake alarm is finally removed. PTO will take care of all
packet number space.
The diff also fixes a bug in the CloningScheduler where we missed cipher
overhead setting. That broke a few unit tests once we started to clone
handshake packets.
The writeProbingDataToSocket API is also changed to support passing a token to
it so when we clone Initial, the token is added correctly. This is because during
packet cloning, we only clone frames. Headers are freshly built.
The diff also changes the cloning behavior when there is only one outstanding
packet. Currently we clone it twice and send two packets. There is no point in
doing that. Now when the loss timer fires and there is only one outstanding
packet, we only clone once.
The PacketEvent, which was an alias of PacketNumber, is now a real type that
has both PacketNumber and PacketNumberSpace to support cloning of handshake
packets. I think in the long term we should refactor PacketNumber itself into a
real type.
Reviewed By: mjoras
Differential Revision: D19863693
fbshipit-source-id: e427bb392021445a9388c15e7ea807852ddcbd08
Summary:
The AckScheduler right now has two modes: Immediate mode, which always
writes acks into the current packet, and Pending mode, which only writes if
needsToSendAckImmediately is true. The FrameScheduler chooses the Immediate mode
if there is other data to write as well. Otherwise, it chooses the Pending
mode. But if there is no other data to write and needsToSendAckImmediately is
false, the FrameScheduler will end up writing nothing.
This isn't a problem today because to be on the write path, the shouldWriteData
function would make sure we either have non-ack data to write, or
needsToSendAckImmediately is true for a packet number space. But once we allow
packets in Initial and Handshake space to be cloned, we would be on the write
path when there is probe quota. The FrameScheduler's hasData function doesn't
check needsToSendAckImmediately. It will think it has data to write as long as
AckState has changed, but can end up writing nothing with the Pending ack
mode.
I think given the write looper won't be scheduled to loop when there is no
non-ack data to write and needsToSendAckImmediately is false, it's safe to
remove Pending ack mode from AckScheduler.
Reviewed By: mjoras
Differential Revision: D22044741
fbshipit-source-id: 26fcaabdd5c45c1cae12d459ee5924a30936e209
Summary: The spec suggests doing this, and it is a better semantic than only considering the local one.
Reviewed By: yangchi
Differential Revision: D21433433
fbshipit-source-id: c38abc04810eb8807597991ce8801d81f9edc462
Summary: This is useful when you want to ensure that the IOBuf you pass in is encrypted inplace, as opposed to potentially creating a new one.
Reviewed By: yangchi
Differential Revision: D21135253
fbshipit-source-id: 89b6e718fc8da1324685c390c721a564bb77d01d
Summary:
As it turns out, the cost of the extra indirection from storing a `unique_ptr` does not outweigh the gain from using an `F14ValueMap` versus an `F14VectorMap`.
This reduces the `find` cost measurably in profiles, and doesn't appear to have any real negative effects otherwise.
Reviewed By: yangchi
Differential Revision: D20923854
fbshipit-source-id: a75c4649ea3dbf0e6c89ebfe0d31d082bbdc31fd
Summary:
Currently we return the exact writable bytes number from a real
congestion controller or Path Challenger. This diff rounds the number up to the
nearest multiple of packet length. Doing so can greatly reduce the weird byte
counting/checking bugs we have around packet writing.
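The rounding itself can be sketched as below. `roundUpToPacketLen` is a hypothetical helper name, and the overflow guard is an added assumption for callers that pass a "no limit" value near the top of the `uint64_t` range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rounds writableBytes up to the nearest multiple of packetLen.
inline uint64_t roundUpToPacketLen(uint64_t writableBytes, uint64_t packetLen) {
  assert(packetLen > 0);
  uint64_t remainder = writableBytes % packetLen;
  if (remainder == 0) {
    return writableBytes; // already a whole number of packets
  }
  uint64_t pad = packetLen - remainder;
  // Leave very large values untouched rather than overflow.
  if (writableBytes > std::numeric_limits<uint64_t>::max() - pad) {
    return writableBytes;
  }
  return writableBytes + pad;
}
```

With a packet length of 1200, 1 byte rounds up to 1200, 1200 stays at 1200, and 1201 rounds up to 2400.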
Reviewed By: mjoras
Differential Revision: D20265678
fbshipit-source-id: 2973dde3acc4b2008337127482185f34e16efb43
Summary: When we don't use NiceMock we end up with a ton of spam in failing tests for every callback that we didn't EXPECT. This makes failed test output extremely noisy.
Reviewed By: sharmafb
Differential Revision: D19977113
fbshipit-source-id: 1a083fba13308cd3f2859da364c8106e349775bb
Summary:
This diff:
1) introduces `EnumArray` - effectively an `std::array` indexed by an enum
2) changes loss times and `lastRetransmittablePacketSentTime` inside `LossState` to be an `EnumArray` indexed by `PacketNumberSpace`
3) makes the method `isHandshakeDone()` available for both client and server handshakes.
4) uses all those inputs to determine PTO timers in `earliestTimeAndSpace()`
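For readers unfamiliar with the idea in item 1, an enum-indexed array can be sketched roughly as follows. `EnumArraySketch`, `PnSpaceSketch`, and the `Count` sentinel are hypothetical; the real `EnumArray` may determine its size differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical enum with a trailing sentinel giving the number of entries.
enum class PnSpaceSketch : std::size_t { Initial, Handshake, AppData, Count };

// A std::array wrapper whose operator[] takes the enum instead of an integer.
template <class E, class T>
class EnumArraySketch {
 public:
  T& operator[](E e) { return data_[index(e)]; }
  const T& operator[](E e) const { return data_[index(e)]; }

  static constexpr std::size_t size() {
    return static_cast<std::size_t>(E::Count);
  }

 private:
  static std::size_t index(E e) {
    auto i = static_cast<std::size_t>(e);
    assert(i < size()); // the sentinel itself is not a valid index
    return i;
  }

  std::array<T, static_cast<std::size_t>(E::Count)> data_{};
};
```

Indexing by enum keeps per-space state, such as a loss time per packet number space, in one container without casting at each call site.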
Reviewed By: yangchi
Differential Revision: D19650864
fbshipit-source-id: d72e4a0cf61d2dcb76f0a7f4037c36a7c8156942
Summary:
(1) The first change is the pacing rate calculation is simplified. It
removes the interval calculation and just uses the timer tick as the interval.
Then it calculates the burst size from there. In most cases these two
calculations should land on the same result, except when
`cwnd < minBurstSize * tick / RTT`. In that case, the current calculation would
spread writes evenly across one RTT, assuming no new ACK arrives during the RTT,
while the new calculation uses the first few ticks to finish the cwnd amount
of data.
(2) Then this diff changes how we compensate for a late timer. Now the pacer will
maintain a nextWriteTime_ and lastWriteTime_, which makes it easier to
calculate time elapsed since last write. Then each time writer tries to write,
it will be allowed to write timeElapsed * pacingRate. This is much more
intuitive than the current logic.
(3) The diff also adds pacing limited tracking into the pacer. An expected
pacing rate is cached when pacing rate is refreshed by congestion controller.
Then with packets sent out, Pacer keeps calculating the current send rate. When
the send rate is lower, Pacer sets pacingLimited_ to true. Otherwise false.
Only when the connection is not pacing limited, the lastWriteTime_ will be
packet sent time, otherwise it will be set to the last nextWriteTime_. In other
words: if the send rate is lower than expected, we use the expected send time
instead of the real send time to calculate time elapsed, which allows higher
late timer compensation and gives the pacer a chance to catch up.
(4) Finally this diff removes the token collecting behavior in the pacer. I
think having tokens increased, instead of reset, when an ack refreshes the pacing
rate or when we compensate for a late timer, is quite confusing to some people.
After all the above changes, I found tperf can still sustain good throughput
without always increasing tokens, and rally actually gives even better results.
So I think we can remove this part of the pacer that's potentially very
confusing to people who don't know how we got there.
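Points (2) and (3) can be illustrated with a small sketch. `PacerSketch`, the bytes-per-millisecond rate unit, and the way `nextWriteTime_` is advanced are all assumptions for illustration; only the choice of `lastWriteTime_` follows the description above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Micros = std::chrono::microseconds;

// Hypothetical pacer; times are offsets from an arbitrary epoch.
class PacerSketch {
 public:
  explicit PacerSketch(uint64_t bytesPerMs) : bytesPerMs_(bytesPerMs) {}

  // Bytes the writer may send now: time since the last write times the rate.
  uint64_t writableBytes(Micros now) const {
    assert(now >= lastWriteTime_);
    auto elapsedUs = static_cast<uint64_t>((now - lastWriteTime_).count());
    return elapsedUs * bytesPerMs_ / 1000;
  }

  // Records a write. When pacing limited, the previous expected write time
  // is used as the reference instead of the real send time, so the next
  // call to writableBytes() sees a longer elapsed time.
  void onWrite(Micros sentTime, Micros interval, bool pacingLimited) {
    lastWriteTime_ = pacingLimited ? nextWriteTime_ : sentTime;
    nextWriteTime_ = sentTime + interval; // assumed scheduling rule
  }

 private:
  uint64_t bytesPerMs_;
  Micros lastWriteTime_{0};
  Micros nextWriteTime_{0};
};
```

At 1000 bytes per millisecond, a write at 3000us that was expected at 2000us leaves 2000 bytes writable at 4000us when pacing limited, versus 1000 bytes if the real send time were used.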
Reviewed By: mjoras
Differential Revision: D19252744
fbshipit-source-id: b83e4a01fc812fc52117f3ec0f5c3be1badf211f
Summary: Some test cases used FileQLogger to verify that the correct events have been logged. MockQLogger allows checking which events are logged without using an in-memory store like the one in FileQLogger. Migrated QuicLossFunctionsTest and QuicTransportTest from FileQLogger to MockQLogger to simplify logic and cut dependencies.
Reviewed By: yangchi
Differential Revision: D19426423
fbshipit-source-id: eb1ae16a81656efd7c491eae790484c73dede8f3
Summary:
Previously we tracked them since we thought we could get some additional
RTT samples. But these are bad RTT samples since the peer can delay the acking of
pure acks. Now that we no longer trust such RTT samples, there is no reason to
keep tracking pure ack packets.
Reviewed By: mjoras
Differential Revision: D18946081
fbshipit-source-id: 0a92d88e709edf8475d67791ba064c3e8b7f627a
Summary:
If the packet is too small we might automatically add padding frames
to make it large enough. However, we used to add padding frames to the
frame list as well.
We don't need this; let's just add the regular frame types.
Reviewed By: mjoras
Differential Revision: D18903074
fbshipit-source-id: f73f82f96f833347c84a38eb1035c46e35ba3b2f
Summary:
Don't use IOBufQueue for most operations in mvfst and use BufQueue instead. Since BufQueue did not support splitAtMost, this diff adds it.
The only place that we still use IOBufQueue is in crypto, because fizz still requires it.
Reviewed By: mjoras
Differential Revision: D18846960
fbshipit-source-id: 4320b7f8614f8d2c75f6de0e6b786d33650e9656
Summary:
In the current client code we read one packet, go back to epoll, and then read
another packet. This is not very efficient.
This changes it so that we can read multiple packets in one go from an epoll
callback.
This only changes the client.
Reviewed By: mjoras
Differential Revision: D18797962
fbshipit-source-id: 81be82111064ade4fe3a07b1d9d3d01e180f29f5
Summary:
The retransmission buffer tracks stream frame data we have sent that is currently unacked. We keep this as a sorted `deque`. This isn't so bad for performance, but we can do better if we break ourselves of the requirement that it be sorted (removing a binary search on ACK).
To do this we make the buffer a map of offset -> `StreamBuffer`.
There were two places that were dependent on the sorted nature of the list.
1. For partial reliability we call `shrinkBuffers` to remove all unacked buffers less than an offset. For this we now have to do a full traversal of the retransmission buffer instead of only having to do an O(offset) search. In the future we could make this better by only lazily deleting from the retransmission buffer on ACK or packet loss.
2. We used the start of the retransmission buffer to determine if a delivery callback could be fired for a given offset. We need some new state to track this. Instead of tracking unacked buffers, we now track acked ranges using the existing `IntervalSet`. This set should be small for the typical case, as we think most ACKs will come in order and just cause existing ranges to merge.
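The acked-range tracking in item 2 can be sketched with a small interval set. `AckedRangesSketch` is a hypothetical stand-in for the existing `IntervalSet`, and the `deliveredThrough` check is an assumption about how a delivery callback decision could be derived from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Minimal set of closed byte ranges [start, end], merged when they touch.
// Offsets are assumed to stay well below the uint64_t maximum.
class AckedRangesSketch {
 public:
  void insert(uint64_t start, uint64_t end) {
    assert(start <= end);
    // Find the first existing range that could touch or overlap the new one.
    auto it = ranges_.lower_bound(start);
    if (it != ranges_.begin()) {
      auto prev = std::prev(it);
      if (prev->second + 1 >= start) {
        it = prev;
      }
    }
    // Absorb every range that overlaps or is adjacent to [start, end].
    while (it != ranges_.end() && it->first <= end + 1) {
      start = std::min(start, it->first);
      end = std::max(end, it->second);
      it = ranges_.erase(it);
    }
    ranges_[start] = end;
  }

  // True when every byte in [0, offset] has been acked.
  bool deliveredThrough(uint64_t offset) const {
    auto it = ranges_.begin();
    return it != ranges_.end() && it->first == 0 && it->second >= offset;
  }

  std::size_t size() const { return ranges_.size(); }

 private:
  std::map<uint64_t, uint64_t> ranges_; // start -> end
};
```

In-order ACKs keep extending the first range, so the set usually holds a single entry; out-of-order ACKs add entries that merge once the gap is filled.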
Reviewed By: yangchi
Differential Revision: D18609467
fbshipit-source-id: 13cd2164352f1183362be9f675c1bdc686426698
Summary: The state machine logic is quite abstruse; this modifies it to make it more readable.
Reviewed By: siyengar
Differential Revision: D18488301
fbshipit-source-id: c6fd52973880931e34904713e8b147f56d0c4629
Summary:
Use the Path rate limiter introduced in the previous diff.
When we initialize path validation of an unvalidated peer address,
enable pathValidationRateLimit.
When we receive a proper PATH_RESPONSE frame, disable this limit.
If this limit is enabled, we will check the pathValidationLimiter for
the amount of bytes we are allowed to write.
Change the migration tests in QuicServerTransportTest to use this new limiter
instead of writableByteLimits.
Update shouldWriteData to directly use the new congestionControlWritableBytes
function.
Reviewed By: yangchi
Differential Revision: D18145774
fbshipit-source-id: 1fe4fd5be7486077c58b0d1285dfb03f6c62831c
Summary:
We maintain the invariant that a buffer cannot be in the loss buffer and retransmission buffers at the same time. As the retransmission buffer holds all unacknowledged data that isn't marked lost, it is very likely to be larger than the loss buffer. This makes the existing case to check for cloning very expensive.
Instead search the loss buffer first, change the search of the loss buffer to a binary search, and elide the double search.
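A sketch of the lookup order described above, assuming both buffers are deques sorted by offset. `StreamBufferSketch`, `FoundIn`, and `locate` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

struct StreamBufferSketch {
  uint64_t offset;
  uint64_t len;
};

enum class FoundIn { LossBuffer, RetransmissionBuffer, Neither };

inline bool offsetLess(const StreamBufferSketch& buf, uint64_t offset) {
  return buf.offset < offset;
}

// Looks for a buffer starting at `offset`, checking the (usually smaller)
// loss buffer first. Both deques are assumed to be sorted by offset.
inline FoundIn locate(
    const std::deque<StreamBufferSketch>& lossBuffer,
    const std::deque<StreamBufferSketch>& retransmissionBuffer,
    uint64_t offset) {
  // Debug-only sanity check on the sortedness assumption.
  assert(std::is_sorted(
      lossBuffer.begin(), lossBuffer.end(),
      [](const StreamBufferSketch& a, const StreamBufferSketch& b) {
        return a.offset < b.offset;
      }));
  auto lost = std::lower_bound(
      lossBuffer.begin(), lossBuffer.end(), offset, offsetLess);
  if (lost != lossBuffer.end() && lost->offset == offset) {
    // By the invariant, it cannot also be in the retransmission buffer,
    // so the second search is skipped.
    return FoundIn::LossBuffer;
  }
  auto rtx = std::lower_bound(
      retransmissionBuffer.begin(), retransmissionBuffer.end(), offset,
      offsetLess);
  if (rtx != retransmissionBuffer.end() && rtx->offset == offset) {
    return FoundIn::RetransmissionBuffer;
  }
  return FoundIn::Neither;
}
```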
Reviewed By: yangchi
Differential Revision: D18203444
fbshipit-source-id: 66a4e424d61c4b0e3cad12c7eca009ad3d6c5a0d
Summary:
The intention here was always to write to streams in a round robin fashion. However, this functionality has been effectively broken since introduction as `lastScheduledStream` was never set. We can fix this by having the `StreamFrameScheduler` set `nextScheduledStream` after it has written to the streams. Additionally we need to remove a check that kept us from moving past a stream if it still had data left to write.
In extreme cases this would cause streams to be completely starved, and ruin concurrency.
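The intended round-robin behavior can be sketched as follows. `RoundRobinSketch` and its one-frame-per-stream-per-pass policy are assumptions for illustration; the point is that the cursor advances past a stream even when that stream still has data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical scheduler that remembers where the next pass should start.
struct RoundRobinSketch {
  std::set<uint64_t> writableStreams; // ordered stream IDs with data to send
  uint64_t nextScheduledStream{0};

  // Returns the stream IDs written in this pass, at most one frame per
  // stream and at most maxFrames in total, then advances the cursor.
  std::vector<uint64_t> schedule(std::size_t maxFrames) {
    assert(maxFrames > 0);
    std::vector<uint64_t> order;
    if (writableStreams.empty()) {
      return order;
    }
    auto it = writableStreams.lower_bound(nextScheduledStream);
    for (std::size_t i = 0; i < maxFrames && i < writableStreams.size(); ++i) {
      if (it == writableStreams.end()) {
        it = writableStreams.begin(); // wrap around
      }
      order.push_back(*it);
      ++it;
    }
    if (it == writableStreams.end()) {
      it = writableStreams.begin();
    }
    // Move past the streams just served, even if they have data left.
    nextScheduledStream = *it;
    return order;
  }
};
```

With streams 0, 4, and 8 writable and room for two frames per pass, the first pass serves 0 and 4 and the second serves 8 and 0, so no stream is starved.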
Reviewed By: siyengar
Differential Revision: D17748652
fbshipit-source-id: a3d05c54ee7eaed4d858df9d89035fe8f252c727
Summary:
Use the custom variant type for write frames as well, now that
we use them for read frames.
Reviewed By: mjoras
Differential Revision: D17776862
fbshipit-source-id: 47093146d0f1565c22e5393ed012c70e2e23d279