Summary: `PacketEvent` is a very inaccurate and misleading name. We're basically using this as an identifier for cloned packets, so `ClonedPacketIdentifier` is a much better name.
Reviewed By: kvtsoy
Differential Revision: D61871891
fbshipit-source-id: f9c626d900c8b7ab7e231c9bad4c1629384ebb77
Summary:
**Context**
The `BufAccessor` is used to access a contiguous section of memory. Right now, it works with a `Buf` under the hood.
**Overall plan**
The plan is to change the `BufAccessor` to use a `uint8_t*` instead. Since we're accessing a section of contiguous memory, there's no need for a chained buffer abstraction here. This'll move us closer to deprecating the usage of `folly::IOBuf`.
**What this diff is doing**
Most use cases of the `BufAccessor` look like the following:
```
auto buf = bufAccessor.obtain();
// Do something with buf, like calling trimEnd
bufAccessor.release(buf);
```
I'm adding APIs to the `BufAccessor` so that there's no need to `obtain()` and `release()` the `Buf`. We'd instead just call an API on the `BufAccessor`, which would call that same API on the underlying `folly::IOBuf`. Later on, we'll change the `BufAccessor` to use a `uint8_t*` under the hood.
I'm currently leaving in the `obtain()`, `release()`, and `buf()` APIs because Fizz and the AsyncUDPSocket expect `folly::IOBuf` as inputs in many of their APIs. Once those callsites are migrated off `folly::IOBuf`, we can remove these APIs.
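A minimal sketch of the pass-through idea, with a hypothetical simplified accessor (the class shape, member names, and `trimEnd` forwarding are assumptions for illustration, not mvfst's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified accessor: callers invoke operations like trimEnd
// directly on the accessor instead of obtain()-ing and release()-ing a Buf.
class SimpleBufAccessor {
 public:
  explicit SimpleBufAccessor(size_t capacity)
      : data_(capacity), length_(capacity) {}

  // Forwarded API: trims `amount` bytes off the end of the buffer, mirroring
  // what callers previously did on the obtained folly::IOBuf.
  void trimEnd(size_t amount) {
    length_ -= (amount > length_ ? length_ : amount);
  }

  size_t length() const { return length_; }

 private:
  std::vector<uint8_t> data_;  // later: a raw uint8_t* per the overall plan
  size_t length_;
};
```

With such pass-through methods, the `obtain()`/`release()` pair in the snippet above collapses into a single call on the accessor.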
Reviewed By: mjoras
Differential Revision: D60973166
fbshipit-source-id: 52aa3541d0c4878c7ee8525d70ac280508b61e24
Summary:
Timely reaction to congestion requires relaying any CE marks to the sender as soon as possible.
This change schedules an ack to be sent whenever incoming packets are received with CE marks. This will only happen when the readEcnOnIngress option is enabled.
Reviewed By: mjoras
Differential Revision: D58423959
fbshipit-source-id: 30f8cf8b11d0446985c2d87d7df67c24c0d5afdf
Summary: Useful to track so we can optimize it.
Reviewed By: kvtsoy
Differential Revision: D58196435
fbshipit-source-id: c15c409f998430fd0c6acde0539d0345123a9e15
Summary: We have a lot of optionals that are either integral values or std::chrono::microseconds. These end up wasting memory, where we can instead store sentinel values to encode whether the value is there or not. This reduces the effective range of the type by one value, but that is an acceptable tradeoff.
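A hedged sketch of the sentinel encoding (not the actual mvfst type; the class name and choice of sentinel are assumptions): reserving one value of the range as "not set" makes the optional exactly the size of the raw value, instead of paying for a separate flag plus padding.

```cpp
#include <chrono>
#include <limits>

// Sketch: an optional std::chrono::microseconds that uses the max
// representable duration as a "not set" sentinel instead of a bool flag.
class OptionalMicros {
 public:
  using rep = std::chrono::microseconds::rep;
  static constexpr rep kSentinel = std::numeric_limits<rep>::max();

  OptionalMicros() : value_(kSentinel) {}
  explicit OptionalMicros(std::chrono::microseconds v) : value_(v.count()) {}

  bool has_value() const { return value_ != kSentinel; }
  std::chrono::microseconds value() const {
    return std::chrono::microseconds(value_);
  }

 private:
  rep value_;  // one value of the range sacrificed for the sentinel
};

// Same size as the underlying representation: no extra flag byte or padding.
static_assert(sizeof(OptionalMicros) == sizeof(OptionalMicros::rep),
              "sentinel optional adds no overhead");
```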
Reviewed By: kvtsoy
Differential Revision: D57684368
fbshipit-source-id: b406b86011f9b8169b6e5e925265f4829928cc63
Summary:
The idea here is to make it so we can swap out the type we are using for optionality. In the near term we are going to try swapping towards one that more aggressively tries to save size.
For now there is no functional change and this is just a big aliasing diff.
Reviewed By: sharmafb
Differential Revision: D57633896
fbshipit-source-id: 6eae5953d47395b390016e59cf9d639f3b6c8cfe
Summary:
Remove cmsgs from each outstanding packet. Storing this as a map is excessive. Instead store it as a packed enum. This supports one kind of mark per packet.
If we need to, we could in the future support multiple types of marks per packet by using a bitset-style flag, e.g. each bit index representing a different mark. For now, just use an enum.
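An illustrative sketch of the packed-enum idea (the enum values and struct are hypothetical, not mvfst's actual definitions): one mark kind per outstanding packet, stored in a single byte instead of a heap-allocated cmsg map.

```cpp
#include <cstdint>

// One kind of socket mark per packet, packed into a uint8_t.
enum class PacketMark : uint8_t {
  None = 0,
  Ect0,         // example mark kinds; names are illustrative
  Ect1,
  TosLowDelay,
};

struct OutstandingPacketMetadataSketch {
  uint32_t encodedSize = 0;
  PacketMark mark = PacketMark::None;  // 1 byte instead of a map of cmsgs
};
```

The bitset-style extension mentioned above would turn the enum into per-bit flags, at the cost of one bit per mark kind rather than log2 of the number of kinds.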
Reviewed By: jbeshay
Differential Revision: D57397865
fbshipit-source-id: 6d34215c9d7e39537c44c6c304a8ce3a5883541e
Summary: This is effectively an unused field encoding duplicated information, but it was widespread.
Reviewed By: kvtsoy
Differential Revision: D57289922
fbshipit-source-id: ca1499e2576e5ae28e3880b865a29c2b8d9a3d1b
Summary:
The existing batch writers do not handle failed writes to the AsyncUDPSocket. A packet that fails to be written is detected as a packet loss later when feedback is received from the peer. This negatively impacts the congestion controller because of the fake loss signal, and artificially inflates the number of retransmitted packets/bytes.
This change adds a new batch writer (SinglePacketBackpressureBatchWriter) that retains the buffers when a write fails. For subsequent writes, the writer retries the same buffer. No new packets are scheduled until the retried buffer succeeds.
Notes:
- To make sure that retry writes are scheduled, the write callback is installed on the socket when a buffer needs to be retried.
- The retries are for an already scheduled packet. The connection state reflects the timing of the first attempt. This could still have an impact on RTT samples, etc., but this is a milder impact compared to fake losses/retransmissions.
- Any changes outside of the batch writer only impact the new batch writer. Existing batch writers do not use the fields and are not affected by the changes in this diff.
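The retry behavior can be sketched as follows (class and method names are hypothetical; the real writer operates on socket buffers, not strings):

```cpp
#include <optional>
#include <string>
#include <utility>

// Illustrative sketch: on a failed socket write, keep the buffer and retry
// that exact buffer before scheduling any new packet.
class BackpressureWriterSketch {
 public:
  // Returns true if the buffer was "sent"; on failure the buffer is retained.
  bool write(std::string buf, bool socketWouldBlock) {
    if (socketWouldBlock) {
      retained_ = std::move(buf);  // retry this exact buffer next time
      return false;
    }
    return true;
  }

  // Called on the next write opportunity (e.g. from the socket's write
  // callback); the retained buffer is retried before any new packet.
  std::optional<std::string> takeRetry() {
    auto out = std::move(retained_);
    retained_.reset();
    return out;
  }

 private:
  std::optional<std::string> retained_;
};
```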
Reviewed By: kvtsoy
Differential Revision: D57597576
fbshipit-source-id: 9476d71ce52e383c5946466f64bb5eecd4f5d549
Summary:
The time between iterations is not significant, so we can just call `Clock::now` once in the beginning and reuse the same value.
I ran a canary with some counters to get an idea of the amount of time between the start of the first iteration and the end of the last iteration (see D57510979), and:
* Edge p100: 1500 us
* olb p100: 1900 us
* Edge p99: 413 us
* olb p99: 396 us
The wins we're seeing are 0.13% relative CPU.
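The change amounts to hoisting the clock read out of the loop; a minimal sketch (function name assumed for illustration):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Sketch: read the clock once before the loop and reuse the timestamp for
// every iteration, instead of calling Clock::now() per iteration. The
// iteration spread measured above is under ~2 ms at p100, so a shared
// timestamp is an acceptable approximation.
std::vector<Clock::time_point> stampAll(size_t n) {
  auto now = Clock::now();  // single clock read
  std::vector<Clock::time_point> out(n, now);
  return out;               // every entry shares the same timestamp
}
```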
Reviewed By: kvtsoy
Differential Revision: D57594650
fbshipit-source-id: 9d0f827564179745cd83eb6ca211df68d3f23f8b
Summary: This setting is no longer needed.
Reviewed By: mjoras
Differential Revision: D57112554
fbshipit-source-id: 4720dd864f24ac21a775419522254195c5ea215f
Summary:
std::deque by default allocates a large block on the heap for managing its state. This has a fixed memory cost both per connection (because of the crypto streams) and per stream. CircularDeque by comparison does not have this overhead and is only 32 bytes per structure.
For example:
```
"size": 656,
"name": "readBuffer",
"typePath": ["a0", "conn_", "ptr_val", "cryptoState", "ptr_val", "initialStream", "readBuffer"],
"typeNames": ["std::deque<quic::StreamBuffer, std::allocator<quic::StreamBuffer>>"],
```
This should save about 6kB per connection with no streams, and additional memory per stream.
Reviewed By: jbeshay, hanidamlaj
Differential Revision: D56578219
fbshipit-source-id: ab2b529fa9a4169bea6862b11ccbf178c6f5abb1
Summary:
std::deque by default allocates a large block on the heap for managing its state. This has a fixed memory cost both per connection (because of the crypto streams) and per stream. CircularDeque by comparison does not have this overhead and is only 32 bytes per structure.
For example:
```
"size": 656,
"name": "readBuffer",
"typePath": ["a0", "conn_", "ptr_val", "cryptoState", "ptr_val", "initialStream", "readBuffer"],
"typeNames": ["std::deque<quic::StreamBuffer, std::allocator<quic::StreamBuffer>>"],
```
This should save about 6kB per connection with no streams, and additional memory per stream.
Reviewed By: kvtsoy
Differential Revision: D56496459
fbshipit-source-id: ec4049614939f885f64481bd374c81e74ad47c66
Summary: This reduces the number of stats callbacks when processing multiple packets in rapid succession.
Reviewed By: mjoras
Differential Revision: D56315022
fbshipit-source-id: 750024301f28b21e3125c144ead6f115706736a4
Summary:
The current ackVisitor passed to processAckFrame() gets executed with every acked frame.
Not every check in that visitor needs to be performed for every frame.
This refactors it into two different visitors, one that is called once per acked packet and another that is called once per acked frame.
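The split can be sketched like this (types and signatures are hypothetical simplifications of the real `processAckFrame()` machinery):

```cpp
#include <functional>
#include <vector>

// Stand-in for an acked outstanding packet; frame contents are irrelevant
// to the sketch.
struct AckedPacket {
  std::vector<int> frames;
};

// Sketch of the refactor: a per-packet visitor runs once per acked packet,
// so packet-level checks no longer run redundantly for every frame, while
// frame-level checks keep their own per-frame visitor.
void processAck(
    const std::vector<AckedPacket>& acked,
    const std::function<void(const AckedPacket&)>& packetVisitor,
    const std::function<void(int)>& frameVisitor) {
  for (const auto& pkt : acked) {
    packetVisitor(pkt);           // once per acked packet
    for (int frame : pkt.frames) {
      frameVisitor(frame);        // once per acked frame
    }
  }
}
```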
Reviewed By: mjoras
Differential Revision: D56090734
fbshipit-source-id: 7b0877684e439f9e88c0aae7a294245570cf8309
Summary: Neither QUIC nor TransportMonitor is using the `packetsInflight` field of the `OutstandingPacketMetadata`.
Reviewed By: hanidamlaj
Differential Revision: D55926288
fbshipit-source-id: 32efd66add1e6374a8d3446ff635fe582b36e644
Summary: Neither QUIC nor TransportMonitor is using the `totalBodyBytesSent` field of the `OutstandingPacketMetadata`.
Reviewed By: hanidamlaj
Differential Revision: D55897240
fbshipit-source-id: 521f8a016f838dd1fe0593daa7a4e45c4fd222cf
Summary:
In order to exercise the key update path more frequently, this adds an option to initiate one key update early into the connection after a fixed number of packets is written. After that, keys are updated periodically at a significantly lower frequency.
Currently the defaults are to initiate the first key update after 500 packets, then periodically every 1M packets (which is under the confidentiality limit).
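The scheduling described above can be sketched as a simple counter check (the function name, constants, and structure are assumptions for illustration, using the stated defaults):

```cpp
#include <cstdint>

constexpr uint64_t kFirstUpdateAfterPackets = 500;        // early exercise
constexpr uint64_t kUpdatePeriodPackets = 1'000'000;      // periodic, under
                                                          // confidentiality limit

// Sketch: initiate the first key update early in the connection, then
// subsequent updates at a much lower frequency.
bool shouldInitiateKeyUpdate(uint64_t packetsSinceLastUpdate,
                             bool firstUpdateDone) {
  if (!firstUpdateDone) {
    return packetsSinceLastUpdate >= kFirstUpdateAfterPackets;
  }
  return packetsSinceLastUpdate >= kUpdatePeriodPackets;
}
```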
Reviewed By: mjoras
Differential Revision: D54082404
fbshipit-source-id: 3edd7489ceb8c643a319edb06e006803f1e736ba
Summary:
Allow the server/client transport to initiate periodic key update. It's defaulted to being disabled.
The new logic for initiating a key update, and for verifying that it was handled correctly by the peer, is consolidated in QuicTransportFunctions.
Reviewed By: mjoras
Differential Revision: D53109624
fbshipit-source-id: 0c3a944978fc0e0a84252da953dc116aa7c26379
Summary:
This stack adds key update support to Mvfst client and server. This diff adds the main logic for detecting key updates in the QuicReadCodec. When an update is successful, the server transport reacts to it by updating the write phase and cipher.
The high level design is as follows:
- The QuicReadCodec is responsible for detecting incoming key update attempts by the peer, as well as tracking any ongoing locally-initiated key updates.
- Upon detecting a successful key update, the QuicReadCodec updates its state. The Server/Client transport reacts to this change by updating its write phase and cipher.
- A locally initiated key update starts with updating the write phase and key, and signaling the read codec that a key update has been initiated.
- The read codec keeps this in a pending state until a packet is successfully received in the new phase.
- Functions for syncing the read/write phase on incoming key updates, as well as initiating and verifying outgoing key updates are abstracted in QuicTransportFunctions and are used by both the client and server transports.
- Common handshake functions used for rotating the keys are now in HandshakeLayer that is shared by both client and server handshakes.
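The phase bookkeeping described above can be sketched minimally in terms of the QUIC v1 key phase bit (RFC 9001, Section 6); the struct and member names are assumptions, not the read codec's actual state:

```cpp
// Minimal sketch of key update state: a locally initiated update flips the
// write phase and stays pending until a packet is received in the new phase.
struct KeyUpdateState {
  bool writePhase = false;          // key phase bit used when sending
  bool readPhase = false;           // phase confirmed on the receive side
  bool pendingVerification = false; // locally initiated update not yet
                                    // confirmed by the peer

  void initiateLocalUpdate() {
    writePhase = !writePhase;       // rotate write keys, toggle phase bit
    pendingVerification = true;
  }

  // Called with the phase bit of a successfully decrypted incoming packet.
  void onPacketReceived(bool phaseBit) {
    if (phaseBit != readPhase) {
      readPhase = phaseBit;         // peer moved to (or confirmed) new phase
      pendingVerification = false;
    }
  }
};
```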
Reviewed By: mjoras
Differential Revision: D53016559
fbshipit-source-id: 134e965dabd62917193544a9655a4eb8868ab7f8
Summary: This is similar to the previous commit. We use a stack-based IOBuf instead of allocating one on the heap. This saves CPU because allocations/deallocations on the stack are a lot cheaper.
Reviewed By: hanidamlaj
Differential Revision: D53101722
fbshipit-source-id: dd59a7eca6498db19472a62f954db3e2f2f27a42
Summary: Creating an IOBuf on the heap when we use `folly::IOBuf::wrapBuffer` is expensive.
Reviewed By: hanidamlaj
Differential Revision: D52506216
fbshipit-source-id: eed2b77beae0419b542b0461303785cc175e3518
Summary:
This is the major transition that updates mvfst code to use the new interfaces. The new Folly implementations of the interfaces maintain all the existing behavior of folly types so this should not introduce any functional change. The core changes are:
- Update the BatchWriters to use the new interfaces.
- Update the FunctionLooper to use the new interfaces.
- Change QuicServerTransport to take the folly types and wrap them in the new types for use in the QuicTransportBase.
The rest of the diff updates all the existing uses of the QuicTransport to initialize the necessary types and pass them to the QUIC transport instead of directly passing folly types.
Reviewed By: mjoras
Differential Revision: D51413481
fbshipit-source-id: 5ed607e12b9a52b96148ad9b4f8f43899655d936
Summary:
- Remove setCustomTransportParameter, which (based on the QUIC v19 RFC draft) verifies whether a parameter is within the private range [0xff00, 0xffff]
> Values with the first byte in the range 0x00 to 0xfe (in hexadecimal) are assigned via the Specification Required policy [RFC8126].
- Consolidate adding the MaxStreamGroups transport parameter with the other transport parameter extensions.
More specifically, the `QuicClientTransport::maybeEnableStreamGroups()` logic is now moved into `QuicClientTransport::setSupportedExtensionTransportParameters()`
Reviewed By: mjoras
Differential Revision: D50461610
fbshipit-source-id: 802b546c8364586cdcf36a230b156ca140c57ce4
Summary:
This diff changes `QuicAsyncUDPSocketWrapper` so that it is an abstraction layer that inherits from `QuicAsyncUDPSocketType`, instead of simply being a container with aliases.
- Key changes in `QuicAsyncUDPSocketWrapper.h`, the rest of the updates switch us from using `QuicAsyncUDPSocketType` to `QuicAsyncUDPSocketWrapper`.
- It's difficult to mock the UDP socket today given that we expose the entire `folly::AsyncUDPSocket` type to the higher layers of the QUIC stack. This complicates testing and emulation because any mock / fake has to implement low level primitives like `recvmmsg`, and because the `folly::AsyncUDPSocket` interface can change over time.
- Pure virtual functions will be defined in `QuicAsyncUDPSocketWrapper` in a follow up diff to start creating an interface between the higher layers of the mvfst QUIC stack and the UDP socket, and this interface will abstract away lower layer details such as `cmsgs` and `io_vec`, and instead focus on populating higher layer structures such as `NetworkData` and `ReceivedPacket` (D48714615). This will make it easier for us to mock or fake the UDP socket.
This diff relies on changes to `folly::MockAsyncUDPSocket` introduced in D48717389.
--
This diff is part of a larger stack focused on the following:
- **Cleaning up client and server UDP packet receive paths while improving testability.** We currently have multiple receive paths for client and server. Capabilities vary significantly and there are few tests. For instance:
- The server receive path supports socket RX timestamps, albeit incorrectly in that it does not store a timestamp per packet. In comparison, the client receive path does not currently support socket RX timestamps, although the code in `QuicClientTransport::recvmsg` and `QuicClientTransport::recvmmsg` makes reference to socket RX timestamps, making it confusing to understand the capabilities available when tracing through the code. This complicates the tests in `QuicTypedTransportTests`, as we have to disable test logic that depends on socket RX timestamps for client tests.
- The client currently has three receive paths, and none of them are well tested.
- **Modularize and abstract components in the receive path.** This will make it easier to mock/fake the UDP socket and network layers.
- `QuicClientTransport` and `QuicServerTransport` currently contain UDP socket handling logic that operates over lower layer primitives such as `cmsg` and `io_vec` (see `QuicClientTransport::recvmmsg` and `...::recvmsg` as examples).
- Because this UDP socket handling logic is inside of the mvfst transport implementations, it is difficult to test this logic in isolation and mock/fake the underlying socket and network layers. For instance, injecting a user space network emulator that operates at the socket layer would require faking `folly::AsyncUDPSocket`, which is non-trivial given that `AsyncUDPSocket` does not abstract away intricacies arising from the aforementioned lower layer primitives.
- By shifting this logic into an intermediate layer between the transport and the underlying UDP socket, it will be easier to mock out the UDP socket layer when testing functionality at higher layers, and inject fake components when we want to emulate the network between a mvfst client and server. It will also be easier for us to have unit tests focused on testing interactions between the UDP socket implementation and this intermediate layer.
- **Improving receive path timestamping.** We only record a single timestamp per `NetworkData` at the moment, but (1) it is possible for a `NetworkData` to have multiple packets, each with their own timestamps, and (2) we should be able to record both userspace and socket timestamps.
Reviewed By: jbeshay, hanidamlaj
Differential Revision: D48717388
fbshipit-source-id: 4f34182a69ab1e619e454da19e357a6a2ee2b9ab
Summary:
This limits writableBytes to the available tokens (i.e. `ThrottlingSignalProvider::bytesToSend`) if the connection is being throttled and that value is provided by the throttling signal provider.
Note that these throttling signals to the transport are provided only for the connections that belong to a specific QE experiment.
Reviewed By: silver23arrow
Differential Revision: D47850261
fbshipit-source-id: 8c52e66198db6d1fee252cacea69b82963a1601a
Summary:
This will allow the new SinglePacketInplaceBatchWriter to be used on clients.
Reviewed By: mjoras
Differential Revision: D47742872
fbshipit-source-id: da2869d4ea7693f2ced4f6d7901216a20da29fa7
Summary:
This is essentially a further extension of the previous idea of having a sequence number per-DSR stream.
The rationale is the same -- to reduce spurious loss declared from the reordering threshold.
The primary case this works around is the following:
```
<!DSR><!DSR><!DSR><DSR><DSR><DSR><!DSR>
```
Suppose we get an ACK for [0-1,3-5] leaving only packet number 2 remaining.
The naive reordering threshold will declare packet number 2 as lost. However, in principle this is arguably wrong since when considered on the timeline of !DSR packets, the gap does not exceed the reordering threshold.
To accommodate this we need to track a monotonically increasing sequence number for each non-DSR packet written, and store that in the packet metadata. This way we can use that for the reordering comparison rather than the packet number itself.
When no DSR packets are ever written to a connection, this should devolve to an identical result to using the packet number, as the two increment together.
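The comparison can be sketched as follows (the function name and parameters are assumptions for illustration; the real loss detection has more inputs):

```cpp
#include <cstdint>

// Sketch: apply the reordering threshold to the per-type non-DSR sequence
// number of a non-DSR packet, rather than to the packet number. DSR packets
// acked in between do not advance the non-DSR sequence, so they no longer
// inflate the apparent reordering gap.
bool declaredLostByReordering(uint64_t largestAckedNonDsrSeqNo,
                              uint64_t packetNonDsrSeqNo,
                              uint64_t reorderingThreshold) {
  return largestAckedNonDsrSeqNo > packetNonDsrSeqNo &&
      largestAckedNonDsrSeqNo - packetNonDsrSeqNo > reorderingThreshold;
}
```

In the example above, the acked DSR packets (packet numbers 3-5) do not advance the non-DSR sequence, so the gap seen by this check stays within the threshold and packet number 2 is not spuriously declared lost.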
Reviewed By: kvtsoy
Differential Revision: D46742386
fbshipit-source-id: 2983746081c7b6282358416e2bb1bcc80861be58
Summary:
This has been a default behavior for us for a long time. Basically, we will always include ACK frames in a burst if there are new packets to ACK.
This is overall probably fine and is flexible to peer behavior such that the peer can get away with basically never sending anything ACK-eliciting.
However it is not without its issues. In particular, for DSR it means that we write an excessive amount of pure ACK packets since they cannot be coalesced with a DSR burst.
Make this behavior optional, such that we only include ACK frames in non-probing schedulers when we are only writing stream data.
Reviewed By: kvtsoy
Differential Revision: D39479149
fbshipit-source-id: 6e92d45311a3fb23750c93483b94e97dd3fdce26
Summary: Turns out these functions have been returning 0 for bytes written for a while. Nothing was using them (yet), so there wasn't any functional breakage.
Reviewed By: kvtsoy
Differential Revision: D46653336
fbshipit-source-id: 765b3363c1fc0729e8239d1293ddcc4ae4bb9c79
Summary:
As in title. I don't think this was affecting anything but remove it since we are using DSR.
(Note: this ignores all push blocking failures!)
Reviewed By: hanidamlaj
Differential Revision: D46522584
fbshipit-source-id: 34e0ff1f208d00be51963d2081decde5e7acccf7
Summary:
The write loop functions for DSR or non-DSR are segmented today. As such, so are the schedulers. Mirroring this, we also currently store the DSR and non-DSR streams in separate write queues. This makes it impossible to effectively balance between the two without potential priority inversions or starvation.
Combining them into a single queue eliminates this possibility, but is not entirely straightforward.
The main difficulty comes from the schedulers. The `StreamFrameScheduler` for non-DSR data essentially loops over the control stream queue and the normal write queue looking for the next stream to write to a given packet. When the queues are segmented things are nice and easy. When they are combined, we have to deal with the potential that the non-DSR scheduler will hit a stream with only DSR data. Simply bailing isn't quite correct, since it will just cause an empty write loop. To fix that we need to check, after we are finished writing a packet, whether the next scheduled stream only has DSR data. If it does, we need to ensure `hasPendingData()` returns false.
The same needs to be done in reverse for the DSR stream scheduler.
The last major complication is that we need another loop which wraps the two individual write loop functions, and calls both functions until the packet limit is exhausted or there's no more data to write. This is to handle the case where there are, for example, two active streams with the same incremental priority, and one is DSR and the other is not. In this case each write loop we want to write `packetLimit` packets, flip-flopping between DSR and non-DSR packets.
This kind of round robining is pathologically bad for DSR, and a future diff will experiment with changing the round robin behavior such that we write a minimum number of packets per stream before moving on to the next stream.
This change also contains some other refactors, such as eliminating `updateLossStreams` from the stream manager.
(Note: this ignores all push blocking failures!)
Reviewed By: kvtsoy
Differential Revision: D46249067
fbshipit-source-id: 56a37c02fef51908c1336266ed40ac6d99bd14d4
Summary:
This has been hardcoded to SRTT/25 for a long time. However, especially when using DSR this might not be the most appropriate since it can start to get close to real world SRTTs.
Control it via a knob, and also add the behavior such that setting it to 0 effectively disables the time limit.
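A hedged sketch of the knob semantics (the function shape and treating the knob as the SRTT divisor are assumptions; the commit only states the SRTT/25 default and that 0 disables the limit):

```cpp
#include <chrono>
#include <cstdint>

using Micros = std::chrono::microseconds;

// Sketch: the write loop time limit is SRTT divided by a knob-controlled
// divisor (default 25); a divisor of 0 disables the time limit entirely.
bool writeLoopTimeExceeded(Micros elapsed, Micros srtt, uint64_t divisor) {
  if (divisor == 0) {
    return false;  // knob set to 0: no time limit
  }
  return elapsed > srtt / divisor;
}
```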
Reviewed By: jbeshay
Differential Revision: D46084438
fbshipit-source-id: 7dc619fdff1bd5c3f8654c4936200e0340ef94f2
Summary:
When using ACK_FREQUENCY and high reordering thresholds, the normal tricks for drawing ACKs with PTOs are slightly defeated.
Thankfully we can get around this by issuing IMMEDIATE_ACKs when probing. The strategy is basically to replace one of the usual probes with an IMMEDIATE_ACK probe.
Also, include ACKs in these probes (and the PING probes), since there's not much reason not to.
Reviewed By: jbeshay
Differential Revision: D45965703
fbshipit-source-id: 9b98473312a239e17a0b8040f45d197b1a2c8484
Summary:
Adds a prewrite function in packet processors that can be used to set PrewriteRequests that will apply to the next write loop in the QuicSocket.
Currently, the PrewriteRequest only contains a list of cmsgs but it can be expanded when needed.
This replaces the setAdditionalCmsgs callback that was previously passed down directly to AsyncUDPSocket, allowing multiple processors to affect the additionalCmsgs sent to the AsyncUDPSocket.
Reviewed By: bschlinker
Differential Revision: D44233700
fbshipit-source-id: 799357ee5df89b2a526611bd83d0b4a9a13b71d5