From bb5d907aa9b154ba2129b72dad8f54b228e54b6c Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Wed, 11 Oct 2023 20:46:26 +0200 Subject: [PATCH 01/15] Automatically pick up all Markdown files Assume GNU make. We already do with the toplevel makefile. Signed-off-by: Gilles Peskine --- docs/architecture/Makefile | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/docs/architecture/Makefile b/docs/architecture/Makefile index 6252ab0f91..5bee504c29 100644 --- a/docs/architecture/Makefile +++ b/docs/architecture/Makefile @@ -2,20 +2,7 @@ PANDOC = pandoc default: all -all_markdown = \ - alternative-implementations.md \ - mbed-crypto-storage-specification.md \ - psa-crypto-implementation-structure.md \ - psa-migration/psa-limitations.md \ - psa-migration/strategy.md \ - psa-migration/tasks-g2.md \ - psa-migration/testing.md \ - testing/driver-interface-test-strategy.md \ - testing/invasive-testing.md \ - testing/psa-storage-format-testing.md \ - testing/test-framework.md \ - tls13-support.md \ - # This line is intentionally left blank +all_markdown = $(wildcard *.md */*.md) html: $(all_markdown:.md=.html) pdf: $(all_markdown:.md=.pdf) From f7806ca7820ad4b24535778442973637ee78b738 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Thu, 12 Oct 2023 16:00:11 +0200 Subject: [PATCH 02/15] Analyze requirements for protection of arguments in shared memory Propose a dual-approach strategy where some buffers are copied and others can remain shared. Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 208 +++++++++++++++++++++++++ 1 file changed, 208 insertions(+) create mode 100644 docs/architecture/psa-shared-memory.md diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md new file mode 100644 index 0000000000..a7fe013fca --- /dev/null +++ b/docs/architecture/psa-shared-memory.md @@ -0,0 +1,208 @@ +PSA API functions and shared memory +=================================== + +## Introduction + +This document discusses the security architecture of systems where PSA API functions might receive arguments that are in memory that is shared with an untrusted process. On such systems, the untrusted process might access a shared memory buffer while the cryptography library is using it, and thus cause unexpected behavior in the cryptography code. + +### Core assumptions + +We assume the following scope limitations: + +* Only PSA Crypto API functions are in scope (including Mbed TLS extensions to the official API specification). Legacy crypto, X.509, TLS, or any other function which is not called `psa_xxx` is out of scope. +* We only consider [input buffers](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#input-buffer-sizes) and [output buffers](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#output-buffer-sizes). Any other data is assumed to be in non-shared memory. + +## System architecture discussion + +### Architecture overview + +We consider a system that memory separation between partitions: a partition can't access another partition's memory directly. Partitions are meant to be isolated from each other: a partition may only affect the integrity of another partition via well-defined system interfaces. For example, this can be a Unix/POSIX-like system that isolates processes, or isolation between the secure world and the non-secure world relying on a mechanism such as TrustZone, or isolation between secure-world applications on such a system. 
+ +More precisely, we consider such a system where our PSA Crypto implementation is running inside one partition, called the **crypto service**. The crypto service receives remote procedure calls from other partitions, validates their arguments (e.g. validation of key identifier ownership), and calls a PSA Crypto API function. This document is concerned with environments where the arguments passed to a PSA Crypto API function may be in shared memory (as opposed to environments where the inputs are always copied into memory that is solely accessible by the crypto service before calling the API function, and likewise with output buffers after the function returns). + +When the data is accessible to another partition, there is a risk that this other partition will access it while the crypto implementation is working. Although this could be prevented by suspending the whole system while crypto is working, such a limitation is rarely desirable and most systems don't offer a way to do it. (Even systems that hav absolute thread priorities, and where crypto has a higher priority than any untrusted partition, may be vulnerable due to having multiple cores or asynchronous data transfers with peripherals.) + +### Risks and vulnerabilities + +We consider a security architecture with two or three entities: + +* a crypto service, which offers PSA crypto API calls over RPC (remote procedure call) using shared memory for some input or output arguments; +* a client of the crypto service, which makes a RPC to the crypto service; +* in some scenarios, a client of the client, which makes a RPC to the crypto client which re-shares the memory with the crypto service. + +#### Read-read inconsistency + +If an input argument is in shared memory, there is a risk of a **read-read inconsistency**: + +1. The crypto code reads part of the input and validates it, or injects it into a calculation. +2. The client (or client's client) modifies the input. +3. The crypto code reads the same part again, and performs an action which would be impossible if the input had had the same value all along. + +Vulnerability example: suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. (This could be the length of the overall data, or the length of an embedded field) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read. + +#### Write-read inconsistency + +If an output argument is in shared memory, there is a risk of a **write-read inconsistency**: + +1. The crypto code writes some intermediate data into the output buffer. +2. The client (or client's client) modifies the intermediate data. +3. The crypto code reads the intermediata data back and continues the calculation, leading to an outcome that would not be possible if the intermediate data had not been modified. + +Vulnerability example: suppose that an RSA signature function works by formatting the data in place in the output buffer, then applying the RSA private-key operation in place. (This is how `mbedtls_rsa_pkcs1_sign` works.) A malicious client may write badly formatted data into the buffer, so that the private-key operation is not a valid signature (e.g. it could be a decryption), violating the RSA key's usage policy. 
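+
+For illustration, the risky in-place pattern described above could look like the following sketch (hypothetical code, not the actual `mbedtls_rsa_pkcs1_sign` implementation; the DigestInfo encoding is omitted and `rsa_private_raw` is an invented stand-in for the modular exponentiation):
+
+```
+#include <stddef.h>
+#include <string.h>
+
+/* Stand-in for the raw RSA private-key operation, stubbed out so that the
+ * sketch is self-contained. */
+static int rsa_private_raw(unsigned char *buf, size_t len)
+{
+    (void) buf; (void) len;
+    return 0;
+}
+
+int sign_in_place_sketch(const unsigned char *hash, size_t hash_len,
+                         unsigned char *sig, size_t sig_len)
+{
+    if (sig_len < hash_len + 11) {
+        return -1;
+    }
+    /* Step 1: build the PKCS#1 v1.5 block directly in the output buffer. */
+    sig[0] = 0x00;
+    sig[1] = 0x01;
+    memset(sig + 2, 0xFF, sig_len - hash_len - 3);
+    sig[sig_len - hash_len - 1] = 0x00;
+    memcpy(sig + sig_len - hash_len, hash, hash_len);
+    /* If `sig` is in shared memory, the client can rewrite it at this point. */
+    /* Step 2: the private-key operation consumes whatever `sig` now holds. */
+    return rsa_private_raw(sig, sig_len);
+}
+```
+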
+ +Vulnerability example with chained calls: we consider the same RSA signature operation as before. In this example, we additionally assume that the data to sign comes from an attestation application which signs some data on behalf of a final client: the key and the data to sign are under the attestation application's control, and the final client must not be able to obtain arbitrary signatures. The final client shares an output buffer for the signature with the attestation application, and the attestation application re-shares this buffer with the crypto service. A malicious final client can modify the intermediate data and thus sign arbitrary data. + +#### Write-write disclosure + +If an output argument is in shared memory, there is a risk of a **write-write disclosure**: + +1. The crypto code writes some intermediate data into the output buffer. This intermediate data must remain confidential. +2. The client (or client's client) reads the intermediate data. +3. The crypto code overwrites the intermediate data. + +Vulnerability example with chained calls: we consider a provisioning application that provides a data encryption service on behalf of multiple clients, using a single shared key. Clients are not allowed to access each other's data. The provisioning application isolates clients by including the client identity in the associated data. Suppose that an AEAD decryption function processes the ciphertext incrementally by simultaneously writing the plaintext to the output buffer and calculating the tag. (This is how AEAD decryption usually works.) At the end, if the tag is wrong, the decryption function wipes the output buffer. Assume that the output buffer for the plaintext is shared from the client to the provisioning application, which re-shares it with the crypto service. A malicious client can read another client (the victim)'s encrypted data by passing the ciphertext to the provisioning application, which will attempt to decrypt it with associated data identifying the requesting client. Although the operation will fail beacuse the tag is wrong, the malicious client still reads the victim plaintext. + +### Possible countermeasures + +In this section, we briefly discuss generic countermeasures. + +#### Copying + +Copying is a valid countermeasure. It is conceptually simple. However, it is often unattractive because it requires additional memory and time. + +Note that although copying is very easy to write into a program, there is a risk that a compiler (especially with whole-program optimization) may optimize the copy away, if it does not understand that copies between shared memory and non-shared memory are semantically meaningful. + +Example: the PSA Firmware Framework 1.0 forbids shared memory between partitions. This restriction is lifted in version 1.1 due to concerns over RAM usage. + +#### Careful accesses + +The following rules guarantee that shared memory cannot result in a security violation: + +* Never read the same input twice at the same index. +* Never read back from an output. +* Once potentially confidential data has been written to an output, it may not be overwritten. (This rule is more complex to allow writing non-confidential data first, for example to pre-initialize an output to zero for robustness.) + +These rules are very difficult to enforce. + +Example: these are the rules that a GlobalPlatform TEE Trusted Application (application running on the secure side of TrustZone on Cortex-A) must follow. 
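+
+For illustration, a hypothetical parsing helper that follows the first two rules might look like the sketch below (the function and parameter names are invented; the point is that every shared byte is read exactly once and all later decisions use local copies):
+
+```
+#include <stdint.h>
+#include <string.h>
+
+int read_lv_field_once(const uint8_t *shared_input, size_t input_size,
+                       uint8_t *local_out, size_t out_size, size_t *out_len)
+{
+    if (input_size < 2) {
+        return -1;
+    }
+    /* Read the 2-byte length header exactly once, into non-shared memory. */
+    size_t len = ((size_t) shared_input[0] << 8) | shared_input[1];
+    /* All validation uses the local copy, never the shared bytes again. */
+    if (len > input_size - 2 || len > out_size) {
+        return -1;
+    }
+    /* Each payload byte is read exactly once. */
+    memcpy(local_out, shared_input + 2, len);
+    *out_len = len;
+    return 0;
+}
+```
+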
+ +## Protection requirements + +### Responsibility for protection + +A call to a crypto service to perform a crypto operation involes the following components: + +1. The remote procedure call framework provided by the operating system. +2. The code of the crypto service. +3. The code of the PSA Crypto dispatch layer (also known as the core), which is provided by Mbed TLS. +4. The implementation of the cryptographic mechanism, which may be provided by Mbed TLS or by a third-party driver. + +The [PSA Crypto API specification](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#stability-of-parameters) puts the responsibility for protection on the implementation of the PSA Crypto API, i.e. (3) or (4). + +> In an environment with multiple threads or with shared memory, the implementation carefully accesses non-overlapping buffer parameters in order to prevent any security risk resulting from the content of the buffer being modified or observed during the execution of the function. (...) + +In Mbed TLS 2.x and 3.x up to and including 3.5.0, there is no defense against buffers in shared memory. The responsibility shifts to (1) or (2), but this is not documented. + +In the remainder of this chapter, we will discuss how to implement this high-level requirement where it belongs: inside the implementation of the PSA Crypto API. Note that this allows two possible levels: in the dispatch layer (independently of the implementation of each mechanism) or in the driver (specific to each implementation). + +#### Protection in the dispatch layer + +The dispatch layer has no control over how the driver layer will access buffers. Therefore the only possible protection at this layer method is to ensure that drivers have no access to shared memory. This means that any buffer located in shared memory must be copied into or out of a buffer in memory owned by the crypto service (heap or stack). This adds inefficiency, mostly in terms of RAM usage. + +For buffers with a small static size limit, this is something we often do for convenience, especially with output buffers. However, as of Mbed TLS 3.5.0, it is not done systematically. + +It is ok to skip the copy if it is known for sure that a buffer is not in shared memory. However, the location of the buffer is not under the control of Mbed TLS. This means skipping the copy would have to be a compile-time or run-time option which has to be set by the application using Mbed TLS. This is both an additional maintenance cost (more code to analyze, more testing burden), and a residual security risk in case the party who is responsible for setting this option does not set it correctly. As a consequence, Mbed TLS will not offer this configurability unless there is a compelling argument. + +#### Protection in the driver layer + +Putting the responsibility for protection in the driver layer increases the overall amount of work since there are more driver implementations than dispatch implementations. (This is true even inside Mbed TLS: almost all API functions have multiple underlying implementations, one for each algorithm.) It also increases the risk to the ecosystem since some drivers might not protect correctly. Therefore having drivers be responsible for protection is only a good choice if there is a definite benefit to it, compared to allocating an internal buffer and copying. An expected benefit in some cases is that there are practical protection methods other than copying. 
+ +Some cryptographic mechanisms are naturally implemented by processing the input in a single pass, with a low risk of ever reading the same byte twice, and by writing the final output directly into the output buffer. For such mechanism, it is sensible to mandate that drivers respect these rules. + +In the next section, we will analyze how susceptible various cryptographic mechanisms are to shared memory vulnerabilities. + +### Susceptibility of different mechanisms + +#### Operations involving small buffers + +For operations involving **small buffers**, the cost of copying is low. For many of those, the risk of not copying is high: + +* Any parsing of formatted data has a high risk of [read-read inconsistency](#read-read-inconsistency). +* An internal review shows that for RSA operations, it is natural for an implementation to have a [write-read inconsistency](#write-read-inconsistency) or a [write-write disclosure](#write-write-disclosure). + +Note that in this context, a “small buffer” is one with a size limit that is known at compile time, and small enough that copying the data is not prohibitive. For example, an RSA key fits in a small buffer. A hash input is not a small buffer, even if it happens to be only a few bytes long in one particular call. + +The following buffers are considered small buffers: + +* Any input or output from asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export. +* The output of a hash or MAC operation. +* Cooked key derivation output. + +**Design decision: the dispatch layer shall copy all small buffers**. + +#### Symmetric cryptography inputs + +Message inputs to hash, MAC, cipher (plaintext or ciphertext), AEAD (associated data, plaintext, ciphertext) and key derivation operations are at a low risk of [read-read inconsistency](#read-read-inconsistency) because they are unformatted data, and for all specified algorithms, it is natural to process the input one byte at a time. + +**Design decision: require symmetric cryptography drivers to read their input without a risk of read-read inconsistency**. + +TODO: what about IV/nonce inputs? They are typically small, but don't necessarily have a static size limit (e.g. GCM recommends a 12-byte nonce, but also allows large nonces). + +#### Key derivation outputs + +Key derivation typically emits its output as a stream, with no error condition detected after setup other than operational failures (e.g. communication failure with an accelerator) or running out of data to emit (which can easily be checked before emitting any data, since the data size is known in advance). + +(Note that this is about raw byte output, not about cooked key derivation, i.e. deriving a structured key, which is considered a [small buffer](#operations-involving-small-buffers).) + +**Design decision: require key derivation drivers to emit their output without reading back from the output buffer**. + +#### Cipher and AEAD outputs + +AEAD decryption is at risk of [write-write disclosure](#write-write-disclosure) when the tag does not match. + +Cipher and AEAD outputs are at risk of [write-write disclosure](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. 
In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap). + +**Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD outputs**. + +## Design of shared memory protection + +This section explains how Mbed TLS implements the shared memory protection stragegy summarized below. + +### Shared memory protection strategy + +* The core (dispatch layer) shall make a copy of the following buffers, so that drivers do not receive arguments that are in shared memory: + * Any input or output from asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export. + * The output of a hash or MAC operation. + * Cooked key derivation output. + +* A document shall explain the requirements on drivers for arguments whose access needs to be protected: + * Hash and MAC input. + * Cipher/AEAD IV/nonce (to be confirmed). + * Key derivation input (excluding key agreement). + * Raw key derivation output (excluding cooked key derivation output). + +* The built-in implementations of cryptographic mechanisms with arguments whose access needs to be protected shall protect those arguments. + +Justification: see “[Susceptibility of different mechanisms](susceptibility-of-different-mechanisms)”. + +### Implementation of copying + +Copy what needs copying. This seems straightforward. + +### Validation of copying + +TODO + +Proposed general idea: have tests where the test code calling API functions allocates memory in a certain pool, and code in the library allocates memory in a different pool. Test drivers check that needs-copying arguments are within the library pool, not within the test pool. + +### Shared memory protection requirements + +TODO: write document and reference it here. + +### Validation of protection for built-in mechanisms + +TODO + +## Analysis of argument protection in built-in mechanisms + +TODO From 8daedaeac987c278cc6c52b401d1465d2f709358 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 18:47:29 +0200 Subject: [PATCH 03/15] Fix typos and copypasta Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index a7fe013fca..6e3a44ee02 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -16,11 +16,11 @@ We assume the following scope limitations: ### Architecture overview -We consider a system that memory separation between partitions: a partition can't access another partition's memory directly. Partitions are meant to be isolated from each other: a partition may only affect the integrity of another partition via well-defined system interfaces. For example, this can be a Unix/POSIX-like system that isolates processes, or isolation between the secure world and the non-secure world relying on a mechanism such as TrustZone, or isolation between secure-world applications on such a system. +We consider a system that has memory separation between partitions: a partition can't access another partition's memory directly. Partitions are meant to be isolated from each other: a partition may only affect the integrity of another partition via well-defined system interfaces. 
For example, this can be a Unix/POSIX-like system that isolates processes, or isolation between the secure world and the non-secure world relying on a mechanism such as TrustZone, or isolation between secure-world applications on such a system. More precisely, we consider such a system where our PSA Crypto implementation is running inside one partition, called the **crypto service**. The crypto service receives remote procedure calls from other partitions, validates their arguments (e.g. validation of key identifier ownership), and calls a PSA Crypto API function. This document is concerned with environments where the arguments passed to a PSA Crypto API function may be in shared memory (as opposed to environments where the inputs are always copied into memory that is solely accessible by the crypto service before calling the API function, and likewise with output buffers after the function returns). -When the data is accessible to another partition, there is a risk that this other partition will access it while the crypto implementation is working. Although this could be prevented by suspending the whole system while crypto is working, such a limitation is rarely desirable and most systems don't offer a way to do it. (Even systems that hav absolute thread priorities, and where crypto has a higher priority than any untrusted partition, may be vulnerable due to having multiple cores or asynchronous data transfers with peripherals.) +When the data is accessible to another partition, there is a risk that this other partition will access it while the crypto implementation is working. Although this could be prevented by suspending the whole system while crypto is working, such a limitation is rarely desirable and most systems don't offer a way to do it. (Even systems that have absolute thread priorities, and where crypto has a higher priority than any untrusted partition, may be vulnerable due to having multiple cores or asynchronous data transfers with peripherals.) ### Risks and vulnerabilities @@ -46,7 +46,7 @@ If an output argument is in shared memory, there is a risk of a **write-read inc 1. The crypto code writes some intermediate data into the output buffer. 2. The client (or client's client) modifies the intermediate data. -3. The crypto code reads the intermediata data back and continues the calculation, leading to an outcome that would not be possible if the intermediate data had not been modified. +3. The crypto code reads the intermediate data back and continues the calculation, leading to an outcome that would not be possible if the intermediate data had not been modified. Vulnerability example: suppose that an RSA signature function works by formatting the data in place in the output buffer, then applying the RSA private-key operation in place. (This is how `mbedtls_rsa_pkcs1_sign` works.) A malicious client may write badly formatted data into the buffer, so that the private-key operation is not a valid signature (e.g. it could be a decryption), violating the RSA key's usage policy. @@ -90,7 +90,7 @@ Example: these are the rules that a GlobalPlatform TEE Trusted Application (appl ### Responsibility for protection -A call to a crypto service to perform a crypto operation involes the following components: +A call to a crypto service to perform a crypto operation involves the following components: 1. The remote procedure call framework provided by the operating system. 2. The code of the crypto service. 
@@ -160,13 +160,13 @@ Key derivation typically emits its output as a stream, with no error condition d AEAD decryption is at risk of [write-write disclosure](#write-write-disclosure) when the tag does not match. -Cipher and AEAD outputs are at risk of [write-write disclosure](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap). +Cipher and AEAD outputs are at risk of [write-read-inconsistency](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap). **Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD outputs**. ## Design of shared memory protection -This section explains how Mbed TLS implements the shared memory protection stragegy summarized below. +This section explains how Mbed TLS implements the shared memory protection strategy summarized below. ### Shared memory protection strategy From 60c453ee725311cacc0d25f7651d81ad57f9f620 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 19:07:56 +0200 Subject: [PATCH 04/15] Expand explanations of the vulnerabilities Add a few more examples. Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index 6e3a44ee02..61d37dbff1 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -18,10 +18,12 @@ We assume the following scope limitations: We consider a system that has memory separation between partitions: a partition can't access another partition's memory directly. Partitions are meant to be isolated from each other: a partition may only affect the integrity of another partition via well-defined system interfaces. For example, this can be a Unix/POSIX-like system that isolates processes, or isolation between the secure world and the non-secure world relying on a mechanism such as TrustZone, or isolation between secure-world applications on such a system. -More precisely, we consider such a system where our PSA Crypto implementation is running inside one partition, called the **crypto service**. The crypto service receives remote procedure calls from other partitions, validates their arguments (e.g. validation of key identifier ownership), and calls a PSA Crypto API function. This document is concerned with environments where the arguments passed to a PSA Crypto API function may be in shared memory (as opposed to environments where the inputs are always copied into memory that is solely accessible by the crypto service before calling the API function, and likewise with output buffers after the function returns). 
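+
+For illustration, a crypto service's handler for a key-import RPC might look like the following sketch. The request structure and its fields are invented for this example; only `psa_import_key` and the attribute setters are actual PSA Crypto API calls. The important point is that `key_data` may still point into memory that the client can modify while the API function runs.
+
+```
+#include <psa/crypto.h>
+
+/* Hypothetical RPC request layout; real framing depends on the RPC framework. */
+typedef struct {
+    psa_key_type_t key_type;
+    const uint8_t *key_data;       /* may point into client-shared memory */
+    size_t key_data_length;
+} import_key_request_t;
+
+psa_status_t handle_import_key_request(const import_key_request_t *req,
+                                       psa_key_id_t *out_key_id)
+{
+    /* ... argument validation (key ownership, quotas, policy) goes here ... */
+    psa_key_attributes_t attributes = PSA_KEY_ATTRIBUTES_INIT;
+    psa_set_key_type(&attributes, req->key_type);
+    /* Policy hard-coded for brevity; a real service would take it from the request. */
+    psa_set_key_usage_flags(&attributes, PSA_KEY_USAGE_SIGN_HASH);
+    psa_set_key_algorithm(&attributes, PSA_ALG_ECDSA(PSA_ALG_SHA_256));
+    /* The input buffer is passed through unchanged: if it is shared, the
+     * client can modify it while psa_import_key() parses it. */
+    return psa_import_key(&attributes, req->key_data, req->key_data_length,
+                          out_key_id);
+}
+```
+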
+More precisely, we consider such a system where our PSA Crypto implementation is running inside one partition, called the **crypto service**. The crypto service receives remote procedure calls (RPC) from other partitions, validates their arguments (e.g. validation of key identifier ownership), and calls a PSA Crypto API function. This document is concerned with environments where the arguments passed to a PSA Crypto API function may be in shared memory (as opposed to environments where the inputs are always copied into memory that is solely accessible by the crypto service before calling the API function, and likewise with output buffers after the function returns). When the data is accessible to another partition, there is a risk that this other partition will access it while the crypto implementation is working. Although this could be prevented by suspending the whole system while crypto is working, such a limitation is rarely desirable and most systems don't offer a way to do it. (Even systems that have absolute thread priorities, and where crypto has a higher priority than any untrusted partition, may be vulnerable due to having multiple cores or asynchronous data transfers with peripherals.) +The crypto service must guarantee that it behaves as if the rest of the world was suspended while it is executed. A behavior that is only possible if an untrusted entity accesses a buffer while the crypto service is processing the data is a security violation. + ### Risks and vulnerabilities We consider a security architecture with two or three entities: @@ -30,6 +32,8 @@ We consider a security architecture with two or three entities: * a client of the crypto service, which makes a RPC to the crypto service; * in some scenarios, a client of the client, which makes a RPC to the crypto client which re-shares the memory with the crypto service. +The behavior of RPC is defined for in terms of values of inputs and outputs. This models an ideal world where the content of input and output buffers is not accessible outside the crypto service while it is processing an RPC. It is a security violation if the crypto service behaves in a way that cannot be achieved by setting the inputs before the RPC call, and reading the outputs after the RPC call is finished. + #### Read-read inconsistency If an input argument is in shared memory, there is a risk of a **read-read inconsistency**: @@ -38,7 +42,12 @@ If an input argument is in shared memory, there is a risk of a **read-read incon 2. The client (or client's client) modifies the input. 3. The crypto code reads the same part again, and performs an action which would be impossible if the input had had the same value all along. -Vulnerability example: suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. (This could be the length of the overall data, or the length of an embedded field) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read. +Vulnerability example (parsing): suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. 
(This could be the length of the overall data, or the length of an embedded field) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read. + +Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plantext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working. + +* Any of `enc("PPPP")+mac("PPPP")`, `enc("PPQQ")+mac("PPQQ")` or `enc("QQQQ")+mac("QQQQ")` are valid outputs: they are outputs that can be produced by this authenticated encryption RPC. +* If the authenticated encryption calculates the ciphertext before the client changes the output buffer and calculates the MAC after that change, reading the input buffer again each time, the output will be `enc("PPPP")+mac("QQQQ")`. There is no input that can lead to this output, hence this behavior violates the security guarantees of the crypto service. #### Write-read inconsistency @@ -60,7 +69,9 @@ If an output argument is in shared memory, there is a risk of a **write-write di 2. The client (or client's client) reads the intermediate data. 3. The crypto code overwrites the intermediate data. -Vulnerability example with chained calls: we consider a provisioning application that provides a data encryption service on behalf of multiple clients, using a single shared key. Clients are not allowed to access each other's data. The provisioning application isolates clients by including the client identity in the associated data. Suppose that an AEAD decryption function processes the ciphertext incrementally by simultaneously writing the plaintext to the output buffer and calculating the tag. (This is how AEAD decryption usually works.) At the end, if the tag is wrong, the decryption function wipes the output buffer. Assume that the output buffer for the plaintext is shared from the client to the provisioning application, which re-shares it with the crypto service. A malicious client can read another client (the victim)'s encrypted data by passing the ciphertext to the provisioning application, which will attempt to decrypt it with associated data identifying the requesting client. Although the operation will fail beacuse the tag is wrong, the malicious client still reads the victim plaintext. +Vulnerability example with chained calls (temporary exposure): an application encrypts some data, and lets its clients store the ciphertext. Clients may not have access to the plaintext. To save memory, when it calls the crypto service, it passes an output buffer that is in the final client's memory. Suppose the encryption mechanism works by copying its input to the output buffer then encrypting in place (for example, to simplify considerations related to overlap, or because the implementation relies on a low-level API that works in place). In this scenario, the plaintext is exposed to the final client while the encryption in progress, which violates the confidentiality of the plaintext. + +Vulnerability example with chained calls (backtrack): we consider a provisioning application that provides a data encryption service on behalf of multiple clients, using a single shared key. 
Clients are not allowed to access each other's data. The provisioning application isolates clients by including the client identity in the associated data. Suppose that an AEAD decryption function processes the ciphertext incrementally by simultaneously writing the plaintext to the output buffer and calculating the tag. (This is how AEAD decryption usually works.) At the end, if the tag is wrong, the decryption function wipes the output buffer. Assume that the output buffer for the plaintext is shared from the client to the provisioning application, which re-shares it with the crypto service. A malicious client can read another client (the victim)'s encrypted data by passing the ciphertext to the provisioning application, which will attempt to decrypt it with associated data identifying the requesting client. Although the operation will fail beacuse the tag is wrong, the malicious client still reads the victim plaintext. ### Possible countermeasures From 352095ca865a4ff4478838a80ca5a678d8899634 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 19:56:22 +0200 Subject: [PATCH 05/15] Simplify the relaxed output-output rule Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index 61d37dbff1..e71eed405b 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -91,7 +91,8 @@ The following rules guarantee that shared memory cannot result in a security vio * Never read the same input twice at the same index. * Never read back from an output. -* Once potentially confidential data has been written to an output, it may not be overwritten. (This rule is more complex to allow writing non-confidential data first, for example to pre-initialize an output to zero for robustness.) +* Never write to the output twice at the same index. + * This rule can usefully be relaxed in many circumstances. It is ok to write data that is independent of the inputs (and not otherwise confidential), then overwrite it. For example, it is ok to zero the output buffer before starting to process the input. These rules are very difficult to enforce. From db00543b3ae281443ad925c36bf8c0cd3ff872da Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 19:57:53 +0200 Subject: [PATCH 06/15] Add a section on write-read feedback It's a security violation, although it's not clear whether it really needs to influence the design. Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index e71eed405b..5d3273a1a7 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -73,6 +73,18 @@ Vulnerability example with chained calls (temporary exposure): an application en Vulnerability example with chained calls (backtrack): we consider a provisioning application that provides a data encryption service on behalf of multiple clients, using a single shared key. Clients are not allowed to access each other's data. The provisioning application isolates clients by including the client identity in the associated data. Suppose that an AEAD decryption function processes the ciphertext incrementally by simultaneously writing the plaintext to the output buffer and calculating the tag. 
(This is how AEAD decryption usually works.) At the end, if the tag is wrong, the decryption function wipes the output buffer. Assume that the output buffer for the plaintext is shared from the client to the provisioning application, which re-shares it with the crypto service. A malicious client can read another client (the victim)'s encrypted data by passing the ciphertext to the provisioning application, which will attempt to decrypt it with associated data identifying the requesting client. Although the operation will fail beacuse the tag is wrong, the malicious client still reads the victim plaintext. +#### Write-read feedback + +If a function both has an input argument and an output argument in shared memory, and processes its input incrementally to emit output incrementally, the following sequence of events is possible: + +1. The crypto code processes part of the input and writes the corresponding part of the output. +2. The client reads the early output and uses that to calculate the next part of the input. +3. The crypto code processes the rest of the input. + +There are cryptographic mechanisms for which this breaks security properties. An example is [CBC encryption](https://link.springer.com/content/pdf/10.1007/3-540-45708-9_2.pdf): if the client can choose the content of a plaintext block after seeing the immediately preceding ciphertext block, this gives the client a decryption oracle. This is a security violation if the key policy only allowed the client to encrypt, not to decrypt. + +TODO: is this a risk we want to take into account? Although this extends the possible behaviors of the one-shot interface, the client can do the same thing legitimately with the multipart interface. + ### Possible countermeasures In this section, we briefly discuss generic countermeasures. @@ -87,7 +99,7 @@ Example: the PSA Firmware Framework 1.0 forbids shared memory between partitions #### Careful accesses -The following rules guarantee that shared memory cannot result in a security violation: +The following rules guarantee that shared memory cannot result in a security violation other than [write-read feedback](#write-read feedback): * Never read the same input twice at the same index. * Never read back from an output. From 2859267a27e23f1eb551ba01ba7c76c74c70489c Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 20:01:36 +0200 Subject: [PATCH 07/15] Clarify terminology: built-in driver Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index 5d3273a1a7..57341b09ef 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -119,7 +119,7 @@ A call to a crypto service to perform a crypto operation involves the following 1. The remote procedure call framework provided by the operating system. 2. The code of the crypto service. 3. The code of the PSA Crypto dispatch layer (also known as the core), which is provided by Mbed TLS. -4. The implementation of the cryptographic mechanism, which may be provided by Mbed TLS or by a third-party driver. +4. The driver implementing the cryptographic mechanism, which may be provided by Mbed TLS (built-in driver) or by a third-party driver. 
The [PSA Crypto API specification](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#stability-of-parameters) puts the responsibility for protection on the implementation of the PSA Crypto API, i.e. (3) or (4). @@ -223,10 +223,10 @@ Proposed general idea: have tests where the test code calling API functions allo TODO: write document and reference it here. -### Validation of protection for built-in mechanisms +### Validation of protection for built-in drivers TODO -## Analysis of argument protection in built-in mechanisms +## Analysis of argument protection in built-in drivers TODO From 9cad3b3a705888f86b5dc0c90d31fa91c37e33fa Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 20:03:18 +0200 Subject: [PATCH 08/15] Design change for cipher/AEAD There are many reasons why a driver might violate the security requirements for plaintext or ciphertext buffers, so mandate copying. Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index 57341b09ef..d1a4b2bdcc 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -164,9 +164,9 @@ The following buffers are considered small buffers: **Design decision: the dispatch layer shall copy all small buffers**. -#### Symmetric cryptography inputs +#### Symmetric cryptography inputs with small output -Message inputs to hash, MAC, cipher (plaintext or ciphertext), AEAD (associated data, plaintext, ciphertext) and key derivation operations are at a low risk of [read-read inconsistency](#read-read-inconsistency) because they are unformatted data, and for all specified algorithms, it is natural to process the input one byte at a time. +Message inputs to hash, MAC and key derivation operations are at a low risk of [read-read inconsistency](#read-read-inconsistency) because they are unformatted data, and for all specified algorithms, it is natural to process the input one byte at a time. **Design decision: require symmetric cryptography drivers to read their input without a risk of read-read inconsistency**. @@ -180,13 +180,25 @@ Key derivation typically emits its output as a stream, with no error condition d **Design decision: require key derivation drivers to emit their output without reading back from the output buffer**. -#### Cipher and AEAD outputs +#### Cipher and AEAD AEAD decryption is at risk of [write-write disclosure](#write-write-disclosure) when the tag does not match. 
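+
+For illustration, the risky pattern is sketched below (hypothetical driver-style code, not an actual implementation; `aead_ctx`, `aead_decrypt_update` and `aead_compute_tag` are invented names). The plaintext of an unauthenticated message is written to the output buffer before the tag is checked and only wiped afterwards, so a party that can read the shared buffer in between sees it anyway.
+
+```
+#include <string.h>
+
+/* Hypothetical one-pass AEAD primitives, declared only to make the sketch
+ * self-contained. */
+typedef struct aead_ctx aead_ctx;
+void aead_decrypt_update(aead_ctx *ctx, const unsigned char *c, size_t len,
+                         unsigned char *p);
+void aead_compute_tag(aead_ctx *ctx, unsigned char tag[16]);
+
+int aead_decrypt_sketch(aead_ctx *ctx,
+                        const unsigned char *ciphertext, size_t length,
+                        const unsigned char expected_tag[16],
+                        unsigned char *plaintext)
+{
+    unsigned char tag[16];
+    /* Plaintext goes straight into the (possibly shared) output buffer. */
+    aead_decrypt_update(ctx, ciphertext, length, plaintext);
+    aead_compute_tag(ctx, tag);
+    /* (A real implementation would use a constant-time comparison.) */
+    if (memcmp(tag, expected_tag, sizeof(tag)) != 0) {
+        memset(plaintext, 0, length);   /* wipe happens too late if shared */
+        return -1;
+    }
+    return 0;
+}
+```
+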
+AEAD encryption and decryption are at risk of [read-read inconsistency](#read-read-inconsistency) if they process the input multiple times, which is natural in a number of cases: + +* when encrypting with an encrypt-and-authenticate or authenticate-then-encrypt structure (one read to calculate the authentication tag and another read to encrypt); +* when decrypting with an encrypt-then-authenticate structure (one read to decrypt and one read to calculate the authentication tag); +* with SIV modes (not yet present in the PSA API, but likely to come one day) (one full pass to calculate the IV, then another full pass for the core authenticated encryption); + Cipher and AEAD outputs are at risk of [write-read-inconsistency](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap). -**Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD outputs**. +**Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD plaintext/ciphertext inputs and outputs**. + +Note that this can be a single buffer for the input and the output if the driver supports in-place operation (which it is supposed to, since it is supposed to support arbitrary overlap, although this is not always the case in Mbed TLS, a [known issue](https://github.com/Mbed-TLS/mbedtls/issues/3266)). A side benefit of doing this intermediate copy is that overlap will be supported. + +For all currently implemented AEAD modes, the associated data is only processed once to calculate an intermedidate value of the authentication tag. + +**Design decision: for now, require AEAD drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk. ## Design of shared memory protection @@ -196,12 +208,14 @@ This section explains how Mbed TLS implements the shared memory protection strat * The core (dispatch layer) shall make a copy of the following buffers, so that drivers do not receive arguments that are in shared memory: * Any input or output from asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export. + * Plaintext/ciphertext inputs and outputs for cipher and AEAD. * The output of a hash or MAC operation. * Cooked key derivation output. * A document shall explain the requirements on drivers for arguments whose access needs to be protected: * Hash and MAC input. * Cipher/AEAD IV/nonce (to be confirmed). + * AEAD associated data (to be confirmed). * Key derivation input (excluding key agreement). * Raw key derivation output (excluding cooked key derivation output). From 35de1f7a7dd90503aa692f5c0c3f973d7c547a79 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 20:04:16 +0200 Subject: [PATCH 09/15] Distinguish whole-message signature from other asymmetric cryptography Whole-message signature may process the message multiple times (EdDSA signature does it). 
Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index d1a4b2bdcc..da6da2212f 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -158,9 +158,10 @@ Note that in this context, a “small buffer” is one with a size limit that is The following buffers are considered small buffers: -* Any input or output from asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export. -* The output of a hash or MAC operation. +* Any input or output directly related to asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export. + * Note that this does not include inputs or outputs that are not processed by an asymmetric primitives, for example the message input to `psa_sign_message` or `psa_verify_message`. * Cooked key derivation output. +* The output of a hash or MAC operation. **Design decision: the dispatch layer shall copy all small buffers**. @@ -200,6 +201,12 @@ For all currently implemented AEAD modes, the associated data is only processed **Design decision: for now, require AEAD drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk. +#### Message signature + +For signature algorithms with a hash-and-sign framework, the input to sign/verify-message is passed to a hash, and thus can follow the same rules as [symmetric cryptography inputs with small output](#symmetric-cryptography-inputs-with-small-output). This is also true for `PSA_ALG_RSA_PKCS1V15_SIGN_RAW`, which is the only non-hash-and-sign signature mechanism implemented in Mbed TLS 3.5. This is not true for PureEdDSA (`#PSA_ALG_PURE_EDDSA`), which is not yet implemented: [PureEdDSA signature](https://www.rfc-editor.org/rfc/rfc8032#section-5.1.6) processes the message twice. (However, PureEdDSA verification only processes the message once.) + +**Design decision: for now, require sign/verify-message drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting PureEdDSA, at which point the dispatch layer shall copy the input for algorithms such as PureEdDSA that are not known to be low-risk. + ## Design of shared memory protection This section explains how Mbed TLS implements the shared memory protection strategy summarized below. From 7bc1bb65e9dc89f80c251d3b417f6a9710323869 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 20:05:25 +0200 Subject: [PATCH 10/15] Short explanations of what is expected in the design sections Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index da6da2212f..440281705f 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -236,18 +236,20 @@ Copy what needs copying. This seems straightforward. ### Validation of copying -TODO +TODO: how to we validate that we didn't forget to copy? 
Proposed general idea: have tests where the test code calling API functions allocates memory in a certain pool, and code in the library allocates memory in a different pool. Test drivers check that needs-copying arguments are within the library pool, not within the test pool. +Note: we could also validate by unmapping or poisoning the memory containing the API function arguments while the driver is working. But how would we validate that we have poison/unmap calls in the right places? + ### Shared memory protection requirements TODO: write document and reference it here. ### Validation of protection for built-in drivers -TODO +TODO: when there is a requirement on drivers, how to we validate that our built-in implementation meets these requirements? (This may be through testing, review, static analysis or any other means or a combination.) ## Analysis of argument protection in built-in drivers -TODO +TODO: analyze the built-in implementations of mechanisms for which there is a requirement on drivers. By code inspection, how satisfied are we that they meet the requirement? From 6998721c695db1773768269c9861eed8d6d877a4 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 20:05:32 +0200 Subject: [PATCH 11/15] Add a section skeleton for copy bypass It's something we're likely to want to do at some point. Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index 440281705f..f705e9542e 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -253,3 +253,11 @@ TODO: when there is a requirement on drivers, how to we validate that our built- ## Analysis of argument protection in built-in drivers TODO: analyze the built-in implementations of mechanisms for which there is a requirement on drivers. By code inspection, how satisfied are we that they meet the requirement? + +## Copy bypass + +For efficiency, we are likely to want mechanisms to bypass the copy and process buffers directly in builds that are not affected by shared memory considerations. + +Expand this section to document any mechanisms that bypass the copy. + +Make sure that such mechanisms preserve the guarantees when buffers overlap. From 1f2802c403ebd795c163073cb77705258e7fa7e5 Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Fri, 13 Oct 2023 21:49:17 +0200 Subject: [PATCH 12/15] Suggest validating copy by memory poisoning Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index f705e9542e..ef8e4d9984 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -238,9 +238,33 @@ Copy what needs copying. This seems straightforward. TODO: how to we validate that we didn't forget to copy? +#### Validation of copying with memory pools + Proposed general idea: have tests where the test code calling API functions allocates memory in a certain pool, and code in the library allocates memory in a different pool. Test drivers check that needs-copying arguments are within the library pool, not within the test pool. -Note: we could also validate by unmapping or poisoning the memory containing the API function arguments while the driver is working. 
But how would we validate that we have poison/unmap calls in the right places? +#### Validation of copying by memory poisoning + +Proposed general idea: in test code, “poison” the memory area used by input and output parameters that must be copied. Poisoning means something that prevents accessing memory while it is poisoned. This could be via memory protection (allocate with `mmap` then disable access with `mprotect`), or some kind of poisoning for an analyzer such as MSan or Valgrind. + +In the library, the code that does the copying temporarily unpoisons the memory by calling a test hook. + +``` +static void copy_to_user(void *copy_buffer, void *const input_buffer, size_t length) { +#if defined(MBEDTLS_TEST_HOOKS) + if (mbedtls_psa_core_poison_memory != NULL) { + mbedtls_psa_core_poison_memory(copy_buffer, length, 0); + } +#endif + memcpy(copy_buffer, input_buffer, length); +#if defined(MBEDTLS_TEST_HOOKS) + if (mbedtls_psa_core_poison_memory != NULL) { + mbedtls_psa_core_poison_memory(copy_buffer, length, 1); + } +#endif +} +``` + +The reason to poison the memory before calling the library, rather than after the copy-in (and symmetrically for output buffers) is so that the test will fail if we forget to copy, or we copy the wrong thing. This would not be the case if we relied on the library's copy function to do the poisoning: that would only validate that the driver code does not access the memory on the condition that the copy is done as expected. ### Shared memory protection requirements From a3ce6437bf488d84151c48f79e0ccbeaca67e76f Mon Sep 17 00:00:00 2001 From: Gilles Peskine Date: Mon, 16 Oct 2023 15:39:37 +0200 Subject: [PATCH 13/15] Typos Signed-off-by: Gilles Peskine --- docs/architecture/psa-shared-memory.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md index ef8e4d9984..890918c0e2 100644 --- a/docs/architecture/psa-shared-memory.md +++ b/docs/architecture/psa-shared-memory.md @@ -44,7 +44,7 @@ If an input argument is in shared memory, there is a risk of a **read-read incon Vulnerability example (parsing): suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. (This could be the length of the overall data, or the length of an embedded field) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read. -Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plantext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working. +Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plaintext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working. 

From a3ce6437bf488d84151c48f79e0ccbeaca67e76f Mon Sep 17 00:00:00 2001
From: Gilles Peskine
Date: Mon, 16 Oct 2023 15:39:37 +0200
Subject: [PATCH 13/15] Typos

Signed-off-by: Gilles Peskine

---
 docs/architecture/psa-shared-memory.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md
index ef8e4d9984..890918c0e2 100644
--- a/docs/architecture/psa-shared-memory.md
+++ b/docs/architecture/psa-shared-memory.md
@@ -44,7 +44,7 @@ If an input argument is in shared memory, there is a risk of a **read-read incon
 
 Vulnerability example (parsing): suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. (This could be the length of the overall data, or the length of an embedded field) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read.
 
-Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plantext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working.
+Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plaintext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working.
 
 * Any of `enc("PPPP")+mac("PPPP")`, `enc("PPQQ")+mac("PPQQ")` or `enc("QQQQ")+mac("QQQQ")` are valid outputs: they are outputs that can be produced by this authenticated encryption RPC.
 * If the authenticated encryption calculates the ciphertext before the client changes the output buffer and calculates the MAC after that change, reading the input buffer again each time, the output will be `enc("PPPP")+mac("QQQQ")`. There is no input that can lead to this output, hence this behavior violates the security guarantees of the crypto service.
@@ -191,13 +191,13 @@ AEAD encryption and decryption are at risk of [read-read inconsistency](#read-re
 * when decrypting with an encrypt-then-authenticate structure (one read to decrypt and one read to calculate the authentication tag);
 * with SIV modes (not yet present in the PSA API, but likely to come one day) (one full pass to calculate the IV, then another full pass for the core authenticated encryption);
 
-Cipher and AEAD outputs are at risk of [write-read-inconsistency](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap).
+Cipher and AEAD outputs are at risk of [write-read inconsistency](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap).
 
 **Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD plaintext/ciphertext inputs and outputs**.
 
 Note that this can be a single buffer for the input and the output if the driver supports in-place operation (which it is supposed to, since it is supposed to support arbitrary overlap, although this is not always the case in Mbed TLS, a [known issue](https://github.com/Mbed-TLS/mbedtls/issues/3266)). A side benefit of doing this intermediate copy is that overlap will be supported.
 
-For all currently implemented AEAD modes, the associated data is only processed once to calculate an intermedidate value of the authentication tag.
+For all currently implemented AEAD modes, the associated data is only processed once to calculate an intermediate value of the authentication tag.
 
 **Design decision: for now, require AEAD drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk.
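To make the intermediate-buffer design decision in the hunk above more concrete, the dispatch-layer copy could look roughly like the sketch below for a one-shot cipher-style function. `driver_encrypt` is a placeholder for a real driver entry point and the error handling is simplified; this is not the actual Mbed TLS dispatch code.

```
/* Rough sketch: copy a possibly-shared input into local memory, let the
 * driver work only on the local buffer (in place), then copy the result to
 * the possibly-shared output buffer. */
#include <stdint.h>
#include <string.h>
#include "mbedtls/platform.h"
#include "mbedtls/platform_util.h"
#include "psa/crypto.h"

/* Placeholder prototype for an in-place-capable driver entry point. */
psa_status_t driver_encrypt(const uint8_t *input, size_t input_length,
                            uint8_t *output, size_t output_size,
                            size_t *output_length);

psa_status_t copy_and_encrypt(const uint8_t *input, size_t input_length,
                              uint8_t *output, size_t output_size,
                              size_t *output_length)
{
    /* One local buffer for both directions, relying on the driver's
     * support for in-place operation. */
    size_t local_size = output_size > input_length ? output_size : input_length;
    uint8_t *local = mbedtls_calloc(1, local_size == 0 ? 1 : local_size);
    if (local == NULL) {
        return PSA_ERROR_INSUFFICIENT_MEMORY;
    }
    memcpy(local, input, input_length);                    /* copy in */
    psa_status_t status = driver_encrypt(local, input_length,
                                         local, local_size, output_length);
    if (status == PSA_SUCCESS) {
        memcpy(output, local, *output_length);             /* copy out */
    }
    mbedtls_platform_zeroize(local, local_size);
    mbedtls_free(local);
    return status;
}
```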

From 87889ebe86ee5a122dd73fd0f34dd207a626779b Mon Sep 17 00:00:00 2001
From: Gilles Peskine
Date: Mon, 16 Oct 2023 15:40:02 +0200
Subject: [PATCH 14/15] Fix editorial error with semantic consequences

Signed-off-by: Gilles Peskine

---
 docs/architecture/psa-shared-memory.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md
index 890918c0e2..3bb49d9d29 100644
--- a/docs/architecture/psa-shared-memory.md
+++ b/docs/architecture/psa-shared-memory.md
@@ -199,7 +199,7 @@ Note that this can be a single buffer for the input and the output if the driver
 
 For all currently implemented AEAD modes, the associated data is only processed once to calculate an intermediate value of the authentication tag.
 
-**Design decision: for now, require AEAD drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk.
+**Design decision: for now, require AEAD drivers to read the additional data without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk.
 
 #### Message signature

From 8ebeb9c180a91c220ab746a1af6f16ffa48c1989 Mon Sep 17 00:00:00 2001
From: Gilles Peskine
Date: Mon, 16 Oct 2023 18:35:54 +0200
Subject: [PATCH 15/15] Test for read-read inconsistency with mprotect and ptrace/gdb

Signed-off-by: Gilles Peskine

---
 docs/architecture/psa-shared-memory.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/architecture/psa-shared-memory.md b/docs/architecture/psa-shared-memory.md
index 3bb49d9d29..783b118841 100644
--- a/docs/architecture/psa-shared-memory.md
+++ b/docs/architecture/psa-shared-memory.md
@@ -274,6 +274,30 @@ TODO: write document and reference it here.
 
 ### Validation of protection for built-in drivers
 
 TODO: when there is a requirement on drivers, how do we validate that our built-in implementation meets these requirements? (This may be through testing, review, static analysis or any other means or a combination.)
 
+Note: focusing on read-read inconsistencies for now, as most of the cases where we aren't copying are inputs.
+
+#### Linux mprotect+ptrace
+
+Idea: call `mmap` to allocate memory for arguments and `mprotect` to deny or re-enable access. Use `ptrace` from a parent process to react to SIGSEGV from a denied access. When a SIGSEGV is raised by an access to the protected region:
+
+1. Use `ptrace` to execute a `mprotect` system call in the child to enable access. TODO: How? `ptrace` can modify registers and memory in the child, which includes changing parameters of a syscall that's about to be executed, but not directly cause the child process to execute a syscall that it wasn't about to execute.
+2. Use `ptrace` with `PTRACE_SINGLESTEP` to re-execute the failed load/store instruction.
+3. Use `ptrace` to execute a `mprotect` system call in the child to disable access.
+4. Use `PTRACE_CONT` to resume the child execution.
+
+Record the addresses that are accessed. Mark the test as failed if the same address is read twice.
+
+#### Debugger + mprotect
+
+Idea: call `mmap` to allocate memory for arguments and `mprotect` to deny or re-enable access. Use a debugger to handle SIGSEGV (GDB: set a signal catchpoint). If the segfault was due to accessing the protected region:
+
+1. Execute `mprotect` to allow access.
+2. Single-step the load/store instruction.
+3. Execute `mprotect` to disable access.
+4. Continue execution.
+
+Record the addresses that are accessed. Mark the test as failed if the same address is read twice. This part might be hard to do in the gdb language, so we may want to just log the addresses and then use a separate program to analyze the logs, or do the gdb tasks from Python.
+
 ## Analysis of argument protection in built-in drivers
 
 TODO: analyze the built-in implementations of mechanisms for which there is a requirement on drivers. By code inspection, how satisfied are we that they meet the requirement?
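As a starting point for the “Linux mprotect+ptrace” idea above, here is a rough skeleton of the parent-side tracing loop. It only detects the faults and records the faulting addresses; the re-enable/single-step/re-protect sequence is left as a comment because, as noted above, making the child execute `mprotect` from the tracer is the unresolved part (as written, the placeholder `PTRACE_CONT` would simply fault again). None of this is existing Mbed TLS test code.

```
/* Skeleton of the tracer: run the test program as a ptraced child and log
 * the faulting address of each SIGSEGV caused by the protected buffers. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    (void) argc;
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(argv[1], argv + 1);           /* the instrumented test program */
        _exit(1);
    }

    int status;
    waitpid(child, &status, 0);             /* initial stop after execv */
    ptrace(PTRACE_CONT, child, NULL, NULL);

    while (waitpid(child, &status, 0) == child && !WIFEXITED(status)) {
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGSEGV) {
            siginfo_t si;
            ptrace(PTRACE_GETSIGINFO, child, NULL, &si);
            printf("fault at %p\n", si.si_addr);   /* log; detect repeats offline */
            /* TODO (steps 1-3 above): re-enable access in the child, single-step
             * the faulting instruction with PTRACE_SINGLESTEP, re-protect. */
            ptrace(PTRACE_CONT, child, NULL, NULL);
        } else if (WIFSTOPPED(status)) {
            /* Deliver any other signal and keep going. */
            ptrace(PTRACE_CONT, child, NULL, (void *) (long) WSTOPSIG(status));
        }
    }
    return 0;
}
```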