* PoC cache configuration control. Expanded boards.txt.py to allow new MMU options and create revised .ld files. Updated eboot to pass 48K IRAM segments. Added Cache_Read_Enable intercept to modify call for 16K ICACHE. Updated platform.txt to pass new MMU options through to compiler and linker preprocessor. Added quick example: esp8266/MMU48K.
* Style corrections. Added MMU_ qualifier to new defines. Moved changes into their own file. Don't know how to fix platformio issue.
* Added detailed description for Cache_Read_Enable. Updated tools/sizes.py to report correct IRAM size and indicate ICACHE size. Merged in earlephilhower's work on unaligned exception. Refactored and added support for store operations and changed the name to be more closely aligned with its function. Improved crash reporting path.
* Style and MMU_SEC_HEAP corrections.
* Improved asm register usage. Added some inline functions to aid in byte and short access to iRAM.
* Only byte read has been tested. Updated .ld file to work better with platform.io; however, I am still missing some steps, so platformio will still fail.
* Interesting glitch in boards.txt after GitHub merge. A new board in master was missing new additions added by boards.txt.py in the PR, which the CI flags when it rebuilds boards.txt.
* Support for 2nd Heap, excess IRAM, through umm_malloc. Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp, WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from @earlephilhower PR #6994. Reworked umm_malloc to use context pointers instead of copy context. umm_malloc now supports allocations from IRAM. Added class HeapSelectIram, ... to aid in selecting alternate heaps, modeled after class InterruptLock. Restricted alloc requests from ISRs to DRAM. Never-ending improvements to debug printing. Sec Heap option now pulls in free IRAM left over in the 1st 32K block, managed through umm_malloc with HeapSelectIram. Updated examples.
* Post push CI cleanup.
* Cleanup part II
* Cleanup part III
* Updates to support platformio, maybe.
* Added exception C wrapper replacement.
* CI Cleanup
* CI Cleanup II. Don't know what to do with platformio; it doesn't like my .S file. ifdef out USE_ISR_SAFE_EXC_WRAPPER to block the new assembly module from building on platformio only.
* Changes to exc-c-wrapper-handler.S to assemble under platformio.
* For platformio, correction to toolchain-xtensa include path. @mcspr, thank you!
* Temporarily added --print-memory-usage to ld parameters for cross-checking IRAM size.
* Undo change to platform.txt
* Correct merge conflict, take 1
* Fixed #if... for building umm_get_oom_count. It was not building when UMM_STATS_FULL was used.
* Commented out XMC support. Compatibility issues with PoC when using 16K ICACHE.
* Corrected size.py; DRAM bracketing changed to not include ICACHE with DRAM total.
* Added additional _context for support of use of UMM_INLINE_METRICS. Corrected some missed UMM_POISON edits.
* Changes to clear errors and warnings from toolchain 10.1. Several fixes and improvements to example MMU48K. With the improved optimization in toolchain 10.1, the example divide-by-0 exception was failing with a HWDT event instead of its exception handler. The compiler saw the obscured divide by 0 and replaced it with a break point.
* Isolated incompatible definitions related to _xtos_set_exception_handler. GDBSTUB definitions are different from the BootROM's.
* Update tools/platformio-build.py
  Co-authored-by: Max Prokhorov <prokhorov.max@outlook.com>
* Requested changes. Changed MMU-related usages of ETS_... defines to DBG_MMU_... Cleanup in example MMU48K.ino. Removed stale memory reference macro and mmu_status print statement. Cleanup printf '\n' to be '\r\n'. Improved isolation of development debug prints from the rest of the debug prints.
* Corrected comment. And added missing include.
* Improve comment.
* Style and comment correction
* Added draft mmu.rst file and updated index.
Updated example HeapMetric.ino to also illustrate use of IRAM. Improved comments in exc-c-wrapper-handler.S. Added insurance IRQ disable.
* Updated mmu.rst. Improved function name uniqueness for is_iram, is_dram, and is_icache by adding prefix mmu_. Also, made them available outside of a debug build. Made pointer precision width more specific. Made some of the static inline functions in mmu_iram.h safe for ISRs by setting them to always inline.
* Add a default MMU_IRAM_SIZE value for a new CI test to pass. Extended use of 'umm_heap_context_t *_context' argument in ..._core functions and expanded its usage to reduce unnecessary repeated calls to umm_info(NULL, false); also removed recursion from umm_info(NULL, true). Fixed stack buffer length in umm_info_safe_printf_P and heap.cpp. Added example for creating an IRAM reserve section. Updated mmu.rst. Grammar and spelling corrections.
* CI appeasement
* CI appeasement with comment correction.
* Ensure SYS always runs with DRAM Heap selected.
* Add/move heap stack overflow/underflow check to Esp.cpp where the event was discarded.
* Improved comment clarity of purpose for IramReserve.ino. Clean up MMU48K.ino
* Added missing #include
* Corrected usage of warning
* CI appeasement and use #message not #pragma message
* Updated git version of eboot.elf to match build version. Good test catch.
* Remove conditional build option USE_ISR_SAFE_EXC_WRAPPER, always install. Use the replacement wrapper on non32xfer_exception_handler install. Added comments to code describing some exception handling issues.
* Updated mmu.rst
* Expanded and clarified comments. Limited access to some detailed typedefs/prototypes to .cpp modules, to avoid future build conflicts. Completed TODO for verifying that the "C" structure struct __exception_frame matches the ASM version. Fixed some typos and code rot, and added some more cases in example irammem.ino. Refactored a little and reordered printing to ease comparison between methods.
Corrected `#ifdef __cplusplus` coverage area. Cleaned up `extern "C" ...` usage. Fixes issues with including mmu_iram.h or esp8266_undocumented.h in .c files.
* Style fixes and more cleanup
* Style fix
* Remove unnecessary IRAM_ATTR from install_non32xfer_exception_handler. Some comment tuning. In the context of _xtos_set_exception_handler and the functions it registers, changed to type int for exception cause type. This is also the type used by gdbstub and some other Xtensa files I found.
MMU - Adjust the Ratio of ICACHE to IRAM
Overview
The ESP8266 has a total of 64K of instruction memory, IRAM. This 64K of IRAM is composed of one dedicated 32K block of IRAM and two 16K blocks of IRAM. The last two 16K blocks are flexible: they can be used either as IRAM or as a transparent instruction cache, ICACHE, for executing code out of external flash memory.
The code generated for a sketch is divided up into two groups, ICACHE and IRAM. IRAM offers faster execution. It is used for interrupt service routines, exception handling, and time-critical code. The ICACHE allows for the execution of up to 1MB of code stored in flash. On a cache miss, a delay occurs as the instructions are read from flash via the SPI bus.
There is 98KB of DRAM space. This memory can be accessed as a byte, a 16-bit short, or a 32-bit word. Access must be aligned according to the data type size: a 16-bit short must be on a 2-byte address boundary, and a 32-bit word must be on a 4-byte address boundary. In contrast, data access in IRAM or ICACHE must always be an aligned full 32-bit word. A non-32-bit exception handler that works around this is discussed later.
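The alignment rules above can be expressed as a small predicate. This is a host-side C++ sketch with hypothetical helper names (is_aligned_for, iram_access_ok). It only models the rules stated here and is not part of the ESP8266 core.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: an access of `size` bytes (1, 2 or 4) is aligned
// when the address is a multiple of `size`. This is the DRAM rule.
constexpr bool is_aligned_for(std::uintptr_t addr, std::size_t size) {
  return size != 0 && addr % size == 0;
}

// Hypothetical helper: IRAM and ICACHE only accept aligned 32-bit word
// accesses; anything else would need the non-32-bit exception handler.
constexpr bool iram_access_ok(std::uintptr_t addr, std::size_t size) {
  return size == 4 && is_aligned_for(addr, 4);
}
```

For example, a 16-bit read at 0x3FFE8001 is misaligned even in DRAM. A byte read anywhere in IRAM fails iram_access_ok regardless of its address.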
Option Summary
The Arduino IDE Tools menu option, MMU, has the following selections:
32KB cache + 32KB IRAM (balanced)
- This is the legacy ratio.
- Try this option first.
16KB cache + 48KB IRAM (IRAM)
- With just 16KB cache, execution of code out of flash may be slowed by more cache misses when compared to 32KB. The slowness will vary with the sketch.
- Use this if you need a little more IRAM space, and you have enough DRAM space.
16KB cache + 48KB IRAM and 2nd Heap (shared)
- This option builds on the previous option and creates a 2nd Heap made with IRAM.
- The 2nd Heap size will vary with free IRAM.
- This option is flexible. IRAM usage for code can overflow into the additional 16KB IRAM region, shrinking the 2nd Heap below 16KB. Or IRAM can be under 32KB, allowing the 2nd Heap to be larger than 16KB.
- Installs a Non-32-Bit Access handler for IRAM. This allows for byte and 16-bit aligned short access.
- This 2nd Heap is supported by the standard malloc APIs.
- Heap selection is handled through a HeapSelect class. This allows a specific heap selection for the duration of a scope.
- Use this option if you are still running out of DRAM space after you have moved as many of your constant strings/data elements as you can to PROGMEM.
16KB cache + 32KB IRAM + 16KB 2nd Heap (not shared)
- Not managed by the umm_malloc heap library.
- If required, non-32-bit access for IRAM must be enabled separately.
- Enables a 16KB block of unmanaged IRAM memory.
- Data persists across reboots, but not across deep sleep.
- Works well when you need a simple large chunk of memory. This option avoids the resources required to support a shared 2nd Heap.
MMU related build defines and possible values. These values change as indicated with the menu options above:
| #define | balanced | IRAM | shared (IRAM and Heap) | not shared (IRAM and Heap) |
|---|---|---|---|---|
| MMU_IRAM_SIZE | 0x8000 | 0xC000 | 0xC000 | 0x8000 |
| MMU_ICACHE_SIZE | 0x8000 | 0x4000 | 0x4000 | 0x4000 |
| MMU_IRAM_HEAP | -- | -- | defined, enables umm_malloc | -- |
| MMU_SEC_HEAP | -- | ** | ** | 0x40108000 |
| MMU_SEC_HEAP_SIZE | -- | ** | ** | 0x4000 |
** This define maps to an inline function that calculates the value based on unused code space; it requires #include <mmu_iram.h>.
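As a cross-check of the table, the following sketch encodes each column and confirms at compile time that every option splits the same 64KB of instruction memory. The struct, the constants and shared_heap_estimate are hypothetical. The estimate assumes the shared 2nd Heap receives all IRAM left over by code and ignores alignment and heap bookkeeping overhead, so the real MMU_SEC_HEAP_SIZE value may differ.

```cpp
#include <cstdint>

// Hypothetical encoding of the table's columns; only the numbers come from
// the table. sec_heap_size is a fixed MMU_SEC_HEAP_SIZE, or 0 when the
// option has no fixed-size 2nd Heap (the "**" and "--" cells).
struct MmuLayout {
  std::uint32_t iram_size;      // MMU_IRAM_SIZE
  std::uint32_t icache_size;    // MMU_ICACHE_SIZE
  std::uint32_t sec_heap_size;  // MMU_SEC_HEAP_SIZE (not shared option only)
};

constexpr MmuLayout kBalanced{0x8000, 0x8000, 0};
constexpr MmuLayout kIram{0xC000, 0x4000, 0};
constexpr MmuLayout kShared{0xC000, 0x4000, 0};
constexpr MmuLayout kNotShared{0x8000, 0x4000, 0x4000};

constexpr std::uint32_t total(const MmuLayout& m) {
  return m.iram_size + m.icache_size + m.sec_heap_size;
}

// Every option splits the same 64KB (0x10000) of instruction memory.
static_assert(total(kBalanced) == 0x10000, "balanced");
static_assert(total(kIram) == 0x10000, "IRAM");
static_assert(total(kShared) == 0x10000, "shared");
static_assert(total(kNotShared) == 0x10000, "not shared");

// Rough estimate for the shared option, assuming the 2nd Heap receives all
// IRAM not used by code. iram_code_bytes is a hypothetical input, for
// example taken from the build's size report.
constexpr std::uint32_t shared_heap_estimate(std::uint32_t iram_code_bytes) {
  return iram_code_bytes >= kShared.iram_size
             ? 0
             : kShared.iram_size - iram_code_bytes;
}
```

Under this estimate, a sketch whose IRAM code uses 36KB (0x9000) leaves roughly 12KB for the shared 2nd Heap. One using only 24KB leaves roughly 24KB, which is more than the nominal 16KB.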
The Arduino IDE Tools menu option, Non-32-Bit Access, has the following selections:
Use pgm_read macros for IRAM/PROGMEM
Byte/Word access to IRAM/PROGMEM (very slow)
- This option adds a non-32-bit exception handler to your build.
- Handles reads and writes to IRAM and reads from ICACHE.
- Supports short and byte access to IRAM.
- Not recommended for frequently accessed data; use DRAM if you can.
- Expect it to be slower than DRAM: each character access requires a complete save and restore of all 16+ registers.
- Processing an exception uses 256 bytes of stack space just to get started. The actual handler will add a little more.
- This option is implicitly enabled and required when you select MMU option 16KB cache + 48KB IRAM and 2nd Heap (shared).
IRAM, unlike DRAM, must be accessed as aligned full 32-bit words; byte and short access are not allowed. The pgm_read macros are an option for reads; however, store operations remain an issue. For a block copy, ets_memcpy appears to work well as long as the byte count is rounded up to a multiple of 4 and the source and destination addresses are 4-byte aligned.
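The rounding and alignment rules for block copies can be sketched as follows. iram_word_copy is a hypothetical host-side helper, not a core function. It shows the word-at-a-time access pattern that keeps every load and store a full, aligned 32-bit word. On the device, the recommendation above is to call ets_memcpy under the same rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round a byte count up to the next multiple of 4.
constexpr std::size_t round_up_to_word(std::size_t n) {
  return (n + 3u) & ~static_cast<std::size_t>(3u);
}

// Hypothetical helper (not a core API): copies using only aligned 32-bit
// loads and stores. Both pointers must be 4-byte aligned, and dst must have
// room for round_up_to_word(n_bytes) bytes, since the last word is copied whole.
inline void iram_word_copy(void* dst, const void* src, std::size_t n_bytes) {
  assert(reinterpret_cast<std::uintptr_t>(dst) % 4 == 0);
  assert(reinterpret_cast<std::uintptr_t>(src) % 4 == 0);
  auto* d = static_cast<volatile std::uint32_t*>(dst);
  auto* s = static_cast<const volatile std::uint32_t*>(src);
  const std::size_t words = round_up_to_word(n_bytes) / 4;
  for (std::size_t i = 0; i < words; ++i) {
    d[i] = s[i];  // one full-word load and one full-word store
  }
}
```

Because the final word is copied whole, up to 3 bytes past n_bytes are also written. This is why the destination must be sized for the rounded-up count.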
A word of caution: I have seen one case with the new toolchain 10.1 where code that reads a 32-bit word to extract a byte was optimized into a byte read. Using volatile on the pointer prevented the over-optimization.
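The volatile workaround looks roughly like this. It is a hedged sketch with a hypothetical name, iram_read_byte, and it assumes little-endian byte order, which is the ESP8266's byte order. The byte is extracted from a full 32-bit word read through a volatile pointer, so the compiler cannot shrink the access into a byte load.

```cpp
#include <cstdint>

// Hypothetical helper (not a core API): returns one byte using only an
// aligned 32-bit load. The volatile qualifier forces the compiler to issue
// the full word read instead of rewriting it as a byte load, which IRAM
// would reject. Assumes little-endian byte order.
inline std::uint8_t iram_read_byte(const void* p) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const volatile std::uint32_t* word =
      reinterpret_cast<const volatile std::uint32_t*>(addr & ~std::uintptr_t{3});
  const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
  return static_cast<std::uint8_t>(*word >> shift);
}
```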
To get a sense of how memory access time is affected, see examples MMU48K and irammem in ESP8266.
Miscellaneous
For calls to umm_malloc with interrupts disabled:
- malloc will always allocate from the DRAM heap when called with interrupts disabled.
- realloc with a NULL pointer will use malloc and return a DRAM heap allocation. Note, calling realloc with interrupts disabled is not officially supported. You are on your own if you do this.
- If you must use IRAM memory in your ISR, allocate the memory in your init code. To reduce the time spent in the ISR, avoid non-32-bit access that would trigger the exception handler. For short or byte access, consider using the inline functions described in section "Performance Functions" below.
How to Select Heap
The MMU selection 16KB cache + 48KB IRAM and 2nd Heap (shared) allows you to use the standard heap API function calls (malloc, calloc, free, ...) to allocate memory from DRAM or IRAM. This selection can be made by instantiating the class HeapSelectIram or HeapSelectDram. The usage is similar to that of the InterruptLock class. The default/initial heap source is DRAM. The class is in umm_malloc/umm_heap_select.h.
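#include <umm_malloc/umm_heap_select.h>
...
char *bufferDram;
bufferDram = (char *)malloc(33);    // default heap selection: DRAM
char *bufferIram;
{
  HeapSelectIram ephemeral;         // IRAM heap selected for this scope
  bufferIram = (char *)malloc(33);
}                                   // previous heap selection restored here
...
free(bufferIram);                   // free() returns memory to the owning heap
free(bufferDram);
...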
free will always return memory to the correct heap. There is no need for tracking and selecting before freeing.
realloc with a non-NULL pointer will always resize the allocation from the original heap it was allocated from. When the supplied pointer is NULL, the current heap selection is used.
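To illustrate the scope-based pattern that HeapSelectIram and HeapSelectDram follow, here is a hypothetical host-side model. The names HeapId, current_heap and ScopedHeapSelect are invented, and this is not the core's implementation. The constructor saves the current selection and installs a new one. The destructor restores the saved selection when the scope ends, which is why nested scopes unwind correctly.

```cpp
// Hypothetical host-side model of scope-based heap selection (invented
// names; the core's real classes are HeapSelectIram and HeapSelectDram).
enum class HeapId { Dram, Iram };

// Current selection for the model; starts as DRAM, like the core's default.
inline HeapId& current_heap() {
  static HeapId id = HeapId::Dram;
  return id;
}

class ScopedHeapSelect {
 public:
  // Save the caller's selection, then switch to the requested heap.
  explicit ScopedHeapSelect(HeapId id) : saved_(current_heap()) {
    current_heap() = id;
  }
  // Restore the saved selection when the scope ends.
  ~ScopedHeapSelect() { current_heap() = saved_; }

  ScopedHeapSelect(const ScopedHeapSelect&) = delete;
  ScopedHeapSelect& operator=(const ScopedHeapSelect&) = delete;

 private:
  HeapId saved_;
};
```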
Low-level primitives for selecting a heap. These are used by the above classes:
- umm_get_current_heap_id()
- umm_set_heap_by_id( ID value )
- Possible ID values:
  - UMM_HEAP_DRAM
  - UMM_HEAP_IRAM
Also, an alternate stack select method API is available. This is not as easy as the class method; however, for a small set of cases, it may provide some additional control:
- ESP.setIramHeap(): pushes the current heap ID onto a stack and sets the Heap API for an IRAM selection.
- ESP.setDramHeap(): pushes the current heap ID onto a stack and sets the Heap API for a DRAM selection.
- ESP.resetHeap(): restores the previously pushed heap.

Identify Memory
These always-inlined functions can be used to determine which memory region a pointer refers to:
bool mmu_is_iram(const void *addr);
bool mmu_is_dram(const void *addr);
bool mmu_is_icache(const void *addr);
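For intuition about what these checks do, here is a hypothetical range-based classifier. The address ranges are assumptions based on the commonly documented ESP8266 memory map: DRAM from 0x3FFE8000 up to 0x40000000, IRAM from 0x40100000, and a 1MB flash window for ICACHE from 0x40200000. The IRAM base is consistent with MMU_SEC_HEAP = 0x40108000 sitting 32KB into IRAM. The real mmu_is_* functions may derive their bounds differently, for example from linker symbols.

```cpp
#include <cstdint>

// Assumed ESP8266 address ranges (hypothetical constants, not core names).
constexpr std::uintptr_t kDramStart = 0x3FFE8000u;
constexpr std::uintptr_t kDramEnd = 0x40000000u;    // exclusive
constexpr std::uintptr_t kIramStart = 0x40100000u;
constexpr std::uintptr_t kIcacheStart = 0x40200000u;
constexpr std::uintptr_t kIcacheEnd = 0x40300000u;  // 1MB window, exclusive

constexpr bool sketch_is_dram(std::uintptr_t a) {
  return a >= kDramStart && a < kDramEnd;
}

// iram_size plays the role of MMU_IRAM_SIZE (0x8000 or 0xC000).
constexpr bool sketch_is_iram(std::uintptr_t a, std::uintptr_t iram_size) {
  return a >= kIramStart && a < kIramStart + iram_size;
}

constexpr bool sketch_is_icache(std::uintptr_t a) {
  return a >= kIcacheStart && a < kIcacheEnd;
}
```

The same address can fall inside or outside IRAM depending on the MMU option. For example, 0x4010A000 is IRAM with a 48KB split but not with a 32KB split.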
Performance Functions
These always-inlined functions bypass the exception handler, reducing execution time and stack use, at the cost of increased code size.
They are an alternative to the pgm_read macros for reading from IRAM. When compiled with 'Debug Level: core', range checks are performed on the pointer value to make sure you are reading from the address range of IRAM, DRAM, or ICACHE.
uint8_t mmu_get_uint8(const void *p8);
uint16_t mmu_get_uint16(const uint16_t *p16);
int16_t mmu_get_int16(const int16_t *p16);
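The idea behind the 16-bit readers can be sketched with plain shifts. The sketch_get_* names are hypothetical stand-ins, not the core's implementation. They assume a 2-byte-aligned pointer and little-endian byte order, and they load only the aligned 32-bit word that contains the value.

```cpp
#include <cstdint>

// Hypothetical helper: load the aligned 32-bit word containing address p.
inline std::uint32_t load_containing_word(const void* p) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return *reinterpret_cast<const volatile std::uint32_t*>(addr & ~std::uintptr_t{3});
}

// Hypothetical stand-in for mmu_get_uint16: p16 must be 2-byte aligned.
// The upper half-word (address bit 1 set) is shifted down by 16 bits.
inline std::uint16_t sketch_get_uint16(const std::uint16_t* p16) {
  const unsigned shift =
      static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p16) & 2u) * 8u;
  return static_cast<std::uint16_t>(load_containing_word(p16) >> shift);
}

// Hypothetical stand-in for mmu_get_int16: read the raw 16 bits, then
// reinterpret them as signed (two's complement; guaranteed since C++20).
inline std::int16_t sketch_get_int16(const std::int16_t* p16) {
  return static_cast<std::int16_t>(
      sketch_get_uint16(reinterpret_cast<const std::uint16_t*>(p16)));
}
```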
While these functions are intended for writing to IRAM, they will work with DRAM. When compiled with 'Debug Level: core', range checks are performed on the pointer value to make sure you are writing to the address range of IRAM or DRAM.
uint8_t mmu_set_uint8(void *p8, const uint8_t val);
uint16_t mmu_set_uint16(uint16_t *p16, const uint16_t val);
int16_t mmu_set_int16(int16_t *p16, const int16_t val);
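Writers work the other way around: they rewrite the containing word. The following hypothetical sketch performs a read-modify-write of the aligned 32-bit word, so only full-word stores are issued, assuming little-endian byte order. sketch_set_uint8 is an invented name, not the core's code. As written, this read-modify-write is not atomic with respect to interrupts.

```cpp
#include <cstdint>

// Hypothetical stand-in for mmu_set_uint8: replaces one byte by rewriting
// its aligned 32-bit word. Returning val mirrors the documented signature;
// what the core actually returns is an assumption here.
inline std::uint8_t sketch_set_uint8(void* p8, std::uint8_t val) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p8);
  auto* word = reinterpret_cast<volatile std::uint32_t*>(addr & ~std::uintptr_t{3});
  const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
  const std::uint32_t mask = std::uint32_t{0xFFu} << shift;
  // Read the whole word, clear the target byte, insert the new value,
  // and store the whole word back.
  *word = (*word & ~mask) | (static_cast<std::uint32_t>(val) << shift);
  return val;
}
```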