* Reduce the IRAM (and heap) usage of I2C code
The I2C code takes a large chunk of IRAM space. Attempt to reduce the
size of the routines without impacting functionality.
First, remove the `static` specifier on the sda/scl variables in the
event handlers. The first instructions in the routines overwrite the
last value stored in them, anyway, and their addresses are never taken.
* Make most variables ints, not uint8_ts
Where it doesn't make a functional difference, make global variables
ints and not uint8_t. Bytewide updates and extracts require multiple
instructions and hence increase IRAM usage as well as runtime.
* Make local flag vars int
Sketch uses 270855 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27940 bytes (34%) of dynamic memory, leaving 53980 bytes for local variables. Maximum is 81920 bytes.
./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_9615/*elf | sort -k1 | head -20
401000cc l F .text1 00000014 twi_delay
401000ec l F .text1 00000020 twi_reply$part$1
4010010c g F .text1 00000035 twi_reply
4010014c g F .text1 00000052 twi_stop
401001a0 g F .text1 0000003b twi_releaseBus
40100204 g F .text1 000001e6 twi_onTwipEvent
40100404 l F .text1 000001f7 onSdaChange
40100608 l F .text1 000002fd onSclChange
40100908 l F .text1 0000003b onTimer
* Factor out !scl in onSdaChange
If SCL is low then all branches of the case are no-ops, so factor that
portion out to remove some redundant logic in each case.
Sketch uses 270843 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.
401000cc l F .text1 00000014 twi_delay
401000ec l F .text1 00000020 twi_reply$part$1
4010010c g F .text1 00000035 twi_reply
4010014c g F .text1 00000052 twi_stop
401001a0 g F .text1 0000003b twi_releaseBus
40100204 g F .text1 000001e6 twi_onTwipEvent
40100404 l F .text1 000001e7 onSdaChange
401005f8 l F .text1 000002fd onSclChange
401008f8 l F .text1 0000003b onTimer
0x0000000040107468 _text_end = ABSOLUTE (.)
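A rough illustration of the factoring, using made-up state names and return codes rather than the real onSdaChange body:

```cpp
#include <cassert>

enum Bus { B_IDLE, B_ADDR, B_DATA };   // hypothetical states, for illustration only

// Before: every case repeats the same SCL test.
int onSdaBefore(Bus s, bool scl) {
    switch (s) {
        case B_IDLE: if (!scl) return 0; return 1;
        case B_ADDR: if (!scl) return 0; return 2;
        case B_DATA: if (!scl) return 0; return 3;
    }
    return 0;
}

// After: the shared early-out is tested once, ahead of the switch.
int onSdaAfter(Bus s, bool scl) {
    if (!scl) return 0;
    switch (s) {
        case B_IDLE: return 1;
        case B_ADDR: return 2;
        case B_DATA: return 3;
    }
    return 0;
}
```

Both versions return the same value for every input; the second only carries one copy of the comparison.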
* Make tiny twi_reply inline
twi_reply is a chunk of code that can be inlined and actually save IRAM
space because certain conditions can be statically evaluated by gcc.
Sketch uses 270823 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.
401000cc l F .text1 00000014 twi_delay
401000f4 g F .text1 00000052 twi_stop
40100148 g F .text1 0000003b twi_releaseBus
401001b0 g F .text1 00000206 twi_onTwipEvent
401003d0 l F .text1 000001e7 onSdaChange
401005c4 l F .text1 000002fd onSclChange
401008c4 l F .text1 0000003b onTimer
40100918 g F .text1 00000085 millis
401009a0 g F .text1 0000000f micros
401009b0 g F .text1 00000022 micros64
401009d8 g F .text1 00000013 delayMicroseconds
401009f0 g F .text1 00000034 __digitalRead
401009f0 w F .text1 00000034 digitalRead
40100a3c g F .text1 000000e4 interrupt_handler
40100b20 g F .text1 0000000f vPortFree
0x0000000040107434 _text_end = ABSOLUTE (.)
* Inline additional twi_** helper functions
Sketch uses 270799 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.
401000cc l F .text1 00000014 twi_delay
401000f4 w F .text1 0000003b twi_releaseBus
4010015c g F .text1 00000246 twi_onTwipEvent
401003bc l F .text1 000001e7 onSdaChange
401005b0 l F .text1 000002f9 onSclChange
401008ac l F .text1 0000003b onTimer
0x000000004010741c _text_end = ABSOLUTE (.)
* Convert state machine to 1-hot for faster lookup
GCC won't use a lookup table for the TWI state machine, so it ends up
using a series of straight line compare-jump, compare-jumps to figure
out which branch of code to execute for each state. For branches that
have multiple states that call them, this can expand to a lot of code.
Short-circuit the whole thing by converting the FSM to a 1-hot encoding
while executing it, and then just and-ing the 1-hot state with the
bitmask of states with the same code.
Sketch uses 270719 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.
401000cc l F .text1 00000014 twi_delay
401000f4 w F .text1 0000003b twi_releaseBus
4010015c g F .text1 00000246 twi_onTwipEvent
401003c0 l F .text1 000001b1 onSdaChange
40100580 l F .text1 000002da onSclChange
4010085c l F .text1 0000003b onTimer
0x00000000401073cc _text_end = ABSOLUTE (.)
Saves 228 bytes of IRAM vs. master, uses 32 additional bytes of heap.
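A minimal sketch of the 1-hot dispatch idea. The state names and return codes are hypothetical stand-ins, not the real TWI state list:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical states for illustration; the real TWI state list differs.
enum State : int { S_IDLE = 0, S_ADDR = 1, S_READ = 2, S_WRITE = 3, S_STOP = 4 };

// Convert a state index to its 1-hot bit.
constexpr uint32_t oneHot(State s) { return 1u << static_cast<int>(s); }

// States that share the same handler code are collected into one mask.
constexpr uint32_t kDataStates  = oneHot(S_READ) | oneHot(S_WRITE);
constexpr uint32_t kQuietStates = oneHot(S_IDLE) | oneHot(S_STOP);

// Returns 1 for data handling, 0 for no-op, 2 for address handling.
int dispatch(State s) {
    const uint32_t hot = oneHot(s);    // converted once per event
    if (hot & kDataStates)  return 1;  // one AND + branch covers several states
    if (hot & kQuietStates) return 0;
    return 2;                          // S_ADDR
}
```

Each group of states costs one mask test instead of one compare-jump per state.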
* Factor out twi_status setting
twi_status is set immediately before an event handler is called,
resulting in lots of duplicated code. Set the twi_status flag inside
the handler itself.
Saves an add'l ~100 bytes of IRAM from prior changes, for a total of
~340 bytes.
earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20
401000cc l F .text1 00000014 twi_delay
401000f4 w F .text1 0000003b twi_releaseBus
40100160 g F .text1 0000024e twi_onTwipEvent
401003c8 l F .text1 00000181 onSdaChange
40100558 l F .text1 00000297 onSclChange
* Use a struct to hold globals for TWI
Thanks to the suggestion from @mhightower83, move all global objects
into a struct. This lets a single base pointer register be used in
place of constantly reloading the address of each individual variable.
This might be better expressed by moving this to a real C++
implementation based on a class object (the twi.xxxx would go back to the
old xxx-only naming for vars), but there would then need to be API
wrappers since the functionality is exposed through a plain C API.
Saves 168 additional code bytes, for a grand total of 550 bytes IRAM.
earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20
401000cc l F .text1 00000014 twi_delay
401000e8 w F .text1 00000032 twi_releaseBus
40100128 g F .text1 00000217 twi_onTwipEvent
4010034c l F .text1 00000149 onSdaChange
4010049c l F .text1 00000267 onSclChange
40100704 l F .text1 00000028 onTimer
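A small sketch of the layout change. The field names are hypothetical and the real struct has more members:

```cpp
#include <cassert>

// Hypothetical field names; the real struct in the TWI code has more members.
struct TwiGlobals {
    int state;
    int bitCount;
    int error;
};

TwiGlobals twi = {0, 0, 0};

// All three accesses can be addressed relative to one base pointer (&twi)
// instead of loading a separate literal address for each global.
int advance(int newState) {
    twi.state = newState;
    twi.bitCount += 1;
    twi.error = 0;
    return twi.bitCount;
}
```

Whether the compiler actually keeps the base address in one register depends on the target and optimization level; the listing above is what was measured on the xtensa build.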
* Use enums for states, move one more var to twi struct
Make the TWI states enums and not #defines, in the hope that GCC can
flag problems more easily, and for generally better code
organization.
401000cc l F .text1 00000014 twi_delay
401000e8 w F .text1 00000032 twi_releaseBus
40100128 g F .text1 00000217 twi_onTwipEvent
4010034c l F .text1 00000149 onSdaChange
4010049c l F .text1 00000257 onSclChange
401006f4 l F .text1 00000028 onTimer
Looks like another 16 bytes IRAM saved from the prior push.
Sketch uses 267079 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27696 bytes (33%) of dynamic memory, leaving 54224 bytes for local variables. Maximum is 81920 bytes.
* Save 4 heap bytes by reordering struct
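For reference, a generic example of how member order affects padding. The members are hypothetical and not the actual twi struct fields:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical members: a byte on each side of a 32-bit value forces padding.
struct Padded {
    uint8_t  flagA;
    uint32_t counter;
    uint8_t  flagB;
};

// Same members, largest first, so the two bytes share one padded word.
struct Reordered {
    uint32_t counter;
    uint8_t  flagA;
    uint8_t  flagB;
};

static_assert(sizeof(Reordered) <= sizeof(Padded),
              "reordering should never grow the struct");
```

On ABIs where uint32_t is 4-byte aligned, the first layout takes 12 bytes and the second 8.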
* Convert to C++ class, clean up code
Convert the entire file into a C++ class (with C wrappers to preserve
the ABI). This allows for setting individual values of the global
struct(class) in-situ instead of a cryptic list at the end of the struct
definition. It also removes a lot of redundant `twi.`s from most class
members.
Clean up the code by converting from `#defines` to inline functions, get
rid of ternarys-as-ifs, use real enums, etc.
For slave_receiver.ino, the numbers are:
GIT Master IRAM: 0x723c
This push IRAM: 0x6fc0
For a savings of 636 total IRAM bytes (note, there may be a slight flash
text increase, but we have 1MB of flash to work with and only 32K of IRAM
so the tradeoff makes sense).
* Run astyle core.conf, clean up space/tab/etc.
Since the C++ version has significant text differences anyway, now is a
good time to clean up the mess of spaces, tabs, and differing cuddles.
* Add enum use comment, rename twi::delay, fix SDA/SCL_READ bool usage
Per review comments
* Replace clock stretch repeated code w/inline loop
There were multiple places where the code was waiting for a slave to
finish stretching the clock. Factor them out to an *inline* function
to reduce code smell.
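A sketch of what such a helper could look like. readScl() and the poll budget are stand-ins invented for this example; the real code reads the GPIO and uses its own timeout:

```cpp
#include <cassert>

// Stand-in for reading the SCL pin; hypothetical, the real code reads a GPIO.
int g_lowReadsLeft = 0;
bool readScl() {
    if (g_lowReadsLeft > 0) { --g_lowReadsLeft; return false; }
    return true;
}

// Wait until the slave releases SCL or the poll budget runs out.
// Returns true if SCL was seen high in time.
inline bool waitClockStretch(unsigned maxPolls) {
    for (unsigned polls = 0; polls < maxPolls; ++polls) {
        if (readScl()) return true;
    }
    return false;
}
```

Every call site then becomes a single call instead of its own copy of the wait loop.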
* Remove slave code when not using slave mode
Add a new twi_setSlaveMode call which actually attaches the interrupts
to the slave pin change code onSdaChange/onSclChange. Don't attach
interrupts in the main twi_begin.
Because slave mode is only useful when an onReceive or onRequest
callback is set, call twi_setSlaveMode and attach interrupts in the Wire
setters.
This allows GCC to not link in slave code unless slave mode is used,
saving over 1,000 bytes of IRAM in the common, master-only case.
cont_run is only called by loop_task(), which is not going to execute
during an IRQ and is stored, itself, in flash.
cont_yield cannot be called from an IRQ (since it's illegal to yield
inside IRQs), so move it out of IRAM, too.
Saves ~71 bytes of IRAM
* use a scheduled function for settimeofday_cb
* per review
* use a generic and clear name for trivial functional variable type name used for callbacks
This is another instance in the core library where we pass in read-only
parameters as pass-by-value, where in the case of String() that
is inefficient as it involves copy-constructor/temp string creations.
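To illustrate the cost, a host-side sketch with a copy-counting stand-in type (Text is hypothetical and is not the Arduino String class):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Small stand-in for a heap-backed string that counts its copies.
struct Text {
    std::string data;
    static int copies;
    explicit Text(const char* s) : data(s) {}
    Text(const Text& other) : data(other.data) { ++copies; }
};
int Text::copies = 0;

// Pass-by-value: each call with an lvalue copy-constructs the parameter.
std::size_t lengthByValue(Text t) { return t.data.size(); }

// Pass-by-const-reference: a read-only parameter needs no copy.
std::size_t lengthByRef(const Text& t) { return t.data.size(); }
```

Calling lengthByValue() with an existing object bumps the copy counter; lengthByRef() leaves it unchanged.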
* Replace ASM block w/C macro for PSTR
GAS doesn't support the C language idiom of catenating two strings
together with quotes (i.e. "x" "y" === "xy").
Specify the section attribute fully in the section attribute, instead,
to allow this.
* Fix WString optimization
PR #6573 introduced a corner case where a blind String() without any
initialization was in an invalid state because the buffer and len
would not be updated properly. Concatenating to the empty string could
cause a failure.
Now, set the default state in ::init() to SSO (which is what happened
before when we were using String(char *s="")) and fix the crash.
As @dirkmuller found out in #6568, there is a difference in code
executed between `String str(nullptr)` and `String str("")`, but in the
end the actual object is identical. It's a few bytes of code, but every
little bit counts.
Update the default `String()` constructor to use `nullptr` and not `""`.
This will remove a constant literal load and the execution of the
String::copy method and strlen().
* Add typedef for putc1, fn_putc1_t.
Replaced relevant usage of `(void *)` with `fn_putc1_t`.
Correct usage of `ets_putc()`, returning 0, in libc_replacement.cpp
This PR assumes PR https://github.com/esp8266/Arduino/pull/6489#issue-315018841 has merged and removes `uart_buff_switch` from `umm_performance.cpp`
Updated method of defining `_rom_putc1` to be more acceptable (I hope) to the new compiler.
* Use PROVIDE to expose ROM function entry point, ets_uart_putc1.
Added comments to ets_putc() and ets_uart_putc1() to explain their
differences. Change prototype of ets_putc() to conform with fp_putc_t.
Updated _isr_safe_printf_P to use new definition, ets_uart_putc1.
In order to give user libs a chance to update to the new symbols, re-add
the _SPIFFS_XX symbols to the linker file with a comment that they are
deprecated.
Also add back spiffs_hal_xxx functions, also marked as deprecated.
Fixes #6542
* Add code to select the UART for Boot ROM ets_putc which is used by
::printf, ets_printf_P in core_esp_postmortem.cpp and others.
ets_putc is a wrapper for uart_tx_one_char. uart_tx_one_char uses
the element buff_uart_no in UartDev (A structure in data area of the
Boot ROM) to select UART0 or UART1. uart_buff_switch is used to set
that entry.
The structure for UartDev can be found in uart.h from the
ESP8266_NONOS_SDK. As best I can tell the Boot ROM always
defaults to UART0.
* Fixes debug UART selection for ets_putc
This addresses an issue of UART selection for ROM function ets_putc,
which is used by ::printf, ets_printf_P in core_esp_postmortem.cpp
and others. Currently ets_putc stays on UART0 after
Serial1.setDebugOutput(true) is called.
ets_putc() is not affected by calls to ets_install_putc1.
Its UART selection is controlled by the ROM function uart_buff_switch.
Updated uart_set_debug() to call uart_buff_switch whenever debug is
enabled on an UART. For the case of disabling, a call to select UART0
is made, because there is no disable option for this print method.
* Removed fp_putc_t typedef, save for a later PR
* Replace the SDK's use of ets_intr_lock/unlock with nestable versions
Testing has shown that there are several paths in the SDK that result in nested
calls to ets_intr_lock() / ets_intr_unlock() which may be a problem.
These functions also do not preserve the enabled interrupt level and may
result in code running with interrupts enabled when that is not intended.
This issue has recently been fixed in the Arduino code by using
xt_rsil() / xt_wsr_ps() but still exists in the Espressif SDK code.
This commit is intended to fix that and should be used in addition to the above.
The maximum nesting I have seen is 2 and lock/unlock calls appear to be balanced.
A max of 7 levels of nesting leaves plenty of room for that to change.
* make ets_intr_lock_stack uint16_t and behave like the original on over/underflow
The PS register is 15 bits, we should store the whole thing as xt_wsr_ps()
writes the whole thing.
Also if there is an underflow, we should make sure interrupts are enabled.
Same goes for overflow making sure interrupts are disabled, although this
is less important.
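A host-side sketch of the nesting scheme described above. g_ps, sim_rsil() and sim_wsr_ps() only simulate the PS register and the xt_rsil() / xt_wsr_ps() helpers so the logic can run off-target; the function names lockNest/unlockNest are made up for this example:

```cpp
#include <cassert>
#include <cstdint>

// Simulation only: g_ps stands in for the PS register (interrupt level in the
// low 4 bits); the helpers mimic what xt_rsil() / xt_wsr_ps() do on the target.
uint32_t g_ps = 0;
uint32_t sim_rsil(uint32_t level) {
    uint32_t old = g_ps;
    g_ps = (g_ps & ~0xFu) | level;
    return old;
}
void sim_wsr_ps(uint32_t ps) { g_ps = ps; }

constexpr int kMaxNest = 7;        // nesting depth quoted in the commit text
uint16_t g_lockStack[kMaxNest];    // saved PS values, one per nesting level
int g_lockDepth = 0;

void lockNest() {
    uint32_t old = sim_rsil(3);    // raise the level, remember the old state
    if (g_lockDepth < kMaxNest) {
        g_lockStack[g_lockDepth++] = static_cast<uint16_t>(old);
    }                              // on overflow: stay locked, nothing saved
}

void unlockNest() {
    if (g_lockDepth > 0) {
        sim_wsr_ps(g_lockStack[--g_lockDepth]);  // restore the saved state
    } else {
        sim_wsr_ps(0);             // underflow: make sure interrupts are enabled
    }
}
```

An inner unlock restores the level that was active at the matching lock, so interrupts stay masked until the outermost unlock.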
* Rename ets_intr_(un)lock_nest to ets_intr_(un)lock
This saves having to modify libmain.a, libpp.a and libnet80211.a to use the
nested versions.
Adjusts fix_sdk_libs.sh accordingly.
* Remove ets_intr_(un)lock from the rom .ld as we no longer use them
* ets_post() wrapper to preserve interrupt state
Add a wrapper around the ets_post code in rom to preserve the interrupt enable state.
Rather than modifying the SDK libs, rename ets_post in the .ld file and call the
wrapper "ets_post" to replace it.
As far as I can establish, ets_post is the only rom function in use by our code or
the SDK libs we use that causes calls to ets_intr_(un)lock.
* Add IRAM_ATTR to ets_intr_(un)lock and ets_post wrappers.
* Throw in a few comments and make ets_intr_lock_stack* static.
* mDNS debug option + AP address is used by default when STA is also present
* mDNS: store network interface, checking it is up
* igmp: force on selected interface (avoid crash *sometimes*)
* fix for all lwip2 ipv4 ipv6 & lwip1
* mdns: IPAddress is not needed to reference associated interface
* mdns: debug: fix print warnings
* emulation: add ets_strncpy
* emulation: truly emulate AddrList (remove fake one)
* Per @earlephilhower suggestion
* Hints from @earlephilhower
* Namespace BearSSL in core "feels" wrong - using catch-all esp8266 instead.
* After review remarks by @earlephilhower
Adjust the ::write implementation in Print and its overridden copy in
UART to allow it to silently accept PROGMEM strings (since there is no
write_P macro).
Fixes #6383
* Correct critical section with interrupt level preserving and nest support
alternative. Replace ets_intr_lock()/ets_intr_unlock() with uint32_t
oldValue=xt_rsil(3)/xt_wsr_ps(oldValue). Added UMM_CRITICAL_DECL macro to define
storage for current state. Expanded UMM_CRITICAL_... to use unique
identifiers. This helps gather function-specific timing
information.
Replace printf with something that is ROM or IRAM based so that a printf
that occurs during an ISR malloc/new does not cause a crash. To avoid any
reentry issue it should also avoid doing malloc lib calls.
Refactor realloc to avoid memcpy/memmove while in critical section. This is
only effective when realloc is called with interrupts enabled. The copy
process alone can take over 10us (when copying more than ~498 bytes with a
80MHz CPU clock). It would be good practice for an ISR to avoid realloc.
Note, while doing this might initially sound scary, this appears to be very
stable. It ran on my troublesome sketch for over 3 weeks until I got back from
vacation and flashed an update. Troublesome sketch - runs ESPAsyncTCP, with
modified fauxmo emulation for 10 devices. It receives lots of network traffic
related to uPnP scans, which includes lots of TCP connects, disconnects, and RSTs
related to uPnP discovery.
I have clocked umm_info critical lock time taking as much as 180us. A common
use for the umm_info call is to get the free heap result. It is common
to try and closely monitor free heap as a method to detect memory leaks.
This may result in frequent calls to umm_info. There has not been a clear
test case that shows an issue yet; however, I and others suspect we are
having, or have had, crashes related to this.
I have added code that adjusts the running free heap number from _umm_malloc,
_umm_realloc, and _umm_free, removing the need to do a long
interrupts-disabled calculation via _umm_info.
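The bookkeeping idea, sketched with a toy fixed-block allocator. All names here are hypothetical and this is not the umm_malloc code; only the running counter matters:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-block allocator; only the bookkeeping matters here.
constexpr std::size_t kBlockSize = 8;
constexpr std::size_t kBlocks = 16;
bool g_used[kBlocks] = {};
std::size_t g_freeBytes = kBlockSize * kBlocks;   // running total

int allocBlock() {
    for (std::size_t i = 0; i < kBlocks; ++i) {
        if (!g_used[i]) {
            g_used[i] = true;
            g_freeBytes -= kBlockSize;            // adjust on allocation
            return static_cast<int>(i);
        }
    }
    return -1;                                    // out of blocks
}

void freeBlock(int i) {
    if (i >= 0 && static_cast<std::size_t>(i) < kBlocks && g_used[i]) {
        g_used[i] = false;
        g_freeBytes += kBlockSize;                // adjust on release
    }
}

// O(1): no heap walk, so no long interrupts-disabled section.
std::size_t freeHeap() { return g_freeBytes; }
```

Reading the free heap becomes a single load, at the cost of one add or subtract in each allocator entry point.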
Build optional, min/max time measurements for locks held while in info,
malloc, realloc, and free. Also, maintain a count of how many times each is
called with INTLEVEL set.
* Fixed Travis build complaint.
* Changes for https://github.com/esp8266/Arduino/pull/6274#pullrequestreview-259579883
* Added requested comment and missing comment for UMM_CRITICAL_PERIOD_ANALYZE.
* Updated comments and update xt_rsil()
* Moved xt_rsil&co (pulled in __STRINGIFY) definitions out of
Arduino.h, to cores/esp8266/core_esp8266_features.h
Added esp_get_cycle_count() to core_esp8266_features.h.
Updated umm_malloc and Esp.h to use new defines and location.
* Added "#ifndef CORE_MOCK" around conflicted area.
* Moved performance measurement and ESP specific definitions to
umm_performance.h/cpp. Removed testing asserts.
* Commented out umm analyze. Delay CRITICAL_SECTION_EXIT() in
umm_realloc() to avoid exposing a transient OOM condition to ISR.
* Missed file change. This commit has: Delay CRITICAL_SECTION_EXIT() in
umm_realloc() to avoid exposing a transient OOM condition to ISR.
* 2nd Path. Removed early release of critical section around memmove
to avoid a possible OOM for an ISR.
* improved variable name
* Resolved ISR OOM concern with `_umm_realloc()`
Updated realloc() to do a preliminary free() of unused space,
before performing a critical section exit and memmove.
This change was applied to the current _umm_realloc().
This change should reduce the risk of an ISR getting an
OOM, during a realloc memmove operation.
Added additional stats for verifying correct operation.
* Resolved ISR OOM concern in _umm_realloc()
Updated realloc() to do a preliminary free() of unused space,
before performing a critical section exit and memmove.
This change was applied to the current _umm_realloc().
This change should reduce the risk of an ISR getting an
OOM when interrupting an active realloc memmove operation.
Added additional stats for verifying correct operation.
Updated: for clarity and Travis-CI fail.
* Update to keep access to alternate printf in one file.
* Updated to use ISR safe versions of memmove, memcpy, and memset.
The library versions of memmove, memcpy, and memset were in flash.
Updated to use ROM functions ets_memmove, ets_memcpy, and ets_memset.
Additional note, the library version of memmove does not appear to
have been optimized. It took almost 10x longer than the ROM version.
Renamed printf macro to DBGLOG_FUNCTION and moved to umm_malloc_cfg.h.
Changed printf macro usage to use DBGLOG_FUNCTION.
* Update umm_malloc.cpp
Fix comment
Fixes #6066
Preserve any existing sample rate for the I2S unit when performing an
`i2s_begin`. If nothing has ever been set, default to 44.1KHz as
before.
* Add a FS::check() optional method
Fixes #2634
Expose any low-level filesystem check operations for users, and add
documentation on this and the gc() methods.
* Update doc w/more gc() info and link
Default mode (no exceptions) will no longer use the stdc++ library new
allocator when there is not enough memory. Instead, it will return
nullptr. This is the pre-exceptions-available behavior (2.5.0 and
earlier).
When exceptions are enabled, use the real new and throw exceptions that
can be caught at higher levels, or which will crash the app with an
uncaught exception if they're not handled.
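On the caller side, the no-exceptions behavior means every allocation result has to be checked. A host-side sketch, using `new (std::nothrow)` as a stand-in for the core's nullptr-returning `new` (an assumption made for this example, since a desktop toolchain throws by default):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Returns true if a buffer of n bytes could be allocated and zero-filled.
bool fillBuffer(std::size_t n) {
    char* buf = new (std::nothrow) char[n];   // yields nullptr instead of throwing
    if (buf == nullptr) return false;         // the caller must check
    for (std::size_t i = 0; i < n; ++i) buf[i] = 0;
    delete[] buf;
    return true;
}
```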
Update to #6309
Fixes #2090
The Updater checks that an update isn't already in progress on ::begin,
but when an error happens in the middle of an upload it's impossible to
actually reset this flag w/o a reboot.
Reset the state members (esp. _size) on any error condition so
that you can restart the transfer with a new ::begin. Any error
condition is fatal, anyway, so there is no reason not to clear the
current state at that point.
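A miniature of the state handling, with made-up member names (this is not the Updater class itself):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of an updater; member names are illustrative only.
class MiniUpdater {
public:
    bool begin(std::size_t size) {
        if (_size != 0) return false;      // an update is already in progress
        _size = size;
        _written = 0;
        _error = false;
        return true;
    }
    bool write(std::size_t n) {
        if (_size == 0) return false;      // not started
        if (_written + n > _size) { fail(); return false; }
        _written += n;
        return true;
    }
    bool hasError() const { return _error; }

private:
    // Any error is fatal, so clear the progress state and let begin() run again.
    void fail() { _error = true; _size = 0; _written = 0; }

    std::size_t _size = 0;
    std::size_t _written = 0;
    bool _error = false;
};
```

After a failed write(), a second begin() succeeds without a reboot, which is the behavior this change is after.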
The SPIFFS config object was defined in FS.h in its own namespace, but
is not made easily available like other SPIFFS and FS objects because of
a missing `using` statement. Add it in FS.h
Fixes #6322
* workaround when an exception occurs while in an ISR
* tuning for gdb
* remove dead code and rename defines/variables
* per reviews: naming, handle "unhandled return" case
* fix reset message
* Created empty method
* Changed method name from "empty" to "isEmpty". Created a new method to empty a string
* Changed method name from "empty" to "clear".
Cleans up all warnings seen w/GCC 9.1 to allow it to track the main
branch more easily until 3.x.
Does not include Ticker.h "fix" of pragmas around a function cast we're
doing that GCC9 doesn't like, that will be addressed separately and
maybe only in the 3.0 branch.
Does not include GDB hook fix, either, because the pragmas required
to disable the GCC9.1 warnings don't exist in 4.8 at all.
Without this the compiler may use memory references loaded to registers before the fence, in computation within the fence. These values could have changed before xt_rsil()
(critical section start) was called.
Note: this is needed to stop the compiler from reordering instructions at the critical section boundary.
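The effect of a compiler-level fence can be sketched on a host with GCC/Clang inline asm. compilerBarrier() is a name invented here, and this shows only the "memory" clobber, not the actual xt_rsil() definition:

```cpp
#include <cassert>

int g_shared = 0;   // imagine an ISR may change this

// "memory" clobber: the compiler must not keep memory values cached in
// registers across this point or move memory accesses over it.
// GCC/Clang syntax; emits no instructions.
inline void compilerBarrier() { asm volatile("" ::: "memory"); }

int readInsideSection() {
    int before = g_shared;   // may be loaded before the section starts
    compilerBarrier();       // stands in for the fence at critical section start
    int inside = g_shared;   // has to be reloaded from memory here
    compilerBarrier();       // and again at critical section end
    return inside - before;
}
```

Without the barrier the compiler would be free to reuse `before` for `inside`, which is the stale-value problem described above.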
* enable by default latest 2.2.x firmware, including fixed espnow
* LittleFS: avoid crash when FS size is 0
* flash size defaults: 1M for generic board, not empty FS for all
Apply most compatible changes needed to get the core compiling under GCC
7.2 to the main gcc 4.8 tree to ease porting for 3.0.0.
Update pgmspace.h with corrected and optimized unaligned pgm_read
macros. Now pgm_read_dword in the unaligned case gives proper results
even if optimization is enabled and is also written in assembly and only
1 instruction longer than the pgm_read_byte macro (which also has been
optimized to reduce 1 instruction). These changes should marginally
shrink code and speed up flash reads accordingly.
The toolchain should/will be rebuilt at a later time with this
optimization to ensure it's used in the libc.a/etc. files.
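The unaligned case can be emulated in portable C++ to show what the macro has to compute. This is a host-side sketch with a hypothetical function name, not the assembly macro itself:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Emulation: flash only allows aligned 32-bit loads, so an unaligned dword is
// assembled from the two aligned words that contain it (little-endian order).
// The caller must guarantee words[idx + 1] exists when the offset is unaligned.
uint32_t readDwordUnaligned(const uint32_t* words, std::size_t byteOffset) {
    const std::size_t idx = byteOffset / 4;
    const unsigned shift = static_cast<unsigned>(byteOffset % 4) * 8;
    const uint32_t lo = words[idx];
    if (shift == 0) return lo;                     // already aligned
    const uint32_t hi = words[idx + 1];
    return (lo >> shift) | (hi << (32 - shift));   // splice the two halves
}
```

With words holding the byte sequence 00 11 22 33 44 55 66 77 ..., an offset of 1 yields 0x44332211.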
After verifying that they really were spurious, clean up the warnings
that gcc -Wextra reports, except for LeaMDNS.
Upgrade GCC to gcc-7 for host builds