13 Commits

Author SHA1 Message Date
Alessandro Gatti
7a69b2d786 py/asmrv32: Use Zcmp opcodes for function prologues and epilogues.
This commit introduces the possibility of using Zcmp opcodes when
generating function prologues and epilogues, reducing the generated code
size.

With the addition of selected Zcmp opcodes, each generated function can
be up to 30 bytes shorter and having a faster prologue and epilogue.  If
Zcmp opcodes can be used then register saving is a matter of a simple
CM.PUSH opcode rather than a series of C.SWSP opcodes.  Conversely,
register restoring is a single CM.POPRET opcode instead of a series of
C.LWSP opcodes followed by a C.JR RA opcode.  This should also lead to
faster code given that there's only one opcode doing the registers
saving rather than a series of them.

For functions that allocate less than three locals then the generated
code will allocate up to 12 bytes of unused stack space.  Whilst this is
a relatively rare occurrence for generated native and viper code,
inline assembler blocks will probably incur into this penalty.  Still,
considering that at the moment the only targets that support Zcmp
opcodes are relatively high-end MCUs (the RP2350 in RV32 mode and the
ESP32P4), this is probably not much of an issue.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-12-19 17:06:53 +11:00
Alessandro Gatti
3cd95dda64 py/asmrv32: Generate better comparison sequences.
This commit changes the sequences generated for not-equal and
less-than-or-equal comparisons, in favour of better replacements.

The new not-equal comparison generates a sequence of equal size but
without the burden of a jump to set the output value, this also had
the effect of reducing the size of the code generator as only two
opcodes need to be generated instead of three.

The less-than-or-equal sequence, on the other hand, is actually two
bytes shorter and does not contain any jumps.  If Zcb opcodes can be
used for performing the final XOR operation then two more bytes could be
saved on each comparison.  The same remarks about having a shorter
generator due to two opcodes being generated instead of three still
applies here.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-11-03 13:53:45 +11:00
Alessandro Gatti
a1684ad2c1 py/asmrv32: Refactor register-indexed load/store emitters.
This commit shortens register-indexed load/store emitter functions, by
reusing integer-indexed equivalent operations as part of the sequence
generation process.

Before these changes, register-indexed load/store emitters would follow
two steps to generate the sequence: generate opcodes to fix up the
register offset to make it point to the exact position in memory where
the operation should take place, and then perform the load/store
operation itself using 0 as an offset from the recalculated address
register.

Since there is already a generic optimised emitter for integer-indexed
load/stores, that bit of code can be reused rather than having an ad-hoc
implementation that is tailored to operate on an offset of 0.  Removing
the custom emitter code in favour of calling the general integer-indexed
emitter saves around 150 bytes without any changes in the emitter
behaviour (generating the same opcode sequence and making use of future
improvement in that emitter too).

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-11-03 13:21:46 +11:00
Alessandro Gatti
cb7ca6f1bc py/asmrv32: Use RV32 Zba opcodes if possible.
This commit adds optional support for selected Zba opcodes (address
generation) to speed up Viper and native code generation on MCUs where
those opcodes are supported (namely RP2350).

Right now support for these opcodes is opt-in, as extension detection
granularity on the RISC-V platform is still a bit in flux.  Relying on
the 'B' bit in the MISA register may yield both false positives and
false negatives depending on the RISC-V implementation the check runs
on.

As a side-effect of Zba support, regular non-byte load/stores have been
made shorter by two bytes.  Whilst this makes code using Zba take up the
same space as non-Zba code, the former will still be faster as it will
have to process just one instruction instead of two, without stalling
registers between the shift and the addition needed to compute the final
offset.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-09-19 15:51:37 +02:00
Alessandro Gatti
0ee3f99da2 py/asmrv32: Make lt/le comparisons emitter shorter.
Some checks failed
JavaScript code lint and formatting with Biome / eslint (push) Has been cancelled
Check code formatting / code-formatting (push) Has been cancelled
Check spelling with codespell / codespell (push) Has been cancelled
Build docs / build (push) Has been cancelled
Check examples / embedding (push) Has been cancelled
Package mpremote / build (push) Has been cancelled
.mpy file format and tools / test (push) Has been cancelled
Build ports metadata / build (push) Has been cancelled
alif port / build_alif (alif_ae3_build) (push) Has been cancelled
cc3200 port / build (push) Has been cancelled
esp32 port / build_idf (esp32_build_c2_c6) (push) Has been cancelled
esp32 port / build_idf (esp32_build_cmod_spiram_s2) (push) Has been cancelled
esp32 port / build_idf (esp32_build_s3_c3) (push) Has been cancelled
esp8266 port / build (push) Has been cancelled
mimxrt port / build (push) Has been cancelled
nrf port / build (push) Has been cancelled
powerpc port / build (push) Has been cancelled
qemu port / build_and_test_arm (bigendian) (push) Has been cancelled
qemu port / build_and_test_arm (sabrelite) (push) Has been cancelled
qemu port / build_and_test_arm (thumb) (push) Has been cancelled
qemu port / build_and_test_rv32 (push) Has been cancelled
renesas-ra port / build_renesas_ra_board (push) Has been cancelled
rp2 port / build (push) Has been cancelled
samd port / build (push) Has been cancelled
stm32 port / build_stm32 (stm32_misc_build) (push) Has been cancelled
stm32 port / build_stm32 (stm32_nucleo_build) (push) Has been cancelled
stm32 port / build_stm32 (stm32_pyb_build) (push) Has been cancelled
unix port / minimal (push) Has been cancelled
unix port / reproducible (push) Has been cancelled
unix port / standard (push) Has been cancelled
unix port / standard_v2 (push) Has been cancelled
unix port / coverage (push) Has been cancelled
unix port / coverage_32bit (push) Has been cancelled
unix port / nanbox (push) Has been cancelled
unix port / longlong (push) Has been cancelled
unix port / float (push) Has been cancelled
unix port / gil_enabled (push) Has been cancelled
unix port / stackless_clang (push) Has been cancelled
unix port / float_clang (push) Has been cancelled
unix port / settrace_stackless (push) Has been cancelled
unix port / macos (push) Has been cancelled
unix port / qemu_mips (push) Has been cancelled
unix port / qemu_arm (push) Has been cancelled
unix port / qemu_riscv64 (push) Has been cancelled
unix port / sanitize_address (push) Has been cancelled
unix port / sanitize_undefined (push) Has been cancelled
webassembly port / build (push) Has been cancelled
windows port / build-vs (Debug, x64, dev, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Debug, x64, dev, 2022, [17, 18)) (push) Has been cancelled
windows port / build-vs (Debug, x86, dev, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Debug, x86, dev, 2022, [17, 18)) (push) Has been cancelled
windows port / build-vs (Release, x64, dev, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Release, x64, dev, 2019, [16, 17)) (push) Has been cancelled
windows port / build-vs (Release, x64, dev, 2022, [17, 18)) (push) Has been cancelled
windows port / build-vs (Release, x64, standard, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Release, x64, standard, 2019, [16, 17)) (push) Has been cancelled
windows port / build-vs (Release, x64, standard, 2022, [17, 18)) (push) Has been cancelled
windows port / build-vs (Release, x86, dev, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Release, x86, dev, 2019, [16, 17)) (push) Has been cancelled
windows port / build-vs (Release, x86, dev, 2022, [17, 18)) (push) Has been cancelled
windows port / build-vs (Release, x86, standard, 2017, [15, 16)) (push) Has been cancelled
windows port / build-vs (Release, x86, standard, 2019, [16, 17)) (push) Has been cancelled
windows port / build-vs (Release, x86, standard, 2022, [17, 18)) (push) Has been cancelled
windows port / build-mingw (i686, mingw32, dev) (push) Has been cancelled
windows port / build-mingw (i686, mingw32, standard) (push) Has been cancelled
windows port / build-mingw (x86_64, mingw64, dev) (push) Has been cancelled
windows port / build-mingw (x86_64, mingw64, standard) (push) Has been cancelled
windows port / cross-build-on-linux (push) Has been cancelled
zephyr port / build (push) Has been cancelled
Python code lint and formatting with ruff / ruff (push) Has been cancelled
This commit simplifies the emitter code in charge of generating opcodes
performing less-than and less-than-or-equal comparisons.

By rewriting the SLT/SLTU opcode generator (handling less-than
comparisons) and de-inlining the less-than comparison generator call in
the less-than-or-equal generator, the output binary is ~80 bytes smaller
(measurements taken from the QEMU port).

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-08-11 22:31:13 +10:00
Alessandro Gatti
a8e0369826 py/asmrv32: Implement the full set of Viper load/store operations.
This commit expands the implementation of Viper load/store operations
that are optimised for the RV32 platform.

Given the way opcodes are encoded, all value sizes are implemented with
only two functions - one for loads and one for stores.  This should
reduce duplication with existing operations and should, in theory, save
space as some code is removed.  Both load and store emitters will
generate the shortest possible sequence (as long as the stack pointer is
not involved), using compressed opcodes when possible.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-07-01 15:34:29 +10:00
Alessandro Gatti
268acb714d py/emitinlinerv32: Add inline assembler support for RV32.
This commit adds support for writing inline assembler functions when
targeting a RV32IMC processor.

Given that this takes up a bit of rodata space due to its large
instruction decoding table and its extensive error messages, it is
enabled by default only on offline targets such as mpy-cross and the
qemu port.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-01-02 11:49:10 +11:00
Alessandro Gatti
3044233ea3 py/misc: Add a popcount(uint32_t) implementation.
This makes the existing popcount(uint32_t) implementation found in the
RV32 emitter available to the rest of the codebase.  This version of
popcount will use intrinsic or builtin implementations if they are
available, falling back to a generic implementation if that is not the
case.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-01-01 10:44:53 +01:00
Alessandro Gatti
7d8b2d89cc py/asmrv32: Use REG_TEMP2 whenever possible.
Some checks are pending
JavaScript code lint and formatting with Biome / eslint (push) Waiting to run
Check code formatting / code-formatting (push) Waiting to run
Check code size / build (push) Waiting to run
Check spelling with codespell / codespell (push) Waiting to run
Check commit message formatting / build (push) Waiting to run
Build docs / build (push) Waiting to run
Check examples / embedding (push) Waiting to run
Package mpremote / build (push) Waiting to run
.mpy file format and tools / test (push) Waiting to run
Build ports metadata / build (push) Waiting to run
cc3200 port / build (push) Waiting to run
esp32 port / build_idf (esp32_build_cmod_spiram_s2) (push) Waiting to run
esp32 port / build_idf (esp32_build_s3_c3) (push) Waiting to run
esp8266 port / build (push) Waiting to run
mimxrt port / build (push) Waiting to run
nrf port / build (push) Waiting to run
powerpc port / build (push) Waiting to run
qemu-arm port / build_and_test (push) Waiting to run
qemu-riscv port / build_and_test (push) Waiting to run
renesas-ra port / build_renesas_ra_board (push) Waiting to run
rp2 port / build (push) Waiting to run
samd port / build (push) Waiting to run
stm32 port / build_stm32 (stm32_misc_build) (push) Waiting to run
stm32 port / build_stm32 (stm32_nucleo_build) (push) Waiting to run
stm32 port / build_stm32 (stm32_pyb_build) (push) Waiting to run
unix port / minimal (push) Waiting to run
unix port / reproducible (push) Waiting to run
unix port / standard (push) Waiting to run
unix port / standard_v2 (push) Waiting to run
unix port / coverage (push) Waiting to run
unix port / coverage_32bit (push) Waiting to run
unix port / nanbox (push) Waiting to run
unix port / float (push) Waiting to run
unix port / stackless_clang (push) Waiting to run
unix port / float_clang (push) Waiting to run
unix port / settrace (push) Waiting to run
unix port / settrace_stackless (push) Waiting to run
unix port / macos (push) Waiting to run
unix port / qemu_mips (push) Waiting to run
unix port / qemu_arm (push) Waiting to run
unix port / qemu_riscv64 (push) Waiting to run
webassembly port / build (push) Waiting to run
windows port / build-vs (Debug, x64, windows-2022, dev, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Debug, x64, windows-latest, dev, 2017, [15, 16)) (push) Waiting to run
windows port / build-vs (Debug, x86, windows-2022, dev, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Debug, x86, windows-latest, dev, 2017, [15, 16)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-2019, dev, 2019, [16, 17)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-2019, standard, 2019, [16, 17)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-2022, dev, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-2022, standard, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-latest, dev, 2017, [15, 16)) (push) Waiting to run
windows port / build-vs (Release, x64, windows-latest, standard, 2017, [15, 16)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-2019, dev, 2019, [16, 17)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-2019, standard, 2019, [16, 17)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-2022, dev, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-2022, standard, 2022, [17, 18)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-latest, dev, 2017, [15, 16)) (push) Waiting to run
windows port / build-vs (Release, x86, windows-latest, standard, 2017, [15, 16)) (push) Waiting to run
windows port / build-mingw (i686, mingw32, dev) (push) Waiting to run
windows port / build-mingw (i686, mingw32, standard) (push) Waiting to run
windows port / build-mingw (x86_64, mingw64, dev) (push) Waiting to run
windows port / build-mingw (x86_64, mingw64, standard) (push) Waiting to run
windows port / cross-build-on-linux (push) Waiting to run
zephyr port / build (push) Waiting to run
Python code lint and formatting with ruff / ruff (push) Waiting to run
The RV32 emitter used an additional temporary register, as certain code
sequences required extra storage.  This commit removes its usage in all
but one case, using REG_TEMP2 instead.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-08-19 15:53:50 +10:00
Alessandro Gatti
da0e027fa5 py/asmrv32: Emit C.LW opcodes only when necessary.
The RV32 emitter sometimes generated short load opcodes even when it
was not supposed to.  This commit fixes an off-by-one error in its
offset eligibility range calculation and corrects one case of offset
calculation, operating on the raw label index number rather than its
effective offset in the stack (C.LW assumes all loads are
word-aligned).

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-08-19 15:53:50 +10:00
Alessandro Gatti
326e1149ec py/asmrv32: Fix short/long jumps scheduling.
The RV32 emitter always scheduled short jumps even outside the emit
compiler pass.  Running the full test suite through the native emitter
instead of just the tests that depend on the emitter at runtime (as in,
`micropython/native_*` and `micropython/viper_* tests`) uncovered more
places where the invalid behaviour was still present.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-08-19 15:53:50 +10:00
Alessandro Gatti
0600e4f273 py/asmrv32: Make some code sequences smaller.
This commit changes a few code sequences to use more compressed opcodes
where possible.  The sequences in question are the ones that show up the
most in the test suite and require the least amount of code changes, namely
short offset loads from memory to RET/ARG registers, indirect calls through
the function table, register-based jumps, locals' offset calculation,
reg-is-null jumps, and register comparisons.

There are no speed losses or gains from these changes, but there is an
average 15-20% generated code size reduction.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-07-01 11:36:03 +10:00
Alessandro Gatti
8338f66352 py/asmrv32: Add RISC-V RV32IMC native code emitter.
This adds a native code generation backend for RISC-V RV32I CPUs, currently
limited to the I, M, and C instruction sets.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-06-21 15:06:07 +10:00