Commit Graph

1 Commits

Author SHA1 Message Date
Damien George
7ea88f7da2 py/lexer: Add support for nested f-strings within f-strings.
It turns out that it's relatively simple to support nested f-strings, which
is what this commit implements.

The way the MicroPython f-string parser works at the moment is:
1. it extracts the f-string arguments (things in curly braces) into a
   temporary buffer (a vstr)
2. once the f-string ends (reaches its closing quote) the lexer switches to
   tokenizing the temporary buffer
3. once the buffer is empty it switches back to the stream.
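
As a rough illustration of those three steps, here is a minimal Python sketch (a toy model, not the lexer's C code). The function name `expand_flat` is hypothetical, and the sketch assumes the buffered arguments are emitted as a `.format(...)` call, which this message does not state. It ignores format specs, `{{` escapes and ordinary string literals, and it does not handle nesting.

```python
QUOTES = ('"', "'")


def expand_flat(source):
    """Return the characters a tokenizer would see for non-nested f-strings."""
    out = []                      # characters handed to the tokenizer
    i = 0
    while i < len(source):
        c = source[i]
        # Simplistic prefix check: an 'f' directly followed by a quote.
        if c == "f" and source[i + 1:i + 2] in QUOTES:
            quote = source[i + 1]
            i += 2
            fstring_args = []     # step 1: the temporary buffer
            out.append(quote)
            while source[i] != quote:
                if source[i] == "{":
                    end = source.index("}", i)
                    fstring_args.append(source[i + 1:end])
                    out.append("{}")
                    i = end + 1
                else:
                    out.append(source[i])
                    i += 1
            out.append(quote)
            i += 1                # consume the closing quote
            # step 2: the buffer is tokenized next;
            # step 3: after that, reading continues from the stream.
            out.append(".format(" + ", ".join(fstring_args) + ")")
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Under these assumptions, `expand_flat('print(f"x={x} y={y+1}")')` yields `print("x={} y={}".format(x, y+1))`.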

The temporary buffer can easily hold f-strings itself (i.e. nested
f-strings) and they can be re-parsed by the lexer using the same algorithm.
The only thing stopping that from working is that the temporary buffer
can't be reused for the nested f-string because it's currently being
parsed.

This commit fixes that by adding a second temporary buffer, which is the
"injection" buffer.  That allows an arbitrary number of nestings with a
simple modification to the original algorithm:
1. when an f-string is encountered the string is parsed and its arguments
   are extracted into `fstring_args`
2. when the f-string finishes, `fstring_args` is inserted into the current
   position in `inject_chrs` (which is the start of that buffer if no
   injection is ongoing)
3. `fstring_args` is now cleared and ready for any further f-strings
   (nested or not)
4. the lexer switches to `inject_chrs` if it's not already reading from it
5. if an f-string appeared inside the f-string then it is in `inject_chrs`
   and can be processed as before, extracting its arguments into
   `fstring_args`, which can then be inserted again into `inject_chrs`
6. once `inject_chrs` is exhausted (meaning that all levels of f-strings
   have been fully processed) the lexer switches back to tokenizing the
   stream.
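
The six steps can be modelled with a small Python sketch. This is a toy simulation, not the actual C implementation: the class `Lexer` and its methods are hypothetical, only `fstring_args` and `inject_chrs` are names from this commit, and emitting the arguments as a `.format(...)` call is an assumption. Format specs, `{{` escapes and ordinary string literals are not handled.

```python
QUOTES = ('"', "'")


class Lexer:
    """Toy model of the two-buffer scheme."""

    def __init__(self, source):
        self.source = source      # the underlying character stream
        self.pos = 0
        self.inject_chrs = []     # the "injection" buffer
        self.inject_pos = 0

    def peek(self):
        if self.inject_pos < len(self.inject_chrs):
            return self.inject_chrs[self.inject_pos]
        return self.source[self.pos:self.pos + 1]   # "" at end of input

    def next(self):
        c = self.peek()
        if self.inject_pos < len(self.inject_chrs):
            self.inject_pos += 1
            if self.inject_pos == len(self.inject_chrs):
                # Step 6: injection exhausted, go back to the stream.
                self.inject_chrs, self.inject_pos = [], 0
        else:
            self.pos += len(c)
        return c

    def read_fstring(self, quote, out):
        fstring_args = []         # Step 1: arguments of this f-string
        out.append(quote)
        while (c := self.next()) and c != quote:
            if c != "{":
                out.append(c)
                continue
            depth, arg = 1, []
            while (c := self.next()):
                depth += (c == "{") - (c == "}")
                if depth == 0:
                    break
                arg.append(c)     # may itself contain a nested f-string
            fstring_args.append("".join(arg))
            out.append("{}")
        out.append(quote)
        # Steps 2-4: insert the arguments at the current position in
        # inject_chrs; fstring_args is then free for the next f-string.
        text = ".format(" + ", ".join(fstring_args) + ")"
        self.inject_chrs[self.inject_pos:self.inject_pos] = text

    def expand(self):
        out = []                  # characters handed to the tokenizer
        while (c := self.next()):
            prev = out[-1] if out else ""
            in_name = prev.isalnum() or prev == "_"
            if c == "f" and self.peek() in QUOTES and not in_name:
                # Step 5: the same path handles f-strings read back
                # from inject_chrs.
                self.read_fstring(self.next(), out)
            else:
                out.append(c)
        return "".join(out)
```

In this model, `Lexer('f"a{f"b{x}"}c"').expand()` produces `"a{}c".format("b{}".format(x))`. The inner f-string is only recognised once its text is read back from `inject_chrs`, which is why each nesting level can reuse the same quote character.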

Amazingly, this scheme supports arbitrary numbers of nestings of f-strings
using the same quote style.

This adds some code size and a bit more memory usage for the lexer.  In
particular for a single (non-nested) f-string it now makes an extra copy of
the `fstring_args` data, when copying it across to `inject_chrs`.
Otherwise, memory use only goes up with the complexity of nested f-strings.

Signed-off-by: Damien George <damien@micropython.org>
2026-02-04 23:19:09 +11:00