Commit Graph

43 Commits

Author SHA1 Message Date
Damien George 7ea88f7da2 py/lexer: Add support for nested f-strings within f-strings.
It turns out that it's relatively simple to support nested f-strings, which
is what this commit implements.

The way the MicroPython f-string parser works at the moment is:
1. it extracts the f-string arguments (things in curly braces) into a
   temporary buffer (a vstr)
2. once the f-string ends (reaches its closing quote) the lexer switches to
   tokenizing the temporary buffer
3. once the buffer is empty it switches back to the stream.

The temporary buffer can easily hold f-strings itself (ie nested f-strings)
and they can be re-parsed by the lexer using the same algorithm.  The only
thing stopping that from working is that the temporary buffer can't be
reused for the nested f-string because it's currently being parsed.

This commit fixes that by adding a second temporary buffer, which is the
"injection" buffer.  That allows arbitrary number of nestings with a simple
modification to the original algorithm:
1. when an f-string is encountered the string is parsed and its arguments
   are extracted into `fstring_args`
2. when the f-string finishes, `fstring_args` is inserted into the current
   position in `inject_chrs` (which is the start of that buffer if no
   injection is ongoing)
3. `fstring_args` is now cleared and ready for any further f-strings
   (nested or not)
4. the lexer switches to `inject_chrs` if it's not already reading from it
5. if an f-string appeared inside the f-string then it is in `inject_chrs`
   and can be processed as before, extracting its arguments into
   `fstring_args`, which can then be inserted again into `inject_chrs`
6. once `inject_chrs` is exhausted (meaning that all levels of f-strings
   have been fully processed) the lexer switched back to tokenizing the
   stream.

Amazingly, this scheme supports arbitrary numbers of nestings of f-strings
using the same quote style.

This adds some code size and a bit more memory usage for the lexer.  In
particular for a single (non-nested) f-string it now makes an extra copy of
the `fstring_args` data, when copying it across to `inject_chrs`.
Otherwise, memory use only goes up with the complexity of nested f-strings.

Signed-off-by: Damien George <damien@micropython.org>
2026-02-04 23:19:09 +11:00
Damien George 617c7dba3b py/lexer: Use null char as lexer EOF sentinel.
The null byte cannot exist in source code (per CPython), so use it to
indicate the end of the input stream (instead of `(mp_uint_t)-1`).  This
allows the cache chars (chr0/1/2 and their saved versions) to be 8-bit
bytes, making it clear that they are not `unichar` values.  It also saves a
bit of memory in the `mp_lexer_t` data structure.  (And in a future commit
allows the saved cache chars to be eliminated entirely by storing them in
a vstr instead.)

In order to keep code size down, the frequently used `chr0` is still of
type `uint32_t`.  Having it 32-bit means that machine instructions to load
it are smaller (it adds about +80 bytes to Thumb code if `chr0` is changed
to `uint8_t`).

Also add tests for invalid bytes in the input stream to make sure there are
no regressions in this regard.

Signed-off-by: Damien George <damien@micropython.org>
2026-02-04 23:19:09 +11:00
Damien George 96007e7de5 py/lexer: Add static assert that token enum values all fit in a byte.
Signed-off-by: Damien George <damien@micropython.org>
2024-07-18 12:44:44 +10:00
Damien George 3c8089d1b1 py/lexer: Support raw f-strings.
Support for raw str/bytes already exists, and extending that to raw
f-strings is easy.  It also reduces code size because it eliminates an
error message.

Signed-off-by: Damien George <damien@micropython.org>
2024-06-06 17:34:28 +10:00
Jim Mussared 5015779a6f py/builtinevex: Handle invalid filenames for execfile.
If a non-string buffer was passed to execfile, then it would be passed
as a non-null-terminated char* to mp_lexer_new_from_file.

This changes mp_lexer_new_from_file to take a qstr instead (as in almost
all cases a qstr will be created from this input anyway to set the
`__file__` attribute on the module).

This now makes execfile require a string (not generic buffer) argument,
which is probably a good fix to make anyway.

Fixes issue #12522.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-10-12 15:17:59 +11:00
Damien George 5956466c0e py/builtin: Clean up and simplify import_stat and builtin_open config.
The following changes are made:

- If MICROPY_VFS is enabled then mp_vfs_import_stat and mp_vfs_open are
  automatically used for mp_import_stat and mp_builtin_open respectively.

- If MICROPY_PY_IO is enabled then "open" is automatically included in the
  set of builtins, and points to mp_builtin_open_obj.

This helps to clean up and simplify the most common port configuration.

Signed-off-by: Damien George <damien@micropython.org>
2022-05-25 13:04:45 +10:00
Jim Mussared 692d36d779 py: Implement partial PEP-498 (f-string) support.
This implements (most of) the PEP-498 spec for f-strings and is based on
https://github.com/micropython/micropython/pull/4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected into the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-08-14 16:58:40 +10:00
Damien George 1783950311 py/compile: Implement PEP 572, assignment expressions with := operator.
The syntax matches CPython and the semantics are equivalent except that,
unlike CPython, MicroPython allows using := to assign to comprehension
iteration variables, because disallowing this would take a lot of code to
check for it.

The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and
is enabled by default, following MICROPY_PY_ASYNC_AWAIT.
2020-06-16 22:02:24 +10:00
Damien George 01e5802ee3 py: Remove 3 obsolete commented-out lines from header files. 2019-11-26 21:36:41 +11:00
Damien George 2069c563f9 py: Add support for matmul operator @ as per PEP 465.
To make progress towards MicroPython supporting Python 3.5, adding the
matmul operator is important because it's a really "low level" part of the
language, being a new token and modifications to the grammar.

It doesn't make sense to make it configurable because 1) it would make the
grammar and lexer complicated/messy; 2) no other operators are
configurable; 3) it's not a feature that can be "dynamically plugged in"
via an import.

And matmul can be useful as a general purpose user-defined operator, it
doesn't have to be just for numpy use.

Based on work done by Jim Mussared.
2019-09-26 15:12:39 +10:00
Damien George 6ce7c051e8 py/lexer: Reorder operator tokens to match corresponding binary ops. 2019-09-26 14:37:26 +10:00
Alexander Steffen 55f33240f3 all: Use the name MicroPython consistently in comments
There were several different spellings of MicroPython present in comments,
when there should be only one.
2017-07-31 18:35:40 +10:00
Alexander Steffen 299bc62586 all: Unify header guard usage.
The code conventions suggest using header guards, but do not define how
those should look like and instead point to existing files. However, not
all existing files follow the same scheme, sometimes omitting header guards
altogether, sometimes using non-standard names, making it easy to
accidentally pick a "wrong" example.

This commit ensures that all header files of the MicroPython project (that
were not simply copied from somewhere else) follow the same pattern, that
was already present in the majority of files, especially in the py folder.

The rules are as follows.

Naming convention:
* start with the words MICROPY_INCLUDED
* contain the full path to the file
* replace special characters with _

In addition, there are no empty lines before #ifndef, between #ifndef and
one empty line before #endif. #endif is followed by a comment containing
the name of the guard macro.

py/grammar.h cannot use header guards by design, since it has to be
included multiple times in a single C file. Several other files also do not
need header guards as they are only used internally and guaranteed to be
included only once:
* MICROPY_MPHALPORT_H
* mpconfigboard.h
* mpconfigport.h
* mpthreadport.h
* pin_defs_*.h
* qstrdefs*.h
2017-07-18 11:57:39 +10:00
Damien George 5124a94067 py/lexer: Convert mp_uint_t to size_t where appropriate. 2017-02-17 12:44:24 +11:00
Damien George 773278ec30 py/lexer: Simplify handling of line-continuation error.
Previous to this patch there was an explicit check for errors with line
continuation (where backslash was not immediately followed by a newline).

But this check is not necessary: if there is an error then the remaining
logic of the tokeniser will reject the backslash and correctly produce a
syntax error.
2017-02-17 11:30:14 +11:00
Damien George ae43679792 py/lexer: Use strcmp to make keyword searching more efficient.
Since the table of keywords is sorted, we can use strcmp to do the search
and stop part way through the search if the comparison is less-than.

Because all tokens that are names are subject to this search, this
optimisation will improve the overall speed of the lexer when processing
a script.

The change also decreases code size by a little bit because we now use
strcmp instead of the custom str_strn_equal function.
2017-02-17 11:10:35 +11:00
Damien George c305ae3243 py/lexer: Permanently disable the mp_lexer_show_token function.
The lexer is very mature and this debug function is no longer used.  If
it's really needed one can uncomment it and recompile.
2016-12-22 10:49:54 +11:00
Damien George 5bdf1650de py/lexer: Make lexer use an mp_reader as its source. 2016-11-16 18:35:01 +11:00
pohmelie 81ebba7e02 py: add async/await/async for/async with syntax
They are sugar for marking function as generator, "yield from"
and pep492 python "semantically equivalents" respectively.

@dpgeorge was the original author of this patch, but @pohmelie made
changes to implement `async for` and `async with`.
2016-04-13 15:26:38 +01:00
Damien George 031278f661 unix: Allow to cat a script into stdin from the command line.
See issue #1306.
2015-06-04 23:42:45 +01:00
Damien George 2e2e404ff7 py: Allow to compile with extra warnings (sign-compare, unused-param). 2015-03-19 00:25:33 +00:00
Damien George 7d414a1b52 py: Parse big-int/float/imag constants directly in parser.
Previous to this patch, a big-int, float or imag constant was interned
(made into a qstr) and then parsed at runtime to create an object each
time it was needed.  This is wasteful in RAM and not efficient.  Now,
these constants are parsed straight away in the parser and turned into
objects.  This allows constants with large numbers of digits (so
addresses issue #1103) and takes us a step closer to #722.
2015-02-08 01:57:40 +00:00
Damien George b4b10fd350 py: Put all global state together in state structures.
This patch consolidates all global variables in py/ core into one place,
in a global structure.  Root pointers are all located together to make
GC tracing easier and more efficient.
2015-01-07 20:33:00 +00:00
Damien George 51dfcb4bb7 py: Move to guarded includes, everywhere in py/ core.
Addresses issue #1022.
2015-01-01 20:32:09 +00:00
Paul Sokolovsky 8ab6f90674 py: Move to guarded includes for compile.h and related headers. 2014-12-27 16:12:17 +02:00
Damien George a4c52c5a3d py: Optimise lexer by exposing lexer type.
mp_lexer_t type is exposed, mp_token_t type is removed, and simple lexer
functions (like checking current token kind) are now inlined.

This saves 784 bytes ROM on 32-bit unix, 348 bytes on stmhal, and 460
bytes on bare-arm.  It also saves a tiny bit of RAM since mp_lexer_t
is a bit smaller.  Also will run a bit more efficiently.
2014-12-05 19:35:18 +00:00
Damien George 94fbe9711a py: Change lexer stream API to return bytes not chars.
Lexer is now 8-bit clean inside strings.
2014-07-30 11:46:05 +01:00
Damien George 54eb4e723e lexer: Convert type (u)int to mp_(u)int_t. 2014-07-03 13:47:47 +01:00
Paul Sokolovsky d3439d0c60 py: Instead of having "debug on" var, have "optimization level" var.
This allows to have multiple "optimization" levels (CPython has two
(-OO removes docstrings), we can have more).
2014-06-03 12:32:59 +03:00
Damien George 04b9147e15 Add license header to (almost) all files.
Blanket wide to all .c and .h files.  Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.

Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
2014-05-03 23:27:38 +01:00
Damien George e09ffa1400 Search paths properly on import and execute __init__.py if it exists. 2014-02-05 23:57:48 +00:00
Damien George b829b5caec Implement mp_parse_node_free; print properly repr(string). 2014-01-25 13:51:19 +00:00
Damien George 08335004cf Add source file name and line number to error messages.
Byte code has a map from byte-code offset to source-code line number,
used to give better error messages.
2014-01-18 23:24:36 +00:00
Damien George 9528cd66d7 Convert parse errors to exceptions.
Parser no longer prints an error, but instead returns an exception ID
and message.
2014-01-15 21:23:31 +00:00
Damien George 69a818d418 py: Improve memory management for parser; add lexer error for bad line cont. 2014-01-12 13:55:24 +00:00
Damien George 9193f89296 Move lexerstr to main py directory (everyone uses it). 2014-01-08 15:28:26 +00:00
Damien George e9906ac3d7 Add ellipsis object. 2014-01-04 18:44:46 +00:00
Damien George 66028ab6dc Basic implementation of import.
import works for simple cases.  Still work to do on finding the right
script, and setting globals/locals correctly when running an imported
function.
2014-01-03 14:03:48 +00:00
Damien d99b05282d Change object representation from 1 big union to individual structs.
A big change.  Micro Python objects are allocated as individual structs
with the first element being a pointer to the type information (which
is itself an object).  This scheme follows CPython.  Much more flexible,
not necessarily slower, uses same heap memory, and can allocate objects
statically.

Also change name prefix, from py_ to mp_ (mp for Micro Python).
2013-12-21 18:17:45 +00:00
Damien fa2162bc77 Integrate new lexer stream with stm framework. 2013-10-20 17:42:00 +01:00
Damien a5185f4bc8 Abstract out back-end stream functionality from lexer. 2013-10-20 14:41:27 +01:00
Damien 91d387de7d Improve indent/dedent error checking and reporting. 2013-10-09 15:09:52 +01:00
Damien 429d71943d Initial commit. 2013-10-04 19:53:11 +01:00