micropython

mirror of https://github.com/micropython/micropython.git synced 2026-05-01 05:10:15 +02:00

Author	SHA1	Message	Date
Damien George	7ea88f7da2	py/lexer: Add support for nested f-strings within f-strings. It turns out that it's relatively simple to support nested f-strings, which is what this commit implements. The way the MicroPython f-string parser works at the moment is: 1. it extracts the f-string arguments (things in curly braces) into a temporary buffer (a vstr) 2. once the f-string ends (reaches its closing quote) the lexer switches to tokenizing the temporary buffer 3. once the buffer is empty it switches back to the stream. The temporary buffer can easily hold f-strings itself (ie nested f-strings) and they can be re-parsed by the lexer using the same algorithm. The only thing stopping that from working is that the temporary buffer can't be reused for the nested f-string because it's currently being parsed. This commit fixes that by adding a second temporary buffer, which is the "injection" buffer. That allows an arbitrary number of nestings with a simple modification to the original algorithm: 1. when an f-string is encountered the string is parsed and its arguments are extracted into `fstring_args` 2. when the f-string finishes, `fstring_args` is inserted into the current position in `inject_chrs` (which is the start of that buffer if no injection is ongoing) 3. `fstring_args` is now cleared and ready for any further f-strings (nested or not) 4. the lexer switches to `inject_chrs` if it's not already reading from it 5. if an f-string appeared inside the f-string then it is in `inject_chrs` and can be processed as before, extracting its arguments into `fstring_args`, which can then be inserted again into `inject_chrs` 6. once `inject_chrs` is exhausted (meaning that all levels of f-strings have been fully processed) the lexer switches back to tokenizing the stream. Amazingly, this scheme supports arbitrary numbers of nestings of f-strings using the same quote style. This adds some code size and a bit more memory usage for the lexer. In particular for a single (non-nested) f-string it now makes an extra copy of the `fstring_args` data, when copying it across to `inject_chrs`. Otherwise, memory use only goes up with the complexity of nested f-strings. Signed-off-by: Damien George <damien@micropython.org>	2026-02-04 23:19:09 +11:00
Damien George	617c7dba3b	py/lexer: Use null char as lexer EOF sentinel. The null byte cannot exist in source code (per CPython), so use it to indicate the end of the input stream (instead of `(mp_uint_t)-1`). This allows the cache chars (chr0/1/2 and their saved versions) to be 8-bit bytes, making it clear that they are not `unichar` values. It also saves a bit of memory in the `mp_lexer_t` data structure. (And in a future commit allows the saved cache chars to be eliminated entirely by storing them in a vstr instead.) In order to keep code size down, the frequently used `chr0` is still of type `uint32_t`. Having it 32-bit means that machine instructions to load it are smaller (it adds about +80 bytes to Thumb code if `chr0` is changed to `uint8_t`). Also add tests for invalid bytes in the input stream to make sure there are no regressions in this regard. Signed-off-by: Damien George <damien@micropython.org>	2026-02-04 23:19:09 +11:00
Damien George	96007e7de5	py/lexer: Add static assert that token enum values all fit in a byte. Signed-off-by: Damien George <damien@micropython.org>	2024-07-18 12:44:44 +10:00
Damien George	3c8089d1b1	py/lexer: Support raw f-strings. Support for raw str/bytes already exists, and extending that to raw f-strings is easy. It also reduces code size because it eliminates an error message. Signed-off-by: Damien George <damien@micropython.org>	2024-06-06 17:34:28 +10:00
Jim Mussared	5015779a6f	py/builtinevex: Handle invalid filenames for execfile. If a non-string buffer was passed to execfile, then it would be passed as a non-null-terminated char* to mp_lexer_new_from_file. This changes mp_lexer_new_from_file to take a qstr instead (as in almost all cases a qstr will be created from this input anyway to set the `__file__` attribute on the module). This now makes execfile require a string (not generic buffer) argument, which is probably a good fix to make anyway. Fixes issue #12522. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>	2023-10-12 15:17:59 +11:00
Damien George	5956466c0e	py/builtin: Clean up and simplify import_stat and builtin_open config. The following changes are made: - If MICROPY_VFS is enabled then mp_vfs_import_stat and mp_vfs_open are automatically used for mp_import_stat and mp_builtin_open respectively. - If MICROPY_PY_IO is enabled then "open" is automatically included in the set of builtins, and points to mp_builtin_open_obj. This helps to clean up and simplify the most common port configuration. Signed-off-by: Damien George <damien@micropython.org>	2022-05-25 13:04:45 +10:00
Jim Mussared	692d36d779	py: Implement partial PEP-498 (f-string) support. This implements (most of) the PEP-498 spec for f-strings and is based on https://github.com/micropython/micropython/pull/4998 by @klardotsh. It is implemented in the lexer as a syntax translation to `str.format`: f"{a}" --> "{}".format(a) It also supports: f"{a=}" --> "a={}".format(a) This is done by extracting the arguments into a temporary vstr buffer, then after the string has been tokenized, the lexer input queue is saved and the contents of the temporary vstr buffer are injected into the lexer instead. There are four main limitations: - raw f-strings (`fr` or `rf` prefixes) are not supported and will raise `SyntaxError: raw f-strings are not supported`. - literal concatenation of f-strings with adjacent strings will fail "{}" f"{a}" --> "{}{}".format(a) (str.format will incorrectly use the braces from the non-f-string) f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate) - PEP-498 requires the full parser to understand the interpolated argument, however because this entirely runs in the lexer it cannot resolve nested braces in expressions like f"{'}'}" - The !r, !s, and !a conversions are not supported. Includes tests and cpydiffs. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>	2021-08-14 16:58:40 +10:00
Damien George	1783950311	py/compile: Implement PEP 572, assignment expressions with := operator. The syntax matches CPython and the semantics are equivalent except that, unlike CPython, MicroPython allows using := to assign to comprehension iteration variables, because disallowing this would take a lot of code to check for it. The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and is enabled by default, following MICROPY_PY_ASYNC_AWAIT.	2020-06-16 22:02:24 +10:00
Damien George	01e5802ee3	py: Remove 3 obsolete commented-out lines from header files.	2019-11-26 21:36:41 +11:00
Damien George	2069c563f9	py: Add support for matmul operator @ as per PEP 465. To make progress towards MicroPython supporting Python 3.5, adding the matmul operator is important because it's a really "low level" part of the language, being a new token and modifications to the grammar. It doesn't make sense to make it configurable because 1) it would make the grammar and lexer complicated/messy; 2) no other operators are configurable; 3) it's not a feature that can be "dynamically plugged in" via an import. And matmul can be useful as a general purpose user-defined operator, it doesn't have to be just for numpy use. Based on work done by Jim Mussared.	2019-09-26 15:12:39 +10:00
Damien George	6ce7c051e8	py/lexer: Reorder operator tokens to match corresponding binary ops.	2019-09-26 14:37:26 +10:00
Alexander Steffen	55f33240f3	all: Use the name MicroPython consistently in comments. There were several different spellings of MicroPython present in comments, when there should be only one.	2017-07-31 18:35:40 +10:00
Alexander Steffen	299bc62586	all: Unify header guard usage. The code conventions suggest using header guards, but do not define how those should look like and instead point to existing files. However, not all existing files follow the same scheme, sometimes omitting header guards altogether, sometimes using non-standard names, making it easy to accidentally pick a "wrong" example. This commit ensures that all header files of the MicroPython project (that were not simply copied from somewhere else) follow the same pattern, that was already present in the majority of files, especially in the py folder. The rules are as follows. Naming convention: * start with the words MICROPY_INCLUDED * contain the full path to the file * replace special characters with _ In addition, there are no empty lines before #ifndef, between #ifndef and #define, and one empty line before #endif. #endif is followed by a comment containing the name of the guard macro. py/grammar.h cannot use header guards by design, since it has to be included multiple times in a single C file. Several other files also do not need header guards as they are only used internally and guaranteed to be included only once: * MICROPY_MPHALPORT_H * mpconfigboard.h * mpconfigport.h * mpthreadport.h * pin_defs_*.h * qstrdefs*.h	2017-07-18 11:57:39 +10:00
Damien George	5124a94067	py/lexer: Convert mp_uint_t to size_t where appropriate.	2017-02-17 12:44:24 +11:00
Damien George	773278ec30	py/lexer: Simplify handling of line-continuation error. Previous to this patch there was an explicit check for errors with line continuation (where backslash was not immediately followed by a newline). But this check is not necessary: if there is an error then the remaining logic of the tokeniser will reject the backslash and correctly produce a syntax error.	2017-02-17 11:30:14 +11:00
Damien George	ae43679792	py/lexer: Use strcmp to make keyword searching more efficient. Since the table of keywords is sorted, we can use strcmp to do the search and stop part way through the search if the comparison is less-than. Because all tokens that are names are subject to this search, this optimisation will improve the overall speed of the lexer when processing a script. The change also decreases code size by a little bit because we now use strcmp instead of the custom str_strn_equal function.	2017-02-17 11:10:35 +11:00
Damien George	c305ae3243	py/lexer: Permanently disable the mp_lexer_show_token function. The lexer is very mature and this debug function is no longer used. If it's really needed one can uncomment it and recompile.	2016-12-22 10:49:54 +11:00
Damien George	5bdf1650de	py/lexer: Make lexer use an mp_reader as its source.	2016-11-16 18:35:01 +11:00
pohmelie	81ebba7e02	py: add async/await/async for/async with syntax. They are sugar for marking function as generator, "yield from" and pep492 python "semantically equivalents" respectively. @dpgeorge was the original author of this patch, but @pohmelie made changes to implement `async for` and `async with`.	2016-04-13 15:26:38 +01:00
Damien George	031278f661	unix: Allow to cat a script into stdin from the command line. See issue #1306.	2015-06-04 23:42:45 +01:00
Damien George	2e2e404ff7	py: Allow to compile with extra warnings (sign-compare, unused-param).	2015-03-19 00:25:33 +00:00
Damien George	7d414a1b52	py: Parse big-int/float/imag constants directly in parser. Previous to this patch, a big-int, float or imag constant was interned (made into a qstr) and then parsed at runtime to create an object each time it was needed. This is wasteful in RAM and not efficient. Now, these constants are parsed straight away in the parser and turned into objects. This allows constants with large numbers of digits (so addresses issue #1103) and takes us a step closer to #722.	2015-02-08 01:57:40 +00:00
Damien George	b4b10fd350	py: Put all global state together in state structures. This patch consolidates all global variables in py/ core into one place, in a global structure. Root pointers are all located together to make GC tracing easier and more efficient.	2015-01-07 20:33:00 +00:00
Damien George	51dfcb4bb7	py: Move to guarded includes, everywhere in py/ core. Addresses issue #1022.	2015-01-01 20:32:09 +00:00
Paul Sokolovsky	8ab6f90674	py: Move to guarded includes for compile.h and related headers.	2014-12-27 16:12:17 +02:00
Damien George	a4c52c5a3d	py: Optimise lexer by exposing lexer type. mp_lexer_t type is exposed, mp_token_t type is removed, and simple lexer functions (like checking current token kind) are now inlined. This saves 784 bytes ROM on 32-bit unix, 348 bytes on stmhal, and 460 bytes on bare-arm. It also saves a tiny bit of RAM since mp_lexer_t is a bit smaller. Also will run a bit more efficiently.	2014-12-05 19:35:18 +00:00
Damien George	94fbe9711a	py: Change lexer stream API to return bytes not chars. Lexer is now 8-bit clean inside strings.	2014-07-30 11:46:05 +01:00
Damien George	54eb4e723e	lexer: Convert type (u)int to mp_(u)int_t.	2014-07-03 13:47:47 +01:00
Paul Sokolovsky	d3439d0c60	py: Instead of having "debug on" var, have "optimization level" var. This allows to have multiple "optimization" levels (CPython has two (-OO removes docstrings), we can have more).	2014-06-03 12:32:59 +03:00
Damien George	04b9147e15	Add license header to (almost) all files. Blanket wide to all .c and .h files. Some files originating from ST are difficult to deal with (license wise) so it was left out of those. Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.	2014-05-03 23:27:38 +01:00
Damien George	e09ffa1400	Search paths properly on import and execute __init__.py if it exists.	2014-02-05 23:57:48 +00:00
Damien George	b829b5caec	Implement mp_parse_node_free; print properly repr(string).	2014-01-25 13:51:19 +00:00
Damien George	08335004cf	Add source file name and line number to error messages. Byte code has a map from byte-code offset to source-code line number, used to give better error messages.	2014-01-18 23:24:36 +00:00
Damien George	9528cd66d7	Convert parse errors to exceptions. Parser no longer prints an error, but instead returns an exception ID and message.	2014-01-15 21:23:31 +00:00
Damien George	69a818d418	py: Improve memory management for parser; add lexer error for bad line cont.	2014-01-12 13:55:24 +00:00
Damien George	9193f89296	Move lexerstr to main py directory (everyone uses it).	2014-01-08 15:28:26 +00:00
Damien George	e9906ac3d7	Add ellipsis object.	2014-01-04 18:44:46 +00:00
Damien George	66028ab6dc	Basic implementation of import. import works for simple cases. Still work to do on finding the right script, and setting globals/locals correctly when running an imported function.	2014-01-03 14:03:48 +00:00
Damien	d99b05282d	Change object representation from 1 big union to individual structs. A big change. Micro Python objects are allocated as individual structs with the first element being a pointer to the type information (which is itself an object). This scheme follows CPython. Much more flexible, not necessarily slower, uses same heap memory, and can allocate objects statically. Also change name prefix, from py_ to mp_ (mp for Micro Python).	2013-12-21 18:17:45 +00:00
Damien	fa2162bc77	Integrate new lexer stream with stm framework.	2013-10-20 17:42:00 +01:00
Damien	a5185f4bc8	Abstract out back-end stream functionality from lexer.	2013-10-20 14:41:27 +01:00
Damien	91d387de7d	Improve indent/dedent error checking and reporting.	2013-10-09 15:09:52 +01:00
Damien	429d71943d	Initial commit.	2013-10-04 19:53:11 +01:00

43 Commits
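
Commit 692d36d779 above describes f-strings as a lexer-level translation to `str.format` (f"{a}" --> "{}".format(a) and f"{a=}" --> "a={}".format(a)). The following is only a rough Python sketch of that idea, not the MicroPython C implementation: the helper name `translate_fstring` and its return shape are hypothetical, and it assumes, like the limitation noted in that commit, that braces and colons inside an argument expression are not resolved.

```python
def translate_fstring(body):
    """Hypothetical helper: split the text between an f-string's quotes
    into a str.format template and a list of argument expressions."""
    template = []
    args = []
    i = 0
    n = len(body)
    while i < n:
        ch = body[i]
        if ch in "{}" and i + 1 < n and body[i + 1] == ch:
            # Escaped brace: keep it doubled so str.format emits a literal.
            template.append(ch * 2)
            i += 2
        elif ch == "{":
            # Scan the argument up to '}' or the start of a format spec.
            # Simplification: nested braces or ':' inside the expression
            # are not handled.
            j = i + 1
            while j < n and body[j] not in "}:":
                j += 1
            end = body.find("}", j)
            if end == -1:
                raise SyntaxError("unterminated f-string argument")
            expr = body[i + 1:j]
            if expr.endswith("="):
                # f"{a=}" form: emit the expression text, then the field.
                expr = expr[:-1]
                template.append(expr + "=")
            args.append(expr)
            # Copy any format spec (":>4") into the replacement field.
            template.append("{" + body[j:end] + "}")
            i = end + 1
        else:
            template.append(ch)
            i += 1
    return "".join(template), args


if __name__ == "__main__":
    tmpl, exprs = translate_fstring("x={x:>4} {{ok}}")
    print(tmpl, exprs)
    print(tmpl.format(7))
```

In the lexer described by the commit, the collected argument text is then fed back to the tokenizer after the string token; in this sketch the arguments are simply returned as a list of source strings.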