py/persistentcode: Add architecture flags compatibility checks.

This commit extends the MPY file format in a backwards-compatible way to
store an encoded form of the architecture-specific flags that were
specified on the "mpy-cross" command line, or that were explicitly set
as part of a native emitter configuration.

The file format changes are as follows:

* The features byte, previously containing the target native
  architecture and the minor file format version, now claims bit 6 as a
  flag indicating the presence of an encoded architecture flags integer.
* If architecture flags need to be stored, they are placed right after
  the MPY file header.
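
A minimal sketch of the resulting layout of the features byte, written as
standalone Python rather than taken from the patch.  The names
FeatureByte and decode_feature_byte are hypothetical; the bit positions
follow the table this commit adds to the documentation (bit 7 reserved,
bit 6 flags-present, bits 5..2 architecture, bits 1..0 minor version).
The sketch rejects bit 7 up front and then masks with 0x0F, whereas the
decode macro in the patch masks with 0x2F, which keeps bit 7 in the
decoded value.

```python
from typing import NamedTuple

ARCH_FLAGS_PRESENT = 0x40  # bit 6 of the features byte


class FeatureByte(NamedTuple):
    has_arch_flags: bool  # bit 6
    native_arch: int      # bits 5..2
    sub_version: int      # bits 1..0


def decode_feature_byte(value: int) -> FeatureByte:
    """Split the third header byte into its three fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("feature byte must fit in 8 bits")
    if value & 0x80:
        # Assumption of this sketch: treat the reserved bit as a hard error.
        raise ValueError("bit 7 is reserved and must be 0")
    return FeatureByte(
        has_arch_flags=bool(value & ARCH_FLAGS_PRESENT),
        native_arch=(value >> 2) & 0x0F,
        sub_version=value & 0x03,
    )


# Example: flags present, architecture number 11, minor version 2.
example = decode_feature_byte(0x40 | (11 << 2) | 2)
```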

This means that a properly-written MPY parser, on encountering an MPY
file containing encoded architecture flags, should raise an error, since
no architecture identifiers have been defined that make use of bits 6
and 7 of that header byte.  This should be a sufficient guarantee of
backwards compatibility when this feature is used (improper parsers were
already subject to breakage anyway).
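
The argument above can be checked with a few lines of standalone Python.
This is only a sketch: it assumes a pre-change parser decodes the
architecture with a plain "feature_byte >> 2", as the lines removed by
this commit do, and that it rejects any architecture number it has no
identifier for.  The helper name legacy_decode_arch is hypothetical.

```python
ARCH_FLAGS_PRESENT = 0x40  # bit 6 of the features byte


def legacy_decode_arch(feature_byte: int) -> int:
    """Architecture decoding as done before this change (no masking)."""
    return feature_byte >> 2


# For every 4-bit architecture number and 2-bit minor version, setting
# bit 6 pushes the legacy-decoded value to 16 or above, which cannot
# match any identifier that fits in the four architecture bits.
all_out_of_range = all(
    legacy_decode_arch(ARCH_FLAGS_PRESENT | (arch << 2) | sub) >= 16
    for arch in range(16)
    for sub in range(4)
)
```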

The encoded architecture flags could have been placed at the end, but:

* Having them right after the header lets the architecture
  compatibility checks run before the whole file has been read into
  memory (which still happens on certain platforms, as the reader may be
  backed by a memory buffer), and avoids memory allocations that would
  be wasted if the module is rejected early.
* Properly-written MPY file parsers should already have checked that the
  upper two bits of the features byte are zero, as required by the
  format specification in force right before this change, so no
  assumptions should have been made about the order of the chunks in an
  unexpected format.
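
The early-rejection point can be sketched as follows.  This is an
illustration only, not the loader in this commit (which currently
rejects any file that carries flags): check_header, IncompatibleMPY and
the "every requested bit must be supported" policy are assumptions made
for the example, since the meaning of the flags is backend-specific.

```python
import io

ARCH_FLAGS_PRESENT = 0x40  # bit 6 of the features byte


class IncompatibleMPY(ValueError):
    pass


def read_vuint(stream) -> int:
    """Read a variable-length unsigned int (7 bits per byte, MSB group first)."""
    value = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise IncompatibleMPY("truncated vuint")
        value = (value << 7) | (chunk[0] & 0x7F)
        if not chunk[0] & 0x80:  # high bit clear: last byte
            return value


def check_header(stream, supported_flags: int) -> int:
    """Validate the header and flags without touching the rest of the file."""
    header = stream.read(4)
    if len(header) != 4 or header[0] != ord("M"):
        raise IncompatibleMPY("not a valid .mpy file")
    arch_flags = 0
    if header[2] & ARCH_FLAGS_PRESENT:
        arch_flags = read_vuint(stream)
    # Hypothetical policy: every flag bit the file requests must be supported.
    if arch_flags & ~supported_flags:
        raise IncompatibleMPY("incompatible architecture flags")
    return arch_flags


# Header with flags present, followed by the flags value 5 and a payload.
blob = bytes([ord("M"), 6, 0x40 | (11 << 2) | 2, 31, 0x05]) + b"rest-of-file"

stream = io.BytesIO(blob)
try:
    check_header(stream, supported_flags=0x01)
    rejected = False
except IncompatibleMPY:
    rejected = True
bytes_consumed_on_reject = stream.tell()  # only header + vuint were read
```

Because the flags sit directly after the four header bytes, the rejected
stream has been advanced by five bytes only; nothing after the flags
value was read or allocated.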

The meaning of the architecture flags value is backend-specific.  For
the time being, the only common characteristic is that it is stored as a
variable-length encoded unsigned integer.
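
For reference, a small sketch of such a variable-length encoding.  It
assumes the scheme used elsewhere in MPY files: seven payload bits per
byte, most significant group first, with the high bit set on every byte
except the last.  The function names are hypothetical and independent of
the helpers in mpy-tool.py.

```python
def encode_vuint(value: int) -> bytes:
    """Encode a non-negative integer, most significant 7-bit group first."""
    if value < 0:
        raise ValueError("vuint values must be non-negative")
    out = bytearray([value & 0x7F])  # last byte: high bit clear
    value >>= 7
    while value:
        out.insert(0, 0x80 | (value & 0x7F))  # earlier bytes: high bit set
        value >>= 7
    return bytes(out)


def decode_vuint(data: bytes):
    """Return (value, number_of_bytes_consumed) for the vuint at data[0]."""
    value = 0
    for consumed, byte in enumerate(data, start=1):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, consumed
    raise ValueError("truncated vuint")


# Round trip of a flags value that needs two bytes.
encoded = encode_vuint(300)
decoded, length = decode_vuint(encoded + b"\xff")
```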

The changes made to the file format effectively limit the number of
possible target architectures to 16, of which 13 are already claimed.
There aren't that many new architectures planned to be supported for the
lifetime of the current MPY file format, so this change still leaves
space for architecture updates if needed.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
Alessandro Gatti
2025-09-23 00:50:51 +02:00
parent 7373338fa9
commit a6bc1ccbe5
6 changed files with 80 additions and 7 deletions

@@ -80,6 +80,10 @@ If importing an .mpy file fails then try the following:
   above, or by inspecting the ``MPY_CROSS_FLAGS`` Makefile variable for the
   port that you are using.

+* If the third byte of the .mpy file has bit #6 set, then check whether the
+  encoded architecture-specific flag bits vuint is compatible with the
+  target you're importing the file on.
+
 The following table shows the correspondence between MicroPython release
 and .mpy version.
@@ -153,10 +157,31 @@ size field
 ======  ================================
 byte    value 0x4d (ASCII 'M')
 byte    .mpy major version number
-byte    native arch and minor version number (was feature flags in older versions)
+byte    feature flags, native arch, minor version number (was feature flags in older versions)
 byte    number of bits in a small int
 ======  ================================

+The third byte is split as follows (MSB first):
+
+======  ================================
+bit     meaning
+======  ================================
+7       reserved, must be 0
+6       an architecture-specific flags vuint follows the header
+5..2    native arch number
+1..0    minor version number
+======  ================================
+
+Architecture-specific flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If bit #6 of the header's feature flags byte is set, then a vuint containing
+optional architecture-specific information will follow the header. The contents
+of this integer depends on which native architecture the file is meant for.
+
+See also ``mpy-tool.py``'s ``-march-flags`` command-line option to set this
+value when creating MPY files.
+
 The global qstr and constant tables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -93,6 +93,7 @@ static int compile_and_save(const char *file, const char *output_file, const cha
         mp_parse_tree_t parse_tree = mp_parse(lex, MP_PARSE_FILE_INPUT);
         mp_compiled_module_t cm;
         cm.context = m_new_obj(mp_module_context_t);
+        cm.arch_flags = 0;
         mp_compile_to_raw_code(&parse_tree, source_name, false, &cm);

         if ((output_file != NULL && strcmp(output_file, "-") == 0) ||

@@ -220,6 +220,7 @@ typedef struct _mp_compiled_module_t {
     bool has_native;
     size_t n_qstr;
     size_t n_obj;
+    size_t arch_flags;
     #endif
 } mp_compiled_module_t;

@@ -471,7 +471,7 @@ void mp_raw_code_load(mp_reader_t *reader, mp_compiled_module_t *cm) {
         || header[3] > MP_SMALL_INT_BITS) {
         mp_raise_ValueError(MP_ERROR_TEXT("incompatible .mpy file"));
     }
-    if (MPY_FEATURE_DECODE_ARCH(header[2]) != MP_NATIVE_ARCH_NONE) {
+    if (arch != MP_NATIVE_ARCH_NONE) {
         if (!MPY_FEATURE_ARCH_TEST(arch)) {
             if (MPY_FEATURE_ARCH_TEST(MP_NATIVE_ARCH_NONE)) {
                 // On supported ports this can be resolved by enabling feature, eg
@@ -483,6 +483,12 @@ void mp_raw_code_load(mp_reader_t *reader, mp_compiled_module_t *cm) {
         }
     }

+    size_t arch_flags = 0;
+    if (MPY_FEATURE_ARCH_FLAGS_TEST(header[2])) {
+        (void)arch_flags;
+        mp_raise_ValueError(MP_ERROR_TEXT("incompatible .mpy file"));
+    }
+
     size_t n_qstr = read_uint(reader);
     size_t n_obj = read_uint(reader);
     mp_module_context_alloc_tables(cm->context, n_qstr, n_obj);
@@ -504,6 +510,7 @@ void mp_raw_code_load(mp_reader_t *reader, mp_compiled_module_t *cm) {
     cm->has_native = MPY_FEATURE_DECODE_ARCH(header[2]) != MP_NATIVE_ARCH_NONE;
     cm->n_qstr = n_qstr;
     cm->n_obj = n_obj;
+    cm->arch_flags = arch_flags;
     #endif

     // Deregister exception handler and close the reader.
@@ -672,7 +679,7 @@ void mp_raw_code_save(mp_compiled_module_t *cm, mp_print_t *print) {
     byte header[4] = {
         'M',
         MPY_VERSION,
-        cm->has_native ? MPY_FEATURE_ENCODE_SUB_VERSION(MPY_SUB_VERSION) | MPY_FEATURE_ENCODE_ARCH(MPY_FEATURE_ARCH_DYNAMIC) : 0,
+        (cm->arch_flags != 0 ? MPY_FEATURE_ARCH_FLAGS : 0) | (cm->has_native ? MPY_FEATURE_ENCODE_SUB_VERSION(MPY_SUB_VERSION) | MPY_FEATURE_ENCODE_ARCH(MPY_FEATURE_ARCH_DYNAMIC) : 0),
         #if MICROPY_DYNAMIC_COMPILER
         mp_dynamic_compiler.small_int_bits,
         #else
@@ -681,6 +688,10 @@ void mp_raw_code_save(mp_compiled_module_t *cm, mp_print_t *print) {
     };
     mp_print_bytes(print, header, sizeof(header));

+    if (cm->arch_flags) {
+        mp_print_uint(print, cm->arch_flags);
+    }
+
     // Number of entries in constant table.
     mp_print_uint(print, cm->n_qstr);
     mp_print_uint(print, cm->n_obj);

@@ -45,7 +45,7 @@
 // Macros to encode/decode native architecture to/from the feature byte
 #define MPY_FEATURE_ENCODE_ARCH(arch) ((arch) << 2)
-#define MPY_FEATURE_DECODE_ARCH(feat) ((feat) >> 2)
+#define MPY_FEATURE_DECODE_ARCH(feat) (((feat) >> 2) & 0x2F)

 // Define the host architecture
 #if MICROPY_EMIT_X86
@@ -85,6 +85,10 @@
 #define MPY_FILE_HEADER_INT (MPY_VERSION \
     | (MPY_FEATURE_ENCODE_SUB_VERSION(MPY_SUB_VERSION) | MPY_FEATURE_ENCODE_ARCH(MPY_FEATURE_ARCH)) << 8)

+// Architecture-specific flags are present in the .mpy file
+#define MPY_FEATURE_ARCH_FLAGS (0x40)
+#define MPY_FEATURE_ARCH_FLAGS_TEST(x) (((x) & MPY_FEATURE_ARCH_FLAGS) == MPY_FEATURE_ARCH_FLAGS)
+
 enum {
     MP_NATIVE_ARCH_NONE = 0,
     MP_NATIVE_ARCH_X86,

@@ -120,6 +120,8 @@ MP_BC_FORMAT_QSTR = 1
 MP_BC_FORMAT_VAR_UINT = 2
 MP_BC_FORMAT_OFFSET = 3

+MP_NATIVE_ARCH_FLAGS_PRESENT = 0x40
+
 mp_unary_op_method_name = (
     "__pos__",
     "__neg__",
@@ -542,6 +544,7 @@ class CompiledModule:
         mpy_source_file,
         mpy_segments,
         header,
+        arch_flags,
         qstr_table,
         obj_table,
         raw_code,
@@ -554,6 +557,7 @@ class CompiledModule:
         self.mpy_segments = mpy_segments
         self.source_file = qstr_table[0]
         self.header = header
+        self.arch_flags = arch_flags
         self.qstr_table = qstr_table
         self.obj_table = obj_table
         self.raw_code = raw_code
@@ -1339,7 +1343,7 @@ def read_mpy(filename):
         if header[1] != config.MPY_VERSION:
             raise MPYReadError(filename, "incompatible .mpy version")
         feature_byte = header[2]
-        mpy_native_arch = feature_byte >> 2
+        mpy_native_arch = (feature_byte >> 2) & 0x2F
         if mpy_native_arch != MP_NATIVE_ARCH_NONE:
             mpy_sub_version = feature_byte & 3
             if mpy_sub_version != config.MPY_SUB_VERSION:
@@ -1350,6 +1354,11 @@ def read_mpy(filename):
                 raise MPYReadError(filename, "native architecture mismatch")
         config.mp_small_int_bits = header[3]

+        arch_flags = 0
+        # Read the architecture-specific flag bits if present.
+        if (feature_byte & MP_NATIVE_ARCH_FLAGS_PRESENT) != 0:
+            arch_flags = reader.read_uint()
+
         # Read number of qstrs, and number of objects.
         n_qstr = reader.read_uint()
         n_obj = reader.read_uint()
@@ -1378,6 +1387,7 @@ def read_mpy(filename):
         filename,
         segments,
         header,
+        arch_flags,
         qstr_table,
         obj_table,
         raw_code,
@@ -1673,25 +1683,39 @@ def merge_mpy(compiled_modules, output_file):
             merged_mpy.extend(f.read())
     else:
         main_cm_idx = None
+        arch_flags = 0
         for idx, cm in enumerate(compiled_modules):
             feature_byte = cm.header[2]
-            mpy_native_arch = feature_byte >> 2
+            mpy_native_arch = (feature_byte >> 2) & 0x2F
             if mpy_native_arch:
                 # Must use qstr_table and obj_table from this raw_code
                 if main_cm_idx is not None:
                     raise Exception("can't merge files when more than one contains native code")
                 main_cm_idx = idx
+                arch_flags = cm.arch_flags
         if main_cm_idx is not None:
             # Shift main_cm to front of list.
             compiled_modules.insert(0, compiled_modules.pop(main_cm_idx))

+        if config.arch_flags is not None:
+            arch_flags = config.arch_flags
+
         header = bytearray(4)
         header[0] = ord("M")
         header[1] = config.MPY_VERSION
-        header[2] = config.native_arch << 2 | config.MPY_SUB_VERSION if config.native_arch else 0
+        header[2] = (
+            (MP_NATIVE_ARCH_FLAGS_PRESENT if arch_flags != 0 else 0)
+            | config.native_arch << 2
+            | config.MPY_SUB_VERSION
+            if config.native_arch
+            else 0
+        )
         header[3] = config.mp_small_int_bits
         merged_mpy.extend(header)

+        if arch_flags != 0:
+            merged_mpy.extend(mp_encode_uint(arch_flags))
+
         n_qstr = 0
         n_obj = 0
         for cm in compiled_modules:
@@ -1823,6 +1847,12 @@ def main(args=None):
         default=16,
         help="mpz digit size used by target (default 16)",
     )
+    cmd_parser.add_argument(
+        "-march-flags",
+        metavar="F",
+        type=int,
+        help="architecture flags value to set in the output file (strips existing flags if not present)",
+    )
     cmd_parser.add_argument("-o", "--output", default=None, help="output file")
     cmd_parser.add_argument("files", nargs="+", help="input .mpy files")
     args = cmd_parser.parse_args(args)
@@ -1835,6 +1865,7 @@ def main(args=None):
     }[args.mlongint_impl]
     config.MPZ_DIG_SIZE = args.mmpz_dig_size
     config.native_arch = MP_NATIVE_ARCH_NONE
+    config.arch_flags = args.march_flags

     # set config values for qstrs, and get the existing base set of qstrs
     # already in the firmware
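
A side note on the "-march-flags" option added above, as a standalone
sketch that is not part of the patch.  It builds a throwaway parser with
the same argument definition (help text shortened) to show how argparse
exposes the value: the attribute is named march_flags, it is None when
the option is absent, and an explicit 0 is distinct from None, which is
what the "is not None" test in merge_mpy relies on.

```python
import argparse

# Throwaway parser mirroring only the option added by this commit.
parser = argparse.ArgumentParser()
parser.add_argument(
    "-march-flags",
    metavar="F",
    type=int,
    help="architecture flags value to set in the output file",
)

absent = parser.parse_args([])                       # option not given
zero = parser.parse_args(["-march-flags", "0"])      # explicit zero
given = parser.parse_args(["-march-flags", "5"])     # explicit non-zero value
```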