improved drain_frontend

2025-12-15 19:10:08 +01:00 · 2022-01-12 20:33:09 +01:00
parent a1ad0fbf47
commit 39f97457c6
3 changed files with 20 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -152,7 +152,7 @@ Both `nanoBench.sh` and `kernel-nanoBench.sh` support the following command-line
 | `-basic_mode`                | The effect of this option is described in the [Generated Code](#generated-code) section. |
 | `-no_mem`                    | If this option is enabled, the code for `read_perf_ctrs` does not make any memory accesses and stores all performance counter values in registers. This can, for example, be useful for benchmarks that require that the state of the data caches does not change after the execution of `code_init`. *If this option is used, the code to be benchmarked must not modify registers* ***R8-R11 (Intel)*** *and* ***R8-R13 (AMD).*** *Furthermore, `read_perf_ctrs` will modify* ***RAX, RCX, and RDX***. |
 | `-no_normalization`          | If this option is enabled, the measurement results are not divided by the number of repetitions. |
-| `-df`                        | If this option is enabled, the front-end buffers are drained after `code_init`, after `code_late_init`, and after the last instance of `code` by executing a long sequence of 15-Byte `NOP` instructions. |
+| `-df`                        | If this option is enabled, the front-end buffers are drained after `code_init`, after `code_late_init`, and after the last instance of `code` by executing an `LFENCE` instruction, followed by a long sequence of 1-Byte `NOP` instructions, followed by a long sequence of 15-Byte `NOP` instructions. |
 | `-cpu <n>`                   | Pins the measurement thread to CPU n. `[Default: Pin the thread to the CPU it is currently running on.]` |
 | `-verbose`                   | Outputs the results of all performance counter readings. In the user-space version, the results are printed to stdout. The output of the kernel module can be accessed using `dmesg`. |
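
For illustration, the byte sequence that the new `-df` description implies could be assembled roughly as follows. This is a minimal sketch, not the code from this commit: the function name `drain_frontend_bytes` and the repetition counts are hypothetical (the diff only says "a long sequence"), and the particular 15-byte `NOP` encoding is an assumption (a standard prefix-padded long `NOP`), which may differ from the encoding nanoBench actually emits.

```python
# Sketch only: builds machine-code bytes for LFENCE + 1-byte NOPs + 15-byte NOPs.
# The counts and the 15-byte NOP encoding are assumptions, not taken from the commit.

LFENCE = bytes.fromhex("0faee8")  # x86-64 LFENCE
NOP1 = bytes.fromhex("90")        # 1-byte NOP
# 15-byte NOP: six 0x66 prefixes + 0x2E prefix + 8-byte NOP (0F 1F 84 00 00 00 00 00)
NOP15 = bytes.fromhex("6666666666662e0f1f840000000000")


def drain_frontend_bytes(n_nop1: int = 64, n_nop15: int = 64) -> bytes:
    """Return LFENCE, then n_nop1 1-byte NOPs, then n_nop15 15-byte NOPs."""
    return LFENCE + NOP1 * n_nop1 + NOP15 * n_nop15


if __name__ == "__main__":
    seq = drain_frontend_bytes()
    print(len(seq), "bytes")
```

The order follows the README text: the `LFENCE` comes first, then the short `NOP`s, then the long `NOP`s.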