mirror of
https://github.com/RRZE-HPC/OSACA.git
synced 2026-01-05 02:30:08 +01:00
Refactoring of README and more description
229
README.rst
@@ -6,12 +6,9 @@ OSACA
|
|||||||
=====
|
=====
|
||||||
|
|
||||||
Open Source Architecture Code Analyzer
|
Open Source Architecture Code Analyzer
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------------------
|
||||||
|
|
||||||
This tool allows automatic instruction fetching of assembly code,
|
For an innermost loop kernel in assembly, this tool provides automatic instruction fetching of assembly code and automatic runtime prediction, including throughput analysis and detection of the critical path (CP) and loop-carried dependencies (LCDs).
|
||||||
auto-generating of testcases for assembly instructions creating latency
|
|
||||||
and throughput benchmarks on a specific instruction form and throughput
|
|
||||||
analysis and throughput prediction for a innermost loop kernel.
|
|
||||||
|
|
||||||
.. image:: https://travis-ci.org/RRZE-HPC/OSACA.svg?branch=master
|
.. image:: https://travis-ci.org/RRZE-HPC/OSACA.svg?branch=master
|
||||||
:target: https://travis-ci.org/RRZE-HPC/OSACA
|
:target: https://travis-ci.org/RRZE-HPC/OSACA
|
||||||
@@ -26,7 +23,7 @@ Getting started
|
|||||||
===============
|
===============
|
||||||
|
|
||||||
Installation
|
Installation
|
||||||
~~~~~~~~~~~~
|
------------
|
||||||
On most systems with Python pip and setuptools installed, just run:
|
On most systems with Python pip and setuptools installed, just run:
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
@@ -44,7 +41,7 @@ To build OSACA from source, clone this repository using ``git clone https://gith
|
|||||||
After installation, OSACA can be started with the command ``osaca`` in the CLI.
|
After installation, OSACA can be started with the command ``osaca`` in the CLI.
|
||||||
|
|
||||||
Dependencies:
|
Dependencies:
|
||||||
~~~~~~~~~~~~~~~
|
-------------
|
||||||
Additional requirements are:
|
Additional requirements are:
|
||||||
|
|
||||||
- `Python3 <https://www.python.org/>`_
|
- `Python3 <https://www.python.org/>`_
|
||||||
@@ -69,31 +66,37 @@ The usage of OSACA can be listed as:
|
|||||||
|
|
||||||
osaca [-h] [-V] [--arch ARCH] [--fixed] [--db-check]
|
osaca [-h] [-V] [--arch ARCH] [--fixed] [--db-check]
|
||||||
[--import MICROBENCH] [--insert-marker]
|
[--import MICROBENCH] [--insert-marker]
|
||||||
[--export-graph GRAPHNAME] FILEPATH
|
[--export-graph GRAPHNAME] [--ignore-unknown] [--verbose]
|
||||||
|
FILEPATH
|
||||||
|
|
||||||
-h, --help
|
-h, --help
|
||||||
prints out the help message.
|
prints out the help message.
|
||||||
-V, --version
|
-V, --version
|
||||||
shows the program’s version number.
|
shows the program’s version number.
|
||||||
--arch ARCH
|
--arch ARCH
|
||||||
needs to be replaced with the wished architecture abbreviation.
|
needs to be replaced with the target architecture abbreviation.
|
||||||
Possible options are ``SNB``, ``IVB``, ``HSW``, ``BDW``, ``SKX`` and ``CSX`` for the latest Intel micro architectures starting from Intel Sandy Bridge and ``ZEN1`` for AMD Zen (17h family) architecture.
|
Possible options are ``SNB``, ``IVB``, ``HSW``, ``BDW``, ``SKX`` and ``CSX`` for the latest Intel micro architectures starting from Intel Sandy Bridge and ``ZEN1``, ``ZEN2`` for AMD Zen architectures.
|
||||||
Furthermore, ``TX2`` for Marvell's ARM-based ThunderX2 architecture is available.
|
Furthermore, ``TX2`` for Marvell's ARM-based ThunderX2 architecture is available.
|
||||||
--fixed
|
--fixed
|
||||||
Run the throughput analysis with fixed probabilities for all suitable ports per instruction.
|
Run the throughput analysis with fixed port utilization for all suitable ports per instruction.
|
||||||
Otherwise, OSACA will print out the optimal port utilization for the kernel.
|
Otherwise, OSACA will print out the optimal port utilization for the kernel.
|
||||||
--db-check
|
--db-check
|
||||||
Run a sanity check on the database specified by "--arch".
|
Run a sanity check on the database specified by "--arch".
|
||||||
The output depends on the verbosity level.
|
The output depends on the verbosity level.
|
||||||
Keep in mind you have to provide a (dummy) filename in anyway.
|
Keep in mind you have to provide an existing (dummy) filename anyway.
|
||||||
--import MICROBENCH
|
--import MICROBENCH
|
||||||
Import a given microbenchmark output file into the corresponding architecture instruction database.
|
Import a given microbenchmark output file into the corresponding architecture instruction database.
|
||||||
Define the type of microbenchmark either as "ibench" or "asmbench".
|
Define the type of microbenchmark either as "ibench" or "asmbench".
|
||||||
--insert-marker
|
--insert-marker
|
||||||
OSACA calls the Kerncraft module for the interactively insertion of `IACA <https://software.intel.com/en-us/articles/intel-architecture-code-analyzer>`_ marker in suggested assembly blocks.
|
OSACA calls the Kerncraft module for the interactive insertion of `IACA <https://software.intel.com/en-us/articles/intel-architecture-code-analyzer>`_ byte markers or OSACA AArch64 byte markers in suggested assembly blocks.
|
||||||
--export-graph EXPORT_PATH
|
--export-graph EXPORT_PATH
|
||||||
Output path for .dot file export. If "." is given, the file will be stored as "./osaca_dg.dot".
|
Output path for .dot file export. If "." is given, the file will be stored as "./osaca_dg.dot".
|
||||||
After the file was created, you can convert it to a PDF file using dot: `dot -Tpdf osaca_dg.dot -o osaca_dependency_graph.pdf`
|
After the file was created, you can convert it to a PDF file using `dot <https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf>`_.
|
||||||
|
--ignore-unknown
|
||||||
|
Force OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms.
|
||||||
|
If not specified, a warning will be printed instead if one or more instruction forms are unknown to OSACA.
|
||||||
|
-v, --verbose
|
||||||
|
Increases verbosity level
|
||||||
|
|
||||||
The **FILEPATH** describes the filepath to the file to work with and is always necessary
|
The **FILEPATH** describes the filepath to the file to work with and is always necessary
|
||||||
|
|
||||||
@@ -102,50 +105,179 @@ ______________________
|
|||||||
Hereinafter OSACA's scope of function will be described.
|
Hereinafter OSACA's scope of function will be described.
|
||||||
|
|
||||||
Throughput & Latency analysis
|
Throughput & Latency analysis
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-----------------------------
|
||||||
As main functionality of OSACA this process starts by default. It is always necessary to specify the core architecture by the flag ``--arch ARCH``, where ``ARCH`` can stand for ``SNB``, ``IVB``, ``HSW``, ``BDW``, ``SKX``, ``CSX``, ``ZEN`` or ``TX2``.
|
As main functionality of OSACA, the tool starts the analysis on a marked assembly file by running the following command with one or more of the optional parameters:
|
||||||
|
|
||||||
For extracting the right kernel, one has to mark it beforehand.
|
.. code-block:: bash
|
||||||
Currently, only the detechtion of markers in the assembly code and therefore the analysis of assemly files is supported by OSACA.
|
|
||||||
|
|
||||||
**Assembly code**
|
osaca --arch ARCH [--fixed] [--ignore-unknown]
|
||||||
|
[--export-graph EXPORT_PATH]
|
||||||
|
file
|
||||||
|
|
||||||
|
The ``file`` parameter specifies the target assembly file and is always mandatory. |br|
|
||||||
|
The parameter ``ARCH`` is required for the analysis and must be replaced by the target architecture abbreviation. |br|
|
||||||
|
OSACA assumes an optimal scheduling for all instructions and assumes the processor to be able to schedule instructions in a way that it achieves a minimal reciprocal throughput.
|
||||||
|
However, in older versions (<=v0.2.2) of OSACA, a fixed probability for port utilization was assumed.
|
||||||
|
This means, instructions with *N* available ports for execution were scheduled with a probability of *1/N* to each of the ports.
|
||||||
|
This behavior can be enforced by using the ``--fixed`` flag. |br|
|
||||||
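The fixed scheduling model described above can be illustrated with a minimal sketch. It is not OSACA's implementation; the kernel data, port names, and the function ``fixed_port_pressure`` are hypothetical and only show how a probability of *1/N* per eligible port translates into per-port pressure.

```python
from collections import defaultdict


def fixed_port_pressure(kernel):
    """Spread each instruction's cycles evenly over its N eligible ports."""
    pressure = defaultdict(float)
    for cycles, ports in kernel:
        share = cycles / len(ports)  # probability 1/N for each of the N ports
        for port in ports:
            pressure[port] += share
    return dict(pressure)


# Hypothetical kernel: (cycles of port occupancy, eligible ports)
kernel = [
    (1.0, ["0", "1"]),       # instruction that may run on ports 0 and 1
    (1.0, ["0", "1", "5"]),  # instruction that may run on ports 0, 1 and 5
    (1.0, ["2", "3"]),       # instruction that may run on ports 2 and 3
]

pressure = fixed_port_pressure(kernel)
# The most heavily loaded port bounds the throughput in this simplified model.
bottleneck = max(pressure.values())
```

In this simplified model, the port with the highest accumulated pressure determines the predicted cycles per iteration, whereas the default mode would instead search for the assignment that minimizes this maximum.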
|
If one or more instruction forms are unknown to OSACA, it refuses to print an overall throughput, CP and
|
||||||
|
LCD analysis and marks all unknown instruction forms with ``X`` next to the mnemonic.
|
||||||
|
This is done so that the user does not overlook the unrecognized instructions and assume an incorrect runtime prediction.
|
||||||
|
To force OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms, the flag ``--ignore-unknown`` can be specified. |br|
|
||||||
|
To get a visualization of the analyzed kernel and its dependency chains, OSACA provides the option to additionally produce a graph as DOT file, which represents the kernel and all register dependencies inside of it.
|
||||||
|
The tool highlights all LCDs and the CP.
|
||||||
|
The graph generation is done by running OSACA with the ``--export-graph EXPORT_PATH`` flag.
|
||||||
|
OSACA stores the DOT file either at the filepath specified by ``EXPORT_PATH`` or under the default filename "osaca_dg.dot" in the current working directory.
|
||||||
|
Subsequently, the DOT-graph can be adjusted in its appearance and converted to various output formats such as PDF, SVG, or PNG using the `dot command <https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf>`_, e.g., ``dot -Tpdf osaca_dg.dot -o
|
||||||
|
graph.pdf`` to generate a PDF document.
|
||||||
|
|
||||||
|
Marker insertion
|
||||||
|
----------------
|
||||||
|
To extract the right kernel, one has to mark it beforehand.
|
||||||
|
Currently, only the detection of markers in the assembly code and therefore the analysis of assembly files is supported by OSACA.
|
||||||
|
|
||||||
Marking a kernel means inserting the byte markers in the assembly file before and after the loop.
|
Marking a kernel means inserting the byte markers in the assembly file before and after the loop.
|
||||||
For this, the start marker has to be inserted right in front of the loop label and the end marker directly after the jump instruction.
|
For this, the start marker has to be inserted right in front of the loop label and the end marker directly after the jump instruction.
|
||||||
For the convience of the user, in x86 assembly IACA byte markers are used.
|
IACA requires byte markers since it operates on the opcode level.
|
||||||
|
To provide a trade-off between reusability for such tools and convenient usability, OSACA supports both byte markers and comment line markers.
|
||||||
|
While the byte markers for x86 are equivalent to IACA byte markers, the comment keywords ``OSACA-BEGIN`` and ``OSACA-END`` are based on LLVM-MCA's markers.
|
||||||
|
|
||||||
**x86 Byte Markers**
|
x86 markers
|
||||||
|
^^^^^^^^^^^
|
||||||
.. code-block:: gas
|
**Byte markers**
|
||||||
|
|
||||||
movl $111,%ebx #IACA/OSACA START MARKER
|
|
||||||
.byte 100,103,144 #IACA/OSACA START MARKER
|
|
||||||
Loop:
|
|
||||||
# ...
|
|
||||||
movl $222,%ebx #IACA/OSACA END MARKER
|
|
||||||
.byte 100,103,144 #IACA/OSACA END MARKER
|
|
||||||
|
|
||||||
**AArch64 Byte Markers**
|
|
||||||
|
|
||||||
.. code-block:: asm
|
.. code-block:: asm
|
||||||
|
|
||||||
mov x1, #111 // OSACA START
|
movl $111,%ebx #IACA/OSACA START MARKER
|
||||||
.byte 213,3,32,31 // OSACA START
|
.byte 100,103,144 #IACA/OSACA START MARKER
|
||||||
\\ ...
|
.loop:
|
||||||
mov x1, #222 // OSACA END
|
# loop body
|
||||||
.byte 213,3,32,31 // OSACA END
|
jb .loop
|
||||||
|
movl $222,%ebx #IACA/OSACA END MARKER
|
||||||
|
.byte 100,103,144 #IACA/OSACA END MARKER
|
||||||
|
|
||||||
.. Include new measurements into the data file
|
**Comment line markers**
|
||||||
.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
.. Running OSACA with the flag ``-i`` or ``--include-ibench`` and a specified micro architecture ``ARCH``, it takes the values given in an ibench output file and checks them for reasonability. If a value is not in the data file already, it will be added, otherwise OSACA prints out a warning message and keeps the old value in the data file. If a value does not pass the validation, a warning message is shown, however, OSACA will keep working with the new value. The handling of ibench is shortly described in the example section below.
|
|
||||||
|
|
||||||
Insert IACA markers
|
.. code-block:: asm
|
||||||
~~~~~~~~~~~~~~~~~~~
|
|
||||||
Using the ``--insert-marker`` flags for a given file, OSACA calls the implemented Kerncraft module for identifying and marking the inner-loop block in *manual mode*. More information about how this is done can be found in the `Kerncraft repository <https://github.com/RRZE-HPC/kerncraft>`_.
|
|
||||||
Note that this currrently only works for x86 loop kernels
|
|
||||||
|
|
||||||
Example
|
# OSACA-BEGIN
|
||||||
=======
|
.loop:
|
||||||
|
# loop body
|
||||||
|
jb .loop
|
||||||
|
# OSACA-END
|
||||||
|
|
||||||
|
AArch64 markers
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
**Byte markers**
|
||||||
|
|
||||||
|
.. code-block:: asm
|
||||||
|
|
||||||
|
mov x1, #111 // OSACA START
|
||||||
|
.byte 213,3,32,31 // OSACA START
|
||||||
|
.loop:
|
||||||
|
// loop body
|
||||||
|
b.ne .loop
|
||||||
|
mov x1, #222 // OSACA END
|
||||||
|
.byte 213,3,32,31 // OSACA END
|
||||||
|
|
||||||
|
**Comment line markers**
|
||||||
|
|
||||||
|
.. code-block:: asm
|
||||||
|
|
||||||
|
// OSACA-BEGIN
|
||||||
|
.loop:
|
||||||
|
// loop body
|
||||||
|
b.ne .loop
|
||||||
|
// OSACA-END
|
||||||
|
|
||||||
|
OSACA in combination with Kerncraft provides a functionality for the automatic detection of possible loop kernels and inserting markers.
|
||||||
|
This can be done by using the ``--insert-marker`` flag together with the path to the target assembly file and the target architecture.
|
||||||
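As a rough illustration of how comment line markers delimit a kernel, the following sketch extracts the lines between ``OSACA-BEGIN`` and ``OSACA-END``. It is not OSACA's parser; the function name ``extract_marked_kernel`` is hypothetical, and the sketch assumes that each marker stands alone on a comment line starting with ``#`` (x86) or ``//`` (AArch64).

```python
def extract_marked_kernel(lines):
    """Return the lines between the OSACA-BEGIN and OSACA-END comment markers."""
    start = end = None
    for idx, line in enumerate(lines):
        stripped = line.strip()
        # Accept both x86 ('#') and AArch64 ('//') comment prefixes.
        if stripped.startswith(("#", "//")):
            text = stripped.lstrip("#/").strip()
            if text == "OSACA-BEGIN" and start is None:
                start = idx + 1
            elif text == "OSACA-END" and start is not None:
                end = idx
                break
    if start is None or end is None:
        raise ValueError("kernel markers not found")
    return lines[start:end]


asm = """\
    // OSACA-BEGIN
.loop:
    // loop body
    b.ne .loop
    // OSACA-END
""".splitlines()

kernel = extract_marked_kernel(asm)
```

The returned list contains the loop label, the loop body, and the jump instruction, which matches the marker placement rule stated above.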
|
|
||||||
|
Benchmark import
|
||||||
|
----------------
|
||||||
|
OSACA supports the automatic integration of new instruction forms by parsing the output of the micro-
|
||||||
|
benchmark tools `asmbench <https://github.com/RRZE-HPC/asmbench>`_ and `ibench <https://github.com/RRZE-HPC/ibench>`_.
|
||||||
|
This can be achieved by running OSACA with the command line option ``--import MICROBENCH``:
|
||||||
|
|
||||||
|
.. code-block:: bash
|
||||||
|
|
||||||
|
osaca --arch ARCH --import MICROBENCH file
|
||||||
|
|
||||||
|
``MICROBENCH`` specifies one of the currently supported benchmark tools, i.e., "asmbench" or "ibench".
|
||||||
|
``ARCH`` defines the abbreviation of the target architecture for which the instructions will be added, and ``file`` must be the path to the generated output file of the benchmark.
|
||||||
|
The format of this file has to match either the basic command line output of ibench, e.g.,
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
[INSTRUCTION FORM]-TP: 0.500 (clock cycles) [DEBUG - result: 1.000000]
|
||||||
|
[INSTRUCTION FORM]-LT: 4.000 (clock cycles) [DEBUG - result: 1.000000]
|
||||||
|
|
||||||
|
or the command line output of asmbench including the name of the instruction form in a separate line at the
|
||||||
|
beginning, e.g.:
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
[INSTRUCTION FORM]
|
||||||
|
Latency: 4.00 cycle
|
||||||
|
Throughput: 0.50 cycle
|
||||||
|
|
||||||
|
|
||||||
|
Note that there must be an empty line after each throughput measurement as part of the output so that one instruction form entry consists of four (4) lines.
|
||||||
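The four-line layout of an asmbench-style entry can be read with a few lines of Python. This is only a sketch under the assumption that the file follows exactly the layout shown above (name, latency line, throughput line, empty line); ``parse_asmbench`` is a hypothetical helper, not part of OSACA, and the second sample entry uses made-up values.

```python
def parse_asmbench(text):
    """Parse four-line entries into {instruction form: (latency, throughput)}."""
    lines = text.splitlines()
    entries = {}
    # Each entry: name, latency line, throughput line, empty line.
    for i in range(0, len(lines) - 2, 4):
        name = lines[i].strip()
        latency = float(lines[i + 1].split(":")[1].split()[0])
        throughput = float(lines[i + 2].split(":")[1].split()[0])
        entries[name] = (latency, throughput)
    return entries


# Sample input; the values of the second entry are made up for illustration.
sample = """\
vaddpd-x_x_x
Latency: 4.00 cycle
Throughput: 0.50 cycle

mov-r_mboi
Latency: 5.00 cycle
Throughput: 0.50 cycle

"""

entries = parse_asmbench(sample)
```

Stepping through the file in strides of four is what makes the trailing empty line after each throughput measurement significant.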
|
|
||||||
|
To let OSACA import the instruction form with the correct operands, the naming conventions for the instruction form name must be followed:
|
||||||
|
|
||||||
|
* The first part of the name is the mnemonic and ends with the character "``-``" (not part of the mnemonic in the DB).
|
||||||
|
|
||||||
|
* The second part of the name consists of the operands. Each operand must be separated from another operand by the character "``_``".
|
||||||
|
|
||||||
|
* For each **x86** operand, one of the following symbols must be used:
|
||||||
|
|
||||||
|
* "``r``" for general purpose registers (rax, edi, r9, ...)
|
||||||
|
* "``x``", "``y``", or "``z``" for xmm, ymm, or zmm registers, respectively
|
||||||
|
* "``i``" for immediates
|
||||||
|
* "``m``" for a memory address. Add "``b``" if the memory address contains a base register, "``o``" if it contains an offset,
|
||||||
|
"``i``" if it contains an index register, and "``s``" if the index register additionally has a scale factor of *more* than 1.
|
||||||
|
|
||||||
|
* For each **AArch64** operand, one of the following symbols must be used:
|
||||||
|
|
||||||
|
* "``w``", "``x``", "``b``", "``h``", "``s``", "``d``", or "``q``" for registers with the corresponding prefix.
|
||||||
|
* "``v``" followed by a single character ("``b``", "``h``", "``s``", or "``d``") for vector registers with the corresponding lane width of the second character.
|
||||||
|
If no second character is given, OSACA assumes a lane width of 64 bit (``d``) as default.
|
||||||
|
* "``i``" for immediates
|
||||||
|
* "``m``" for a memory address. Add "``b``" if the memory address contains a base register, "``o``" if it contains an offset,
|
||||||
|
"``i``" if it contains an index register, and "``s``" if the index register additionally has a scale factor of *more*
|
||||||
|
than 1. Add "``r``" if the address format uses pre-indexing and "``p``" if it uses post-indexing.
|
||||||
|
|
||||||
|
Valid instruction form examples for x86 are ``vaddpd-x_x_x``, ``mov-r_mboi``, and ``vfmadd213pd-mbis_y_y``. |br|
|
||||||
|
Valid instruction form examples for AArch64 are ``fadd-vd_vd_v``, ``ldp-d_d_mo``, and ``fmov-s_i``. |br|
|
||||||
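To make the naming convention concrete, the sketch below splits an instruction form name into mnemonic and operand symbols and spells out the x86 symbols listed above. Both function names are hypothetical and the code is an illustration of the convention, not OSACA's import logic.

```python
def split_instruction_form(name):
    """Split 'mnemonic-op1_op2_...' into the mnemonic and a list of operand symbols."""
    mnemonic, _, operand_part = name.partition("-")
    operands = operand_part.split("_") if operand_part else []
    return mnemonic, operands


def describe_x86_operand(symbol):
    """Translate one x86 operand symbol into a readable description."""
    kinds = {
        "r": "general purpose register",
        "x": "xmm register",
        "y": "ymm register",
        "z": "zmm register",
        "i": "immediate",
    }
    if symbol in kinds:
        return kinds[symbol]
    if symbol.startswith("m"):
        # Letters after 'm' describe the parts of the memory address.
        flags = {"b": "base", "o": "offset", "i": "index", "s": "scale"}
        parts = [flags[c] for c in symbol[1:]]
        return "memory (" + ", ".join(parts) + ")" if parts else "memory"
    raise ValueError("unknown operand symbol: " + symbol)


mnemonic, operands = split_instruction_form("vfmadd213pd-mbis_y_y")
descriptions = [describe_x86_operand(op) for op in operands]
```

For ``vfmadd213pd-mbis_y_y`` this yields the mnemonic ``vfmadd213pd``, one memory operand with base, index, and scale, and two ymm registers.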
|
|
||||||
|
Note that the options to define operands are limited, therefore, one might need to adjust the instruction forms in the architecture DB after importing.
|
||||||
|
OSACA parses the output for an arbitrary number of instruction forms and adds them as entries to the architecture DB.
|
||||||
|
The user must edit the ISA DB in case the instruction form shows irregular source and destination operands for its ISA syntax. OSACA applies the following rules by default:
|
||||||
|
|
||||||
|
* If there is only one operand, it is considered as source operand
|
||||||
|
|
||||||
|
* In case of multiple operands the target operand (depending on the ISA syntax the last or first one) is considered to be the
|
||||||
|
destination operand, all others are considered as source operands.
|
||||||
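The two default rules can be sketched as follows. The sketch assumes that x86 instruction forms follow AT&T syntax (destination last, as in the marker examples) and that AArch64 forms list the destination first; ``classify_operands`` and the ISA names ``"x86"`` and ``"aarch64"`` are hypothetical and not taken from OSACA.

```python
def classify_operands(operands, isa):
    """Apply the default source/destination rules to a list of operand symbols."""
    operands = list(operands)
    if len(operands) <= 1:
        # A single operand is considered a source operand.
        return {"source": operands, "destination": []}
    if isa == "x86":
        # Assumption: AT&T syntax, so the last operand is the destination.
        return {"source": operands[:-1], "destination": [operands[-1]]}
    if isa == "aarch64":
        # Assumption: the first operand is the destination.
        return {"source": operands[1:], "destination": [operands[0]]}
    raise ValueError("unsupported ISA: " + isa)


x86_result = classify_operands(["x", "x", "x"], "x86")
arm_result = classify_operands(["d", "d", "mo"], "aarch64")
single_result = classify_operands(["i"], "x86")
```

Instruction forms that deviate from these defaults, for example ones that write to more than one operand, are the cases where the ISA DB has to be edited by hand.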
|
|
||||||
|
Database check
|
||||||
|
--------------
|
||||||
|
Since a manual adjustment of the ISA DB is currently indispensable when adding new instruction forms,
|
||||||
|
OSACA provides a database sanity check using the ``--db-check`` flag. It can be executed via:
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
osaca --arch ARCH --db-check [-v] file
|
||||||
|
|
||||||
|
``ARCH`` defines the abbreviation of the target architecture of the database to check.
|
||||||
|
The ``file`` argument needs to be specified as it is positional but may be any existing dummy path.
|
||||||
|
When called, OSACA prints a summary of database information containing the number of missing throughput values, latency values or μ-ops assignments for an instruction form.
|
||||||
|
Furthermore, it shows the number of duplicate instruction forms in both the architecture DB and the ISA DB and checks how many instruction forms in the ISA DB are non-existent in the architecture DB.
|
||||||
|
Finally, it checks via simple heuristics how many of the instruction forms contained in the architecture DB might miss an ISA DB entry.
|
||||||
|
When the database check is run with the ``-v`` verbosity flag, OSACA additionally prints the names of the identified instruction forms so that the user can inspect the reported incidents.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
For clarifying the functionality of OSACA a sample kernel is analyzed for an Intel CSX core hereafter:
|
For clarifying the functionality of OSACA a sample kernel is analyzed for an Intel CSX core hereafter:
|
||||||
|
|
||||||
.. code-block:: c
|
.. code-block:: c
|
||||||
@@ -212,3 +344,8 @@ Implementation: Jan Laukemann
|
|||||||
License
|
License
|
||||||
=======
|
=======
|
||||||
`AGPL-3.0 </LICENSE>`_
|
`AGPL-3.0 </LICENSE>`_
|
||||||
|
|
||||||
|
.. # define a hard line break for HTML
|
||||||
|
.. |br| raw:: html
|
||||||
|
|
||||||
|
<br />
|
||||||
|
|||||||