Mirror of https://github.com/RRZE-HPC/OSACA.git (synced 2025-12-16 09:00:05 +01:00)
Added explanation for kernels
The compilers used were Intel Parallel Studio 19.0up05 and GNU 9.1.0 i…

To analyze the kernels with OSACA, run
```
osaca --arch ARCH FILE
```
While all Zen and TX2 kernels use the comment-style OSACA markers, the kernels for Intel Cascade Lake (`*.csx.*.s`) use byte markers so that they can also be analyzed by IACA.
To do so, assemble the file and run IACA on the resulting object file:
```
gcc -c FILE.s
iaca -arch SKX FILE.o
```

------------
The kernels currently included in the examples are briefly shown below.

### Copy (`copy/`)
```c
double * restrict a, * restrict b;

for(long i=0; i < size; ++i){
    /* ... */
}
```

### Vector add (`add/`)
```c
double * restrict a, * restrict b, * restrict c;

for(long i=0; i < size; ++i){
    a[i] = b[i] + c[i];
}
```

### Vector update (`update/`)
```c
double * restrict a;

for(long i=0; i < size; ++i){
    a[i] = scale * a[i];
}
```

### Sum reduction (`sum_reduction/`)
```c
double * restrict a;

for(long i=0; i < size; ++i){
    scale = scale + a[i];
}
```
For this kernel, we noticed an overlap of the loop bodies when compiling with gcc and the `-Ofast` flag (see this [blog post](https://blogs.fau.de/hager/archives/7658) for more information).
We therefore also compiled all gcc versions with `-O3` instead of `-Ofast`.
These versions are named accordingly.

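A sketch of why the flag matters, not a description of the code gcc actually emitted for these examples: `-Ofast` enables `-ffast-math`, which permits gcc to reassociate floating-point additions, while at `-O3` the additions must happen in source order. The C code below contrasts the single loop-carried dependency chain of the kernel above with a four-accumulator variant of the kind such reassociation allows. The function names `sum_serial` and `sum_4way` are hypothetical and do not come from the example files.

```c
#include <assert.h>

/* Hypothetical wrapper around the README kernel: every addition depends
   on the result of the previous one (one dependency chain via `scale`). */
double sum_serial(const double * restrict a, long size, double scale)
{
    for (long i = 0; i < size; ++i) {
        scale = scale + a[i];
    }
    return scale;
}

/* Hypothetical reassociated variant with four independent partial sums.
   A compiler may only do this under relaxed FP semantics (e.g. -ffast-math,
   implied by -Ofast), because the rounding can differ from sum_serial. */
double sum_4way(const double * restrict a, long size, double scale)
{
    assert(size >= 0);
    double s0 = scale, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    long i = 0;
    for (; i + 3 < size; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < size; ++i) {  /* remainder loop */
        s0 += a[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

For general inputs the two functions can differ in the last bits, which is why gcc does not apply this transformation by default.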
### DAXPY (`daxpy/`)
```c
double * restrict a, * restrict b;

for(long i=0; i < size; ++i){
    a[i] = a[i] + scale * b[i];
}
```

### STREAM triad (`triad/`)
```c
double * restrict a, * restrict b, * restrict c;

for(long i=0; i < size; ++i){
    a[i] = b[i] + scale * c[i];
}
```

### Schönauer triad (`striad/`)
```c
double * restrict a, * restrict b, * restrict c, * restrict d;

for(long i=0; i < size; ++i){
    a[i] = b[i] + c[i] * d[i];
}
```

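The streaming kernels above are loop fragments in which `size` and `scale` are declared elsewhere. As a minimal, self-contained sketch, each fragment can be wrapped in a function and checked on small arrays. The function names and the choice to pass `size` and `scale` as parameters are assumptions for illustration, not the layout of the actual example sources.

```c
#include <assert.h>

/* Hypothetical wrappers around the README kernel fragments. */

/* Vector add: a[i] = b[i] + c[i] */
void vec_add(double * restrict a, const double * restrict b,
             const double * restrict c, long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = b[i] + c[i];
    }
}

/* Vector update: in-place scaling of a single array */
void vec_update(double * restrict a, double scale, long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = scale * a[i];
    }
}

/* DAXPY: a is both read and written */
void daxpy(double * restrict a, const double * restrict b,
           double scale, long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = a[i] + scale * b[i];
    }
}

/* STREAM triad: a[i] = b[i] + scale * c[i] */
void stream_triad(double * restrict a, const double * restrict b,
                  const double * restrict c, double scale, long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = b[i] + scale * c[i];
    }
}

/* Schönauer triad: a[i] = b[i] + c[i] * d[i] */
void schoenauer_triad(double * restrict a, const double * restrict b,
                      const double * restrict c, const double * restrict d,
                      long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = b[i] + c[i] * d[i];
    }
}
```

The `restrict` qualifiers promise the compiler that the arrays do not overlap, so a call such as `vec_add(a, a, c, n)` would break that promise and has undefined behavior.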
### Gauss-Seidel method (`gs/`)
```c
double ** restrict a;

for(long k=1; k < size_k-1; ++k){
    for(long i=1; i < size_i-1; ++i){
        a[k][i] = scale * (
            a[k][i-1] + a[k+1][i]
            + a[k][i+1] + a[k-1][i]
        );
    }
}
```

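Because Gauss-Seidel updates `a` in place, `a[k][i-1]` has already been overwritten in the current sweep when `a[k][i]` is computed, so consecutive inner-loop iterations form a loop-carried dependency. The sketch below makes the fragment runnable under stated assumptions: the grid is allocated as an array of row pointers to match the `double **` declaration, and the helper names `alloc_grid`, `free_grid` and `gauss_seidel_sweep` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a size_k x size_i grid as an array of row
   pointers into one contiguous, zero-initialized data block. */
double **alloc_grid(long size_k, long size_i)
{
    assert(size_k > 0 && size_i > 0);
    double **rows = malloc((size_t)size_k * sizeof *rows);
    double *data = calloc((size_t)size_k * (size_t)size_i, sizeof *data);
    if (rows == NULL || data == NULL) {
        free(rows);
        free(data);
        return NULL;
    }
    for (long k = 0; k < size_k; ++k) {
        rows[k] = data + k * size_i;
    }
    return rows;
}

/* Hypothetical helper: release a grid created by alloc_grid. */
void free_grid(double **grid)
{
    if (grid != NULL) {
        free(grid[0]);  /* the contiguous data block */
        free(grid);     /* the row-pointer array */
    }
}

/* The Gauss-Seidel kernel wrapped as one in-place sweep over the interior
   points: a[k][i-1] and a[k-1][i] already hold values from this sweep. */
void gauss_seidel_sweep(double ** restrict a, long size_k, long size_i,
                        double scale)
{
    for (long k = 1; k < size_k - 1; ++k) {
        for (long i = 1; i < size_i - 1; ++i) {
            a[k][i] = scale * (a[k][i-1] + a[k+1][i]
                             + a[k][i+1] + a[k-1][i]);
        }
    }
}
```

On a 3x4 grid with all boundary values set to 4 and a zero interior, one sweep with `scale = 0.25` gives 3 at `a[1][1]` and then 3.75 at `a[1][2]`, because the second point already sees the updated value of the first.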
### Jacobi 2D (`j2d/`)
```c
double ** restrict a, ** restrict b;

for(long k=1; k < size_k-1; ++k){
    for(long i=1; i < size_i-1; ++i){
        a[k][i] = 0.25 * (
            b[k][i-1] + b[k+1][i]
            + b[k][i+1] + b[k-1][i]
        );
    }
}
```
For this kernel, we noticed a discrepancy between measurements and predictions, especially when using AVX-512 instructions.
We therefore additionally compiled the x86 kernels with AVX/SSE instructions and marked those kernels accordingly.

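Unlike Gauss-Seidel, the Jacobi kernel reads only from `b` and writes only to `a`, so its inner-loop iterations do not depend on each other. The sketch below wraps the fragment in a function and, as an assumption about how repeated sweeps might be driven, swaps the roles of the two grids after each sweep. `jacobi_sweep` and `jacobi_iterate` are hypothetical names; the grids are arrays of row pointers to match the `double **` declaration.

```c
#include <assert.h>

/* Hypothetical wrapper: one Jacobi sweep over the interior points.
   Reads only from b and writes only to a (the grids must not overlap). */
void jacobi_sweep(double ** restrict a, double ** restrict b,
                  long size_k, long size_i)
{
    for (long k = 1; k < size_k - 1; ++k) {
        for (long i = 1; i < size_i - 1; ++i) {
            a[k][i] = 0.25 * (b[k][i-1] + b[k+1][i]
                            + b[k][i+1] + b[k-1][i]);
        }
    }
}

/* Hypothetical driver: the initial state is expected in b, and both grids
   must carry the same boundary values. After each sweep the grids swap
   roles, so the newest values become the input of the next sweep.
   Returns the grid holding the most recent result. */
double **jacobi_iterate(double **a, double **b,
                        long size_k, long size_i, int sweeps)
{
    assert(sweeps >= 0);
    for (int s = 0; s < sweeps; ++s) {
        jacobi_sweep(a, b, size_k, size_i);
        double **tmp = a;  /* a now holds the new values ... */
        a = b;
        b = tmp;           /* ... and is read in the next sweep */
    }
    return b;
}
```

Keeping the boundary values identical in both grids matters because `jacobi_sweep` only writes interior points, so each grid's boundary is read unchanged in every later sweep.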