diff --git a/examples/README.md b/examples/README.md index 8fac95f..338d57e 100644 --- a/examples/README.md +++ b/examples/README.md @@ -5,18 +5,19 @@ The used compilers were Intel Parallel Studio 19.0up05 and GNU 9.1.0 i To analyze the kernels with OSACA, run ``` -osaca --arch ARCH filepath +osaca --arch ARCH FILE ``` While all Zen and TX2 kernels use the comment-style OSACA markers, the kernels for Intel Cascade Lake (*.csx.*.s) use the byte markers to be able to be analyzed by IACA as well. For this use ``` -iaca -arch SKX filepath +gcc -c FILE.s +iaca -arch SKX FILE.o ``` ------------ -The kernels will be explained briefly in the following. +The kernels currently contained in the examples are shown briefly in the following. -### Copy +### Copy (`copy/`) ```c double * restrict a, * restrict b; @@ -25,19 +26,89 @@ for(long i=0; i < size; ++i){ } ``` -### Vector add +### Vector add (`add/`) +```c +double * restrict a, * restrict b, * restrict c; -### Vector update +for(long i=0; i < size; ++i){ + a[i] = b[i] + c[i]; +} +``` -### Sum reduction +### Vector update (`update/`) +```c +double * restrict a; -### DAXPY +for(long i=0; i < size; ++i){ + a[i] = scale * a[i]; +} +``` -### STREAM triad +### Sum reduction (`sum_reduction/`) +```c +double * restrict a; -### Schönauer triad +for(long i=0; i < size; ++i){ + scale = scale + a[i]; +} +``` +For this kernel we noticed an overlap of the loop bodies when using gcc with `-Ofast` flag (see this [blog post](https://blogs.fau.de/hager/archives/7658) for more information). +We therefore compiled all gcc version additionally with `-O3` flag instead. +These versions are named accordingly. -### Gauss-Seidel method +### DAXPY (`daxpy/`) +```c +double * restrict a, * restrict b; -### Jacobi 2D +for(long i=0; i < size; ++i){ + a[i] = a[i] + scale * b[i]; +} +``` +### STREAM triad (`triad/`) +```c +double * restrict a, * restrict b, * restrict c; + +for(long i=0; i < size; ++i){ + a[i] = b[i] + scale * c[i]; +} +``` + +### Schönauer triad (`striad/`) +```c +double * restrict a, * restrict b, * restrict c, * restrict d; + +for(long i=0; i < size; ++i){ + a[i] = b[i] + c[i] * d[i]; +} +``` + +### Gauss-Seidel method (`gs/`) +```c +double ** restrict a; + +for(long k=1; k < size_k-1; ++k){ + for(long i=1; i < size_i-1; ++i){ + a[k][i] = scale * ( + a[k][i-1] + a[k+1][i] + + a[k][i+1] + a[k-1][i] + ); + } +} +``` + +### Jacobi 2D (`j2d/`) +```c +double ** restrict a, ** restrict b; + +for(long k=1; k < size_k-1; ++k){ + for(long i=1; i < size_i-1; ++i){ + a[k][i] = 0.25 * ( + b[k][i-1] + b[k+1][i] + + b[k][i+1] + b[k-1][i] + ); + } +} +``` +For this kernel we noticed a discrepancy between measurements and predcitions especially when using AVX-512 instructions. +We therefore compiled the x86 kernels additionally with AVX/SSE instruction and marekd those kernels accordingly.