nanoBench Cache Analyzer

This commit is contained in:
Andreas Abel
2019-09-24 14:48:08 +02:00
parent 5d651050da
commit 9c075f04a5
12 changed files with 2258 additions and 0 deletions


@@ -0,0 +1,133 @@
# nanoBench Cache Analyzer
This folder contains several tools for analyzing caches using hardware performance counters.
Results for recent Intel CPUs that were obtained using these tools are available on [uops.info/cache.html](https://uops.info/cache.html).
Make sure to read the [Prerequisites](#prerequisites) section before trying any of the tools.
# Tools
## cacheSeq.py
This tool can be used to measure how many cache hits and misses executing an access sequence generates.
As an example, consider the following call:
sudo ./cacheSeq.py -level 2 -sets 10-14,20,35 -seq "A B C D A? C! B?"
The tool will make memory accesses to four different blocks in all of the specified sets of the second-level cache. Elements of the sequence that end with a `?` will be included in the performance counter measurements; the other elements will be accessed, but the number of hits/misses they generate will not be recorded. Elements that end with a `!` will be flushed (using the `CLFLUSH` instruction) instead of being accessed. The order of the accesses is as follows: First, block `A` will be accessed in all of the specified sets, then block `B` will be accessed in all of the specified sets, and so on.
Between every two accesses to the same set in a lower-level cache, the tool automatically adds enough accesses to the higher-level caches (that map to different sets and/or slices in the lower-level cache) to make sure that the corresponding lines are evicted from the higher-level cache and the access actually reaches the lower-level cache. These additional accesses are excluded from the performance counter measurements.
By default, the `WBINVD` instruction is called before executing the access sequence. This can be disabled with the `-noWbinvd` option.
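The sequence notation and the block-to-address mapping can be sketched in a few lines of Python (an illustrative re-implementation of the idea, not the tool's actual code; `parse_seq` and `block_address` are made-up names, and the mapping shown applies only to caches without slicing):

```python
import re

def parse_seq(seq):
    """Parses a cacheSeq-style sequence into (blockName, measure, flush) tuples."""
    elems = []
    for el in seq.split():
        name = re.sub('[?!]', '', el)  # strip the '?' and '!' markers
        elems.append((name, el.endswith('?'), el.endswith('!')))
    return elems

def block_address(way_id, cache_set, line_size=64, n_sets=1024):
    """Address (relative to the start of the memory area) of a block in the given
    cache set; each distinct block name is placed in its own way."""
    way_size = line_size * n_sets
    return way_id * way_size + cache_set * line_size

print(parse_seq('A B C D A? C! B?'))
```

For example, `parse_seq('A? C!')` yields `[('A', True, False), ('C', False, True)]`: block `A` is accessed and measured, block `C` is flushed.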
The tool has the following command-line options:
| Option | Description |
|------------------------------|-------------|
| `-seq <sequence>` | Main access sequence. |
| `-loop <n>` | Number of times the main access sequence is executed. `[Default: n=1]` |
| `-seq_init <sequence>` | Access sequence that is executed once in the beginning before the main sequence. |
| `-level <n>` | Cache level `[Default: n=1]` |
| `-sets <sets>`               | Cache sets in which the access sequence will be executed. By default, all cache sets are used, except for the first *n* sets, where *n* is the number of sets of the higher-level cache; these sets are needed for clearing the higher-level cache. |
| `-cBox <n>` | CBox in which the access sequence will be executed. `[Default: n=1]` |
| `-noClearHL` | Do not clear higher level caches. |
| `-noWbinvd` | Do not call wbinvd before each run. |
| `-sim <policy>` | Simulate the given policy instead of running the experiment on the hardware. For a list of available policies, see `cacheSim.py`. |
| `-simAssoc <n>` | Associativity of the simulated cache. `[Default: n=1]` |
## hitMiss.py
Similar to `cacheSeq.py`, but only outputs whether the last access of a sequence is a hit or a miss.
## cacheGraph.py
Generates an HTML file with a graph that shows the *ages* of all cache blocks after executing an access sequence. The *age* of a block B is the number of fresh blocks that need to be accessed before B is evicted.
The tool supports all of the command-line options of `cacheSeq.py`, except the `-loop` option. In addition to that, the following options are supported:
| Option | Description |
|------------------------------|-------------|
| `-blocks <blocks>` | Only determine the ages of the blocks in the given list. `[Default: consider all blocks in the access sequence]` |
| `-maxAge <n>` | The maximum age to consider. `[Default: 2*associativity]` |
| `-output <file>` | File name of the HTML file. `[Default: graph.html]` |
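The *age* computation can be illustrated with a small simulation, assuming a pure-LRU set (a sketch only; `lru_age` is an illustrative name and not part of the tools):

```python
def lru_age(seq, block, assoc=4):
    """Age of `block` after accessing `seq` in an LRU set of the given
    associativity: the number of fresh blocks that must be accessed before
    `block` is evicted (0 if it is not cached after `seq`)."""
    cache = []  # least recently used element first
    for b in seq.split():
        if b in cache:
            cache.remove(b)          # will be re-appended at the MRU position
        elif len(cache) >= assoc:
            cache.pop(0)             # evict the least recently used block
        cache.append(b)
    age = 0
    while block in cache:
        if len(cache) >= assoc:
            cache.pop(0)
        cache.append('fresh%d' % age)  # access a previously unused block
        age += 1
    return age
```

Under LRU with associativity 4, `lru_age('A B C D', 'A')` is 1 (one fresh block evicts the LRU block `A`), while `lru_age('A B C D', 'D')` is 4.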
## replPolicy.py
Determines the replacement policy by generating random access sequences and comparing the number of hits on the actual hardware to the number of hits in a simulation of different policies. By default, a number of commonly used policies are simulated. With the `-allQLRUVariants` option, a more comprehensive list of more than 300 QLRU variants is tested.
The tool outputs all results in the form of an HTML table.
If the `-findCtrEx` option is used, it will try to find a small counterexample for each policy.
It supports the following additional command-line parameters:
| Option | Description |
|------------------------------|-------------|
| `-level <n>` | Cache level `[Default: n=1]` |
| `-sets <sets>` | Cache sets for which the replacement policy will be tested. |
| `-cBox <n>` | CBox for which the replacement policy will be tested. `[Default: n=0]` |
| `-policies <policies>`       | Only consider the policies in the given comma-separated list. |
| `-useInitSeq <seq>` | Adds a fixed prefix to each randomly generated sequence. This can be used to initialize the cache to a specific state. |
| `-nRandSeq <n>` | Number of random sequences. `[Default: n=100]` |
| `-lRandSeq <n>` | Length of random sequences. `[Default: n=50]` |
| `-output <file>` | File name of the HTML file. `[Default: replPolicy.html]` |
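The comparison approach can be sketched as follows: simulate each candidate policy on the same sequence and compare hit counts with the hardware measurements (illustrative code; the actual policy simulations live in `cacheSim.py`):

```python
def count_hits(seq, assoc, policy):
    """Number of hits when accessing `seq` (a list of block names) in a single
    cache set of the given associativity; `policy` is 'LRU' or 'FIFO'."""
    cache, hits = [], 0
    for b in seq:
        if b in cache:
            hits += 1
            if policy == 'LRU':
                cache.remove(b)
                cache.append(b)  # move to MRU position; FIFO leaves order unchanged
        else:
            if len(cache) >= assoc:
                cache.pop(0)     # evict the oldest (FIFO) / least recently used (LRU)
            cache.append(b)
    return hits

import random
random.seed(0)
seq = [random.choice('ABCDEFGH') for _ in range(50)]
hit_counts = {p: count_hits(seq, 4, p) for p in ('LRU', 'FIFO')}
```

Sequences on which the policies disagree (e.g., `A B A C A` with associativity 2 gives 2 hits under LRU but only 1 under FIFO) are exactly what makes the comparison with the hardware conclusive.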
## permPolicy.py
If the replacement policy is a permutation policy (see [Measurement-based Modeling of the Cache Replacement Policy](http://embedded.cs.uni-saarland.de/publications/CacheModelingRTAS2013.pdf)), this tool determines the permutation vectors. In addition to that, it outputs a set of age graphs for the access sequences generated by the permutation policy inference algorithm. These graphs can be a useful starting point for analyzing policies that are not permutation policies.
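The idea behind permutation vectors can be sketched for LRU (simplified from the paper's formalism; the function names are illustrative): a hit on the line at age position *i* is described by a permutation that assigns every line in the set a new age.

```python
def lru_perm(i, assoc):
    """LRU's permutation vector for a hit at age position i: the accessed line
    gets age 0, younger lines age by one, older lines keep their age."""
    return [0 if j == i else (j + 1 if j < i else j) for j in range(assoc)]

def apply_perm(state, perm):
    """Applies a permutation vector to a set state (blocks ordered by age)."""
    new = [None] * len(state)
    for j, block in enumerate(state):
        new[perm[j]] = block
    return new
```

For example, `apply_perm(['A', 'B', 'C', 'D'], lru_perm(2, 4))` yields `['C', 'A', 'B', 'D']`: the hit block `C` becomes the most recently used. Inferring one such vector per age position fully characterizes a permutation policy.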
## strideGraph.py
Generates a graph that shows the number of core cycles (per access) when accessing memory areas of different sizes repeatedly using a given stride (which can be specified with the `-stride` option). An example can be seen [here](https://uops.info/cache/lat_CFL.html).
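The shape of such graphs can be reasoned about with a back-of-the-envelope computation (a sketch; `lines_touched` is an illustrative name): once the distinct lines touched no longer fit into a cache level, repeated traversals start missing in that level, and the per-access cycle count jumps.

```python
def lines_touched(area_size, stride, line_size=64):
    """Number of distinct cache lines accessed when walking a memory area of
    `area_size` bytes with the given stride."""
    return len({addr // line_size for addr in range(0, area_size, stride)})
```

With a 64 B line size, a 1 KB area walked with stride 64 touches 16 lines; doubling the stride to 128 halves that to 8, which is why larger strides shift the latency steps in the graph.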
## cpuid.py
Obtains cache and TLB information using the `CPUID` instruction.
## cacheInfo.py
Combines information from `cpuid.py` with information on the number of slices of the L3 cache that is obtained through measurements.
## setDueling.py
For caches that use set dueling to choose between two different policies, this tool can generate a graph that shows the sets that use a fixed policy.
## cacheLib.py
Library containing helper functions used by the other tools.
## cacheSim.py
This file contains the implementations of the simulated policies used by some of the other tools.
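As an example of such a policy, a tree-PLRU set (a common LRU approximation in L1 caches) can be simulated in a few lines. This is an illustrative sketch; the class and its interface are made up here and do not match `cacheSim.py`'s actual implementation:

```python
class TreePLRU:
    """Single tree-PLRU cache set: each internal tree bit points away from the
    most recently used half; the victim is found by following the bits."""
    def __init__(self, assoc=4):
        assert assoc > 1 and assoc & (assoc - 1) == 0, 'associativity must be a power of two'
        self.lines = [None] * assoc
        self.bits = [0] * (assoc - 1)  # heap layout: children of node i are 2i+1, 2i+2

    def _victim(self):
        node = 0
        while node < len(self.bits):
            node = 2 * node + 1 + self.bits[node]  # follow the PLRU bits to a leaf
        return node - len(self.bits)               # leaf index -> way

    def _touch(self, way):
        node = way + len(self.bits)
        while node > 0:
            parent = (node - 1) // 2
            # make the parent bit point away from the subtree we came from
            self.bits[parent] = 1 if node == 2 * parent + 1 else 0
            node = parent

    def access(self, block):
        """Accesses `block`; returns True on a hit, False on a miss."""
        if block in self.lines:
            way = self.lines.index(block)
            hit = True
        else:
            way = self._victim()
            self.lines[way] = block
            hit = False
        self._touch(way)
        return hit
```

Unlike true LRU, tree-PLRU can evict a block that is not the least recently used one, which is exactly the kind of difference `replPolicy.py` detects.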
# Prerequisites
To use the tools in this folder, the nanoBench kernel module needs to be loaded. Instructions on how to do this can be found in the main README at <https://github.com/andreas-abel/nanoBench>.
nanoBench needs to be configured to use a physically contiguous memory area that is large enough for the access sequences that you want to test. This can be achieved with the `set-R14-size.sh` script in the main nanoBench folder.
You can, e.g., call it as follows:
sudo ./set-R14-size.sh 1G
to reserve a memory area of 1 GB. If your system has enough memory, but the above call is unable to reserve a memory area of the requested size, a reboot often helps.
For analyzing shared caches, it can make sense to disable other cores using the same cache. The `single-core-mode.sh` script in the main nanoBench folder can be used to disable all but one core.
Furthermore, it can also be helpful to disable cache prefetching. On recent Intel CPUs, this can be done by executing
sudo modprobe msr; sudo wrmsr -a 0x1a4 15
On some not so recent Intel CPUs (e.g., Core 2 Duo), you can use
sudo modprobe msr; sudo wrmsr -a 0x1a0 0xE0668D2689
instead.
I'm not aware of a way to disable cache prefetching on recent AMD CPUs. If you know how to do this, please consider posting an answer to [this question](https://stackoverflow.com/questions/57855793/how-to-disable-cache-prefetching-on-amd-family-17h-cpus).
The tools that generate graphs need [Plotly](https://plot.ly/python/) to be installed. This can be achieved via
sudo apt install python-pip; pip install plotly


@@ -0,0 +1,86 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple, OrderedDict
import argparse
import re
import sys
from cacheLib import *
import cacheSim
from plotly.offline import plot
import plotly.graph_objects as go
import logging
log = logging.getLogger(__name__)
# traces is a list of (name, y value list) pairs
def getPlotlyGraphDiv(title, x_title, y_title, traces):
fig = go.Figure()
fig.update_layout(title_text=title)
fig.update_xaxes(title_text=x_title)
fig.update_yaxes(title_text=y_title)
for name, y_values in traces:
fig.add_trace(go.Scatter(y=y_values, mode='lines+markers', name=name))
return plot(fig, include_plotlyjs=False, output_type='div')
def main():
parser = argparse.ArgumentParser(description='Generates a graph with the ages of each block')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache set (if not specified, all cache sets are used)")
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=10)
parser.add_argument("-agg", help="Aggregate function", default='med')
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-blocks", help="Blocks to consider (default: all blocks in seq)")
parser.add_argument("-maxAge", help="Maximum age", type=int)
parser.add_argument("-output", help="Output file name", default='graph.html')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
parser.add_argument("-simRep", help="Number of repetitions", type=int, default=1)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.blocks:
blocksStr = args.blocks
else:
blocksStr = args.seq
blocks = list(OrderedDict.fromkeys(re.sub('[?!,;]', ' ', blocksStr).split()))
html = ['<html>', '<head>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
if args.sim:
policyClass = cacheSim.AllPolicies[args.sim]
if not args.maxAge:
maxAge = 2*args.simAssoc
else:
maxAge = args.maxAge
nSets = len(parseCacheSetsStr(args.level, True, args.sets))
traces = cacheSim.getGraph(blocks, args.seq, policyClass, args.simAssoc, maxAge, nSets=nSets, nRep=args.simRep, agg=args.agg)
title = 'Access Sequence: ' + args.seq.replace('?','').strip() + ' <n fresh blocks> <block>?'
html.append(getPlotlyGraphDiv(title, '# of fresh blocks', 'Hits', traces))
else:
_, nbDict = getAgesOfBlocks(blocks, args.level, args.seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL),
wbinvd=(not args.noWbinvd), returnNbResults=True, maxAge=args.maxAge, nMeasurements=args.nMeasurements, agg=args.agg)
for event in sorted(e for e in nbDict.values()[0][0].keys() if 'HIT' in e or 'MISS' in e):
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
title = 'Access Sequence: ' + (args.seq_init + ' ' + args.seq).replace('?','').strip() + ' <n fresh blocks> <block>?'
html.append(getPlotlyGraphDiv(title, '# of fresh blocks', event, traces))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Graph written to ' + args.output
if __name__ == "__main__":
main()


@@ -0,0 +1,27 @@
#!/usr/bin/python
import argparse
import sys
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Cache Information')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='INFO')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
cpuidInfo = getCpuidCacheInfo()
print ''
print getCacheInfo(1)
print getCacheInfo(2)
if 'L3' in cpuidInfo:
print getCacheInfo(3)
if __name__ == "__main__":
main()

tools/CacheAnalyzer/cacheLib.py Executable file

@@ -0,0 +1,600 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple
import math
import re
import subprocess
import sys
import cpuid
sys.path.append('../..')
from kernelNanoBench import *
import logging
log = logging.getLogger(__name__)
def getEventConfig(event):
arch = getArch()
if event == 'L1_HIT':
if arch in ['Core', 'EnhancedCore']: return '40.0E ' + event # L1D_CACHE_LD.MES
if arch in ['NHM', 'WSM']: return 'CB.01 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.01 ' + event
if event == 'L1_MISS':
if arch in ['Core', 'EnhancedCore']: return 'CB.01.CTR=0 ' + event
if arch in ['IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.08 ' + event
if arch in ['ZEN+']: return '064.70 ' + event
if event == 'L2_HIT':
if arch in ['Core', 'EnhancedCore']: return '29.7E ' + event # L2_LD.THIS_CORE.ALL_INCL.MES
if arch in ['NHM', 'WSM']: return 'CB.02 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.02 ' + event
if arch in ['ZEN+']: return '064.70 ' + event
if event == 'L2_MISS':
if arch in ['Core', 'EnhancedCore']: return 'CB.04.CTR=0 ' + event
if arch in ['IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.10 ' + event
if arch in ['ZEN+']: return '064.08 ' + event
if event == 'L3_HIT':
if arch in ['NHM', 'WSM']: return 'CB.04 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.04 ' + event
if event == 'L3_MISS':
if arch in ['NHM', 'WSM']: return 'CB.10 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.20 ' + event
return ''
def getDefaultCacheConfig():
return '\n'.join(filter(None, [getEventConfig('L' + str(l) + '_' + hm) for l in range(1,4) for hm in ['HIT', 'MISS']]))
def getDefaultCacheMSRConfig():
if 'Intel' in getCPUVendor() and 'L3' in getCpuidCacheInfo() and getCpuidCacheInfo()['L3']['complex']:
if getArch() in ['CNL']:
dist = 8
ctrOffset = 2
else:
dist = 16
ctrOffset = 6
return '\n'.join('msr_0xE01=0x20000000.msr_' + format(0x700 + dist*cbo, 'x') + '=0x408F34 msr_' + format(0x700 + ctrOffset + dist*cbo, 'x') +
' CACHE_LOOKUP_CBO_' + str(cbo) for cbo in range(0, getNCBoxUnits()))
return ''
def isClose(a, b, rel_tol=1e-09, abs_tol=0.0):
return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
class CacheInfo:
def __init__(self, level, assoc, lineSize, nSets, nSlices=None, nCboxes=None):
self.level = level
self.assoc = assoc
self.lineSize = lineSize
self.nSets = nSets
self.waySize = lineSize * nSets
self.size = self.waySize * assoc * (nSlices if nSlices is not None else 1)
self.nSlices = nSlices
self.nCboxes = nCboxes
def __str__(self):
return '\n'.join(['L' + str(self.level) + ':',
' Size: ' + str(self.size/1024) + ' kB',
' Associativity: ' + str(self.assoc),
' Line Size: ' + str(self.lineSize) + ' B',
' Number of sets' + (' (per slice)' if self.nSlices is not None else '') + ': ' + str(self.nSets),
' Way size' + (' (per slice)' if self.nSlices is not None else '') + ': ' + str(self.waySize/1024) + ' kB',
(' Number of CBoxes: ' + str(self.nCboxes) if self.nCboxes is not None else ''),
(' Number of slices: ' + str(self.nSlices) if self.nSlices is not None else '')])
def getArch():
if not hasattr(getArch, 'arch'):
cpu = cpuid.CPUID()
getArch.arch = cpuid.micro_arch(cpu)
return getArch.arch
def getCPUVendor():
if not hasattr(getCPUVendor, 'vendor'):
cpu = cpuid.CPUID()
getCPUVendor.vendor = cpuid.cpu_vendor(cpu)
return getCPUVendor.vendor
def getCpuidCacheInfo():
if not hasattr(getCpuidCacheInfo, 'cpuidCacheInfo'):
cpu = cpuid.CPUID()
log.debug(cpuid.get_basic_info(cpu))
getCpuidCacheInfo.cpuidCacheInfo = cpuid.get_cache_info(cpu)
if not len(set(c['lineSize'] for c in getCpuidCacheInfo.cpuidCacheInfo.values())) == 1:
raise ValueError('All line sizes must be the same')
return getCpuidCacheInfo.cpuidCacheInfo
def getCacheInfo(level):
if level == 1:
if not hasattr(getCacheInfo, 'L1CacheInfo'):
cpuidInfo = getCpuidCacheInfo()['L1D']
getCacheInfo.L1CacheInfo = CacheInfo(1, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
return getCacheInfo.L1CacheInfo
elif level == 2:
if not hasattr(getCacheInfo, 'L2CacheInfo'):
cpuidInfo = getCpuidCacheInfo()['L2']
getCacheInfo.L2CacheInfo = CacheInfo(2, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
return getCacheInfo.L2CacheInfo
elif level == 3:
if not hasattr(getCacheInfo, 'L3CacheInfo'):
if not 'L3' in getCpuidCacheInfo():
raise ValueError('invalid level')
cpuidInfo = getCpuidCacheInfo()['L3']
if not 'complex' in cpuidInfo or not cpuidInfo['complex']:
getCacheInfo.L3CacheInfo = CacheInfo(3, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
else:
lineSize = cpuidInfo['lineSize']
assoc = cpuidInfo['assoc']
nSets = cpuidInfo['nSets']
stride = 2**((lineSize*nSets/getNCBoxUnits())-1).bit_length() # smallest power of two that is >= lineSize*nSets/nCBoxUnits
ms = findMaximalNonEvictingL3SetInCBox(0, stride, assoc, 0)
log.debug('Maximal non-evicting L3 set: ' + str(len(ms)) + ' ' + str(ms))
nCboxes = getNCBoxUnits()
nSlices = nCboxes * int(math.ceil(float(len(ms))/assoc))
getCacheInfo.L3CacheInfo = CacheInfo(3, assoc, lineSize, nSets/nSlices, nSlices, nCboxes)
return getCacheInfo.L3CacheInfo
else:
raise ValueError('invalid level')
def getNCBoxUnits():
if not hasattr(getNCBoxUnits, 'nCBoxUnits'):
try:
subprocess.check_output(['modprobe', 'msr'])
cbo_config = subprocess.check_output(['rdmsr', '0x396'])
if getArch() in ['CNL']:
getNCBoxUnits.nCBoxUnits = int(cbo_config)
else:
getNCBoxUnits.nCBoxUnits = int(cbo_config) - 1
log.debug('Number of CBox Units: ' + str(getNCBoxUnits.nCBoxUnits))
except subprocess.CalledProcessError as e:
log.critical('Error: ' + e.output)
sys.exit()
except OSError as e:
log.critical("rdmsr not found. Try 'sudo apt install msr-tools'")
sys.exit()
return getNCBoxUnits.nCBoxUnits
def getCBoxOfAddress(address):
if not hasattr(getCBoxOfAddress, 'cBoxMap'):
getCBoxOfAddress.cBoxMap = dict()
cBoxMap = getCBoxOfAddress.cBoxMap
if not address in cBoxMap:
setNanoBenchParameters(config='', msrConfig=getDefaultCacheMSRConfig(), nMeasurements=10, unrollCount=1, loopCount=10, aggregateFunction='min',
basicMode=True, noMem=True)
ec = getCodeForAddressLists([AddressList([address],False,True)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
nCacheLookups = [nb['CACHE_LOOKUP_CBO_'+str(cBox)] for cBox in range(0, getNCBoxUnits())]
cBoxMap[address] = nCacheLookups.index(max(nCacheLookups))
return cBoxMap[address]
def getNewAddressesInCBox(n, cBox, cacheSet, prevAddresses, notInCBox=False):
if not prevAddresses:
maxPrevAddress = cacheSet * getCacheInfo(3).lineSize
else:
maxPrevAddress = max(prevAddresses)
addresses = []
for addr in count(maxPrevAddress+getCacheInfo(3).waySize, getCacheInfo(3).waySize):
if not notInCBox and getCBoxOfAddress(addr) == cBox:
addresses.append(addr)
if notInCBox and getCBoxOfAddress(addr) != cBox:
addresses.append(addr)
if len(addresses) >= n:
return addresses
def getNewAddressesNotInCBox(n, cBox, cacheSet, prevAddresses):
return getNewAddressesInCBox(n, cBox, cacheSet, prevAddresses, notInCBox=True)
pointerChasingInits = dict()
#addresses must not contain duplicates
def getPointerChasingInit(addresses):
if tuple(addresses) in pointerChasingInits:
return pointerChasingInits[tuple(addresses)]
init = 'lea RAX, [R14+' + str(addresses[0]) + ']; '
init += 'mov RBX, RAX; '
i = 0
while i < len(addresses)-1:
stride = addresses[i+1] - addresses[i]
init += '1: add RBX, ' + str(stride) + '; '
init += 'mov [RAX], RBX; '
init += 'mov RAX, RBX; '
i += 1
oldI = i
while i < len(addresses)-1 and (addresses[i+1] - addresses[i]) == stride:
i += 1
if oldI != i:
init += 'lea RCX, [R14+' + str(addresses[i]) + ']; '
init += 'cmp RAX, RCX; '
init += 'jne 1b; '
init += 'mov qword ptr [R14 + ' + str(addresses[-1]) + '], 0; '
pointerChasingInits[tuple(addresses)] = init
return init
ExperimentCode = namedtuple('ExperimentCode', 'code init oneTimeInit')
def getCodeForAddressLists(codeAddressLists, initAddressLists=[], wbinvd=False):
distinctAddrLists = set(tuple(l.addresses) for l in initAddressLists+codeAddressLists)
if len(distinctAddrLists) > 1 and set.intersection(*list(set(l) for l in distinctAddrLists)):
raise ValueError('same address in different lists')
code = []
init = (['wbinvd; '] if wbinvd else [])
oneTimeInit = []
r14Size = getR14Size()
alreadyAddedOneTimeInits = set()
for addressLists, codeList, isInit in [(initAddressLists, init, True), (codeAddressLists, code, False)]:
if addressLists is None: continue
pfcEnabled = True
for addressList in addressLists:
addresses = addressList.addresses
if len(addresses) < 1: continue
if any(addr >= r14Size for addr in addresses):
sys.stderr.write('Size of memory area too small. Try increasing it with set-R14-size.sh.\n')
exit(1)
if not isInit:
if addressList.exclude and pfcEnabled:
codeList.append(PFC_STOP_ASM + '; ')
pfcEnabled = False
elif not addressList.exclude and not pfcEnabled:
codeList.append(PFC_START_ASM + '; ')
pfcEnabled = True
# use multiple lfence instructions to make sure that the block is actually in the cache and not still in a fill buffer
codeList.append('lfence; ' * 10)
if addressList.flush:
for address in addresses:
codeList.append('clflush [R14 + ' + str(address) + ']; ')
else:
if len(addresses) == 1:
codeList.append('mov RCX, [R14 + ' + str(addresses[0]) + ']; ')
else:
if not tuple(addresses) in alreadyAddedOneTimeInits:
oneTimeInit.append(getPointerChasingInit(addresses))
alreadyAddedOneTimeInits.add(tuple(addresses))
codeList.append('lea RCX, [R14+' + str(addresses[0]) + ']; 1: mov RCX, [RCX]; jrcxz 2f; jmp 1b; 2: ')
if not isInit and not pfcEnabled:
codeList.append(PFC_START_ASM + '; ')
return ExperimentCode(''.join(code), ''.join(init), ''.join(oneTimeInit))
def getClearHLAddresses(level, cacheSetList, cBox=1):
lineSize = getCacheInfo(1).lineSize
if level == 1:
return []
elif (level == 2) or (level == 3 and getCacheInfo(3).nSlices is None):
nSets = getCacheInfo(level).nSets
if not all(nSets > getCacheInfo(lLevel).nSets for lLevel in range(1, level)):
raise ValueError('L' + str(level) + ' way size must be greater than lower level way sizes')
nHLSets = getCacheInfo(level-1).nSets
nClearAddresses = 2*sum(getCacheInfo(hLevel).assoc for hLevel in range(1, level))
HLSets = set(cs % nHLSets for cs in cacheSetList)
addrForClearingHL = []
for HLSet in HLSets:
possibleSets = [cs for cs in range(HLSet, nSets, nHLSets) if cs not in cacheSetList]
if not possibleSets:
raise ValueError("not enough cache sets available for clearing higher levels")
addrForClearingHLSet = []
for setIndex in count(HLSet, nHLSets):
if not setIndex % nSets in possibleSets:
continue
addrForClearingHLSet.append(setIndex*lineSize)
if len(addrForClearingHLSet) >= nClearAddresses:
break
addrForClearingHL += addrForClearingHLSet
return addrForClearingHL
elif level == 3:
if not hasattr(getClearHLAddresses, 'clearL2Map'):
getClearHLAddresses.clearL2Map = dict()
clearL2Map = getClearHLAddresses.clearL2Map
if not cBox in clearL2Map:
clearL2Map[cBox] = dict()
clearAddresses = []
for L3Set in cacheSetList:
if not L3Set in clearL2Map[cBox]:
clearL2Map[cBox][L3Set] = getNewAddressesNotInCBox(2*(getCacheInfo(1).assoc+getCacheInfo(2).assoc), cBox, L3Set, [])
clearAddresses += clearL2Map[cBox][L3Set]
return clearAddresses
L3SetToWayIDMap = dict()
def getAddresses(level, wayID, cacheSetList, cBox=1, clearHL=True):
lineSize = getCacheInfo(1).lineSize
if level <= 2 or (level == 3 and getCacheInfo(3).nSlices is None):
nSets = getCacheInfo(level).nSets
waySize = getCacheInfo(level).waySize
return [(wayID*waySize) + s*lineSize for s in cacheSetList]
elif level == 3:
if not cBox in L3SetToWayIDMap:
L3SetToWayIDMap[cBox] = dict()
addresses = []
for L3Set in cacheSetList:
if not L3Set in L3SetToWayIDMap[cBox]:
L3SetToWayIDMap[cBox][L3Set] = dict()
if getCacheInfo(3).nSlices != getNCBoxUnits():
for i, addr in enumerate(findMinimalL3EvictionSet(L3Set, cBox)):
L3SetToWayIDMap[cBox][L3Set][i] = addr
if not wayID in L3SetToWayIDMap[cBox][L3Set]:
if getCacheInfo(3).nSlices == getNCBoxUnits():
L3SetToWayIDMap[cBox][L3Set][wayID] = next(iter(getNewAddressesInCBox(1, cBox, L3Set, L3SetToWayIDMap[cBox][L3Set].values())))
else:
L3SetToWayIDMap[cBox][L3Set][wayID] = next(iter(findCongruentL3Addresses(1, L3SetToWayIDMap[cBox][L3Set].values())))
addresses.append(L3SetToWayIDMap[cBox][L3Set][wayID])
return addresses
raise ValueError('invalid level')
# removes ?s and !s
def getBlockName(blockStr):
return re.sub('[?!]', '', blockStr)
def parseCacheSetsStr(level, clearHL, cacheSetsStr):
cacheSetList = []
if cacheSetsStr is not None:
for s in cacheSetsStr.split(','):
if '-' in s:
first, last = s.split('-')[:2]
cacheSetList += range(int(first), int(last)+1)
else:
cacheSetList.append(int(s))
else:
nSets = getCacheInfo(level).nSets
if level > 1 and clearHL:
nHLSets = getCacheInfo(level-1).nSets
cacheSetList = range(nHLSets, nSets)
else:
cacheSetList = range(0, nSets)
return cacheSetList
AddressList = namedtuple('AddressList', 'addresses exclude flush')
# cacheSets=None means do access in all sets
# in this case, the first nL1Sets many sets of L2 will be reserved for clearing L1
# if wbinvd is set, wbinvd will be called before initSeq
def runCacheExperiment(level, seq, initSeq='', cacheSets=None, cBox=1, clearHL=True, loop=1, wbinvd=False, nMeasurements=10, warmUpCount=1, agg='avg'):
lineSize = getCacheInfo(1).lineSize
cacheSetList = parseCacheSetsStr(level, clearHL, cacheSets)
clearHLAddrList = None
if (clearHL and level > 1):
clearHLAddrList = AddressList(getClearHLAddresses(level, cacheSetList, cBox), True, False)
initAddressLists = []
seqAddressLists = []
nameToID = dict()
for seqString, addrLists in [(initSeq, initAddressLists), (seq, seqAddressLists)]:
for seqEl in seqString.split():
name = getBlockName(seqEl)
wayID = nameToID.setdefault(name, len(nameToID))
exclude = not '?' in seqEl
flush = '!' in seqEl
addresses = getAddresses(level, wayID, cacheSetList, cBox=cBox, clearHL=clearHL)
if clearHLAddrList is not None and not flush:
addrLists.append(clearHLAddrList)
addrLists.append(AddressList(addresses, exclude, flush))
ec = getCodeForAddressLists(seqAddressLists, initAddressLists, wbinvd)
log.debug('\nInitAddresses: ' + str(initAddressLists))
log.debug('\nSeqAddresses: ' + str(seqAddressLists))
log.debug('\nOneTimeInit: ' + ec.oneTimeInit)
log.debug('\nInit: ' + ec.init)
log.debug('\nCode: ' + ec.code)
resetNanoBench()
setNanoBenchParameters(config=getDefaultCacheConfig(), msrConfig=getDefaultCacheMSRConfig(), nMeasurements=nMeasurements, unrollCount=1, loopCount=loop,
warmUpCount=warmUpCount, aggregateFunction=agg, basicMode=True, noMem=True, verbose=None)
return runNanoBench(code=ec.code, init=ec.init, oneTimeInit=ec.oneTimeInit)
def printNB(nb_result):
for r in nb_result.items():
print r[0] + ': ' + str(r[1])
def findMinimalL3EvictionSet(cacheSet, cBox):
setNanoBenchParameters(config='\n'.join([getEventConfig('L3_HIT'), getEventConfig('L3_MISS')]), msrConfig=None, nMeasurements=10, unrollCount=1, loopCount=10,
warmUpCount=None, initialWarmUpCount=None, aggregateFunction='med', basicMode=True, noMem=True, verbose=None)
if not hasattr(findMinimalL3EvictionSet, 'evSetForCacheSet'):
findMinimalL3EvictionSet.evSetForCacheSet = dict()
evSetForCacheSet = findMinimalL3EvictionSet.evSetForCacheSet
if cacheSet in evSetForCacheSet:
return evSetForCacheSet[cacheSet]
addresses = []
curAddress = cacheSet*getCacheInfo(3).lineSize
while len(addresses) < getCacheInfo(3).assoc:
curAddress += getCacheInfo(3).waySize
if getCBoxOfAddress(curAddress) == cBox:
addresses.append(curAddress)
while True:
curAddress += getCacheInfo(3).waySize
if not getCBoxOfAddress(curAddress) == cBox: continue
addresses += [curAddress]
ec = getCodeForAddressLists([AddressList(addresses,False,False)])
setNanoBenchParameters(config=getDefaultCacheConfig(), msrConfig='', nMeasurements=10, unrollCount=1, loopCount=100,
aggregateFunction='med', basicMode=True, noMem=True)
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(addresses) - .9:
break
for i in reversed(range(0, len(addresses))):
tmpAddresses = addresses[:i] + addresses[(i+1):]
ec = getCodeForAddressLists([AddressList(tmpAddresses,False,False)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(tmpAddresses) - 0.9:
addresses = tmpAddresses
evSetForCacheSet[cacheSet] = addresses
return addresses
def findCongruentL3Addresses(n, L3EvictionSet):
setNanoBenchParameters(config=getEventConfig('L3_HIT'), msrConfig=None, nMeasurements=10, unrollCount=1, loopCount=100,
warmUpCount=None, initialWarmUpCount=None, aggregateFunction='med', basicMode=True, noMem=True, verbose=None)
congrAddresses = []
L3WaySize = getCacheInfo(3).waySize
for newAddr in count(max(L3EvictionSet)+L3WaySize, L3WaySize):
tmpAddresses = L3EvictionSet[:getCacheInfo(3).assoc] + [newAddr]
ec = getCodeForAddressLists([AddressList(tmpAddresses,False,False)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(tmpAddresses) - 0.9:
congrAddresses.append(newAddr)
if len(congrAddresses) >= n: break
return congrAddresses
def findMaximalNonEvictingL3SetInCBox(start, stride, L3Assoc, cBox):
curAddress = start
addresses = []
while len(addresses) < L3Assoc:
if getCBoxOfAddress(curAddress) == cBox:
addresses.append(curAddress)
curAddress += stride
notAdded = 0
while notAdded < L3Assoc:
curAddress += stride
if not getCBoxOfAddress(curAddress) == cBox:
continue
newAddresses = addresses + [curAddress]
ec = getCodeForAddressLists([AddressList(newAddresses,False,False)])
setNanoBenchParameters(config=getEventConfig('L3_HIT'), msrConfig='', nMeasurements=10, unrollCount=1, loopCount=10,
aggregateFunction='med', basicMode=True, noMem=True)
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] > len(newAddresses) - .9:
addresses = newAddresses
notAdded = 0
else:
notAdded += 1
return addresses
def getUnusedBlockNames(n, usedBlockNames, prefix=''):
newBlockNames = []
i = 0
while len(newBlockNames) < n:
name = prefix + str(i)
if not name in usedBlockNames: newBlockNames.append(name)
i += 1
return newBlockNames
# Returns a dict with the age of each block, i.e., how many fresh blocks need to be accessed until the block is evicted
# if returnNbResults is True, the function additionally returns all measurement results (as the second component of a tuple)
def getAgesOfBlocks(blocks, level, seq, initSeq='', maxAge=None, cacheSets=None, cBox=1, clearHL=True, wbinvd=False, returnNbResults=False, nMeasurements=10, agg='avg'):
ages = dict()
if returnNbResults: nbResults = dict()
if maxAge is None:
maxAge = 2*getCacheInfo(level).assoc
nSets = len(parseCacheSetsStr(level, clearHL, cacheSets))
for block in blocks:
if returnNbResults: nbResults[block] = []
for nNewBlocks in range(0, maxAge+1):
curSeq = seq.replace('?', '') + ' '
newBlocks = getUnusedBlockNames(nNewBlocks, seq+initSeq, 'N')
curSeq += ' '.join(newBlocks) + ' ' + block + '?'
nb = runCacheExperiment(level, curSeq, initSeq=initSeq, cacheSets=cacheSets, cBox=cBox, clearHL=clearHL, loop=0, wbinvd=wbinvd, nMeasurements=nMeasurements)
if returnNbResults: nbResults[block].append(nb)
hitEvent = 'L' + str(level) + '_HIT'
missEvent = 'L' + str(level) + '_MISS'
if hitEvent in nb:
if isClose(nb[hitEvent], 0.0, abs_tol=0.1):
if not block in ages:
ages[block] = nNewBlocks
#if not returnNbResults:
#break
elif missEvent in nb:
if nb[missEvent] > nSets - 0.1:
if not block in ages:
ages[block] = nNewBlocks
#if not returnNbResults:
#break
else:
raise ValueError('no cache results available')
if not block in ages:
ages[block] = -1
if returnNbResults:
return (ages, nbResults)
else:
return ages
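The `getAgesOfBlocks` function above determines a block's age on real hardware by counting how many fresh blocks must be accessed before the block is evicted. The same notion can be illustrated in software with an idealized LRU set; the following standalone sketch (a hypothetical helper `lru_age`, not part of cacheLib) reproduces the expected ages:

```python
# Standalone sketch: the "age" of a block under an idealized LRU set.
# After accessing A B C D in a 4-way LRU set, A is the least recently
# used block, so one fresh block evicts it: its age is 1; D's age is 4.

def lru_age(seq, block, assoc):
    lines = []  # most recently used block at the front

    def access(b):
        hit = b in lines
        if hit:
            lines.remove(b)
        lines.insert(0, b)
        del lines[assoc:]  # evict beyond the associativity
        return hit

    for b in seq.split():
        access(b)
    # Count how many fresh blocks are needed until 'block' is evicted.
    for age in range(1, 2 * assoc + 1):
        if block not in lines:
            return age - 1
        access('fresh%d' % age)
    return -1

ages = {b: lru_age('A B C D', b, 4) for b in 'ABCD'}
```

An age of 0 means the block was already evicted by the sequence itself; `-1` mirrors the hardware function's "never evicted within maxAge" case.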

tools/CacheAnalyzer/cacheSeq.py (new executable file, 47 lines)

@@ -0,0 +1,47 @@
#!/usr/bin/python
from itertools import count, cycle, islice
from collections import namedtuple, OrderedDict
import argparse
import sys
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Cache Benchmarks')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1) # use 1 as the default since, e.g., on SNB, cBox 0 has only 15 ways instead of 16
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=10)
parser.add_argument("-agg", help="Aggregate function", default='med')
parser.add_argument("-loop", help="Loop count (Default: 1)", type=int, default=1)
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.sim:
policyClass = cacheSim.AllPolicies[args.sim]
setCount = len(parseCacheSetsStr(args.level, (not args.noClearHL), args.sets))
seq = args.seq_init + (' ' + args.seq) * args.loop
hits = cacheSim.getHits(seq, policyClass, args.simAssoc, setCount) / args.loop
print 'Hits: ' + str(hits)
else:
nb = runCacheExperiment(args.level, args.seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL), loop=args.loop,
wbinvd=(not args.noWbinvd), nMeasurements=args.nMeasurements, agg=args.agg)
printNB(nb)
if __name__ == "__main__":
main()
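The `-sim` option above delegates to `cacheSim.getHits`. As a rough standalone illustration of the sequence syntax (elements ending in `?` are measured, elements ending in `!` are flushed instead of accessed), the hypothetical helper below counts hits under a plain LRU policy; it sketches the semantics, not the code path cacheSeq.py actually takes:

```python
# Standalone sketch of counting hits for an access sequence in the
# syntax used by cacheSeq.py: 'A?' is measured, 'A!' is flushed.

def count_hits(seq, assoc):
    lines = []  # most recently used block first
    hits = 0
    for elem in seq.split():
        name = elem.rstrip('?!')
        if elem.endswith('!'):
            if name in lines:
                lines.remove(name)  # flush: invalidate the line
            continue
        hit = name in lines
        if hit:
            lines.remove(name)
        lines.insert(0, name)
        del lines[assoc:]  # LRU eviction beyond the associativity
        if elem.endswith('?'):
            hits += int(hit)
    return hits
```

For example, `count_hits('A B C A?', 8)` yields one hit, while flushing `A` first (`'A B C A! A?'`) yields none.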

tools/CacheAnalyzer/cacheSim.py (new executable file, 360 lines)

@@ -0,0 +1,360 @@
#!/usr/bin/python
import math
import random
from itertools import count
from numpy import median
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def rindex(lst, value):
return len(lst) - lst[::-1].index(value) - 1
class ReplPolicySim(object):
def __init__(self, assoc):
self.assoc = assoc
self.blocks = [None] * assoc
def acc(self, block):
raise NotImplementedError()
def flush(self, block):
if block in self.blocks:
self.blocks[self.blocks.index(block)] = None
class FIFOSim(ReplPolicySim):
def __init__(self, assoc):
super(FIFOSim, self).__init__(assoc)
def acc(self, block):
hit = block in self.blocks
if not hit:
self.blocks = [block] + self.blocks[0:self.assoc-1]
return hit
class LRUSim(ReplPolicySim):
def __init__(self, assoc):
super(LRUSim, self).__init__(assoc)
def acc(self, block):
hit = block in self.blocks
self.blocks = [block] + [b for b in self.blocks if b!=block][0:self.assoc-1]
return hit
class PLRUSim(ReplPolicySim):
def __init__(self, assoc, linearInit=False, randLeaf=False, randRoot=False):
super(PLRUSim, self).__init__(assoc)
self.linearInit = linearInit
self.randLeaf = randLeaf
self.randRoot = randRoot
self.bits = [[0 for _ in range(0, 2**(level))] for level in range(0, int(math.ceil(math.log(assoc,2))))]
def acc(self, block):
hit = block in self.blocks
if hit:
self.updateIndexBits(self.blocks.index(block))
else:
if self.linearInit and None in self.blocks:
idx = self.blocks.index(None)
else:
idx = self.getIndexForBits()
self.blocks[idx] = block
self.updateIndexBits(idx)
return hit
def getIndexForBits(self, level=0, idx = 0):
if level == len(self.bits) - 1:
ret = 2*idx
if self.randLeaf:
ret += random.randint(0,1)
else:
ret += self.bits[level][idx]
return min(self.assoc - 1, ret)
elif level == 0 and self.randRoot:
return self.getIndexForBits(level + 2, random.randint(0,2))
else:
return self.getIndexForBits(level + 1, 2*idx + self.bits[level][idx])
def updateIndexBits(self, accIndex):
lastIdx = accIndex
for level in reversed(range(0, len(self.bits))):
curIdx = lastIdx/2
self.bits[level][curIdx] = 1 - (lastIdx % 2)
lastIdx = curIdx
class PLRUlSim(PLRUSim):
def __init__(self, assoc):
super(PLRUlSim, self).__init__(assoc, linearInit=True)
class PLRURandSim(PLRUSim):
def __init__(self, assoc):
super(PLRURandSim, self).__init__(assoc, randLeaf=True)
class RandPLRUSim(PLRUSim):
def __init__(self, assoc):
super(RandPLRUSim, self).__init__(assoc, randRoot=True)
AllRandPLRUVariants = {
'RandPLRU': RandPLRUSim,
'PLRURand': PLRURandSim,
}
class QLRUSim(ReplPolicySim):
def __init__(self, assoc, hitFunc, missFunc, replIdxFunc, updFunc, updOnMissOnly=False):
super(QLRUSim, self).__init__(assoc)
self.hitFunc = hitFunc
self.missFunc = missFunc
self.replIdxFunc = replIdxFunc
self.updFunc = updFunc
self.updOnMissOnly = updOnMissOnly
self.bits = [3] * assoc
def acc(self, block):
hit = block in self.blocks
if hit:
index = self.blocks.index(block)
self.bits[index] = self.hitFunc(self.bits[index])
else:
if self.updOnMissOnly:
self.bits = self.updFunc(self.bits, -1)
index = self.replIdxFunc(self.bits, self.blocks)
self.blocks[index] = block
self.bits[index] = self.missFunc()
if not self.updOnMissOnly:
self.bits = self.updFunc(self.bits, index)
return hit
QLRUHitFuncs = {
'H21': lambda x: {3:2, 2:1, 1:0, 0:0}[x],
'H20': lambda x: {3:2, 2:0, 1:0, 0:0}[x],
'H11': lambda x: {3:1, 2:1, 1:0, 0:0}[x],
'H10': lambda x: {3:1, 2:0, 1:0, 0:0}[x],
'H00': lambda x: {3:0, 2:0, 1:0, 0:0}[x],
}
QLRUMissFuncs = {
'M0': lambda: 0,
'M1': lambda: 1,
'M2': lambda: 2,
'M3': lambda: 3,
}
QLRUMissRandFuncs = {
'MR32': lambda: (2 if random.randint(0,15) == 0 else 3),
'MR31': lambda: (1 if random.randint(0,15) == 0 else 3),
'MR30': lambda: (0 if random.randint(0,15) == 0 else 3),
'MR21': lambda: (1 if random.randint(0,15) == 0 else 2),
'MR20': lambda: (0 if random.randint(0,15) == 0 else 2),
'MR10': lambda: (0 if random.randint(0,15) == 0 else 1),
}
QLRUReplIdxFuncs = {
'R0': lambda bits, blocks: blocks.index(None) if None in blocks else bits.index(3), #CFL L3
'R1': lambda bits, blocks: blocks.index(None) if None in blocks else (bits.index(3) if 3 in bits else 0), #IVB
'R2': lambda bits, blocks: rindex(blocks, None) if None in blocks else bits.index(3), # CFL L2
}
QLRUUpdFuncs = {
'U0': lambda bits, replIdx: [b + (3 - max(bits)) for b in bits], #CFL L3
'U1': lambda bits, replIdx: [(b + (3 - max(bits[:replIdx]+bits[replIdx+1:])) if i != replIdx else b) for i, b in enumerate(bits)], #CFL L2
'U2': lambda bits, replIdx: [b+1 for b in bits] if not 3 in bits else bits, # IVB
'U3': lambda bits, replIdx: [((b+1) if i != replIdx else b) for i, b in enumerate(bits)] if not 3 in bits else bits,
}
# all deterministic QLRU variants
AllDetQLRUVariants = {
'QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''):
type('QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''), (QLRUSim,),
{'__init__': lambda self, assoc, hfl=hf[1], mfl=mf[1], rfl=rf[1], ufl=uf[1], umol=umo: QLRUSim.__init__(self, assoc, hfl, mfl, rfl, ufl, umol)})
for hf in QLRUHitFuncs.items()
for mf in QLRUMissFuncs.items()
for rf in QLRUReplIdxFuncs.items()
for uf in QLRUUpdFuncs.items()
for umo in [False, True]
if not (rf[0] in ['R0', 'R2'] and uf[0] in ['U2', 'U3'])
}
# all randomized QLRU variants
AllRandQLRUVariants = {
'QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''):
type('QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''), (QLRUSim,),
{'__init__': lambda self, assoc, hfl=hf[1], mfl=mf[1], rfl=rf[1], ufl=uf[1], umol=umo: QLRUSim.__init__(self, assoc, hfl, mfl, rfl, ufl, umol)})
for hf in QLRUHitFuncs.items()
for mf in QLRUMissRandFuncs.items()
for rf in QLRUReplIdxFuncs.items()
for uf in QLRUUpdFuncs.items()
for umo in [False, True]
if not (rf[0] in ['R0', 'R2'] and uf[0] in ['U2', 'U3'])
}
class MRUSim(ReplPolicySim):
def __init__(self, assoc, updIfNotFull=True):
super(MRUSim, self).__init__(assoc)
self.bits = [1] * assoc
self.updIfNotFull = updIfNotFull
def acc(self, block):
hit = block in self.blocks
full = not (None in self.blocks)
if hit:
index = self.blocks.index(block)
else:
if not full:
index = self.blocks.index(None)
else:
index = self.bits.index(1)
self.blocks[index] = block
if (full or self.updIfNotFull):
self.bits[index] = 0
if not 1 in self.bits:
self.bits = [(1 if bi!=index else 0) for bi, _ in enumerate(self.bits)]
return hit
class MRUNSim(MRUSim):
def __init__(self, assoc):
super(MRUNSim, self).__init__(assoc, False)
# according to ISCA'10 paper
class NRUSim(ReplPolicySim):
def __init__(self, assoc):
super(NRUSim, self).__init__(assoc)
self.bits = [1] * assoc
def acc(self, block):
hit = block in self.blocks
if hit:
index = self.blocks.index(block)
self.bits[index] = 0
else:
while not 1 in self.bits:
self.bits = [1] * self.assoc
index = self.bits.index(1)
self.blocks[index] = block
self.bits[index] = 0
return hit
CommonPolicies = {
'FIFO': FIFOSim,
'LRU': LRUSim,
'PLRU': PLRUSim,
'PLRUl': PLRUlSim,
'MRU': MRUSim, # NHM
'MRU_N': MRUNSim, # SNB
'NRU': NRUSim,
'QLRU_H11_M1_R0_U0': AllDetQLRUVariants['QLRU_H11_M1_R0_U0'], # CFL L3
'QLRU_H21_M2_R0_U0_UMO': AllDetQLRUVariants['QLRU_H21_M2_R0_U0_UMO'], # see https://arxiv.org/abs/1904.06278
'QLRU_H11_M1_R1_U2': AllDetQLRUVariants['QLRU_H11_M1_R1_U2'], # IVB
'QLRU_H00_M1_R2_U1': AllDetQLRUVariants['QLRU_H00_M1_R2_U1'], # CFL L2
'QLRU_H00_M1_R0_U1': AllDetQLRUVariants['QLRU_H00_M1_R0_U1'], # CNL L2
'SRRIP': AllDetQLRUVariants['QLRU_H00_M2_R0_U0_UMO'],
}
AllDetPolicies = dict(CommonPolicies.items() + AllDetQLRUVariants.items())
AllRandPolicies = dict(AllRandQLRUVariants.items() + AllRandPLRUVariants.items())
AllPolicies = dict(AllDetPolicies.items() + AllRandPolicies.items())
def getHits(seq, policySimClass, assoc, nSets):
hits = 0
policySims = [policySimClass(assoc) for _ in range(0, nSets)]
for blockStr in seq.split():
blockName = getBlockName(blockStr)
if '!' in blockStr:
for policySim in policySims:
policySim.flush(blockName)
else:
for policySim in policySims:
hit = policySim.acc(blockName)
if '?' in blockStr:
hits += int(hit)
return hits
def getAges(blocks, seq, policySimClass, assoc):
ages = {}
for block in blocks:
for i in count(0):
curSeq = seq + ' ' + ' '.join('N' + str(n) for n in range(0,i)) + ' ' + block + '?'
if getHits(curSeq, policySimClass, assoc, 1) == 0:
ages[block] = i
break
return ages
def getGraph(blocks, seq, policySimClass, assoc, maxAge, nSets=1, nRep=1, agg="med"):
traces = []
for block in blocks:
trace = []
for i in range(0, maxAge):
curSeq = seq + ' ' + ' '.join('N' + str(n) for n in range(0,i)) + ' ' + block + '?'
hits = [getHits(curSeq, policySimClass, assoc, nSets) for _ in range(0, nRep)]
if agg == "med":
aggValue = median(hits)
elif agg == "min":
aggValue = min(hits)
else:
aggValue = float(sum(hits))/nRep
trace.append(aggValue)
traces.append((block, trace))
return traces
def getPermutations(policySimClass, assoc, maxAge=None):
# initial ages
initBlocks = ['I' + str(i) for i in range(0, assoc)]
seq = ' '.join(initBlocks)
initAges = getAges(initBlocks, seq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(initAges[b]) for b in initBlocks) + '}'
blocks = ['B' + str(i) for i in range(0, assoc)]
baseSeq = ' '.join(initBlocks + blocks)
ages = getAges(blocks, baseSeq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + baseSeq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(ages[b]) for b in blocks) + '}'
blocksSortedByAge = [a[0] for a in sorted(ages.items(), key=lambda x: -x[1])] # most recent block first
for permI, permBlock in enumerate(blocksSortedByAge):
seq = baseSeq + ' ' + permBlock
permAges = getAges(blocks, seq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
perm = [-1] * assoc
for bi, b in enumerate(blocksSortedByAge):
permAge = permAges[b]
if permAge < 1 or permAge > assoc:
break
perm[assoc-permAge] = bi
print u'\u03A0_' + str(permI) + ' = ' + str(tuple(perm))
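The `PLRUSim` class above stores one bit per internal node of a binary tree. To make the mechanism concrete, here is a minimal standalone re-derivation for a 4-way set (a hypothetical `TreePLRU4` with an array-encoded tree; simplified, without the linearInit/random variants of the class above):

```python
# Standalone tree-PLRU sketch for a 4-way set: bits[0] is the root,
# bits[1] and bits[2] are its children; each bit points toward the
# pseudo-least-recently-used half of its subtree.

class TreePLRU4:
    def __init__(self):
        self.bits = [0, 0, 0]     # 0 = "victim is in the left half"
        self.lines = [None] * 4

    def _victim(self):
        half = self.bits[0]       # follow the bits down to a leaf
        leaf = self.bits[1 + half]
        return 2 * half + leaf

    def _touch(self, way):
        # Point all bits on the path away from the accessed way.
        self.bits[0] = 1 - (way // 2)
        self.bits[1 + way // 2] = 1 - (way % 2)

    def access(self, block):
        hit = block in self.lines
        way = self.lines.index(block) if hit else self._victim()
        self.lines[way] = block
        self._touch(way)
        return hit
```

After filling an empty set with `A B C D`, the bits all point back at `A`'s way, so the next miss evicts `A`, matching the LRU-like behavior tree-PLRU approximates.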

tools/CacheAnalyzer/cpuid.py (new executable file, 545 lines)

@@ -0,0 +1,545 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Copyright (C) 2019 Andreas Abel
#
# This file was modified from https://github.com/flababah/cpuid.py
#
# Original license and copyright notice:
#
# The MIT License (MIT)
#
# Copyright (c) 2014 Anders Høst
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import collections
import ctypes
import os
import platform
import struct
import sys
from ctypes import c_uint32, c_int, c_long, c_ulong, c_size_t, c_void_p, POINTER, CFUNCTYPE
import logging
log = logging.getLogger(__name__)
# Posix x86_64:
# Three first call registers : RDI, RSI, RDX
# Volatile registers : RAX, RCX, RDX, RSI, RDI, R8-11
# Windows x86_64:
# Three first call registers : RCX, RDX, R8
# Volatile registers : RAX, RCX, RDX, R8-11
# cdecl 32 bit:
# Three first call registers : Stack (%esp)
# Volatile registers : EAX, ECX, EDX
_POSIX_64_OPC = [
0x53, # push %rbx
0x89, 0xf0, # mov %esi,%eax
0x89, 0xd1, # mov %edx,%ecx
0x0f, 0xa2, # cpuid
0x89, 0x07, # mov %eax,(%rdi)
0x89, 0x5f, 0x04, # mov %ebx,0x4(%rdi)
0x89, 0x4f, 0x08, # mov %ecx,0x8(%rdi)
0x89, 0x57, 0x0c, # mov %edx,0xc(%rdi)
0x5b, # pop %rbx
0xc3 # retq
]
_WINDOWS_64_OPC = [
0x53, # push %rbx
0x89, 0xd0, # mov %edx,%eax
0x49, 0x89, 0xc9, # mov %rcx,%r9
0x44, 0x89, 0xc1, # mov %r8d,%ecx
0x0f, 0xa2, # cpuid
0x41, 0x89, 0x01, # mov %eax,(%r9)
0x41, 0x89, 0x59, 0x04, # mov %ebx,0x4(%r9)
0x41, 0x89, 0x49, 0x08, # mov %ecx,0x8(%r9)
0x41, 0x89, 0x51, 0x0c, # mov %edx,0xc(%r9)
0x5b, # pop %rbx
0xc3 # retq
]
_CDECL_32_OPC = [
0x53, # push %ebx
0x57, # push %edi
0x8b, 0x7c, 0x24, 0x0c, # mov 0xc(%esp),%edi
0x8b, 0x44, 0x24, 0x10, # mov 0x10(%esp),%eax
0x8b, 0x4c, 0x24, 0x14, # mov 0x14(%esp),%ecx
0x0f, 0xa2, # cpuid
0x89, 0x07, # mov %eax,(%edi)
0x89, 0x5f, 0x04, # mov %ebx,0x4(%edi)
0x89, 0x4f, 0x08, # mov %ecx,0x8(%edi)
0x89, 0x57, 0x0c, # mov %edx,0xc(%edi)
0x5f, # pop %edi
0x5b, # pop %ebx
0xc3 # ret
]
is_windows = os.name == "nt"
is_64bit = ctypes.sizeof(ctypes.c_voidp) == 8
class CPUID_struct(ctypes.Structure):
_fields_ = [(r, c_uint32) for r in ("eax", "ebx", "ecx", "edx")]
class CPUID(object):
def __init__(self):
if platform.machine() not in ("AMD64", "x86_64", "x86", "i686"):
raise SystemError("Only available for x86")
if is_windows:
if is_64bit:
# VirtualAlloc seems to fail under some weird
# circumstances when ctypes.windll.kernel32 is
# used under 64 bit Python. CDLL fixes this.
self.win = ctypes.CDLL("kernel32.dll")
opc = _WINDOWS_64_OPC
else:
# Here ctypes.windll.kernel32 is needed to get the
# right DLL. Otherwise it will fail when running
# 32 bit Python on 64 bit Windows.
self.win = ctypes.windll.kernel32
opc = _CDECL_32_OPC
else:
opc = _POSIX_64_OPC if is_64bit else _CDECL_32_OPC
size = len(opc)
code = (ctypes.c_ubyte * size)(*opc)
if is_windows:
self.win.VirtualAlloc.restype = c_void_p
self.win.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong, ctypes.c_ulong]
self.addr = self.win.VirtualAlloc(None, size, 0x1000, 0x40)
if not self.addr:
raise MemoryError("Could not allocate RWX memory")
else:
self.libc = ctypes.cdll.LoadLibrary(None)
self.libc.valloc.restype = ctypes.c_void_p
self.libc.valloc.argtypes = [ctypes.c_size_t]
self.addr = self.libc.valloc(size)
if not self.addr:
raise MemoryError("Could not allocate memory")
self.libc.mprotect.restype = c_int
self.libc.mprotect.argtypes = [c_void_p, c_size_t, c_int]
ret = self.libc.mprotect(self.addr, size, 1 | 2 | 4)
if ret != 0:
raise OSError("Failed to set RWX")
ctypes.memmove(self.addr, code, size)
func_type = CFUNCTYPE(None, POINTER(CPUID_struct), c_uint32, c_uint32)
self.func_ptr = func_type(self.addr)
def __call__(self, eax, ecx=0):
struct = CPUID_struct()
self.func_ptr(struct, eax, ecx)
return struct.eax, struct.ebx, struct.ecx, struct.edx
def __del__(self):
if is_windows:
self.win.VirtualFree.restype = c_long
self.win.VirtualFree.argtypes = [c_void_p, c_size_t, c_ulong]
self.win.VirtualFree(self.addr, 0, 0x8000)
elif self.libc:
# Seems to throw exception when the program ends and
# libc is cleaned up before the object?
self.libc.free.restype = None
self.libc.free.argtypes = [c_void_p]
self.libc.free(self.addr)
def cpu_vendor(cpu):
_, b, c, d = cpu(0)
return str(struct.pack("III", b, d, c).decode("ascii"))
def cpu_name(cpu):
return " ".join(str("".join((struct.pack("IIII", *cpu(0x80000000 + i)).decode("ascii")
for i in range(2, 5))).replace('\x00', '')).split())
VersionInfo = collections.namedtuple('VersionInfo', 'displ_family displ_model stepping')
def version_info(cpu):
a, _, _, _ = cpu(0x01)
displ_family = (a >> 8) & 0xF
if (displ_family == 0x0F):
displ_family += (a >> 20) & 0xFF
displ_model = (a >> 4) & 0xF
if (displ_family == 0x06 or displ_family == 0x0F):
displ_model += (a >> 12) & 0xF0
stepping = a & 0xF
return VersionInfo(int(displ_family), int(displ_model), int(stepping))
def micro_arch(cpu):
vi = version_info(cpu)
if (vi.displ_family, vi.displ_model) in [(0x06, 0x0F)]:
return 'Core'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x17)]:
return 'EnhancedCore'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x1A), (0x06, 0x1E), (0x06, 0x1F), (0x06, 0x2E)]:
return 'NHM'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x25), (0x06, 0x2C), (0x06, 0x2F)]:
return 'WSM'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x2A), (0x06, 0x2D)]:
return 'SNB'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3A)]:
return 'IVB'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3C), (0x06, 0x45), (0x06, 0x46)]:
return 'HSW'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3D), (0x06, 0x47), (0x06, 0x56), (0x06, 0x4F)]:
return 'BDW'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x4E), (0x06, 0x5E)]:
return 'SKL'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x55)]:
return 'SKX'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x8E), (0x06, 0x9E)]:
# ToDo: not sure if this is correct
if vi.stepping <= 0x9:
return 'KBL'
else:
return 'CFL'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x66)]:
return 'CNL'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x01), (0x17, 0x11)]:
return 'ZEN'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x08), (0x17, 0x18)]:
return 'ZEN+'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x71)]:
return 'ZEN2'
return 'unknown'
# See Table 3-12 (Encoding of CPUID Leaf 2 Descriptors) in Intel's Instruction Set Reference
leaf2_descriptors = {
0x01: ('TLB', 'Instruction TLB: 4 KByte pages, 4-way set associative, 32 entries'),
0x02: ('TLB', 'Instruction TLB: 4 MByte pages, fully associative, 2 entries'),
0x03: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 64 entries'),
0x04: ('TLB', 'Data TLB: 4 MByte pages, 4-way set associative, 8 entries'),
0x05: ('TLB', 'Data TLB1: 4 MByte pages, 4-way set associative, 32 entries'),
0x06: ('Cache', '1st-level instruction cache: 8 KBytes, 4-way set associative, 32 byte line size'),
0x08: ('Cache', '1st-level instruction cache: 16 KBytes, 4-way set associative, 32 byte line size'),
0x09: ('Cache', '1st-level instruction cache: 32KBytes, 4-way set associative, 64 byte line size'),
0x0A: ('Cache', '1st-level data cache: 8 KBytes, 2-way set associative, 32 byte line size'),
0x0B: ('TLB', 'Instruction TLB: 4 MByte pages, 4-way set associative, 4 entries'),
0x0C: ('Cache', '1st-level data cache: 16 KBytes, 4-way set associative, 32 byte line size'),
0x0D: ('Cache', '1st-level data cache: 16 KBytes, 4-way set associative, 64 byte line size'),
0x0E: ('Cache', '1st-level data cache: 24 KBytes, 6-way set associative, 64 byte line size'),
0x1D: ('Cache', '2nd-level cache: 128 KBytes, 2-way set associative, 64 byte line size'),
0x21: ('Cache', '2nd-level cache: 256 KBytes, 8-way set associative, 64 byte line size'),
0x22: ('Cache', '3rd-level cache: 512 KBytes, 4-way set associative, 64 byte line size, 2 lines per sector'),
0x23: ('Cache', '3rd-level cache: 1 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x24: ('Cache', '2nd-level cache: 1 MBytes, 16-way set associative, 64 byte line size'),
0x25: ('Cache', '3rd-level cache: 2 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x29: ('Cache', '3rd-level cache: 4 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x2C: ('Cache', '1st-level data cache: 32 KBytes, 8-way set associative, 64 byte line size'),
0x30: ('Cache', '1st-level instruction cache: 32 KBytes, 8-way set associative, 64 byte line size'),
0x40: ('Cache', 'No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache'),
0x41: ('Cache', '2nd-level cache: 128 KBytes, 4-way set associative, 32 byte line size'),
0x42: ('Cache', '2nd-level cache: 256 KBytes, 4-way set associative, 32 byte line size'),
0x43: ('Cache', '2nd-level cache: 512 KBytes, 4-way set associative, 32 byte line size'),
0x44: ('Cache', '2nd-level cache: 1 MByte, 4-way set associative, 32 byte line size'),
0x45: ('Cache', '2nd-level cache: 2 MByte, 4-way set associative, 32 byte line size'),
0x46: ('Cache', '3rd-level cache: 4 MByte, 4-way set associative, 64 byte line size'),
0x47: ('Cache', '3rd-level cache: 8 MByte, 8-way set associative, 64 byte line size'),
0x48: ('Cache', '2nd-level cache: 3MByte, 12-way set associative, 64 byte line size'),
0x49: ('Cache', '3rd-level cache: 4MB, 16-way set associative, 64-byte line size (Intel Xeon processor MP, Family 0FH, Model 06H); 2nd-level cache: 4 MByte, 16-way set associative, 64 byte line size'),
0x4A: ('Cache', '3rd-level cache: 6MByte, 12-way set associative, 64 byte line size'),
0x4B: ('Cache', '3rd-level cache: 8MByte, 16-way set associative, 64 byte line size'),
0x4C: ('Cache', '3rd-level cache: 12MByte, 12-way set associative, 64 byte line size'),
0x4D: ('Cache', '3rd-level cache: 16MByte, 16-way set associative, 64 byte line size'),
0x4E: ('Cache', '2nd-level cache: 6MByte, 24-way set associative, 64 byte line size'),
0x4F: ('TLB', 'Instruction TLB: 4 KByte pages, 32 entries'),
0x50: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries'),
0x51: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 128 entries'),
0x52: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 256 entries'),
0x55: ('TLB', 'Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries'),
0x56: ('TLB', 'Data TLB0: 4 MByte pages, 4-way set associative, 16 entries'),
0x57: ('TLB', 'Data TLB0: 4 KByte pages, 4-way associative, 16 entries'),
0x59: ('TLB', 'Data TLB0: 4 KByte pages, fully associative, 16 entries'),
0x5A: ('TLB', 'Data TLB0: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries'),
0x5B: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages, 64 entries'),
0x5C: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages,128 entries'),
0x5D: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages,256 entries'),
0x60: ('Cache', '1st-level data cache: 16 KByte, 8-way set associative, 64 byte line size'),
0x61: ('TLB', 'Instruction TLB: 4 KByte pages, fully associative, 48 entries'),
0x63: ('TLB', 'Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries'),
0x64: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 512 entries'),
0x66: ('Cache', '1st-level data cache: 8 KByte, 4-way set associative, 64 byte line size'),
0x67: ('Cache', '1st-level data cache: 16 KByte, 4-way set associative, 64 byte line size'),
0x68: ('Cache', '1st-level data cache: 32 KByte, 4-way set associative, 64 byte line size'),
0x6A: ('Cache', 'uTLB: 4 KByte pages, 8-way set associative, 64 entries'),
0x6B: ('Cache', 'DTLB: 4 KByte pages, 8-way set associative, 256 entries'),
0x6C: ('Cache', 'DTLB: 2M/4M pages, 8-way set associative, 128 entries'),
0x6D: ('Cache', 'DTLB: 1 GByte pages, fully associative, 16 entries'),
0x70: ('Cache', 'Trace cache: 12 K-μop, 8-way set associative'),
0x71: ('Cache', 'Trace cache: 16 K-μop, 8-way set associative'),
0x72: ('Cache', 'Trace cache: 32 K-μop, 8-way set associative'),
0x76: ('TLB', 'Instruction TLB: 2M/4M pages, fully associative, 8 entries'),
0x78: ('Cache', '2nd-level cache: 1 MByte, 4-way set associative, 64byte line size'),
0x79: ('Cache', '2nd-level cache: 128 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7A: ('Cache', '2nd-level cache: 256 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7B: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7C: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7D: ('Cache', '2nd-level cache: 2 MByte, 8-way set associative, 64byte line size'),
0x7F: ('Cache', '2nd-level cache: 512 KByte, 2-way set associative, 64-byte line size'),
0x80: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 64-byte line size'),
0x82: ('Cache', '2nd-level cache: 256 KByte, 8-way set associative, 32 byte line size'),
0x83: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 32 byte line size'),
0x84: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 32 byte line size'),
0x85: ('Cache', '2nd-level cache: 2 MByte, 8-way set associative, 32 byte line size'),
0x86: ('Cache', '2nd-level cache: 512 KByte, 4-way set associative, 64 byte line size'),
0x87: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size'),
0xA0: ('DTLB', 'DTLB: 4k pages, fully associative, 32 entries'),
0xB0: ('TLB', 'Instruction TLB: 4 KByte pages, 4-way set associative, 128 entries'),
0xB1: ('TLB', 'Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries'),
0xB2: ('TLB', 'Instruction TLB: 4KByte pages, 4-way set associative, 64 entries'),
0xB3: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 128 entries'),
0xB4: ('TLB', 'Data TLB1: 4 KByte pages, 4-way associative, 256 entries'),
0xB5: ('TLB', 'Instruction TLB: 4KByte pages, 8-way set associative, 64 entries'),
0xB6: ('TLB', 'Instruction TLB: 4KByte pages, 8-way set associative, 128 entries'),
0xBA: ('TLB', 'Data TLB1: 4 KByte pages, 4-way associative, 64 entries'),
0xC0: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages, 4-way associative, 8 entries'),
0xC1: ('STLB', 'Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries'),
0xC2: ('DTLB', 'DTLB: 4 KByte/2 MByte pages, 4-way associative, 16 entries'),
0xC3: ('STLB', 'Shared 2nd-Level TLB: 4 KByte /2 MByte pages, 6-way associative, 1536 entries. Also 1GBbyte pages, 4-way, 16 entries.'),
0xC4: ('DTLB', 'DTLB: 2M/4M Byte pages, 4-way associative, 32 entries'),
0xCA: ('STLB', 'Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries'),
0xD0: ('Cache', '3rd-level cache: 512 KByte, 4-way set associative, 64 byte line size'),
0xD1: ('Cache', '3rd-level cache: 1 MByte, 4-way set associative, 64 byte line size'),
0xD2: ('Cache', '3rd-level cache: 2 MByte, 4-way set associative, 64 byte line size'),
0xD6: ('Cache', '3rd-level cache: 1 MByte, 8-way set associative, 64 byte line size'),
0xD7: ('Cache', '3rd-level cache: 2 MByte, 8-way set associative, 64 byte line size'),
0xD8: ('Cache', '3rd-level cache: 4 MByte, 8-way set associative, 64 byte line size'),
0xDC: ('Cache', '3rd-level cache: 1.5 MByte, 12-way set associative, 64 byte line size'),
0xDD: ('Cache', '3rd-level cache: 3 MByte, 12-way set associative, 64 byte line size'),
0xDE: ('Cache', '3rd-level cache: 6 MByte, 12-way set associative, 64 byte line size'),
0xE2: ('Cache', '3rd-level cache: 2 MByte, 16-way set associative, 64 byte line size'),
0xE3: ('Cache', '3rd-level cache: 4 MByte, 16-way set associative, 64 byte line size'),
0xE4: ('Cache', '3rd-level cache: 8 MByte, 16-way set associative, 64 byte line size'),
0xEA: ('Cache', '3rd-level cache: 12MByte, 24-way set associative, 64 byte line size'),
0xEB: ('Cache', '3rd-level cache: 18MByte, 24-way set associative, 64 byte line size'),
0xEC: ('Cache', '3rd-level cache: 24MByte, 24-way set associative, 64 byte line size'),
0xF0: ('Prefetch', '64-Byte prefetching'),
0xF1: ('Prefetch', '128-Byte prefetching'),
0xFE: ('General', 'CPUID leaf 2 does not report TLB descriptor information; use CPUID leaf 18H to query TLB and other address translation parameters.'),
0xFF: ('General', 'CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters')
}
# 0xAABBCCDD -> [0xDD, 0xCC, 0xBB, 0xAA]
def get_bytes(reg):
return [((reg >> s) & 0xFF) for s in range(0, 32, 8)]
def get_bit(reg, bit):
return (reg >> bit) & 1
# Returns the bits between the indexes start and end (inclusive); start must be <= end
def get_bits(reg, start, end):
return (reg >> start) & ((1 << (end-start+1)) - 1)
def get_cache_info(cpu):
vendor = cpu_vendor(cpu)
cacheInfo = dict()
if vendor == 'GenuineIntel':
log.info('\nCPUID Leaf 2 information:')
a, b, c, d = cpu(0x02)
for ri, reg in enumerate([a, b, c, d]):
if (reg >> 31): continue # register is reserved
for bi, byte in enumerate(get_bytes(reg)):
if (ri == 0) and (bi == 0): continue # least-significant byte in EAX
if byte == 0: continue # Null descriptor
log.info(' - ' + leaf2_descriptors[byte][1])
log.info('\nCPUID Leaf 4 information:')
index = 0
while (True):
a, b, c, d = cpu(0x04, index)
cacheType = ''
bits3_0 = get_bits(a, 0, 3)
if bits3_0 == 0: break
if bits3_0 == 1: cacheType = 'Data Cache'
if bits3_0 == 2: cacheType = 'Instruction Cache'
if bits3_0 == 3: cacheType = 'Unified Cache'
level = get_bits(a, 5, 7)
log.info(' Level ' + str(level) + ' (' + cacheType + '):')
parameters = []
if get_bit(a, 8): parameters.append('Self Initializing cache level (does not need SW initialization)')
if get_bit(a, 9): parameters.append('Fully Associative cache')
parameters.append('Maximum number of addressable IDs for logical processors sharing this cache: ' + str(get_bits(a, 14, 25)+1))
parameters.append('Maximum number of addressable IDs for processor cores in the physical package: ' + str(get_bits(a, 26, 31)+1))
L = int(get_bits(b, 0, 11)+1)
P = int(get_bits(b, 12, 21)+1)
W = int(get_bits(b, 22, 31)+1)
S = int(c+1)
parameters.append('System Coherency Line Size (L): ' + str(L) + ' B')
parameters.append('Physical Line partitions (P): ' + str(P))
parameters.append('Ways of associativity (W): ' + str(W))
parameters.append('Number of Sets (S): ' + str(S))
parameters.append('Cache Size: ' + str(W*P*L*S/1024) + ' kB')
if get_bit(d, 0): parameters.append('WBINVD/INVD is not guaranteed to act upon lower level caches of non-originating threads sharing this cache')
else: parameters.append('WBINVD/INVD from threads sharing this cache acts upon lower level caches for threads sharing this cache')
if get_bit(d, 1): parameters.append('Cache is inclusive of lower cache levels')
else: parameters.append('Cache is not inclusive of lower cache levels')
complexAddressing = False
if get_bit(d, 2):
complexAddressing = True
parameters.append('A complex function is used to index the cache, potentially using all address bits')
cacheInfo['L' + str(level) + (cacheType[0] if cacheType[0] in ['D', 'I'] else '')] = {
'lineSize': L,
'nSets': S,
'assoc': W,
'complex': complexAddressing
}
for par in parameters:
log.info(' - ' + par)
index += 1
elif vendor == 'AuthenticAMD':
_, _, c, d = cpu(0x80000005)
L1DcLineSize = int(get_bits(c, 0, 7))
L1DcLinesPerTag = int(get_bits(c, 8, 15))
L1DcAssoc = int(get_bits(c, 16, 23))
L1DcSize = int(get_bits(c, 24, 31))
log.info(' L1DcLineSize: ' + str(L1DcLineSize) + ' B')
log.info(' L1DcLinesPerTag: ' + str(L1DcLinesPerTag))
log.info(' L1DcAssoc: ' + str(L1DcAssoc))
log.info(' L1DcSize: ' + str(L1DcSize) + ' kB')
cacheInfo['L1D'] = {
'lineSize': L1DcLineSize,
'nSets': L1DcSize*1024/L1DcAssoc/L1DcLineSize,
'assoc': L1DcAssoc
}
L1IcLineSize = int(get_bits(d, 0, 7))
L1IcLinesPerTag = int(get_bits(d, 8, 15))
L1IcAssoc = int(get_bits(d, 16, 23))
L1IcSize = int(get_bits(d, 24, 31))
log.info(' L1IcLineSize: ' + str(L1IcLineSize) + ' B')
log.info(' L1IcLinesPerTag: ' + str(L1IcLinesPerTag))
log.info(' L1IcAssoc: ' + str(L1IcAssoc))
log.info(' L1IcSize: ' + str(L1IcSize) + ' kB')
cacheInfo['L1I'] = {
'lineSize': L1IcLineSize,
'nSets': L1IcSize*1024/L1IcAssoc/L1IcLineSize,
'assoc': L1IcAssoc
}
_, _, c, d = cpu(0x80000006)
L2LineSize = int(get_bits(c, 0, 7))
L2LinesPerTag = int(get_bits(c, 8, 11))
L2Size = int(get_bits(c, 16, 31))
L2Assoc = 0
c_15_12 = get_bits(c, 12, 15)
if c_15_12 == 0x1: L2Assoc = 1
elif c_15_12 == 0x2: L2Assoc = 2
elif c_15_12 == 0x4: L2Assoc = 4
elif c_15_12 == 0x6: L2Assoc = 8
elif c_15_12 == 0x8: L2Assoc = 16
elif c_15_12 == 0xA: L2Assoc = 32
elif c_15_12 == 0xB: L2Assoc = 48
elif c_15_12 == 0xC: L2Assoc = 64
elif c_15_12 == 0xD: L2Assoc = 96
elif c_15_12 == 0xE: L2Assoc = 128
elif c_15_12 == 0xF: L2Assoc = L2Size*1024/L2LineSize # 0xF: fully associative
log.info(' L2LineSize: ' + str(L2LineSize) + ' B')
log.info(' L2LinesPerTag: ' + str(L2LinesPerTag))
log.info(' L2Assoc: ' + str(L2Assoc))
log.info(' L2Size: ' + str(L2Size) + ' kB')
cacheInfo['L2'] = {
'lineSize': L2LineSize,
'nSets': L2Size*1024/L2Assoc/L2LineSize,
'assoc': L2Assoc
}
L3LineSize = int(get_bits(d, 0, 7))
L3LinesPerTag = int(get_bits(d, 8, 11))
L3Size = int(get_bits(d, 18, 31)*512)
L3Assoc = 0
d_15_12 = get_bits(d, 12, 15)
if d_15_12 == 0x1: L3Assoc = 1
elif d_15_12 == 0x2: L3Assoc = 2
elif d_15_12 == 0x4: L3Assoc = 4
elif d_15_12 == 0x6: L3Assoc = 8
elif d_15_12 == 0x8: L3Assoc = 16
elif d_15_12 == 0xA: L3Assoc = 32
elif d_15_12 == 0xB: L3Assoc = 48
elif d_15_12 == 0xC: L3Assoc = 64
elif d_15_12 == 0xD: L3Assoc = 96
elif d_15_12 == 0xE: L3Assoc = 128
log.info(' L3LineSize: ' + str(L3LineSize) + ' B')
log.info(' L3LinesPerTag: ' + str(L3LinesPerTag))
log.info(' L3Assoc: ' + str(L3Assoc))
log.info(' L3Size: ' + str(L3Size/1024) + ' MB')
if L3Assoc: # Fn8000_0006 EDX[15:12] == 0 means no L3 cache is present
    cacheInfo['L3'] = {
        'lineSize': L3LineSize,
        'nSets': L3Size*1024/L3Assoc/L3LineSize,
        'assoc': L3Assoc
    }
return cacheInfo
def get_basic_info(cpu):
strs = ['Vendor: ' + cpu_vendor(cpu)]
strs += ['CPU Name: ' + cpu_name(cpu)]
vi = version_info(cpu)
strs += ['Family: 0x%02X' % vi.displ_family]
strs += ['Model: 0x%02X' % vi.displ_model]
strs += ['Stepping: 0x%X' % vi.stepping]
strs += ['Microarchitecture: ' + micro_arch(cpu)]
return '\n'.join(strs)
if __name__ == "__main__":
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.INFO)
cpuid = CPUID()
def valid_inputs():
for eax in (0x0, 0x80000000):
highest, _, _, _ = cpuid(eax)
while eax <= highest:
regs = cpuid(eax)
yield (eax, regs)
eax += 1
print " ".join(x.ljust(8) for x in ("CPUID", "A", "B", "C", "D")).strip()
for eax, regs in valid_inputs():
print "%08x" % eax, " ".join("%08x" % reg for reg in regs)
print ''
print get_basic_info(cpuid)
print '\nCache information:'
get_cache_info(cpuid)
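The "Cache Size" value logged for each CPUID leaf-4 entry is simply the product of the four decoded parameters. A minimal sketch of that computation (the sample values below are illustrative, not read from any particular CPU):

```python
def leaf4_cache_size_kb(line_size, partitions, ways, sets):
    # CPUID leaf 4: cache size in bytes = W * P * L * S
    return ways * partitions * line_size * sets // 1024

# e.g., 64 B lines, 1 partition, 8 ways, 64 sets -> a 32 kB L1 data cache
print(leaf4_cache_size_kb(64, 1, 8, 64))  # -> 32
```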

tools/CacheAnalyzer/hitMiss.py Executable file
@@ -0,0 +1,53 @@
#!/usr/bin/python
import argparse
import re
import sys
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Outputs whether the last access of a sequence results in a hit or miss')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1) # default is 1 because, e.g., on SNB, CBox 0 has only 15 ways instead of 16
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-loop", help="Loop count (Default: 1)", type=int, default=1)
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.sim:
policyClass = cacheSim.Policies[args.sim]
seq = re.sub('[?!]', '', ' '.join([args.seq_init, args.seq])).strip() + '?'
hits = cacheSim.getHits(policyClass(args.simAssoc), seq)
if hits > 0:
print 'HIT'
exit(1)
else:
print 'MISS'
exit(0)
else:
setCount = len(parseCacheSetsStr(args.level, True, args.sets))
seq = re.sub('[?!]', '', args.seq).strip() + '?'
nb = runCacheExperiment(args.level, seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL), loop=args.loop,
wbinvd=(not args.noWbinvd))
if nb['L' + str(args.level) + '_HIT']/setCount > .5:
print 'HIT'
exit(1)
else:
print 'MISS'
exit(0)
if __name__ == "__main__":
main()
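hitMiss.py decides between HIT and MISS by comparing the average per-set hit count against 0.5. The decision rule as a standalone sketch (the function name is ours, not part of the tool):

```python
def classify_last_access(hit_count, set_count):
    # hit_count is summed over all measured sets, so an average above
    # 0.5 hits per set means the final access hit in most sets
    return 'HIT' if hit_count / float(set_count) > 0.5 else 'MISS'

print(classify_last_access(57, 64))  # -> HIT
print(classify_last_access(3, 64))   # -> MISS
```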

tools/CacheAnalyzer/permPolicy.py Executable file
@@ -0,0 +1,104 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple, OrderedDict
import argparse
import math
import os
import re
import subprocess
import sys
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
from cacheGraph import *
import logging
log = logging.getLogger(__name__)
def getPermutations(level, html, cacheSets=None, getInitialAges=True, maxAge=None, cBox=1):
assoc = getCacheInfo(level).assoc
if not maxAge:
maxAge=2*assoc
hitEvent = 'L' + str(level) + '_HIT'
missEvent = 'L' + str(level) + '_MISS'
if getInitialAges:
initBlocks = ['I' + str(i) for i in range(0, assoc)]
seq = ' '.join(initBlocks)
initAges, nbDict = getAgesOfBlocks(initBlocks, level, seq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(initAges[b]) for b in initBlocks) + '}'
event = (hitEvent if hitEvent in next(iter(nbDict.items()))[1][0] else missEvent)
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in initBlocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
else:
initBlocks = []
blocks = ['B' + str(i) for i in range(0, assoc)]
baseSeq = ' '.join(initBlocks + blocks)
ages, nbDict = getAgesOfBlocks(blocks, level, baseSeq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + baseSeq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(ages[b]) for b in blocks) + '}'
event = (hitEvent if hitEvent in next(iter(nbDict.items()))[1][0] else missEvent)
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
blocksSortedByAge = [a[0] for a in sorted(ages.items(), key=lambda x: -x[1])] # most recent block first
for permI, permBlock in enumerate(blocksSortedByAge):
seq = baseSeq + ' ' + permBlock
permAges, nbDict = getAgesOfBlocks(blocks, level, seq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
perm = [-1] * assoc
for bi, b in enumerate(blocksSortedByAge):
permAge = permAges[b]
if permAge < 1 or permAge > assoc:
break
perm[assoc-permAge] = bi
print u'\u03A0_' + str(permI) + ' = ' + str(tuple(perm))
def main():
parser = argparse.ArgumentParser(description='Replacement Policies')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-noInit", help="Do not fill sets with associativity many elements first", action='store_true')
parser.add_argument("-maxAge", help="Maximum age", type=int)
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-output", help="Output file name", default='permPolicy.html')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level)
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
getPermutations(args.level, html, cacheSets=args.sets, getInitialAges=(not args.noInit), maxAge=args.maxAge, cBox=args.cBox)
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
if __name__ == "__main__":
main()
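The permutation vector Π_i printed by getPermutations is assembled by placing each block, taken in age-sorted order, at slot `assoc - age`. A self-contained sketch of that last loop, with hypothetical measured ages:

```python
def build_permutation(assoc, blocks_by_age, perm_ages):
    # mirrors the loop at the end of getPermutations:
    # perm[assoc - age] = index of the block in the age-sorted order
    perm = [-1] * assoc  # -1 = slot not determined
    for bi, b in enumerate(blocks_by_age):
        age = perm_ages[b]
        if age < 1 or age > assoc:
            break
        perm[assoc - age] = bi
    return tuple(perm)

# hypothetical ages in a 4-way set: B3 youngest (age 4), B0 oldest (age 1)
print(build_permutation(4, ['B3', 'B2', 'B1', 'B0'],
                        {'B3': 4, 'B2': 3, 'B1': 2, 'B0': 1}))  # -> (0, 1, 2, 3)
```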

tools/CacheAnalyzer/replPolicy.py Executable file
@@ -0,0 +1,171 @@
#!/usr/bin/python
import argparse
import random
import sys
from numpy import median
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def getActualHits(seq, level, cacheSets, cBox, nMeasurements=10):
nb = runCacheExperiment(level, seq, cacheSets=cacheSets, cBox=cBox, clearHL=True, loop=1, wbinvd=True, nMeasurements=nMeasurements, agg='med')
return int(nb['L' + str(level) + '_HIT']+0.1)
def findSmallCounterexample(policy, initSeq, level, sets, cBox, assoc, seq, nMeasurements):
setCount = len(parseCacheSetsStr(level, True, sets))
seqSplit = seq.split()
for seqPrefix in [seqSplit[:i] for i in range(assoc+1, len(seqSplit)+1)]:
seq = initSeq + ' '.join(seqPrefix)
actual = getActualHits(seq, level, sets, cBox, nMeasurements)
sim = cacheSim.getHits(seq, cacheSim.AllPolicies[policy], assoc, setCount)
print 'seq:' + seq + ', actual: ' + str(actual) + ', sim: ' + str(sim)
if sim != actual:
break
for i in reversed(range(0, len(seqPrefix)-1)):
tmpPrefix = seqPrefix[:i] + seqPrefix[(i+1):]
seq = initSeq + ' '.join(tmpPrefix)
actual = getActualHits(seq, level, sets, cBox, nMeasurements)
sim = cacheSim.getHits(seq, cacheSim.AllPolicies[policy], assoc, setCount)
print 'seq:' + seq + ', actual: ' + str(actual) + ', sim: ' + str(sim)
if sim != actual:
seqPrefix = tmpPrefix
return ((initSeq + ' ') if initSeq else '') + ' '.join(seqPrefix)
def getRandomSeq(n):
seq = [0]
seqAct = ['']
for _ in range(0,n):
if random.choice([True, False]):
seq.append(max(seq)+1)
seqAct.append('')
else:
seq.append(random.choice(seq))
if random.randint(0,8)==0:
seqAct.append('!') # with probability 1/9, flush the repeated block instead of accessing it
else:
seqAct.append('?')
return ' '.join(str(s) + a for s, a in zip(seq, seqAct))
def main():
parser = argparse.ArgumentParser(description='Replacement Policies')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 0)", type=int)
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=3)
parser.add_argument("-findCtrEx", help="Tries to find a small counterexample for each policy (only available for deterministic policies)", action='store_true')
parser.add_argument("-policies", help="Comma-separated list of policies to consider (Default: all deterministic policies)")
parser.add_argument("-randPolicies", help="Test randomized policies", action='store_true')
parser.add_argument("-allQLRUVariants", help="Test all QLRU variants", action='store_true')
parser.add_argument("-assoc", help="Override the associativity", type=int)
parser.add_argument("-initSeq", help="Adds an initialization sequence to each sequence")
parser.add_argument("-nRandSeq", help="Number of random sequences (default: 100)", type=int, default=100)
parser.add_argument("-lRandSeq", help="Length of random sequences (default: 50)", type=int, default=50)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-output", help="Output file name", default='replPolicy.html')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
policies = sorted(cacheSim.CommonPolicies.keys())
if args.policies:
policies = args.policies.split(',')
elif args.allQLRUVariants:
policies = sorted(set(cacheSim.CommonPolicies.keys())|set(cacheSim.AllDetQLRUVariants.keys()))
elif args.randPolicies:
policies = sorted(cacheSim.AllRandPolicies.keys())
if args.assoc:
assoc = args.assoc
else:
assoc = getCacheInfo(args.level).assoc
cBox = 0
if args.cBox:
cBox = args.cBox
setCount = len(parseCacheSetsStr(args.level, True, args.sets))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level) + (', CBox: ' + str(cBox) if args.cBox else '')
html = ['<html>', '<head>', '<title>' + title + '</title>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
html += ['<table border="1" style="white-space:nowrap;">']
html += ['<tr><th>Sequence</th><th>Actual</th>']
html += ['<th>' + p.replace('_', '<br>_') + '</th>' for p in policies]
html += ['</tr>']
possiblePolicies = set(policies)
counterExamples = dict()
dists = {p: 0.0 for p in policies}
seqList = []
seqList.extend(getRandomSeq(args.lRandSeq) for _ in range(0,args.nRandSeq))
for seq in seqList:
fullSeq = ((args.initSeq + ' ') if args.initSeq else '') + seq
print fullSeq
html += ['<tr><td>' + fullSeq + '</td>']
actual = getActualHits(fullSeq, args.level, args.sets, cBox, args.nMeasurements)
html += ['<td>' + str(actual) + '</td>']
outp = ''
for p in policies:
if not args.randPolicies:
sim = cacheSim.getHits(fullSeq, cacheSim.AllPolicies[p], assoc, setCount)
if sim != actual:
possiblePolicies.discard(p)
color = 'red'
if args.findCtrEx and not p in counterExamples:
counterExamples[p] = findSmallCounterexample(p, ((args.initSeq + ' ') if args.initSeq else ''), args.level, args.sets, cBox, assoc, seq,
args.nMeasurements)
else:
color = 'green'
else:
sim = median([cacheSim.getHits(fullSeq, cacheSim.AllPolicies[p], assoc, setCount) for _ in range(0, args.nMeasurements)])
dist = (sim - actual) ** 2
dists[p] += dist
colorR = min(255, dist)
colorG = max(0, min(255, 512 - dist))
color = 'rgb(' + str(colorR) + ',' + str(colorG) + ',0)'
html += ['<td style="background-color:' + color + ';">' + str(sim) + '</td>']
html += ['</tr>']
if not args.randPolicies:
print 'Possible policies: ' + ', '.join(possiblePolicies)
if not possiblePolicies: break
if not args.randPolicies and args.findCtrEx:
print ''
print 'Counter example(s): '
for p, ctrEx in counterExamples.items():
print ' ' + p + ': ' + ctrEx
html += ['</table>', '</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
if not args.randPolicies:
print 'Possible policies: ' + ', '.join(possiblePolicies)
else:
for p, d in reversed(sorted(dists.items(), key=lambda d: d[1])):
print p + ': ' + str(d)
if __name__ == "__main__":
main()
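findSmallCounterexample shrinks a mismatching sequence by greedily dropping one element at a time (back to front, keeping the last element), and keeps a deletion whenever the hardware/simulator disagreement survives. The same idea as a pure function, where `differs` is a stand-in predicate for the actual-vs-simulated comparison:

```python
def shrink(seq, differs):
    # one greedy removal pass, back to front; the last element is
    # always kept, as in findSmallCounterexample
    for i in reversed(range(len(seq) - 1)):
        candidate = seq[:i] + seq[i+1:]
        if differs(candidate):
            seq = candidate
    return seq

# toy predicate: a "counterexample" is any sequence containing both A and C
print(shrink(['A', 'B', 'C', 'D'], lambda s: 'A' in s and 'C' in s))  # -> ['A', 'C', 'D']
```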

@@ -0,0 +1,69 @@
#!/usr/bin/python
import argparse
import random
import sys
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Tests if the L3 cache uses set dueling')
parser.add_argument("-level", help="Cache level (Default: 3)", type=int, default=3)
parser.add_argument("-nRuns", help="Maximum number of runs", type=int, default=25)
parser.add_argument("-loop", help="Loop count", type=int, default=25)
parser.add_argument("-output", help="Output file name", default='setDueling.html')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='INFO')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
assoc = getCacheInfo(args.level).assoc
nSets = getCacheInfo(args.level).nSets
nCBoxes = max(1, getNCBoxUnits())
seq = ' '.join('B' + str(i) + '?' for i in range(0, assoc*4/3))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level)
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
allSets = range(0,nSets)
yValuesForCBox = {cBox: [[] for s in range(0, nSets)] for cBox in range(0, nCBoxes)}
for i in range(0, args.nRuns):
for cBox in range(0, nCBoxes):
yValuesList = yValuesForCBox[cBox]
for s in list(allSets) * 2 + list(reversed(allSets)) * 2:
if yValuesList[s] and max(yValuesList[s]) > 2 and min(yValuesList[s]) < assoc/2:
continue
log.info('CBox ' + str(cBox) + ', run ' + str(i) + ', set: ' + str(s))
nb = runCacheExperiment(args.level, seq, cacheSets=str(s), clearHL=True, loop=args.loop, wbinvd=False, cBox=cBox, nMeasurements=1, warmUpCount=0)
yValuesList[s].append(nb['L' + str(args.level) + '_HIT'])
for cBox in range(0, nCBoxes):
yValues = [min(x) + (max(x)-min(x))/2 for x in yValuesForCBox[cBox] if x]
fig = go.Figure()
fig.update_layout(title_text='CBox ' + str(cBox) + ', Sequence (accessed ' + str(args.loop) + ' times in each set): ' + seq)
fig.update_layout(showlegend=True)
fig.update_xaxes(title_text='Set')
fig.add_trace(go.Scatter(y=yValues, mode='lines+markers', name='L3 Hits'))
html.append(plot(fig, include_plotlyjs=False, output_type='div'))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Output written to ' + args.output
if __name__ == "__main__":
main()
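For each set, setDueling.py plots the midpoint between the lowest and highest hit count observed across runs; leader sets of a dueling policy then stand out from the follower sets. The plotted y-value is simply:

```python
def midpoint(samples):
    # min + (max - min) / 2, as used for the y-values in setDueling.py
    lo, hi = min(samples), max(samples)
    return lo + (hi - lo) / 2.0

print(midpoint([2, 14, 13, 3]))  # -> 8.0
```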

@@ -0,0 +1,63 @@
#!/usr/bin/python
import argparse
import math
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
def main():
parser = argparse.ArgumentParser(description='Generates a graph obtained by sweeping over a memory area repeatedly with a given stride')
parser.add_argument("-stride", help="Stride (in bytes) (Default: 64)", type=int, default=64)
parser.add_argument("-startSize", help="Start size of the memory area (in kB) (Default: 4)", type=int, default=4)
parser.add_argument("-endSize", help="End size of the memory area (in kB) (Default: 32768)", type=int, default=32768)
parser.add_argument("-loop", help="Loop count (Default: 100)", type=int, default=100)
parser.add_argument("-output", help="Output file name", default='strideGraph.html')
args = parser.parse_args()
resetNanoBench()
setNanoBenchParameters(config=getDefaultCacheConfig(), nMeasurements=1, warmUpCount=0, unrollCount=1, loopCount=args.loop, basicMode=False, noMem=True)
nbDicts = []
xValues = []
nAddresses = []
tickvals = []
pt = args.startSize*1024
while pt <= args.endSize*1024:
tickvals.append(pt)
for x in ([int(math.pow(2, math.log(pt, 2) + i/16.0)) for i in range(0,16)] if pt < args.endSize*1024 else [pt]):
print x/1024
xValues.append(str(x))
addresses = range(0, x, args.stride)
nAddresses.append(len(addresses))
ec = getCodeForAddressLists([AddressList(addresses,False,False)], wbinvd=True)
nbDicts.append(runNanoBench(code=ec.code, init=ec.init, oneTimeInit=ec.oneTimeInit))
pt *= 2
title = cpuid.cpu_name(cpuid.CPUID())
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
for evtType in ['Core cycles', 'APERF', 'HIT', 'MISS']:
if not any(e for e in nbDicts[0].keys() if evtType in e): continue
fig = go.Figure()
fig.update_layout(showlegend=True)
fig.update_xaxes(title_text='Size (in kB)', type='category', tickvals=tickvals, ticktext=[x/1024 for x in tickvals])
for event in sorted(e for e in nbDicts[0].keys() if evtType in e):
yValues = [nb[event]/nAddr for nb, nAddr in zip(nbDicts, nAddresses)]
fig.add_trace(go.Scatter(x=xValues, y=yValues, mode='lines+markers', name=event))
html.append(plot(fig, include_plotlyjs=False, output_type='div'))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Graph written to ' + args.output
if __name__ == "__main__":
main()
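strideGraph.py sweeps memory-area sizes from startSize to endSize with 16 logarithmically spaced points per power of two. A standalone sketch of that x-axis generation (using `math.log2` instead of `math.log(pt, 2)`, since `log2` is exact for powers of two):

```python
import math

def sweep_points(start_kb, end_kb):
    points = []
    pt = start_kb * 1024
    while pt <= end_kb * 1024:
        if pt < end_kb * 1024:
            # 16 log-spaced sizes between pt and 2*pt
            points.extend(int(2 ** (math.log2(pt) + i / 16.0)) for i in range(16))
        else:
            points.append(pt)  # the end size itself is measured once
        pt *= 2
    return points

pts = sweep_points(4, 16)
print(len(pts), pts[0], pts[16], pts[-1])  # -> 33 4096 8192 16384
```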