nanoBench Cache Analyzer

This commit is contained in:
Andreas Abel
2019-09-24 14:48:08 +02:00
parent 5d651050da
commit 9c075f04a5
12 changed files with 2258 additions and 0 deletions


@@ -0,0 +1,133 @@
# nanoBench Cache Analyzer
This folder contains several tools for analyzing caches using hardware performance counters.
Results for recent Intel CPUs that were obtained using these tools are available on [uops.info/cache.html](https://uops.info/cache.html).
Make sure to read the [Prerequisites](#prerequisites) section before trying any of the tools.
# Tools
## cacheSeq.py
This tool can be used to measure how many cache hits and misses executing an access sequence generates.
As an example, consider the following call:
sudo ./cacheSeq.py -level 2 -sets 10-14,20,35 -seq "A B C D A? C! B?"
The tool will make memory accesses to four different blocks in all of the specified sets of the second-level cache. Elements of the sequence that end with a `?` will be included in the performance counter measurements; the other elements will be accessed, but the number of hits/misses they generate will not be recorded. Elements that end with a `!` will be flushed (using the `CLFLUSH` instruction) instead of being accessed. The order of the accesses is as follows: First, block `A` will be accessed in all of the specified sets, then block `B` will be accessed in all of the specified sets, and so on.
Between every two accesses to the same set in a lower-level cache, the tool automatically adds enough accesses to the higher-level caches (that map to different sets and/or slices in the lower-level cache) to make sure that the corresponding lines are evicted from the higher-level cache and the access actually reaches the lower-level cache. These additional accesses are excluded from the performance counter measurements.
By default, the `WBINVD` instruction is called before executing the access sequence. This can be disabled with the `-noWbinvd` option.
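The sequence notation and the block-to-address mapping can be sketched in a few lines of Python (an illustrative re-implementation of the idea, not the tool's actual code; `parse_seq` and `block_address` are made-up names, and the mapping shown applies only to caches without slicing):

```python
import re

def parse_seq(seq):
    """Parses a cacheSeq-style sequence into (blockName, measure, flush) tuples."""
    elems = []
    for el in seq.split():
        name = re.sub('[?!]', '', el)  # strip the '?' and '!' markers
        elems.append((name, el.endswith('?'), el.endswith('!')))
    return elems

def block_address(way_id, cache_set, line_size=64, n_sets=1024):
    """Address (relative to the start of the memory area) of a block in the given
    cache set; each distinct block name is placed in its own way."""
    way_size = line_size * n_sets
    return way_id * way_size + cache_set * line_size

print(parse_seq('A B C D A? C! B?'))
```

For example, `parse_seq('A? C!')` yields `[('A', True, False), ('C', False, True)]`: block `A` is accessed and measured, block `C` is flushed.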
The tool has the following command-line options:
| Option | Description |
|------------------------------|-------------|
| `-seq <sequence>` | Main access sequence. |
| `-loop <n>` | Number of times the main access sequence is executed. `[Default: n=1]` |
| `-seq_init <sequence>` | Access sequence that is executed once in the beginning before the main sequence. |
| `-level <n>` | Cache level `[Default: n=1]` |
| `-sets <sets>`               | Cache sets in which the access sequence will be executed. By default, all cache sets are used, except for the first *n* sets, where *n* is the number of sets of the higher-level cache; these sets are needed for clearing the higher-level cache. |
| `-cBox <n>` | CBox in which the access sequence will be executed. `[Default: n=1]` |
| `-noClearHL` | Do not clear higher level caches. |
| `-noWbinvd` | Do not call wbinvd before each run. |
| `-sim <policy>` | Simulate the given policy instead of running the experiment on the hardware. For a list of available policies, see `cacheSim.py`. |
| `-simAssoc <n>` | Associativity of the simulated cache. `[Default: n=1]` |
## hitMiss.py
Similar to `cacheSeq.py`, but only outputs whether the last access of a sequence is a hit or a miss.
## cacheGraph.py
Generates an HTML file with a graph that shows the *ages* of all cache blocks after executing an access sequence. The *age* of a block B is the number of fresh blocks that need to be accessed before B is evicted.
The tool supports all of the command-line options of `cacheSeq.py`, except the `-loop` option. In addition to that, the following options are supported:
| Option | Description |
|------------------------------|-------------|
| `-blocks <blocks>` | Only determine the ages of the blocks in the given list. `[Default: consider all blocks in the access sequence]` |
| `-maxAge <n>` | The maximum age to consider. `[Default: 2*associativity]` |
| `-output <file>` | File name of the HTML file. `[Default: graph.html]` |
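The *age* computation can be illustrated with a small simulation, assuming a pure-LRU set (a sketch only; `lru_age` is an illustrative name and not part of the tools):

```python
def lru_age(seq, block, assoc=4):
    """Age of `block` after accessing `seq` in an LRU set of the given
    associativity: the number of fresh blocks that must be accessed before
    `block` is evicted (0 if it is not cached after `seq`)."""
    cache = []  # least recently used element first
    for b in seq.split():
        if b in cache:
            cache.remove(b)          # will be re-appended at the MRU position
        elif len(cache) >= assoc:
            cache.pop(0)             # evict the least recently used block
        cache.append(b)
    age = 0
    while block in cache:
        if len(cache) >= assoc:
            cache.pop(0)
        cache.append('fresh%d' % age)  # access a previously unused block
        age += 1
    return age
```

Under LRU with associativity 4, `lru_age('A B C D', 'A')` is 1 (one fresh block evicts the LRU block `A`), while `lru_age('A B C D', 'D')` is 4.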
## replPolicy.py
Determines the replacement policy by generating random access sequences and comparing the number of hits on the actual hardware to the number of hits in a simulation of different policies. By default, a number of commonly used policies are simulated. With the `-allQLRUVariants` option, a more comprehensive list of more than 300 QLRU variants is tested.
The tool outputs all results in the form of an HTML table.
If the `-findCtrEx` option is used, it will try to find a small counterexample for each policy.
It supports the following additional command-line parameters:
| Option | Description |
|------------------------------|-------------|
| `-level <n>` | Cache level `[Default: n=1]` |
| `-sets <sets>` | Cache sets for which the replacement policy will be tested. |
| `-cBox <n>` | CBox for which the replacement policy will be tested. `[Default: n=0]` |
| `-policies <policies>`       | Only consider the policies in the given comma-separated list. |
| `-useInitSeq <seq>` | Adds a fixed prefix to each randomly generated sequence. This can be used to initialize the cache to a specific state. |
| `-nRandSeq <n>` | Number of random sequences. `[Default: n=100]` |
| `-lRandSeq <n>` | Length of random sequences. `[Default: n=50]` |
| `-output <file>` | File name of the HTML file. `[Default: replPolicy.html]` |
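The comparison approach can be sketched as follows: simulate each candidate policy on the same sequence and compare hit counts with the hardware measurements (illustrative code; the actual policy simulations live in `cacheSim.py`):

```python
def count_hits(seq, assoc, policy):
    """Number of hits when accessing `seq` (a list of block names) in a single
    cache set of the given associativity; `policy` is 'LRU' or 'FIFO'."""
    cache, hits = [], 0
    for b in seq:
        if b in cache:
            hits += 1
            if policy == 'LRU':
                cache.remove(b)
                cache.append(b)  # move to MRU position; FIFO leaves order unchanged
        else:
            if len(cache) >= assoc:
                cache.pop(0)     # evict the oldest (FIFO) / least recently used (LRU)
            cache.append(b)
    return hits

import random
random.seed(0)
seq = [random.choice('ABCDEFGH') for _ in range(50)]
hit_counts = {p: count_hits(seq, 4, p) for p in ('LRU', 'FIFO')}
```

Sequences on which the policies disagree (e.g., `A B A C A` with associativity 2 gives 2 hits under LRU but only 1 under FIFO) are exactly what makes the comparison with the hardware conclusive.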
## permPolicy.py
If the replacement policy is a permutation policy (see [Measurement-based Modeling of the Cache Replacement Policy](http://embedded.cs.uni-saarland.de/publications/CacheModelingRTAS2013.pdf)), this tool determines the permutation vectors. In addition to that, it outputs a set of age graphs for the access sequences generated by the permutation policy inference algorithm. These graphs can be a useful starting point for analyzing policies that are not permutation policies.
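The idea behind permutation vectors can be sketched for LRU (simplified from the paper's formalism; the function names are illustrative): a hit on the line at age position *i* is described by a permutation that assigns every line in the set a new age.

```python
def lru_perm(i, assoc):
    """LRU's permutation vector for a hit at age position i: the accessed line
    gets age 0, younger lines age by one, older lines keep their age."""
    return [0 if j == i else (j + 1 if j < i else j) for j in range(assoc)]

def apply_perm(state, perm):
    """Applies a permutation vector to a set state (blocks ordered by age)."""
    new = [None] * len(state)
    for j, block in enumerate(state):
        new[perm[j]] = block
    return new
```

For example, `apply_perm(['A', 'B', 'C', 'D'], lru_perm(2, 4))` yields `['C', 'A', 'B', 'D']`: the hit block `C` becomes the most recently used. Inferring one such vector per age position fully characterizes a permutation policy.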
## strideGraph.py
Generates a graph that shows the number of core cycles (per access) when accessing memory areas of different sizes repeatedly using a given stride (which can be specified with the `-stride` option). An example can be seen [here](https://uops.info/cache/lat_CFL.html).
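The shape of such graphs can be reasoned about with a back-of-the-envelope computation (a sketch; `lines_touched` is an illustrative name): once the distinct lines touched no longer fit into a cache level, repeated traversals start missing in that level, and the per-access cycle count jumps.

```python
def lines_touched(area_size, stride, line_size=64):
    """Number of distinct cache lines accessed when walking a memory area of
    `area_size` bytes with the given stride."""
    return len({addr // line_size for addr in range(0, area_size, stride)})
```

With a 64 B line size, a 1 KB area walked with stride 64 touches 16 lines; doubling the stride to 128 halves that to 8, which is why larger strides shift the latency steps in the graph.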
## cpuid.py
Obtains cache and TLB information using the `CPUID` instruction.
## cacheInfo.py
Combines information from `cpuid.py` with information on the number of slices of the L3 cache that is obtained through measurements.
## setDueling.py
For caches that use set dueling to choose between two different policies, this tool can generate a graph that shows the sets that use a fixed policy.
## cacheLib.py
Library containing helper functions used by the other tools.
## cacheSim.py
This file contains the implementations of the simulated policies used by some of the other tools.
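As an example of such a policy, a tree-PLRU set (a common LRU approximation in L1 caches) can be simulated in a few lines. This is an illustrative sketch; the class and its interface are made up here and do not match `cacheSim.py`'s actual implementation:

```python
class TreePLRU:
    """Single tree-PLRU cache set: each internal tree bit points away from the
    most recently used half; the victim is found by following the bits."""
    def __init__(self, assoc=4):
        assert assoc > 1 and assoc & (assoc - 1) == 0, 'associativity must be a power of two'
        self.lines = [None] * assoc
        self.bits = [0] * (assoc - 1)  # heap layout: children of node i are 2i+1, 2i+2

    def _victim(self):
        node = 0
        while node < len(self.bits):
            node = 2 * node + 1 + self.bits[node]  # follow the PLRU bits to a leaf
        return node - len(self.bits)               # leaf index -> way

    def _touch(self, way):
        node = way + len(self.bits)
        while node > 0:
            parent = (node - 1) // 2
            # make the parent bit point away from the subtree we came from
            self.bits[parent] = 1 if node == 2 * parent + 1 else 0
            node = parent

    def access(self, block):
        """Accesses `block`; returns True on a hit, False on a miss."""
        if block in self.lines:
            way = self.lines.index(block)
            hit = True
        else:
            way = self._victim()
            self.lines[way] = block
            hit = False
        self._touch(way)
        return hit
```

Unlike true LRU, tree-PLRU can evict a block that is not the least recently used one, which is exactly the kind of difference `replPolicy.py` detects.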
# Prerequisites
To use the tools in this folder, the nanoBench kernel module needs to be loaded. Instructions on how to do this can be found in the main README at <https://github.com/andreas-abel/nanoBench>.
nanoBench needs to be configured to use a physically contiguous memory area that is large enough for the access sequences that you want to test. This can be achieved with the `set-R14-size.sh` script in the main nanoBench folder.
You can, e.g., call it as follows:
sudo ./set-R14-size.sh 1G
to reserve a memory area of 1 GB. If your system has enough memory, but the above call is unable to reserve a memory area of the requested size, a reboot often helps.
For analyzing shared caches, it can make sense to disable other cores using the same cache. The `single-core-mode.sh` script in the main nanoBench folder can be used to disable all but one core.
Furthermore, it can also be helpful to disable cache prefetching. On recent Intel CPUs, this can be done by executing
sudo modprobe msr; sudo wrmsr -a 0x1a4 15
On some not so recent Intel CPUs (e.g., Core 2 Duo), you can use
sudo modprobe msr; sudo wrmsr -a 0x1a0 0xE0668D2689
instead.
I'm not aware of a way to disable cache prefetching on recent AMD CPUs. If you know how to do this, please consider posting an answer to [this question](https://stackoverflow.com/questions/57855793/how-to-disable-cache-prefetching-on-amd-family-17h-cpus).
The tools that generate graphs need [Plotly](https://plot.ly/python/) to be installed. This can be achieved via
sudo apt install python-pip; pip install plotly


@@ -0,0 +1,86 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple, OrderedDict
import argparse
import re
import sys
from cacheLib import *
import cacheSim
from plotly.offline import plot
import plotly.graph_objects as go
import logging
log = logging.getLogger(__name__)
# traces is a list of (name, y value list) pairs
def getPlotlyGraphDiv(title, x_title, y_title, traces):
fig = go.Figure()
fig.update_layout(title_text=title)
fig.update_xaxes(title_text=x_title)
fig.update_yaxes(title_text=y_title)
for name, y_values in traces:
fig.add_trace(go.Scatter(y=y_values, mode='lines+markers', name=name))
return plot(fig, include_plotlyjs=False, output_type='div')
def main():
parser = argparse.ArgumentParser(description='Generates a graph with the ages of each block')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache set (if not specified, all cache sets are used)")
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=10)
parser.add_argument("-agg", help="Aggregate function", default='med')
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-blocks", help="Blocks to consider (default: all blocks in seq)")
parser.add_argument("-maxAge", help="Maximum age", type=int)
parser.add_argument("-output", help="Output file name", default='graph.html')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
parser.add_argument("-simRep", help="Number of repetitions", type=int, default=1)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.blocks:
blocksStr = args.blocks
else:
blocksStr = args.seq
blocks = list(OrderedDict.fromkeys(re.sub('[?!,;]', ' ', blocksStr).split()))
html = ['<html>', '<head>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
if args.sim:
policyClass = cacheSim.AllPolicies[args.sim]
if not args.maxAge:
maxAge = 2*args.simAssoc
else:
maxAge = args.maxAge
nSets = len(parseCacheSetsStr(args.level, True, args.sets))
traces = cacheSim.getGraph(blocks, args.seq, policyClass, args.simAssoc, maxAge, nSets=nSets, nRep=args.simRep, agg=args.agg)
title = 'Access Sequence: ' + args.seq.replace('?','').strip() + ' <n fresh blocks> <block>?'
html.append(getPlotlyGraphDiv(title, '# of fresh blocks', 'Hits', traces))
else:
_, nbDict = getAgesOfBlocks(blocks, args.level, args.seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL),
wbinvd=(not args.noWbinvd), returnNbResults=True, maxAge=args.maxAge, nMeasurements=args.nMeasurements, agg=args.agg)
for event in sorted(e for e in nbDict.values()[0][0].keys() if 'HIT' in e or 'MISS' in e):
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
title = 'Access Sequence: ' + (args.seq_init + ' ' + args.seq).replace('?','').strip() + ' <n fresh blocks> <block>?'
html.append(getPlotlyGraphDiv(title, '# of fresh blocks', event, traces))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Graph written to ' + args.output
if __name__ == "__main__":
main()


@@ -0,0 +1,27 @@
#!/usr/bin/python
import argparse
import sys
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Cache Information')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='INFO')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
cpuidInfo = getCpuidCacheInfo()
print ''
print getCacheInfo(1)
print getCacheInfo(2)
if 'L3' in cpuidInfo:
print getCacheInfo(3)
if __name__ == "__main__":
main()

tools/CacheAnalyzer/cacheLib.py Executable file

@@ -0,0 +1,600 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple
import math
import re
import subprocess
import sys
import cpuid
sys.path.append('../..')
from kernelNanoBench import *
import logging
log = logging.getLogger(__name__)
def getEventConfig(event):
arch = getArch()
if event == 'L1_HIT':
if arch in ['Core', 'EnhancedCore']: return '40.0E ' + event # L1D_CACHE_LD.MES
if arch in ['NHM', 'WSM']: return 'CB.01 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.01 ' + event
if event == 'L1_MISS':
if arch in ['Core', 'EnhancedCore']: return 'CB.01.CTR=0 ' + event
if arch in ['IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.08 ' + event
if arch in ['ZEN+']: return '064.70 ' + event
if event == 'L2_HIT':
if arch in ['Core', 'EnhancedCore']: return '29.7E ' + event # L2_LD.THIS_CORE.ALL_INCL.MES
if arch in ['NHM', 'WSM']: return 'CB.02 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.02 ' + event
if arch in ['ZEN+']: return '064.70 ' + event
if event == 'L2_MISS':
if arch in ['Core', 'EnhancedCore']: return 'CB.04.CTR=0 ' + event
if arch in ['IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.10 ' + event
if arch in ['ZEN+']: return '064.08 ' + event
if event == 'L3_HIT':
if arch in ['NHM', 'WSM']: return 'CB.04 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.04 ' + event
if event == 'L3_MISS':
if arch in ['NHM', 'WSM']: return 'CB.10 ' + event
if arch in ['SNB', 'IVB', 'HSW', 'BDW', 'SKL', 'SKX', 'KBL', 'CFL', 'CNL']: return 'D1.20 ' + event
return ''
def getDefaultCacheConfig():
return '\n'.join(filter(None, [getEventConfig('L' + str(l) + '_' + hm) for l in range(1,4) for hm in ['HIT', 'MISS']]))
def getDefaultCacheMSRConfig():
if 'Intel' in getCPUVendor() and 'L3' in getCpuidCacheInfo() and getCpuidCacheInfo()['L3']['complex']:
if getArch() in ['CNL']:
dist = 8
ctrOffset = 2
else:
dist = 16
ctrOffset = 6
return '\n'.join('msr_0xE01=0x20000000.msr_' + format(0x700 + dist*cbo, 'x') + '=0x408F34 msr_' + format(0x700 + ctrOffset + dist*cbo, 'x') +
' CACHE_LOOKUP_CBO_' + str(cbo) for cbo in range(0, getNCBoxUnits()))
return ''
def isClose(a, b, rel_tol=1e-09, abs_tol=0.0):
return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
class CacheInfo:
def __init__(self, level, assoc, lineSize, nSets, nSlices=None, nCboxes=None):
self.level = level
self.assoc = assoc
self.lineSize = lineSize
self.nSets = nSets
self.waySize = lineSize * nSets
self.size = self.waySize * assoc * (nSlices if nSlices is not None else 1)
self.nSlices = nSlices
self.nCboxes = nCboxes
def __str__(self):
return '\n'.join(['L' + str(self.level) + ':',
' Size: ' + str(self.size/1024) + ' kB',
' Associativity: ' + str(self.assoc),
' Line Size: ' + str(self.lineSize) + ' B',
' Number of sets' + (' (per slice)' if self.nSlices is not None else '') + ': ' + str(self.nSets),
' Way size' + (' (per slice)' if self.nSlices is not None else '') + ': ' + str(self.waySize/1024) + ' kB',
(' Number of CBoxes: ' + str(self.nCboxes) if self.nCboxes is not None else ''),
(' Number of slices: ' + str(self.nSlices) if self.nSlices is not None else '')])
def getArch():
if not hasattr(getArch, 'arch'):
cpu = cpuid.CPUID()
getArch.arch = cpuid.micro_arch(cpu)
return getArch.arch
def getCPUVendor():
if not hasattr(getCPUVendor, 'vendor'):
cpu = cpuid.CPUID()
getCPUVendor.vendor = cpuid.cpu_vendor(cpu)
return getCPUVendor.vendor
def getCpuidCacheInfo():
if not hasattr(getCpuidCacheInfo, 'cpuidCacheInfo'):
cpu = cpuid.CPUID()
log.debug(cpuid.get_basic_info(cpu))
getCpuidCacheInfo.cpuidCacheInfo = cpuid.get_cache_info(cpu)
if not len(set(c['lineSize'] for c in getCpuidCacheInfo.cpuidCacheInfo.values())) == 1:
raise ValueError('All line sizes must be the same')
return getCpuidCacheInfo.cpuidCacheInfo
def getCacheInfo(level):
if level == 1:
if not hasattr(getCacheInfo, 'L1CacheInfo'):
cpuidInfo = getCpuidCacheInfo()['L1D']
getCacheInfo.L1CacheInfo = CacheInfo(1, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
return getCacheInfo.L1CacheInfo
elif level == 2:
if not hasattr(getCacheInfo, 'L2CacheInfo'):
cpuidInfo = getCpuidCacheInfo()['L2']
getCacheInfo.L2CacheInfo = CacheInfo(2, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
return getCacheInfo.L2CacheInfo
elif level == 3:
if not hasattr(getCacheInfo, 'L3CacheInfo'):
if not 'L3' in getCpuidCacheInfo():
raise ValueError('invalid level')
cpuidInfo = getCpuidCacheInfo()['L3']
if not 'complex' in cpuidInfo or not cpuidInfo['complex']:
getCacheInfo.L3CacheInfo = CacheInfo(3, cpuidInfo['assoc'], cpuidInfo['lineSize'], cpuidInfo['nSets'])
else:
lineSize = cpuidInfo['lineSize']
assoc = cpuidInfo['assoc']
nSets = cpuidInfo['nSets']
stride = 2**((lineSize*nSets/getNCBoxUnits())-1).bit_length() # smallest power of two that is >= lineSize*nSets/nCBoxUnits
ms = findMaximalNonEvictingL3SetInCBox(0, stride, assoc, 0)
log.debug('Maximal non-evicting L3 set: ' + str(len(ms)) + ' ' + str(ms))
nCboxes = getNCBoxUnits()
nSlices = nCboxes * int(math.ceil(float(len(ms))/assoc))
getCacheInfo.L3CacheInfo = CacheInfo(3, assoc, lineSize, nSets/nSlices, nSlices, nCboxes)
return getCacheInfo.L3CacheInfo
else:
raise ValueError('invalid level')
def getNCBoxUnits():
if not hasattr(getNCBoxUnits, 'nCBoxUnits'):
try:
subprocess.check_output(['modprobe', 'msr'])
cbo_config = subprocess.check_output(['rdmsr', '0x396'])
if getArch() in ['CNL']:
getNCBoxUnits.nCBoxUnits = int(cbo_config)
else:
getNCBoxUnits.nCBoxUnits = int(cbo_config) - 1
log.debug('Number of CBox Units: ' + str(getNCBoxUnits.nCBoxUnits))
except subprocess.CalledProcessError as e:
log.critical('Error: ' + e.output)
sys.exit()
except OSError as e:
log.critical("rdmsr not found. Try 'sudo apt install msr-tools'")
sys.exit()
return getNCBoxUnits.nCBoxUnits
def getCBoxOfAddress(address):
if not hasattr(getCBoxOfAddress, 'cBoxMap'):
getCBoxOfAddress.cBoxMap = dict()
cBoxMap = getCBoxOfAddress.cBoxMap
if not address in cBoxMap:
setNanoBenchParameters(config='', msrConfig=getDefaultCacheMSRConfig(), nMeasurements=10, unrollCount=1, loopCount=10, aggregateFunction='min',
basicMode=True, noMem=True)
ec = getCodeForAddressLists([AddressList([address],False,True)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
nCacheLookups = [nb['CACHE_LOOKUP_CBO_'+str(cBox)] for cBox in range(0, getNCBoxUnits())]
cBoxMap[address] = nCacheLookups.index(max(nCacheLookups))
return cBoxMap[address]
def getNewAddressesInCBox(n, cBox, cacheSet, prevAddresses, notInCBox=False):
if not prevAddresses:
maxPrevAddress = cacheSet * getCacheInfo(3).lineSize
else:
maxPrevAddress = max(prevAddresses)
addresses = []
for addr in count(maxPrevAddress+getCacheInfo(3).waySize, getCacheInfo(3).waySize):
if not notInCBox and getCBoxOfAddress(addr) == cBox:
addresses.append(addr)
if notInCBox and getCBoxOfAddress(addr) != cBox:
addresses.append(addr)
if len(addresses) >= n:
return addresses
def getNewAddressesNotInCBox(n, cBox, cacheSet, prevAddresses):
return getNewAddressesInCBox(n, cBox, cacheSet, prevAddresses, notInCBox=True)
pointerChasingInits = dict()
#addresses must not contain duplicates
def getPointerChasingInit(addresses):
if tuple(addresses) in pointerChasingInits:
return pointerChasingInits[tuple(addresses)]
init = 'lea RAX, [R14+' + str(addresses[0]) + ']; '
init += 'mov RBX, RAX; '
i = 0
while i < len(addresses)-1:
stride = addresses[i+1] - addresses[i]
init += '1: add RBX, ' + str(stride) + '; '
init += 'mov [RAX], RBX; '
init += 'mov RAX, RBX; '
i += 1
oldI = i
while i < len(addresses)-1 and (addresses[i+1] - addresses[i]) == stride:
i += 1
if oldI != i:
init += 'lea RCX, [R14+' + str(addresses[i]) + ']; '
init += 'cmp RAX, RCX; '
init += 'jne 1b; '
init += 'mov qword ptr [R14 + ' + str(addresses[-1]) + '], 0; '
pointerChasingInits[tuple(addresses)] = init
return init
ExperimentCode = namedtuple('ExperimentCode', 'code init oneTimeInit')
def getCodeForAddressLists(codeAddressLists, initAddressLists=[], wbinvd=False):
distinctAddrLists = set(tuple(l.addresses) for l in initAddressLists+codeAddressLists)
if len(distinctAddrLists) > 1 and set.intersection(*list(set(l) for l in distinctAddrLists)):
raise ValueError('same address in different lists')
code = []
init = (['wbinvd; '] if wbinvd else [])
oneTimeInit = []
r14Size = getR14Size()
alreadyAddedOneTimeInits = set()
for addressLists, codeList, isInit in [(initAddressLists, init, True), (codeAddressLists, code, False)]:
if addressLists is None: continue
pfcEnabled = True
for addressList in addressLists:
addresses = addressList.addresses
if len(addresses) < 1: continue
if any(addr >= r14Size for addr in addresses):
sys.stderr.write('Size of memory area too small. Try increasing it with set-R14-size.sh.\n')
exit(1)
if not isInit:
if addressList.exclude and pfcEnabled:
codeList.append(PFC_STOP_ASM + '; ')
pfcEnabled = False
elif not addressList.exclude and not pfcEnabled:
codeList.append(PFC_START_ASM + '; ')
pfcEnabled = True
# use multiple lfence instructions to make sure that the block is actually in the cache and not still in a fill buffer
codeList.append('lfence; ' * 10)
if addressList.flush:
for address in addresses:
codeList.append('clflush [R14 + ' + str(address) + ']; ')
else:
if len(addresses) == 1:
codeList.append('mov RCX, [R14 + ' + str(addresses[0]) + ']; ')
else:
if not tuple(addresses) in alreadyAddedOneTimeInits:
oneTimeInit.append(getPointerChasingInit(addresses))
alreadyAddedOneTimeInits.add(tuple(addresses))
codeList.append('lea RCX, [R14+' + str(addresses[0]) + ']; 1: mov RCX, [RCX]; jrcxz 2f; jmp 1b; 2: ')
if not isInit and not pfcEnabled:
codeList.append(PFC_START_ASM + '; ')
return ExperimentCode(''.join(code), ''.join(init), ''.join(oneTimeInit))
def getClearHLAddresses(level, cacheSetList, cBox=1):
lineSize = getCacheInfo(1).lineSize
if level == 1:
return []
elif (level == 2) or (level == 3 and getCacheInfo(3).nSlices is None):
nSets = getCacheInfo(level).nSets
if not all(nSets > getCacheInfo(lLevel).nSets for lLevel in range(1, level)):
raise ValueError('L' + str(level) + ' way size must be greater than lower level way sizes')
nHLSets = getCacheInfo(level-1).nSets
nClearAddresses = 2*sum(getCacheInfo(hLevel).assoc for hLevel in range(1, level))
HLSets = set(cs % nHLSets for cs in cacheSetList)
addrForClearingHL = []
for HLSet in HLSets:
possibleSets = [cs for cs in range(HLSet, nSets, nHLSets) if cs not in cacheSetList]
if not possibleSets:
raise ValueError("not enough cache sets available for clearing higher levels")
addrForClearingHLSet = []
for setIndex in count(HLSet, nHLSets):
if not setIndex % nSets in possibleSets:
continue
addrForClearingHLSet.append(setIndex*lineSize)
if len(addrForClearingHLSet) >= nClearAddresses:
break
addrForClearingHL += addrForClearingHLSet
return addrForClearingHL
elif level == 3:
if not hasattr(getClearHLAddresses, 'clearL2Map'):
getClearHLAddresses.clearL2Map = dict()
clearL2Map = getClearHLAddresses.clearL2Map
if not cBox in clearL2Map:
clearL2Map[cBox] = dict()
clearAddresses = []
for L3Set in cacheSetList:
if not L3Set in clearL2Map[cBox]:
clearL2Map[cBox][L3Set] = getNewAddressesNotInCBox(2*(getCacheInfo(1).assoc+getCacheInfo(2).assoc), cBox, L3Set, [])
clearAddresses += clearL2Map[cBox][L3Set]
return clearAddresses
L3SetToWayIDMap = dict()
def getAddresses(level, wayID, cacheSetList, cBox=1, clearHL=True):
lineSize = getCacheInfo(1).lineSize
if level <= 2 or (level == 3 and getCacheInfo(3).nSlices is None):
nSets = getCacheInfo(level).nSets
waySize = getCacheInfo(level).waySize
return [(wayID*waySize) + s*lineSize for s in cacheSetList]
elif level == 3:
if not cBox in L3SetToWayIDMap:
L3SetToWayIDMap[cBox] = dict()
addresses = []
for L3Set in cacheSetList:
if not L3Set in L3SetToWayIDMap[cBox]:
L3SetToWayIDMap[cBox][L3Set] = dict()
if getCacheInfo(3).nSlices != getNCBoxUnits():
for i, addr in enumerate(findMinimalL3EvictionSet(L3Set, cBox)):
L3SetToWayIDMap[cBox][L3Set][i] = addr
if not wayID in L3SetToWayIDMap[cBox][L3Set]:
if getCacheInfo(3).nSlices == getNCBoxUnits():
L3SetToWayIDMap[cBox][L3Set][wayID] = next(iter(getNewAddressesInCBox(1, cBox, L3Set, L3SetToWayIDMap[cBox][L3Set].values())))
else:
L3SetToWayIDMap[cBox][L3Set][wayID] = next(iter(findCongruentL3Addresses(1, L3SetToWayIDMap[cBox][L3Set].values())))
addresses.append(L3SetToWayIDMap[cBox][L3Set][wayID])
return addresses
raise ValueError('invalid level')
# removes ?s and !s
def getBlockName(blockStr):
return re.sub('[?!]', '', blockStr)
def parseCacheSetsStr(level, clearHL, cacheSetsStr):
cacheSetList = []
if cacheSetsStr is not None:
for s in cacheSetsStr.split(','):
if '-' in s:
first, last = s.split('-')[:2]
cacheSetList += range(int(first), int(last)+1)
else:
cacheSetList.append(int(s))
else:
nSets = getCacheInfo(level).nSets
if level > 1 and clearHL:
nHLSets = getCacheInfo(level-1).nSets
cacheSetList = range(nHLSets, nSets)
else:
cacheSetList = range(0, nSets)
return cacheSetList
AddressList = namedtuple('AddressList', 'addresses exclude flush')
# cacheSets=None means do access in all sets
# in this case, the first nL1Sets many sets of L2 will be reserved for clearing L1
# if wbinvd is set, wbinvd will be called before initSeq
def runCacheExperiment(level, seq, initSeq='', cacheSets=None, cBox=1, clearHL=True, loop=1, wbinvd=False, nMeasurements=10, warmUpCount=1, agg='avg'):
lineSize = getCacheInfo(1).lineSize
cacheSetList = parseCacheSetsStr(level, clearHL, cacheSets)
clearHLAddrList = None
if (clearHL and level > 1):
clearHLAddrList = AddressList(getClearHLAddresses(level, cacheSetList, cBox), True, False)
initAddressLists = []
seqAddressLists = []
nameToID = dict()
for seqString, addrLists in [(initSeq, initAddressLists), (seq, seqAddressLists)]:
for seqEl in seqString.split():
name = getBlockName(seqEl)
wayID = nameToID.setdefault(name, len(nameToID))
exclude = not '?' in seqEl
flush = '!' in seqEl
addresses = getAddresses(level, wayID, cacheSetList, cBox=cBox, clearHL=clearHL)
if clearHLAddrList is not None and not flush:
addrLists.append(clearHLAddrList)
addrLists.append(AddressList(addresses, exclude, flush))
ec = getCodeForAddressLists(seqAddressLists, initAddressLists, wbinvd)
log.debug('\nInitAddresses: ' + str(initAddressLists))
log.debug('\nSeqAddresses: ' + str(seqAddressLists))
log.debug('\nOneTimeInit: ' + ec.oneTimeInit)
log.debug('\nInit: ' + ec.init)
log.debug('\nCode: ' + ec.code)
resetNanoBench()
setNanoBenchParameters(config=getDefaultCacheConfig(), msrConfig=getDefaultCacheMSRConfig(), nMeasurements=nMeasurements, unrollCount=1, loopCount=loop,
warmUpCount=warmUpCount, aggregateFunction=agg, basicMode=True, noMem=True, verbose=None)
return runNanoBench(code=ec.code, init=ec.init, oneTimeInit=ec.oneTimeInit)
def printNB(nb_result):
for r in nb_result.items():
print r[0] + ': ' + str(r[1])
def findMinimalL3EvictionSet(cacheSet, cBox):
setNanoBenchParameters(config='\n'.join([getEventConfig('L3_HIT'), getEventConfig('L3_MISS')]), msrConfig=None, nMeasurements=10, unrollCount=1, loopCount=10,
warmUpCount=None, initialWarmUpCount=None, aggregateFunction='med', basicMode=True, noMem=True, verbose=None)
if not hasattr(findMinimalL3EvictionSet, 'evSetForCacheSet'):
findMinimalL3EvictionSet.evSetForCacheSet = dict()
evSetForCacheSet = findMinimalL3EvictionSet.evSetForCacheSet
if cacheSet in evSetForCacheSet:
return evSetForCacheSet[cacheSet]
addresses = []
curAddress = cacheSet*getCacheInfo(3).lineSize
while len(addresses) < getCacheInfo(3).assoc:
curAddress += getCacheInfo(3).waySize
if getCBoxOfAddress(curAddress) == cBox:
addresses.append(curAddress)
while True:
curAddress += getCacheInfo(3).waySize
if not getCBoxOfAddress(curAddress) == cBox: continue
addresses += [curAddress]
ec = getCodeForAddressLists([AddressList(addresses,False,False)])
setNanoBenchParameters(config=getDefaultCacheConfig(), msrConfig='', nMeasurements=10, unrollCount=1, loopCount=100,
aggregateFunction='med', basicMode=True, noMem=True)
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(addresses) - .9:
break
for i in reversed(range(0, len(addresses))):
tmpAddresses = addresses[:i] + addresses[(i+1):]
ec = getCodeForAddressLists([AddressList(tmpAddresses,False,False)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(tmpAddresses) - 0.9:
addresses = tmpAddresses
evSetForCacheSet[cacheSet] = addresses
return addresses
def findCongruentL3Addresses(n, L3EvictionSet):
setNanoBenchParameters(config=getEventConfig('L3_HIT'), msrConfig=None, nMeasurements=10, unrollCount=1, loopCount=100,
warmUpCount=None, initialWarmUpCount=None, aggregateFunction='med', basicMode=True, noMem=True, verbose=None)
congrAddresses = []
L3WaySize = getCacheInfo(3).waySize
for newAddr in count(max(L3EvictionSet)+L3WaySize, L3WaySize):
tmpAddresses = L3EvictionSet[:getCacheInfo(3).assoc] + [newAddr]
ec = getCodeForAddressLists([AddressList(tmpAddresses,False,False)])
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] < len(tmpAddresses) - 0.9:
congrAddresses.append(newAddr)
if len(congrAddresses) >= n: break
return congrAddresses
def findMaximalNonEvictingL3SetInCBox(start, stride, L3Assoc, cBox):
curAddress = start
addresses = []
while len(addresses) < L3Assoc:
if getCBoxOfAddress(curAddress) == cBox:
addresses.append(curAddress)
curAddress += stride
notAdded = 0
while notAdded < L3Assoc:
curAddress += stride
if not getCBoxOfAddress(curAddress) == cBox:
continue
newAddresses = addresses + [curAddress]
ec = getCodeForAddressLists([AddressList(newAddresses,False,False)])
setNanoBenchParameters(config=getEventConfig('L3_HIT'), msrConfig='', nMeasurements=10, unrollCount=1, loopCount=10,
aggregateFunction='med', basicMode=True, noMem=True)
nb = runNanoBench(code=ec.code, oneTimeInit=ec.oneTimeInit)
if nb['L3_HIT'] > len(newAddresses) - .9:
addresses = newAddresses
notAdded = 0
else:
notAdded += 1
return addresses
def getUnusedBlockNames(n, usedBlockNames, prefix=''):
newBlockNames = []
i = 0
while len(newBlockNames) < n:
name = prefix + str(i)
if not name in usedBlockNames: newBlockNames.append(name)
i += 1
return newBlockNames
# Returns a dict with the age of each block, i.e., how many fresh blocks need to be accessed until the block is evicted
# if returnNbResults is True, the function additionally returns all measurement results (as the second component of a tuple)
def getAgesOfBlocks(blocks, level, seq, initSeq='', maxAge=None, cacheSets=None, cBox=1, clearHL=True, wbinvd=False, returnNbResults=False, nMeasurements=10, agg='avg'):
ages = dict()
if returnNbResults: nbResults = dict()
if maxAge is None:
maxAge = 2*getCacheInfo(level).assoc
nSets = len(parseCacheSetsStr(level, clearHL, cacheSets))
for block in blocks:
if returnNbResults: nbResults[block] = []
for nNewBlocks in range(0, maxAge+1):
curSeq = seq.replace('?', '') + ' '
newBlocks = getUnusedBlockNames(nNewBlocks, seq+initSeq, 'N')
curSeq += ' '.join(newBlocks) + ' ' + block + '?'
nb = runCacheExperiment(level, curSeq, initSeq=initSeq, cacheSets=cacheSets, cBox=cBox, clearHL=clearHL, loop=0, wbinvd=wbinvd, nMeasurements=nMeasurements)
if returnNbResults: nbResults[block].append(nb)
hitEvent = 'L' + str(level) + '_HIT'
missEvent = 'L' + str(level) + '_MISS'
if hitEvent in nb:
if isClose(nb[hitEvent], 0.0, abs_tol=0.1):
if not block in ages:
ages[block] = nNewBlocks
#if not returnNbResults:
#break
elif missEvent in nb:
if nb[missEvent] > nSets - 0.1:
if not block in ages:
ages[block] = nNewBlocks
#if not returnNbResults:
#break
else:
raise ValueError('no cache results available')
if not block in ages:
ages[block] = -1
if returnNbResults:
return (ages, nbResults)
else:
return ages
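The `getAgesOfBlocks` function above determines a block's age on real hardware by counting how many fresh blocks must be accessed before the block is evicted. The same notion can be illustrated in software with an idealized LRU set; the following standalone sketch (a hypothetical helper `lru_age`, not part of cacheLib) reproduces the expected ages:

```python
# Standalone sketch: the "age" of a block under an idealized LRU set.
# After accessing A B C D in a 4-way LRU set, A is the least recently
# used block, so one fresh block evicts it: its age is 1; D's age is 4.

def lru_age(seq, block, assoc):
    lines = []  # most recently used block at the front

    def access(b):
        hit = b in lines
        if hit:
            lines.remove(b)
        lines.insert(0, b)
        del lines[assoc:]  # evict beyond the associativity
        return hit

    for b in seq.split():
        access(b)
    # Count how many fresh blocks are needed until 'block' is evicted.
    for age in range(1, 2 * assoc + 1):
        if block not in lines:
            return age - 1
        access('fresh%d' % age)
    return -1

ages = {b: lru_age('A B C D', b, 4) for b in 'ABCD'}
```

An age of 0 means the block was already evicted by the sequence itself; `-1` mirrors the hardware function's "never evicted within maxAge" case.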

tools/CacheAnalyzer/cacheSeq.py (new executable file, 47 lines)

@@ -0,0 +1,47 @@
#!/usr/bin/python
from itertools import count, cycle, islice
from collections import namedtuple, OrderedDict
import argparse
import sys
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Cache Benchmarks')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1) # use 1 as the default since, e.g., on SNB, cBox 0 has only 15 ways instead of 16
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=10)
parser.add_argument("-agg", help="Aggregate function", default='med')
parser.add_argument("-loop", help="Loop count (Default: 1)", type=int, default=1)
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.sim:
policyClass = cacheSim.AllPolicies[args.sim]
setCount = len(parseCacheSetsStr(args.level, (not args.noClearHL), args.sets))
seq = args.seq_init + (' ' + args.seq) * args.loop
hits = cacheSim.getHits(seq, policyClass, args.simAssoc, setCount) / args.loop
print 'Hits: ' + str(hits)
else:
nb = runCacheExperiment(args.level, args.seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL), loop=args.loop,
wbinvd=(not args.noWbinvd), nMeasurements=args.nMeasurements, agg=args.agg)
printNB(nb)
if __name__ == "__main__":
main()
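The `-sim` option above delegates to `cacheSim.getHits`. As a rough standalone illustration of the sequence syntax (elements ending in `?` are measured, elements ending in `!` are flushed instead of accessed), the hypothetical helper below counts hits under a plain LRU policy; it sketches the semantics, not the code path cacheSeq.py actually takes:

```python
# Standalone sketch of counting hits for an access sequence in the
# syntax used by cacheSeq.py: 'A?' is measured, 'A!' is flushed.

def count_hits(seq, assoc):
    lines = []  # most recently used block first
    hits = 0
    for elem in seq.split():
        name = elem.rstrip('?!')
        if elem.endswith('!'):
            if name in lines:
                lines.remove(name)  # flush: invalidate the line
            continue
        hit = name in lines
        if hit:
            lines.remove(name)
        lines.insert(0, name)
        del lines[assoc:]  # LRU eviction beyond the associativity
        if elem.endswith('?'):
            hits += int(hit)
    return hits
```

For example, `count_hits('A B C A?', 8)` yields one hit, while flushing `A` first (`'A B C A! A?'`) yields none.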

tools/CacheAnalyzer/cacheSim.py (new executable file, 360 lines)

@@ -0,0 +1,360 @@
#!/usr/bin/python
import math
import random
from itertools import count
from numpy import median
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def rindex(lst, value):
return len(lst) - lst[::-1].index(value) - 1
class ReplPolicySim(object):
def __init__(self, assoc):
self.assoc = assoc
self.blocks = [None] * assoc
def acc(self, block):
raise NotImplementedError()
def flush(self, block):
if block in self.blocks:
self.blocks[self.blocks.index(block)] = None
class FIFOSim(ReplPolicySim):
def __init__(self, assoc):
super(FIFOSim, self).__init__(assoc)
def acc(self, block):
hit = block in self.blocks
if not hit:
self.blocks = [block] + self.blocks[0:self.assoc-1]
return hit
class LRUSim(ReplPolicySim):
def __init__(self, assoc):
super(LRUSim, self).__init__(assoc)
def acc(self, block):
hit = block in self.blocks
self.blocks = [block] + [b for b in self.blocks if b!=block][0:self.assoc-1]
return hit
class PLRUSim(ReplPolicySim):
def __init__(self, assoc, linearInit=False, randLeaf=False, randRoot=False):
super(PLRUSim, self).__init__(assoc)
self.linearInit = linearInit
self.randLeaf = randLeaf
self.randRoot = randRoot
self.bits = [[0 for _ in range(0, 2**(level))] for level in range(0, int(math.ceil(math.log(assoc,2))))]
def acc(self, block):
hit = block in self.blocks
if hit:
self.updateIndexBits(self.blocks.index(block))
else:
if self.linearInit and None in self.blocks:
idx = self.blocks.index(None)
else:
idx = self.getIndexForBits()
self.blocks[idx] = block
self.updateIndexBits(idx)
return hit
def getIndexForBits(self, level=0, idx = 0):
if level == len(self.bits) - 1:
ret = 2*idx
if self.randLeaf:
ret += random.randint(0,1)
else:
ret += self.bits[level][idx]
return min(self.assoc - 1, ret)
elif level == 0 and self.randRoot:
return self.getIndexForBits(level + 2, random.randint(0,2))
else:
return self.getIndexForBits(level + 1, 2*idx + self.bits[level][idx])
def updateIndexBits(self, accIndex):
lastIdx = accIndex
for level in reversed(range(0, len(self.bits))):
curIdx = lastIdx/2
self.bits[level][curIdx] = 1 - (lastIdx % 2)
lastIdx = curIdx
class PLRUlSim(PLRUSim):
def __init__(self, assoc):
super(PLRUlSim, self).__init__(assoc, linearInit=True)
class PLRURandSim(PLRUSim):
def __init__(self, assoc):
super(PLRURandSim, self).__init__(assoc, randLeaf=True)
class RandPLRUSim(PLRUSim):
def __init__(self, assoc):
super(RandPLRUSim, self).__init__(assoc, randRoot=True)
AllRandPLRUVariants = {
'RandPLRU': RandPLRUSim,
'PLRURand': PLRURandSim,
}
class QLRUSim(ReplPolicySim):
def __init__(self, assoc, hitFunc, missFunc, replIdxFunc, updFunc, updOnMissOnly=False):
super(QLRUSim, self).__init__(assoc)
self.hitFunc = hitFunc
self.missFunc = missFunc
self.replIdxFunc = replIdxFunc
self.updFunc = updFunc
self.updOnMissOnly = updOnMissOnly
self.bits = [3] * assoc
def acc(self, block):
hit = block in self.blocks
if hit:
index = self.blocks.index(block)
self.bits[index] = self.hitFunc(self.bits[index])
else:
if self.updOnMissOnly:
self.bits = self.updFunc(self.bits, -1)
index = self.replIdxFunc(self.bits, self.blocks)
self.blocks[index] = block
self.bits[index] = self.missFunc()
if not self.updOnMissOnly:
self.bits = self.updFunc(self.bits, index)
return hit
QLRUHitFuncs = {
'H21': lambda x: {3:2, 2:1, 1:0, 0:0}[x],
'H20': lambda x: {3:2, 2:0, 1:0, 0:0}[x],
'H11': lambda x: {3:1, 2:1, 1:0, 0:0}[x],
'H10': lambda x: {3:1, 2:0, 1:0, 0:0}[x],
'H00': lambda x: {3:0, 2:0, 1:0, 0:0}[x],
}
QLRUMissFuncs = {
'M0': lambda: 0,
'M1': lambda: 1,
'M2': lambda: 2,
'M3': lambda: 3,
}
QLRUMissRandFuncs = {
'MR32': lambda: (2 if random.randint(0,15) == 0 else 3),
'MR31': lambda: (1 if random.randint(0,15) == 0 else 3),
'MR30': lambda: (0 if random.randint(0,15) == 0 else 3),
'MR21': lambda: (1 if random.randint(0,15) == 0 else 2),
'MR20': lambda: (0 if random.randint(0,15) == 0 else 2),
'MR10': lambda: (0 if random.randint(0,15) == 0 else 1),
}
QLRUReplIdxFuncs = {
'R0': lambda bits, blocks: blocks.index(None) if None in blocks else bits.index(3), #CFL L3
'R1': lambda bits, blocks: blocks.index(None) if None in blocks else (bits.index(3) if 3 in bits else 0), #IVB
'R2': lambda bits, blocks: rindex(blocks, None) if None in blocks else bits.index(3), # CFL L2
}
QLRUUpdFuncs = {
'U0': lambda bits, replIdx: [b + (3 - max(bits)) for b in bits], #CFL L3
'U1': lambda bits, replIdx: [(b + (3 - max(bits[:replIdx]+bits[replIdx+1:])) if i != replIdx else b) for i, b in enumerate(bits)], #CFL L2
'U2': lambda bits, replIdx: [b+1 for b in bits] if not 3 in bits else bits, # IVB
'U3': lambda bits, replIdx: [((b+1) if i != replIdx else b) for i, b in enumerate(bits)] if not 3 in bits else bits,
}
# all deterministic QLRU variants
AllDetQLRUVariants = {
'QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''):
type('QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''), (QLRUSim,),
{'__init__': lambda self, assoc, hfl=hf[1], mfl=mf[1], rfl=rf[1], ufl=uf[1], umol=umo: QLRUSim.__init__(self, assoc, hfl, mfl, rfl, ufl, umol)})
for hf in QLRUHitFuncs.items()
for mf in QLRUMissFuncs.items()
for rf in QLRUReplIdxFuncs.items()
for uf in QLRUUpdFuncs.items()
for umo in [False, True]
if not (rf[0] in ['R0', 'R2'] and uf[0] in ['U2', 'U3'])
}
# all randomized QLRU variants
AllRandQLRUVariants = {
'QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''):
type('QLRU_' + hf[0] + '_' + mf[0] + '_' + rf[0] + '_' + uf[0] + ('_UMO' if umo else ''), (QLRUSim,),
{'__init__': lambda self, assoc, hfl=hf[1], mfl=mf[1], rfl=rf[1], ufl=uf[1], umol=umo: QLRUSim.__init__(self, assoc, hfl, mfl, rfl, ufl, umol)})
for hf in QLRUHitFuncs.items()
for mf in QLRUMissRandFuncs.items()
for rf in QLRUReplIdxFuncs.items()
for uf in QLRUUpdFuncs.items()
for umo in [False, True]
if not (rf[0] in ['R0', 'R2'] and uf[0] in ['U2', 'U3'])
}
class MRUSim(ReplPolicySim):
def __init__(self, assoc, updIfNotFull=True):
super(MRUSim, self).__init__(assoc)
self.bits = [1] * assoc
self.updIfNotFull = updIfNotFull
def acc(self, block):
hit = block in self.blocks
full = not (None in self.blocks)
if hit:
index = self.blocks.index(block)
else:
if not full:
index = self.blocks.index(None)
else:
index = self.bits.index(1)
self.blocks[index] = block
if (full or self.updIfNotFull):
self.bits[index] = 0
if not 1 in self.bits:
self.bits = [(1 if bi!=index else 0) for bi, _ in enumerate(self.bits)]
return hit
class MRUNSim(MRUSim):
def __init__(self, assoc):
super(MRUNSim, self).__init__(assoc, False)
# according to ISCA'10 paper
class NRUSim(ReplPolicySim):
def __init__(self, assoc):
super(NRUSim, self).__init__(assoc)
self.bits = [1] * assoc
def acc(self, block):
hit = block in self.blocks
if hit:
index = self.blocks.index(block)
self.bits[index] = 0
else:
while not 1 in self.bits:
self.bits = [1] * self.assoc
index = self.bits.index(1)
self.blocks[index] = block
self.bits[index] = 0
return hit
CommonPolicies = {
'FIFO': FIFOSim,
'LRU': LRUSim,
'PLRU': PLRUSim,
'PLRUl': PLRUlSim,
'MRU': MRUSim, # NHM
'MRU_N': MRUNSim, # SNB
'NRU': NRUSim,
'QLRU_H11_M1_R0_U0': AllDetQLRUVariants['QLRU_H11_M1_R0_U0'], # CFL L3
'QLRU_H21_M2_R0_U0_UMO': AllDetQLRUVariants['QLRU_H21_M2_R0_U0_UMO'], # see https://arxiv.org/abs/1904.06278
'QLRU_H11_M1_R1_U2': AllDetQLRUVariants['QLRU_H11_M1_R1_U2'], # IVB
'QLRU_H00_M1_R2_U1': AllDetQLRUVariants['QLRU_H00_M1_R2_U1'], # CFL L2
'QLRU_H00_M1_R0_U1': AllDetQLRUVariants['QLRU_H00_M1_R0_U1'], # CNL L2
'SRRIP': AllDetQLRUVariants['QLRU_H00_M2_R0_U0_UMO'],
}
AllDetPolicies = dict(CommonPolicies.items() + AllDetQLRUVariants.items())
AllRandPolicies = dict(AllRandQLRUVariants.items() + AllRandPLRUVariants.items())
AllPolicies = dict(AllDetPolicies.items() + AllRandPolicies.items())
def getHits(seq, policySimClass, assoc, nSets):
hits = 0
policySims = [policySimClass(assoc) for _ in range(0, nSets)]
for blockStr in seq.split():
blockName = getBlockName(blockStr)
if '!' in blockStr:
for policySim in policySims:
policySim.flush(blockName)
else:
for policySim in policySims:
hit = policySim.acc(blockName)
if '?' in blockStr:
hits += int(hit)
return hits
def getAges(blocks, seq, policySimClass, assoc):
ages = {}
for block in blocks:
for i in count(0):
curSeq = seq + ' ' + ' '.join('N' + str(n) for n in range(0,i)) + ' ' + block + '?'
if getHits(curSeq, policySimClass, assoc, 1) == 0:
ages[block] = i
break
return ages
def getGraph(blocks, seq, policySimClass, assoc, maxAge, nSets=1, nRep=1, agg="med"):
traces = []
for block in blocks:
trace = []
for i in range(0, maxAge):
curSeq = seq + ' ' + ' '.join('N' + str(n) for n in range(0,i)) + ' ' + block + '?'
hits = [getHits(curSeq, policySimClass, assoc, nSets) for _ in range(0, nRep)]
if agg == "med":
aggValue = median(hits)
elif agg == "min":
aggValue = min(hits)
else:
aggValue = float(sum(hits))/nRep
trace.append(aggValue)
traces.append((block, trace))
return traces
def getPermutations(policySimClass, assoc, maxAge=None):
# initial ages
initBlocks = ['I' + str(i) for i in range(0, assoc)]
seq = ' '.join(initBlocks)
initAges = getAges(initBlocks, seq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(initAges[b]) for b in initBlocks) + '}'
blocks = ['B' + str(i) for i in range(0, assoc)]
baseSeq = ' '.join(initBlocks + blocks)
ages = getAges(blocks, baseSeq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + baseSeq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(ages[b]) for b in blocks) + '}'
blocksSortedByAge = [a[0] for a in sorted(ages.items(), key=lambda x: -x[1])] # most recent block first
for permI, permBlock in enumerate(blocksSortedByAge):
seq = baseSeq + ' ' + permBlock
permAges = getAges(blocks, seq, policySimClass, assoc)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
perm = [-1] * assoc
for bi, b in enumerate(blocksSortedByAge):
permAge = permAges[b]
if permAge < 1 or permAge > assoc:
break
perm[assoc-permAge] = bi
print u'\u03A0_' + str(permI) + ' = ' + str(tuple(perm))
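The `PLRUSim` class above stores one bit per internal node of a binary tree. To make the mechanism concrete, here is a minimal standalone re-derivation for a 4-way set (a hypothetical `TreePLRU4` with an array-encoded tree; simplified, without the linearInit/random variants of the class above):

```python
# Standalone tree-PLRU sketch for a 4-way set: bits[0] is the root,
# bits[1] and bits[2] are its children; each bit points toward the
# pseudo-least-recently-used half of its subtree.

class TreePLRU4:
    def __init__(self):
        self.bits = [0, 0, 0]     # 0 = "victim is in the left half"
        self.lines = [None] * 4

    def _victim(self):
        half = self.bits[0]       # follow the bits down to a leaf
        leaf = self.bits[1 + half]
        return 2 * half + leaf

    def _touch(self, way):
        # Point all bits on the path away from the accessed way.
        self.bits[0] = 1 - (way // 2)
        self.bits[1 + way // 2] = 1 - (way % 2)

    def access(self, block):
        hit = block in self.lines
        way = self.lines.index(block) if hit else self._victim()
        self.lines[way] = block
        self._touch(way)
        return hit
```

After filling an empty set with `A B C D`, the bits all point back at `A`'s way, so the next miss evicts `A`, matching the LRU-like behavior tree-PLRU approximates.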

tools/CacheAnalyzer/cpuid.py (new executable file, 545 lines)

@@ -0,0 +1,545 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Copyright (C) 2019 Andreas Abel
#
# This file was modified from https://github.com/flababah/cpuid.py
#
# Original license and copyright notice:
#
# The MIT License (MIT)
#
# Copyright (c) 2014 Anders Høst
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import collections
import ctypes
import os
import platform
import struct
import sys
from ctypes import c_uint32, c_int, c_long, c_ulong, c_size_t, c_void_p, POINTER, CFUNCTYPE
import logging
log = logging.getLogger(__name__)
# Posix x86_64:
# Three first call registers : RDI, RSI, RDX
# Volatile registers : RAX, RCX, RDX, RSI, RDI, R8-11
# Windows x86_64:
# Three first call registers : RCX, RDX, R8
# Volatile registers : RAX, RCX, RDX, R8-11
# cdecl 32 bit:
# Three first call registers : Stack (%esp)
# Volatile registers : EAX, ECX, EDX
_POSIX_64_OPC = [
0x53, # push %rbx
0x89, 0xf0, # mov %esi,%eax
0x89, 0xd1, # mov %edx,%ecx
0x0f, 0xa2, # cpuid
0x89, 0x07, # mov %eax,(%rdi)
0x89, 0x5f, 0x04, # mov %ebx,0x4(%rdi)
0x89, 0x4f, 0x08, # mov %ecx,0x8(%rdi)
0x89, 0x57, 0x0c, # mov %edx,0xc(%rdi)
0x5b, # pop %rbx
0xc3 # retq
]
_WINDOWS_64_OPC = [
0x53, # push %rbx
0x89, 0xd0, # mov %edx,%eax
0x49, 0x89, 0xc9, # mov %rcx,%r9
0x44, 0x89, 0xc1, # mov %r8d,%ecx
0x0f, 0xa2, # cpuid
0x41, 0x89, 0x01, # mov %eax,(%r9)
0x41, 0x89, 0x59, 0x04, # mov %ebx,0x4(%r9)
0x41, 0x89, 0x49, 0x08, # mov %ecx,0x8(%r9)
0x41, 0x89, 0x51, 0x0c, # mov %edx,0xc(%r9)
0x5b, # pop %rbx
0xc3 # retq
]
_CDECL_32_OPC = [
0x53, # push %ebx
0x57, # push %edi
0x8b, 0x7c, 0x24, 0x0c, # mov 0xc(%esp),%edi
0x8b, 0x44, 0x24, 0x10, # mov 0x10(%esp),%eax
0x8b, 0x4c, 0x24, 0x14, # mov 0x14(%esp),%ecx
0x0f, 0xa2, # cpuid
0x89, 0x07, # mov %eax,(%edi)
0x89, 0x5f, 0x04, # mov %ebx,0x4(%edi)
0x89, 0x4f, 0x08, # mov %ecx,0x8(%edi)
0x89, 0x57, 0x0c, # mov %edx,0xc(%edi)
0x5f, # pop %edi
0x5b, # pop %ebx
0xc3 # ret
]
is_windows = os.name == "nt"
is_64bit = ctypes.sizeof(ctypes.c_voidp) == 8
class CPUID_struct(ctypes.Structure):
_fields_ = [(r, c_uint32) for r in ("eax", "ebx", "ecx", "edx")]
class CPUID(object):
def __init__(self):
if platform.machine() not in ("AMD64", "x86_64", "x86", "i686"):
raise SystemError("Only available for x86")
if is_windows:
if is_64bit:
# VirtualAlloc seems to fail under some weird
# circumstances when ctypes.windll.kernel32 is
# used under 64 bit Python. CDLL fixes this.
self.win = ctypes.CDLL("kernel32.dll")
opc = _WINDOWS_64_OPC
else:
# Here ctypes.windll.kernel32 is needed to get the
# right DLL. Otherwise it will fail when running
# 32 bit Python on 64 bit Windows.
self.win = ctypes.windll.kernel32
opc = _CDECL_32_OPC
else:
opc = _POSIX_64_OPC if is_64bit else _CDECL_32_OPC
size = len(opc)
code = (ctypes.c_ubyte * size)(*opc)
if is_windows:
self.win.VirtualAlloc.restype = c_void_p
self.win.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong, ctypes.c_ulong]
self.addr = self.win.VirtualAlloc(None, size, 0x1000, 0x40)
if not self.addr:
raise MemoryError("Could not allocate RWX memory")
else:
self.libc = ctypes.cdll.LoadLibrary(None)
self.libc.valloc.restype = ctypes.c_void_p
self.libc.valloc.argtypes = [ctypes.c_size_t]
self.addr = self.libc.valloc(size)
if not self.addr:
raise MemoryError("Could not allocate memory")
self.libc.mprotect.restype = c_int
self.libc.mprotect.argtypes = [c_void_p, c_size_t, c_int]
ret = self.libc.mprotect(self.addr, size, 1 | 2 | 4)
if ret != 0:
raise OSError("Failed to set RWX")
ctypes.memmove(self.addr, code, size)
func_type = CFUNCTYPE(None, POINTER(CPUID_struct), c_uint32, c_uint32)
self.func_ptr = func_type(self.addr)
def __call__(self, eax, ecx=0):
struct = CPUID_struct()
self.func_ptr(struct, eax, ecx)
return struct.eax, struct.ebx, struct.ecx, struct.edx
def __del__(self):
if is_windows:
self.win.VirtualFree.restype = c_long
self.win.VirtualFree.argtypes = [c_void_p, c_size_t, c_ulong]
self.win.VirtualFree(self.addr, 0, 0x8000)
elif self.libc:
# Seems to throw exception when the program ends and
# libc is cleaned up before the object?
self.libc.free.restype = None
self.libc.free.argtypes = [c_void_p]
self.libc.free(self.addr)
def cpu_vendor(cpu):
_, b, c, d = cpu(0)
return str(struct.pack("III", b, d, c).decode("ascii"))
def cpu_name(cpu):
return " ".join(str("".join((struct.pack("IIII", *cpu(0x80000000 + i)).decode("ascii")
for i in range(2, 5))).replace('\x00', '')).split())
VersionInfo = collections.namedtuple('VersionInfo', 'displ_family displ_model stepping')
def version_info(cpu):
a, _, _, _ = cpu(0x01)
displ_family = (a >> 8) & 0xF
if (displ_family == 0x0F):
displ_family += (a >> 20) & 0xFF
displ_model = (a >> 4) & 0xF
if (displ_family == 0x06 or displ_family == 0x0F):
displ_model += (a >> 12) & 0xF0
stepping = a & 0xF
return VersionInfo(int(displ_family), int(displ_model), int(stepping))
def micro_arch(cpu):
vi = version_info(cpu)
if (vi.displ_family, vi.displ_model) in [(0x06, 0x0F)]:
return 'Core'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x17)]:
return 'EnhancedCore'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x1A), (0x06, 0x1E), (0x06, 0x1F), (0x06, 0x2E)]:
return 'NHM'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x25), (0x06, 0x2C), (0x06, 0x2F)]:
return 'WSM'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x2A), (0x06, 0x2D)]:
return 'SNB'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3A)]:
return 'IVB'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3C), (0x06, 0x45), (0x06, 0x46)]:
return 'HSW'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x3D), (0x06, 0x47), (0x06, 0x56), (0x06, 0x4F)]:
return 'BDW'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x4E), (0x06, 0x5E)]:
return 'SKL'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x55)]:
return 'SKX'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x8E), (0x06, 0x9E)]:
# ToDo: not sure if this is correct
if vi.stepping <= 0x9:
return 'KBL'
else:
return 'CFL'
if (vi.displ_family, vi.displ_model) in [(0x06, 0x66)]:
return 'CNL'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x01), (0x17, 0x11)]:
return 'ZEN'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x08), (0x17, 0x18)]:
return 'ZEN+'
if (vi.displ_family, vi.displ_model) in [(0x17, 0x71)]:
return 'ZEN2'
return 'unknown'
# See Table 3-12 (Encoding of CPUID Leaf 2 Descriptors) in Intel's Instruction Set Reference
leaf2_descriptors = {
0x01: ('TLB', 'Instruction TLB: 4 KByte pages, 4-way set associative, 32 entries'),
0x02: ('TLB', 'Instruction TLB: 4 MByte pages, fully associative, 2 entries'),
0x03: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 64 entries'),
0x04: ('TLB', 'Data TLB: 4 MByte pages, 4-way set associative, 8 entries'),
0x05: ('TLB', 'Data TLB1: 4 MByte pages, 4-way set associative, 32 entries'),
0x06: ('Cache', '1st-level instruction cache: 8 KBytes, 4-way set associative, 32 byte line size'),
0x08: ('Cache', '1st-level instruction cache: 16 KBytes, 4-way set associative, 32 byte line size'),
0x09: ('Cache', '1st-level instruction cache: 32KBytes, 4-way set associative, 64 byte line size'),
0x0A: ('Cache', '1st-level data cache: 8 KBytes, 2-way set associative, 32 byte line size'),
0x0B: ('TLB', 'Instruction TLB: 4 MByte pages, 4-way set associative, 4 entries'),
0x0C: ('Cache', '1st-level data cache: 16 KBytes, 4-way set associative, 32 byte line size'),
0x0D: ('Cache', '1st-level data cache: 16 KBytes, 4-way set associative, 64 byte line size'),
0x0E: ('Cache', '1st-level data cache: 24 KBytes, 6-way set associative, 64 byte line size'),
0x1D: ('Cache', '2nd-level cache: 128 KBytes, 2-way set associative, 64 byte line size'),
0x21: ('Cache', '2nd-level cache: 256 KBytes, 8-way set associative, 64 byte line size'),
0x22: ('Cache', '3rd-level cache: 512 KBytes, 4-way set associative, 64 byte line size, 2 lines per sector'),
0x23: ('Cache', '3rd-level cache: 1 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x24: ('Cache', '2nd-level cache: 1 MBytes, 16-way set associative, 64 byte line size'),
0x25: ('Cache', '3rd-level cache: 2 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x29: ('Cache', '3rd-level cache: 4 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x2C: ('Cache', '1st-level data cache: 32 KBytes, 8-way set associative, 64 byte line size'),
0x30: ('Cache', '1st-level instruction cache: 32 KBytes, 8-way set associative, 64 byte line size'),
0x40: ('Cache', 'No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache'),
0x41: ('Cache', '2nd-level cache: 128 KBytes, 4-way set associative, 32 byte line size'),
0x42: ('Cache', '2nd-level cache: 256 KBytes, 4-way set associative, 32 byte line size'),
0x43: ('Cache', '2nd-level cache: 512 KBytes, 4-way set associative, 32 byte line size'),
0x44: ('Cache', '2nd-level cache: 1 MByte, 4-way set associative, 32 byte line size'),
0x45: ('Cache', '2nd-level cache: 2 MByte, 4-way set associative, 32 byte line size'),
0x46: ('Cache', '3rd-level cache: 4 MByte, 4-way set associative, 64 byte line size'),
0x47: ('Cache', '3rd-level cache: 8 MByte, 8-way set associative, 64 byte line size'),
0x48: ('Cache', '2nd-level cache: 3MByte, 12-way set associative, 64 byte line size'),
0x49: ('Cache', '3rd-level cache: 4MB, 16-way set associative, 64-byte line size (Intel Xeon processor MP, Family 0FH, Model 06H); 2nd-level cache: 4 MByte, 16-way set associative, 64 byte line size'),
0x4A: ('Cache', '3rd-level cache: 6MByte, 12-way set associative, 64 byte line size'),
0x4B: ('Cache', '3rd-level cache: 8MByte, 16-way set associative, 64 byte line size'),
0x4C: ('Cache', '3rd-level cache: 12MByte, 12-way set associative, 64 byte line size'),
0x4D: ('Cache', '3rd-level cache: 16MByte, 16-way set associative, 64 byte line size'),
0x4E: ('Cache', '2nd-level cache: 6MByte, 24-way set associative, 64 byte line size'),
0x4F: ('TLB', 'Instruction TLB: 4 KByte pages, 32 entries'),
0x50: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries'),
0x51: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 128 entries'),
0x52: ('TLB', 'Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 256 entries'),
0x55: ('TLB', 'Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries'),
0x56: ('TLB', 'Data TLB0: 4 MByte pages, 4-way set associative, 16 entries'),
0x57: ('TLB', 'Data TLB0: 4 KByte pages, 4-way associative, 16 entries'),
0x59: ('TLB', 'Data TLB0: 4 KByte pages, fully associative, 16 entries'),
0x5A: ('TLB', 'Data TLB0: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries'),
0x5B: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages, 64 entries'),
0x5C: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages,128 entries'),
0x5D: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages,256 entries'),
0x60: ('Cache', '1st-level data cache: 16 KByte, 8-way set associative, 64 byte line size'),
0x61: ('TLB', 'Instruction TLB: 4 KByte pages, fully associative, 48 entries'),
0x63: ('TLB', 'Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries'),
0x64: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 512 entries'),
0x66: ('Cache', '1st-level data cache: 8 KByte, 4-way set associative, 64 byte line size'),
0x67: ('Cache', '1st-level data cache: 16 KByte, 4-way set associative, 64 byte line size'),
0x68: ('Cache', '1st-level data cache: 32 KByte, 4-way set associative, 64 byte line size'),
0x6A: ('Cache', 'uTLB: 4 KByte pages, 8-way set associative, 64 entries'),
0x6B: ('Cache', 'DTLB: 4 KByte pages, 8-way set associative, 256 entries'),
0x6C: ('Cache', 'DTLB: 2M/4M pages, 8-way set associative, 128 entries'),
0x6D: ('Cache', 'DTLB: 1 GByte pages, fully associative, 16 entries'),
0x70: ('Cache', 'Trace cache: 12 K-μop, 8-way set associative'),
0x71: ('Cache', 'Trace cache: 16 K-μop, 8-way set associative'),
0x72: ('Cache', 'Trace cache: 32 K-μop, 8-way set associative'),
0x76: ('TLB', 'Instruction TLB: 2M/4M pages, fully associative, 8 entries'),
0x78: ('Cache', '2nd-level cache: 1 MByte, 4-way set associative, 64byte line size'),
0x79: ('Cache', '2nd-level cache: 128 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7A: ('Cache', '2nd-level cache: 256 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7B: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7C: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size, 2 lines per sector'),
0x7D: ('Cache', '2nd-level cache: 2 MByte, 8-way set associative, 64byte line size'),
0x7F: ('Cache', '2nd-level cache: 512 KByte, 2-way set associative, 64-byte line size'),
0x80: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 64-byte line size'),
0x82: ('Cache', '2nd-level cache: 256 KByte, 8-way set associative, 32 byte line size'),
0x83: ('Cache', '2nd-level cache: 512 KByte, 8-way set associative, 32 byte line size'),
0x84: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 32 byte line size'),
0x85: ('Cache', '2nd-level cache: 2 MByte, 8-way set associative, 32 byte line size'),
0x86: ('Cache', '2nd-level cache: 512 KByte, 4-way set associative, 64 byte line size'),
0x87: ('Cache', '2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size'),
0xA0: ('DTLB', 'DTLB: 4k pages, fully associative, 32 entries'),
0xB0: ('TLB', 'Instruction TLB: 4 KByte pages, 4-way set associative, 128 entries'),
0xB1: ('TLB', 'Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries'),
0xB2: ('TLB', 'Instruction TLB: 4KByte pages, 4-way set associative, 64 entries'),
0xB3: ('TLB', 'Data TLB: 4 KByte pages, 4-way set associative, 128 entries'),
0xB4: ('TLB', 'Data TLB1: 4 KByte pages, 4-way associative, 256 entries'),
0xB5: ('TLB', 'Instruction TLB: 4KByte pages, 8-way set associative, 64 entries'),
0xB6: ('TLB', 'Instruction TLB: 4KByte pages, 8-way set associative, 128 entries'),
0xBA: ('TLB', 'Data TLB1: 4 KByte pages, 4-way associative, 64 entries'),
0xC0: ('TLB', 'Data TLB: 4 KByte and 4 MByte pages, 4-way associative, 8 entries'),
0xC1: ('STLB', 'Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries'),
0xC2: ('DTLB', 'DTLB: 4 KByte/2 MByte pages, 4-way associative, 16 entries'),
0xC3: ('STLB', 'Shared 2nd-Level TLB: 4 KByte /2 MByte pages, 6-way associative, 1536 entries. Also 1GBbyte pages, 4-way, 16 entries.'),
0xC4: ('DTLB', 'DTLB: 2M/4M Byte pages, 4-way associative, 32 entries'),
0xCA: ('STLB', 'Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries'),
0xD0: ('Cache', '3rd-level cache: 512 KByte, 4-way set associative, 64 byte line size'),
0xD1: ('Cache', '3rd-level cache: 1 MByte, 4-way set associative, 64 byte line size'),
0xD2: ('Cache', '3rd-level cache: 2 MByte, 4-way set associative, 64 byte line size'),
0xD6: ('Cache', '3rd-level cache: 1 MByte, 8-way set associative, 64 byte line size'),
0xD7: ('Cache', '3rd-level cache: 2 MByte, 8-way set associative, 64 byte line size'),
0xD8: ('Cache', '3rd-level cache: 4 MByte, 8-way set associative, 64 byte line size'),
0xDC: ('Cache', '3rd-level cache: 1.5 MByte, 12-way set associative, 64 byte line size'),
0xDD: ('Cache', '3rd-level cache: 3 MByte, 12-way set associative, 64 byte line size'),
0xDE: ('Cache', '3rd-level cache: 6 MByte, 12-way set associative, 64 byte line size'),
0xE2: ('Cache', '3rd-level cache: 2 MByte, 16-way set associative, 64 byte line size'),
0xE3: ('Cache', '3rd-level cache: 4 MByte, 16-way set associative, 64 byte line size'),
0xE4: ('Cache', '3rd-level cache: 8 MByte, 16-way set associative, 64 byte line size'),
0xEA: ('Cache', '3rd-level cache: 12MByte, 24-way set associative, 64 byte line size'),
0xEB: ('Cache', '3rd-level cache: 18MByte, 24-way set associative, 64 byte line size'),
0xEC: ('Cache', '3rd-level cache: 24MByte, 24-way set associative, 64 byte line size'),
0xF0: ('Prefetch', '64-Byte prefetching'),
0xF1: ('Prefetch', '128-Byte prefetching'),
0xFE: ('General', 'CPUID leaf 2 does not report TLB descriptor information; use CPUID leaf 18H to query TLB and other address translation parameters.'),
0xFF: ('General', 'CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters')
}
# 0xAABBCCDD -> [0xDD, 0xCC, 0xBB, 0xAA]
def get_bytes(reg):
return [((reg >> s) & 0xFF) for s in range(0, 32, 8)]
def get_bit(reg, bit):
return (reg >> bit) & 1
# Returns the bits between the indexes start and end (inclusive); start must be <= end
def get_bits(reg, start, end):
return (reg >> start) & ((1 << (end-start+1)) - 1)
def get_cache_info(cpu):
vendor = cpu_vendor(cpu)
cacheInfo = dict()
if vendor == 'GenuineIntel':
log.info('\nCPUID Leaf 2 information:')
a, b, c, d = cpu(0x02)
for ri, reg in enumerate([a, b, c, d]):
if (reg >> 31): continue # register is reserved
for bi, byte in enumerate(get_bytes(reg)):
if (ri == 0) and (bi == 0): continue # least-significant byte in EAX
if byte == 0: continue # Null descriptor
log.info(' - ' + leaf2_descriptors[byte][1])
log.info('\nCPUID Leaf 4 information:')
index = 0
while (True):
a, b, c, d = cpu(0x04, index)
cacheType = ''
bits3_0 = get_bits(a, 0, 3)
if bits3_0 == 0: break
if bits3_0 == 1: cacheType = 'Data Cache'
if bits3_0 == 2: cacheType = 'Instruction Cache'
if bits3_0 == 3: cacheType = 'Unified Cache'
level = get_bits(a, 5, 7)
log.info(' Level ' + str(level) + ' (' + cacheType + '):')
parameters = []
if get_bit(a, 8): parameters.append('Self Initializing cache level (does not need SW initialization)')
if get_bit(a, 9): parameters.append('Fully Associative cache')
parameters.append('Maximum number of addressable IDs for logical processors sharing this cache: ' + str(get_bits(a, 14, 25)+1))
parameters.append('Maximum number of addressable IDs for processor cores in the physical package: ' + str(get_bits(a, 26, 31)+1))
L = int(get_bits(b, 0, 11)+1)
P = int(get_bits(b, 12, 21)+1)
W = int(get_bits(b, 22, 31)+1)
S = int(c+1)
parameters.append('System Coherency Line Size (L): ' + str(L) + ' B')
parameters.append('Physical Line partitions (P): ' + str(P))
parameters.append('Ways of associativity (W): ' + str(W))
parameters.append('Number of Sets (S): ' + str(S))
parameters.append('Cache Size: ' + str(W*P*L*S/1024) + ' kB')
if get_bit(d, 0): parameters.append('WBINVD/INVD is not guaranteed to act upon lower level caches of non-originating threads sharing this cache')
else: parameters.append('WBINVD/INVD from threads sharing this cache acts upon lower level caches for threads sharing this cache')
if get_bit(d, 1): parameters.append('Cache is inclusive of lower cache levels')
else: parameters.append('Cache is not inclusive of lower cache levels')
complexAddressing = False
if get_bit(d, 2):
complexAddressing = True
parameters.append('A complex function is used to index the cache, potentially using all address bits')
cacheInfo['L' + str(level) + (cacheType[0] if cacheType[0] in ['D', 'I'] else '')] = {
'lineSize': L,
'nSets': S,
'assoc': W,
'complex': complexAddressing
}
for par in parameters:
log.info(' - ' + par)
index += 1
elif vendor == 'AuthenticAMD':
_, _, c, d = cpu(0x80000005)
L1DcLineSize = int(get_bits(c, 0, 7))
L1DcLinesPerTag = int(get_bits(c, 8, 15))
L1DcAssoc = int(get_bits(c, 16, 23))
L1DcSize = int(get_bits(c, 24, 31))
log.info(' L1DcLineSize: ' + str(L1DcLineSize) + ' B')
log.info(' L1DcLinesPerTag: ' + str(L1DcLinesPerTag))
log.info(' L1DcAssoc: ' + str(L1DcAssoc))
log.info(' L1DcSize: ' + str(L1DcSize) + ' kB')
cacheInfo['L1D'] = {
'lineSize': L1DcLineSize,
'nSets': L1DcSize*1024/L1DcAssoc/L1DcLineSize,
'assoc': L1DcAssoc
}
L1IcLineSize = int(get_bits(d, 0, 7))
L1IcLinesPerTag = int(get_bits(d, 8, 15))
L1IcAssoc = int(get_bits(d, 16, 23))
L1IcSize = int(get_bits(d, 24, 31))
log.info(' L1IcLineSize: ' + str(L1IcLineSize) + ' B')
log.info(' L1IcLinesPerTag: ' + str(L1IcLinesPerTag))
log.info(' L1IcAssoc: ' + str(L1IcAssoc))
log.info(' L1IcSize: ' + str(L1IcSize) + ' kB')
cacheInfo['L1I'] = {
'lineSize': L1IcLineSize,
'nSets': L1IcSize*1024/L1IcAssoc/L1IcLineSize,
'assoc': L1IcAssoc
}
_, _, c, d = cpu(0x80000006)
L2LineSize = int(get_bits(c, 0, 7))
L2LinesPerTag = int(get_bits(c, 8, 11))
L2Size = int(get_bits(c, 16, 31))
L2Assoc = 0
c_15_12 = get_bits(c, 12, 15)
if c_15_12 == 0x1: L2Assoc = 1
elif c_15_12 == 0x2: L2Assoc = 2
elif c_15_12 == 0x4: L2Assoc = 4
elif c_15_12 == 0x6: L2Assoc = 8
elif c_15_12 == 0x8: L2Assoc = 16
elif c_15_12 == 0xA: L2Assoc = 32
elif c_15_12 == 0xB: L2Assoc = 48
elif c_15_12 == 0xC: L2Assoc = 64
elif c_15_12 == 0xD: L2Assoc = 96
elif c_15_12 == 0xE: L2Assoc = 128
elif c_15_12 == 0xF: L2Assoc = L2Size*1024/L2LineSize # 0xF: fully associative
log.info(' L2LineSize: ' + str(L2LineSize) + ' B')
log.info(' L2LinesPerTag: ' + str(L2LinesPerTag))
log.info(' L2Assoc: ' + str(L2Assoc))
log.info(' L2Size: ' + str(L2Size) + ' kB')
cacheInfo['L2'] = {
'lineSize': L2LineSize,
'nSets': L2Size*1024/L2Assoc/L2LineSize,
'assoc': L2Assoc
}
L3LineSize = int(get_bits(d, 0, 7))
L3LinesPerTag = int(get_bits(d, 8, 11))
L3Size = int(get_bits(d, 18, 31)*512)
L3Assoc = 0
d_15_12 = get_bits(d, 12, 15)
if d_15_12 == 0x1: L3Assoc = 1
elif d_15_12 == 0x2: L3Assoc = 2
elif d_15_12 == 0x4: L3Assoc = 4
elif d_15_12 == 0x6: L3Assoc = 8
elif d_15_12 == 0x8: L3Assoc = 16
elif d_15_12 == 0xA: L3Assoc = 32
elif d_15_12 == 0xB: L3Assoc = 48
elif d_15_12 == 0xC: L3Assoc = 64
elif d_15_12 == 0xD: L3Assoc = 96
elif d_15_12 == 0xE: L3Assoc = 128
log.info(' L3LineSize: ' + str(L3LineSize) + ' B')
log.info(' L3LinesPerTag: ' + str(L3LinesPerTag))
log.info(' L3Assoc: ' + str(L3Assoc))
log.info(' L3Size: ' + str(L3Size/1024) + ' MB')
if L3Assoc: # Fn8000_0006 EDX[15:12] == 0 means no L3 cache is present
    cacheInfo['L3'] = {
        'lineSize': L3LineSize,
        'nSets': L3Size*1024/L3Assoc/L3LineSize,
        'assoc': L3Assoc
    }
return cacheInfo
def get_basic_info(cpu):
strs = ['Vendor: ' + cpu_vendor(cpu)]
strs += ['CPU Name: ' + cpu_name(cpu)]
vi = version_info(cpu)
strs += ['Family: 0x%02X' % vi.displ_family]
strs += ['Model: 0x%02X' % vi.displ_model]
strs += ['Stepping: 0x%X' % vi.stepping]
strs += ['Microarchitecture: ' + micro_arch(cpu)]
return '\n'.join(strs)
if __name__ == "__main__":
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.INFO)
cpuid = CPUID()
def valid_inputs():
for eax in (0x0, 0x80000000):
highest, _, _, _ = cpuid(eax)
while eax <= highest:
regs = cpuid(eax)
yield (eax, regs)
eax += 1
print " ".join(x.ljust(8) for x in ("CPUID", "A", "B", "C", "D")).strip()
for eax, regs in valid_inputs():
print "%08x" % eax, " ".join("%08x" % reg for reg in regs)
print ''
print get_basic_info(cpuid)
print '\nCache information:'
get_cache_info(cpuid)
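The "Cache Size" value logged for each CPUID leaf-4 entry is simply the product of the four decoded parameters. A minimal sketch of that computation (the sample values below are illustrative, not read from any particular CPU):

```python
def leaf4_cache_size_kb(line_size, partitions, ways, sets):
    # CPUID leaf 4: cache size in bytes = W * P * L * S
    return ways * partitions * line_size * sets // 1024

# e.g., 64 B lines, 1 partition, 8 ways, 64 sets -> a 32 kB L1 data cache
print(leaf4_cache_size_kb(64, 1, 8, 64))  # -> 32
```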

tools/CacheAnalyzer/hitMiss.py Executable file
@@ -0,0 +1,53 @@
#!/usr/bin/python
import argparse
import re
import sys
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Outputs whether the last access of a sequence results in a hit or miss')
parser.add_argument("-seq", help="Access sequence", required=True)
parser.add_argument("-seq_init", help="Initialization sequence", default='')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1) # default is 1 because, e.g., on SNB, CBox 0 has only 15 ways instead of 16
parser.add_argument("-noClearHL", help="Do not clear higher levels", action='store_true')
parser.add_argument("-loop", help="Loop count (Default: 1)", type=int, default=1)
parser.add_argument("-noWbinvd", help="Do not call wbinvd before each run", action='store_true')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-sim", help="Simulate the given policy instead of running the experiment on the hardware")
parser.add_argument("-simAssoc", help="Associativity of the simulated cache (default: 8)", type=int, default=8)
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
if args.sim:
policyClass = cacheSim.Policies[args.sim]
seq = re.sub('[?!]', '', ' '.join([args.seq_init, args.seq])).strip() + '?'
hits = cacheSim.getHits(policyClass(args.simAssoc), seq)
if hits > 0:
print 'HIT'
exit(1)
else:
print 'MISS'
exit(0)
else:
setCount = len(parseCacheSetsStr(args.level, True, args.sets))
seq = re.sub('[?!]', '', args.seq).strip() + '?'
nb = runCacheExperiment(args.level, seq, initSeq=args.seq_init, cacheSets=args.sets, cBox=args.cBox, clearHL=(not args.noClearHL), loop=args.loop,
wbinvd=(not args.noWbinvd))
if nb['L' + str(args.level) + '_HIT']/setCount > .5:
print 'HIT'
exit(1)
else:
print 'MISS'
exit(0)
if __name__ == "__main__":
main()
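hitMiss.py decides between HIT and MISS by comparing the average per-set hit count against 0.5. The decision rule as a standalone sketch (the function name is ours, not part of the tool):

```python
def classify_last_access(hit_count, set_count):
    # hit_count is summed over all measured sets, so an average above
    # 0.5 hits per set means the final access hit in most sets
    return 'HIT' if hit_count / float(set_count) > 0.5 else 'MISS'

print(classify_last_access(57, 64))  # -> HIT
print(classify_last_access(3, 64))   # -> MISS
```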

tools/CacheAnalyzer/permPolicy.py Executable file
@@ -0,0 +1,104 @@
#!/usr/bin/python
from itertools import count
from collections import namedtuple, OrderedDict
import argparse
import math
import os
import re
import subprocess
import sys
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
from cacheGraph import *
import logging
log = logging.getLogger(__name__)
def getPermutations(level, html, cacheSets=None, getInitialAges=True, maxAge=None, cBox=1):
assoc = getCacheInfo(level).assoc
if not maxAge:
maxAge=2*assoc
hitEvent = 'L' + str(level) + '_HIT'
missEvent = 'L' + str(level) + '_MISS'
if getInitialAges:
initBlocks = ['I' + str(i) for i in range(0, assoc)]
seq = ' '.join(initBlocks)
initAges, nbDict = getAgesOfBlocks(initBlocks, level, seq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(initAges[b]) for b in initBlocks) + '}'
event = (hitEvent if hitEvent in next(iter(nbDict.items()))[1][0] else missEvent)
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in initBlocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
else:
initBlocks = []
blocks = ['B' + str(i) for i in range(0, assoc)]
baseSeq = ' '.join(initBlocks + blocks)
ages, nbDict = getAgesOfBlocks(blocks, level, baseSeq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + baseSeq
print accSeqStr
print 'Ages: {' + ', '.join(b + ': ' + str(ages[b]) for b in blocks) + '}'
event = (hitEvent if hitEvent in next(iter(nbDict.items()))[1][0] else missEvent)
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
blocksSortedByAge = [a[0] for a in sorted(ages.items(), key=lambda x: -x[1])] # most recent block first
for permI, permBlock in enumerate(blocksSortedByAge):
seq = baseSeq + ' ' + permBlock
permAges, nbDict = getAgesOfBlocks(blocks, level, seq, cacheSets=cacheSets, clearHL=True, wbinvd=True, returnNbResults=True, maxAge=maxAge, cBox=cBox)
accSeqStr = 'Access sequence: <wbinvd> ' + seq
traces = [(b, [nb[event] for nb in nbDict[b]]) for b in blocks]
html.append(getPlotlyGraphDiv(accSeqStr + ' <n fresh blocks> <block>?', '# of fresh blocks', hitEvent, traces))
perm = [-1] * assoc
for bi, b in enumerate(blocksSortedByAge):
permAge = permAges[b]
if permAge < 1 or permAge > assoc:
break
perm[assoc-permAge] = bi
print u'\u03A0_' + str(permI) + ' = ' + str(tuple(perm))
def main():
parser = argparse.ArgumentParser(description='Replacement Policies')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-noInit", help="Do not fill sets with associativity many elements first", action='store_true')
parser.add_argument("-maxAge", help="Maximum age", type=int)
parser.add_argument("-cBox", help="cBox (default: 1)", type=int, default=1)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-output", help="Output file name", default='permPolicy.html')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level)
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
getPermutations(args.level, html, cacheSets=args.sets, getInitialAges=(not args.noInit), maxAge=args.maxAge, cBox=args.cBox)
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
if __name__ == "__main__":
main()
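The permutation vector Π_i printed by getPermutations is assembled by placing each block, taken in age-sorted order, at slot `assoc - age`. A self-contained sketch of that last loop, with hypothetical measured ages:

```python
def build_permutation(assoc, blocks_by_age, perm_ages):
    # mirrors the loop at the end of getPermutations:
    # perm[assoc - age] = index of the block in the age-sorted order
    perm = [-1] * assoc  # -1 = slot not determined
    for bi, b in enumerate(blocks_by_age):
        age = perm_ages[b]
        if age < 1 or age > assoc:
            break
        perm[assoc - age] = bi
    return tuple(perm)

# hypothetical ages in a 4-way set: B3 youngest (age 4), B0 oldest (age 1)
print(build_permutation(4, ['B3', 'B2', 'B1', 'B0'],
                        {'B3': 4, 'B2': 3, 'B1': 2, 'B0': 1}))  # -> (0, 1, 2, 3)
```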

tools/CacheAnalyzer/replPolicy.py Executable file
@@ -0,0 +1,171 @@
#!/usr/bin/python
import argparse
import random
import sys
from numpy import median
from cacheLib import *
import cacheSim
import logging
log = logging.getLogger(__name__)
def getActualHits(seq, level, cacheSets, cBox, nMeasurements=10):
nb = runCacheExperiment(level, seq, cacheSets=cacheSets, cBox=cBox, clearHL=True, loop=1, wbinvd=True, nMeasurements=nMeasurements, agg='med')
return int(nb['L' + str(level) + '_HIT']+0.1)
def findSmallCounterexample(policy, initSeq, level, sets, cBox, assoc, seq, nMeasurements):
setCount = len(parseCacheSetsStr(level, True, sets))
seqSplit = seq.split()
for seqPrefix in [seqSplit[:i] for i in range(assoc+1, len(seqSplit)+1)]:
seq = initSeq + ' '.join(seqPrefix)
actual = getActualHits(seq, level, sets, cBox, nMeasurements)
sim = cacheSim.getHits(seq, cacheSim.AllPolicies[policy], assoc, setCount)
print 'seq:' + seq + ', actual: ' + str(actual) + ', sim: ' + str(sim)
if sim != actual:
break
for i in reversed(range(0, len(seqPrefix)-1)):
tmpPrefix = seqPrefix[:i] + seqPrefix[(i+1):]
seq = initSeq + ' '.join(tmpPrefix)
actual = getActualHits(seq, level, sets, cBox, nMeasurements)
sim = cacheSim.getHits(seq, cacheSim.AllPolicies[policy], assoc, setCount)
print 'seq:' + seq + ', actual: ' + str(actual) + ', sim: ' + str(sim)
if sim != actual:
seqPrefix = tmpPrefix
return ((initSeq + ' ') if initSeq else '') + ' '.join(seqPrefix)
def getRandomSeq(n):
seq = [0]
seqAct = ['']
for _ in range(0,n):
if random.choice([True, False]):
seq.append(max(seq)+1)
seqAct.append('')
else:
seq.append(random.choice(seq))
if random.randint(0,8)==0:
seqAct.append('!') # with probability 1/9, flush the repeated block instead of accessing it
else:
seqAct.append('?')
return ' '.join(str(s) + a for s, a in zip(seq, seqAct))
def main():
parser = argparse.ArgumentParser(description='Replacement Policies')
parser.add_argument("-level", help="Cache level (Default: 1)", type=int, default=1)
parser.add_argument("-sets", help="Cache sets (if not specified, all cache sets are used)")
parser.add_argument("-cBox", help="cBox (default: 0)", type=int)
parser.add_argument("-nMeasurements", help="Number of measurements", type=int, default=3)
parser.add_argument("-findCtrEx", help="Tries to find a small counterexample for each policy (only available for deterministic policies)", action='store_true')
parser.add_argument("-policies", help="Comma-separated list of policies to consider (Default: all deterministic policies)")
parser.add_argument("-randPolicies", help="Test randomized policies", action='store_true')
parser.add_argument("-allQLRUVariants", help="Test all QLRU variants", action='store_true')
parser.add_argument("-assoc", help="Override the associativity", type=int)
parser.add_argument("-initSeq", help="Adds an initialization sequence to each sequence")
parser.add_argument("-nRandSeq", help="Number of random sequences (default: 100)", type=int, default=100)
parser.add_argument("-lRandSeq", help="Length of random sequences (default: 50)", type=int, default=50)
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='WARNING')
parser.add_argument("-output", help="Output file name", default='replPolicy.html')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
policies = sorted(cacheSim.CommonPolicies.keys())
if args.policies:
policies = args.policies.split(',')
elif args.allQLRUVariants:
policies = sorted(set(cacheSim.CommonPolicies.keys())|set(cacheSim.AllDetQLRUVariants.keys()))
elif args.randPolicies:
policies = sorted(cacheSim.AllRandPolicies.keys())
if args.assoc:
assoc = args.assoc
else:
assoc = getCacheInfo(args.level).assoc
cBox = 0
if args.cBox:
cBox = args.cBox
setCount = len(parseCacheSetsStr(args.level, True, args.sets))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level) + (', CBox: ' + str(cBox) if args.cBox else '')
html = ['<html>', '<head>', '<title>' + title + '</title>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
html += ['<table border="1" style="white-space:nowrap;">']
html += ['<tr><th>Sequence</th><th>Actual</th>']
html += ['<th>' + p.replace('_', '<br>_') + '</th>' for p in policies]
html += ['</tr>']
possiblePolicies = set(policies)
counterExamples = dict()
dists = {p: 0.0 for p in policies}
seqList = []
seqList.extend(getRandomSeq(args.lRandSeq) for _ in range(0,args.nRandSeq))
for seq in seqList:
fullSeq = ((args.initSeq + ' ') if args.initSeq else '') + seq
print fullSeq
html += ['<tr><td>' + fullSeq + '</td>']
actual = getActualHits(fullSeq, args.level, args.sets, cBox, args.nMeasurements)
html += ['<td>' + str(actual) + '</td>']
outp = ''
for p in policies:
if not args.randPolicies:
sim = cacheSim.getHits(fullSeq, cacheSim.AllPolicies[p], assoc, setCount)
if sim != actual:
possiblePolicies.discard(p)
color = 'red'
if args.findCtrEx and not p in counterExamples:
counterExamples[p] = findSmallCounterexample(p, ((args.initSeq + ' ') if args.initSeq else ''), args.level, args.sets, cBox, assoc, seq,
args.nMeasurements)
else:
color = 'green'
else:
sim = median([cacheSim.getHits(fullSeq, cacheSim.AllPolicies[p], assoc, setCount) for _ in range(0, args.nMeasurements)])
dist = (sim - actual) ** 2
dists[p] += dist
colorR = min(255, dist)
colorG = max(0, min(255, 512 - dist))
color = 'rgb(' + str(colorR) + ',' + str(colorG) + ',0)'
html += ['<td style="background-color:' + color + ';">' + str(sim) + '</td>']
html += ['</tr>']
if not args.randPolicies:
print 'Possible policies: ' + ', '.join(possiblePolicies)
if not possiblePolicies: break
if not args.randPolicies and args.findCtrEx:
print ''
print 'Counter example(s): '
for p, ctrEx in counterExamples.items():
print ' ' + p + ': ' + ctrEx
html += ['</table>', '</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
if not args.randPolicies:
print 'Possible policies: ' + ', '.join(possiblePolicies)
else:
for p, d in reversed(sorted(dists.items(), key=lambda d: d[1])):
print p + ': ' + str(d)
if __name__ == "__main__":
main()
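findSmallCounterexample shrinks a mismatching sequence by greedily dropping one element at a time (back to front, keeping the last element), and keeps a deletion whenever the hardware/simulator disagreement survives. The same idea as a pure function, where `differs` is a stand-in predicate for the actual-vs-simulated comparison:

```python
def shrink(seq, differs):
    # one greedy removal pass, back to front; the last element is
    # always kept, as in findSmallCounterexample
    for i in reversed(range(len(seq) - 1)):
        candidate = seq[:i] + seq[i+1:]
        if differs(candidate):
            seq = candidate
    return seq

# toy predicate: a "counterexample" is any sequence containing both A and C
print(shrink(['A', 'B', 'C', 'D'], lambda s: 'A' in s and 'C' in s))  # -> ['A', 'C', 'D']
```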

@@ -0,0 +1,69 @@
#!/usr/bin/python
import argparse
import random
import sys
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
import logging
log = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description='Tests if the L3 cache uses set dueling')
parser.add_argument("-level", help="Cache level (Default: 3)", type=int, default=3)
parser.add_argument("-nRuns", help="Maximum number of runs", type=int, default=25)
parser.add_argument("-loop", help="Loop count", type=int, default=25)
parser.add_argument("-output", help="Output file name", default='setDueling.html')
parser.add_argument("-logLevel", help="Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)", default='INFO')
args = parser.parse_args()
logging.basicConfig(stream=sys.stdout, format='%(message)s', level=logging.getLevelName(args.logLevel))
assoc = getCacheInfo(args.level).assoc
nSets = getCacheInfo(args.level).nSets
nCBoxes = max(1, getNCBoxUnits())
seq = ' '.join('B' + str(i) + '?' for i in range(0, assoc*4/3))
title = cpuid.cpu_name(cpuid.CPUID()) + ', Level: ' + str(args.level)
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
allSets = range(0,nSets)
yValuesForCBox = {cBox: [[] for s in range(0, nSets)] for cBox in range(0, nCBoxes)}
for i in range(0, args.nRuns):
for cBox in range(0, nCBoxes):
yValuesList = yValuesForCBox[cBox]
for s in list(allSets) * 2 + list(reversed(allSets)) * 2:
if yValuesList[s] and max(yValuesList[s]) > 2 and min(yValuesList[s]) < assoc/2:
continue
log.info('CBox ' + str(cBox) + ', run ' + str(i) + ', set: ' + str(s))
nb = runCacheExperiment(args.level, seq, cacheSets=str(s), clearHL=True, loop=args.loop, wbinvd=False, cBox=cBox, nMeasurements=1, warmUpCount=0)
yValuesList[s].append(nb['L' + str(args.level) + '_HIT'])
for cBox in range(0, nCBoxes):
yValues = [min(x) + (max(x)-min(x))/2 for x in yValuesForCBox[cBox] if x]
fig = go.Figure()
fig.update_layout(title_text='CBox ' + str(cBox) + ', Sequence (accessed ' + str(args.loop) + ' times in each set): ' + seq)
fig.update_layout(showlegend=True)
fig.update_xaxes(title_text='Set')
fig.add_trace(go.Scatter(y=yValues, mode='lines+markers', name='L3 Hits'))
html.append(plot(fig, include_plotlyjs=False, output_type='div'))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Output written to ' + args.output
if __name__ == "__main__":
main()
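For each set, setDueling.py plots the midpoint between the lowest and highest hit count observed across runs; leader sets of a dueling policy then stand out from the follower sets. The plotted y-value is simply:

```python
def midpoint(samples):
    # min + (max - min) / 2, as used for the y-values in setDueling.py
    lo, hi = min(samples), max(samples)
    return lo + (hi - lo) / 2.0

print(midpoint([2, 14, 13, 3]))  # -> 8.0
```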

@@ -0,0 +1,63 @@
#!/usr/bin/python
import argparse
import math
from plotly.offline import plot
import plotly.graph_objects as go
from cacheLib import *
def main():
parser = argparse.ArgumentParser(description='Generates a graph obtained by sweeping over a memory area repeatedly with a given stride')
parser.add_argument("-stride", help="Stride (in bytes) (Default: 64)", type=int, default=64)
parser.add_argument("-startSize", help="Start size of the memory area (in kB) (Default: 4)", type=int, default=4)
parser.add_argument("-endSize", help="End size of the memory area (in kB) (Default: 32768)", type=int, default=32768)
parser.add_argument("-loop", help="Loop count (Default: 100)", type=int, default=100)
parser.add_argument("-output", help="Output file name", default='strideGraph.html')
args = parser.parse_args()
resetNanoBench()
setNanoBenchParameters(config=getDefaultCacheConfig(), nMeasurements=1, warmUpCount=0, unrollCount=1, loopCount=args.loop, basicMode=False, noMem=True)
nbDicts = []
xValues = []
nAddresses = []
tickvals = []
pt = args.startSize*1024
while pt <= args.endSize*1024:
tickvals.append(pt)
for x in ([int(math.pow(2, math.log(pt, 2) + i/16.0)) for i in range(0,16)] if pt < args.endSize*1024 else [pt]):
print x/1024
xValues.append(str(x))
addresses = range(0, x, args.stride)
nAddresses.append(len(addresses))
ec = getCodeForAddressLists([AddressList(addresses,False,False)], wbinvd=True)
nbDicts.append(runNanoBench(code=ec.code, init=ec.init, oneTimeInit=ec.oneTimeInit))
pt *= 2
title = cpuid.cpu_name(cpuid.CPUID())
html = ['<html>', '<head>', '<title>' + title + '</title>', '<script src="https://cdn.plot.ly/plotly-latest.min.js">', '</script>', '</head>', '<body>']
html += ['<h3>' + title + '</h3>']
for evtType in ['Core cycles', 'APERF', 'HIT', 'MISS']:
if not any(e for e in nbDicts[0].keys() if evtType in e): continue
fig = go.Figure()
fig.update_layout(showlegend=True)
fig.update_xaxes(title_text='Size (in kB)', type='category', tickvals=tickvals, ticktext=[x/1024 for x in tickvals])
for event in sorted(e for e in nbDicts[0].keys() if evtType in e):
yValues = [nb[event]/nAddr for nb, nAddr in zip(nbDicts, nAddresses)]
fig.add_trace(go.Scatter(x=xValues, y=yValues, mode='lines+markers', name=event))
html.append(plot(fig, include_plotlyjs=False, output_type='div'))
html += ['</body>', '</html>']
with open(args.output ,'w') as f:
f.write('\n'.join(html))
print 'Graph written to ' + args.output
if __name__ == "__main__":
main()
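strideGraph.py sweeps memory-area sizes from startSize to endSize with 16 logarithmically spaced points per power of two. A standalone sketch of that x-axis generation (using `math.log2` instead of `math.log(pt, 2)`, since `log2` is exact for powers of two):

```python
import math

def sweep_points(start_kb, end_kb):
    points = []
    pt = start_kb * 1024
    while pt <= end_kb * 1024:
        if pt < end_kb * 1024:
            # 16 log-spaced sizes between pt and 2*pt
            points.extend(int(2 ** (math.log2(pt) + i / 16.0)) for i in range(16))
        else:
            points.append(pt)  # the end size itself is measured once
        pt *= 2
    return points

pts = sweep_points(4, 16)
print(len(pts), pts[0], pts[16], pts[-1])  # -> 33 4096 8192 16384
```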