DINEROIII()              UNIX Programmer's Manual              DINEROIII()

NAME
     dineroIII - uniprocessor cache simulator, version III

SYNOPSIS
     dineroIII -b block_size -u unified_cache_size
               -i instruction_cache_size -d data_cache_size
               [ other_options ]

DESCRIPTION
     dineroIII is a trace-driven cache simulator that supports
     sub-block placement. Simulation results are determined by the
     input trace and the cache parameters. A trace is a finite
     sequence of memory references, usually obtained by the
     interpretive execution of a program or set of programs. Trace
     input is read by the simulator in din format (described later).
     Cache parameters, e.g. block size and associativity, are set
     with command line options (also described later).

     dineroIII uses the priority stack method of memory hierarchy
     simulation to increase flexibility and improve simulator
     performance in highly associative caches. One can simulate
     either a unified cache (mixed, data and instructions cached
     together) or separate instruction and data caches. This version
     of dineroIII does not permit the simultaneous simulation of
     multiple alternative caches.

     dineroIII differs from most other cache simulators because it
     supports sub-block placement (also known as sector placement),
     in which address tags are still associated with cache blocks
     but data is transferred to and from the cache in smaller
     sub-blocks. This organization is especially useful for on-chip
     microprocessor caches, which have to load data on cache misses
     over a limited number of pins. In traditional cache design,
     this constraint leads to small blocks. Unfortunately, a cache
     with small blocks devotes much more on-chip RAM to address tags
     than does one with large blocks. Sub-block placement allows a
     cache to have small sub-blocks for fast data transfer and large
     blocks to associate with address tags for efficient use of
     on-chip RAM.
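
     The tag-storage argument for sub-block placement can be made
     concrete with a little arithmetic. The sketch below is
     illustrative only: the helper name tag_overhead_bits, the 32-bit
     address width, the direct-mapped organization, and the
     accounting of one valid bit per (sub)-block are assumptions made
     for this example, not details taken from dineroIII.

```python
def tag_overhead_bits(cache_bytes, block_bytes, subblock_bytes=0, addr_bits=32):
    """Return bits of tag plus valid storage for a direct-mapped cache.

    Assumptions (hypothetical, not from dineroIII): sizes are powers of
    two, there is one tag per block, and one valid bit per sub-block
    (or one per block when sub-block placement is not used).
    """
    blocks = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = blocks.bit_length() - 1         # log2 of the block count
    tag_bits = addr_bits - index_bits - offset_bits
    valid_bits = block_bytes // subblock_bytes if subblock_bytes else 1
    return blocks * (tag_bits + valid_bits)


# Both organizations move 4 bytes per miss in a 16K-byte cache.
small = tag_overhead_bits(16 * 1024, 4)        # 4-byte blocks, no sub-blocks
sector = tag_overhead_bits(16 * 1024, 32, 4)   # 32-byte blocks, 4-byte sub-blocks
print(small, sector)
```

     Under these assumptions the cache with 4-byte blocks needs 77824
     bits of tag and valid storage, while the cache with 32-byte
     blocks and 4-byte sub-blocks needs 13312 bits.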

     Trace-driven simulation is frequently used to evaluate memory
     hierarchy performance. These simulations are repeatable and
     allow cache design parameters to be varied so that effects can
     be isolated. They are cheaper than hardware monitoring and do
     not require access to, or the existence of, the machine being
     studied. Simulation results can be obtained in many situations
     where analytic model solutions are intractable without
     questionable simplifying assumptions. Further, there does not
     currently exist any generally accepted model for program
     behavior, let alone one that is suitable for cache evaluation;
     workloads in trace-driven simulation are represented by samples
     of real workloads and contain complex embedded correlations
     that synthetic workloads often lack. Lastly, a trace-driven
     simulation is guaranteed to be representative of at least one
     program in execution.

     dineroIII reads trace input in din format from stdin. A din
     record is a two-tuple: label address. Each line of the trace
     file must contain one din record. The rest of the line is
     ignored so that comments can be included in the trace file.

     The label gives the access type of a reference.

          0    read data.
          1    write data.
          2    instruction fetch.
          3    escape record (treated as unknown access type).
          4    escape record (causes cache flush).

     The address is a hexadecimal byte-address between 0 and
     ffffffff inclusive. By default, hex addresses should NOT be
     preceded by ``0x'' and should be constructed with lower-case
     digits (0123456789abcdef, NOT ABCDEF); no error checking is
     done. Programmers can re-enable error checking by commenting
     out ``#define FAST_BUT_DANGEROUS_INPUT'' in file global.h.

     Cache parameters are set by command line options.
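
     The din record layout described above can be sketched as a
     small parser. This is a minimal illustration, not dineroIII's
     own input routine: the names ACCESS_TYPES and parse_din_line are
     hypothetical, and Python's int(text, 16) is more lenient than
     the simulator's default input path (it would also accept
     upper-case digits and a ``0x'' prefix).

```python
# Label meanings as listed in the din format description.
ACCESS_TYPES = {
    0: "read data",
    1: "write data",
    2: "instruction fetch",
    3: "escape record (unknown access type)",
    4: "escape record (cache flush)",
}


def parse_din_line(line):
    """Parse one din record into (label, address).

    Anything after the first two fields is treated as a comment and
    ignored, as the format description allows.
    """
    fields = line.split()
    label = int(fields[0])
    if label not in ACCESS_TYPES:
        raise ValueError("unknown din label: %d" % label)
    address = int(fields[1], 16)   # hexadecimal byte-address
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address outside 0..ffffffff")
    return label, address


# A made-up three-record trace with trailing comments.
trace = """2 0 first instruction fetch
0 1000 read data
1 70f60888 write data
"""
records = [parse_din_line(l) for l in trace.splitlines() if l.strip()]
for label, address in records:
    print(ACCESS_TYPES[label], format(address, "x"))
```
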
     Parameters block_size and either unified_cache_size or both
     data_cache_size and instruction_cache_size must be specified.
     Other parameters are optional. The suffixes K, M and G multiply
     numbers by 1024, 1024^2 and 1024^3, respectively. The following
     command line options are available:

     -b block_size
          sets the cache block size in bytes. Must be explicitly set
          (e.g. -b16).

     -u unified_cache_size
          sets the unified cache size in bytes (e.g. -u16K). A
          unified cache, also called a mixed cache, caches both data
          and instructions. If unified_cache_size is positive, both
          instruction_cache_size and data_cache_size must be zero.
          If zero, implying separate instruction and data caches
          will be simulated, both instruction_cache_size and
          data_cache_size must be set to positive values. Defaults
          to 0.

     -i instruction_cache_size
          sets the instruction cache size in bytes (e.g. -i16384).
          Defaults to 0, indicating a unified cache simulation. If
          positive, the data_cache_size must be positive as well.

     -d data_cache_size
          sets the data cache size in bytes (e.g. -d1M). Defaults to
          0, indicating a unified cache simulation. If positive, the
          instruction_cache_size must be positive as well.

     -S subblock_size
          sets the cache sub-block size in bytes. Defaults to 0,
          indicating that sub-block placement is not being used
          (i.e. -S0).

     -a associativity
          sets the cache associativity. A direct-mapped cache has
          associativity 1. A two-way set-associative cache has
          associativity 2. A fully associative cache has
          associativity data_cache_size/block_size. Defaults to
          direct-mapped placement (i.e. -a1).
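
     The size suffixes and the relation between cache size, block
     size and associativity can be sketched as follows. The helper
     names parse_size and number_of_sets are hypothetical, and the
     divisibility check is an assumption of this example; how
     dineroIII itself validates these values is not stated here.

```python
# Multipliers for the K, M and G suffixes described above.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert an option value such as '16K' or '16384' to bytes."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def number_of_sets(cache_bytes, block_bytes, associativity):
    """Return the number of sets: blocks divided by associativity.

    One set means the cache is fully associative; associativity 1
    means it is direct-mapped.
    """
    blocks = cache_bytes // block_bytes
    if associativity < 1 or blocks % associativity:
        raise ValueError("associativity must divide the block count")
    return blocks // associativity


size = parse_size("16K")                   # as in -u16K
print(number_of_sets(size, 16, 1))         # direct-mapped, -b16 -a1
print(number_of_sets(size, 16, 2))         # two-way set-associative
print(number_of_sets(size, 16, size // 16))  # fully associative
```
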

     -r replacement_policy
          sets the cache replacement policy. Valid replacement
          policies are l (LRU), f (FIFO), and r (RANDOM). Defaults
          to LRU (i.e. -rl).

     -f fetch_policy
          sets the cache fetch policy. Demand-fetch (d), which
          fetches blocks that are needed to service a cache
          reference, is the most common fetch policy. All other
          fetch policies are methods of prefetching. Prefetching is
          never done after writes. The prefetch target is determined
          by the -p option and whether sub-block placement is
          enabled.

          d    demand-fetch, which never prefetches.
          a    always-prefetch, which prefetches after every demand
               reference.
          m    miss-prefetch, which prefetches after every demand
               miss.
          t    tagged-prefetch, which prefetches after the first
               demand miss to a (sub)-block.

          The next two prefetch options work only with sub-block
          placement.

          l    load-forward-prefetch (sub-block placement only)
               works like always-prefetch within a block, but it
               will not attempt to prefetch sub-blocks in other
               blocks.
          S    sub-block-prefetch (sub-block placement only) works
               like always-prefetch within a block, except for
               references near the end of a block, where
               sub-block-prefetch references wrap around within the
               current block.

          Defaults to demand-fetch (i.e. -fd).

     -p prefetch_distance
          sets the prefetch distance in sub-blocks if sub-block
          placement is enabled or in blocks if it is not. A
          prefetch_distance of 1 means that the next sequential
          (sub)-block is the potential target of a prefetch.
          Defaults to 1 (i.e. -p1).

     -P abort_prefetch_percent
          sets the percentage of prefetches that are aborted. This
          can be used to examine the effects of data references
          blocking prefetch references from reaching a shared cache.
          Defaults to no prefetches aborted (i.e. -P0).

     -w write_policy
          selects one of the two cache write policies.
          Write-through (w) updates main memory on all writes.
          Copy-back (c) updates main memory only when a dirty block
          is replaced or the cache is flushed. Defaults to copy-back
          (i.e. -wc).

     -A write_allocation_policy
          selects whether a (sub)-block is loaded on a write
          reference. Write-allocate (w) causes (sub)-blocks to be
          loaded on all references that miss. Non-write-allocate (n)
          causes (sub)-blocks to be loaded only on non-write
          references that miss. Defaults to write-allocate
          (i.e. -Aw).

     -D debug_flag
          used by the implementor to debug the simulator. A
          debug_flag of 0 disables debugging; 1 prints the priority
          stacks after every reference; and 2 prints the priority
          stacks and performance metrics after every reference.
          Debugging information may help the user understand the
          precise meaning of all cache parameter settings. Defaults
          to no-debug (i.e. -D0).

     -o output_style
          sets the output style. Terse-output (0) prints results
          only at the end of the simulation run. Verbose-output (1)
          prints results at half-million reference increments and at
          the end of the simulation run. Bus-output (2) prints an
          output record for every memory bus transfer.
          Bus_and_snoop-output (3) prints an output record for every
          memory bus transfer and clean sub-block that is replaced.
          Defaults to terse-output (i.e. -o0).

          For bus-output, each bus record is a six-tuple:

          BUS2
               four literal characters that start a bus record
          access
               the access type (r for a bus-read, w for a bus-write,
               p for a bus-prefetch, s for snoop activity; s appears
               in output style 3 only)
          size
               the transfer size in bytes
          address
               a hexadecimal byte-address between 0 and ffffffff
               inclusive
          reference_count
               the number of demand references since the last bus
               transfer
          instruction_count
               the number of demand instruction fetches since the
               last bus transfer

     -Z skip_count
          sets the number of trace references to be skipped before
          beginning cache simulation. Defaults to none (i.e. -Z0).

     -z maximum_count
          sets the maximum number of trace references to be
          processed after skipping the trace references specified by
          skip_count. Note that references generated by the
          simulator rather than read from the trace (e.g. prefetch
          references) are not included in this count. Defaults to
          10 million (i.e. -z10000000).

     -Q flush_count
          sets the number of references between cache flushes. Can
          be used to crudely simulate multiprogramming. Defaults to
          no flushing (i.e. -Q0).

FILES
     doc.h contains additional programmer documentation.

SEE ALSO
     Mark D. Hill, Test Driving Your Next Cache, Magazine of
     Intelligent Personal Systems (MIPS), August 1989, pp. 84-92.

     Mark D. Hill and Alan Jay Smith, Experimental Evaluation of
     On-Chip Microprocessor Cache Memories, Proc. Eleventh
     International Symposium on Computer Architecture, June 1984,
     Ann Arbor, MI, pp. 158-174.

     Alan Jay Smith, Cache Memories, Computing Surveys, 14-3,
     September 1982, pp. 473-530.

BUGS
     Many things break when the number of addresses simulated
     approaches the maximum 32-bit integer (4,294,967,295).

AUTHOR
     Mark D. Hill
     Computer Sciences Dept.
     1210 West Dayton St.
     Univ. of Wisconsin
     Madison, WI 53706
     markhill@cs.wisc.edu

Printed 6/7/90