CS246_Final_Project - Value Prediction Unit

Final project for CS246 - Advanced Computer Architecture

This is a simulation of a Value Prediction Unit (VPU) as introduced in the Exceeding the Dataflow Limit via Value Prediction [Mikko H. Lipasti et. all, 1996], using Intel Pin Utility

The basic structure includes a Value Prediction Table (VPT), containing the previous write values for each address indexed by the lowest n bits of the instruction pointer (PC), and a Classification Table (CT) containing an y bit saturating counter that represents the likelihood of a positive prediction, indexed by the lowest m bits of PC address

Values for m,n, and y are all configurable knob parameters.

Later versions of this VPU includes an implimentation of variable-depth value histories, where multiple previous values can be stored in the VPT under a Least Recently Used (LRU) eviction policy.

The most recent version of this VPU adds the option to add a Victim Cache to the the VPT. The victim cache stores values evicted from the VPT with an x bit LRU stack.

Installing and Running

install Intel Pin Utility

make
pin -t obj-intel64/main.so -outfile results.out -size 10 -- /bin/ls

To create a 1024 entry VPT (and default 256 entry), and simulate on /bin/ls bash command

CLI arguments

ARG	DEFAULT	DESCRIPTION
"outfile"	"tool.out"	"Output file for the pintool"
"pid"	"0"	"Append pid to output"
"inst_limit"	"1000000000"	"Quit after executing x number of instructions"
"inst_cat"	"ALL"	"What category of instructions?" (Not supported)
"size"	"8"	"Size of Value Prediction table in bits. Total length = 2**size"
"CTbits"	"1"	"Size of CT prediction history counter in bits"
"CTsize"	"8"	"Size of Classification table in bits. Total length = 2**size""
"HistDepth"	"1"	"Value history size"
"VictimCache"	"0"	"Entries in victim cache"

Instructions are catagorized by the following for processing:

Instruction Category	Description
I_PURE_LOAD	Integer Pure Load Instruction (e.g. mov esi, dword ptr [rsi] )
I_LOAD_ARITH	Integer Load + Arithmetic Instruction (e.g. add r8, qword ptr [rsi+0x10] )
I_ARITH_1OP	Integer Pure Arithmetic Instruction, 1 operand (e.g add rcx, 0x40 )
I_ARITH_2OP	Integer Pure Arithmetic Instruction, 2 operand (e.g add rcx, rax )
I_REG_MOV	Integer Register Move Instruction (e.g. mov rax, rdi )
F_PURE_LOAD	Floating Point Pure Load Instruction (e.g. movdqa xmm5, xmmword ptr [rdi] )
F_LOAD_ARITH	Floating Point Load with Arithmetic Instruction (e.g. pminub xmm4, xmmword ptr [rdi+0x30] )
F_PURE_ARITH	Floating Point Pure Arithmetic Instruction (e.g. paddd xmm0, xmm6 )
F_REG_MOVE	Floating Point Register Move Instructions (e.g. movd xmm0, esi )
UNKNOWN	Anything not classified above

Results

Full results can be found here: Report

Instruction Value Locality

The following bars indicated the % of cases where the output of an operation was the same as the previous time the operation was executed (i.e. history depth of 1). The dashed line shows the % of instructions seen for each catagory.

We used a simple calculation to model the speedup for an architecture implementing our Value Prediction Unit. The maximum possible speedup under this would be 2x.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CS246_Final_Project - Value Prediction Unit

Installing and Running

Results

Files

README.md

Latest commit

History

README.md

File metadata and controls

CS246_Final_Project - Value Prediction Unit

Installing and Running

Results