Homework 6
Computer Architecture I @ ShanghaiTech University
Objective
In this assignment, you will implement a simplified RISC-V simulator and a single-level unified cache. Through this task, you will observe performance issues caused by using a shared cache for both instructions and data, which will help you understand why modern processors use separate L1 instruction and data caches.
Before starting, accept the assignment from GitHub Classroom.
Code Structure
- `simulate.c`: source code for the RISC-V simulator.
- `cache.c`: source code for the cache simulator.
- `main.c`: main program.
Part 1: RISC-V Simulator
Implement a basic RISC-V simulator to execute assembly code. Your simulator must:
- Support a subset of RISC-V instructions:
  - Refer to `enum Instruction_type` in `simulate.h` for the list of instructions you need to support.
  - You do not have to support instructions not in the list.
- Maintain:
  - 32 integer registers (`x0`-`x31`).
  - Program counter (PC).
- Parse and execute input assembly code sequentially.
- Finish a simple parser. We have provided some helper macros for you; if you do not understand what they do, refer to the GCC manual.
- Assume instructions are preloaded into memory starting at address `0x0`.
- For simplicity, treat all values as integers and ignore pipeline effects.
Input Assembly Pattern
All test cases are a simple for-loop and will follow this structure:
- The assembly will implement a loop similar to this C structure:

  ```c
  for (i = 0; i < n; ++i) {
      // Operations on a[i] and/or b[i]
  }
  ```
- Initialization (first 4 instructions):

  ```
  li t0, n
  li t1, 0
  li a0, 0x100
  li a1, 0x200
  ```

  `t0` is the loop counter limit (`n` iterations), `t1` is the loop variable `i`, and `a0` and `a1` are the base addresses of arrays `a` and `b`.
- Simplifications:
- No label resolution is needed: the assembly code contains no labels, and all branch offsets are precomputed integers.
- The assembly code contains no comments.
- The assembly code contains no invalid instructions.
- All load/store instructions are aligned to 4-byte boundaries. This guarantees that no memory access straddles two cache lines (Why?), simplifying your implementation.
- When simulating `lw`/`sw` instructions, only calculate the accessed memory address. Do not store/load actual values to/from registers or memory.
  - Your cache simulator's results depend only on memory access patterns, not the actual data values.
  - This reduces implementation complexity and aligns with the assignment's focus on cache behavior.
- When simulating the `li` instruction, you do not have to expand the pseudo-instruction; simply load the immediate into the register. The immediate is guaranteed to be in the range [-2048, 2047].
Part 2: Unified Cache Simulator
Implement a cache simulator to track:
- Total memory accesses: includes both instruction fetches and data accesses.
- Instruction cache hits: When an instruction is fetched from the cache.
- Data cache hits: When a load/store operation accesses data in the cache.
- Cache misses: When either an instruction or data is not found in the cache.
Cache Specifications
- Unified design: Both instructions and data share the same cache.
- Allocation policy: Write-allocate cache.
- Block size: 16 bytes.
- Number of blocks: 4.
- Associativity: Fully Associative.
- Replacement policy: LRU.
Implementation Steps
- On each instruction fetch, check whether the instruction address is in the cache.
  - If yes: increment the instruction cache hit counter.
  - If no: count it as a miss and load the block into the cache, evicting a line with the LRU policy if the cache is full.
- On each `lw`/`sw` operation, check whether the data address is in the cache.
  - If yes: increment the data cache hit counter.
  - If no: count it as a miss and load the block into the cache, evicting a line with the LRU policy if the cache is full.
Output
After executing the input assembly, print:
Total memory accesses: [value].
Instruction cache hit: [value].
Data cache hit: [value].
Total cache misses: [value].
Sample & Testing
- Example Code
  - We provide a sample assembly program, `sample/vec_add.S` (vector addition). Its reference output:

    ```
    Total memory accesses: 228
    Instruction cache hit: 143
    Data cache hit: 12
    Total cache misses: 73
    ```

  - Debug logs are in `sample/main.log`.
- How to Test
  - Compile: run `make` to build the executable `main`.
  - Execute: `./main <assembly code to run> 2>main.log`.
  - Program output (instruction/data hits and misses) will print to `stdout`.
  - Debug logs (e.g., cache access details) will be saved to `main.log`.
- Tips
  - While GDB is powerful, simple print statements are often faster for tracing cache behavior.
  - Use `fprintf(stderr, ...)` for debug prints to avoid mixing diagnostics with program output.
    - Example: `fprintf(stderr, "Cache hit at address 0x%x\n", addr);`
    - This keeps your final results (`printf` to `stdout`) clean and readable.
Submission Guidelines
- Submit your GitHub repository with the right branch on Gradescope.
- Only `simulate.c`, `simulate.h`, `cache.c`, and `cache.h` will be used for grading.
- Only the last active submission will count toward your grade.
This assignment will deepen your understanding of cache architecture trade-offs. Good luck!
Optional: Compare with Split Caches
You can use Venus' Cache Simulator to observe the performance benefits of separating instruction/data caches:
- Go to Venus RISC-V Simulator.
- Paste your assembly code into the editor.
- Click "Simulator" -> "Cache" to configure:
- Cache levels: 1.
- Block size: 16 bytes.
- Number of blocks: 2.
- Replacement policy: LRU.
- Run the code and compare data cache hit rate with your implementation.