Homework 6

Computer Architecture I @ ShanghaiTech University

Objective

In this assignment, you will implement a simplified RISC-V simulator and a single-level unified cache. Through this task, you will observe performance issues caused by using a shared cache for both instructions and data, which will help you understand why modern processors use separate L1 instruction and data caches.

Before starting, accept the assignment from GitHub Classroom.

Code Structure

simulate.c: source code for the RISC-V simulator.
cache.c: source code for the cache simulator.
main.c: main program.

Part 1: RISC-V Simulator

Implement a basic RISC-V simulator to execute assembly code. Your simulator must:

Support a subset of RISC-V instructions:
- Refer to enum Instruction_type in simulate.h for the list of instructions you need to support.
- You do not have to support instructions not in the list.
Maintain:
- 32 integer registers (x0 − x31).
- Program counter (PC).
Parse and execute input assembly code sequentially.
- Finish a simple parser. We have finished some helper macros for you. If you do not understand what they do, you can refer to the GCC manual.
- Assume instructions are preloaded into memory starting at address 0x0.
- For simplicity, treat all values as integers and ignore pipeline effects.

Input Assembly Pattern

All test cases are a simple for-loop and will follow this structure:

The assembly will implement a loop similar to this C structure:

for (i = 0; i < n; ++i) {
    // Operations on a[i] and/or b[i]
}

Initialization (first 4 instructions):
```
li t0, n
li t1, 0
li a0, 0x100
li a1, 0x200
```
Where t0 is the loop counter limit (n iterations), t1 is the loop variable i, a0 and a1 are the base address of array a and b.
Simplifications:
- No label resolution needed − the assembly code contains no labels and all branch offsets are precomputed integers.
- The assembly code contains no comments.
- The assembly code contains no invalid instructions.
- All load/store instructions are aligned to 4-byte boundaries. This guarantees that no memory access straddles two cache lines (Why?), simplifying your implementation.
- When simulating lw/sw instructions, only calculate the accessed memory address. Do not store/load actual values to/from registers or memory.
  - Your cache simulator's results depend only on memory access patterns, not the actual data values.
  - This reduces implementation complexity and aligns with the assignment's focus on cache behavior.
- When simulating the li instruction, you do not have to expand the pseudo-instruction, simply load the immediate into the register. The immediate is guaranteed to be in the range [-2048, 2047].

Part 2: Unified Cache Simulator

Implement a cache simulator to track:

Total memory accesses: It includes both fetching instruction and data from memory.
Instruction cache hits: When an instruction is fetched from the cache.
Data cache hits: When a load/store operation accesses data in the cache.
Cache misses: When either an instruction or data is not found in the cache.

Cache Specifications

Unified design: Both instructions and data share the same cache.
Allocation policy: Write-allocate cache.
Block size: 16 bytes.
Number of blocks: 4.
Associativity: Fully Associative.
Replacement policy: LRU.

Implementation Steps

On each instruction fetch, check if the instruction address exists in the cache.
- If yes: increment the counter for instruction cache hits.
- If no: evict a cache line using the LRU algorithm, and load the block into the cache and count it as a miss.
On each lw/sw operation, check if the data address exists in the cache.
- If yes: increment the counter for data cache hits.
- If no: evict a cache line using the LRU algorithm, and load the block into the cache and count it as a miss.

Output

After executing the input assembly, print:

Total memory accesses: [value].
Instruction cache hit: [value].
Data cache hit: [value].
Total cache misses: [value].

Sample & Testing

Example Code
We provide a sample assembly code sample/vec_add.S (vector addition). Reference output of the assembly code:
```
Total memory accesses: 228.
Instruction cache hit: 143.
Data cache hit: 12.
Total cache misses: 73.
```
Debug logs are in sample/main.log.
How to Test
- Compile: Run make to build the executable main.
- Execute ./main <assembly code to run> 2>main.log.
- Program output (instruction/data hits and misses) will print to stdout.
- Debug logs (e.g., cache access details) will be saved to main.log.
Tips
- While GDB is powerful, simple print statements are often faster for tracing cache behavior.
- Use fprintf(stderr, ...) for debug prints to avoid mixing diagnostics with program output.
  - Example: fprintf(stderr, "Cache hit at address 0x%x\n", addr);
  - This keeps your final results (printf to stdout) clean and readable.

Submission Guidelines

Submit your GitHub repository with the right branch on Gradescope.
Only simulate.c simulate.h cache.c cache.h will be used for grading.
Only the last active submission will be accounted for your grade.

This assignment will deepen your understanding of cache architecture trade-offs. Good luck!

Optional: Compare with Split Caches

You can use Venus' Cache Simulator to observe the performance benefits of separating instruction/data caches:

Go to Venus RISC-V Simulator.
Paste your assembly code into the editor.
Click "Simulator" −> "Cache" to configure:
- Cache levels: 1.
- Block size: 16 bytes.
- Number of blocks: 2.
- Replacement policy: LRU.
Run the code and compare data cache hit rate with your implementation.