Project 1: Make a Toy Venus

Project 1.2: A RISC-V Emulator

Computer Architecture I ShanghaiTech University


IMPORTANT INFO - PLEASE READ

The projects are part of your design project worth 2 credit points. As such, they run in parallel to the actual course. So be aware that the due dates for projects and homework might be very close to each other! Start early and do not procrastinate.

Contents

  1. Goal of Project 1
  2. Introduction to Project 1.2
  3. Background of The Instruction Set
  4. Getting Started ( Step 0, Step 1, Step 2, Step 3, Step 4 )
  5. Testing
  6. Running the Emulator
  7. Notices on Grading
  8. How many lines of code do I need to write?
  9. Submission
  10. Autolab Results

Goal of Project 1

In Project 1, you will make a simple toy Venus, consisting of both assembler and emulator. If you are not yet familiar with Venus, try it out now. At the end of this project, you will understand the C programming language and the underlying execution of RISC-V assembly.

Introduction to Project 1.2

In Project 1.1, we completed a RISC-V assembler. Now we need to complete a simple emulator in the C programming language to execute instructions. The emulator can be divided into the following components.

  1. Launching and Running: This part, mainly in emulator.c, is responsible for creating an emulator and performing operations such as run(), step(), prev(), etc. We've done this part for you so you can ignore it!
  2. Execution: This part, mainly in execute.c and execute_utils.c, is responsible for executing an instruction and finishing the corresponding operations on cpu and memory. You need to work on this component.
  3. Memory: This part, mainly in mem.c, is responsible for simple conversion of memory addresses and manipulation of memory such as store() and load(). You need to work on this component.
  4. Logs component: This part, mainly in logs.c, is responsible for recording and saving logs so that the emulator can go back in time. You need to work on this component.

Background of The Instruction Set

Registers

Please consult the RISC-V Green Sheet (PDF) for register numbers, instruction opcodes, and bitwise formats. Our assembler will support all 32 registers: x0, ra, sp, gp, tp, t0-t6, s0-s11, a0-a7. The name x0 can be used in lieu of zero. Other register numbers (e.g., x1, x2, etc.) shall also be supported. Note that floating-point registers are not included in Project 1.

Instructions

We will have 42 instructions and 8 pseudo-instructions. The instructions are:

| Instruction           | Type | Opcode | Funct3 | Funct7/IMM | Operation |
|-----------------------|------|--------|--------|------------|-----------|
| add rd, rs1, rs2      | R    | 0x33   | 0x0    | 0x00       | R[rd] ← R[rs1] + R[rs2] |
| mul rd, rs1, rs2      | R    | 0x33   | 0x0    | 0x01       | R[rd] ← (R[rs1] * R[rs2])[31:0] |
| sub rd, rs1, rs2      | R    | 0x33   | 0x0    | 0x20       | R[rd] ← R[rs1] - R[rs2] |
| sll rd, rs1, rs2      | R    | 0x33   | 0x1    | 0x00       | R[rd] ← R[rs1] << R[rs2] |
| mulh rd, rs1, rs2     | R    | 0x33   | 0x1    | 0x01       | R[rd] ← (R[rs1] * R[rs2])[63:32] |
| slt rd, rs1, rs2      | R    | 0x33   | 0x2    | 0x00       | R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 |
| sltu rd, rs1, rs2     | R    | 0x33   | 0x3    | 0x00       | R[rd] ← (U(R[rs1]) < U(R[rs2])) ? 1 : 0 |
| xor rd, rs1, rs2      | R    | 0x33   | 0x4    | 0x00       | R[rd] ← R[rs1] ^ R[rs2] |
| div rd, rs1, rs2      | R    | 0x33   | 0x4    | 0x01       | R[rd] ← R[rs1] / R[rs2] |
| srl rd, rs1, rs2      | R    | 0x33   | 0x5    | 0x00       | R[rd] ← R[rs1] >> R[rs2] (logical) |
| sra rd, rs1, rs2      | R    | 0x33   | 0x5    | 0x20       | R[rd] ← R[rs1] >> R[rs2] (arithmetic) |
| or rd, rs1, rs2       | R    | 0x33   | 0x6    | 0x00       | R[rd] ← R[rs1] \| R[rs2] |
| rem rd, rs1, rs2      | R    | 0x33   | 0x6    | 0x01       | R[rd] ← R[rs1] % R[rs2] |
| and rd, rs1, rs2      | R    | 0x33   | 0x7    | 0x00       | R[rd] ← R[rs1] & R[rs2] |
| lb rd, offset(rs1)    | I    | 0x03   | 0x0    |            | R[rd] ← SignExt(Mem(R[rs1] + offset, byte)) |
| lh rd, offset(rs1)    | I    | 0x03   | 0x1    |            | R[rd] ← SignExt(Mem(R[rs1] + offset, half)) |
| lw rd, offset(rs1)    | I    | 0x03   | 0x2    |            | R[rd] ← Mem(R[rs1] + offset, word) |
| lbu rd, offset(rs1)   | I    | 0x03   | 0x4    |            | R[rd] ← U(Mem(R[rs1] + offset, byte)) |
| lhu rd, offset(rs1)   | I    | 0x03   | 0x5    |            | R[rd] ← U(Mem(R[rs1] + offset, half)) |
| addi rd, rs1, imm     | I    | 0x13   | 0x0    |            | R[rd] ← R[rs1] + imm |
| slli rd, rs1, imm     | I    | 0x13   | 0x1    | 0x00       | R[rd] ← R[rs1] << imm |
| slti rd, rs1, imm     | I    | 0x13   | 0x2    |            | R[rd] ← (R[rs1] < imm) ? 1 : 0 |
| sltiu rd, rs1, imm    | I    | 0x13   | 0x3    |            | R[rd] ← (U(R[rs1]) < U(imm)) ? 1 : 0 |
| xori rd, rs1, imm     | I    | 0x13   | 0x4    |            | R[rd] ← R[rs1] ^ imm |
| srli rd, rs1, imm     | I    | 0x13   | 0x5    | 0x00       | R[rd] ← R[rs1] >> imm (logical) |
| srai rd, rs1, imm     | I    | 0x13   | 0x5    | 0x20       | R[rd] ← R[rs1] >> imm (arithmetic) |
| ori rd, rs1, imm      | I    | 0x13   | 0x6    |            | R[rd] ← R[rs1] \| imm |
| andi rd, rs1, imm     | I    | 0x13   | 0x7    |            | R[rd] ← R[rs1] & imm |
| jalr rd, rs1, imm     | I    | 0x67   | 0x0    |            | R[rd] ← PC + 4; PC ← R[rs1] + imm |
| ecall                 | I    | 0x73   | 0x0    | 0x000      | (Transfers control to operating system; the service is selected by a0, see the list after this table) |
| sb rs2, offset(rs1)   | S    | 0x23   | 0x0    |            | Mem(R[rs1] + offset) ← R[rs2][7:0] |
| sh rs2, offset(rs1)   | S    | 0x23   | 0x1    |            | Mem(R[rs1] + offset) ← R[rs2][15:0] |
| sw rs2, offset(rs1)   | S    | 0x23   | 0x2    |            | Mem(R[rs1] + offset) ← R[rs2] |
| beq rs1, rs2, offset  | SB   | 0x63   | 0x0    |            | if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b'0} |
| bne rs1, rs2, offset  | SB   | 0x63   | 0x1    |            | if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b'0} |
| blt rs1, rs2, offset  | SB   | 0x63   | 0x4    |            | if(R[rs1] < R[rs2]) PC ← PC + {offset, 1b'0} |
| bge rs1, rs2, offset  | SB   | 0x63   | 0x5    |            | if(R[rs1] >= R[rs2]) PC ← PC + {offset, 1b'0} |
| bltu rs1, rs2, offset | SB   | 0x63   | 0x6    |            | if(U(R[rs1]) < U(R[rs2])) PC ← PC + {offset, 1b'0} |
| bgeu rs1, rs2, offset | SB   | 0x63   | 0x7    |            | if(U(R[rs1]) >= U(R[rs2])) PC ← PC + {offset, 1b'0} |
| auipc rd, offset      | U    | 0x17   |        |            | R[rd] ← PC + {offset, 12b'0} |
| lui rd, offset        | U    | 0x37   |        |            | R[rd] ← {offset, 12b'0} |
| jal rd, imm           | UJ   | 0x6f   |        |            | R[rd] ← PC + 4; PC ← PC + {imm, 1b'0} |

The ecall service is selected by the value in a0:

  a0 = 1: print the value of a1 as an integer.
  a0 = 4: print the string at address a1.
  a0 = 10: exit (end-of-code indicator).
  a0 = 11: print the value of a1 as a character.
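Several rows of the instruction table differ only in whether the operands are treated as signed or unsigned (slt vs. sltu, srl vs. sra, mulh). The following is a minimal sketch of how that distinction might look in C. All function names are hypothetical and are not part of the framework; registers are assumed to be stored as uint32_t. Masking the shift amount to its low 5 bits follows the RISC-V specification for RV32 and is not spelled out in the table above.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; registers are modeled as uint32_t.
 * Casting uint32_t to int32_t assumes the usual two's-complement behavior. */

/* slt: compare both operands as signed 32-bit values. */
uint32_t sketch_slt(uint32_t a, uint32_t b) {
    return ((int32_t)a < (int32_t)b) ? 1u : 0u;
}

/* sltu: compare both operands as unsigned 32-bit values. */
uint32_t sketch_sltu(uint32_t a, uint32_t b) {
    return (a < b) ? 1u : 0u;
}

/* srl: logical right shift, zeros are shifted in from the left. */
uint32_t sketch_srl(uint32_t a, uint32_t b) {
    return a >> (b & 0x1f);
}

/* sra: arithmetic right shift, the sign bit is replicated.
 * Written with unsigned arithmetic so it does not rely on how the
 * compiler shifts negative signed values. */
uint32_t sketch_sra(uint32_t a, uint32_t b) {
    uint32_t sh = b & 0x1f;
    uint32_t result = a >> sh;
    if ((a & 0x80000000u) != 0 && sh > 0) {
        result |= ~(0xffffffffu >> sh); /* set the top sh bits */
    }
    return result;
}

/* mulh: upper 32 bits of the signed 64-bit product. */
uint32_t sketch_mulh(uint32_t a, uint32_t b) {
    int64_t product = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)product >> 32);
}
```

For example, with a = 0x80000000 and a shift amount of 4, sketch_srl gives 0x08000000 while sketch_sra gives 0xf8000000.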

The pseudo-instructions are:

| Pseudo-instruction            | Format           | Uses        |
|-------------------------------|------------------|-------------|
| Branch on Equal to Zero       | beqz rs1, label  | beq         |
| Branch on not Equal to Zero   | bnez rs1, label  | bne         |
| Jump                          | j label          | jal         |
| Jump Register                 | jr rs1           | jalr        |
| Load Address                  | la rd, label     | auipc, addi |
| Load Immediate                | li rd, immediate | lui, addi   |
| Load Word at address of Label | lw rd, label     | auipc, lw   |
| Move                          | mv rd, rs1       | addi        |

For further reference, here are the bit lengths of the instruction components.

| R-TYPE | funct7 | rs2 | rs1 | funct3 | rd | opcode |
| Bits   | 7      | 5   | 5   | 3      | 5  | 7      |

| I-TYPE | imm[11:0] | rs1 | funct3 | rd | opcode |
| Bits   | 12        | 5   | 3      | 5  | 7      |

| S-TYPE | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
| Bits   | 7         | 5   | 5   | 3      | 5        | 7      |

| SB-TYPE | imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode |
| Bits    | 1       | 6         | 5   | 5   | 3      | 4        | 1       | 7      |

| U-TYPE | imm[31:12] | rd | opcode |
| Bits   | 20         | 5  | 7      |

| UJ-TYPE | imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode |
| Bits    | 1       | 10        | 1       | 8          | 5  | 7      |
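The bit lengths above translate directly into shifts and masks. As a rough sketch, the fixed fields of a 32-bit instruction word could be pulled apart as shown below. The struct and function names are hypothetical and do not come from the framework, which may already provide its own instruction type.

```c
#include <stdint.h>

/* Hypothetical container for the fields shared by several formats. */
typedef struct {
    uint32_t opcode; /* bits  6:0  */
    uint32_t rd;     /* bits 11:7  */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15 */
    uint32_t rs2;    /* bits 24:20 */
    uint32_t funct7; /* bits 31:25 */
} SketchFields;

/* Extract every field; the caller uses only those that exist in the
 * format selected by the opcode (e.g. funct7 is meaningless for I-type loads). */
SketchFields sketch_decode_fields(uint32_t inst) {
    SketchFields f;
    f.opcode = inst & 0x7f;
    f.rd     = (inst >> 7)  & 0x1f;
    f.funct3 = (inst >> 12) & 0x7;
    f.rs1    = (inst >> 15) & 0x1f;
    f.rs2    = (inst >> 20) & 0x1f;
    f.funct7 = (inst >> 25) & 0x7f;
    return f;
}
```

For instance, the word 0x002081b3 decodes to opcode 0x33, funct3 0x0, funct7 0x00, rd 3, rs1 1, rs2 2, which the instruction table identifies as add x3, x1, x2.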

Getting Started

Step 0: Obtaining the Files

Caveat
If you want to use the files from Project 1.1, which may help you generate some test cases, we recommend copying your previous source files into the Project 1.2 working directory. Note that we modified the Makefile for Project 1.2; keep this in mind when importing your Project 1.1 code.
  1. Clone your p1 repository from gitlab. You may want to change http to https.
  2. In the repository add a remote repo that contains the framework files:
    git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110_23s/p1.2_framework.git (or change to http)
  3. Go and fetch the files:
    git fetch framework
  4. Now merge those files with your main branch:
    git merge framework/main
  5. The rest of the git commands work as usual.

Step 1: Building Blocks

Finish the implementation of bitSigner(), get_imm_operand(), get_branch_offset(), get_jump_offset(), get_store_offset() in execute_utils.c.

These functions extract the immediate numbers and offsets contained in the instructions to facilitate subsequent execution. If you have completed project 1.1, you should be familiar with this procedure, as you have done the reverse of it.
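As an illustration of what this step involves, here is a minimal sketch of sign extension and of reassembling the scattered immediate bits of each format. The function names and signatures are hypothetical; the framework's bitSigner() and get_*() functions may take different parameters. In this sketch the branch and jump helpers return the byte offset with the implicit low 0 bit already appended, which is an assumption and may not match the convention the framework expects.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a 32-bit signed integer.
 * Assumes 1 <= bits <= 32 and two's-complement conversion to int32_t. */
int32_t sketch_sign_extend(uint32_t value, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    uint32_t mask = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    value &= mask;
    return (int32_t)((value ^ sign) - sign);
}

/* I-type: imm[11:0] sits in bits 31:20. */
int32_t sketch_imm_i(uint32_t inst) {
    return sketch_sign_extend(inst >> 20, 12);
}

/* S-type: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7. */
int32_t sketch_imm_s(uint32_t inst) {
    uint32_t imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1f);
    return sketch_sign_extend(imm, 12);
}

/* SB-type: imm[12] bit 31, imm[10:5] bits 30:25, imm[4:1] bits 11:8, imm[11] bit 7. */
int32_t sketch_offset_sb(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1)  << 12)
                 | (((inst >> 7)  & 0x1)  << 11)
                 | (((inst >> 25) & 0x3f) << 5)
                 | (((inst >> 8)  & 0xf)  << 1);
    return sketch_sign_extend(imm, 13);
}

/* UJ-type: imm[20] bit 31, imm[10:1] bits 30:21, imm[11] bit 20, imm[19:12] bits 19:12. */
int32_t sketch_offset_uj(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1)   << 20)
                 | (((inst >> 12) & 0xff)  << 12)
                 | (((inst >> 20) & 0x1)   << 11)
                 | (((inst >> 21) & 0x3ff) << 1);
    return sketch_sign_extend(imm, 21);
}
```

As a quick check, the word 0xfe000ee3 (beq x0, x0 with a backward branch) yields a byte offset of -4 from sketch_offset_sb, and 0xfff00093 (addi x1, x0, -1) yields -1 from sketch_imm_i.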

Step 2: Complete Memory Component

You should complete the memory component before the execution component. Finish the implementation of store() and load() in mem.c. You should check the alignment first if needed. check_alignment() in mem.c could help you.

Before operating on memory, a virtual memory address must be converted to a real memory address. To keep things simple, we map a region starting at the base of each segment directly onto a region of "physical memory". You should follow the mapping given below.

Caveat
The mapping from the virtual regions at DATA_BASE and TEXT_BASE onto physical memory has to be sequential. In addition, DATA_BASE and TEXT_BASE have to be mapped to MEMORY_DATA and MEMORY_TEXT, respectively.
[Figure: Memory Mapping]
Caveat
Raise a segmentation fault when accessing an unmapped memory location or when attempting to store to the text segment. For an incorrect load, you should raise the corresponding error and return a value of 0. For an incorrect store, you should raise the corresponding error and return. Do not interrupt or exit the program.
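The rules in this step can be sketched roughly as follows. Everything here is an assumption made for illustration: the SKETCH_* constants stand in for the framework's DATA_BASE, TEXT_BASE, MEMORY_DATA and MEMORY_TEXT (whose real values are defined by the framework), the segment sizes are invented, errors are simply counted and printed instead of going through the framework's error functions, and every access is required to be naturally aligned.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout; the real values come from the framework headers. */
#define SKETCH_TEXT_BASE   0x00000000u
#define SKETCH_TEXT_SIZE   0x1000u
#define SKETCH_DATA_BASE   0x10000000u
#define SKETCH_DATA_SIZE   0x1000u
#define SKETCH_MEMORY_TEXT 0u
#define SKETCH_MEMORY_DATA SKETCH_TEXT_SIZE

static uint8_t sketch_memory[SKETCH_TEXT_SIZE + SKETCH_DATA_SIZE];
unsigned sketch_error_count = 0; /* stands in for the framework's raise_* calls */

/* Map a virtual address to an index into sketch_memory. Returns 1 on success.
 * Unsigned subtraction wraps for addresses below a base, so one comparison
 * per segment covers both the lower and the upper bound. */
int sketch_translate(uint32_t addr, uint32_t size, int is_store, uint32_t *phys) {
    if (addr - SKETCH_DATA_BASE <= SKETCH_DATA_SIZE - size) {
        *phys = SKETCH_MEMORY_DATA + (addr - SKETCH_DATA_BASE);
        return 1;
    }
    /* The text segment is readable but must not be written. */
    if (!is_store && addr - SKETCH_TEXT_BASE <= SKETCH_TEXT_SIZE - size) {
        *phys = SKETCH_MEMORY_TEXT + (addr - SKETCH_TEXT_BASE);
        return 1;
    }
    return 0;
}

/* Shared check: report the error and let the caller return without exiting. */
static int sketch_check(uint32_t addr, uint32_t size, int is_store, uint32_t *phys) {
    if (addr % size != 0) {
        fprintf(stderr, "sketch: misaligned access at 0x%08x\n", (unsigned)addr);
        sketch_error_count++;
        return 0;
    }
    if (!sketch_translate(addr, size, is_store, phys)) {
        fprintf(stderr, "sketch: segmentation fault at 0x%08x\n", (unsigned)addr);
        sketch_error_count++;
        return 0;
    }
    return 1;
}

/* Load size (1, 2 or 4) bytes, little-endian, zero-extended; 0 on error. */
uint32_t sketch_load(uint32_t addr, uint32_t size) {
    uint32_t phys, value = 0, i;
    if (!sketch_check(addr, size, 0, &phys)) return 0;
    for (i = 0; i < size; i++) {
        value |= (uint32_t)sketch_memory[phys + i] << (8 * i);
    }
    return value;
}

/* Store the low size (1, 2 or 4) bytes of value, little-endian. */
void sketch_store(uint32_t addr, uint32_t size, uint32_t value) {
    uint32_t phys, i;
    if (!sketch_check(addr, size, 1, &phys)) return;
    for (i = 0; i < size; i++) {
        sketch_memory[phys + i] = (uint8_t)(value >> (8 * i));
    }
}
```

In this sketch the load helper always zero-extends; sign extension for lb and lh would be applied by the caller.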

Step 3: Time to Execute

execute() in execute.c executes the instruction provided as a parameter. This should modify the appropriate registers, make any necessary calls to memory, and update the program counter to refer to the next instruction to execute. You need to implement all of the instructions listed above.

You can use the framework we provide. In that case you will need to complete execute_rtype(), execute_itype_except_load(), execute_ecall(), execute_branch(), execute_load(), execute_store(), execute_jal(), execute_jalr() and execute_utype(). Alternatively, you can implement your own functions, but please do not change the declaration or behavior of execute().

Note that you should use write_to_log() when implementing the printing functions of ecall.

You will find what you have completed in the first two steps very useful in this step.

Caveat
For unrecognised instructions you should use raise_machine_code_error(). For undefined ecall calls you should use raise_undefined_ecall_error(). During memory access operations, the alignment is checked according to the corresponding instruction. When an error occurs, raise the error and then continue with the next instruction without interrupting or exiting the program.
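To make the control flow of this step concrete, here is a heavily reduced sketch that handles only add, sub and beq. The SketchCpu type and sketch_execute() are hypothetical and unrelated to the framework's real types and its execute() declaration; the error counter stands in for raise_machine_code_error(). The sketch also keeps x0 at zero, which follows the RISC-V convention rather than anything stated in this handout.

```c
#include <stdint.h>

/* Hypothetical CPU state for illustration only. */
typedef struct {
    uint32_t reg[32];
    uint32_t pc;
    unsigned errors; /* stands in for raise_machine_code_error() */
} SketchCpu;

/* Byte offset of an SB-type instruction, low 0 bit already appended. */
static int32_t sketch_branch_offset(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1)  << 12)
                 | (((inst >> 7)  & 0x1)  << 11)
                 | (((inst >> 25) & 0x3f) << 5)
                 | (((inst >> 8)  & 0xf)  << 1);
    return (int32_t)((imm ^ 0x1000u) - 0x1000u); /* sign-extend 13 bits */
}

void sketch_execute(SketchCpu *cpu, uint32_t inst) {
    uint32_t opcode = inst & 0x7f;
    uint32_t rd     = (inst >> 7)  & 0x1f;
    uint32_t funct3 = (inst >> 12) & 0x7;
    uint32_t a      = cpu->reg[(inst >> 15) & 0x1f]; /* R[rs1] */
    uint32_t b      = cpu->reg[(inst >> 20) & 0x1f]; /* R[rs2] */
    uint32_t funct7 = (inst >> 25) & 0x7f;
    uint32_t next_pc = cpu->pc + 4; /* default: fall through */

    switch (opcode) {
    case 0x33: /* R-type; only add and sub are shown */
        if (funct3 == 0x0 && funct7 == 0x00) {
            cpu->reg[rd] = a + b;
        } else if (funct3 == 0x0 && funct7 == 0x20) {
            cpu->reg[rd] = a - b;
        } else {
            cpu->errors++;
        }
        break;
    case 0x63: /* SB-type; only beq is shown */
        if (funct3 == 0x0) {
            if (a == b) {
                next_pc = cpu->pc + (uint32_t)sketch_branch_offset(inst);
            }
        } else {
            cpu->errors++;
        }
        break;
    default: /* unrecognised: report, then continue with the next instruction */
        cpu->errors++;
        break;
    }

    cpu->reg[0] = 0; /* x0 always reads as zero */
    cpu->pc = next_pc;
}
```

Computing next_pc up front and overwriting it only for taken branches and jumps keeps the "continue with the next instruction" rule in one place, including on the error paths.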

Step 4: Don't Forget Logs

In the logs component, logs.c, you need to implement a data structure that saves the current state and can return to the previous state. The main purpose of this component is to implement the prev function of Venus. Try the prev function in Venus and think about how you would implement such a function yourself. You need to ensure that the following interfaces are completed.

Logs: You can change its contents or define more structures, but you can't change its name. It represents the log component you created.

You should use dynamic memory allocation to resize the logs component when needed. Consider what information you need to record: the smaller the size of each record, the better.
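One possible shape for such a structure is a growable stack of small records, sketched below. This is only an illustration under stated assumptions: the SketchLogs and SketchRecord types and all function names are hypothetical (the framework requires the structure to be called Logs and defines its own interfaces), and each record here stores just the old PC and one overwritten register, which would not be enough to undo a store to memory.

```c
#include <stdint.h>
#include <stdlib.h>

/* One undo record: the state that a single instruction overwrote. */
typedef struct {
    uint32_t pc;        /* PC before the instruction executed */
    uint32_t old_value; /* previous contents of the destination register */
    uint8_t  reg;       /* destination register index */
} SketchRecord;

/* Growable stack of records. */
typedef struct {
    SketchRecord *items;
    size_t size;
    size_t capacity;
} SketchLogs;

SketchLogs *sketch_logs_create(void) {
    return calloc(1, sizeof(SketchLogs)); /* empty: items NULL, size 0 */
}

/* Append a record, doubling the capacity when full. Returns 1 on success. */
int sketch_logs_push(SketchLogs *logs, SketchRecord rec) {
    if (logs->size == logs->capacity) {
        size_t new_cap = logs->capacity ? logs->capacity * 2 : 16;
        SketchRecord *grown = realloc(logs->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            return 0; /* the old buffer is still valid */
        }
        logs->items = grown;
        logs->capacity = new_cap;
    }
    logs->items[logs->size++] = rec;
    return 1;
}

/* Remove the most recent record into *out. Returns 0 if the log is empty. */
int sketch_logs_pop(SketchLogs *logs, SketchRecord *out) {
    if (logs->size == 0) {
        return 0;
    }
    *out = logs->items[--logs->size];
    return 1;
}

void sketch_logs_free(SketchLogs *logs) {
    if (logs != NULL) {
        free(logs->items);
        free(logs);
    }
}
```

Doubling the capacity keeps the amortized cost of a push constant, and freeing both the buffer and the container matters here because the memory leak tests run on every test case.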

Testing

You are responsible for testing your code. While we have provided a few test cases, they are by no means comprehensive. Fortunately, you have a variety of testing tools at your service.

Valgrind

You should use Valgrind to check whether your code has any memory leaks. We have included a file, run-valgrind, which will run Valgrind on any executable of your choosing. If you get a permission denied error, try adding the execute permission to the file:

chmod u+x run-valgrind

Then you can run by typing:

./run-valgrind <whatever program you want to run>

Venus

Since you're writing an emulator, why not refer to an existing one? Venus is a powerful reference for you to use, and you are encouraged to write your own RISC-V files and assemble them using Venus.

Caveat
The initial values of the registers and of the memory will be somewhat different from those in Venus, and they also depend on the version of Venus you are using. You can ignore these differences, as we have done the initialisation of the memory and registers for you in the project.

Diff

diff is a utility for comparing the contents of files. Running the following command will print out the differences between file1 and file2:

diff <file1> <file2>

To see how to interpret diff results, click here.

Running the Emulator

First, make sure your emulator executable is up to date by running make.

By default, the program prints a help message if you do not specify any arguments.

To run the emulator, use the following command.

./riscv emulator <text_file> <data_file>

If you have finished part 1, you can also use an assembly file as the input file, like this:

./riscv <input file>

When the emulator is running, you can use the following commands to debug:

run, step, prev, dump, trace, reset, exit, print <args>

where the <args> in print <args> can be either "pc", "reg", or a memory address.

Notices on Grading

How many lines of code do I need to write?

Here is a summary of the solution code. The final row gives the total lines inserted and deleted; a changed line counts as both an insertion and a deletion. There are many possible solutions, however, and yours may differ considerably.


      part2/execute.c       | 492 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      part2/execute_utils.c |  30 +++++++++++++
      part2/logs.c          | 161 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
      part2/logs.h          |  17 ++++++--
      part2/mem.c           |  73 ++++++++++++++++++++++++++++++++
      5 files changed, 769 insertions(+), 4 deletions(-)
    

Submission

  1. Create a text file autolab.txt. The first line has to be the name of your p1.2 project in gitlab, that is, p1.2_email1_email2.
  2. The following lines have to contain a long, random secret. Commit and push to gitlab. We will test the length and randomness of this secret by running tar -cjf size.tar.bz2 autolab.txt.
  3. When you want to run the autograder in autolab, you have to upload your autolab.txt. Autolab will clone, from gitlab, the default branch (in most cases the "main" branch) of the repo specified in the autolab.txt you uploaded, and then continue grading only if all of these conditions are met:
    1. The autolab.txt you uploaded and the one in the cloned repo are identical.
    2. The size of the generated size.tar.bz2 is at least 1000B.
    3. Only the files from the framework are present in the cloned repo.
  4. Autolab will replace all files except *.c, *.h with the framework versions before compiling your code.

Autolab Results

Tests 0-xx are the common tests; the cases are listed below.

| Test Case Number | Point of Interest                       |
|------------------|-----------------------------------------|
| 0-00             | Basic arithmetic operations             |
| 0-01             | Instructions involving unsigned numbers |
| 0-02 & 0-03      | Logs component and prev operations      |
| 0-04             | ecall                                   |
| 0-05             | branch and jump                         |
| 0-06             | Error handling                          |
| 0-07 & 0-08      | Arithmetic involving negative numbers   |
| 0-09 ~ 0-14      | Complex                                 |

Tests 0-xx-mem are the memory leak detection tests. Every common test is also run with memory leak detection.