Project 1: Make a Toy Venus
Project 1.2: A RISC-V Emulator
IMPORTANT INFO - PLEASE READ
The projects are part of your design project, worth 2 credit points. As such, they run in parallel to the actual course, so be aware that the due dates for the project and homework might be very close to each other! Start early and do not procrastinate.
Goal of Project 1
In Project 1, you will make a simple toy Venus, consisting of both assembler and emulator. If you are not yet familiar with Venus, try it out now. At the end of this project, you will understand the C programming language and the underlying execution of RISC-V assembly.
Introduction to Project 1.2
In Project 1.1, we completed a RISC-V assembler. Now we need to complete a simple emulator in the C programming language to execute instructions. The emulator can be divided into the following components.
- Launching and Running: This part, mainly in emulator.c, is responsible for creating an emulator and performing operations such as run(), step(), prev(), etc. We've done this part for you so you can ignore it!
- Execution: This part, mainly in execute.c and execute_utils.c, is responsible for executing an instruction and finishing the corresponding operations on the CPU and memory. You need to work on this component.
- Memory: This part, mainly in mem.c, is responsible for simple conversion of memory addresses and manipulation of memory such as store() and load(). You need to work on this component.
- Logs component: This part, mainly in logs.c, is responsible for recording and saving logs so that execution can go back in time. You need to work on this component.
Background of The Instruction Set
Registers
Please consult the RISC-V Green Sheet (PDF) for register numbers, instruction opcodes, and bitwise formats. Our assembler will support all 32 registers: x0, ra, sp, gp, tp, t0-t6, s0-s11, a0-a7. The name x0 can be used in lieu of zero. Other register numbers (e.g. x1, x2, etc.) shall also be supported. Note that floating point registers are not included in Project 1.
Instructions
We will have 42 instructions and 8 pseudo-instructions. The instructions are:
Instruction | Type | Opcode | Funct3 | Funct7/IMM | Operation |
add rd, rs1, rs2 | R | 0x33 | 0x0 | 0x00 | R[rd] ← R[rs1] + R[rs2] |
mul rd, rs1, rs2 | R | 0x33 | 0x0 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[31:0] |
sub rd, rs1, rs2 | R | 0x33 | 0x0 | 0x20 | R[rd] ← R[rs1] - R[rs2] |
sll rd, rs1, rs2 | R | 0x33 | 0x1 | 0x00 | R[rd] ← R[rs1] << R[rs2] |
mulh rd, rs1, rs2 | R | 0x33 | 0x1 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[63:32] |
slt rd, rs1, rs2 | R | 0x33 | 0x2 | 0x00 | R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 |
sltu rd, rs1, rs2 | R | 0x33 | 0x3 | 0x00 | R[rd] ← (U(R[rs1]) < U(R[rs2])) ? 1 : 0 |
xor rd, rs1, rs2 | R | 0x33 | 0x4 | 0x00 | R[rd] ← R[rs1] ^ R[rs2] |
div rd, rs1, rs2 | R | 0x33 | 0x4 | 0x01 | R[rd] ← R[rs1] / R[rs2] |
srl rd, rs1, rs2 | R | 0x33 | 0x5 | 0x00 | R[rd] ← R[rs1] >> R[rs2] (logical) |
sra rd, rs1, rs2 | R | 0x33 | 0x5 | 0x20 | R[rd] ← R[rs1] >> R[rs2] (arithmetic) |
or rd, rs1, rs2 | R | 0x33 | 0x6 | 0x00 | R[rd] ← R[rs1] | R[rs2] |
rem rd, rs1, rs2 | R | 0x33 | 0x6 | 0x01 | R[rd] ← R[rs1] % R[rs2] |
and rd, rs1, rs2 | R | 0x33 | 0x7 | 0x00 | R[rd] ← R[rs1] & R[rs2] |
lb rd, offset(rs1) | I | 0x03 | 0x0 | | R[rd] ← SignExt(Mem(R[rs1] + offset, byte)) |
lh rd, offset(rs1) | I | 0x03 | 0x1 | | R[rd] ← SignExt(Mem(R[rs1] + offset, half)) |
lw rd, offset(rs1) | I | 0x03 | 0x2 | | R[rd] ← Mem(R[rs1] + offset, word) |
lbu rd, offset(rs1) | I | 0x03 | 0x4 | | R[rd] ← U(Mem(R[rs1] + offset, byte)) |
lhu rd, offset(rs1) | I | 0x03 | 0x5 | | R[rd] ← U(Mem(R[rs1] + offset, half)) |
addi rd, rs1, imm | I | 0x13 | 0x0 | | R[rd] ← R[rs1] + imm |
slli rd, rs1, imm | I | 0x13 | 0x1 | 0x00 | R[rd] ← R[rs1] << imm |
slti rd, rs1, imm | I | 0x13 | 0x2 | | R[rd] ← (R[rs1] < imm) ? 1 : 0 |
sltiu rd, rs1, imm | I | 0x13 | 0x3 | | R[rd] ← (U(R[rs1]) < U(imm)) ? 1 : 0 |
xori rd, rs1, imm | I | 0x13 | 0x4 | | R[rd] ← R[rs1] ^ imm |
srli rd, rs1, imm | I | 0x13 | 0x5 | 0x00 | R[rd] ← R[rs1] >> imm (logical) |
srai rd, rs1, imm | I | 0x13 | 0x5 | 0x20 | R[rd] ← R[rs1] >> imm (arithmetic) |
ori rd, rs1, imm | I | 0x13 | 0x6 | | R[rd] ← R[rs1] | imm |
andi rd, rs1, imm | I | 0x13 | 0x7 | | R[rd] ← R[rs1] & imm |
jalr rd, rs1, imm | I | 0x67 | 0x0 | | R[rd] ← PC + 4; PC ← R[rs1] + imm |
ecall | I | 0x73 | 0x0 | 0x000 | (Transfers control to operating system) a0 = 1: print the value of a1 as an integer. a0 = 4: print the string at address a1. a0 = 10: exit (end of code indicator). a0 = 11: print the value of a1 as a character. |
sb rs2, offset(rs1) | S | 0x23 | 0x0 | | Mem(R[rs1] + offset) ← R[rs2][7:0] |
sh rs2, offset(rs1) | S | 0x23 | 0x1 | | Mem(R[rs1] + offset) ← R[rs2][15:0] |
sw rs2, offset(rs1) | S | 0x23 | 0x2 | | Mem(R[rs1] + offset) ← R[rs2] |
beq rs1, rs2, offset | SB | 0x63 | 0x0 | | if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b'0} |
bne rs1, rs2, offset | SB | 0x63 | 0x1 | | if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b'0} |
blt rs1, rs2, offset | SB | 0x63 | 0x4 | | if(R[rs1] < R[rs2]) PC ← PC + {offset, 1b'0} |
bge rs1, rs2, offset | SB | 0x63 | 0x5 | | if(R[rs1] >= R[rs2]) PC ← PC + {offset, 1b'0} |
bltu rs1, rs2, offset | SB | 0x63 | 0x6 | | if(U(R[rs1]) < U(R[rs2])) PC ← PC + {offset, 1b'0} |
bgeu rs1, rs2, offset | SB | 0x63 | 0x7 | | if(U(R[rs1]) >= U(R[rs2])) PC ← PC + {offset, 1b'0} |
auipc rd, offset | U | 0x17 | | | R[rd] ← PC + {offset, 12b'0} |
lui rd, offset | U | 0x37 | | | R[rd] ← {offset, 12b'0} |
jal rd, imm | UJ | 0x6f | | | R[rd] ← PC + 4; PC ← PC + {imm, 1b'0} |
The pseudo-instructions are:
Pseudo-instruction | Format | Uses |
Branch on Equal to Zero | beqz rs1, label | beq |
Branch on not Equal to Zero | bnez rs1, label | bne |
Jump | j label | jal |
Jump Register | jr rs1 | jalr |
Load Address | la rd, label | auipc, addi |
Load Immediate | li rd, immediate | lui, addi |
Load Word at address of Label | lw rd, label | auipc, lw |
Move | mv rd, rs1 | addi |
For further reference, here are the bit lengths of the instruction components.
R-TYPE | funct7 | rs2 | rs1 | funct3 | rd | opcode |
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
I-TYPE | imm[11:0] | rs1 | funct3 | rd | opcode |
Bits | 12 | 5 | 3 | 5 | 7 |
S-TYPE | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
SB-TYPE | imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode |
Bits | 1 | 6 | 5 | 5 | 3 | 4 | 1 | 7 |
U-TYPE | imm[31:12] | rd | opcode |
Bits | 20 | 5 | 7 |
UJ-TYPE | imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode |
Bits | 1 | 10 | 1 | 8 | 5 | 7 |
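As a rough illustration of the layouts above, the fixed-position fields can be pulled out of a 32-bit instruction word with shifts and masks. This is only a sketch: the Fields struct and the functions bits() and decode_fields() are hypothetical names that are not part of the framework, and the framework's own instruction type may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields that sit at fixed positions. */
typedef struct {
    uint32_t opcode; /* bits  6:0  */
    uint32_t rd;     /* bits 11:7  */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15 */
    uint32_t rs2;    /* bits 24:20 */
    uint32_t funct7; /* bits 31:25 */
} Fields;

/* Return `width` bits of `word`, starting at bit position `lo`. */
uint32_t bits(uint32_t word, unsigned lo, unsigned width)
{
    /* The mask below is only valid for fields narrower than 32 bits. */
    assert(width >= 1 && width <= 31 && lo + width <= 32);
    return (word >> lo) & ((1u << width) - 1u);
}

/* Split an instruction word according to the R-type layout. */
Fields decode_fields(uint32_t word)
{
    Fields f;
    f.opcode = bits(word, 0, 7);
    f.rd     = bits(word, 7, 5);
    f.funct3 = bits(word, 12, 3);
    f.rs1    = bits(word, 15, 5);
    f.rs2    = bits(word, 20, 5);
    f.funct7 = bits(word, 25, 7);
    return f;
}
```

For example, the word 0x007302B3 (add t0, t1, t2) decodes to opcode 0x33, rd 5, funct3 0x0, rs1 6, rs2 7 and funct7 0x00. Not every format uses every field, so which fields are meaningful depends on the opcode.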
Getting Started
Step 0: Obtaining the Files
If you want to use your files from Project 1.1, which may help you generate some test cases, we recommend copying your previous source files into the Project 1.2 working directory. Note that we modified the Makefile for Project 1.2, so be aware of that when importing your Project 1.1 code.
- Clone your p1 repository from gitlab. You may want to change http to https.
- In the repository add a remote repo that contains the framework files:
git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110_23s/p1.2_framework.git
(or change to http)
- Go and fetch the files:
git fetch framework
- Now merge those files with your main branch:
git merge framework/main
- The rest of the git commands work as usual.
Step 1: Building Blocks
Finish the implementation of bitSigner(), get_imm_operand(), get_branch_offset(), get_jump_offset(), get_store_offset() in execute_utils.c.
These functions extract the immediate numbers and offsets contained in the instructions to facilitate subsequent execution. If you have completed project 1.1, you should be familiar with this procedure, as you have done the reverse of it.
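The extraction described above might look roughly like the following sketch. It works on a raw 32-bit instruction word; the function names and signatures here (sign_extend(), imm_i(), imm_s(), offset_sb(), offset_uj()) are hypothetical and do not match the framework's declarations, which you should keep as given. Note that the branch and jump helpers return the full byte offset, i.e. with the implicit trailing zero bit already appended.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `nbits` bits of `value` to 32 bits.
   The final cast relies on two's-complement conversion, as on gcc/clang. */
int32_t sign_extend(uint32_t value, unsigned nbits)
{
    uint32_t sign;
    assert(nbits >= 1 && nbits <= 31);
    sign = 1u << (nbits - 1);        /* position of the sign bit */
    value &= (sign << 1) - 1u;       /* drop anything above the field */
    return (int32_t)((value ^ sign) - sign);
}

/* I-type: imm[11:0] lives in bits 31:20. */
int32_t imm_i(uint32_t word)
{
    return sign_extend(word >> 20, 12);
}

/* S-type: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7. */
int32_t imm_s(uint32_t word)
{
    uint32_t v = ((word >> 25) << 5) | ((word >> 7) & 0x1Fu);
    return sign_extend(v, 12);
}

/* SB-type: imm[12|10:5] in bits 31:25, imm[4:1|11] in bits 11:7. */
int32_t offset_sb(uint32_t word)
{
    uint32_t v = (((word >> 31) & 0x1u) << 12)
               | (((word >> 7) & 0x1u) << 11)
               | (((word >> 25) & 0x3Fu) << 5)
               | (((word >> 8) & 0xFu) << 1);
    return sign_extend(v, 13);
}

/* UJ-type: imm[20|10:1|11|19:12] in bits 31:12. */
int32_t offset_uj(uint32_t word)
{
    uint32_t v = (((word >> 31) & 0x1u) << 20)
               | (((word >> 12) & 0xFFu) << 12)
               | (((word >> 20) & 0x1u) << 11)
               | (((word >> 21) & 0x3FFu) << 1);
    return sign_extend(v, 21);
}
```

With these definitions, 0xFFF00093 (addi x1, x0, -1) yields an immediate of -1, 0xFE000CE3 (beq x0, x0, -8) yields a branch offset of -8, and 0x008000EF (jal ra, 8) yields a jump offset of 8.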
Step 2: Complete Memory Component
You should complete the memory component before the execution component. Finish the implementation of store() and load() in mem.c. You should check the alignment first if needed. check_alignment() in mem.c could help you.
Before operating on memory, a virtual memory address has to be converted to a real memory address. To keep things simple, we map a section of address space at the start of each segment directly onto a section of "physical memory". You should follow the mapping given below.
The mapping from the virtual DATA_BASE and TEXT_BASE regions onto physical memory must be sequential. In addition, DATA_BASE and TEXT_BASE have to be mapped to MEMORY_DATA and MEMORY_TEXT, respectively.
Raise a segmentation fault when accessing an unmapped memory location or when attempting to store to the text segment. For an incorrect load you should raise the corresponding error and return a value of 0. For an incorrect store you should raise the corresponding error and return. Do not interrupt or exit the program.
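The translation and access rules above can be sketched as follows. Everything concrete in this sketch is an assumption made for illustration: the values given to TEXT_BASE, DATA_BASE, MEMORY_TEXT and MEMORY_DATA, the segment size SEG_SIZE, the status codes, and the functions translate(), mem_store() and mem_load(). The framework defines its own constants, signatures and error-raising functions, which take precedence. The sketch stores multi-byte values in little-endian order, as RISC-V does.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED values, for illustration only; use the framework's definitions. */
#define TEXT_BASE   0x00000000u
#define DATA_BASE   0x10000000u
#define SEG_SIZE    0x1000u            /* hypothetical size of each segment */
#define MEMORY_TEXT 0u                 /* text first in physical memory     */
#define MEMORY_DATA SEG_SIZE           /* data placed directly after text   */
#define MEMORY_SIZE (2u * SEG_SIZE)

/* Hypothetical status codes standing in for the framework's raise_*() calls. */
enum { MEM_OK = 0, MEM_MISALIGNED = 1, MEM_SEGFAULT = 2 };

/* Map a virtual address to an index into physical memory. */
int translate(uint32_t vaddr, unsigned size, int writing, uint32_t *paddr)
{
    uint32_t off = vaddr - TEXT_BASE;  /* unsigned wrap makes this a range test */
    if (off < SEG_SIZE && size <= SEG_SIZE - off) {
        if (writing) return MEM_SEGFAULT;   /* text segment is read-only */
        *paddr = MEMORY_TEXT + off;
        return MEM_OK;
    }
    off = vaddr - DATA_BASE;
    if (off < SEG_SIZE && size <= SEG_SIZE - off) {
        *paddr = MEMORY_DATA + off;
        return MEM_OK;
    }
    return MEM_SEGFAULT;                    /* unmapped address */
}

/* Store the low `size` bytes of `value`; memory is untouched on error. */
int mem_store(uint8_t *mem, uint32_t vaddr, unsigned size, uint32_t value)
{
    uint32_t paddr;
    unsigned i;
    int status;
    assert(size == 1 || size == 2 || size == 4);
    if (vaddr % size != 0) return MEM_MISALIGNED;
    status = translate(vaddr, size, 1, &paddr);
    if (status != MEM_OK) return status;
    for (i = 0; i < size; i++)
        mem[paddr + i] = (uint8_t)(value >> (8 * i));
    return MEM_OK;
}

/* Load `size` bytes, zero-extended; returns 0 and sets *status on error. */
uint32_t mem_load(const uint8_t *mem, uint32_t vaddr, unsigned size, int *status)
{
    uint32_t paddr, value = 0;
    unsigned i;
    assert(size == 1 || size == 2 || size == 4);
    if (vaddr % size != 0) { *status = MEM_MISALIGNED; return 0; }
    *status = translate(vaddr, size, 0, &paddr);
    if (*status != MEM_OK) return 0;
    for (i = 0; i < size; i++)
        value |= (uint32_t)mem[paddr + i] << (8 * i);
    return value;
}
```

In this sketch the load always zero-extends, so sign extension for lb and lh would be left to the caller.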
Step 3: Time to Execute
execute() in execute.c executes the instruction provided as a parameter. This should modify the appropriate registers, make any necessary calls to memory, and update the program counter to refer to the next instruction to execute. You need to complete all the above instructions.
You can use the framework we provide. In that case you will need to complete execute_rtype(), execute_itype_except_load(), execute_ecall(), execute_branch(), execute_load(), execute_store(), execute_jal(), execute_jalr() and execute_utype(). Alternatively, you can implement your own functions, but please do not change the declaration or behaviour of execute().
Note that you should use write_to_log() when implementing the printing functions of ecall.
You will find what you have completed in the first two steps very useful in this step.
For unrecognised instructions you should use raise_machine_code_error(). For undefined ecall codes you should use raise_undefined_ecall_error(). During memory access operations, alignment is checked according to the corresponding instruction. When an error occurs, raise it and then continue with the next instruction; do not interrupt or exit the program.
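To give an idea of what executing the R-type rows of the instruction table involves, here is a minimal sketch that computes the result from funct3, funct7 and the two source register values. The names to_signed(), sra32() and alu_rtype() are hypothetical and not part of the framework. The sketch avoids right-shifting negative signed integers, which C leaves implementation-defined. Its handling of division by zero (quotient of all ones, remainder equal to the dividend) follows the RISC-V specification; whether the project's tests expect exactly that is an assumption you should verify.

```c
#include <assert.h>
#include <stdint.h>

#define KEY(f7, f3) (((f7) << 3) | (f3))   /* combine funct7 and funct3 */

/* Widen a 32-bit register value to a signed 64-bit integer. */
int64_t to_signed(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - ((int64_t)1 << 32) : (int64_t)x;
}

/* Arithmetic right shift: vacated high bits are filled with the sign bit. */
uint32_t sra32(uint32_t x, unsigned sh)
{
    sh &= 31u;
    if ((x & 0x80000000u) && sh > 0)
        return (x >> sh) | (uint32_t)~(0xFFFFFFFFu >> sh);
    return x >> sh;
}

/* Returns 1 and writes the result to *out, or returns 0 if the
   funct3/funct7 pair is not in the table (an unrecognised instruction). */
int alu_rtype(uint32_t funct3, uint32_t funct7, uint32_t a, uint32_t b,
              uint32_t *out)
{
    int64_t sa = to_signed(a), sb = to_signed(b);
    assert(out != 0);
    switch (KEY(funct7, funct3)) {
    case KEY(0x00, 0x0): *out = a + b; break;                           /* add  */
    case KEY(0x01, 0x0): *out = a * b; break;                           /* mul  */
    case KEY(0x20, 0x0): *out = a - b; break;                           /* sub  */
    case KEY(0x00, 0x1): *out = a << (b & 31u); break;                  /* sll  */
    case KEY(0x01, 0x1): *out = (uint32_t)((uint64_t)(sa * sb) >> 32);
                         break;                                         /* mulh */
    case KEY(0x00, 0x2): *out = (sa < sb) ? 1u : 0u; break;             /* slt  */
    case KEY(0x00, 0x3): *out = (a < b) ? 1u : 0u; break;               /* sltu */
    case KEY(0x00, 0x4): *out = a ^ b; break;                           /* xor  */
    case KEY(0x01, 0x4): *out = (b == 0) ? 0xFFFFFFFFu
                                         : (uint32_t)(sa / sb); break;  /* div  */
    case KEY(0x00, 0x5): *out = a >> (b & 31u); break;                  /* srl  */
    case KEY(0x20, 0x5): *out = sra32(a, b); break;                     /* sra  */
    case KEY(0x00, 0x6): *out = a | b; break;                           /* or   */
    case KEY(0x01, 0x6): *out = (b == 0) ? a
                                         : (uint32_t)(sa % sb); break;  /* rem  */
    case KEY(0x00, 0x7): *out = a & b; break;                           /* and  */
    default: return 0;
    }
    return 1;
}
```

A caller would write the result to R[rd] and advance the PC by 4; when the function returns 0, that would be the place to report the unrecognised instruction and move on to the next one.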
Step 4: Don't Forget Logs
In the logs component logs.c, you need to implement a data structure that saves the current state and can return to the previous state. The main purpose of this component is to implement the prev function of Venus. Try the prev function in Venus and think about what you would do to implement such a function. You need to ensure that the following interfaces are completed.
Logs: You can change its contents or define more structures, but you can't change its name. It represents the log component you created.
- create_logs(): Create new logs component, return a pointer to its Logs.
- free_logs(): Free a logs component.
- record(): Record the current state in a logs component.
- undo(): Go back to the previous state in a logs component.
You should use dynamic memory allocation to resize the logs component when needed. Consider what information you need to record; the smaller each record is, the better.
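One possible shape for such a structure is a growable array used as a stack, where each entry stores only what a single instruction overwrote. The sketch below is an assumption-laden illustration, not the required design: Entry, History and the history_*() functions are hypothetical names (loosely playing the roles of Logs, create_logs(), record(), undo() and free_logs()), and what a real record must contain depends on the framework's CPU and memory types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* What a single instruction overwrote (hypothetical classification). */
enum { CHANGED_NOTHING, CHANGED_REG, CHANGED_MEM };

/* One compact undo record. */
typedef struct {
    uint32_t pc;        /* program counter before the instruction      */
    uint32_t target;    /* register number or memory address           */
    uint32_t old_value; /* contents of the target before the overwrite */
    uint8_t  kind;      /* one of the CHANGED_* values                 */
    uint8_t  size;      /* bytes overwritten in memory (1, 2 or 4)     */
} Entry;

/* Growable stack of records. */
typedef struct {
    Entry *items;
    size_t count;
    size_t capacity;
} History;

/* Allocate an empty history; returns NULL if allocation fails. */
History *history_create(void)
{
    History *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->items = NULL;
    h->count = 0;
    h->capacity = 0;
    return h;
}

/* Release the history and all of its records. */
void history_free(History *h)
{
    if (h == NULL) return;
    free(h->items);
    free(h);
}

/* Append a record, doubling the array when full; returns 0 on failure. */
int history_push(History *h, Entry e)
{
    assert(h != NULL);
    if (h->count == h->capacity) {
        size_t new_cap = h->capacity ? h->capacity * 2 : 16;
        Entry *grown = realloc(h->items, new_cap * sizeof *grown);
        if (grown == NULL) return 0;
        h->items = grown;
        h->capacity = new_cap;
    }
    h->items[h->count++] = e;
    return 1;
}

/* Remove the most recent record into *out; returns 0 if there is none. */
int history_pop(History *h, Entry *out)
{
    assert(h != NULL && out != NULL);
    if (h->count == 0) return 0;
    *out = h->items[--h->count];
    return 1;
}
```

Undoing a step would then mean popping one entry, restoring the PC, and writing old_value back to the register or memory location it names. Doubling the capacity keeps the amortised cost of recording constant, and freeing both the array and the struct keeps the Valgrind leak check clean.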
Testing
You are responsible for testing your code. While we have provided a few test cases, they are by no means comprehensive. Fortunately, you have a variety of testing tools at your service.
Valgrind
You should use Valgrind to check whether your code has any memory leaks. We have included a file, run-valgrind, which will run Valgrind on any executable of your choosing. If you get a permission denied error, try adding the execute permission to the file:
chmod u+x run-valgrind
Then you can run by typing:
./run-valgrind <whatever program you want to run>
Venus
Since you're writing an emulator, why not refer to an existing one? Venus is a powerful reference for you to use, and you are encouraged to write your own RISC-V files and assemble them using Venus.
The initial values of the registers and of the memory may differ somewhat from those in Venus, and they also depend on the version of Venus you are using. You can ignore these differences, as we have already done the initialisation of the memory and registers for you in the project.
Diff
diff is a utility for comparing the contents of files. Running the following command will print out the differences between file1 and file2:
diff <file1> <file2>
To see how to interpret diff results, consult the diff documentation (for example, man diff).
Running the Emulator
First, make sure your emulator executable is up to date by running make.
By default, the program prints a help message if you do not specify any arguments.
To run the emulator, use the following command.
./riscv emulator <text_file> <data_file>
If you have finished Part 1, you can also use an assembly file as the input file like this:
./riscv <input file>
While the emulator is running, you can use the following commands to debug:
run, step, prev, dump, trace, reset, exit, print <args>
where <args> in print <args> can be "pc", "reg", or a memory address.
Notices on Grading
- The Autolab will enforce a proper amount of comments again!
- Make sure you add proper comments - about two every eight lines of code that you ADD. We will check this by hand.
- The Autolab will again use -Wpedantic -Wall -Wextra -Werror -std=c89. You may edit the Makefile if you need to - but the Autolab will replace the CFLAGS line. Your compilation step will have to use the CFLAGS!
How many lines of code do I need to write?
Here is a summary of the solution code. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion. However, there are many possible solutions and many of them may differ.
part2/execute.c | 492 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
part2/execute_utils.c | 30 +++++++++++++
part2/logs.c | 161 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
part2/logs.h | 17 ++++++--
part2/mem.c | 73 ++++++++++++++++++++++++++++++++
5 files changed, 769 insertions(+), 4 deletions(-)
Submission
- Create a text file autolab.txt. The first line has to be the name of your p1.2 project in gitlab, i.e. p1.2_email1_email2.
- The following lines have to contain a long, random secret. Commit and push to gitlab. We will test the length and randomness of this secret by running tar -cjf size.tar.bz2 autolab.txt.
- When you want to run the autograder in autolab, you have to upload your autolab.txt. Autolab will clone, from gitlab, the default branch (in most cases the "main" branch) of the repo specified in the autolab.txt you uploaded, and then continue grading only if all of these conditions are met:
- The autolab.txt you uploaded and the one in the cloned repo are identical.
- The size of the generated size.tar.bz2 is at least 1000B.
- Only the files from the framework are present in the cloned repo.
- Autolab will replace all files except *.c, *.h with the framework versions before compiling your code.
Autolab Results
Tests 0-xx are the common tests; the cases are listed below.
Test Case Number | Point of Interest |
0-00 | Basic arithmetic operations |
0-01 | Instructions involving unsigned numbers |
0-02 & 0-03 | Logs component and prev operations |
0-04 | ecall |
0-05 | branch and jump |
0-06 | Error handling |
0-07 & 0-08 | Arithmetic involving negative numbers |
0-09 ~ 0-14 | Complex |
Tests 0-xx-mem are the memory leak detection tests. Every common test is also run with memory leak detection.