Project 1: Make a Toy Venus

Project 1.2: A RISC-V Emulator

Computer Architecture I ShanghaiTech University


IMPORTANT INFO - PLEASE READ

The projects are part of your design project, worth 2 credit points. As such, they run in parallel to the actual course. Be aware that the due dates for projects and homework might be very close to each other! Start early and do not procrastinate.

Goal of Project 1
Introduction to Project 1.2
Background of The Instruction Set
Getting Started ( Step 0, Step 1, Step 2, Step 3, Step 4 )
Testing
Running the Emulator
Notices on Grading
How many lines of code do I need to write?
Submission
Autolab Results

Goal of Project 1

In Project 1, you will make a simple toy Venus, consisting of both an assembler and an emulator. If you are not yet familiar with Venus, try it out now. At the end of this project, you will understand the C programming language and the underlying execution of RISC-V assembly.

Introduction to Project 1.2

In Project 1.1, we completed a RISC-V assembler. Now we need to complete a simple emulator in the C programming language to execute instructions. The emulator can be divided into the following components.

Launching and Running: This part, mainly in emulator.c, is responsible for creating an emulator and performing operations such as run(), step(), prev(), etc. We've done this part for you so you can ignore it!
Execution: This part, mainly in execute.c and execute_utils.c, is responsible for executing an instruction and finishing the corresponding operations on cpu and memory. You need to work on this component.
Memory: This part, mainly in mem.c, is responsible for simple conversion of memory addresses and manipulation of memory such as store() and load(). You need to work on this component.
Logs: This part, mainly in logs.c, is responsible for recording and saving logs so that the emulator can go back in time. You need to work on this component.

Background of The Instruction Set

Registers

Please consult the RISC-V Green Sheet (PDF) for register numbers, instruction opcodes, and bitwise formats. Our assembler will support all 32 registers: x0, ra, sp, gp, tp, t0-t6, s0-s11, a0-a7. The name x0 can be used in lieu of zero. Other register numbers (e.g. x1, x2, etc.) shall also be supported. Note that floating point registers are not included in Project 1.

Instructions

We will have 42 instructions and 8 pseudo-instructions. The instructions are:

Instruction	Type	Opcode	Funct3	Funct7/IMM	Operation
add rd, rs1, rs2	R	0x33	0x0	0x00	R[rd] ← R[rs1] + R[rs2]
mul rd, rs1, rs2			0x0	0x01	R[rd] ← (R[rs1] * R[rs2])[31:0]
sub rd, rs1, rs2			0x0	0x20	R[rd] ← R[rs1] - R[rs2]
sll rd, rs1, rs2			0x1	0x00	R[rd] ← R[rs1] << R[rs2]
mulh rd, rs1, rs2			0x1	0x01	R[rd] ← (R[rs1] * R[rs2])[63:32]
slt rd, rs1, rs2			0x2	0x00	R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0
sltu rd, rs1, rs2			0x3	0x00	R[rd] ← (U(R[rs1]) < U(R[rs2])) ? 1 : 0
xor rd, rs1, rs2			0x4	0x00	R[rd] ← R[rs1] ^ R[rs2]
div rd, rs1, rs2			0x4	0x01	R[rd] ← R[rs1] / R[rs2]
srl rd, rs1, rs2			0x5	0x00	R[rd] ← R[rs1] >> R[rs2] (logical, zero-fill)
sra rd, rs1, rs2			0x5	0x20	R[rd] ← R[rs1] >> R[rs2] (arithmetic, sign-fill)
or rd, rs1, rs2			0x6	0x00	R[rd] ← R[rs1] | R[rs2]
rem rd, rs1, rs2			0x6	0x01	R[rd] ← R[rs1] % R[rs2]
and rd, rs1, rs2			0x7	0x00	R[rd] ← R[rs1] & R[rs2]
lb rd, offset(rs1)	I	0x03	0x0		R[rd] ← SignExt(Mem(R[rs1] + offset, byte))
lh rd, offset(rs1)			0x1		R[rd] ← SignExt(Mem(R[rs1] + offset, half))
lw rd, offset(rs1)			0x2		R[rd] ← Mem(R[rs1] + offset, word)
lbu rd, offset(rs1)			0x4		R[rd] ← U(Mem(R[rs1] + offset, byte))
lhu rd, offset(rs1)			0x5		R[rd] ← U(Mem(R[rs1] + offset, half))
addi rd, rs1, imm		0x13	0x0		R[rd] ← R[rs1] + imm
slli rd, rs1, imm			0x1	0x00	R[rd] ← R[rs1] << imm
slti rd, rs1, imm			0x2		R[rd] ← (R[rs1] < imm) ? 1 : 0
sltiu rd, rs1, imm			0x3		R[rd] ← (U(R[rs1]) < U(imm)) ? 1 : 0
xori rd, rs1, imm			0x4		R[rd] ← R[rs1] ^ imm
srli rd, rs1, imm			0x5	0x00	R[rd] ← R[rs1] >> imm (logical, zero-fill)
srai rd, rs1, imm			0x5	0x20	R[rd] ← R[rs1] >> imm (arithmetic, sign-fill)
ori rd, rs1, imm			0x6		R[rd] ← R[rs1] | imm
andi rd, rs1, imm			0x7		R[rd] ← R[rs1] & imm
jalr rd, rs1, imm		0x67	0x0		R[rd] ← PC + 4; PC ← R[rs1] + imm
ecall		0x73	0x0	0x000	(Transfers control to operating system) a0 = 1 is print value of a1 as an integer. a0 = 4 is print the string at address a1. a0 = 10 is exit or end of code indicator. a0 = 11 is print value of a1 as a character.
sb rs2, offset(rs1)	S	0x23	0x0		Mem(R[rs1] + offset) ← R[rs2][7:0]
sh rs2, offset(rs1)			0x1		Mem(R[rs1] + offset) ← R[rs2][15:0]
sw rs2, offset(rs1)			0x2		Mem(R[rs1] + offset) ← R[rs2]
beq rs1, rs2, offset	SB	0x63	0x0		if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b'0}
bne rs1, rs2, offset			0x1		if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b'0}
blt rs1, rs2, offset			0x4		if(R[rs1] < R[rs2]) PC ← PC + {offset, 1b'0}
bge rs1, rs2, offset			0x5		if(R[rs1] >= R[rs2]) PC ← PC + {offset, 1b'0}
bltu rs1, rs2, offset			0x6		if(U(R[rs1]) < U(R[rs2])) PC ← PC + {offset, 1b'0}
bgeu rs1, rs2, offset			0x7		if(U(R[rs1]) >= U(R[rs2])) PC ← PC + {offset, 1b'0}
auipc rd, offset	U	0x17			R[rd] ← PC + {offset, 12b'0}
lui rd, offset	U	0x37			R[rd] ← {offset, 12b'0}
jal rd, imm	UJ	0x6f			R[rd] ← PC + 4; PC ← PC + {imm, 1b'0}
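
A few rows in the table behave differently in C than the notation suggests: right-shifting a negative signed value is implementation-defined, and mulh needs a 64-bit intermediate. The following is a minimal sketch of one way to express these operations portably. All function names (sketch_mulh and so on) are hypothetical and not part of the framework, and the use of stdint.h types is an assumption; adapt it to whatever types the framework uses.

```c
#include <stdint.h>

/* mulh: upper 32 bits of the signed 64-bit product. */
int32_t sketch_mulh(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)product >> 32);
}

/* srl: logical shift, vacated high bits become zero. */
uint32_t sketch_srl(uint32_t value, uint32_t shamt)
{
    return value >> (shamt & 31u); /* RV32 uses only the low 5 bits */
}

/* sra: arithmetic shift, vacated high bits copy the sign bit. */
int32_t sketch_sra(int32_t value, uint32_t shamt)
{
    uint32_t shifted;
    shamt &= 31u;
    shifted = (uint32_t)value >> shamt;
    if (value < 0 && shamt > 0) {
        shifted |= ~(0xFFFFFFFFu >> shamt); /* fill the top with ones */
    }
    return (int32_t)shifted;
}

/* slt compares as signed values, sltu as unsigned values. */
int sketch_slt(int32_t a, int32_t b)
{
    return (a < b) ? 1 : 0;
}

int sketch_sltu(uint32_t a, uint32_t b)
{
    return (a < b) ? 1 : 0;
}
```

For instance, the bit pattern 0xFFFFFFFF is less than 1 under slt (it is -1) but greater than 1 under sltu.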

The pseudo-instructions are:

Pseudo-instruction	Format	Uses
Branch on Equal to Zero	beqz rs1, label	beq
Branch on not Equal to Zero	bnez rs1, label	bne
Jump	j label	jal
Jump Register	jr rs1	jalr
Load Address	la rd, label	auipc, addi
Load Immediate	li rd, immediate	lui, addi
Load Word at address of Label	lw rd, label	auipc, lw
Move	mv rd, rs1	addi
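
Because addi sign-extends its 12-bit immediate, the lui/addi pair behind li cannot simply use the upper 20 and lower 12 bits of the constant. Below is a minimal sketch of the usual split; the function names are hypothetical and serve only to illustrate the arithmetic.

```c
#include <stdint.h>

/* Lower part: the 12-bit immediate that addi will sign-extend. */
int32_t sketch_li_lower(uint32_t imm)
{
    uint32_t low = imm & 0xFFFu;
    return (low & 0x800u) ? (int32_t)low - 0x1000 : (int32_t)low;
}

/* Upper part: the 20-bit value for lui. Adding 0x800 first
 * compensates for a negative lower part. */
uint32_t sketch_li_upper(uint32_t imm)
{
    return ((imm + 0x800u) >> 12) & 0xFFFFFu;
}

/* What lui followed by addi leaves in the register. */
uint32_t sketch_li_rebuild(uint32_t imm)
{
    return (sketch_li_upper(imm) << 12) + (uint32_t)sketch_li_lower(imm);
}
```

With imm = 0x12345FFF the lower part is -1, so the upper part becomes 0x12346 rather than 0x12345, and the two instructions together still produce 0x12345FFF.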

For further reference, here are the bit lengths of the instruction components.

R-TYPE	funct7	rs2	rs1	funct3	rd	opcode
Bits	7	5	5	3	5	7

I-TYPE	imm[11:0]	rs1	funct3	rd	opcode
Bits	12	5	3	5	7

S-TYPE	imm[11:5]	rs2	rs1	funct3	imm[4:0]	opcode
Bits	7	5	5	3	5	7

SB-TYPE	imm[12]	imm[10:5]	rs2	rs1	funct3	imm[4:1]	imm[11]	opcode
Bits	1	6	5	5	3	4	1	7

U-TYPE	imm[31:12]	rd	opcode
Bits	20	5	7

UJ-TYPE	imm[20]	imm[10:1]	imm[11]	imm[19:12]	rd	opcode
Bits	1	10	1	8	5	7
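
The field widths above translate directly into shifts and masks. The sketch below pulls the fields out of a 32-bit R-type word; the helper names are hypothetical, and the framework may already provide its own instruction type with named fields.

```c
#include <stdint.h>

/* Return bits hi..lo (inclusive) of word, shifted down to bit 0. */
uint32_t sketch_bits(uint32_t word, unsigned hi, unsigned lo)
{
    uint32_t width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

uint32_t sketch_opcode(uint32_t w) { return sketch_bits(w, 6, 0); }
uint32_t sketch_rd(uint32_t w)     { return sketch_bits(w, 11, 7); }
uint32_t sketch_funct3(uint32_t w) { return sketch_bits(w, 14, 12); }
uint32_t sketch_rs1(uint32_t w)    { return sketch_bits(w, 19, 15); }
uint32_t sketch_rs2(uint32_t w)    { return sketch_bits(w, 24, 20); }
uint32_t sketch_funct7(uint32_t w) { return sketch_bits(w, 31, 25); }
```

For example, add x3, x1, x2 assembles to 0x002081B3, which decodes to opcode 0x33, rd 3, funct3 0x0, rs1 1, rs2 2 and funct7 0x00.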

Getting Started

Step 0: Obtaining the Files

    Caveat

    If you want to use the files from Project 1.1, which may help you generate some test cases, we recommend copying your previous source files into the Project 1.2 working directory.
    We modified the Makefile for Project 1.2. Please be aware of that when importing your Project 1.1 code.
  

Clone your p1 repository from gitlab. You may want to change http to https.
In the repository add a remote repo that contains the framework files:
git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110_23s/p1.2_framework.git (or change to http)
Go and fetch the files:
git fetch framework
Now merge those files with your main branch:
git merge framework/main
The rest of the git commands work as usual.

Step 1: Building Blocks

Finish the implementation of bitSigner(), get_imm_operand(), get_branch_offset(), get_jump_offset(), get_store_offset() in execute_utils.c.

These functions extract the immediate numbers and offsets contained in the instructions to facilitate subsequent execution. If you have completed project 1.1, you should be familiar with this procedure, as you have done the reverse of it.
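
As an illustration of the idea, here is a minimal sketch that reassembles and sign-extends the immediates straight from a raw 32-bit instruction word, following the bit layouts listed earlier. The function names and signatures are hypothetical; the framework's bitSigner(), get_imm_operand() and friends may take different arguments. The branch and jump versions here return the byte offset with the implicit trailing zero already appended, which is an assumption about the convention; check what the framework expects.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of field to 32 bits. */
int32_t sketch_sign_extend(uint32_t field, unsigned bits)
{
    uint32_t sign = (uint32_t)1 << (bits - 1u);
    field &= (sign << 1) - 1u;          /* keep only the low bits */
    return (int32_t)((field ^ sign) - sign);
}

/* I-type: imm[11:0] sits in bits 31..20. */
int32_t sketch_imm_i(uint32_t w)
{
    return sketch_sign_extend(w >> 20, 12);
}

/* S-type: imm[11:5] in bits 31..25, imm[4:0] in bits 11..7. */
int32_t sketch_imm_s(uint32_t w)
{
    uint32_t imm = ((w >> 25) << 5) | ((w >> 7) & 0x1Fu);
    return sketch_sign_extend(imm, 12);
}

/* SB-type: imm[12|10:5] in bits 31..25, imm[4:1|11] in bits 11..7. */
int32_t sketch_imm_sb(uint32_t w)
{
    uint32_t imm = (((w >> 31) & 0x1u) << 12)
                 | (((w >> 7) & 0x1u) << 11)
                 | (((w >> 25) & 0x3Fu) << 5)
                 | (((w >> 8) & 0xFu) << 1);
    return sketch_sign_extend(imm, 13);
}

/* UJ-type: imm[20|10:1|11|19:12] in bits 31..12. */
int32_t sketch_imm_uj(uint32_t w)
{
    uint32_t imm = (((w >> 31) & 0x1u) << 20)
                 | (((w >> 12) & 0xFFu) << 12)
                 | (((w >> 20) & 0x1u) << 11)
                 | (((w >> 21) & 0x3FFu) << 1);
    return sketch_sign_extend(imm, 21);
}
```

For example, beq x0, x0, -4 assembles to 0xFE000EE3 and sketch_imm_sb() recovers -4 from it; jal x0, 8 assembles to 0x0080006F and sketch_imm_uj() recovers 8.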

Step 2: Complete Memory Component

You should complete the memory component before the execution component. Finish the implementation of store() and load() in mem.c. You should check the alignment first if needed. check_alignment() in mem.c could help you.

Before operating on memory, virtual memory addresses need to be converted to real memory addresses. To keep things simple, we map a region starting at the beginning of each segment directly onto a contiguous region of "physical memory". You should follow the mapping given below.

    Caveat

    The mapping from the virtual addresses starting at DATA_BASE and TEXT_BASE onto physical memory must be fixed and sequential. In addition, DATA_BASE and TEXT_BASE have to be mapped to MEMORY_DATA and MEMORY_TEXT, respectively.

    Caveat

    Raise a segmentation fault when accessing an unmapped memory location or when attempting to store into the text segment. For an incorrect load you should raise the corresponding error and return a value of 0. For an incorrect store you should raise the corresponding error and return. Do not interrupt or exit the program.
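
The translation and error rules above can be sketched roughly as follows. Everything in this block is an assumption made for illustration: the SKETCH_ constants stand in for the framework's DATA_BASE, TEXT_BASE, MEMORY_DATA and MEMORY_TEXT (whose real values live in the framework headers), the segment size is invented, a global flag stands in for the framework's error-raising functions, and natural alignment is assumed to be required for every access.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the framework's constants. */
#define SKETCH_TEXT_BASE   0x00000000u
#define SKETCH_DATA_BASE   0x10000000u
#define SKETCH_SEG_SIZE    0x00001000u
#define SKETCH_MEMORY_TEXT 0x00000000u
#define SKETCH_MEMORY_DATA (SKETCH_MEMORY_TEXT + SKETCH_SEG_SIZE)

uint8_t sketch_mem[2u * SKETCH_SEG_SIZE]; /* flat "physical" memory */
int sketch_fault;                         /* 1 if the last access failed */

/* Map a virtual address to a physical offset; return 0 on a fault.
 * size is 1, 2 or 4. Unsigned wraparound rejects addresses below a base. */
int sketch_translate(uint32_t vaddr, uint32_t size, int is_store,
                     uint32_t *paddr)
{
    uint32_t off_text = vaddr - SKETCH_TEXT_BASE;
    uint32_t off_data = vaddr - SKETCH_DATA_BASE;
    if (off_data <= SKETCH_SEG_SIZE - size) {
        *paddr = SKETCH_MEMORY_DATA + off_data;
        return 1;
    }
    if (!is_store && off_text <= SKETCH_SEG_SIZE - size) {
        *paddr = SKETCH_MEMORY_TEXT + off_text; /* text is read-only */
        return 1;
    }
    return 0;
}

/* Store the low `size` bytes of value in little-endian order. */
void sketch_store(uint32_t vaddr, uint32_t size, uint32_t value)
{
    uint32_t paddr = 0, i;
    sketch_fault = 0;
    if (vaddr % size != 0 || !sketch_translate(vaddr, size, 1, &paddr)) {
        sketch_fault = 1; /* report the error and return, do not exit */
        return;
    }
    for (i = 0; i < size; i++) {
        sketch_mem[paddr + i] = (uint8_t)(value >> (8u * i));
    }
}

/* Load `size` bytes, zero-extended; returns 0 after a fault. */
uint32_t sketch_load(uint32_t vaddr, uint32_t size)
{
    uint32_t paddr = 0, value = 0, i;
    sketch_fault = 0;
    if (vaddr % size != 0 || !sketch_translate(vaddr, size, 0, &paddr)) {
        sketch_fault = 1;
        return 0;
    }
    for (i = 0; i < size; i++) {
        value |= (uint32_t)sketch_mem[paddr + i] << (8u * i);
    }
    return value;
}
```

Sign extension for lb and lh is left to the caller in this sketch; a store to the text segment or an access outside both segments only sets the flag and returns.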

Step 3: Time to Execute

execute() in execute.c executes the instruction provided as a parameter. This should modify the appropriate registers, make any necessary calls to memory, and update the program counter to refer to the next instruction to execute. You need to implement all of the instructions listed above.

You can use the framework we provide. In that case you will need to complete execute_rtype(), execute_itype_except_load(), execute_ecall(), execute_branch(), execute_load(), execute_store(), execute_jal(), execute_jalr() and execute_utype(). Alternatively, you can implement your own functions, but please do not change the declaration or behaviour of execute().

Note that you should use write_to_log() when implementing the printing functions of ecall.

You will find what you have completed in the first two steps very useful in this step.
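
To make the flow concrete, here is a minimal sketch of executing one instruction on a toy CPU. It handles only add, addi and beq. The SketchCpu structure, the function names and the error flag are all hypothetical; the framework has its own processor type, its own execute() signature, and raise_machine_code_error() for the case the flag stands in for. Forcing x0 back to zero after each instruction is a property of RISC-V itself.

```c
#include <stdint.h>

typedef struct {
    uint32_t reg[32];
    uint32_t pc;
    int error; /* stand-in for raise_machine_code_error() */
} SketchCpu;

void sketch_reset(SketchCpu *cpu)
{
    int i;
    for (i = 0; i < 32; i++) {
        cpu->reg[i] = 0;
    }
    cpu->pc = 0;
    cpu->error = 0;
}

int32_t sketch_sext(uint32_t field, unsigned bits)
{
    uint32_t sign = (uint32_t)1 << (bits - 1u);
    field &= (sign << 1) - 1u;
    return (int32_t)((field ^ sign) - sign);
}

void sketch_execute(SketchCpu *cpu, uint32_t w)
{
    uint32_t opcode = w & 0x7Fu;
    uint32_t rd = (w >> 7) & 0x1Fu;
    uint32_t funct3 = (w >> 12) & 0x7u;
    uint32_t rs1 = (w >> 15) & 0x1Fu;
    uint32_t rs2 = (w >> 20) & 0x1Fu;
    uint32_t funct7 = w >> 25;
    uint32_t next_pc = cpu->pc + 4u; /* default: fall through */
    uint32_t imm;

    cpu->error = 0;
    if (opcode == 0x33u && funct3 == 0x0u && funct7 == 0x00u) {
        cpu->reg[rd] = cpu->reg[rs1] + cpu->reg[rs2];            /* add */
    } else if (opcode == 0x13u && funct3 == 0x0u) {
        cpu->reg[rd] = cpu->reg[rs1]
                     + (uint32_t)sketch_sext(w >> 20, 12);       /* addi */
    } else if (opcode == 0x63u && funct3 == 0x0u) {              /* beq */
        imm = (((w >> 31) & 0x1u) << 12) | (((w >> 7) & 0x1u) << 11)
            | (((w >> 25) & 0x3Fu) << 5) | (((w >> 8) & 0xFu) << 1);
        if (cpu->reg[rs1] == cpu->reg[rs2]) {
            next_pc = cpu->pc + (uint32_t)sketch_sext(imm, 13);
        }
    } else {
        cpu->error = 1; /* unrecognised: report, then keep going */
    }
    cpu->reg[0] = 0;    /* x0 is hard-wired to zero */
    cpu->pc = next_pc;
}
```

An unrecognised word sets the flag but still advances the program counter, so the next instruction can run.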

    Caveat

    For unrecognised instructions you should use raise_machine_code_error(). For undefined ecall calls you should use raise_undefined_ecall_error().
    During memory access operations, the alignment is checked according to the corresponding instruction.
    Raise an error when one occurs and then continue with the next instruction without interrupting or exiting the program.
  

Step 4: Don't Forget Logs

In the logs component (logs.c), you need to implement a data structure that saves the current state and can restore the previous state. The main purpose of this component is to implement the prev function of Venus. Try the prev function in Venus and think about how you would implement it yourself. You need to ensure that the following interfaces are completed.

Logs: You can change its contents or define more structures, but you can't change its name. It represents the log component you created.

create_logs(): Create a new logs component and return a pointer to its Logs.
free_logs(): Free a logs component.
record(): Record the current state in a logs component.
undo(): Go back to the previous state stored in a logs component.

You should use dynamic memory allocation to resize the logs component when needed. Consider what information you need to record: the smaller each record is, the better.
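
One possible shape for such a structure is a growable array used as a stack, where each record stores only what an instruction overwrote. The sketch below is an assumption for illustration, not the required design: the SketchLogs and SketchRecord types, their fields and the function signatures are hypothetical, and the real Logs, record() and undo() must follow the declarations in logs.h. A real record would also need to cover memory writes.

```c
#include <stdint.h>
#include <stdlib.h>

/* One undo record: where we were and the one value that changed. */
typedef struct {
    uint32_t pc;        /* program counter before the instruction */
    uint32_t reg_index; /* register the instruction overwrote */
    uint32_t old_value; /* its value before the instruction */
} SketchRecord;

typedef struct {
    SketchRecord *items;
    size_t count;
    size_t capacity;
} SketchLogs;

SketchLogs *sketch_create_logs(void)
{
    SketchLogs *logs = malloc(sizeof *logs);
    if (logs == NULL) {
        return NULL;
    }
    logs->items = NULL;
    logs->count = 0;
    logs->capacity = 0;
    return logs;
}

void sketch_free_logs(SketchLogs *logs)
{
    if (logs != NULL) {
        free(logs->items);
        free(logs);
    }
}

/* Append a record, doubling the array when it is full. Returns 0 on
 * allocation failure. */
int sketch_record(SketchLogs *logs, SketchRecord rec)
{
    if (logs->count == logs->capacity) {
        size_t new_cap = logs->capacity ? logs->capacity * 2 : 4;
        SketchRecord *grown = realloc(logs->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            return 0;
        }
        logs->items = grown;
        logs->capacity = new_cap;
    }
    logs->items[logs->count++] = rec;
    return 1;
}

/* Pop the most recent record into *out. Returns 0 if there is none. */
int sketch_undo(SketchLogs *logs, SketchRecord *out)
{
    if (logs->count == 0) {
        return 0;
    }
    *out = logs->items[--logs->count];
    return 1;
}
```

Storing a small delta per step keeps each record at a few words, whereas saving all 32 registers on every step would make each record much larger.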

Testing

You are responsible for testing your code. While we have provided a few test cases, they are by no means comprehensive. Fortunately, you have a variety of testing tools at your service.

Valgrind

You should use Valgrind to check whether your code has any memory leaks. We have included a file, run-valgrind, which will run Valgrind on any executable of your choosing. If you get a permission denied error, try adding the execute permission to the file:

chmod u+x run-valgrind

Then you can run by typing:

./run-valgrind <whatever program you want to run>

Venus

Since you're writing an emulator, why not refer to an existing one? Venus is a powerful reference for you to use, and you are encouraged to write your own RISC-V files and assemble them using Venus.

        Caveat

        The initial values of the registers and of the memory may differ somewhat from Venus, and they also depend on the version of Venus you are using. You can ignore these differences, as we have done the initialisation of the memory and registers for you in the project.

Diff

diff is a utility for comparing the contents of files. Running the following command will print out the differences between file1 and file2:

diff <file1> <file2>

To see how to interpret diff results, click here.

Running the Emulator

First, make sure your emulator executable is up to date by running make.

By default, the program prints a help message if you do not specify any arguments.

To run the emulator, use the following command.

./riscv emulator <text_file> <data_file>

If you have a finished Part 1, you can also use an assembly file as the input file like this:

./riscv <input file>

While the emulator is running, you can use the following commands to debug:

run, step, prev, dump, trace, reset, exit, print <args>

where the <args> in print <args> can be "pc", "reg", or a memory address.

Notices on Grading

The Autolab will enforce a proper amount of comments again!
Make sure you add proper comments - about two every eight lines of code that you ADD. We will check this by hand.
The Autolab will again use -Wpedantic -Wall -Wextra -Werror -std=c89. You may edit the Makefile if you need to - but the Autolab will replace the CFLAGS line. Your compilation step will have to use the CFLAGS!

How many lines of code do I need to write?

Here is a summary of the solution code. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion. However, there are many possible solutions and many of them may differ.


      part2/execute.c       | 492 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      part2/execute_utils.c |  30 +++++++++++++
      part2/logs.c          | 161 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
      part2/logs.h          |  17 ++++++--
      part2/mem.c           |  73 ++++++++++++++++++++++++++++++++
      5 files changed, 769 insertions(+), 4 deletions(-)

Submission

Create a text file autolab.txt. The first line has to be the name of your p1.2 project in gitlab, that is, p1.2_email1_email2.
The following lines have to contain a long, random secret. Commit and push to gitlab. We will test the length and randomness of this secret by running tar -cjf size.tar.bz2 autolab.txt.
When you want to run the autograder in autolab, you have to upload your autolab.txt. Autolab will clone, from gitlab, the default branch (in most cases the "main" branch) of the repo specified in the autolab.txt you uploaded, and then continue grading only if all of these conditions are met:
1. The autolab.txt you uploaded and the one in the cloned repo are identical.
2. The size of the generated size.tar.bz2 is at least 1000B.
3. Only the files from the framework are present in the cloned repo.
Autolab will replace all files except *.c, *.h with the framework versions before compiling your code.

Autolab Results

Tests 0-xx are the common tests; the cases are listed below.

Test Case Number	Point of Interest
0-00	Basic arithmetic operations
0-01	Instructions involving unsigned numbers
0-02 & 0-03	Logs component and prev operations
0-04	ecall
0-05	branch and jump
0-06	Error handling
0-07 & 0-08	Arithmetic involving negative numbers
0-09 ~ 0-14	Complex

Tests 0-xx-mem are the memory leak detection tests. Every common test has a corresponding memory leak detection test.