Project 1.2: MD5 in RISC-V (Individual Project)
IMPORTANT INFO - PLEASE READ
The projects are part of your design project worth 2 credit points. As such they run in parallel to the actual course. So be aware that the due date for project and homework might be very close to each other! Start early and do not procrastinate.
Introduction
In project 1.2, you will implement the MD5 cryptographic hash algorithm using the RISC-V instruction set. This task will provide practical experience in both cryptographic algorithms and assembly language programming.
Before you start, please accept the assignment via this link. A repository containing the starter code will be generated to you. You can then start on the assignment by cloning it to your Ubuntu system. To submit the assignment, you should push your local clone to GitHub and turn in your work on Gradescope by connecting to your GitHub repo.
Academic integrity is strictly enforced in this course and any plagiarism behavior will cause serious consequence. You have been warned.
Background
MD5 (Message-Digest Algorithm 5) is a widely used hash function and was employed in various security applications. It can be utilized as a checksum to verify data integrity and detect accidental corruption, producing a 128-bit message digest from messages of any length.
Security Note
While MD5 is cryptographically broken, its low computational requirements make it well-suited for some non-cryptographic applications. Also, it is easy to implement, renders it an excellent educational tool for learning about hash functions and low-level bitwise operations.
MD5 Algorithm Description
The algorithm takes a message of b bits as input, where b is an arbitrary nonnegative integer, and produces a corresponding message digest. In this project, you may assume that b is always a multiple of 8.
Step 1: Append Padding Bits
The message is first padded so that its length is congruent to 448 modulo 512 (64 bit less than the exact multiple of 512).
The padding procedure works as follows:
- First, append a single bit "1" to the end of the message.
- Then, append as many "0" bits as needed to achieve the required message length.
Note
Padding is always performed, even if the length of message is already congruent to 448 modulo 512. In this case, a '1' bit and 511 '0' bits are added, creating a new 512-bit block to reserve space for the next step.
Step 2: Append Length Bits
A 64-bit representation of b (the length of the message before padded) is appended to the result of previous step. In this project, you can assume b is never greater than 264.
Step 3: Initialize MD Buffer
A 4-word buffer (A, B, C, D) is used to compute the message digest. Each of A, B, C, D is 32 bits, and initialize to following values in hexadecimal:
# Initial buffer value
initial_buffer:
.word 0x67452301 # A
.word 0xefcdab89 # B
.word 0x98badcfe # C
.word 0x10325476 # D
After applying the previous steps, the final padded message length will be an exact multiple of 512 bits.
The MD5 algorithm processes the padded message in successive 512-bit blocks.
Step 4: Process Message in 512-bit Blocks
Each 512-bit block undergoes four similar rounds, each consisting of 16 operations. The figure illustrate one operation within a round:
The operation update the MD buffer. For better explanation, we refer to the updated buffer as A', B', C', D'. The buffer values are updated as follows:
- B is directly fed into C',
- C is directly fed into D',
- D is directly fed into A'.
To figure out B':
-
Apply Auxiliary Function F: F is an auxiliary function that takes three 32-bit words, i.e., B, C, D, and produce one 32-bit word. A different auxiliary function is used in each round:
Round Auxiliary function F(B, C, D) 1 (B ∧ C) ∨ (¬B ∧ D) 2 (B ∧ D) ∨ (C ∧ ¬D) 3 B ⊕ C ⊕ D 4 C ⊕ (B ∨ ¬D) -
Add Message Word Mi: A 512-bit message block is divided to 16 32-bit words. The word used in i-th operation Mi is indexed as follows:
Round Mi 1 M[i] 2 M[(5i + 1) mod 16] 3 M[(3i + 5) mod 16] 4 M[7i mod 16] -
Add Table Constant Ki: Ki denotes i-th element of a 64-element table, which is constructed from the sine function. The elements of the table are given:
# 64-element table constructed from the sine function K: .word 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee .word 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 .word 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be .word 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 .word 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa .word 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 .word 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed .word 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a .word 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c .word 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 .word 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 .word 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 .word 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 .word 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 .word 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 .word 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
-
Left bit rotation by s bits: s varies for each operation and per-operation shift amounts are given:
# Per-round shift amounts S: .word 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22 .word 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20 .word 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23 .word 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
-
⊞ denotes addition modulo 232.
After completing all 64 operations, update A, B, C, D by adding their original values (before this 512-bit block was start) to the new computed values.
Step 5: Output
The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.The pseudocode can serve as a helpful resource to enhance your understanding.
Implementation
Input
A reference input is already provided to you in the input.S file. The final tests's input have the same format as the provided input except the length and contents of input message.
.data
# Length of Input Message
# A 32-bit integer specifying the exact number of bytes in 'message'.
.globl length_of_input_message
length_of_input_message:
.word 6
# Input Message
# ASCII string to be processed by the MD5 hash function.
# Ensure the length matches the integer specified above (excluding the null terminator).
.globl message
message:
.asciiz "CS110P"
Output
In this project, you need to output the message digest as a string of hexadecimal digit.
For input.S, the output should be:
aa324c08d002568e95fe519b355aa3db
Exited with error code 0
It's usually the duty of the supervisor (operating system) to deal with input/output and halting program execution. Venus, being a simple emulator, does not offer us such luxury, but supports a list of primitive environmental calls. Since Venus does not provide an environmental call to directly print binary, you may need to combine some of the environmental calls to achieve the above output. The following environmental calls could be helpful.
ID (a0 ) | Name | Description |
---|---|---|
1 | print_int | prints integer in a1 |
11 | print_character | prints ASCII character in a1 |
17 | exit2 | ends the program with return code in a1 |
A snippet of assembly of how to exit with error code 0 is already provided to you in main.S
. The output line 'Exited with error code 0' is neccessary. So please use ID17(exit2) environmental environmental call instead of ID10(exit) environmental call.
Test
The command that we use to test your program's correctness is
diff <your_md5_output> <reference_output>
You can also test your result using this command.
Execution
You need java to run the venus in terminal. Try to run sudo apt install openjdk-17-jre to download java-17.
Make sure that venus-jvm-latest.jar, md5.S, main.S and input.S reside in the same directory. To run your program locally and write the output to md5.out, use the following command. (Note that this command will overwrite the result file even if there's something there)
java -jar venus-jvm-latest.jar -cc main.S > md5.out
Please notice this command run main.S with the calling convention checker (-cc). Ensure that your implementation adheres to the calling conventions and that no calling convention errors are reported during execution.
To debug your program online, you might want to replace .import input.S
and .import md5.S in main.S with the content of input.S and md5.S.
Tips
- You can use any risc-v instructions as long as the venus can recognize them.
- Handwritten assembly are postfixed with extension .S to distinguish from compiler generated assembly .s
- You can learn more about how to use ecall from here.
- We will test your program using RISC-V emulator venus. Actually almost all things you need can be learnt from venus Wiki.
- Learn save and load from memory using RISC-V.
- Be careful about the calling convention, it will make life easier.
- Write comments.
- The test cases are very friendly! Don't focus too much on the edge cases, focus on the correctness on the common cases.
Submission
You should submit your code via Github. Please follow the guidance in Gradescope to submit your codes on Github. Please make sure you do not modify main.S.