Project 1.2: MD5 in RISC-V (Individual Project)

Computer Architecture I ShanghaiTech University
Project 1.1 Project 1.2

IMPORTANT INFO - PLEASE READ

The projects are part of your design project worth 2 credit points. As such they run in parallel to the actual course. So be aware that the due date for project and homework might be very close to each other! Start early and do not procrastinate.

Introduction

In project 1.2, you will implement the MD5 cryptographic hash algorithm using the RISC-V instruction set. This task will provide practical experience in both cryptographic algorithms and assembly language programming.

Before you start, please accept the assignment via this link. A repository containing the starter code will be generated to you. You can then start on the assignment by cloning it to your Ubuntu system. To submit the assignment, you should push your local clone to GitHub and turn in your work on Gradescope by connecting to your GitHub repo.

Academic integrity is strictly enforced in this course and any plagiarism behavior will cause serious consequence. You have been warned.

Background

MD5 (Message-Digest Algorithm 5) is a widely used hash function and was employed in various security applications. It can be utilized as a checksum to verify data integrity and detect accidental corruption, producing a 128-bit message digest from messages of any length.

Security Note

While MD5 is cryptographically broken, its low computational requirements make it well-suited for some non-cryptographic applications. Also, it is easy to implement, renders it an excellent educational tool for learning about hash functions and low-level bitwise operations.

MD5 Algorithm Description

The algorithm takes a message of b bits as input, where b is an arbitrary nonnegative integer, and produces a corresponding message digest. In this project, you may assume that b is always a multiple of 8.

Step 1: Append Padding Bits

The message is first padded so that its length is congruent to 448 modulo 512 (64 bit less than the exact multiple of 512).

The padding procedure works as follows:

Note

Padding is always performed, even if the length of message is already congruent to 448 modulo 512. In this case, a '1' bit and 511 '0' bits are added, creating a new 512-bit block to reserve space for the next step.

Step 2: Append Length Bits

A 64-bit representation of b (the length of the message before padded) is appended to the result of previous step. In this project, you can assume b is never greater than 264.

Step 3: Initialize MD Buffer

A 4-word buffer (A, B, C, D) is used to compute the message digest. Each of A, B, C, D is 32 bits, and initialize to following values in hexadecimal:

# Initial buffer value
initial_buffer:
    .word 0x67452301 # A
    .word 0xefcdab89 # B
    .word 0x98badcfe # C
    .word 0x10325476 # D

After applying the previous steps, the final padded message length will be an exact multiple of 512 bits.

The MD5 algorithm processes the padded message in successive 512-bit blocks.

Step 4: Process Message in 512-bit Blocks

Each 512-bit block undergoes four similar rounds, each consisting of 16 operations. The figure illustrate one operation within a round:

The operation update the MD buffer. For better explanation, we refer to the updated buffer as A', B', C', D'. The buffer values are updated as follows:

To figure out B':

  1. Apply Auxiliary Function F: F is an auxiliary function that takes three 32-bit words, i.e., B, C, D, and produce one 32-bit word. A different auxiliary function is used in each round:

    Round Auxiliary function F(B, C, D)
    1 (B ∧ C) ∨ (¬B ∧ D)
    2 (B ∧ D) ∨ (C ∧ ¬D)
    3 B ⊕ C ⊕ D
    4 C ⊕ (B ∨ ¬D)
  2. Add Message Word Mi: A 512-bit message block is divided to 16 32-bit words. The word used in i-th operation Mi is indexed as follows:

    Round Mi
    1 M[i]
    2 M[(5i + 1) mod 16]
    3 M[(3i + 5) mod 16]
    4 M[7i mod 16]
  3. Add Table Constant Ki: Ki denotes i-th element of a 64-element table, which is constructed from the sine function. The elements of the table are given:

    # 64-element table constructed from the sine function
    K:
        .word 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee
        .word 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501
        .word 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be
        .word 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821
        .word 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa
        .word 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8
        .word 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed
        .word 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a
        .word 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c
        .word 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70
        .word 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05
        .word 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665
        .word 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039
        .word 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1
        .word 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1
        .word 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
  4. Left bit rotation by s bits: s varies for each operation and per-operation shift amounts are given:

    # Per-round shift amounts
    S:
        .word 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22
        .word 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20
        .word 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23
        .word 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
  5. ⊞ denotes addition modulo 232.

After completing all 64 operations, update A, B, C, D by adding their original values (before this 512-bit block was start) to the new computed values.

Step 5: Output

The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.

The pseudocode can serve as a helpful resource to enhance your understanding.

Implementation

Input

A reference input is already provided to you in the input.S file. The final tests's input have the same format as the provided input except the length and contents of input message.


    .data

# Length of Input Message
# A 32-bit integer specifying the exact number of bytes in 'message'.
    .globl length_of_input_message
length_of_input_message:
    .word 6

# Input Message
# ASCII string to be processed by the MD5 hash function.
# Ensure the length matches the integer specified above (excluding the null terminator).
    .globl message
message:
    .asciiz "CS110P"

Output

In this project, you need to output the message digest as a string of hexadecimal digit.

For input.S, the output should be:

aa324c08d002568e95fe519b355aa3db
Exited with error code 0

It's usually the duty of the supervisor (operating system) to deal with input/output and halting program execution. Venus, being a simple emulator, does not offer us such luxury, but supports a list of primitive environmental calls. Since Venus does not provide an environmental call to directly print binary, you may need to combine some of the environmental calls to achieve the above output. The following environmental calls could be helpful.

ID (a0)NameDescription
1print_intprints integer in a1
11print_characterprints ASCII character in a1
17exit2ends the program with return code in a1

A snippet of assembly of how to exit with error code 0 is already provided to you in main.S. The output line 'Exited with error code 0' is neccessary. So please use ID17(exit2) environmental environmental call instead of ID10(exit) environmental call.

Test

The command that we use to test your program's correctness is

diff <your_md5_output> <reference_output>

You can also test your result using this command.

Execution

You need java to run the venus in terminal. Try to run sudo apt install openjdk-17-jre to download java-17.

Make sure that venus-jvm-latest.jar, md5.S, main.S and input.S reside in the same directory. To run your program locally and write the output to md5.out, use the following command. (Note that this command will overwrite the result file even if there's something there)

java -jar venus-jvm-latest.jar -cc main.S > md5.out

Please notice this command run main.S with the calling convention checker (-cc). Ensure that your implementation adheres to the calling conventions and that no calling convention errors are reported during execution.

To debug your program online, you might want to replace .import input.S and .import md5.S in main.S with the content of input.S and md5.S.

Tips

Submission

You should submit your code via Github. Please follow the guidance in Gradescope to submit your codes on Github. Please make sure you do not modify main.S.


In Project 1.2 are,
Chundong Wang <wangchd AT shanghaitech.edu.cn>
Siting Liu <liust AT shanghaitech.edu.cn>
Yuan Xiao <Yuan Xiao AT shanghaitech.edu.cn>
and,
Daqian Cao <caodq AT shanghaitech.edu.cn>
Hanjia Cui <cuihj2024 AT shanghaitech.edu.cn>

Last modified: