Homework 2

Computer Architecture I @ ShanghaiTech University

Fetch the homework here on GitHub Classroom. You will use GitHub for version control and Gradescope for submission.

Introduction

Please write a program that takes 32-bit binary strings as input, simulates the addition operation of IEEE 754 single-precision floating-point numbers, and outputs the binary representation of the sum.
Before you begin, make sure that you understand the following concepts:

Sign bit (1 bit), exponent bits (8 bits), and mantissa bits (23 bits).
The IEEE 754 standard stores a biased exponent: a normalized number has the exponent (field - 127), while a denormalized number (exponent field 0) has a fixed exponent of -126, which is equivalent to using a bias of 126.
There are several possible representations of zero, infinity, and NaN.
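
The concepts above can be made concrete with a minimal sketch that turns a 32-character bitstring into a word and splits it into its fields. The names Unpacked, bits_from_string, and unpack are hypothetical and not part of the provided template; the sketch assumes the input is exactly 32 characters of '0' and '1'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type, not part of the provided template. */
typedef struct {
    uint32_t sign;      /* 0 or 1 */
    int32_t  exponent;  /* unbiased exponent */
    uint32_t mantissa;  /* 24 bits: hidden bit (bit 23) plus 23 stored bits */
} Unpacked;

/* Convert a string of exactly 32 '0'/'1' characters to a 32-bit word. */
uint32_t bits_from_string(const char *s)
{
    assert(strlen(s) == 32);
    uint32_t word = 0;
    for (int i = 0; i < 32; i++) {
        assert(s[i] == '0' || s[i] == '1');
        word = (word << 1) | (uint32_t)(s[i] - '0');
    }
    return word;
}

/* Split a word into sign, unbiased exponent, and 24-bit mantissa. */
Unpacked unpack(uint32_t word)
{
    Unpacked u;
    uint32_t exp_field = (word >> 23) & 0xFFu;
    uint32_t fraction  = word & 0x7FFFFFu;

    u.sign = word >> 31;
    if (exp_field == 0) {
        /* Denormalized: hidden bit is 0, exponent is fixed at -126. */
        u.exponent = -126;
        u.mantissa = fraction;
    } else {
        /* Normalized: hidden bit is 1, exponent is the field minus 127. */
        u.exponent = (int32_t)exp_field - 127;
        u.mantissa = fraction | 0x800000u;
    }
    return u;
}
```

For example, the bitstring of 1.5 (0x3FC00000) unpacks to sign 0, exponent 0, and mantissa 0xC00000. Storing the hidden bit explicitly lets later steps treat normalized and denormalized inputs the same way.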

As stated in the lecture slides, the steps of the addition operation are as follows:

Align the exponents of the two floating-point numbers to make them equal.
Add the two mantissas to get the result mantissa.
Normalize the result mantissa to ensure that the number of mantissa bits meets the standard requirements.
If the result mantissa overflows, perform rounding and adjust the exponent.
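
The steps above can be sketched for the simplest case: two normalized operands with the same sign. The Magnitude type and add_same_sign are hypothetical names used only for illustration. The sketch deliberately drops the bits shifted out during alignment and does not handle opposite signs, denormalized operands, or overflow of the exponent, so it is a starting point rather than a solution.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: magnitude of a normalized number. */
typedef struct {
    int32_t  exponent;  /* unbiased exponent */
    uint32_t sig;       /* 24 bits, hidden bit at bit 23 */
} Magnitude;

/* Add the magnitudes of two same-sign normalized numbers. */
Magnitude add_same_sign(Magnitude a, Magnitude b)
{
    assert((a.sig >> 23) == 1 && (b.sig >> 23) == 1);

    /* Step 1: align. Make `a` the operand with the larger exponent, then
     * shift the smaller operand right by the exponent difference. */
    if (a.exponent < b.exponent) {
        Magnitude t = a;
        a = b;
        b = t;
    }
    uint32_t diff = (uint32_t)(a.exponent - b.exponent);
    /* A 24-bit value shifted by 24 or more is zero; this guard also avoids
     * an undefined shift by 32 or more. */
    uint32_t small = diff < 24 ? b.sig >> diff : 0;

    /* Step 2: add the aligned mantissas. */
    Magnitude r = { a.exponent, a.sig + small };

    /* Steps 3 and 4: a carry into bit 24 means the sum is in [2, 4), so
     * shift right once (dropping the lowest bit) and bump the exponent. */
    if (r.sig >> 24) {
        r.sig >>= 1;
        r.exponent += 1;
    }
    return r;
}
```

For example, 1.5 (exponent 0, mantissa 0xC00000) plus 0.75 (exponent -1, mantissa 0xC00000) gives exponent 1 and mantissa 0x900000, which is 2.25.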

If you have trouble understanding the concepts, you may try this tiny calculator.
Note that this calculator uses round to nearest, ties to even as the rounding method, which is different from the rounding method we use in this assignment.

Specification

Input

We guarantee that the input bitstring shall never be NaN, infinity, or zero.
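
While debugging, it may help to assert this guarantee on every input. The following sketch shows one way to recognize the special encodings; the function names are hypothetical and the check is optional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* +0 or -0: every bit except possibly the sign bit is clear. */
bool is_zero(uint32_t w)
{
    return (w & 0x7FFFFFFFu) == 0;
}

/* +infinity or -infinity: exponent field all ones, mantissa field zero. */
bool is_infinity(uint32_t w)
{
    return (w & 0x7FFFFFFFu) == 0x7F800000u;
}

/* NaN: exponent field all ones, mantissa field nonzero. */
bool is_nan(uint32_t w)
{
    return (w & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Debug-only check that an input word respects the guarantee above. */
void check_input(uint32_t w)
{
    assert(!is_zero(w) && !is_infinity(w) && !is_nan(w));
}
```

Masking off the sign bit first covers both signed variants of each special value with a single comparison.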

Precision

Before calculating the mantissa, we add three extra bits to preserve the precision of the result. The three extra bits are the guard bit, the round bit, and the sticky bit.
Here is a brief explanation of these bits:

Mantissa of the number (assume normalized):
     1.XXXXXXXXXXXXXXXXXXXXXXX   0   0   0

     ^         ^                 ^   ^   ^
     |         |                 |   |   |
     |         |                 |   |   -  sticky bit (s)
     |         |                 |   -  round bit (r)
     |         |                 -  guard bit (g)
     |         -  23-bit mantissa from a representation
     -  hidden bit

The guard bit and the round bit are the two bits immediately to the right of the least significant bit of the mantissa. They simply give two extra bits of precision. The sticky bit is an indication of whether there are any non-zero bits to the right of the round bit.

                                              g r s
    Initial:        1.11000000000000000010100 0 0 0
    Shift 1 bit:    0.11100000000000000001010 0 0 0
    Shift 2 bits:   0.01110000000000000000101 0 0 0
    Shift 3 bits:   0.00111000000000000000010 1 0 0
    Shift 4 bits:   0.00011100000000000000001 0 1 0
    Shift 5 bits:   0.00001110000000000000000 1 0 1
    Shift 6 bits:   0.00000111000000000000000 0 1 1
    Shift 7 bits:   0.00000011100000000000000 0 0 1
    Shift 8 bits:   0.00000001110000000000000 0 0 1
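
The shifting shown above can be sketched as follows. The sketch assumes a 27-bit layout packed into a uint32_t: hidden bit at bit 26, the 23 mantissa bits below it, then g, r, and s in bits 2, 1, and 0. The function name shift_right_sticky and this layout are assumptions for illustration, not part of the template.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 27-bit value (hidden bit, 23 mantissa bits, g, r, s) right by
 * `shift` places. Every bit that falls off the right end is OR-ed into
 * the sticky bit (bit 0), so the sticky bit stays set once it has seen
 * a 1. */
uint32_t shift_right_sticky(uint32_t sig, unsigned shift)
{
    assert(sig < (1u << 27));
    if (shift == 0)
        return sig;
    if (shift >= 27)
        return sig != 0 ? 1u : 0u;  /* everything collapses into sticky */

    uint32_t lost = sig & ((1u << shift) - 1u);
    return (sig >> shift) | (lost != 0 ? 1u : 0u);
}
```

With the initial value from the table, 1.11000000000000000010100 followed by g r s = 0 0 0, the packed word is 0x70000A0. Shifting it by 5, 6, and 8 places gives 0x380005, 0x1C0003, and 0x70001, whose low three bits are 101, 011, and 001, matching the table rows.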

After the mantissa calculation and the normalization, the result mantissa should be rounded to 23 bits.

Rounding

In this assignment, we use truncation as the rounding method.
Truncation, also called round-toward-zero, drops the bits beyond the 23-bit mantissa (the guard, round, and sticky bits), so the magnitude of the result never increases.
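
A minimal sketch of this final step, assuming the normalized result is held as a 27-bit value (hidden bit, 23 mantissa bits, then g, r, s in the low three bits) and that the result is a normalized number. The names truncate_grs, pack_normalized, and bits_to_string are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round toward zero: discard g, r, s and keep the 24-bit mantissa. */
uint32_t truncate_grs(uint32_t sig27)
{
    assert(sig27 < (1u << 27));
    return sig27 >> 3;
}

/* Reassemble a normalized result into a 32-bit word.
 * `sig24` must have its hidden bit (bit 23) set. */
uint32_t pack_normalized(uint32_t sign, int32_t exponent, uint32_t sig24)
{
    assert(sign <= 1);
    assert(exponent >= -126 && exponent <= 127);
    assert((sig24 >> 23) == 1);
    return (sign << 31)
         | ((uint32_t)(exponent + 127) << 23)
         | (sig24 & 0x7FFFFFu);
}

/* Write the word as 32 '0'/'1' characters plus a terminating NUL. */
void bits_to_string(uint32_t word, char out[33])
{
    for (int i = 0; i < 32; i++)
        out[i] = ((word >> (31 - i)) & 1u) ? '1' : '0';
    out[32] = '\0';
}
```

Because the format stores sign and magnitude separately, truncating the magnitude rounds toward zero for both positive and negative results.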

Submission

Upload FloatCalculate.c and FloatCalculate.h to Gradescope and wait for the result.

Notice

We have provided test.c so that you can generate new test cases and find the correct answers. You can use it to test your program.
You might notice that a possible trick for bypassing the calculation is to convert the input bitstring to a (builtin) float and then use the float to do the rest of the calculation.
However, this is NOT ALLOWED in this assignment. We will check the validity of your program both automatically and manually.
On the autograder, we have banned function calls that change the rounding mode, such as fesetround(FE_TOWARDZERO). We have also banned inline assembly (asm, __asm, and __asm__).
In the template, we have provided CMakeLists.txt and Makefile to help you compile the program. You can choose either of them. They will not be submitted to Gradescope anyway.

Code quality

Your code will be compiled with -Wpedantic -Wall -Wextra -Wvla -Werror -std=c11.
No memory leaks are allowed. Valgrind will be used to detect memory leaks at runtime.