CS110 Lab 12

Computer Architecture I @ ShanghaiTech University
Download the starter code here

Introduction to SIMD

SIMD (single instruction, multiple data) speeds up a program by applying the same instruction to multiple data elements at once. In this lab, we will use Intel intrinsics to implement a few simple programs.
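To make the idea concrete, here is a minimal sketch of one SIMD add, assuming an x86 machine with SSE2. The helper name add4 is hypothetical and is not part of the starter code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */

/* Adds four pairs of 32-bit integers with one SIMD add instruction.
   add4 is a hypothetical helper, not something the lab provides. */
void add4(const int *x, const int *y, int *out) {
    __m128i vx  = _mm_loadu_si128((const __m128i *)x);  /* load x[0..3] */
    __m128i vy  = _mm_loadu_si128((const __m128i *)y);  /* load y[0..3] */
    __m128i sum = _mm_add_epi32(vx, vy);                /* four adds at once */
    _mm_storeu_si128((__m128i *)out, sum);              /* write out[0..3] */
}
```

A scalar version would need a loop with four separate additions; here a single _mm_add_epi32 handles all four lanes.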

Part 1: Vector shifting

In this part, you will implement a vector shifting program using SIMD.

1. "Translate" naive_shift_right() into simd_shift_right(). You may use the following intrinsics; look them up in the Intel Intrinsics Guide:

2. Explain the differences between the following "load" intrinsics:

3. Determine which header files these intrinsics require. For example, to use _mm_srlv_epi32, we need to include "immintrin.h".
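As a starting point for step 2, the sketch below contrasts two load intrinsics. It assumes the pair _mm_load_si128 and _mm_loadu_si128, which may or may not be the ones this lab asks about. The function names sum_aligned and sum_unaligned are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_loadu_si128 */
#include <stdint.h>

/* Adds up the four 32-bit lanes of a vector. */
static int32_t sum_lanes(__m128i v) {
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Aligned load: p must point to a 16-byte-aligned address,
   otherwise the load may fault at run time. */
int32_t sum_aligned(const int32_t *p) {
    return sum_lanes(_mm_load_si128((const __m128i *)p));
}

/* Unaligned load: p may point to any address. */
int32_t sum_unaligned(const int32_t *p) {
    return sum_lanes(_mm_loadu_si128((const __m128i *)p));
}
```

Both functions need only "emmintrin.h", because both intrinsics belong to SSE2. The Intel Intrinsics Guide lists the header for each intrinsic.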

Checkoff

  1. Successfully run part 1 without assertion errors.
  2. Explain the differences between the loading intrinsics.
  3. Identify the correct header files for each intrinsic.

Part 2: Matrix multiplication

In this part, you will implement a matrix multiplication program using SIMD. Please "translate" naive_matmul() to simd_matmul().
You may use the following intrinsics:

Checkoff

  1. Successfully run `simd_matmul()` without assertion errors.
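The signature and data layout of naive_matmul() are not shown here, so the following is only a rough sketch of how a matrix product can be vectorized. It assumes 4x4 row-major float matrices and SSE. The name matmul4x4 is hypothetical, and the starter code may use different types and sizes.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_mul_ps, ... */

/* Computes c = a * b for 4x4 row-major float matrices (assumed layout).
   Each iteration of the outer loop produces one full row of c. */
void matmul4x4(const float *a, const float *b, float *c) {
    for (int i = 0; i < 4; i++) {
        __m128 acc = _mm_setzero_ps();                  /* row i of c, starts at 0 */
        for (int k = 0; k < 4; k++) {
            __m128 aik  = _mm_set1_ps(a[i * 4 + k]);    /* broadcast a[i][k] to 4 lanes */
            __m128 brow = _mm_loadu_ps(&b[k * 4]);      /* load b[k][0..3] */
            acc = _mm_add_ps(acc, _mm_mul_ps(aik, brow));
        }
        _mm_storeu_ps(&c[i * 4], acc);                  /* store c[i][0..3] */
    }
}
```

The design choice is to vectorize over the columns of the output: the innermost scalar loop over j disappears because one vector operation covers four values of j.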

Part 3: Loop unrolling

Read Wikipedia and try to understand the concept of loop unrolling:

Checkoff

  1. Implement loop_unroll_matmul() and loop_unroll_simd_matmul(). Explain the performance improvement they bring.
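For readers new to the idea, here is a small sketch of loop unrolling on a simpler problem, summing an array. The function names sum_rolled and sum_unrolled4 are hypothetical, and the unroll factor of 4 is an arbitrary choice.

```c
#include <stddef.h>

/* Baseline: one addition and one loop-condition check per element. */
long sum_rolled(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Unrolled by 4: one loop-condition check per four elements, and four
   independent accumulators that do not have to wait on each other. */
long sum_unrolled4(const int *v, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) {  /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    }
    return s0 + s1 + s2 + s3;
}
```

The same pattern applies to the matrix loops in this part: do more work per iteration, and handle any leftover iterations separately.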

Part 4: Compiler optimization

Run make test2 and explain why -O3 makes the program much faster.

Checkoff

  1. Explain why -O3 makes the program much faster.

Hint: Paste the following code into godbolt.org, compile it with a RISC-V compiler, and compare the output for -O0 and -O3.


int a = 0;

void modify(int j) {
    a += j;
}

int main() {
    for (int i = 0; i < 1000; i++) {
        a += 1;
    }

    for (int i = 0; i < 1000; i++) {
        a += i;
    }

    return a;
}
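One way to think about the hint: both loops have a result that can be worked out ahead of time, so an optimizer is free to remove them. The sketch below uses a local variable instead of the global a to keep it simple, and the names with_loops and folded are hypothetical. What a given compiler actually emits depends on the compiler and its version, so check the output on godbolt.org.

```c
/* Same arithmetic as main() in the hint, but on a local variable. */
int with_loops(void) {
    int a = 0;
    for (int i = 0; i < 1000; i++) {
        a += 1;                      /* adds 1000 in total */
    }
    for (int i = 0; i < 1000; i++) {
        a += i;                      /* adds 0 + 1 + ... + 999 = 499500 */
    }
    return a;
}

/* A form an optimizer may reduce the loops to: no loop remains,
   only a value known at compile time. */
int folded(void) {
    return 1000 + (999 * 1000) / 2;  /* 1000 + 499500 */
}
```

At -O0 the compiler typically keeps both loops and goes through memory on every iteration, which is the difference to look for in the generated assembly.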