CS110 Lab 12
Computer Architecture I @ ShanghaiTech University
Download the starter code here
Introduction to SIMD
SIMD (Single Instruction, Multiple Data) makes a program faster by executing the same instruction on multiple data elements at the same time.
In this lab, we will use Intel Intrinsics to implement simple programs.
Part 1: Vector shifting
In this part, you will implement a vector shifting program using SIMD.
1. Please "translate" naive_shift_right() to simd_shift_right().
You may use the following intrinsics; look them up in the Intel Intrinsics Guide:
- _mm_loadu_si128
- _mm_storeu_si128
- _mm_srlv_epi32
2. Explain the differences between the following "load" intrinsics:
- _mm_load_si128
- _mm_loadu_si128
- _mm_load_pd
- _mm_load1_pd
3. Determine which header files are needed for these intrinsics (e.g., to use _mm_srlv_epi32, we need to include "immintrin.h"):
- __m128i _mm_abs_epi16 (__m128i a)
- __m128i _mm_add_epi32 (__m128i a, __m128i b)
- __m128 _mm_ceil_ps (__m128 a)
- _mm_load1_pd
Checkoff
- Successfully run part 1 without assertion errors.
- Explain the differences between the loading intrinsics.
- Identify the correct header files for each intrinsic.
Part 2: Matrix multiplication
In this part, you will implement a matrix multiplication program using SIMD. Please "translate" naive_matmul() to simd_matmul().
You may use the following intrinsics:
- _mm_setzero_ps
- _mm_set1_ps
- _mm_loadu_ps
- _mm_add_ps
- _mm_mul_ps
- _mm_storeu_ps
Checkoff
- Successfully run `simd_matmul()` without assertion errors.
Part 3: Loop unrolling
Read the Wikipedia article on loop unrolling and try to understand the concept.
Checkoff
- Implement loop_unroll_matmul() and loop_unroll_simd_matmul(). Explain the performance boost they bring.
Part 4: Compiler optimization
Run make test2 and explain why -O3 makes the program much faster.
Checkoff
- Explain why -O3 makes the program much faster.
Hint: Paste the following code into godbolt.org, compile it with a RISC-V compiler, and compare the output for -O0 and -O3.
```c
int a = 0;

void modify(int j) {
    a += j;
}

int main() {
    for (int i = 0; i < 1000; i++) {
        a += 1;
    }
    for (int i = 0; i < 1000; i++) {
        a += i;
    }
    return a;
}
```