Homework 3

Computer Architecture I @ ShanghaiTech University

Overview

In this homework, you will implement an efficient leaderboard system composed of two complementary data structures:

Before starting, accept the assignment from GitHub Classroom.

The Skip List maintains a mapping from score to member (allowing for rank-based operations), while the Hash Map maintains a mapping from member to score (enabling quick score lookups). Together, they provide optimal performance for all leaderboard operations.

The Hash Map (Dict) implementation (see src/dict.h) is provided for you; you should refer to its declaration for details.

Your task is to complete the interfaces of Skip List (see src/skiplist.h and src/skiplist.c), and use interfaces of Skip List and Hash Map to support Leaderboard operations.

Task 1: Implement Skip List

In this part, you only need to modify src/skiplist.h and src/skiplist.c.

A Skip List is a probabilistic data structure that uses multiple layers of linked lists to "skip" elements during lookups. This provides average time complexity similar to balanced trees but with simpler implementation. In our leaderboard, the Skip List orders all members by their scores, enabling efficient rank queries and range operations.

Skip List Structure and Visualization

A Skip List can be visualized as multiple linked lists stacked on top of each other, with the bottom level containing all elements and higher levels providing "express lanes" for faster traversal:

Level 3:  Header -----------> C -----------> Nil
Level 2:  Header -> A ------> C -----------> Nil
Level 1:  Header -> A -> B -> C -> D -> E -> Nil
Scores:            60   75   90  100  120

In this example:

The Skip List's efficiency depends on its probabilistic level structure. New nodes receive a random level between 1 and MAX_LEVEL based on a probability distribution where:

This creates a structure where approximately:

This multi-level structure enables faster average-case search time than linked lists by skipping many elements in higher levels.

You should refer to William Pugh's original paper Skip Lists: A Probabilistic Alternative to Balanced Trees to gain a thorough understanding of Skip Lists. This paper contains implementation details and performance analysis, explaining why a randomized structure can achieve predictably good performance. The pseudocode may help you clearly implement some interfaces. Details may differ, so please follow our documentation in the code.

Open Data Structures (in Java) (pages 99-115) also provides detailed and vivid explanations of Skip Lists with helpful illustrations that clarify the structure maintenance operations.

Data Structure

typedef struct SkipListNode {
  char *member;                   // Member name
  int score;                      // Member score
  struct SkipListNode **forward;  // Array of forward pointers
  int level;                      // Node level, range [1,MAX_LEVEL]
} SkipListNode;

typedef struct SkipList {
  SkipListNode *header;           // Sentinel node as entry point
  int level;                      // Current max level, range [1,MAX_LEVEL]
  int length;                     // Number of nodes (excluding header)
} SkipList;

Functions to Implement

The following functions can be used in your leaderboard implementation. It is recommended to implement them in the following order (see src/skiplist.c for details):

For detailed requirements of each function, please refer to the comments in src/skiplist.c. Here are some general guidelines for implementing key Skip List operations:

When finding a node by score, use the Skip List's multi-level structure for efficiency. Start from the highest level and work downward, moving forward at each level as far as possible without exceeding the target score. This allows you to "skip" many nodes during the search process.

For rank-based operations, you'll need to traverse the list from the beginning at level 1 and count nodes. For simplicity, our base implementation doesn't include a mechanism to track the number of skipped nodes, which means rank-based operations require linear-time-complexity traversal. Interested students can implement a span field at SkipList to track the number of nodes skipped at each level and start traversing from the highest level as searching by score, significantly accelerating rank-based queries.

When inserting nodes, carefully manage the multi-level structure by finding the right position, generating an appropriate random level for the new node, updating all forward pointers and adjust Skip List's level if needed. Similarly, when removing nodes, you'll need to update forward pointers to maintain the Skip List structure, adjust its level if needed and free any allocated resources.

Duplicate Insertion Handling

A key constraint in our Skip List implementation is that scores must be unique. As specified in skiplist.c:sl_insert(), insertion fails if a node with the same score already exists. This ensures that each score maps to exactly one member, which simplifies rank calculations and range queries.

Note that duplicate member checking is time-consuming in the Skip List and should be handled at a higher level using the dictionary (Hash Map). The leaderboard implementation should handle this by checking and updating members in the Hash Map before modifying the Skip List. So you can ignore this case when implementing Skip List.

Rank & Reverse Rank

Rank represents a member's position when ordered by score in ascending order. The member with the lowest score has rank 1, while the member with the highest score has rank N (where N is the total number of members).

Reverse Rank represents a member's position when ordered by score in descending order. The member with the highest score has reverse rank 1, while the member with the lowest score has reverse rank N.

Consider this example of members and their scores:

Member Score Rank Reverse Rank
A 70 1 4
B 85 2 3
C 100 3 2
D 120 4 1

Task 2: Implement the Leaderboard

In this part, you only need to modify src/leaderboard.c.

The leaderboard combines a skip list and hash map to provide a comprehensive ranking system. You can start the leaderboard CLI (command-line interface) by make run or make && ./build/bin/leaderboard_cli. Type "help" or "HELP" to see supported instructions and formats (or see main.c:print_help()).

Usage Example

Here's a complete example session with the leaderboard (messages following "//" are comments and are not actual outputs):

================================
Leaderboard
Type HELP for available commands
> ZADD Alice 100
(integer) 1         // Add successfully
> ZADD Bob 85
(integer) 1
> ZADD Charlie 120
(integer) 1  
> ZADD Dave 75
(integer) 1
> ZCARD
(integer) 4         // 4 members
> ZSCORE Alice
(integer) 100       // Alice's score: 100
> ZSCORE Eve
(nil)               // Eve is not in the leaderboard

> ZRANK Dave
(integer) 1         // Dave has the lowest score, so rank 1
> ZRANK Bob
(integer) 2         // Bob has the second lowest score, so rank 2
> ZRANK Alice
(integer) 3
> ZRANK Charlie
(integer) 4
> ZREVRANK Charlie
(integer) 1         // Charlie has the highest score, so reverse rank 1
> ZREVRANK Alice
(integer) 2
> ZREVRANK Bob
(integer) 3
> ZREVRANK Dave
(integer) 4

> ZRANGE 1 2
(array) 2 item(s)   // Members with rank 1-2 (lowest scores)
  #1: Dave (75)
  #2: Bob (85)
> ZREVRANGE 1 3
(array) 3 item(s)   // Members with reverse rank 1-3 (highest scores)
  #1: Charlie (120)
  #2: Alice (100)
  #3: Bob (85)
> ZRANGEBYSCORE 80 110 
(array) 2 item(s)   // Members with scores between 80-110
  #1: Bob (85)
  #2: Alice (100)

> ZADD Alice 95
(integer) 1         // Update successfully
> ZSCORE Alice
(integer) 95
> ZREM Bob
(integer) 1         // Remove Bob from leaderboard successfully
> ZSCORE Bob
(nil)               // Bob is now not in the leaderboard
> ZCARD
(integer) 3
> ZRANGE 2 1
(empty array)       // Invalid range
> Program exited!

The Hash Map (Dict) Component

The Hash Map provides O(1) lookups by member name, complementing the Skip List's efficient score-based operations. The implementation is provided for you in src/dict.h and src/dict.c. Here are the key operations you'll use:

Time Complexity Comparison

This time complexity comparison illustrates why combining a Skip List (for score-based operations) with a Hash Map (for member-based operations) creates an efficient leaderboard implementation.

Operation Linked List Hash Map Skip List (avg) Skip List (worst)
Search by member O(n) O(1) O(n) O(n)
Search by score O(n) O(n) O(log n) O(n)
Insert O(n) O(1) O(log n) O(n)
Delete O(n) O(1) O(log n) O(n)
Find by rank O(n) O(n) O(n) or O(log n)* O(n)
Range query O(n) O(n) O(log n + k) O(n)

*O(log n) if implemented with span field optimization

Where

Functions to Implement

All these functions should use print_int(), print_nil(), or print_array() (see src/leaderboard.c) to display their results.

Duplicate Insertion Handling

In Leaderboard, zadd() is responsible for adding or updating members. When zadd() prints 1, it indicates a successful addition or update; when it prints 0, it means no change was made to the leaderboard. Unlike the Skip List, this function handles several cases:

Negative (Reverse) Rank Range Queries Handling

For rank-based range operation (zrange), the start and end parameters can be negative to indicate positions relative to the end of the list, similar to array slicing in many programming languages:

To convert negative rank to positive ones:

if (rank < 0)
  rank = length + rank + 1;

A range query is considered valid if:

For example, with a leaderboard of 10 elements:

For reverse rank-based range operation (zrevrange), the start and end parameters can also be negative, but their interpretation is relative to the reversed order of the list:

The negative index conversion works the same way mathematically:

if (reverse_rank < 0)
  reverse_rank = length + reverse_rank + 1;

Similarly, a reverse range query is considered valid if:

However, the result interpretation differs because zrevrange prints elements in descending score order. For example, with the same leaderboard of 10 elements:

Build and Testing

To verify your skip list implementation, use the provided test program in test/test_skiplist.c. Run make test_skiplist or execute ./build/test/test_skiplist directly. You can extend these tests or create your own. The assertion utilities in test/debug_util.h provide helpful debugging messages when tests fail. Refer to test/debug_util.h and test/test_skiplist.c for detailed usage information.

For end-to-end testing of your leaderboard implementation, we provide sample input (test/leaderboard_in.txt) and expected output (test/leaderboard_ref.txt). Compare your program's output against the reference to confirm correct behavior.

Tips

Submission and Grading

Good luck with coding!