Homework 3
Computer Architecture I @ ShanghaiTech University
Overview
In this homework, you will implement an efficient leaderboard system composed of two complementary data structures:
- Skip List: Provides efficient sorted access by score and rank lookups
- Hash Map: Provides fast lookups by member name
Before starting, accept the assignment from GitHub Classroom.
The Skip List maintains a mapping from score to member (allowing for rank-based operations), while the Hash Map maintains a mapping from member to score (enabling quick score lookups). Together, they provide optimal performance for all leaderboard operations.
The Hash Map (Dict
) implementation (see src/dict.h
) is provided for you; you should refer to its declaration for details.
Your task is to complete the interfaces of Skip List (see src/skiplist.h
and src/skiplist.c
), and use interfaces of Skip List and Hash Map to support Leaderboard operations.
Task 1: Implement Skip List
In this part, you only need to modify src/skiplist.h
and src/skiplist.c
.
A Skip List is a probabilistic data structure that uses multiple layers of linked lists to "skip" elements during lookups. This provides average time complexity similar to balanced trees but with simpler implementation. In our leaderboard, the Skip List orders all members by their scores, enabling efficient rank queries and range operations.
Skip List Structure and Visualization
A Skip List can be visualized as multiple linked lists stacked on top of each other, with the bottom level containing all elements and higher levels providing "express lanes" for faster traversal:
Level 3: Header -----------> C -----------> Nil
Level 2: Header -> A ------> C -----------> Nil
Level 1: Header -> A -> B -> C -> D -> E -> Nil
Scores: 60 75 90 100 120
In this example:
- All elements exist at Level 1 (the bottom level)
- Elements A and C have references at Level 2
- Only element C has a reference at Level 3
- To search for element D (score 100), we would:
- Start at Header at Level 3
- Move to C at Level 3 (90 < 100, skipping over A and B)
- Move down to Level 2 (C has no successor at Level 2)
- Move down to Level 1, and find D by linear traversal
The Skip List's efficiency depends on its probabilistic level structure. New nodes receive a random level between 1 and MAX_LEVEL
based on a probability distribution where:
- All nodes have Level 1
- Each node has a 50% chance of also having Level 2
- Each node has a 25% chance of also having Level 3
- And so on...
This creates a structure where approximately:
- All nodes appear at Level 1
- 1/2 of the nodes appear at Level 2
- 1/4 of the nodes appear at Level 3
- 1/8 of the nodes appear at Level 4
- And so on...
This multi-level structure enables faster average-case search time than linked lists by skipping many elements in higher levels.
You should refer to William Pugh's original paper Skip Lists: A Probabilistic Alternative to Balanced Trees to gain a thorough understanding of Skip Lists. This paper contains implementation details and performance analysis, explaining why a randomized structure can achieve predictably good performance. The pseudocode may help you clearly implement some interfaces. Details may differ, so please follow our documentation in the code.
Open Data Structures (in Java) (pages 99-115) also provides detailed and vivid explanations of Skip Lists with helpful illustrations that clarify the structure maintenance operations.
Data Structure
typedef struct SkipListNode {
char *member; // Member name
int score; // Member score
struct SkipListNode **forward; // Array of forward pointers
int level; // Node level, range [1,MAX_LEVEL]
} SkipListNode;
typedef struct SkipList {
SkipListNode *header; // Sentinel node as entry point
int level; // Current max level, range [1,MAX_LEVEL]
int length; // Number of nodes (excluding header)
} SkipList;
Functions to Implement
The following functions can be used in your leaderboard implementation. It is recommended to implement them in the following order (see src/skiplist.c
for details):
static SkipListNode *sl_node_create(int level, const char *member, int score)
: Create a new nodestatic void sl_node_free(SkipListNode *node)
: Free a nodestatic int rand_level(void)
: Generate a random level for a new nodeSkipList *sl_create(void)
: Create a new empty skip listvoid sl_free(SkipList *sl)
: Free all resources used by the skip listint sl_get_length(SkipList *sl)
: Return the number of elements in the skip listSkipListNode *sl_get_by_score(SkipList *sl, int score)
: Find a node by its scoreint _sl_insert(SkipList *sl, const char *member, int score, int level)
: Insert a new node with given level, which is a helper function ofsl_insert()
and used for test. In leaderboard, you should callsl_insert()
to insert nodes into underlying skip list rather than_sl_insert()
int sl_remove(SkipList *sl, int score)
: Remove a node with the given scoreSkipListNode *sl_get_by_rank(SkipList *sl, int rank)
: Find a node by its rankint sl_get_rank_by_score(SkipList *sl, int score)
: Get the rank of a node with the given scoreSkipListNode **sl_get_range_by_rank(SkipList *sl, int start, int end, int *count)
: Get nodes by rank rangeSkipListNode **sl_get_range_by_score(SkipList *sl, int min_score, int max_score, int *count)
: Get nodes by score range
For detailed requirements of each function, please refer to the comments in src/skiplist.c
. Here are some general guidelines for implementing key Skip List operations:
When finding a node by score, use the Skip List's multi-level structure for efficiency. Start from the highest level and work downward, moving forward at each level as far as possible without exceeding the target score. This allows you to "skip" many nodes during the search process.
For rank-based operations, you'll need to traverse the list from the beginning at level 1 and count nodes. For simplicity, our base implementation doesn't include a mechanism to track the number of skipped nodes, which means rank-based operations require linear-time-complexity traversal. Interested students can implement a span
field at SkipList
to track the number of nodes skipped at each level and start traversing from the highest level as searching by score, significantly accelerating rank-based queries.
When inserting nodes, carefully manage the multi-level structure by finding the right position, generating an appropriate random level for the new node, updating all forward pointers and adjust Skip List's level if needed. Similarly, when removing nodes, you'll need to update forward pointers to maintain the Skip List structure, adjust its level if needed and free any allocated resources.
Duplicate Insertion Handling
A key constraint in our Skip List implementation is that scores must be unique. As specified in skiplist.c:sl_insert()
, insertion fails if a node with the same score already exists. This ensures that each score maps to exactly one member, which simplifies rank calculations and range queries.
Note that duplicate member checking is time-consuming in the Skip List and should be handled at a higher level using the dictionary (Hash Map). The leaderboard implementation should handle this by checking and updating members in the Hash Map before modifying the Skip List. So you can ignore this case when implementing Skip List.
Rank & Reverse Rank
Rank represents a member's position when ordered by score in ascending order. The member with the lowest score has rank 1, while the member with the highest score has rank N (where N is the total number of members).
Reverse Rank represents a member's position when ordered by score in descending order. The member with the highest score has reverse rank 1, while the member with the lowest score has reverse rank N.
Consider this example of members and their scores:
Member | Score | Rank | Reverse Rank |
---|---|---|---|
A | 70 | 1 | 4 |
B | 85 | 2 | 3 |
C | 100 | 3 | 2 |
D | 120 | 4 | 1 |
Task 2: Implement the Leaderboard
In this part, you only need to modify src/leaderboard.c
.
The leaderboard combines a skip list and hash map to provide a comprehensive ranking system. You can start the leaderboard CLI (command-line interface) by make run
or make && ./build/bin/leaderboard_cli
. Type "help" or "HELP" to see supported instructions and formats (or see main.c:print_help()
).
Usage Example
Here's a complete example session with the leaderboard (messages following "//" are comments and are not actual outputs):
================================
Leaderboard
Type HELP for available commands
> ZADD Alice 100
(integer) 1 // Add successfully
> ZADD Bob 85
(integer) 1
> ZADD Charlie 120
(integer) 1
> ZADD Dave 75
(integer) 1
> ZCARD
(integer) 4 // 4 members
> ZSCORE Alice
(integer) 100 // Alice's score: 100
> ZSCORE Eve
(nil) // Eve is not in the leaderboard
> ZRANK Dave
(integer) 1 // Dave has the lowest score, so rank 1
> ZRANK Bob
(integer) 2 // Bob has the second lowest score, so rank 2
> ZRANK Alice
(integer) 3
> ZRANK Charlie
(integer) 4
> ZREVRANK Charlie
(integer) 1 // Charlie has the highest score, so reverse rank 1
> ZREVRANK Alice
(integer) 2
> ZREVRANK Bob
(integer) 3
> ZREVRANK Dave
(integer) 4
> ZRANGE 1 2
(array) 2 item(s) // Members with rank 1-2 (lowest scores)
#1: Dave (75)
#2: Bob (85)
> ZREVRANGE 1 3
(array) 3 item(s) // Members with reverse rank 1-3 (highest scores)
#1: Charlie (120)
#2: Alice (100)
#3: Bob (85)
> ZRANGEBYSCORE 80 110
(array) 2 item(s) // Members with scores between 80-110
#1: Bob (85)
#2: Alice (100)
> ZADD Alice 95
(integer) 1 // Update successfully
> ZSCORE Alice
(integer) 95
> ZREM Bob
(integer) 1 // Remove Bob from leaderboard successfully
> ZSCORE Bob
(nil) // Bob is now not in the leaderboard
> ZCARD
(integer) 3
> ZRANGE 2 1
(empty array) // Invalid range
> Program exited!
The Hash Map (Dict) Component
The Hash Map provides O(1) lookups by member name, complementing the Skip List's efficient score-based operations. The implementation is provided for you in src/dict.h
and src/dict.c
. Here are the key operations you'll use:
DictEntry *dict_get(Dict *dict, const char *key)
: Find an entry by keyint dict_insert(Dict *dict, const char *key, int val)
: Insert a new key-value pair or update an existing keyint dict_remove(Dict *dict, const char *key)
: Remove an entry by key
Time Complexity Comparison
This time complexity comparison illustrates why combining a Skip List (for score-based operations) with a Hash Map (for member-based operations) creates an efficient leaderboard implementation.
Operation | Linked List | Hash Map | Skip List (avg) | Skip List (worst) |
---|---|---|---|---|
Search by member | O(n) | O(1) | O(n) | O(n) |
Search by score | O(n) | O(n) | O(log n) | O(n) |
Insert | O(n) | O(1) | O(log n) | O(n) |
Delete | O(n) | O(1) | O(log n) | O(n) |
Find by rank | O(n) | O(n) | O(n) or O(log n)* | O(n) |
Range query | O(n) | O(n) | O(log n + k) | O(n) |
*O(log n) if implemented with span
field optimization
Where
- n is the total number of elements in the data structure
- k is the number of elements in the result set (for range queries)
Functions to Implement
All these functions should use print_int()
, print_nil()
, or print_array()
(see src/leaderboard.c
) to display their results.
void zadd(Leaderboard *lb, const char *member, int score)
: Print 1 if member added/updated, 0 otherwisevoid zrem(Leaderboard *lb, const char *member)
: Print 1 if member removed, 0 otherwisevoid zcard(Leaderboard *lb)
: Print the number of membersvoid zscore(Leaderboard *lb, const char *member)
: Print the member's score or NIL if not foundvoid zrank(Leaderboard *lb, const char *member, int reverse)
: Print the member's rank or NIL if not foundvoid zrevrank(Leaderboard *lb, const char *member)
: Print the member's reverse rank or NIL if not foundvoid zrange(Leaderboard *lb, int start, int end)
: Print members in the given rank rangevoid zrevrange(Leaderboard *lb, int start, int end)
: Print members in the given reverse rank rangevoid zrangebyscore(Leaderboard *lb, int min_score, int max_score)
: Print members in the given score range
Duplicate Insertion Handling
In Leaderboard, zadd()
is responsible for adding or updating members. When zadd()
prints 1, it indicates a successful addition or update; when it prints 0, it means no change was made to the leaderboard. Unlike the Skip List, this function handles several cases:
- If the member doesn't exist in the leaderboard, it adds the member with the given score.
- If the member exists but with a different score, it updates the member's score.
- If a member with the same score already exists, it doesn't add or update anything.
Negative (Reverse) Rank Range Queries Handling
For rank-based range operation (zrange
), the start
and end
parameters can be negative to indicate positions relative to the end of the list, similar to array slicing in many programming languages:
- rank -1 refers to the last element in the leaderboard's skip list
- rank -2 refers to the second-to-last element
- And so on...
To convert negative rank to positive ones:
if (rank < 0)
rank = length + rank + 1;
A range query is considered valid if:
- After conversion, 1 <=
start
<=end
<=length
For example, with a leaderboard of 10 elements:
zrange 1 3
: Valid, returns elements at positions 1, 2, and 3zrange -3 -1
: Valid, returns the last three elements (positions 8, 9, 10)zrange 3 2
: Invalid,start
>end
zrange -15 -13
: Invalid if converted indices are out of bounds (neither convertedstart
, -4, nor convertedend
, -2, in [1, 10])
For reverse rank-based range operation (zrevrange
), the start and end parameters can also be negative, but their interpretation is relative to the reversed order of the list:
- reverse rank -1 refers to the lowest-scoring element in the leaderboard
- reverse rank -2 refers to the second-lowest-scoring element
- And so on...
The negative index conversion works the same way mathematically:
if (reverse_rank < 0)
reverse_rank = length + reverse_rank + 1;
Similarly, a reverse range query is considered valid if:
- After conversion, 1 <=
start
<=end
<=length
However, the result interpretation differs because zrevrange
prints elements in descending score order. For example, with the same leaderboard of 10 elements:
zrevrange 1 3
: Valid, returns elements with the 3 highest scores (reverse ranks 1, 2, 3, ranks 10, 9, 8)zrevrange -3 -1
: Valid, returns the 3 lowest-scoring elements (reverse ranks 8, 9, 10, ranks 3, 2, 1)zrevrange 3 2
: Invalid,start
>end
zrevrange -15 -13
: Invalid if converted indices are out of bounds
Build and Testing
- Run
make
to build your leaderboard CLI as./build/bin/leaderboard_cli
. - Run
make run
or./build/bin/leaderboard_cli
to start the leaderboard CLI. - Run
make valgrind
orvalgrind --tool=memcheck --leak-check=full --track-origins=yes ./build/bin/leaderboard_cli
to check for memory leaks and other issues using Valgrind's memory checker. - Use input/output redirection to test with files:
./build/bin/leaderboard_cli <input_file >output_file
. For example,./build/bin/leaderboard_cli <in.txt >out.txt
processes commands fromin.txt
and writes results toout.txt
.
To verify your skip list implementation, use the provided test program in test/test_skiplist.c
. Run make test_skiplist
or execute ./build/test/test_skiplist
directly. You can extend these tests or create your own. The assertion utilities in test/debug_util.h
provide helpful debugging messages when tests fail. Refer to test/debug_util.h
and test/test_skiplist.c
for detailed usage information.
For end-to-end testing of your leaderboard implementation, we provide sample input (test/leaderboard_in.txt
) and expected output (test/leaderboard_ref.txt
). Compare your program's output against the reference to confirm correct behavior.
Tips
- Carefully understand the Skip List's working principles, especially the multi-level pointer structure.
- Follow the documentation in code to ensure correct implementation.
- Handle different parameter scenarios properly.
- Be careful with memory allocation and deallocation to avoid memory leaks and program crashes.
- Test incrementally -- implement basic functionality before adding advanced features.
Submission and Grading
- Submit your GitHub repository with the right branch on Gradescope.
- Only
src/skiplist.h
,src/skiplist.c
andsrc/leaderboard.c
will be used for grading. - Only the last active submission will be accounted for your grade.
- Due: 2025/04/01, 23:59
Good luck with coding!