Due Tuesday, July 18, at 10pm
In this lab you’ll code a set of data structures that will support the needs of the Tiny Search Engine (TSE).
- General notes
- What to hand in, and how
In this lab you will develop some general-purpose data structures that, with modular design, can be re-used for other labs - most notably, the Tiny Search Engine.
Grading will focus on CS50 coding style - including consistent formatting, selection of identifier names, and use of meaningful comments - in addition to correctness and testing.
Your C code must compile without producing any compiler warnings. You will lose points if the compiler produces warnings when using our CS50-standard compiler flags.
If your submitted code fails to compile, or triggers a segmentation fault, we’ll notify you and give you an opportunity to repair and resubmit. (See programs that crash.) Write defensive code: each function should check its pointer parameters for NULL, and take some appropriate (safe) action.
Obtain our starter kit, which implements the bag module and includes the readlinep and memory module, to add three new modules, each of which defines a different data structure.
- (30 points) set
- (30 points) counters
- (40 points) hashtable
You can copy the starter kit to your own work site,
mkdir -p ~/cs50/labs/lab3 cp -r ~cs50/public_html/Labs/Lab3/starter/* ~/cs50/labs/lab3/
or to your Mac using
scp -r username@flume:~cs50/public_html/Labs/Lab3/starter lab3
The specific data structures are defined in the sections below.
In the table below, we compare these data structures with the two we explored in class. Most of these data structures allow you to store a collection of “items”. Both the set and hashtable are examples of an abstract data structure called a dictionary, which provide methods like insert(key, item) and item = retrieve(key); the key allows the structure to distinguish among the items it stores.
|stores an item||yes||yes||yes||no||yes|
|uses a key||no||no||yes||yes||yes|
|keeps items in order||yes||no||no||no||no|
|retrieval||first item||any item||by key||by key||by key|
|insertion of duplicates||allowed||allowed||error||increment count||error|
- a list keeps items in order, but a bag or a set does not.
- a set and hashtable allow you to retrieve a specific item (indicated by its key) whereas a bag might return any item.
- because the bag and list don’t distinguish among items they store, they can hold duplicates; the others cannot.
- the counters data structure maintains a set of counters, each identified by a key, but it stores no items. Instead, it keeps a counter for each key; inserting a duplicate key increments the counter.
A bag is an unordered collection of (items). The bag starts empty, grows as the caller adds one item at a time, and shrinks as the caller extracts one item at a time. It could be empty, or could contain hundreds of items. Items are indistinguishable, so the extract function is free to return any item from the bag.
The starter kit includes our bag module;
bag.c implements a bag of
void*, and exports exactly the following functions through
/* Create a new (empty) bag; return NULL if error. */ bag_t *bag_new(void); /* Add new item to the bag; a NULL bag is ignored; a NULL item is ignored. */ void bag_insert(bag_t *bag, void *item); /* Return any data item from the bag; return NULL if bag is NULL or empty. */ void *bag_extract(bag_t *bag); /* Print the whole bag; provide the output file and func to print each item. * If fp==NULL; do nothing. If bag==NULL, print (null). * If itemprint==NULL, print nothing for each item. */ void bag_print(bag_t *bag, FILE *fp, void (*itemprint)(FILE *fp, void *item)); /* Iterate over the whole bag; call the given function on each item, * passing both the item and an argument. Ignore if NULL bag or NULL itemfunc. */ void bag_iterate(bag_t *bag, void *arg, void (*itemfunc)(void *arg, void *item) ); /* Delete the whole bag; ignore NULL bag. * Provide a function that will delete each item (may be NULL). */ void bag_delete(bag_t *bag, void (*itemdelete)(void *item) );
A set maintains an unordered collection of (key,item) pairs; any given key can only occur in the set once. It starts out empty and grows as the caller inserts new (key,item) pairs. The caller can retrieve items by asking for their key, but cannot remove or update pairs. Items are distinguished by their key.
set.c should implement a set of
char* keys, and export exactly the following functions through
/* Create a new (empty) set; return NULL if error. */ set_t *set_new(void); /* Insert item, identified by a key (string), into the given set. * The key string is copied for use by the set. * Return false if key exists, any parameter is NULL, or error; * return true iff new item was inserted. */ bool set_insert(set_t *set, const char *key, void *item); /* Return the item associated with the given key; * return NULL if set is NULL, key is NULL, or key is not found. */ void *set_find(set_t *set, const char *key); /* Print the whole set; provide the output file and func to print each item. * Ignore if NULL fp. Print (null) if NULL set. * Print a set with no items if NULL itemprint. */ void set_print(set_t *set, FILE *fp, void (*itemprint)(FILE *fp, const char *key, void *item) ); /* Iterate over all items in the set, in undefined order. * Call the given function on each item, with (arg, key, item). * If set==NULL or itemfunc==NULL, do nothing. */ void set_iterate(set_t *set, void *arg, void (*itemfunc)(void *arg, const char *key, void *item) ); /* Delete the whole set; ignore NULL set. * Provide a function that will delete each item (may be NULL). */ void set_delete(set_t *set, void (*itemdelete)(void *item) );
A counter set is a set of counters, each distinguished by an integer key.
It’s a set - each key can only occur once in the set - but instead of storing (key,item) pairs, it tracks a counter for each key.
It starts empty.
counters_add is called on a given key, that key’s counter is incremented.
The current counter value can be retrieved by asking for the relevant key.
counters.c should implement a set of integer counters with
int keys (where keys must be non-negative) and export exactly the following functions through
/* Create a new (empty) counter structure; return NULL if error. */ counters_t *counters_new(void); /* Increment the counter indicated by key; key must be >= 0. * If the key does not yet exist, create a counter for it and initialize to 1. * Ignore if ctrs is NULL or key is negative. * Return the new value of the counter related to the inserted key */ int counters_add(counters_t *ctrs, const int key); /* Return current value of counter associated with the given key; * return 0 if ctrs is NULL or if key is not found. */ int counters_get(counters_t *ctrs, const int key); /* Set the current value of counter associated with the given key; * If the key does not yet exist, create a counter for it and initialize to * the given value. Ignore if ctrs is NULL, if key < 0, or count < 0. */ void counters_set(counters_t *ctrs, const int key, int count); /* Print all counters; provide the output file. * Ignore if NULL fp. Print (null) if NULL ctrs. */ void counters_print(counters_t *ctrs, FILE *fp); /* Iterate over all counters in the set (in undefined order): * call itemfunc for each item, with (arg, key, count). * If ctrs==NULL or itemfunc==NULL, do nothing. */ void counters_iterate(counters_t *ctrs, void *arg, void (*itemfunc)(void *arg, const int key, int count)); /* Delete the whole counters. ignore NULL ctrs. */ void counters_delete(counters_t *ctrs);
A hashtable is a set of (key,item) pairs. It acts just like a set, but is far more efficient for large collections.
hashtable.c should implement a set of
char* keys, and export exactly the following functions through
/* Create a new (empty) hashtable; return NULL if error. */ hashtable_t *hashtable_new(const int num_slots); /* Insert item, identified by key (string), into the given hashtable. * The key string is copied for use by the hashtable. * Return false if key exists in ht, any parameter is NULL, or error; * return true iff new item was inserted. */ bool hashtable_insert(hashtable_t *ht, const char *key, void *item); /* Return the item associated with the given key; * return NULL if hashtable is NULL, key is NULL, key is not found. */ void *hashtable_find(hashtable_t *ht, const char *key); /* Print the whole table; provide the output file and func to print each item. * Ignore if NULL fp. Print (null) if NULL ht. * Print a table with no items if NULL itemprint. */ void hashtable_print(hashtable_t *ht, FILE *fp, void (*itemprint)(FILE *fp, const char *key, void *item)); /* Iterate over all items in the table; in undefined order. * Call the given function on each item, with (arg, key, item). * If ht==NULL or itemfunc==NULL, do nothing. */ void hashtable_iterate(hashtable_t *ht, void *arg, void (*itemfunc)(void *arg, const char *key, void *item) ); /* Delete the whole hashtable; ignore NULL ht. * Provide a function that will delete each item (may be NULL). */ void hashtable_delete(hashtable_t *ht, void (*itemdelete)(void *item) );
The starter kit provides code for the hash function.
- Your modules must implement exactly the above interface. Do not modify those function prototypes.
- If you need helper functions or data types (likely
structtypes), those should be defined within your module.c and marked
staticso they are not visible to users of the module.
- Your modules must print nothing (except, of course, in the
xxx_print()method). If you want to add debugging printfs, they must be protected by something like
#ifdef TEST. (You can see some examples in
bag.cwhere I’ve protected some debugging code with
#ifdef MEMTEST, and a spot in the
bag/Makefilethat controls that flag from the compiler command line.)
- Your modules must have no global variables.
- Your modules must have no
main()function; as modules, they are meant to be used by other programs. Recall how the module defined by
bag.his used by a test program
- Your modules store
void*items; this type is C’s way of describing a “pointer to anything”. The caller (user of your module) must pass a pointer (address of some item) to you; your data structure holds that pointer, and later returns it to the caller in response to an ‘extract’ or ‘find’ operation. Your module doesn’t know, or doesn’t care, what kind of things are these items. Your module doesn’t allocate memory for items, free memory for items, or copy items - it just tracks the pointer to the item.
- For all modules, the caller is responsible for the item pointer, which must be allocated (somehow) by the caller.
_deletefunction (like a destructor) allows the caller to provide a custom
itemdeletefunction that knows how to free any memory consumed by an item.
- For this reason, the caller must avoid inserting the same item (same address) multiple times; later, the
itemdeletemethod would be called multiple times on that item… which could lead to trouble.
- Both set and hashtable work with string-type keys.
When adding a new item with set_insert or hashtable_insert, both modules make their own copy of the string - presumably in memory allocated by
malloc(). (The module is then responsible for this memory - and later freeing it - just like any other memory it allocates.) This ‘copy’ semantics is convenient for the caller, who need not worry about how to allocate and manage the key string after inserting it into the set or hashtable.
- You may assume that a non-NULL
keyis a proper C string; that is, it is null-terminated.
- You may use the memory module - or use the native
valgrind- or not. But we’ll be checking your code for memory leaks.
- Your modules must each have a
Makefileto compile and test the module code.
- Your directory for module
xmust, therefore, have an
You are encouraged to follow the style and layout of the bag module when developing new modules.
You can also learn a lot from our binary trees example. You are welcome to copy snippets of code from this (or any other) CS50 example code as long as you add a comment indicating you’ve done so.
We suggest implementing the set and counters as simplified linked lists, much like we did for bag. Each should be a separate implementation because they differ in detail and semantics.
Your hashtable module, on the other hand, should make use of the set data structure. Indeed, your hashtable will be an array of pointers to sets. Allocating an array of pointers can be tricky; consider these tips.
What to hand in, and how
Make sure to compile and test your solutions on one of the CS Linux systems before submission. If you choose to develop your solutions on some other system, you are responsible to ensure that the work you submit runs correctly on a CS system — which is where where we will test it!
Indeed, you must place your solutions in your
cs50/labs/lab3 directory on the department Linux servers.
(If you worked on your laptop, use
scp to copy files from your laptop to the server.)
In addition to your code, each of the four subdirectories of
lab3 must include two simple text files:
README, which describes how your program should be compiled and executed, along with an explanation of any assumptions you made about the assignment. See the README.md for the bag module.
TESTING, which describes how you tested the program, including any test inputs, test programs, and test scripts; these test files should be included in your submission. See the TESTING.md for the bag module in the starter kit.
These examples are written in Markdown format; hence the filename extension
.md. You’ll need to use Markdown in later labs, so now’s a great time to learn. In this lab, the use of Markdown is optional.
Think of your audience for each file:
READMEfile is written for the user of your module. For example, bag/README.md refers to the interface (
bag.h) and describe any assumptions, implementation details, or compilation instructions (basically,
TESTINGfile is written for your grader. For example, bag/TESTING.md describes how we tested
bag.c, by referring to
bagtest.cand including the results of a test run in
bagtest.out. If your test requires some input files, include those data files in your submission and refer to them in your
TESTINGfile. (In the solution, the Makefile generates the necessary input files by downloading them from the web.)
lab3 directory must contain the following:
- Five subdirectories:
commondirectory with contents inherited from our starter kit
README.md) explains any assumptions and acknowledges any limitations.
TESTING.md) shows how you tested the module.
- Include any special input files you used for testing.
Submitting your lab
Before the deadline, run
Make sure it confirms success.
If you need to update your submission, run the same command again; it will overwrite the prior submission with the current contents of your
If you wish to use one of your 24-hour extensions, run this command before the deadline:
~cs50/labs/submit 3 extension
This command deletes any submission you may have made previously, and leaves a single file “extension” there as an indication you are requesting an extension. When you are later ready to submit your work, do so as above:
This overwrites any prior submission - even the extension request.
(Thus, if you submit before the deadline, you effectively ‘cancel’ your request for an extension.) We will grade the first submission present at 0h, 24h, 48h, or 72h after the original deadline.
To avoid confusion, please blitz
cs50 AT cs.dartmouth.edu if you want a late submission to be graded instead of your on-time submission.