From-Scratch Heap File and Hash Index
Originally written for an undergraduate database systems class I took.
It is still interesting to me as a from-scratch implementation of part of a database, and revisiting the code to fix the bugs that are surely hiding in it somewhere should be a good exercise.
It will also be funny to see how bad the Java code is.
Usage
The dataset used in this project is available at https://data.gov.au/data/dataset/asic-business-names, but I've also copied a subset of it into ./data/.
dbload
Implements a heap file in Java: loads a tab-separated (.tsv) database relation and writes it out as a heap file.
bazel run //databases/fromscratch_heapfile_and_hashindex:dbload -- \
-p 4096 \
"$(git rev-parse --show-toplevel)/databases/fromscratch_heapfile_and_hashindex/data/TRUNCATED_DATASET_WO_HEADER.csv"
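The heap file idea can be illustrated with a minimal in-memory sketch. It assumes a hypothetical page layout (a 4-byte record count, then length-prefixed UTF-8 records, zero-padded to the page size); the on-disk format dbload actually writes may differ, and `HeapFileSketch`, `write` and `read` are illustrative names rather than classes from this project.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical page layout, not necessarily the one dbload writes:
// [4-byte record count][4-byte length][UTF-8 bytes]... zero-padded to pageSize.
final class HeapFileSketch {
    private HeapFileSketch() {}

    /** Packs records into consecutive pages of exactly pageSize bytes. */
    static byte[] write(List<String> records, int pageSize) {
        if (pageSize <= Integer.BYTES) {
            throw new IllegalArgumentException("page too small for its header");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer page = ByteBuffer.allocate(pageSize);
        page.putInt(0); // header placeholder, filled in when the page is flushed
        int count = 0;
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            int needed = Integer.BYTES + bytes.length;
            if (needed > pageSize - Integer.BYTES) {
                throw new IllegalArgumentException("record does not fit in one page");
            }
            if (page.remaining() < needed) { // current page is full: start a new one
                flush(out, page, count);
                count = 0;
            }
            page.putInt(bytes.length);
            page.put(bytes);
            count++;
        }
        if (count > 0) {
            flush(out, page, count);
        }
        return out.toByteArray();
    }

    /** Writes the record count into the header, emits the page, and resets the buffer. */
    private static void flush(ByteArrayOutputStream out, ByteBuffer page, int count) {
        page.putInt(0, count); // absolute put: does not move the position
        out.write(page.array(), 0, page.capacity());
        Arrays.fill(page.array(), (byte) 0);
        page.clear();
        page.putInt(0);
    }

    /** Reads every record back, page by page. */
    static List<String> read(byte[] file, int pageSize) {
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset + pageSize <= file.length; offset += pageSize) {
            ByteBuffer page = ByteBuffer.wrap(file, offset, pageSize).slice();
            int count = page.getInt();
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[page.getInt()];
                page.get(bytes);
                records.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return records;
    }
}
```

For example, writing "alpha", "beta" and "gamma" with a 16-byte page size yields three pages (48 bytes): only 12 bytes follow each page header, and no two of those records (4 bytes of length plus their text) fit together. Fixed-size pages mean page N always starts at byte N * pageSize, so a record can be located from its page number alone.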
hashload
A hash indexer that reads the heap file and builds an index from it, written to a file named hash.<pagesize>.
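One way a hash index over the heap file could work is static hashing: each key is hashed to one of a fixed number of buckets, and each bucket entry stores the key together with the record's location (page number, slot). This sketch keeps the buckets in memory, assumes the indexed key is the whole record string, and uses Java records (Java 16+); `HashIndexSketch`, `RecordId` and the bucket count are hypothetical and not taken from hashload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical static hash index: key -> locations of matching records.
final class HashIndexSketch {
    /** Location of a record inside the heap file. */
    record RecordId(int page, int slot) {}

    private record Entry(String key, RecordId rid) {}

    private final List<List<Entry>> buckets;

    HashIndexSketch(int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private int bucketOf(String key) {
        // floorMod keeps the bucket number non-negative for negative hash codes
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    void insert(String key, RecordId rid) {
        buckets.get(bucketOf(key)).add(new Entry(key, rid));
    }

    /** Returns every record id stored under exactly this key, in insertion order. */
    List<RecordId> lookup(String key) {
        List<RecordId> result = new ArrayList<>();
        for (Entry e : buckets.get(bucketOf(key))) {
            if (e.key().equals(key)) { // skip other keys that collided into this bucket
                result.add(e.rid());
            }
        }
        return result;
    }

    /** Builds an index over a heap file modelled as a list of pages of records. */
    static HashIndexSketch build(List<List<String>> pages, int numBuckets) {
        HashIndexSketch index = new HashIndexSketch(numBuckets);
        for (int p = 0; p < pages.size(); p++) {
            List<String> page = pages.get(p);
            for (int s = 0; s < page.size(); s++) {
                index.insert(page.get(s), new RecordId(p, s));
            }
        }
        return index;
    }
}
```

A lookup examines a single bucket, and the key comparison inside it filters out any other keys that happen to hash to the same bucket.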
dbquery
Performs a text search over the heap file, with or without the hash index.
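The difference between querying with and without the index can be sketched as a full scan versus a hash lookup. Here the heap file is modelled as a list of pages, `java.util.HashMap` stands in for the on-disk hash index, and matches are reported as hypothetical `"page:slot"` strings; none of these names come from dbquery, and the query is assumed to be an exact match on the whole record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of the two query strategies, not dbquery's code.
final class QuerySketch {
    private QuerySketch() {}

    /** Without an index: read every page and compare every record. */
    static List<String> scan(List<List<String>> pages, String query) {
        List<String> hits = new ArrayList<>();
        for (int p = 0; p < pages.size(); p++) {
            List<String> page = pages.get(p);
            for (int s = 0; s < page.size(); s++) {
                if (page.get(s).equals(query)) {
                    hits.add(p + ":" + s);
                }
            }
        }
        return hits;
    }

    /** Builds key -> list of "page:slot" locations (HashMap standing in for the index file). */
    static Map<String, List<String>> buildIndex(List<List<String>> pages) {
        Map<String, List<String>> index = new HashMap<>();
        for (int p = 0; p < pages.size(); p++) {
            List<String> page = pages.get(p);
            for (int s = 0; s < page.size(); s++) {
                index.computeIfAbsent(page.get(s), k -> new ArrayList<>()).add(p + ":" + s);
            }
        }
        return index;
    }

    /** With an index: jump straight to the matching locations. */
    static List<String> probe(Map<String, List<String>> index, String query) {
        return index.getOrDefault(query, List.of());
    }
}
```

Both methods answer the same exact-match query, but the scan reads every page while the probe goes directly to the matching entries. A hash index cannot help with substring or prefix searches, so those would still need the full scan.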
Build
bazel build //databases/fromscratch_heapfile_and_hashindex/...
Tests
… I guess tests weren’t part of the assignment.