Storage and Retrieval
What should a database do? 2 things
- Take in data and store it
- Give it back to you
Good to know how database handles tuff internally for picking between them
The data structures that power your database
Most basic database as 2 hash function
db_get
would have terrible performance for large number of records in database
Need a different, aditional data structure
- Index
Hash Indexes
Approaches to key-value data
- Simple- in-memory hashmap- each key mapped to byte offset in the data file- location at which the value can be found
- But how to avoid running out of disk space?
- Break logs into segments of certian size by cosing segment file when it reaches a certain size- make subsequent writes to a new segment file
- Then perform compaction on these segments (throwing away duplicate keuys) Details go into making thi practically work
- File format- Binary better than CSV
- Deleting record- append specal deletion record to data file
- Carsh recovery- if databse is restarted
- Partially written records
- Concurrency cnotrol
STables and LSM-Trees
Simple change- sort sequence by key
- Sorted String Table Advantages over log segments with hash indexes
- Merging segments is simple + efficient (is like the mergesort algorithm)
- To find key in file- no longer need index of all the keys
- Can use the fact its sorted to your advantage
- Since read requests need to scan over several key-value pairs in range anyways, can group the records into a block and compress it before writing to disk
Constructing and maintaining SStables
- how to get data sorted by key in the first lace? Writes can occur in any order Storage engine works as follows
- Write comes in and gets added to in-memory balanced tree data stucture (e.g. red-black ree)- this is called a memtable
- When memtable size exceeds some threshold,