The `vm.c' module implements a simple virtual memory manager. Blocks of storage are named by integers and are swapped to and from disk transparently to the rest of the program. Pragmatically, the module is tuned to the assumption that the set of objects being managed consists of many small objects (tens of thousands to millions of them, tens of bytes each), with a working set that is small relative to the size of virtual memory (a few percent) and reasonably static.
The design decisions driven by these pragmatics are described below, starting with the in-memory layout:
 hashtable            bigbuf
------------       -------------
|          |       | unused    |
|          |       |-----------|
|          |       | 0         | <---------- (nullBlock)
|          |       | 0         |           |
|          |       |-----------|           |
|          |       |           |           |
|   ...    |       |    ...    |           |
|          |       |           |           |
|----------|       |-----------|           |
|     o----------> | o         |           |
|----------|       | next o----------      |
|          |       | user data |    |      |
|   ...    |       |-----------|    |      |
|          |       |           |    |      |
|          |       |    ...    |    |      |
|          |       |           |    |      |
|          |       |-----------|    |      |
|          |       | o         | <---      |
                   | next o----------      |
                   | user data |    |      |
                   |-----------|    |      |
                   |           |    |      |
                   |    ...    |    |      |
                   |           |    |      |
                   |-----------|    |      |
                   | o         | <---      |
                   | next o-----------------
                   | user data |
                   |-----------|
                   |           |
                   |    ...    |
                   |           |
This gives us a fixed overhead of sixteen bytes per in-memory
object. (The hashtable vector itself requires four bytes
per hashtable chain, of course, which must be amortized over
the objects on the chain. To keep performance up, we want
to keep the chains short, so this should likely be counted
as an additional 2-4 or so bytes of overhead per
ram-object.)
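The amortized arithmetic above can be checked with a one-line sketch (illustrative code, not from `vm.c'):

```c
#include <assert.h>

/* Per-object memory overhead: a fixed header plus one hashtable slot
 * shared by every object on the same chain.  With 16-byte headers and
 * 4-byte slots, chains averaging one to two objects cost an extra
 * 2-4 bytes per object, as claimed above. */
static double per_object_overhead(double header_bytes,
                                  double slot_bytes,
                                  double avg_chain_length)
{
    return header_bytes + slot_bytes / avg_chain_length;
}
```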
Recall that the bottom five bits of Vm_Obj pointers are
free for use. This means in particular that the bottom
five bits of the 'o' field in our bigbufBlocks are free for
use. We always set the low bit to '1', for reasons which
will become clear in a paragraph. We use the next bit as a
DIRTY flag recording which objects need to be written to
disk. (The remaining three bits are available to users via
vm_Get_Userbits(o) and vm_Set_Userbits(o,bits); at the moment
only two are actually made available, which needs fixing.)
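One plausible rendering of this bit layout in C (the masks, shifts, and lower-case names below are illustrative guesses, not the actual vm.c declarations):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tag-bit layout for the 'o' field of an in-memory block,
 * following the text above: bit 0 is always 1 for a live object, bit 1
 * is the DIRTY flag, and bits 2-4 are the three user bits. */
#define LIVE_BIT    0x01u
#define DIRTY_BIT   0x02u
#define USER_SHIFT  2
#define USER_MASK   (0x07u << USER_SHIFT)

static uint32_t get_userbits(uint32_t o)                /* cf. vm_Get_Userbits */
{
    return (o & USER_MASK) >> USER_SHIFT;
}

static uint32_t set_userbits(uint32_t o, uint32_t bits) /* cf. vm_Set_Userbits */
{
    return (o & ~USER_MASK) | ((bits & 0x07u) << USER_SHIFT);
}

static int is_dirty(uint32_t o) { return (o & DIRTY_BIT) != 0; }
```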
Since we don't really expect to use a 4Gbyte bigbuf any time
soon, we can likewise steal a few bits from the bottom of
our bigbufBlock->next field. We steal six bits (thus
limiting ourselves to a bigbuf of no more than 64Mbytes ---
remember that bigbufBlocks are int-aligned) and use them to
hold size-in-bytes information for data blocks up through
the 33-64 byte octave. Beyond this octave, we prepend a
64-bit length field to our bigbufBlock, containing
fieldlength << 8. (Since this length field always has the
low bit set to 0, and our 'o' fields always have the low bit
set to 1, we can still unambiguously step through bigbuf
when compacting.)
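A sketch of the two encodings, and of the low-bit test that lets a compaction sweep tell a length word from an 'o' field (bit positions and names are illustrative, not vm.c's actual ones):

```c
#include <assert.h>
#include <stdint.h>

/* Small blocks (up through 64 bytes): size rides in the six low bits
 * stolen from 'next'.  Encoding size-1 lets sizes 1..64 fit in six bits. */
#define SIZE_BITS 6
#define SIZE_MASK ((1u << SIZE_BITS) - 1u)

static uint32_t pack_next(uint32_t next, uint32_t size_bytes)
{
    return (next << SIZE_BITS) | ((size_bytes - 1u) & SIZE_MASK);
}

static uint32_t unpack_size(uint32_t next_field)
{
    return (next_field & SIZE_MASK) + 1u;
}

/* Large blocks: a prepended length word holding length << 8, so its low
 * bit is always 0, while 'o' fields always have the low bit set to 1.
 * A compaction sweep can therefore classify any word it lands on. */
static uint32_t make_length_word(uint32_t nbytes) { return nbytes << 8; }
static uint32_t length_word_bytes(uint32_t word)  { return word >> 8;  }
static int      is_object_word(uint32_t word)     { return (word & 1u) != 0; }
```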
This headerblock scheme seems nearly optimal: Clearly any
hashtable design will require a key field and a 'next' field
for each object, plus a root array to anchor and index the
chains, so we can hardly reduce our overhead much. Mallocs
tend to have a similar amount of overhead, which also
suggests we're near the practical minimum.
This design is, I hope, fairly immune to pathologically bad behavior; that immunity is more important to me than incrementally better average- or best-case behavior. The design adopted also appears to have near-minimal impact on the design and implementation of the rest of the system.
By the way: `vm.c' should work as a pretty decent compacting memory manager even if you never need to page anything to disk...