Go to the first, previous, next, last section, table of contents.


Procedural Data Compression

Ok, I admit this is blue-sky research, but it happens to be close to my heart:

The ultimate form of data compression is when you compile the dataset into a program which, when executed, produces the original dataset.

It is important not because it is an immediately practical way of compressing datasets for storage or transmission, but because there is a deep connection between understanding a dataset and being able to compress it: Anything you genuinely know about a dataset allows you to compress it better, and conversely anything that allows you to compress a dataset better represents real knowledge of some sort about that dataset. (And contrawise, if it doesn't help you compress the dataset, it isn't knowledge, just fanciful projection of your own thoughts onto the screen of the dataset.)

This in turn is important because it puts the notion of "understanding" on a firm quantitative footing: Shannon's information theory gives us a solid theoretical basis for measuring and comparing information content. In conventional qualitative artificial intelligence hacking, whether and how much a system learns is very much a matter of subjective handwaving rather than precise calculation: "Looks smart to me!" By looking at it as a matter of data compression, one can say precisely and confidently things like "This bit of knowledge explains 3.716% of the given dataset".

This in turn means that given a precise, reliable notion of "understanding", automated searches for additional understanding become much more feasible than when all evaluation must be done by grad students waving their hands at each other.

Physics didn't get properly started until the qualitative medieval methods ("Objects fall faster as they approach the earth because they are glad to return home!") were replaced by Galileo's and then Newton's quantitative methods (d==vt + 0.5*at**2). The field of AI won't get properly started until it adopts appropriate quantitative methods -- and Shannon's information theory will be to AI what Cartesian coordinates and algebra were to physics.

I'd love to explore this further, and Muq, as a highly programmable persistent db of complex datasets, will I think be a productive environment to do so, once it hits its stride.

There are also a whole slew of technically fascinating programming issues involved in all this! For example, how does one best work with information measured in fractions of a bit? Conventional programming just shrugs and sloppily rounds off to some integral number of bits, but that won't cut it here. Arithmetic encoding provides one way of efficiently storing fractional-bit information, but it is far from constituting a complete fractional-bit programming methodology. One quickly winds up rediscovering the virtues of pattern-directed subroutine invocation (shades of the 50s!!) since explicit invocation is often prohibitively expensive in the context of procedural data compression and fractional-bit programming. (4)


Go to the first, previous, next, last section, table of contents.