TechUser.Net

Digital Implementation of the Library of Babel

In the well-known story "The Library of Babel" author Jorge Luis Borges envisions a library that contains all possible books of 410 pages. The library (Library of Babel) contains all books that have ever been written and also includes all possible books that can ever be written. The library is not limited to books that make sense. It stores all books consisting of any combination of a 25 symbol alphabet.

On first thought it seems that a digital implementation of the Library of Babel is impossible as it contains almost unimaginable amount of content. Surprisingly, this is not the case and digital implementations of such a library can exist.

In the digital Library of Babel every book is stored in ascii. Ascii allows more symbols than the 25 used by the Library of Babel, but is more convenient to use and avoids the effort of defining a new characterset.

A digital Library of Babel needs to store all the books in a data-structure and provide an interface for people to search for a particular book. To locate a particular book in the Library of Babel, searches based on titles, author names, and ISBN numbers cannot be used.

Titles don't provide any useful information about the books in the library. The library contains gazillions of books with the same title but different contents. Exactly the same argument applies to author names as every author has written every possible book.

ISBN numbers in order to be practical must identify a book uniquely but not contribute to the book's uniqueness. Otherwise, the library will contain many copies of every book which differ only in ISBN numbers.

Even with this restriction ISBN numbers are useless in the Library of Babel. The ISBN number identifying a book (on average) has to be as big as the contents of the book. ISBN numbers are practical only if the ratio of books that exist to books that are possible is very small. Suppose the world has only two books, written with symbols a and b, and having the contents:

abab,baaa

It is easy to identify the first book with the tag a and the second book with the tag b. The Library of Babel is not so sparsely populated. It contains all possible books which includes the following list:

a,b,c..,aa,ab,ac..,ba,bb,bc..

It is possible to identify the books consisting of abab and baaa with a and b respectively, but then the books with contents a and b have to be identified with a larger tag than their length. On average an identification tag as big as the book itself will be required.

Without loss of generality it can be assumed that books in the Library of Babel don't have titles, author names, ISBN numbers, and any other identifying tags. Such tags as noted above don't contribute anything towards easing searching for books and are only a waste of space.

The only choice for uniquely identifying the books in Library of Babel is the text of the books. Specifying only partial text does not identify books uniquely. The library is too densely populated and even leaving one bit out of the full-text results in non-unique matches. Therefore, the full-text of a book is required in order to locate a particular book.

Unlike the original library the digital version does not impose any restrictions on the length of the books contained in it. The digital Library of Babel contains the books as raw text. It does not store any formatting for the books. The users are free to format the books as they wish after they have retrieved them.

The following data-structure given as Haskell code can be used to implement the library:

data Tree = Branch (Tree,Tree)
treeOfBabel = Branch (treeOfBabel,treeOfBabel)

The implementation uses a circular data-structure, treeOfBabel, to store all the data in the library using O(1) storage space. Searching for a book involves traversing the correct path in the tree. At each branch in the tree the first tree corresponds to a 0 bit and the second tree corresponds to a 1 bit. Given a search string, the search function navigates the tree using the bits of the search string. It accumulates the bits of the result while navigating the tree and after processing the search string it returns the accumulated bits as the result of the search. These bits can then be interpreted as ascii text.

The following implementation of the search function uses a slight optimization:

searchBabel = id -- id is Haskell's identity function (id x = x)

The optimization is based on the fact that the user is always going to provide the full-text of the book so the input can be returned to the user as an answer to his/her query. This optimization does not work for ordinary libraries as they might or might not contain a book but the Library of Babel is assured to have any book.

The library of Babel exists, there can be no doubt about it. The big question that needs to be answered is whether the library of Babel is full of information or is it devoid of any information? The perspective of a philosopher suggests that the library is as full as it can be. On the other hand intuition suggests that the library is empty, it carries no useful information. Is our intuition right or wrong?

I will examine this issue in detail, using insights from information theory, in another article.




by Usman Latif  [Nov 08, 2003]