Matt Mahoney's Home Page

Dissertation proposal: The Complexity of Natural Language (compressed PostScript)

Text Compression as a Test for Artificial Intelligence

Abstract: The Turing test for artificial intelligence is widely accepted, but it is subjective, qualitative, non-repeatable, and difficult to implement. An alternative test without these drawbacks is to insert a machine’s language model into a predictive encoder and compress a corpus of natural language text. A compression rate of 1.3 bits per character or less indicates that the machine has AI. Three pieces of evidence support this claim. First, text compression is shown to be more stringent than the Turing test under reasonable assumptions. Second, humans use high-level knowledge in character prediction tests. Third, compression, like AI, is unsolved: under conditions in which human text-prediction tests show an entropy of 1.3 bits per character or less, the best compression algorithm known achieves 1.87 bits per character.
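
To make the bits-per-character measure concrete, here is a minimal sketch of how a predictive model can be scored on a text. It is not the model or encoder from the paper: the function name `bits_per_character`, the simple adaptive order-k character model, the add-one smoothing, and the 256-symbol alphabet are all assumptions chosen for illustration. The sketch sums the ideal code length, -log2 p, of each character under the model's prediction, which is approximately what an arithmetic coder driven by that model would output.

```python
import math
from collections import defaultdict


def bits_per_character(text, order=2, alphabet_size=256):
    """Average ideal code length (bits/char) of `text` under an adaptive
    order-`order` character model with add-one smoothing.

    Illustrative assumptions: characters are treated as symbols from an
    alphabet of `alphabet_size` (e.g. 8-bit text), and the model is a
    plain context-count table, not the paper's language model.
    """
    if not text:
        return 0.0

    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    total_bits = 0.0

    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        # Predict first, using only the characters seen so far.
        p = (counts[context][ch] + 1) / (totals[context] + alphabet_size)
        total_bits += -math.log2(p)
        # Then update the model, as a decoder could do in lockstep.
        counts[context][ch] += 1
        totals[context] += 1

    return total_bits / len(text)
```

Calling `bits_per_character(corpus)` on a large sample of English text gives a figure that can be set against the 1.3 bits per character threshold above. A model this simple should not be expected to come close to that threshold; the point is only to show how the measurement is taken.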

Full text: PostScript, RTF (Word 6.0). This paper is still in progress. Last update 10/20/98.

Everything else is on my other home page

matmahoney@aol.com