NIST

Zipf's law

(definition)

Definition: The probability of occurrence of words or other items starts high and tapers off. Thus, a few occur very often while many others occur rarely.

Formal Definition: Pn similar to 1/na, where Pn is the frequency of occurrence of the nth ranked item and a is close to 1.

See also Zipfian distribution, Lotka's law, Benford's law, Bradford's law.

Note: In the English language words like "and," "the," "to," and "of" occur often while words like "undeniable" are rare. This law applies to words in human or computer languages, operating system calls, colors in images, etc., and is the basis of many (if not, all!) compression approaches.

Named for George Kingsley Zipf.

Author: PEB

More information

A brief explanation of the law with links to dozens of papers. Two graphs illustrating the law.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.

Entry modified 24 April 2006.
HTML page formatted Mon Sep 11 09:46:09 2006.

Cite this as:
Paul E. Black, "Zipf's law", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 24 April 2006. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/zipfslaw.html

to NIST home page