A lossless data compression algorithm which uses a small number of bits to encode common characters. Huffman coding approximates the probability for each character as a power of 1/2 to avoid complications associated with using a nonintegral number of bits to encode characters using their actual probabilities.
Huffman coding works on a list of weights by building an extended binary tree with minimum weighted external path length and proceeds by finding the two smallest s, and , viewed as external nodes, and replacing them with an internal node of weight . The procedure is them repeated stepwise until the root node is reached. An individual external node can then be encoded by a binary string of 0s (for left branches) and 1s (for right branches).
The procedure is summarized below for the weights 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, and 41 given by the first 13 primes, and the resulting tree is shown above (Knuth 1997, pp. 402-403). As is clear from the diagram, the paths to the larger weights are shorter than those to the smaller weights. In this example, the number 13 would be encoded as 1010.
2 | 3 | 5 | 7 | 11 | 13 | 17 | 19 | 23 | 29 | 31 | 37 | 41 |
5 | 5 | 7 | 11 | 13 | 17 | 19 | 23 | 29 | 31 | 37 | 41 | |
10 | 7 | 11 | 13 | 17 | 19 | 23 | 29 | 31 | 37 | 41 | ||
17 | 11 | 13 | 17 | 19 | 23 | 29 | 31 | 37 | 41 | |||
17 | 24 | 17 | 19 | 23 | 29 | 31 | 37 | 41 | ||||
24 | 34 | 19 | 23 | 29 | 31 | 37 | 41 | |||||
24 | 34 | 42 | 29 | 31 | 37 | 41 | ||||||
34 | 42 | 53 | 31 | 37 | 41 | |||||||
42 | 53 | 65 | 37 | 41 | ||||||||
42 | 53 | 65 | 78 | |||||||||
95 | 65 | 78 | ||||||||||
95 | 143 | |||||||||||
238 |
The following Wolfram Language code can be used to construct the list of internal nodes and table of iterations:
HuffmanStep[l0_List] := Module[ { l = l0, s2 = Take[Select[Sort[l0], Positive], 2] }, l[[Take[Flatten[Position[l, #]& /@ s2], 2]]] = 0; l[[Last[Position[l, 0]]]] = Plus @@ s2; {l, s2} ] HuffmanList[l_List] := Module[{}, Plus @@@ Last /@ NestWhileList[ HuffmanStep[First[#]]&, HuffmanStep[l], Length[Union[First[#]]] > 2&] ] HuffmanTable[l_List] := NestWhileList[First[HuffmanStep[#]]&, l, Length[Union[#]] > 2&]