BBC Home

Explore the BBC

h2g2
24th December 2009
Accessibility help
Text only

.

Conversation Forum


SEARCH h2g2
Edited Entries only
Search h2g2Advanced Search


New visitors: Create your membership
Returning members: Sign in
BBC Homepage
The Guide to Life, The Universe and Everything.

This is the Conversation Forum for MPEG Audio Layer 3 (MP3) Technical Guide
Contact Us


Like this page?
Send it to a friend!

 
Conversation list
<< King Canute
Exploding Myth >>

Huffman compression
Post: 1
Posted Nov 17, 2000 by The Cow
A similar system is used in morse code.
E, the most common letter, is "."
Q, is "--.-"



Reply 

No Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 2
Posted Jan 6, 2001 by Ausnahmsweise, wie üblich (Consistently inconsistent)
The given example is not quite accurate. No single code can be the prefix of another, longer code.
If you wanted to compress, say, written English, you would count the frequency of each letter in the sample text and sort them into ascending order. Then you start looking for the lowest two frequencies. You're going to build a tree from the leaves to the root. Add the two lowest frequencies, create a node with the value of the sum and connect the leaves that contributed to them. Next iteration, include the new node (but not the leaves) in the search for the next two lowest. Join them with a node. Eventually everything meets at a root node. Now you walk the tree, from the root, assigning, say 0 for a left branch and 1 for a right branch. When you reach a leaf (a letter) its Hufman code is the string of 0s and 1s you chained together. If the letter E represented more then 50% of the population you would have a very unbalance tree. "E" might be a single 1. No other letter's code would begin with one because they are all down the left branch of the tree. (A picture would really help.) All of this is in the classic programmer's text by Knut.

I have used it to compress text. I used adaptive Hufman first for the letters, then I did the whole thing over again for words. That meant that I could store the most common words as a bit or two. To uncompress them I have to decode the word to its string of letter codes and the letter codes into actual letters.

Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 3
Posted Jan 9, 2001 by The Cow
I realise it's not a perfect match (after all, Morse code predates Huffman by quite a while) but the concept is similar.

Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 4
Posted Jan 9, 2001 by Ausnahmsweise, wie üblich (Consistently inconsistent)
Sorry - I was referring to the example in the text of the article. I think it had 10 and 101 as two discrete codes. That cannot happen. The only way you can resolve ambiguity when decoding is because none of the shorter codes has the same beginning sequence as the longer ones.

Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 5
Posted Jan 11, 2001 by The Cow
Ah, right.
Yes, of course.

Got any idea what the shortest bit pattern usually is? Because having a 5-bit minimum gives you many less possiblities than a 6-bit minimum.

Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Removed
Post: 6
Posted Jan 11, 2001
This Posting has been hidden during moderation because it broke the House Rules in some way. You can find out more about moderation on h2g2 here.

Reply 

Previous PostNext Post
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 7
Posted May 2, 2001 by Ausnahmsweise, wie üblich (Consistently inconsistent)
In a very unbalanced population (say, over 50% of the characters were "E") then you can have a single bit code.
What the shortest usually is depends on the spread of the data. I only know about "adaptive" Huffman, which adapts to the current population. I assume there's another form that tries to be more general?

Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 8
Posted Jul 31, 2002 by Robbish
Thinking about Huffmann encoding, it is used to compress PC files (steps back in amazement). However, they are not always taken bit by bit. It is sometimes found that if you group the bites into groups of, say, 10 bits, the encoding can work much more effectively.

Also, with text, don't forget that some letters can be totally ignored, thus saving characters. With this little clump of text, you would not have to find an encryption for the letter 'Z'.

Except that I've just included it. But you know what I mean!!!


Reply 

Previous PostNext Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Huffman compression
Post: 9
Posted Jul 31, 2002 by Ausnahmsweise, wie üblich (Consistently inconsistent)
If you did not find a Z in the first pass when analyzing the population, then the frequency would be zero and it would not be assigned a bit pattern at all.


I've used it at two levels. I had codes for all of the letters in the population, but then I started looking at words too (which would be strings of compressed chars.). In the final compressed data a common word might be just one or two bits. Of course, I had to have more look-up tables to do the uncompress. First I'd decode the word, then I'd have to decode all of its characters. I squeezed more out of it by considering the leading space on a word as part of the word. (I think I used a one bit flag to say whether a word had a leading space or not - that way "THE" and " THE" would share the same Huffman code.)

All of this was used for messages in fire alarm panels, back when ROM was more expensive.

Awu
P.S. I was new to North Anerica and was goning through a dump of a crash. I starting piecing together a compressed message.

H A A G E N D A Z - I thought it was garbage and that I had a bug. It turns out the panel was in a shopping mall and one of the locations was a Haagen Daz ice cream shop. Phewww!



Reply 

Previous PostNo Next Post
Click to Make a Complaint
The Parent Posting, to Which This is a Reply
An Older Reply to the Parent PostingThis PostingA Newer Reply to the Parent Posting
The First Reply to This Posting

Key
Navigation Example
A: An older reply to the parent Posting
B: The parent Posting, to which this is a reply
C: A newer reply to the parent posting
D: The first reply to this Posting
Click to Make a Complaint
 Click on this icon to make a complaint about a specific Posting
Conversation list
<< King Canute
Exploding Myth >>






Disclaimer

Most of the content on h2g2 is created by h2g2's Researchers, who are members of the public. The views expressed are theirs and unless specifically stated are not those of the BBC. The BBC is not responsible for the content of any external sites referenced. In the event that you consider anything on this page to be in breach of the site's House Rules, please click here. For any other comments, please click on the Feedback button above.




About the BBC | Help | Terms of Use | Privacy & Cookies Policy