Data Compression
1. Why Compress Data?
Compression is the process of reducing the size of a file. This is crucial for several reasons:
- Faster Transmission: Files take less time to upload/download or send via email.
- Storage Savings: More files can be stored on a drive (SSD/HDD).
- Streaming: Required for smooth video (YouTube/Netflix) and music streaming.
- Website Speed: Smaller images help web pages load faster.
2. Lossy vs. Lossless Compression
Lossy
Permanently removes "unnecessary" data that the human eye or ear cannot easily perceive.
File size: Significantly reduced.
Formats: JPEG, MP3, MP4.
Lossless
Reduces file size without losing any original data. The file can be reconstructed exactly.
File size: Moderately reduced.
Formats: PNG, GIF, ZIP, FLAC.
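The "reconstructed exactly" property of lossless compression can be checked directly. The sketch below uses Python's standard `zlib` module (the DEFLATE method also used inside ZIP and PNG). The sample data is made up for illustration.

```python
import zlib

# Made-up, highly repetitive sample data (an assumption for this demo)
original = b"AAAAABBBCCCDDDDD" * 100

compressed = zlib.compress(original)    # lossless compression
restored = zlib.decompress(compressed)  # reverse the process

print(len(original), len(compressed))   # the compressed size is much smaller
print(restored == original)             # every byte is recovered
```

Because the restored bytes are identical to the original, nothing was lost. How much the size shrinks depends on how repetitive the data is.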
3. How Lossless Works (RLE)
One common method of lossless compression is Run-Length Encoding (RLE). It finds runs of consecutive repeated values and stores each run as a count followed by the value.
Uncompressed Data:
AAAAABBBCCCDDDDD
RLE Compressed:
5A3B3C5D
In images, RLE works by identifying long runs of identical colored pixels.
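The RLE idea can be sketched in a few lines of Python. This is a minimal illustration, not a real image format: the function names `rle_encode` and `rle_decode` are made up here, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby

def rle_encode(text):
    """Store each run as its length followed by the character."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

def rle_decode(encoded):
    """Reverse rle_encode. Assumes the original text had no digits in it."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                      # counts can have several digits
        else:
            result.append(ch * int(count))   # expand the run
            count = ""
    return "".join(result)

encoded = rle_encode("AAAAABBBCCCDDDDD")
print(encoded)               # 5A3B3C5D
print(rle_decode(encoded))   # AAAAABBBCCCDDDDD
```

Note that RLE only helps when the data contains long runs. With this scheme, `rle_encode("ABCD")` gives `1A1B1C1D`, which is twice as long as the input.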
4. How Lossy Works
Lossy compression algorithms use Perceptual Coding:
- Images (JPEG): Stores color detail less precisely than brightness and discards fine detail in areas where the eye won't notice a change.
- Sound (MP3): Removes frequencies that the human ear cannot hear and removes quieter sounds that are "masked" by louder sounds.
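Why lossy compression is permanent can be shown with a simplified sketch. Real JPEG and MP3 encoders are far more complex (they use mathematical transforms and models of human perception), so the code below only illustrates the general idea of discarding small differences. The `quantize` function, the sample values, and the step size of 8 are all assumptions made for the demo.

```python
def quantize(samples, step):
    """Round each value to the nearest multiple of step, discarding fine detail."""
    return [(s + step // 2) // step * step for s in samples]

# Made-up brightness or volume values (0-255)
samples = [12, 13, 15, 200, 203, 198]

lossy = quantize(samples, 8)
print(lossy)             # [16, 16, 16, 200, 200, 200]
print(lossy == samples)  # False: the small differences are gone
```

After rounding, 12, 13 and 15 all become 16, so there is no way to tell which value was the original. The result also contains longer runs of identical values, which makes it easier to compress further.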
5. Comparison Summary
| Feature | Lossy | Lossless |
|---|---|---|
| Original Quality | Lost permanently | Kept perfectly |
| Compression Ratio | Very high (tiny files) | Lower (larger files) |
| Best for... | Photos, Video, Streaming | Text files, Spreadsheets, Code |
⚠️ Exam Note: Lossy compression must not be used for software programs or text files. If even one bit of a program's code is lost or changed, the entire program may fail to run!