Data storage and compression
- Data storage is measured in units that, above the byte, are each 1024 times the one before.
- We can calculate the size of an image or sound file.
- Compression then makes files smaller.
Measuring storage
- bit (one 0/1) · nibble (4 bits) · byte (8 bits).
- Then each unit is 1024× the one before (because $1024 = 2^{10}$): kibibyte (KiB) = 1024 bytes, mebibyte (MiB) = 1024 KiB, gibibyte (GiB) = 1024 MiB, tebibyte (TiB) = 1024 GiB.
- To step up a unit, always divide by 1024 (not 1000); to step down a unit, multiply by 1024.
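The stepping-up rule can be sketched in a few lines of Python. This is only an illustration: the function name `to_largest_unit` and the `UNITS` list are made up for this example, and the loop simply divides by 1024 until the value drops below 1024 or the largest unit is reached.

```python
# Unit names in increasing order; each is 1024 times the one before.
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB"]

def to_largest_unit(num_bytes):
    """Step up one unit at a time, dividing by 1024 each step."""
    value = float(num_bytes)
    unit_index = 0
    # Stop when the value is below 1024 or there is no larger unit left.
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return value, UNITS[unit_index]

print(to_largest_unit(1024))       # (1.0, 'KiB')
print(to_largest_unit(3_145_728))  # (3.0, 'MiB')
```

Dividing by 1000 instead would give the decimal units (kB, MB), which are not the ones used in these notes.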
Practice
How many bytes are in 1 kibibyte (KiB)?
1 KiB = 1024 bytes (each storage unit steps up by $1024 = 2^{10}$).
Calculating file size
- Image (bits) $=$ width × height × colour depth.
- Sound (bits) $=$ sample rate × sample resolution × seconds.
- Divide by 8 for bytes, then by 1024 for KiB, again for MiB.
- e.g. $1024 \times 1024$ at 16 bits per pixel (bpp) $= 16\,777\,216$ bits $= 2\,097\,152$ bytes $= 2$ MiB.
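The two formulas and the unit conversion can be checked with a short Python sketch. The function names are invented for this example. The sound values (44 100 Hz, 16-bit, 10 seconds) are illustrative assumptions rather than figures from these notes, and the sound is treated as a single channel because the formula above has no channel term.

```python
def image_size_bits(width, height, colour_depth):
    """Image size in bits = width x height x colour depth."""
    return width * height * colour_depth

def sound_size_bits(sample_rate, sample_resolution, seconds):
    """Sound size in bits = sample rate x sample resolution x seconds."""
    return sample_rate * sample_resolution * seconds

def bits_to_mib(bits):
    """Divide by 8 for bytes, by 1024 for KiB, and by 1024 again for MiB."""
    return bits / 8 / 1024 / 1024

# The worked example: 1024 x 1024 pixels at 16 bits per pixel.
image_bits = image_size_bits(1024, 1024, 16)
print(image_bits)               # 16777216
print(bits_to_mib(image_bits))  # 2.0

# Assumed sound clip: 44 100 samples per second, 16 bits per sample, 10 seconds.
sound_bits = sound_size_bits(44_100, 16, 10)
print(sound_bits)               # 7056000
print(sound_bits / 8 / 1024)    # 861.328125 (KiB)
```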
Practice
An image is 100 × 100 pixels with a colour depth of 24 bits. What is its size in bits?
width × height × colour depth = 100 × 100 × 24 = 240 000 bits.
Compression
- Lossless compression makes a file smaller with no permanent loss: the original can be rebuilt exactly. Run-length encoding (RLE) stores a run of repeats once (e.g. WWWWWWWW → "8 W").
- Lossy compression makes a file much smaller by permanently removing data (lower resolution/sample rate).
- Use lossless for text and program files; lossy for photos, music and video.
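A minimal run-length encoding sketch in Python shows why RLE counts as lossless: decoding gives back exactly the original. The names `rle_encode` and `rle_decode` are made up for this example, and storing each run as a `(count, symbol)` pair is just one possible representation of "8 W".

```python
def rle_encode(text):
    """Return a list of (count, symbol) pairs, one pair per run of repeats."""
    runs = []
    for symbol in text:
        if runs and runs[-1][1] == symbol:
            # Same symbol as the current run: increase its count.
            runs[-1] = (runs[-1][0] + 1, symbol)
        else:
            # A different symbol starts a new run.
            runs.append((1, symbol))
    return runs

def rle_decode(runs):
    """Rebuild the original text by repeating each symbol count times."""
    return "".join(symbol * count for count, symbol in runs)

encoded = rle_encode("WWWWWWWWBBB")
print(encoded)                               # [(8, 'W'), (3, 'B')]
print(rle_decode(encoded) == "WWWWWWWWBBB")  # True
```

On data with few repeats, such as "WBWB", every run has length 1, so the encoded form is no smaller than the original.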
Practice
Lossless compression:
Lossless keeps all the data (e.g. RLE); lossy permanently discards some.
Practice
Run-length encoding (RLE) compresses data by:
RLE replaces a run like WWWWWWWW with "8 W" — great when data has many repeats.
Practice
Lossy compression is most appropriate for:
Lossy suits media where a slight quality loss buys a much smaller file; exact data needs lossless.
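As a rough illustration of permanent loss, the Python sketch below halves the sample rate of a short list of sound samples. The sample values and function names are invented for this example, and real lossy formats use far more sophisticated methods. The point is only that the dropped samples cannot be rebuilt.

```python
def halve_sample_rate(samples):
    """Keep every second sample; the dropped samples are gone for good."""
    return samples[::2]

def stretch_back(samples):
    """Repeat each kept sample twice: restores the length, not the data."""
    restored = []
    for sample in samples:
        restored.extend([sample, sample])
    return restored

original = [10, 12, 15, 13, 9, 4, 2, 5]  # made-up sample values
smaller = halve_sample_rate(original)
print(smaller)             # [10, 15, 9, 2]

approx = stretch_back(smaller)
print(approx)              # [10, 10, 15, 15, 9, 9, 2, 2]
print(approx == original)  # False
```

The file is half the size, but the result is only an approximation of the original, which is acceptable for music and unacceptable for a program file.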
Key idea
- storage units step up by 1024 (KiB, MiB, GiB, TiB)
- image size = width × height × colour depth (bits); sound = rate × resolution × seconds
- lossless = exact recovery (RLE); lossy = smaller, permanent loss
- lossless for text/programs; lossy for photos/music/video