1.2 Data Storage

Representing Text

1. The Conversion Process

When you press a key on a keyboard, the computer does not "see" a letter. It follows a specific sequence to translate that physical action into digital data.

Key Pressed (Physical Act) → Character Set Scan (Lookup Table) → Unique Value (Denary/Hex Code) → Binary String (Stored in RAM)
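The sequence above can be traced in a few lines of Python. This is only a sketch: the function name `encode_character` is made up for illustration, and it assumes an 8-bit binary string for a character in the ASCII range.

```python
def encode_character(ch: str) -> dict:
    """Show one character as its denary, hex, and 8-bit binary values."""
    code = ord(ch)  # look up the character's unique value in the character set
    return {
        "character": ch,
        "denary": code,
        "hex": format(code, "02X"),
        "binary": format(code, "08b"),  # the bit pattern that is actually stored
    }

print(encode_character("A"))
# {'character': 'A', 'denary': 65, 'hex': '41', 'binary': '01000001'}
```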

2. Key Definitions

  • Character Set: A list of characters and the unique binary codes that represent each one.
  • Character: A single unit of information (letter, digit, space, or symbol).
  • String: A sequence of characters stored together (e.g., "Computer").

3. Calculating Text File Size

In the IGCSE exam, you may be asked to estimate the storage required for a piece of text. The logic is simple:

The Formula:

Total Size = Number of Characters × Bits per Character


Example: How much space does the word IGCSE take in standard 8-bit ASCII?
  • Number of characters: 5
  • Bits per character: 8
  • Total: $5 \times 8 = 40 \text{ bits}$ (or 5 Bytes)
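The same estimate can be checked with a short Python sketch. The helper name `text_size_bits` is hypothetical, and it assumes every character uses the same fixed number of bits, as in the formula above.

```python
def text_size_bits(text: str, bits_per_char: int = 8) -> int:
    """Total Size = Number of Characters x Bits per Character."""
    return len(text) * bits_per_char

bits = text_size_bits("IGCSE")
print(bits, "bits =", bits // 8, "bytes")  # 40 bits = 5 bytes
```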

4. Factors Affecting Text Storage

Two main factors change the size of a text file:

  1. Length of the Text: More characters = more bytes.
  2. The Character Set Used:
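    • ASCII: Standard ASCII is a 7-bit code (128 characters); in practice each character is stored in 1 byte, and extended ASCII uses all 8 bits (256 characters). Small and efficient for English.
    • Unicode (UTF-16): Uses 2 bytes for most common characters (and 4 bytes for rarer ones such as emoji). Roughly doubles the file size compared to ASCII for English text, but supports the writing systems of virtually all languages.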
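One way to see the effect of the character set is to encode the same word both ways and count the bytes. This sketch uses Python's built-in `str.encode`; the `"utf-16-le"` codec is chosen here so that no byte-order mark is added to the count.

```python
word = "IGCSE"

ascii_bytes = word.encode("ascii")      # 1 byte per character
utf16_bytes = word.encode("utf-16-le")  # 2 bytes per character for these letters

print(len(ascii_bytes))  # 5
print(len(utf16_bytes))  # 10

# Characters outside the most common range need 4 bytes in UTF-16.
emoji_bytes = "😀".encode("utf-16-le")
print(len(emoji_bytes))  # 4
```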

5. Sorting and Comparison

Because every character has a numeric value, computers can "alphabetize" text by comparing their binary codes.

  • Since 'A' is 65 and 'B' is 66, the computer knows 'A' comes first.
  • Warning: In ASCII, uppercase letters have lower values than lowercase letters ('A' = 65, 'a' = 97). This means "Zebra" would technically be sorted before "apple" in a raw binary sort!
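The warning above is easy to reproduce. Python's `sorted` compares strings by character code, so this short sketch shows both the raw ordering and one common workaround (comparing lowercase versions of each word).

```python
words = ["apple", "Zebra", "banana"]

print(ord("A"), ord("a"))            # 65 97
print(sorted(words))                 # ['Zebra', 'apple', 'banana']
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Zebra']
```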

Exam Summary:

To represent text, a Character Set (like ASCII or Unicode) is used to assign a unique binary value to every character. Standard ASCII uses 7 bits (128 characters) and extended 8-bit ASCII allows 256 characters (1 byte each), while Unicode can encode over a million characters (using multiple bytes for many of them) to support global languages and symbols.

Digital Images

1. The Bitmap Concept

Most images on a computer are stored as Bitmaps. A bitmap image is composed of a grid of tiny dots called Pixels (short for Picture Elements).

In a 1-bit image, each pixel is stored as a single bit, for example 0 = White and 1 = Black.

2. Key Terminology

  • Pixel: The smallest addressable element of a digital image.
  • Resolution: The number of pixels that make up an image (Width × Height). Higher resolution means more detail but larger file size.
  • Color Depth (Bit Depth): The number of bits used to represent the color of a single pixel.

3. Color Depth Calculations

The number of colors available is calculated by $2^n$, where $n$ is the bit depth.

| Bit Depth | Colors Available | Usage |
|---|---|---|
| 1-bit | $2^1 = 2$ | Monochrome (Black/White) |
| 8-bit | $2^8 = 256$ | Basic web graphics |
| 24-bit | $2^{24} \approx 16.7$ Million | "True Color" (8 bits each for R, G, B) |
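The values in the table can be reproduced directly. This is a minimal sketch; `colors_available` is a hypothetical helper name.

```python
def colors_available(bit_depth: int) -> int:
    """Number of distinct colors = 2 to the power of the bit depth."""
    return 2 ** bit_depth

for n in (1, 8, 24):
    print(f"{n}-bit: {colors_available(n):,} colors")
# 1-bit: 2 colors
# 8-bit: 256 colors
# 24-bit: 16,777,216 colors
```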

4. Estimating Image File Size

The Formula:

File Size (bits) = Resolution (W × H) × Color Depth


Example: An image is 1000 pixels wide, 500 pixels high, and uses 24-bit color.
  • Pixels: $1000 \times 500 = 500,000$
  • Size in bits: $500,000 \times 24 = 12,000,000 \text{ bits}$
  • Size in MiB: $12,000,000 \div 8 \div 1024 \div 1024 \approx 1.43 \text{ MiB}$
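The worked example can be checked with a short Python sketch. The function name `image_size_bits` is hypothetical, and the result covers pixel data only (no metadata, no compression).

```python
def image_size_bits(width: int, height: int, color_depth: int) -> int:
    """File Size (bits) = Resolution (W x H) x Color Depth."""
    return width * height * color_depth

bits = image_size_bits(1000, 500, 24)
size_bytes = bits // 8                 # 8 bits per byte
size_mib = size_bytes / 1024 / 1024    # bytes -> KiB -> MiB

print(bits)                # 12000000
print(size_bytes)          # 1500000
print(round(size_mib, 2))  # 1.43
```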

5. Metadata

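An image file doesn't just contain pixel data. It also contains Metadata (data about data), such as the image's width, height, and color depth, which software needs in order to display the pixels correctly.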

⚠️ Exam Alert: If you increase the Resolution OR the Color Depth, the file size will increase. This means it will take longer to download/upload and require more storage space.

Digital Sound

1. Analog vs. Digital

Sound is naturally Analog (a continuous wave). Computers are Digital (discrete binary). To store sound, we must convert the analog wave into digital data using an ADC (Analog-to-Digital Converter).

The Process: Sampling

The amplitude (height) of the sound wave is measured at regular intervals and recorded as a binary value.

[Figure: a sound wave drawn as a series of vertical bars. Each bar represents a "sample" taken at a specific point in time.]
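Sampling can be imitated in Python by measuring a mathematical wave at regular intervals. This is only an illustration: a real ADC measures an electrical signal, whereas here a sine wave stands in for the sound, and the function name and the tiny sample rate of 8 Hz are chosen purely for readability.

```python
import math

def sample_wave(frequency_hz: float, sample_rate_hz: int, duration_s: float) -> list:
    """Measure the amplitude of a sine wave at regular time intervals."""
    count = int(sample_rate_hz * duration_s)
    interval = 1 / sample_rate_hz  # time between two samples, in seconds
    return [math.sin(2 * math.pi * frequency_hz * (i * interval)) for i in range(count)]

samples = sample_wave(frequency_hz=1, sample_rate_hz=8, duration_s=1)
print(len(samples))  # 8
print([round(s, 2) for s in samples])
```

Raising `sample_rate_hz` produces more measurements for the same second of sound, which is why a higher sample rate follows the original wave more closely.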

2. Key Factors Affecting Sound Quality

A. Sample Rate (Frequency)

The number of samples taken per second, measured in Hertz (Hz).

  • High Sample Rate: More samples per second = Smoother, more accurate reproduction of the original wave.
  • Standard CD Quality: 44,100 Hz (44.1 kHz).

B. Sample Resolution (Bit Depth)

The number of bits used to store each sample. This determines how many different "levels" of volume (amplitude) can be recorded.

  • High Resolution: More bits per sample = Larger range of volumes and less "quantization" noise.
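To make the idea of "levels" concrete, the sketch below maps an amplitude between -1.0 and 1.0 onto the nearest available level for a given bit depth. The function name `quantize` and the -1.0 to 1.0 amplitude range are assumptions made for this illustration only.

```python
def quantize(amplitude: float, bits: int) -> int:
    """Map an amplitude in [-1.0, 1.0] to an integer level from 0 to 2**bits - 1."""
    levels = 2 ** bits
    clamped = max(-1.0, min(1.0, amplitude))  # keep the input within range
    return round((clamped + 1.0) / 2.0 * (levels - 1))

print(2 ** 8, 2 ** 16)                   # 256 65536
print(quantize(-1.0, 8))                 # 0
print(quantize(1.0, 8))                  # 255
print(format(quantize(1.0, 8), "08b"))   # 11111111
```

With 16 bits there are 65,536 levels instead of 256, so each stored value sits much closer to the true amplitude.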

3. Calculating Sound File Size

The Formula:

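File Size (bits) = Sample Rate (Hz) × Resolution (bits) × Time (seconds)

This gives the size for a single channel (mono). For a stereo recording, multiply the result by 2.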


Example: A 10-second mono clip recorded at 44,100 Hz with 16-bit resolution.
  • Calculation: $44,100 \times 16 \times 10 = 7,056,000 \text{ bits}$
  • In MiB: $7,056,000 \div 8 \div 1024 \div 1024 \approx 0.84 \text{ MiB}$
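The calculation can be reproduced with a small Python sketch. The function name `sound_size_bits` and its `channels` parameter are assumptions added for illustration; with the default of one channel it matches the mono example above.

```python
def sound_size_bits(sample_rate_hz: int, resolution_bits: int,
                    seconds: float, channels: int = 1) -> float:
    """File Size (bits) = Sample Rate x Resolution x Time (x number of channels)."""
    return sample_rate_hz * resolution_bits * seconds * channels

bits = sound_size_bits(44_100, 16, 10)
size_mib = bits / 8 / 1024 / 1024  # bits -> bytes -> KiB -> MiB

print(bits)                # 7056000
print(round(size_mib, 2))  # 0.84
```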

4. Impact of Changing Settings

| Action | Impact on Quality | Impact on File Size |
|---|---|---|
| Increase Sample Rate | Higher (Better accuracy) | Increases |
| Increase Sample Resolution | Higher (Better dynamic range) | Increases |

5. Playback: The DAC

To hear the sound, the binary data must be converted back into an analog signal using a DAC (Digital-to-Analog Converter). This signal is then sent to an amplifier and speakers/headphones.

⚠️ Exam Tip: When describing the process, always use the word "intervals." "Samples of the sound wave amplitude are taken at regular intervals and stored as binary values." This is often a mark-earning phrase in IGCSE mark schemes.

Data Compression

1. Why Compress Data?

Compression is the process of reducing the size of a file. This is crucial for several reasons:

  • Faster Transmission: Files take less time to upload/download or send via email.
  • Storage Savings: More files can be stored on a drive (SSD/HDD).
  • Streaming: Required for smooth video (YouTube/Netflix) and music streaming.
  • Website Speed: Smaller images help web pages load faster.

2. Lossy vs. Lossless Compression

Lossy

Permanently removes "unnecessary" data that the human eye or ear cannot easily perceive.

File size: Significantly reduced.

Formats: JPEG, MP3, MP4.

Lossless

Reduces file size without losing any original data. The file can be reconstructed exactly.

File size: Moderately reduced.

Formats: PNG, GIF, ZIP, FLAC.
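The "reconstructed exactly" property of lossless compression can be demonstrated with Python's standard `zlib` module, which implements the DEFLATE method also used inside ZIP files. The repeated test data is made up for this sketch.

```python
import zlib

original = b"AAAAABBBCCCDDDDD" * 100  # 1600 bytes of highly repetitive data

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))                    # 1600
print(len(compressed) < len(original))  # True
print(restored == original)             # True: every byte is recovered
```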

3. How Lossless Works (RLE)

One common method of lossless compression is Run-Length Encoding (RLE). It looks for consecutive repeating data and stores it as a single value and a count.

Uncompressed Data:
AAAAABBBCCCDDDDD

RLE Compressed:
5A3B3C5D

In images, RLE works by identifying long runs of identical colored pixels.
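A minimal RLE sketch in Python is shown below. It follows the count-then-symbol layout of the example above and assumes the data itself contains no digit characters; the function names are chosen for illustration.

```python
from itertools import groupby

def rle_encode(data: str) -> str:
    """Replace each run of identical characters with <count><character>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))

def rle_decode(encoded: str) -> str:
    """Rebuild the original text (assumes the original contained no digits)."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)

print(rle_encode("AAAAABBBCCCDDDDD"))  # 5A3B3C5D
print(rle_decode("5A3B3C5D"))          # AAAAABBBCCCDDDDD
print(rle_encode("ABCD"))              # 1A1B1C1D
```

The last line shows a limitation: when the data has no repeated runs, the RLE output is longer than the input.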

4. How Lossy Works

Lossy compression algorithms use Perceptual Coding:

  • Images (JPEG): Reduces the number of colors or simplifies areas where the eye won't notice a change.
  • Sound (MP3): Removes frequencies that the human ear cannot hear and removes quieter sounds that are "masked" by louder sounds.
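Real JPEG and MP3 encoders are far more sophisticated than anything that fits here, so the sketch below is not either of those algorithms. It only illustrates the key property of lossy compression: once detail has been thrown away (here, the lowest 4 bits of each 8-bit value), the original cannot be recovered. The function names and sample values are made up.

```python
def lossy_reduce(values: list, keep_bits: int = 4) -> list:
    """Keep only the top `keep_bits` bits of each 8-bit value."""
    shift = 8 - keep_bits
    return [v >> shift for v in values]

def restore(reduced: list, keep_bits: int = 4) -> list:
    """Scale back to 8 bits; the discarded low bits come back as zeros."""
    shift = 8 - keep_bits
    return [v << shift for v in reduced]

pixels = [200, 201, 203, 57]
reduced = lossy_reduce(pixels)
approx = restore(reduced)

print(reduced)           # [12, 12, 12, 3]
print(approx)            # [192, 192, 192, 48]
print(approx == pixels)  # False: the fine detail is permanently lost
```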

5. Comparison Summary

| Feature | Lossy | Lossless |
|---|---|---|
| Original Quality | Lost permanently | Kept perfectly |
| Compression Ratio | Very High (Tiny files) | Low (Larger files) |
| Best for... | Photos, Video, Streaming | Text files, Spreadsheets, Code |

⚠️ Exam Note: Lossy compression must not be used for software programs or text files. If you lose even one bit of a program's code, the entire program may fail to run!