1.2 Data Storage

Representing Text

1. The Conversion Process

When you press a key on a keyboard, the computer does not "see" a letter. It follows a specific sequence to translate that physical action into digital data.

Key Pressed (Physical Act) → Character Set Scan (Lookup Table) → Unique Value (Denary/Hex Code) → Binary String (Stored in RAM)
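
The sequence above can be sketched in Python. This is only an illustration: `key_to_binary` is a hypothetical helper name, and the built-in `ord` stands in for the character set lookup, assuming an 8-bit code for each character.

```python
def key_to_binary(char: str) -> dict:
    """Trace one character through the lookup steps (hypothetical helper)."""
    code = ord(char)  # character set lookup: character -> unique numeric value
    return {
        "character": char,
        "denary": code,
        "hex": format(code, "02X"),     # the same value written in hexadecimal
        "binary": format(code, "08b"),  # the 8-bit binary string that is stored
    }


print(key_to_binary("A"))
# {'character': 'A', 'denary': 65, 'hex': '41', 'binary': '01000001'}
```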

2. Key Definitions

  • Character Set: A list of characters and the unique binary codes that represent each one.
  • Character: A single unit of information (letter, digit, space, or symbol).
  • String: A sequence of characters stored together (e.g., "Computer").

3. Calculating Text File Size

In the IGCSE exam, you may be asked to estimate the storage required for a piece of text. The logic is simple:

The Formula:

Total Size = Number of Characters × Bits per Character


Example: How much space does the word IGCSE take in 8-bit (extended) ASCII?
  • Number of characters: 5
  • Bits per character: 8
  • Total: $5 \times 8 = 40 \text{ bits}$ (or 5 Bytes)
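
The same calculation can be written as a short Python sketch. The function name `text_size_bits` and its default of 8 bits per character are assumptions made for this illustration.

```python
def text_size_bits(text: str, bits_per_char: int = 8) -> int:
    """Total Size = Number of Characters x Bits per Character."""
    return len(text) * bits_per_char


bits = text_size_bits("IGCSE")
print(bits, "bits =", bits // 8, "bytes")  # 40 bits = 5 bytes
```

Passing a different `bits_per_char` (for example 16) shows how the estimate changes with the character set.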

4. Factors Affecting Text Storage

Two main factors change the size of a text file:

  1. Length of the Text: More characters = more bytes.
  2. The Character Set Used:
    • ASCII: Uses 1 byte per character. Small and efficient for English.
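    • Unicode (UTF-16): Uses 2 bytes for most characters (4 bytes for rarer ones such as emoji). This doubles the file size compared to ASCII for the same English text, but it supports the characters of virtually all languages.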
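
As a rough check of the size difference, the sketch below encodes the same string with both character sets and counts the bytes. `encoded_size` is a hypothetical helper; the `"utf-16-le"` codec is used here so that no byte-order mark is added to the count.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


text = "Computer"
print(encoded_size(text, "ascii"))      # 8  (1 byte per character)
print(encoded_size(text, "utf-16-le"))  # 16 (2 bytes per character)
```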

5. Sorting and Comparison

Because every character has a numeric value, computers can "alphabetize" text by comparing the characters' binary codes.

  • Since 'A' is 65 and 'B' is 66, the computer knows 'A' comes first.
  • Warning: In ASCII, uppercase letters have lower values than lowercase letters ('A' = 65, 'a' = 97). This means "Zebra" would technically be sorted before "apple" in a raw binary sort!
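
A small Python sketch of this behaviour follows. Python compares strings by their character codes, which match ASCII for these letters; the word list is made up for the illustration, and sorting with `key=str.lower` is one possible way to get a case-insensitive order, not the only one.

```python
words = ["apple", "Zebra", "Banana"]

# Raw comparison by character code: uppercase (65-90) sorts before lowercase (97-122).
raw_sorted = sorted(words)
print(raw_sorted)  # ['Banana', 'Zebra', 'apple']

# Comparing lowercase copies gives the order a human would expect.
alpha_sorted = sorted(words, key=str.lower)
print(alpha_sorted)  # ['apple', 'Banana', 'Zebra']
```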

Exam Summary:

To represent text, a Character Set (like ASCII or Unicode) is used to assign a unique Binary value to every character. Standard ASCII uses 7 bits (128 characters) and extended ASCII uses 8 bits (256 characters, 1 byte each), while Unicode allows for over a million characters (multiple bytes each) to support global languages and symbols.