Representing Text
1. The Conversion Process
When you press a key on a keyboard, the computer does not "see" a letter. It follows a specific sequence to translate that physical action into digital data.
Key Press (Physical Act) → Character Set (Lookup Table) → Binary Value (Denary/Hex Code) → Memory (Stored in RAM)
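The lookup step above can be sketched in a few lines of Python (used here purely for illustration; the exam does not require any particular language). `ord()` performs the character-set lookup and `format()` shows the bit pattern that ends up in RAM:

```python
# When the 'A' key is pressed, the character set maps it to code 65.
char = "A"
code = ord(char)               # character set lookup: 'A' -> 65
binary = format(code, "08b")   # the 8-bit pattern stored in RAM
print(char, code, binary)      # A 65 01000001
```

`chr(65)` performs the reverse lookup, turning the stored code back into the character for display.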
2. Key Definitions
- Character Set: A list of characters and the unique binary codes that represent each one.
- Character: A single unit of information (letter, digit, space, or symbol).
- String: A sequence of characters stored together (e.g., "Computer").
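These definitions can be seen in action with a short Python sketch: a string is stored as a sequence of characters, each of which maps to one code from the character set.

```python
# A string is a sequence of characters; each has a unique code.
word = "Computer"
codes = [ord(c) for c in word]
print(codes)  # [67, 111, 109, 112, 117, 116, 101, 114]
```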
3. Calculating Text File Size
In the IGCSE exam, you may be asked to estimate the storage required for a piece of text. The logic is simple:
Total Size = Number of Characters × Bits per Character
Example: How much space does the word IGCSE take in standard 8-bit ASCII?
- Number of characters: 5
- Bits per character: 8
- Total: $5 \times 8 = 40 \text{ bits}$ (or 5 Bytes)
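The calculation above translates directly into code. This is a minimal sketch; the helper name `text_size_bits` is just an illustrative choice, not standard terminology:

```python
def text_size_bits(text, bits_per_char=8):
    """Estimate storage: number of characters x bits per character."""
    return len(text) * bits_per_char

print(text_size_bits("IGCSE"))        # 40 (bits)
print(text_size_bits("IGCSE") // 8)   # 5 (bytes)
```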
4. Factors Affecting Text Storage
Two main factors change the size of a text file:
- Length of the Text: More characters = more bytes.
- The Character Set Used:
- ASCII: Uses 1 byte per character (standard ASCII needs only 7 bits; extended ASCII uses the full 8). Small and efficient for English text.
- Unicode (UTF-16): Uses 2 bytes per character. Doubles the file size compared to ASCII but supports all languages.
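The size difference is easy to check in Python using the built-in `str.encode` method (the `utf-16-le` codec is used here so no byte-order mark is added, keeping the count at exactly 2 bytes per character):

```python
word = "IGCSE"
ascii_bytes = word.encode("ascii")       # 1 byte per character
utf16_bytes = word.encode("utf-16-le")   # 2 bytes per character, no BOM
print(len(ascii_bytes), len(utf16_bytes))  # 5 10
```

The same 5-character word takes 5 bytes in ASCII but 10 bytes in UTF-16, matching the "doubles the file size" claim.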
5. Sorting and Comparison
Because every character has a numeric value, computers can "alphabetize" text by comparing the characters' binary codes.
- Since 'A' is 65 and 'B' is 66, the computer knows 'A' comes first.
- Warning: In ASCII, uppercase letters have lower values than lowercase letters ('A' = 65, 'a' = 97). This means "Zebra" would technically be sorted before "apple" in a raw binary sort!
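The warning above can be demonstrated directly: Python's default string sort compares character codes, so 'Z' (90) sorts before 'a' (97). A common fix, shown as an aside, is to sort on a lowercased key:

```python
words = ["apple", "Zebra"]
print(sorted(words))                  # ['Zebra', 'apple'] -- raw binary sort
print(sorted(words, key=str.lower))   # ['apple', 'Zebra'] -- case-insensitive
```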
6. Summary
To represent text, a Character Set (like ASCII or Unicode) is used to assign a unique binary value to every character. ASCII is limited to 128 characters (256 in extended ASCII, at 1 byte each), while Unicode allows for millions of characters (multiple bytes each) to support global languages and symbols.