1.1 Data Representation


Character Sets

1. What is a Character Set?

A Character Set is a defined list of characters recognized by computer hardware and software. Each character is assigned a unique binary number (binary code).

Without a character set, a computer would just see a string of bits and wouldn't know if they represented a number, a sound, or the letter 'A'.

2. ASCII (American Standard Code for Information Interchange)

ASCII was the first widely used character set. It originally used 7 bits, providing $2^7$ ($128$) unique characters.

Example Mapping:
Character 'A' → Denary 65 → Binary 01000001
Character 'a' → Denary 97 → Binary 01100001
Character '!' → Denary 33 → Binary 00100001
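The mappings above can be checked with Python's built-in `ord()` (character to code) and `chr()` (code to character); this is a quick sketch, not part of the ASCII standard itself:

```python
# Look up the denary and binary codes for each example character.
for ch in ["A", "a", "!"]:
    code = ord(ch)                        # character -> denary code
    print(ch, code, format(code, "08b"))  # binary, padded to 8 bits
# A 65 01000001
# a 97 01100001
# ! 33 00100001

# chr() performs the reverse mapping: code -> character.
print(chr(65))  # A
```

Note that 'A' (65) and 'a' (97) differ by exactly 32, i.e. one bit, which is why case conversion is cheap in ASCII.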

Extended ASCII: Later updated to 8 bits ($2^8$), allowing for $256$ characters. This added mathematical symbols and some non-English characters.

3. Unicode

As computing went global, 256 characters weren't enough for languages like Chinese, Arabic, or Hindi. Unicode was created to represent every character in every language.

  • Commonly encoded using 16 bits (65,536 possible values) or 32 bits (over 4 billion possible values).
  • The first 128 codes in Unicode are identical to ASCII, making it "backward compatible."
  • Includes Emojis and historical scripts (like Hieroglyphics).
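Both the backward compatibility with ASCII and the need for wider codes can be seen directly in Python, where every string character has a Unicode code point (a sketch for illustration):

```python
# The first 128 Unicode code points are identical to ASCII.
assert ord("A") == 65          # same code as in 7-bit ASCII

# Characters outside ASCII need larger code points.
print(hex(ord("€")))           # 0x20ac  -> fits in 16 bits
print(hex(ord("😀")))          # 0x1f600 -> needs more than 16 bits
```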

4. ASCII vs. Unicode: Comparison

Feature              | ASCII                 | Unicode
---------------------|-----------------------|----------------------------
Bits per character   | 7 or 8 bits           | 16 or 32 bits
Number of characters | 128 to 256            | Over 1 million (currently)
Storage requirements | Low (1 byte per char) | High (2 to 4 bytes per char)
Global use           | English/Western only  | Universal (all languages)
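The storage difference in the table can be demonstrated by encoding the same text in different ways; the exact byte counts below assume the UTF-16/UTF-32 encodings without a byte-order mark:

```python
text = "Hello"  # 5 characters

# ASCII: 1 byte per character.
print(len(text.encode("ascii")))      # 5 bytes

# UTF-16 (little-endian, no BOM): 2 bytes per character here.
print(len(text.encode("utf-16-le")))  # 10 bytes

# UTF-32 (little-endian, no BOM): 4 bytes per character.
print(len(text.encode("utf-32-le")))  # 20 bytes
```

This is the trade-off the exam tip below refers to: Unicode's universal coverage costs extra storage per character.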

5. Key Exam Terms

Character:
A single symbol (letter, number, or punctuation mark).
Alphanumeric:
Characters drawn from the combined set of letters and digits (A–Z, a–z, 0–9).
Control Characters:
Non-printing characters that perform actions, such as "Shift," "Backspace," or "Enter."
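Control characters occupy ASCII codes 0 to 31 (plus 127); Python's escape sequences give an easy way to inspect a few of them (a sketch for illustration):

```python
# Control characters have codes but no visible printed symbol.
print(ord("\n"))   # 10 - line feed, produced by the Enter key
print(ord("\t"))   # 9  - horizontal tab
print(ord("\b"))   # 8  - backspace
```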
💡 Exam Tip: If an exam question asks why Unicode is better than ASCII, mention that it allows for global communication and supports multilingual applications, even though it requires more storage space.