Data Types Revision Notes for OCR A-Level Computer Science

Character Sets

Overview

In computing, a character set is a standardised way to represent text and symbols in binary so that computers can process and display them correctly. Each character (e.g., letters, digits, punctuation) is assigned a unique binary code. Understanding how character sets like ASCII and Unicode work is crucial for handling text data across different systems and languages.

What is a Character Set?

A character set maps characters (letters, numbers, symbols) to specific binary values.
This enables computers to store, transmit, and display text correctly, even across different devices and platforms.
Each character is assigned a unique numeric code, which is then converted to binary.

ASCII (American Standard Code for Information Interchange)

Overview: One of the earliest character sets, ASCII uses 7 bits to represent characters.

7-bit ASCII

Can represent 128 characters ($2^7 = 128$), including:

Uppercase and lowercase English letters (A–Z, a–z)
Digits (0–9)
Punctuation and special symbols (e.g., !, @, #)
Control characters (e.g., newline, tab)

8-bit ASCII (Extended ASCII)

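Uses the eighth bit to extend the set to 256 characters ($2^8 = 256$), adding support for additional symbols and simple graphical characters. There is no single standard for the extra 128 codes: different extended sets assign them differently.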

Usage: Suitable for English text and basic symbols but limited for international use.

Example:

Character: A
ASCII Code: 65
Binary: 01000001
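
The mapping above can be checked in Python. This is a minimal sketch using the built-in `ord`, `chr` and `format` functions; the helper name `to_binary` is hypothetical and not part of the original notes.

```python
def to_binary(char: str, bits: int = 8) -> str:
    """Return the ASCII code of a character as a zero-padded binary string."""
    code = ord(char)  # numeric code for the character, e.g. 65 for 'A'
    if code > 127:
        # Outside the 7-bit ASCII range, so this sketch refuses to convert it.
        raise ValueError(f"{char!r} is not a 7-bit ASCII character")
    return format(code, f"0{bits}b")  # pad with leading zeros to the chosen width


print(ord("A"))           # 65
print(to_binary("A"))     # 01000001
print(to_binary("A", 7))  # 1000001
print(chr(65))            # A
```

The `bits` parameter is there because the same code can be written in 7 or 8 bits; only the leading zero changes.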

Unicode

Overview: A more comprehensive character set designed to support a wide range of characters from multiple languages and scripts.

16-bit Unicode: Early versions used a fixed 16 bits per character, giving 65,536 possible codes. Modern UTF-16 uses one or two 16-bit units per character so that it can go beyond this limit.
UTF-8 Encoding: Variable-length encoding that uses 1 to 4 bytes per character. The first 128 characters use the same single-byte codes as ASCII, so ASCII text is also valid UTF-8.

Why Unicode?
Supports thousands of characters, including non-Latin scripts (e.g., Chinese, Arabic).
Includes emojis, mathematical symbols, and more.

Usage: Essential for global applications, such as web development, where diverse languages must be supported.

Example:

Character: € (Euro symbol)
Unicode Code Point: U+20AC
Binary (UTF-8): 11100010 10000010 10101100
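
The Euro example can be reproduced with Python's built-in string encoding. This is a small sketch; the variable names are illustrative only.

```python
euro = "€"
code_point = ord(euro)             # 8364, which is 0x20AC
utf8_bytes = euro.encode("utf-8")  # three bytes: E2 82 AC

# Show each byte as 8 binary digits, separated by spaces.
utf8_bits = " ".join(format(b, "08b") for b in utf8_bytes)

print(f"U+{code_point:04X}")  # U+20AC
print(utf8_bits)              # 11100010 10000010 10101100
```

Note that the code point (20AC) and the UTF-8 bytes (E2 82 AC) are different values: the code point identifies the character, while the bytes are how UTF-8 stores it.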

Differences Between ASCII and Unicode

Feature	ASCII	Unicode
Bit Length	7-bit (or 8-bit extended)	8 to 32 bits in UTF-8 (variable length)
Character Support	128 (7-bit) or 256 (8-bit)	Over 1.1 million possible code points
Scope	English and basic symbols	Global; covers most of the world's writing systems, plus symbols and emojis
Compatibility	Too small for international use	Backward-compatible with ASCII (the first 128 code points match)
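
The differences in the table can be observed directly. The sketch below assumes UTF-8 as the Unicode encoding; the sample characters and the helper `is_ascii_only` are hypothetical choices for illustration.

```python
samples = ["A", "¢", "€", "😀"]

# Number of bytes each character needs in UTF-8 (variable length).
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}
print(sizes)  # {'A': 1, '¢': 2, '€': 3, '😀': 4}

# Backward compatibility: ASCII text produces identical bytes in UTF-8.
same_bytes = "Hello".encode("ascii") == "Hello".encode("utf-8")
print(same_bytes)  # True


def is_ascii_only(text: str) -> bool:
    """Return True if every character fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(is_ascii_only("Hello"))  # True
print(is_ascii_only("€5"))     # False
```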

Why Character Sets Matter

Data Interoperability:

Ensures consistent representation of text across different systems.

Internationalisation:

Unicode enables software to support multiple languages and scripts.

Storage and Transmission:

Efficient storage of text data in binary format, critical for data processing and network communication.

Examples

Example 1: ASCII to Binary

Convert the ASCII character B to binary.

ASCII value of B = 66.

Binary representation: 01000010.
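
The same conversion can be applied to whole words and reversed. This is a minimal sketch that assumes the input is plain ASCII text written as 8-bit groups; the function names `text_to_binary` and `binary_to_text` are hypothetical.

```python
def text_to_binary(text: str) -> str:
    """Convert each character to its 8-bit binary code, space-separated."""
    return " ".join(format(ord(ch), "08b") for ch in text)


def binary_to_text(bits: str) -> str:
    """Convert space-separated binary groups back into characters."""
    return "".join(chr(int(group, 2)) for group in bits.split())


print(text_to_binary("B"))                 # 01000010
print(text_to_binary("AB"))                # 01000001 01000010
print(binary_to_text("01000001 01000010")) # AB
```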

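Example 2: Binary to Character (Unicode)

Given the binary sequence 11000010 10100010 (UTF-8), determine the character.

Convert each byte to hexadecimal: C2 A2.

The first byte starts with 110, which marks a 2-byte sequence; the second byte starts with 10, which marks a continuation byte. Removing these prefixes leaves 00010 and 100010, which join to give 00010100010, or A2 in hexadecimal.

The character with Unicode code point U+00A2 is ¢ (cent symbol).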
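
The decoding in Example 2 can be checked step by step. The sketch below handles only the 2-byte UTF-8 pattern (110xxxxx 10yyyyyy) by hand and then compares the result with Python's built-in decoder; the variable names are illustrative.

```python
bits = "11000010 10100010"

# Turn each 8-bit group into a byte value.
raw = bytes(int(group, 2) for group in bits.split())

# Manual decode of a 2-byte sequence: keep the low 5 bits of the first
# byte and the low 6 bits of the second, then join them together.
code_point = ((raw[0] & 0b00011111) << 6) | (raw[1] & 0b00111111)

print(raw.hex())              # c2a2
print(f"U+{code_point:04X}")  # U+00A2
print(chr(code_point))        # ¢
print(raw.decode("utf-8"))    # ¢
```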

Note Summary

Common Mistakes

Confusing ASCII and Unicode:
ASCII is limited to basic English characters, while Unicode supports global characters.
Assuming Fixed Length for Unicode:
Unicode encodings such as UTF-8 are variable-length: different characters take 1 to 4 bytes.
Incorrect Binary Conversion:
Use the correct number of bits: 7 or 8 for ASCII, padded with leading zeros (e.g., 65 is 01000001 in 8 bits), and a variable number of bytes for Unicode encodings.

Key Takeaways

ASCII: Efficient for English text but limited in scope.
Unicode: A versatile character set that supports most languages and symbols.
Conversions:
Be able to convert characters to binary and vice versa.
Understand the encoding format (ASCII or Unicode) being used.
Purpose: Character sets are essential for consistent text representation in computing systems worldwide.
