Classification of Programming Languages and Translation Revision Notes for AQA A-Level Computer Science

Introduction

Modern programming is typically done using high-level languages that are easy to understand. However, computers can only process binary instructions (0s and 1s). This creates an interesting challenge: how do we translate the code we write into a format that the processor can actually execute?

In this topic, we'll explore the different types of programming languages, how they've evolved over time, and the various methods used to convert human-readable code into machine-executable instructions.

Types of programming languages

Programming languages can be grouped into three main categories based on how close they are to the hardware:

Machine code - Binary instructions (0s and 1s)
Assembly language - Short instruction codes (mnemonics)
High-level languages - Natural language keywords

Note

Machine code and assembly language are collectively known as low-level languages because they work closely with the processor's architecture. They provide direct access to hardware resources but are much harder for humans to read and write.

Machine code

The processor can only execute instructions written as binary digits (0s and 1s). These binary patterns form what we call machine code: the most basic form of programming, in which every instruction directly controls the processor.

Key characteristics of machine code:

Written as sequences of binary digits (bits)
Can be represented in decimal or hexadecimal format to make it slightly more readable
Executes very quickly because the processor can use it directly
Extremely difficult to write and understand
Very time-consuming to develop programs
High risk of errors due to the repetitive nature of entering binary patterns
Almost impossible to debug when errors occur
Not portable - machine code written for one processor type won't work on a different processor
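
As a small illustration of the point about readability, the sketch below shows one bit pattern written in binary, decimal, and hexadecimal. The instruction value is hypothetical and chosen only for the example; it does not belong to any real processor's instruction set.

```python
# A hypothetical 8-bit machine code instruction, chosen only for illustration.
instruction = 0b10110100

print(format(instruction, "08b"))  # binary:      10110100
print(instruction)                 # decimal:     180
print(format(instruction, "02X"))  # hexadecimal: B4
```

The hexadecimal form needs two characters where binary needs eight, which is why machine code is usually displayed in hexadecimal.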

Advantages:

Fastest possible execution speed
Most efficient use of processor capabilities
Direct control over hardware

Disadvantages:

Extremely difficult for humans to write and read
Highly prone to errors
Time-consuming development process
Platform-specific (not portable between different processor types)

Important

Despite these challenges, machine code programs run at maximum speed because there's no translation layer between your instructions and the processor's actions. This is why machine code is the lowest level of code - it's what the processor actually executes.

Assembly language

To make programming more manageable, assembly language was developed. Rather than writing pure binary, programmers can use short, memorable instruction codes called mnemonics. These mnemonics represent specific operations that the processor can perform.

Common assembly language mnemonics:

LDR - Load Register (loads a value from memory into a register)
STR - Store Register (stores a value from a register into memory)
ADD - Addition (adds values together)
SUB - Subtraction (subtracts one value from another)

Example

Worked Example: Understanding Assembly Code

Consider this assembly language program:

LDR  20
ADD  43
STR  20
SUB  41
STR  45

This code performs the following operations:

Step 1: Loads the value from memory address 20 into the accumulator

Step 2: Adds the value from memory address 43 to the accumulator

Step 3: Stores the accumulator's contents back to memory address 20

Step 4: Subtracts the value from memory address 41 from the accumulator

Step 5: Stores the final result to memory address 45

This demonstrates how assembly language uses mnemonics to perform basic operations on data stored in memory.
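
To check the five steps, the program can be traced with a short Python simulation. This is only a sketch of a simple accumulator machine: the function name run and the starting values in memory are assumptions made for the example, not part of the original program.

```python
def run(program, memory):
    """Execute a list of (mnemonic, address) pairs on a toy accumulator machine."""
    acc = 0
    for mnemonic, address in program:
        if mnemonic == "LDR":
            acc = memory[address]      # load from memory into the accumulator
        elif mnemonic == "ADD":
            acc += memory[address]     # add a memory value to the accumulator
        elif mnemonic == "SUB":
            acc -= memory[address]     # subtract a memory value
        elif mnemonic == "STR":
            memory[address] = acc      # store the accumulator back to memory
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return acc

program = [("LDR", 20), ("ADD", 43), ("STR", 20), ("SUB", 41), ("STR", 45)]
memory = {20: 5, 43: 7, 41: 2, 45: 0}  # assumed starting values
run(program, memory)
print(memory[20], memory[45])  # 12 10
```

With these assumed values, address 20 ends up holding 5 + 7 = 12 and address 45 holds 12 - 2 = 10.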

Key characteristics of assembly language:

Uses words (mnemonics) instead of pure binary
Has a one-to-one relationship with machine code - each assembly instruction translates to exactly one machine code instruction
More readable than machine code but still quite cryptic
Must be converted to machine code by an assembler before execution
Still processor-specific (not portable)

Important terminology:

Source code: The original assembly language program written by the programmer, not yet converted to executable form
Assembler: A program that translates assembly language into machine code
Object code: The machine code produced by the assembler, which can be executed by the computer
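
The relationship between source code, assembler, and object code can be sketched with a toy assembler. The opcode table and the 12-bit instruction layout (4-bit opcode, 8-bit address) are invented for this example; real processors define their own encodings.

```python
# Hypothetical opcode table: real processors define their own encodings.
OPCODES = {"LDR": 0b0001, "STR": 0b0010, "ADD": 0b0011, "SUB": 0b0100}

def assemble(source):
    """Translate each source line into one 12-bit word: 4-bit opcode + 8-bit address."""
    object_code = []
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        object_code.append((OPCODES[mnemonic] << 8) | int(operand))
    return object_code

source = """
LDR 20
ADD 43
STR 20
"""
object_code = assemble(source)
for word in object_code:
    print(format(word, "012b"))
```

Three lines of source produce exactly three machine code words, which is the one-to-one relationship described above.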

Advantages of assembly language:

Programs execute quickly because each instruction maps directly to a machine code instruction, with no translation needed at run time
More compact code compared to high-level languages
Allows direct control of hardware registers and memory
Useful when precise hardware manipulation is required

Note

Current uses of assembly language:

Assembly language is still used in specific situations where low-level control is essential:

Embedded systems: Small devices with limited processing power and memory benefit from assembly's efficiency
Device drivers: Software that controls hardware components often needs direct hardware access
Real-time applications: Systems that must respond immediately to inputs (like control systems) use assembly for speed
Custom hardware: Specialised processors may only support assembly language programming

High-level languages

Machine code and assembly language are both considered low-level languages because they're designed around the processor's architecture rather than human thinking patterns.

High-level languages were created to solve the problems associated with low-level programming. They use natural language keywords (like English words) and mathematical notation that humans find easier to understand.

Key characteristics of high-level languages:

Commands use recognisable English-style keywords
Largely platform-independent (portable) - the same source code can be translated to run on different computer types
Have a one-to-many relationship with machine code - one high-level instruction may translate into many machine code instructions
Must be translated into machine code using a translator (either an interpreter or compiler)
Make use of program structures (loops, conditions, functions) to organise code logically
Easier to write, read, and maintain than low-level languages
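
The one-to-many relationship can be glimpsed using Python's standard dis module. Note the assumption here: dis shows the intermediate instructions used by the standard Python implementation, not true machine code, and the exact instructions differ between Python versions. It still shows a single high-level statement expanding into several lower-level steps.

```python
import dis

def total_cost(price, quantity):
    return price * quantity + 5  # one high-level statement

# List the lower-level instructions Python generated for the function.
instructions = list(dis.get_instructions(total_cost))
for ins in instructions:
    print(ins.opname, ins.argrepr)
print(len(instructions), "lower-level instructions for one statement")
```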

Note

Why so many different high-level languages exist:

Different languages have been designed to tackle specific types of problems effectively:

Some languages excel at scientific calculations
Others are optimised for database management
Some specialise in web development
Others focus on artificial intelligence applications

The language a programmer chooses depends largely on the problem they're trying to solve.

Programming paradigms

High-level languages are often classified by their programming paradigm - the fundamental style or approach to structuring programs. The three main paradigms are imperative, object-oriented, and declarative.

Imperative languages

Also called procedural languages, imperative languages work by giving the computer a sequence of commands to follow. The program consists of step-by-step instructions, often grouped into subroutines (procedures), that tell the computer exactly what to do and in what order.

Think of it like following a recipe - you perform each instruction in sequence, and the same instructions are followed each time the program runs.

Characteristics:

Programs are structured as lists of instructions
Execution flows through the instructions in order
Focus on how to achieve a result
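
A minimal sketch of the imperative style, using made-up marks as data. Each line is a command that changes the program's state, and the commands run in the order written.

```python
marks = [62, 75, 48, 90]  # example data, invented for this sketch

total = 0
for mark in marks:            # step through each item in order
    total = total + mark      # update the running total
average = total / len(marks)

print(average)  # 68.75
```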

Object-oriented languages

Object-oriented languages organise programs by grouping instructions and data together into objects. An object is a self-contained unit that combines:

Data (properties or attributes)
Instructions (methods or functions that work with that data)

Objects can be further organised into classes, which act as templates for creating similar objects.

Characteristics:

Programs are structured around objects rather than procedures
Objects encapsulate both data and behaviour
Promotes code reuse through inheritance
Focus on modelling real-world entities
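
The sketch below shows these ideas in Python. The class names Vehicle and Car and the registration value are invented for the example: Vehicle bundles data with a method, and Car reuses that code through inheritance.

```python
class Vehicle:
    """A class acts as a template: each object gets its own data."""

    def __init__(self, registration, wheels):
        self.registration = registration  # data (attributes)
        self.wheels = wheels

    def describe(self):                   # behaviour (method)
        return f"{self.registration} has {self.wheels} wheels"

class Car(Vehicle):
    """Car inherits describe() from Vehicle instead of rewriting it."""

    def __init__(self, registration):
        super().__init__(registration, wheels=4)

car = Car("AB12 CDE")
print(car.describe())  # AB12 CDE has 4 wheels
```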

Declarative languages

Declarative languages specify what result you want rather than how to achieve it. Instead of listing step-by-step instructions, you declare the properties the result should have, and the system works out how to produce it.

There are two main types of declarative languages:

Logic programming languages: These work with facts and rules. To answer a query, the language's inference engine searches the facts and rules to produce results. They're commonly used in AI applications.

Functional languages: These treat computation like mathematical functions. Programs are built by composing functions together, where each function takes inputs and produces outputs without changing state or data. The building blocks are functions rather than instruction lists.

Characteristics:

Focus on what should be accomplished, not how
Use facts, rules, or mathematical functions
Often used in specialised domains (AI, mathematics, data analysis)
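
Python is not a purely functional language, but it can give a rough flavour of the functional style. In this sketch (with invented marks as data) the result is described by composing functions, and no variable is modified once created.

```python
from functools import reduce

marks = [62, 75, 48, 90]  # example data, invented for this sketch

# Describe the result: the marks that are at least 50, and their sum.
passed = list(filter(lambda m: m >= 50, marks))
total = reduce(lambda a, b: a + b, passed, 0)

print(passed, total)  # [62, 75, 90] 227
```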

Translating high-level languages

High-level languages are programmer-friendly, but computers cannot understand them directly. The processor only executes machine code, so high-level source code must be converted (translated) into machine code before it can run.

This translation process requires special system software called a translator. There are two main types of translator for high-level languages:

Interpreters
Compilers

Interpreters

An interpreter translates and executes high-level code one statement at a time. It reads a line of source code, immediately performs the required action, then moves to the next line.

How interpreters work:

Read one statement from the source code
Translate it into machine code (or an intermediate format)
Execute that machine code immediately
Move to the next statement
Repeat until the program ends
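
The steps above can be sketched as a toy interpreter. The three-command language (SET, ADD, PRINT) and the function name interpret are invented for this example; a real interpreter is far more sophisticated, but the read, decode, execute, repeat loop is the same idea.

```python
def interpret(source):
    """Read, decode and execute one statement at a time."""
    variables, output = {}, []
    for line_number, line in enumerate(source.strip().splitlines(), start=1):
        parts = line.split()
        command, args = parts[0], parts[1:]
        if command == "SET":
            variables[args[0]] = int(args[1])
        elif command == "ADD":
            variables[args[0]] += int(args[1])
        elif command == "PRINT":
            output.append(variables[args[0]])
        else:
            # An error is only discovered when execution reaches that line.
            raise SyntaxError(f"line {line_number}: unknown command {command}")
    return output

print(interpret("SET x 5\nADD x 3\nPRINT x"))  # [8]
```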

Some interpreters may work by interpreting the syntax of each statement directly, whilst others may call predefined routines to handle common operations.

Important

Interpreters are selective - they only translate code that actually needs to run. This selective translation can save time during development and testing.

Example

Worked Example: Selective Translation

Consider this code:

If Age<17 Then Output = "Cannot drive a car"

If the condition Age<17 is false, the interpreter won't bother translating the output statement because it won't be executed. This saves translation time.

Some interpreters translate an entire line before executing it, whilst others execute as they read, which makes them extremely flexible.
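
Selective translation can be modelled very roughly in Python. The function below is a hypothetical stand-in for an interpreter handling the If statement above: it records which parts of the statement it had to "translate" for a given age.

```python
def interpret_if(age):
    """Toy model of interpreting: If Age<17 Then Output = "Cannot drive a car"."""
    translated = ["Age<17"]  # the condition is always translated
    output = None
    if age < 17:
        # The Then-branch is translated only when it is about to run.
        translated.append('Output = "Cannot drive a car"')
        output = "Cannot drive a car"
    return output, translated

print(interpret_if(15))
print(interpret_if(20))
```

For an age of 20 only the condition appears in the translated list, because the output statement is never reached.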

Benefits of using an interpreter:

You can run sections of code immediately without compiling the whole program
Code can run on different processors as long as they have the appropriate interpreter installed
Ideal for program development because you can test code quickly
Easier to debug because errors are identified line-by-line

Drawbacks of using an interpreter:

Programs run more slowly because translation happens every time the code executes
Code that runs repeatedly (like loops) must be translated each time, which is inefficient
The source code must be distributed to users (rather than just an executable)
Users must have the correct interpreter installed on their system

Compilers

A compiler translates the entire source code into machine code (object code) in one complete process before the program runs. Once compilation is complete, you have an executable file that can run immediately.

How compilers work:

Read the entire source code
Check for syntax errors
Translate all the code into machine code (object code)
Create an executable file
The executable can then be run repeatedly without further translation
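
Python's built-in compile() function gives a feel for the translate-once, run-many-times idea. One caveat: compile() produces a Python code object rather than a machine code executable file, so this is an analogy, not a real compiler producing object code.

```python
source = """
total = 0
for n in range(1, 4):
    total += n
"""

# Translate the whole program once. Syntax errors would be reported here,
# before any statement runs.
code = compile(source, "<example>", "exec")

for _ in range(3):               # run the translated form repeatedly
    namespace = {}
    exec(code, namespace)
    print(namespace["total"])    # 6 each time

try:
    compile("total = = 0", "<broken>", "exec")
except SyntaxError as error:
    print("rejected before running:", error.msg)
```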

Benefits of using a compiler:

Programs run very quickly after compilation because no translation is needed during execution
Only the executable file (object code) needs to be distributed - users don't need the source code
Makes reverse engineering difficult because working backwards from object code to source code is very challenging

Drawbacks of using a compiler:

The entire program must be recompiled even if you make a tiny change, which slows down debugging
The compilation process itself can be time-consuming for large programs
Object code is platform-specific - it will only run on computers with the same type of processor it was compiled for

Note

Comparison: Interpreters vs Compilers

The choice between an interpreter and compiler depends on your needs:

Development phase: Interpreters are better because you can test code immediately
Production phase: Compilers are better because the final program runs faster
Portability: Interpreters offer better cross-platform support
Distribution: Compilers are better as you only distribute the executable

Bytecode

Some programming languages use an intermediate approach called bytecode. Bytecode is an instruction set that can be executed on any computer using a virtual machine.

How bytecode works:

Rather than compiling directly to machine code for a specific processor, the source code is compiled into bytecode. This bytecode can then run on any computer that has the appropriate virtual machine installed.

Example

Worked Example: Java Bytecode

Java source code is compiled into bytecode format. The Java Virtual Machine (JVM) can then execute this bytecode on any computer, regardless of:

Processor type
Operating system
Hardware architecture

Each JVM bytecode instruction consists of a one-byte opcode that defines the operation, followed by any operands (parameters) needed. This makes it compact and efficient.
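
To show how a virtual machine might execute such instructions, here is a minimal sketch of a stack-based virtual machine in Python. The opcodes PUSH, ADD, MUL and HALT and their byte values are invented for this example; they are not real JVM opcodes.

```python
# Hypothetical one-byte opcodes for a toy stack machine (not real JVM opcodes).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run_bytecode(code):
    """Execute bytecode: each opcode is one byte; PUSH is followed by a one-byte operand."""
    stack, pc = [], 0
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == PUSH:
            stack.append(code[pc])   # the operand follows the opcode
            pc += 1
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])  # (2 + 3) * 4
print(run_bytecode(program))  # 20
```

The same sequence of bytes would give the same result on any computer that has this virtual machine, which is the essence of bytecode portability.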

Microsoft Common Intermediate Language (CIL):

Similarly, Microsoft's .NET languages (like C#) compile to an intermediate code called CIL. The .NET runtime, known as the Common Language Runtime (CLR), can then execute this intermediate code on any supported platform.

Advantages of bytecode:

Platform independence - write once, run anywhere
More secure than distributing source code
Still relatively efficient compared to pure interpretation
Allows for platform-specific optimisation by the virtual machine

Summary

Key Points to Remember:

Three types of languages: Machine code (binary), assembly language (mnemonics), and high-level languages (natural keywords)
Low-level languages (machine code and assembly) are fast and give hardware control but are difficult to write and platform-specific
High-level languages are easier to write and portable but require translation
Assembly has a one-to-one relationship with machine code (one instruction = one machine instruction), whilst high-level has a one-to-many relationship (one instruction = many machine instructions)
Interpreters translate and execute code line-by-line, which is flexible but slower; compilers translate all code at once, which produces faster executables but makes debugging slower
Bytecode provides platform independence by running on virtual machines rather than directly on the processor
