Classification of Programming Languages and Translation (AQA A-Level Computer Science): Revision Notes
Introduction
Modern programming is typically done using high-level languages that are easy to understand. However, computers can only process binary instructions (0s and 1s). This creates an interesting challenge: how do we translate the code we write into a format that the processor can actually execute?
In this topic, we'll explore the different types of programming languages, how they've evolved over time, and the various methods used to convert human-readable code into machine-executable instructions.
Types of programming languages
Programming languages can be grouped into three main categories based on how close they are to the hardware:
- Machine code - Binary instructions (0s and 1s)
- Assembly language - Short instruction codes (mnemonics)
- High-level languages - Natural language keywords
Machine code and assembly language are collectively known as low-level languages because they work closely with the processor's architecture. They provide direct access to hardware resources but are much harder for humans to read and write.
Machine code
The processor can only execute instructions expressed as binary digits (0s and 1s). These binary patterns are called machine code. It is the most basic form of programming, and each pattern directly controls what the processor does.
Key characteristics of machine code:
- Written as sequences of binary digits (bits)
- Can be represented in decimal or hexadecimal format to make it slightly more readable
- Executes very quickly because the processor can use it directly
- Extremely difficult to write and understand
- Very time-consuming to develop programs
- High risk of errors due to the repetitive nature of entering binary patterns
- Almost impossible to debug when errors occur
- Not portable - machine code written for one processor type won't work on a different processor
Advantages:
- Fastest possible execution speed
- Most efficient use of processor capabilities
- Direct control over hardware
Disadvantages:
- Extremely difficult for humans to write and read
- Highly prone to errors
- Time-consuming development process
- Platform-specific (not portable between different processor types)
Despite these challenges, machine code programs run at maximum speed because there's no translation layer between your instructions and the processor's actions. This is why machine code is the lowest level of code - it's what the processor actually executes.
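To illustrate the earlier point that machine code can be shown in decimal or hexadecimal to make it slightly more readable, here is a minimal Python sketch. The 8-bit pattern is an arbitrary value chosen for the demonstration and is not an instruction from any real processor.

```python
# An arbitrary 8-bit pattern chosen for illustration only;
# it is not taken from any real instruction set.
pattern = "10101100"

value = int(pattern, 2)         # interpret the bits as an unsigned integer
as_decimal = str(value)         # the same value written in decimal
as_hex = format(value, "02X")   # the same value as two hexadecimal digits

print(pattern, as_decimal, as_hex)
```

All three forms describe the same bits; hexadecimal is simply shorter to write and check, because each hex digit stands for exactly four bits.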
Assembly language
To make programming more manageable, assembly language was developed. Rather than writing pure binary, programmers can use short, memorable instruction codes called mnemonics. These mnemonics represent specific operations that the processor can perform.
Common assembly language mnemonics:
- LDR - Load Register (loads a value from memory into a register)
- STR - Store Register (stores a value from a register into memory)
- ADD - Addition (adds values together)
- SUB - Subtraction (subtracts one value from another)
Worked Example: Understanding Assembly Code
Consider this assembly language program:
LDR 20
ADD 43
STR 20
SUB 41
STR 45
This code performs the following operations:
Step 1: Loads the value from memory address 20 into the accumulator
Step 2: Adds the value from memory address 43 to the accumulator
Step 3: Stores the accumulator's contents back to memory address 20
Step 4: Subtracts the value from memory address 41 from the accumulator
Step 5: Stores the final result to memory address 45
This demonstrates how assembly language uses mnemonics to perform basic operations on data stored in memory.
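The five steps above can be traced with a small simulation. This is only a sketch of a one-accumulator machine written in Python: the run function and the starting memory values are assumptions made up for the demonstration, not part of any real instruction set.

```python
def run(program, memory):
    """Execute (mnemonic, address) pairs on a simple one-accumulator machine."""
    accumulator = 0
    for mnemonic, address in program:
        if mnemonic == "LDR":
            accumulator = memory[address]       # load from memory
        elif mnemonic == "ADD":
            accumulator += memory[address]      # add a value from memory
        elif mnemonic == "SUB":
            accumulator -= memory[address]      # subtract a value from memory
        elif mnemonic == "STR":
            memory[address] = accumulator       # store back to memory
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return accumulator

program = [("LDR", 20), ("ADD", 43), ("STR", 20), ("SUB", 41), ("STR", 45)]

# Starting memory contents are invented for this example.
memory = {20: 5, 41: 3, 43: 10, 45: 0}

result = run(program, memory)
print(memory[20], memory[45])
```

With these starting values, address 20 ends up holding 15 (5 + 10) and address 45 holds 12 (15 - 3).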
Key characteristics of assembly language:
- Uses words (mnemonics) instead of pure binary
- Has a one-to-one relationship with machine code - each assembly instruction translates to exactly one machine code instruction
- More readable than machine code but still quite cryptic
- Must be converted to machine code by an assembler before execution
- Still processor-specific (not portable)
Important terminology:
- Source code: The original assembly language program written by the programmer, not yet converted to executable form
- Assembler: A program that translates assembly language into machine code
- Object code: The machine code produced by the assembler, which can be executed by the computer
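The relationship between source code, assembler and object code can be sketched as a toy assembler. The 4-bit opcode values and the 12-bit instruction layout below are hypothetical, chosen only to show that each line of assembly becomes exactly one machine code instruction; real processors define their own encodings.

```python
# Hypothetical 4-bit opcodes; real processors define their own encodings.
OPCODES = {"LDR": 0b0001, "ADD": 0b0010, "SUB": 0b0011, "STR": 0b0100}

def assemble(source):
    """Translate each line of source into exactly one 12-bit instruction."""
    object_code = []
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        opcode = OPCODES[mnemonic]
        address = int(operand)
        # Assumed layout: 4-bit opcode followed by an 8-bit address
        object_code.append(format(opcode, "04b") + format(address, "08b"))
    return object_code

source = """
LDR 20
ADD 43
STR 20
"""

object_code = assemble(source)
print(object_code)
```

Three lines of source produce three binary instructions, which is the one-to-one relationship in action.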
Advantages of assembly language:
- Programs execute quickly because there's minimal translation overhead
- More compact code compared to high-level languages
- Allows direct control of hardware registers and memory
- Useful when precise hardware manipulation is required
Current uses of assembly language:
Assembly language is still used in specific situations where low-level control is essential:
- Embedded systems: Small devices with limited processing power and memory benefit from assembly's efficiency
- Device drivers: Software that controls hardware components often needs direct hardware access
- Real-time applications: Systems that must respond immediately to inputs (like control systems) use assembly for speed
- Custom hardware: Specialised processors may only support assembly language programming
High-level languages
Machine code and assembly language are both considered low-level languages because they're designed around the processor's architecture rather than human thinking patterns.
High-level languages were created to solve the problems associated with low-level programming. They use natural language keywords (like English words) and mathematical notation that humans find easier to understand.
Key characteristics of high-level languages:
- Commands use recognisable English-style keywords
- Platform-independent (portable) - the same source code can be translated to run on different types of computer
- Have a one-to-many relationship with machine code - one high-level instruction may translate into many machine code instructions
- Must be translated into machine code using a translator (either an interpreter or compiler)
- Make use of program structures (loops, conditions, functions) to organise code logically
- Easier to write, read, and maintain than low-level languages
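The one-to-many relationship can be sketched with a toy translator. The translate_assignment function below is a hypothetical helper that handles only statements of the form "target = left + right" (or with a minus sign) and emits assembly-style lines using the mnemonics introduced earlier; a real translator is far more involved.

```python
def translate_assignment(statement):
    """Translate 'target = left + right' (or '-') into assembly-style lines."""
    target, expression = [part.strip() for part in statement.split("=")]
    left, operator, right = expression.split()
    arithmetic = {"+": "ADD", "-": "SUB"}[operator]
    # One high-level statement becomes three low-level instructions
    return [f"LDR {left}", f"{arithmetic} {right}", f"STR {target}"]

instructions = translate_assignment("total = price + tax")
print(instructions)
```

A single high-level statement has expanded into a load, an arithmetic operation and a store.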
Why so many different high-level languages exist:
Different languages have been designed to tackle specific types of problems effectively:
- Some languages excel at scientific calculations
- Others are optimised for database management
- Some specialise in web development
- Others focus on artificial intelligence applications
The language a programmer chooses depends largely on the problem they're trying to solve.
Programming paradigms
High-level languages are often classified by their programming paradigm - the fundamental style or approach to structuring programs. The three main paradigms are imperative, object-oriented, and declarative.
Imperative languages
Also called procedural languages, imperative languages work by giving the computer a sequence of commands to follow. The program consists of step-by-step instructions, often grouped into subroutines (procedures), that tell the computer exactly what to do and in what order.
Think of it like following a recipe - you perform each instruction in sequence, and the same instructions are followed each time the program runs.
Characteristics:
- Programs are structured as lists of instructions
- Execution flows through the instructions in order
- Focus on how to achieve a result
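As a small illustration of the imperative style, the Python sketch below spells out every step needed to reach the result. The function name and the sample data are made up for the example.

```python
def total_of_even_squares(numbers):
    """Add up the squares of the even numbers, one explicit step at a time."""
    total = 0                  # start with an empty running total
    for n in numbers:          # visit each number in order
        if n % 2 == 0:         # keep only the even ones
            total += n * n     # update the running total
    return total

print(total_of_even_squares([1, 2, 3, 4]))
```

The code says how to get the answer: initialise a variable, loop, test, and update.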
Object-oriented languages
Object-oriented languages organise programs by grouping instructions and data together into objects. An object is a self-contained unit that combines:
- Data (properties or attributes)
- Instructions (methods or functions that work with that data)
Objects can be further organised into classes, which act as templates for creating similar objects.
Characteristics:
- Programs are structured around objects rather than procedures
- Objects encapsulate both data and behaviour
- Promotes code reuse through inheritance
- Focus on modelling real-world entities
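The ideas of classes, objects, encapsulation and inheritance can be sketched in a few lines of Python. The class names Vehicle, Car and Motorbike are hypothetical examples, not taken from any specification.

```python
class Vehicle:
    """A class: a template combining data (attributes) and behaviour (methods)."""

    def __init__(self, make, wheels):
        self.make = make        # data held inside the object
        self.wheels = wheels

    def describe(self):         # a method that works with that data
        return f"{self.make} with {self.wheels} wheels"


class Car(Vehicle):
    """Inherits everything from Vehicle and fixes the number of wheels."""

    def __init__(self, make):
        super().__init__(make, wheels=4)


class Motorbike(Vehicle):
    """Inherits from Vehicle but overrides describe()."""

    def __init__(self, make):
        super().__init__(make, wheels=2)

    def describe(self):
        return "Motorbike: " + super().describe()


car = Car("Ford")               # an object created from the Car class
bike = Motorbike("Honda")
print(car.describe())
print(bike.describe())
```

Car reuses describe() from Vehicle without rewriting it, which is the code reuse that inheritance provides.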
Declarative languages
Declarative languages specify what result you want rather than how to achieve it. Instead of listing step-by-step instructions, you declare the properties the result should have, and the system works out how to produce it.
There are two main types of declarative languages:
Logic programming languages: These work with facts and rules. To answer a query, the system searches the facts and applies the rules to work out the results. They're commonly used in AI applications.
Functional languages: These treat computation like mathematical functions. Programs are built by composing functions together, where each function takes inputs and produces outputs without changing state or data. The building blocks are functions rather than instruction lists.
Characteristics:
- Focus on what should be accomplished, not how
- Use facts, rules, or mathematical functions
- Often used in specialised domains (AI, mathematics, data analysis)
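Python is not a declarative language, so the sketch below only imitates the two styles described above. The function names, the PARENT facts and the grandparent rule are all invented for the illustration; dedicated logic and functional languages express these ideas far more directly.

```python
# Functional style: the result is described by composing functions,
# and no variable is modified along the way.
def square(n):
    return n * n

def is_even(n):
    return n % 2 == 0

def total_of_even_squares(numbers):
    return sum(map(square, filter(is_even, numbers)))

# Logic style: a set of facts (parent, child) ...
PARENT = {("alice", "bob"), ("bob", "carol")}

# ... and a rule: X is a grandparent of Z if X is a parent of Y
# and Y is a parent of Z.
def grandparents_of(person):
    return {
        grandparent
        for (grandparent, middle) in PARENT
        for (parent, child) in PARENT
        if middle == parent and child == person
    }

print(total_of_even_squares([1, 2, 3, 4]))
print(grandparents_of("carol"))
```

In both halves the code states what the answer is (a sum over filtered values, or anyone matching the rule) rather than listing step-by-step updates to variables.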
Translating high-level languages
High-level languages are programmer-friendly, but computers cannot understand them directly. The processor only executes machine code, so high-level source code must be converted (translated) into machine code before it can run.
This translation process requires special system software called a translator. There are two main types of translator for high-level languages:
- Interpreters
- Compilers
Interpreters
An interpreter translates and executes high-level code one statement at a time. It reads a line of source code, immediately performs the required action, then moves to the next line.
How interpreters work:
- Read one statement from the source code
- Translate it into machine code (or an intermediate format)
- Execute that machine code immediately
- Move to the next statement
- Repeat until the program ends
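The read, translate, execute cycle can be sketched with a toy interpreter. The three-command language (SET, ADD, PRINT) is hypothetical and exists only for this example, and "translation" here means turning a statement into a Python function rather than real machine code.

```python
def translate(statement):
    """Turn one statement of the toy language into a Python function."""
    command, name, *rest = statement.split()
    if command == "SET":
        value = int(rest[0])
        def action(variables, output):
            variables[name] = value
    elif command == "ADD":
        amount = int(rest[0])
        def action(variables, output):
            variables[name] += amount
    elif command == "PRINT":
        def action(variables, output):
            output.append(variables[name])
    else:
        raise SyntaxError(f"unknown command: {command}")
    return action

def interpret(source):
    variables, output = {}, []
    for statement in source.strip().splitlines():  # read one statement
        action = translate(statement)               # translate it
        action(variables, output)                   # execute it immediately
    return output                                   # loop ends with the program

program = """
SET x 5
ADD x 3
PRINT x
"""

print(interpret(program))
```

Each statement is translated only when the loop reaches it, so an error on a later line would not be reported until the earlier lines had already run.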
Some interpreters may work by interpreting the syntax of each statement directly, whilst others may call predefined routines to handle common operations.
Interpreters are selective - they only translate code that actually needs to run. This selective translation can save time during development and testing.
Worked Example: Selective Translation
Consider this code:
If Age<17 Then Output = "Cannot drive a car"
If the condition Age<17 is false, the interpreter won't bother translating the output statement because it won't be executed. This saves translation time.
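The saving described above can be made visible with a small sketch that logs every translation. The translate and run_if functions are hypothetical stand-ins for part of an interpreter; they handle only this one If statement.

```python
translation_log = []   # records every statement that gets translated

def translate(statement):
    """Pretend to translate one output statement, and log that it happened."""
    translation_log.append(statement)
    message = statement.split("=", 1)[1].strip().strip('"')
    return lambda: message

def run_if(age, statement):
    """Interpret: If Age<17 Then <statement>."""
    if age < 17:
        action = translate(statement)   # translated only when it will run
        return action()
    return None                         # condition false: nothing translated

statement = 'Output = "Cannot drive a car"'

first = run_if(21, statement)
count_after_false = len(translation_log)

second = run_if(15, statement)
count_after_true = len(translation_log)

print(first, count_after_false, second, count_after_true)
```

When the age is 21 the log stays empty, and the output statement is translated only once the condition is true.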
Some interpreters translate an entire line before executing it, whilst others execute as they read, which makes them extremely flexible.
Benefits of using an interpreter:
- You can run sections of code immediately without compiling the whole program
- Code can run on different processors as long as they have the appropriate interpreter installed
- Ideal for program development because you can test code quickly
- Easier to debug because errors are identified line-by-line
Drawbacks of using an interpreter:
- Programs run more slowly because translation happens every time the code executes
- Code that runs repeatedly (like loops) must be translated each time, which is inefficient
- The source code must be distributed to users (rather than just an executable)
- Users must have the correct interpreter installed on their system
Compilers
A compiler translates the entire source code into machine code (object code) in one complete process before the program runs. Once compilation is complete, you have an executable file that can run immediately.
How compilers work:
- Read the entire source code
- Check for syntax errors
- Translate all the code into machine code (object code)
- Create an executable file
- The executable can then be run repeatedly without further translation
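The steps above can be sketched with a toy compiler. The two-command language (ADD and SUB applied to a running total) is hypothetical, and the "executable" here is just a Python list of signed amounts standing in for object code.

```python
def compile_program(source):
    """Translate every statement up front; stop before running if any is invalid."""
    executable = []
    for line_number, statement in enumerate(source.strip().splitlines(), start=1):
        parts = statement.split()
        valid = (
            len(parts) == 2
            and parts[0] in ("ADD", "SUB")
            and parts[1].isdigit()
        )
        if not valid:
            # Syntax errors are reported before any part of the program runs
            raise SyntaxError(f"line {line_number}: cannot translate {statement!r}")
        amount = int(parts[1])
        executable.append(amount if parts[0] == "ADD" else -amount)
    return executable

def run(executable, start=0):
    """Run the already-translated program; no translation happens here."""
    total = start
    for amount in executable:
        total += amount
    return total

executable = compile_program("ADD 5\nADD 10\nSUB 3")   # translated once
print(run(executable))                                  # run...
print(run(executable, start=100))                       # ...and run again
```

The source is examined once by compile_program, and run can then be called as many times as needed without touching the source again.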
Benefits of using a compiler:
- Programs run very quickly after compilation because no translation is needed during execution
- Only the executable file (object code) needs to be distributed - users don't need the source code
- Makes reverse engineering difficult because working backwards from object code to source code is very challenging
Drawbacks of using a compiler:
- The entire program must be recompiled even if you make a tiny change, which slows down debugging
- The compilation process itself can be time-consuming for large programs
- Object code is platform-specific - it will only run on computers with the same type of processor it was compiled for
Comparison: Interpreters vs Compilers
The choice between an interpreter and compiler depends on your needs:
- Development phase: Interpreters are better because you can test code immediately
- Production phase: Compilers are better because the final program runs faster
- Portability: Interpreters offer better cross-platform support
- Distribution: Compilers are better as you only distribute the executable
Bytecode
Some programming languages use an intermediate approach called bytecode. Bytecode is an instruction set that can be executed on any computer using a virtual machine.
How bytecode works:
Rather than compiling directly to machine code for a specific processor, the source code is compiled into bytecode. This bytecode can then run on any computer that has the appropriate virtual machine installed.
Worked Example: Java Bytecode
Java source code is compiled into bytecode format. The Java Virtual Machine (JVM) can then execute this bytecode on any computer, regardless of:
- Processor type
- Operating system
- Hardware architecture
Each bytecode instruction typically consists of a one- or two-byte code that defines the operation, followed by any parameters needed. This keeps the bytecode compact and efficient.
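The idea of a virtual machine executing bytecode can be sketched with a tiny stack-based machine. The opcode values below are hypothetical and are not real JVM opcodes; the sketch only shows the pattern of a one-byte operation code followed by any parameter it needs.

```python
# Hypothetical one-byte opcodes; these are not real JVM opcodes.
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run_bytecode(code):
    """Execute bytecode on a small stack-based virtual machine."""
    stack = []
    pc = 0   # program counter: index of the next byte to read
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == PUSH:
            stack.append(code[pc])   # a one-byte parameter follows the opcode
            pc += 1
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode:#04x}")

# Bytecode for (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run_bytecode(program))
```

The same nine bytes would give the same result on any computer that has this virtual machine, whatever its processor.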
Microsoft Common Intermediate Language (CIL):
Similarly, Microsoft's .NET languages (like C#) compile to an intermediate code called CIL. The .NET runtime (the Common Language Runtime, or CLR) acts as the virtual machine and can execute this intermediate code on any supported platform.
Advantages of bytecode:
- Platform independence - write once, run anywhere
- More secure than distributing source code
- Still relatively efficient compared to pure interpretation
- Allows for platform-specific optimisation by the virtual machine
Key Points to Remember:
- Three types of languages: Machine code (binary), assembly language (mnemonics), and high-level languages (natural keywords)
- Low-level languages (machine code and assembly) are fast and give hardware control but are difficult to write and platform-specific
- High-level languages are easier to write and portable but require translation
- Assembly has a one-to-one relationship with machine code (one instruction = one machine instruction), whilst high-level has a one-to-many relationship (one instruction = many machine instructions)
- Interpreters translate and execute code line-by-line, which is flexible but slower; compilers translate all code at once, which produces faster executables but makes debugging slower
- Bytecode provides platform independence by running on virtual machines rather than directly on the processor