Search Algorithms – Binary, Binary Tree, and Linear Search Revision Notes for AQA A-Level Computer Science

Search Algorithms – Binary, Binary Tree, and Linear Search

Introduction to searching algorithms

One of the most powerful capabilities of computers is their ability to search through data quickly and efficiently. Searching is a fundamental operation that we encounter in countless everyday situations. For instance, when you use a cash machine to check your bank balance, the system searches through account records to find your specific account. Similarly, when you shop at a supermarket, the computerised till searches for product prices and details. Search engines on the internet help you find the cheapest holiday to the Algarve by searching through thousands of options.

Most searches are performed on data storage systems, though they're also used in other applications such as word processors that implement find and replace functions. A simple search might look for just one keyword, but more sophisticated search routines allow you to construct complex queries using logic statements such as OR, AND and NOT.

infoNote

Real-World Applications of Search Algorithms

Search algorithms power many everyday computing tasks:

Banking systems searching through millions of account records
E-commerce websites finding products from vast inventories
Search engines indexing billions of web pages
Database systems retrieving specific records
Word processors finding text in documents

There are several different searching algorithms available, each with its own strengths and weaknesses. The choice of which algorithm to use depends largely on the nature of the data being searched - particularly its size and how it's organised. The efficiency of algorithms is typically measured using Big O notation, which provides a mathematical way to compare how well different algorithms perform as dataset sizes grow.

Understanding these search methods will help you write more efficient programs and make informed decisions about which approach to use in different situations.

Linear search

What is a linear search?

A linear search is the simplest and most straightforward searching technique. It works by examining each item in a dataset one at a time, starting from the beginning and continuing sequentially until either the search term is found or the end of the data is reached.

Think of it like looking for a specific book on an unsorted bookshelf. If the books aren't organised in any particular order (not by title, not by author, not by any system), your only option is to look at each book one by one until you find the one you want. You might get lucky and find it immediately, or you might have to check every single book before finding it at the very end.

How linear search works

The process is beautifully simple:

Start at the first item in the dataset
Check if it matches what you're looking for
If it matches, you've found it - stop searching
If it doesn't match, move to the next item
Repeat until you find the item or run out of items to check

Here's how this might look in pseudocode:

Repeat
    Look at the Title
Until Title is the one you want OR there are no more books

This simplicity is both linear search's greatest strength and its greatest weakness. It's easy to implement and works on any dataset, regardless of how the data is organised. However, it's also the least efficient method, particularly when dealing with large amounts of data.

Efficiency considerations

The efficiency of a linear search is strongly influenced by where the item you're searching for is located within the dataset:

Best-case scenario: The item you want is at or near the beginning of the dataset. In this case, you'll find it very quickly with just one or a few comparisons.
Worst-case scenario: The item is at the end of the dataset, or doesn't exist in the dataset at all. In these cases, you'll need to check every single item, making it time-consuming for large datasets.
Average case: On average, you'll need to check approximately half of the items in the dataset before finding what you're looking for.

chatImportant

Performance Scales with Dataset Size

The speed of a linear search is directly proportionate to the size of the dataset. If you double the size of your dataset, you roughly double the time it takes to search through it. For computer scientists, we say that linear search has O(n) time complexity, where n represents the number of items in the dataset.

Implementation example

Here's how a linear search might be implemented in Visual Basic. This example searches through a table of data looking for a specific text string:

Private Sub btnSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSearch.Click
    
    Dim CountLinear As Integer = 0
    txtTraceBinary.Text = ""
    txtTraceLinear.text = ""
    
    ' linear search
    Do
        CountLinear = CountLinear + 1
        
        txtTraceLinear.Text = txtTraceLinear.Text & CountLinear & "-" & grdTable.Rows(CountLinear).Cells(0).Value & vbCrLf
        
        Loop Until CountLinear = RowCount Or grdTable.Rows(CountLinear).Cells(0).Value = txtFind.Text
        
        txtLinearLooks.Text = CountLinear
        
        ' match found?
        If grdTable.Rows(CountLinear).Cells(0).Value = txtFind.Text Then
            lblResult.Text = "Match Found"
        Else
            lblResult.Text = "No Match Found"
        End If
        
End Sub

This code keeps a count of how many comparisons it makes (stored in CountLinear), demonstrating how the algorithm examines each row of data systematically until it either finds a match or reaches the end of the dataset.

When to use linear search

Linear search is most appropriate when:

Your dataset is small
The data is unsorted or randomly organised
You need a simple solution that's quick to implement
You'll only be searching occasionally

Despite its limitations, linear search remains useful because of its simplicity and the fact that it works on any type of data organisation.

Binary search

What is a binary search?

A binary search is a much more efficient searching technique, but it comes with an important requirement: the data must be sorted in order (either ascending or descending). It works by repeatedly dividing the search space in half, eliminating large portions of the dataset with each comparison.

Think of binary search like the classic number guessing game. If someone thinks of a number between 1 and 100, a logical strategy is to start by guessing 50. If they say "higher", you've immediately eliminated all numbers from 1 to 50 - that's half the possibilities gone with just one guess! You'd then guess 75 (halfway between 51 and 100), and continue this halving process until you find the number.

chatImportant

Critical Requirement: Sorted Data

Binary search only works when your data is arranged in a logical order. If your data is unsorted, you must either sort it first or use a different search method like linear search. This preprocessing requirement is the trade-off for the dramatically improved search speed.

This divide-and-conquer approach makes binary search dramatically faster than linear search for large datasets. However, remember that it only works when your data is arranged in a logical order.

How binary search works

The binary search algorithm uses three key positions or "pointers" to keep track of where it's searching:

LowestPointer: Points to the first position in the current search range
HighestPointer: Points to the last position in the current search range
MiddlePointer: Points to the middle position, calculated as $(\text{LowestPointer} + \text{HighestPointer}) / 2$

Here's the step-by-step process:

Set LowestPointer to the first item and HighestPointer to the last item
Calculate MiddlePointer as the halfway point between them
Compare the value at MiddlePointer with your search term
If it matches, you've found it - stop searching
If the search term is less than the middle value, set HighestPointer to MiddlePointer - 1 (search the lower half)
If the search term is greater than the middle value, set LowestPointer to MiddlePointer + 1 (search the upper half)
Repeat steps 2-6 until you find the item or LowestPointer equals HighestPointer

Worked example

lightbulbExample

Worked Example: Binary Search for Number 51

Let's search for the number 51 in a sorted dataset of 15 numbers arranged in ascending order. Here's our starting position:

Table

Step 1: We start by looking at the middle position (cell 8). Halfway between position 1 and position 15 is position 8, which contains the number 37.

Table

Since 51 is greater than 37, we know the number we're looking for must be in the upper half of the dataset. We can eliminate positions 1 through 8 (shown marked with 'x'). Our new search range is positions 9 to 15.

Step 2: Now we calculate the middle of our remaining range. Halfway between position 9 and 15 is position 12, which contains the number 57.

Table

Since 51 is less than 57, we know to search in the lower portion of this range. We can now eliminate positions 12 through 15. Our search range is now positions 9 to 11.

Step 3: The middle position between 9 and 11 is position 10.

Table

Position 10 contains the number 51 - we've found our target! This search took just 3 comparisons to find the number in a dataset of 15 items.

Pseudocode representation

Here's how binary search can be represented in pseudocode:

FindMe stores the record title that we are searching for

LowestPointer ← 1
HighestPointer ← NumberofRecords

Do
    MiddlePointer ← (LowestPointer + HighestPointer) / 2
    
    If Record at MiddlePointer < FindMe Then
        LowestPointer ← MiddlePointer + 1
    End If
    
    If Record at MiddlePointer > FindMe Then
        HighestPointer ← MiddlePointer - 1
    End If
    
Until Record at MiddlePointer = FindMe OR LowestPointer = HighestPointer

The pointers track the boundaries of our search space, which shrinks by roughly half with each iteration. This is what makes binary search so efficient.

Efficiency and advantages

Binary search is remarkably efficient, especially as datasets grow larger. Here's why it's so powerful:

If you need to search through just three records, binary search will take a maximum of two comparisons. But the real advantage becomes apparent with larger datasets:

For 1 million records: Maximum of just 20 comparisons needed
For 6 billion records: Maximum of just 33 comparisons needed

infoNote

Dramatic Efficiency Gains

Compare this to linear search, which would potentially need to check all 1 million or all 6 billion items in the worst case! This efficiency comes from the fact that binary search has O(log n) time complexity. Each comparison eliminates half of the remaining possibilities, leading to logarithmic growth in the number of comparisons needed as the dataset size increases.

At first glance, it might seem slow to divide and eliminate repeatedly, but in practice it's incredibly fast. The key insight is that you're discarding huge amounts of data with each comparison, unlike linear search which only eliminates one item at a time.

Implementation considerations

When implementing binary search, you need to:

Ensure your data is sorted before searching (or maintain it in sorted order)
Handle the case where the item isn't found (when LowestPointer equals or exceeds HighestPointer)
Use integer division when calculating the midpoint to avoid fractional indices
Be careful with index boundaries to avoid errors

Binary search is the algorithm of choice whenever you have sorted data and need to perform frequent searches, making it invaluable for database systems, search features in applications, and many other computing scenarios.

Binary tree search

What is a binary tree search?

A binary tree search is a specialised searching technique designed for data stored in binary tree structures. Binary trees are particularly useful in programs where data is dynamic - meaning information is constantly being added to and removed from the dataset.

Unlike searching through a simple list or array, a binary tree search involves traversing through a tree structure, making decisions at each node about whether to move left or right based on the value you're searching for.

How binary tree search works

When searching a binary tree, you start at the root node (the top of the tree) and work your way down by following branches. At each node, you compare the value you're looking for with the value stored at that node:

If the search value is less than the current node's value, move to the left child node
If the search value is greater than the current node's value, move to the right child node
If the search value equals the current node's value, you've found it!
If you reach a node with no children and haven't found the value, it doesn't exist in the tree

This process continues until you either find the value or reach a dead end (a node with no children in the direction you need to go).

Pseudocode representation

Here's the basic algorithm in pseudocode:

CurrentNode ← RootNode

Repeat
    If Current_Node > Find_Me then
        Move left to child node
    Else
        Move right to child node
    End If
    
Until CurrentNode equals FindMe Or CurrentNode has no children

The variable FindMe contains the value being searched for, and CurrentNode tracks your current position as you traverse the tree.

Relationship to tree traversal

infoNote

Connection to Tree Traversal

Binary tree search is very similar to in-order tree traversal. Both techniques involve systematically visiting nodes in a tree structure by making left/right decisions based on values. The key difference is that tree traversal visits every node, whereas tree search stops as soon as it finds the target value or determines it doesn't exist.

If you're familiar with tree traversal algorithms, you already understand the fundamental concept of navigating a binary tree - searching just adds the additional logic of comparing values and deciding which path to follow.

Implementation example

Here's how binary tree search might be implemented in Visual Basic:

Private Sub txtSearchFor_KeyDown(ByVal sender As Object, ByVal e As System.Windows.Forms.KeyEventArgs) Handles txtSearchFor.KeyDown
    
    ' check for enter key press
    If e.KeyCode = Keys.Enter Then
        
        Dim ThisNode As Integer = 1
        txtOutput.Text = "Root - " & NodeValue(1) & vbCrLf
        
        Do Until NodeValue(ThisNode) = txtSearchFor.Text Or ThisNode = 0
            
            ' move to node at left pointer
            If txtSearchFor.Text < NodeValue(ThisNode) Then
                ThisNode = PointerLeft(ThisNode)
                txtOutput.Text = txtOutput.Text & "L - " & ThisNode & " - " & NodeValue(ThisNode) & vbCrLf
            End If
            
            ' move to node at right pointer
            If txtSearchFor.Text > NodeValue(ThisNode) Then
                ThisNode = PointerRight(ThisNode)
                txtOutput.Text = txtOutput.Text & "R - " & ThisNode & " - " & NodeValue(ThisNode) & vbCrLf
            End If
            
        Loop
        
        If NodeValue(ThisNode) = txtSearchFor.Text Then
            txtOutput.Text = txtOutput.Text & "FOUND"
        Else
            txtOutput.Text = txtOutput.Text & "NOT FOUND"
        End If
        
    End If
    
End Sub

This code demonstrates the key features of binary tree search:

Starting at the root node
Using pointer functions (PointerLeft and PointerRight) to navigate the tree
Comparing values to decide which direction to move
Continuing until the value is found or there are no more nodes to check
Tracking the path taken through the tree (shown in the output)

When to use binary tree search

Binary tree search is most appropriate when:

Your data is stored in a binary tree structure
Data is frequently being added or removed (dynamic datasets)
You need to maintain sorted order while allowing insertions and deletions
You want search efficiency similar to binary search but need more flexibility

Binary trees combine the efficiency of binary search with the flexibility to easily add and remove data, making them excellent for applications like databases, file systems, and many other dynamic data scenarios.

Comparing search algorithms

Efficiency comparison

Different search algorithms have different time complexities, which affects how their performance scales with dataset size:

Linear search: O(n) complexity - time increases proportionally with dataset size. Simple but slow for large datasets.
Binary search: O(log n) complexity - extremely efficient for large sorted datasets. Requires data to be in order.
Binary tree search: O(log n) average case complexity - efficient and flexible, but performance can degrade if the tree becomes unbalanced.

Choosing the right algorithm

The selection of an appropriate search method depends primarily on two factors:

How much data is being searched: Larger datasets benefit more from efficient algorithms like binary search
How the data is organised: Sorted data allows binary search; dynamic data suits binary tree structures; unsorted or rarely-searched data might use linear search

infoNote

Small vs Large Datasets

For small datasets (perhaps a few dozen items), the difference in performance is negligible, and linear search's simplicity might be preferable. However, as datasets grow into thousands or millions of items, choosing an efficient algorithm becomes crucial.

Practical considerations

When deciding which search algorithm to implement, also consider:

Implementation complexity: Linear search is trivially simple; binary search requires sorted data; binary tree search needs a tree structure
Data maintenance: Binary search needs data kept in order; binary trees need balancing for optimal performance
Search frequency: If you search rarely, a simple approach might suffice; frequent searches justify more complex efficient methods
Data volatility: Frequently changing data might benefit from tree structures over maintaining sorted arrays

Remember!

bookmarkSummary

Key Points to Remember:

Linear search checks each item sequentially from start to finish - simple but slow, works on any data organisation
Binary search repeatedly halves the search space - very efficient (maximum 20 comparisons for 1 million items) but requires sorted data
Binary tree search navigates through a tree structure by comparing values and moving left or right - efficient and flexible for dynamic data
The choice of algorithm depends on dataset size, data organisation, and how frequently data changes
Efficiency matters more as datasets grow larger - the difference between checking 1 million items versus 20 items is enormous!

Search Algorithms – Binary, Binary Tree, and Linear Search (AQA A-Level Computer Science): Revision Notes