General Normal Distributions Revision Notes for HSC SSCE Mathematics Advanced

General Normal Distributions

Introduction to general normal distributions

When we examine real-world data, we often find that it follows a bell-shaped curve pattern. For instance, when tossing 20 coins repeatedly and counting the number of heads, the resulting distribution looks very much like a normal curve. However, this curve differs from the standard normal distribution in two important ways: the mean is not zero, and the standard deviation is not equal to 1.

Note

The coin-toss distribution described above is centred at $x = 10$ (not 0) and is more spread out than the standard normal distribution. This tells us we need a way to work with normal distributions that have any mean and any standard deviation, not just the standard normal with $\mu = 0$ and $\sigma = 1$.

A general normal distribution is a bell-shaped probability distribution that can be centred anywhere on the number line and can have any degree of spread. We create these distributions by transforming the standard normal distribution through stretching and shifting operations.

Transforming the standard normal distribution

To create a general normal distribution with mean $\mu$ and standard deviation $\sigma$ , we apply a sequence of transformations to the standard normal distribution. Think of this as reshaping and repositioning the standard bell curve to match our data.

Stretching to accommodate the standard deviation

The first transformation adjusts the spread of the distribution to match the desired standard deviation $\sigma$ .

Step 1: Horizontal stretching

We begin by stretching the standard normal curve horizontally by a factor of $\sigma$ . This is achieved mathematically by replacing $x$ with $\frac{x}{\sigma}$ in the standard normal formula.

The standard normal PDF is:

$y = \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$

After horizontal stretching, this becomes:

$y = \phi\left(\frac{x}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}$

Note

This stretching has an important consequence: the inflection points (the points where the curve changes from concave down to concave up) move from $x = \pm 1$ to $x = \pm \sigma$ .

Step 2: Vertical adjustment

However, horizontal stretching by a factor of $\sigma$ also multiplies the area under the curve by $\sigma$. Since the total area must remain 1 (the total probability must equal 1), we rescale the curve vertically by multiplying by $\frac{1}{\sigma}$:

$y = \frac{1}{\sigma}\phi\left(\frac{x}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}$

Now we have a proper probability density function with area 1, spread controlled by $\sigma$ , but still centred at 0.
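The effect of the two steps on the area can be checked numerically. The sketch below is only an illustration: the value $\sigma = 2.5$, the helper names `phi` and `area`, and the integration range of ten standard deviations either side are all assumptions chosen for the demonstration, not part of the syllabus method.

```python
import math

def phi(x):
    """Standard normal PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area(f, lo, hi, n=20000):
    """Trapezoidal-rule approximation of the area under f on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

sigma = 2.5  # illustrative value (assumption)
lo, hi = -10 * sigma, 10 * sigma  # wide enough that the tails are negligible

# Step 1 only: horizontal stretch, area becomes about sigma
stretched_area = area(lambda x: phi(x / sigma), lo, hi)
# Step 1 and Step 2: stretch, then multiply by 1/sigma, area returns to about 1
rescaled_area = area(lambda x: phi(x / sigma) / sigma, lo, hi)

print(stretched_area)  # close to sigma
print(rescaled_area)   # close to 1
```

Trying a value of `sigma` below 1 shows the stretched area shrinking rather than growing, which is why the factor $\frac{1}{\sigma}$ is needed in both cases.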

Shifting to accommodate the mean

The final transformation moves the entire curve horizontally to centre it at the desired mean $\mu$ .

Horizontal translation

We shift the curve $\mu$ units to the right by replacing $x$ with $(x - \mu)$ :

$y = \frac{1}{\sigma}\phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

This translation doesn't change the area under the curve or the positions of the inflection points relative to the centre. The inflection points are now located at $x = \mu - \sigma$ and $x = \mu + \sigma$ , exactly one standard deviation on either side of the mean.

The general normal distribution formula

Complete formula

Let $f(x)$ be the probability density function for a general normal distribution with mean $\mu$ and standard deviation $\sigma$ . The transformation process gives us:

$f(x) = \frac{1}{\sigma}\phi\left(\frac{x - \mu}{\sigma}\right)$

Alternatively, writing out the exponential form in full:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Key properties

For a general normal distribution with mean $\mu$ and standard deviation $\sigma$ :

The curve is bell-shaped and symmetric about the vertical line $x = \mu$
The total area under the curve equals 1 (it's a valid probability density function)
The mean, median, and mode all coincide at $x = \mu$
The points of inflection occur at $x = \mu - \sigma$ and $x = \mu + \sigma$ (one standard deviation from the mean)
The curve approaches but never touches the horizontal axis as $x \to \pm\infty$
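Some of these properties can be spot-checked numerically. The following is a rough sketch, assuming the illustrative values $\mu = 10$ and $\sigma = 2$; the function names `normal_pdf` and `second_difference` are hypothetical helpers, and the second difference is only a numerical stand-in for the second derivative.

```python
import math

def normal_pdf(x, mu, sigma):
    """General normal PDF: (1/sigma) * phi((x - mu) / sigma)."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def second_difference(f, x, h=1e-3):
    """Central-difference estimate of f''(x); its sign indicates concavity."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mu, sigma = 10.0, 2.0  # illustrative values (assumption)
f = lambda x: normal_pdf(x, mu, sigma)

# Symmetry about x = mu
symmetric = math.isclose(f(mu - 1.3), f(mu + 1.3))
# The peak (mode) is at x = mu
peak_at_mu = f(mu) > f(mu - 0.01) and f(mu) > f(mu + 0.01)
# Concavity changes sign on either side of x = mu + sigma
concave_down_inside = second_difference(f, mu + 0.9 * sigma) < 0
concave_up_outside = second_difference(f, mu + 1.1 * sigma) > 0

print(symmetric, peak_at_mu, concave_down_inside, concave_up_outside)
```

A check like this does not prove the properties, but it is a quick way to catch a mistake in a formula typed into a calculator or spreadsheet.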

Connection to the standard normal

Note

The general normal distribution is simply a transformed version of the standard normal. This relationship is captured in the formula:

$f(x) = \frac{1}{\sigma}\phi\left(\frac{x - \mu}{\sigma}\right)$

This means we can use everything we know about the standard normal distribution to work with any normal distribution, as long as we make appropriate conversions using $\mu$ and $\sigma$ .

Comparing with real data

Let's return to our coin-tossing example. When we toss 20 fair coins repeatedly and count the heads, the mean number of heads is $np = 20 \times 0.5 = 10$ and the standard deviation is $\sqrt{np(1-p)} = \sqrt{5} \approx 2.236$. Overlaying a normal distribution with $\mu = 10$ and $\sigma = \sqrt{5}$ on the experimental data shows how closely the two agree.

Note

The fit is remarkably good, though not perfect. This demonstrates how the normal distribution can approximate complex real-world probability distributions. As we increase the number of coin tosses, the fit becomes even better. This historical example was actually one of the first uses of the normal distribution to approximate another probability distribution.
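The quality of the fit can be explored with a short calculation. The sketch below compares the theoretical binomial probabilities for 20 fair coins (rather than experimental results, which will vary from run to run) with the normal density at each whole number of heads. The function names are illustrative assumptions.

```python
import math

n, p = 20, 0.5
mu = n * p                          # mean number of heads
sigma = math.sqrt(n * p * (1 - p))  # standard deviation, sqrt(5)

def binomial_pmf(k):
    """Probability of exactly k heads in n fair tosses."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x):
    """Normal density with the same mean and standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# Side-by-side comparison near the centre of the distribution
for k in range(5, 16):
    print(k, round(binomial_pmf(k), 4), round(normal_pdf(k), 4))

# Largest difference between the two over all possible outcomes
largest_gap = max(abs(binomial_pmf(k) - normal_pdf(k)) for k in range(n + 1))
print(largest_gap)
```

The two columns agree closely but not exactly, which matches the description of the fit as good though not perfect.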

Working with z-scores

What is a z-score?

When working with general normal distributions, we need a way to standardise values so we can use standard normal distribution tables and results. This is where z-scores come in.

Important

The z-score of a value $x$ tells us how many standard deviations that value lies above or below the mean. If the z-score is positive, $x$ is above the mean; if negative, $x$ is below the mean.

Conversion formulas

There are two essential formulas for converting between raw scores ( $x$ values) and z-scores:

From raw score to z-score:

$z = \frac{x - \mu}{\sigma}$

From z-score to raw score:

$x = \mu + \sigma z$

For sample data (rather than population data), we use the sample mean $\bar{x}$ and sample standard deviation $s$ instead:

$z = \frac{x - \bar{x}}{s} \quad \text{and} \quad x = \bar{x} + sz$

Example conversion table

Here's how z-scores correspond to raw scores for a distribution with mean $\mu = 10$ and standard deviation $\sigma = 2$ :

$x$	4	5	6	7	8	9	10	11	12	13	14	15	16
$z$	-3	-2.5	-2	-1.5	-1	-0.5	0	0.5	1	1.5	2	2.5	3

Notice how the mean ( $x = 10$ ) corresponds to $z = 0$ , and values that are 1, 2, or 3 standard deviations away from the mean have z-scores of $\pm 1$ , $\pm 2$ , or $\pm 3$ respectively.
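The two conversion formulas translate directly into code. This is a minimal sketch that reproduces the table above for $\mu = 10$ and $\sigma = 2$; the function names `z_score` and `raw_score` are assumptions made for illustration.

```python
def z_score(x, mu, sigma):
    """Raw score to z-score: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """z-score back to raw score: x = mu + sigma * z."""
    return mu + sigma * z

mu, sigma = 10, 2
xs = list(range(4, 17))                      # raw scores 4, 5, ..., 16
zs = [z_score(x, mu, sigma) for x in xs]     # matching z-scores
round_trip = [raw_score(z, mu, sigma) for z in zs]

for x, z in zip(xs, zs):
    print(x, z)
```

Converting to a z-score and back again returns the original raw score, which is a useful check that the two formulas are inverses of each other.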

Worked examples with z-scores

Example

Worked Example: Converting between scores and z-scores

A dataset has mean $\bar{x} = 12$ and standard deviation $s = 3.60$ .

(a) What scores would be 1, 2, and 3 standard deviations from the mean?

(b) How many standard deviations from the mean are scores of 24, 11, and 7.7?

Solution:

(a) We need to find the raw scores that correspond to $z = \pm 1$ , $z = \pm 2$ , and $z = \pm 3$ .

Using the formula $x = \bar{x} + sz$ :

For $z = 1$ : $x = 12 + (3.60)(1) = 15.60$
For $z = -1$ : $x = 12 + (3.60)(-1) = 8.40$
For $z = 2$ : $x = 12 + (3.60)(2) = 19.20$
For $z = -2$ : $x = 12 + (3.60)(-2) = 4.80$
For $z = 3$ : $x = 12 + (3.60)(3) = 22.80$
For $z = -3$ : $x = 12 + (3.60)(-3) = 1.20$

(b) We need to calculate z-scores using the formula $z = \frac{x - \bar{x}}{s}$ :

For $x = 24$ :

$z = \frac{24 - 12}{3.60} = \frac{12}{3.60} = 3.33$

This score is 3.33 standard deviations above the mean.

For $x = 11$ :

$z = \frac{11 - 12}{3.60} = \frac{-1}{3.60} = -0.28$

This score is 0.28 standard deviations below the mean.

For $x = 7.7$ :

$z = \frac{7.7 - 12}{3.60} = \frac{-4.3}{3.60} = -1.19$

This score is 1.19 standard deviations below the mean.

Using z-scores for probability calculations

Example

Worked Example: Z-scores and probability

A normally distributed random variable $X$ has mean 100 and standard deviation 20.

(a) Write down the conversion formulas.

(b) Find: (i) $P(X \leq 110)$ (ii) $P(X \geq 90)$

(c) Find the value of $a$ such that $P(X \leq a) = 0.98$ .

Solution:

(a) The conversion formulas are:

$z = \frac{x - 100}{20} \quad \text{and} \quad x = 100 + 20z$

(b)(i) For $x = 110$ :

$z = \frac{110 - 100}{20} = 0.5$

Therefore, $P(X \leq 110) = P(Z \leq 0.5) \approx 0.69$ (from the standard normal table)

(b)(ii) For $x = 90$ :

$z = \frac{90 - 100}{20} = -0.5$

Therefore, $P(X \geq 90) = P(Z \geq -0.5) = P(Z \leq 0.5) \approx 0.69$ (using symmetry)

(c) From the standard normal table, we need to find $z$ such that $\Phi(z) = 0.98$ .

Looking at the table: $\Phi(2.0) = 0.9772$ and $\Phi(2.1) = 0.9821$

By interpolation: $z \approx 2.06$

Converting back to $x$: $a \approx 100 + 20(2.06) = 141.2$, so $a \approx 141$
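Answers like these can be checked without tables. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=20)

p_le_110 = X.cdf(110)      # P(X <= 110)
p_ge_90 = 1 - X.cdf(90)    # P(X >= 90)
a = X.inv_cdf(0.98)        # value a with P(X <= a) = 0.98

print(round(p_le_110, 4))  # 0.6915
print(round(p_ge_90, 4))   # 0.6915
print(round(a, 1))         # 141.1
```

The software value of $a$ differs slightly from the table-based answer because linear interpolation between $\Phi(2.0)$ and $\Phi(2.1)$ is only approximate; both round to 141.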

The empirical rule for general normal distributions

The 68-95-99.7 rule

One of the most useful properties of the normal distribution is the empirical rule, also called the 68-95-99.7 rule. When a random variable $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$ :

Approximately 68% of values lie within one standard deviation of the mean: $\mu - \sigma \leq x \leq \mu + \sigma$
Approximately 95% of values lie within two standard deviations of the mean: $\mu - 2\sigma \leq x \leq \mu + 2\sigma$
Approximately 99.7% of values lie within three standard deviations of the mean: $\mu - 3\sigma \leq x \leq \mu + 3\sigma$

Note

This rule is incredibly useful for quick mental calculations and for understanding what values are typical or unusual in a dataset.
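The three percentages in the rule are rounded values. A quick sketch using `NormalDist` from Python's standard `statistics` module shows the more precise figures; because z-scores standardise every normal distribution, it is enough to check the standard normal.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability of lying within k standard deviations of the mean
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(k, round(prob, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```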

Quartiles for normal distributions

The quartiles divide the distribution into quarters. For any normal distribution:

First quartile (Q₁): The value below which 25% of the data falls

$Q_1 \approx \mu - 0.67\sigma$

Second quartile (Q₂): The median, which equals the mean for normal distributions

$Q_2 = \mu$

Third quartile (Q₃): The value below which 75% of the data falls

$Q_3 \approx \mu + 0.67\sigma$

Interquartile range (IQR): The range of the middle 50% of the data

$\text{IQR} = Q_3 - Q_1 \approx 1.35\sigma$

The IQR criterion for outliers

The IQR criterion is a common method for identifying outliers in a dataset. A value is considered an outlier if it lies outside the range:

$Q_1 - 1.5 \times \text{IQR} \text{ to } Q_3 + 1.5 \times \text{IQR}$

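For a normal distribution, substituting $Q_1 \approx \mu - 0.67\sigma$, $Q_3 \approx \mu + 0.67\sigma$ and $\text{IQR} \approx 1.35\sigma$ puts the boundaries at about $\mu \pm (0.67 + 1.5 \times 1.35)\sigma \approx \mu \pm 2.70\sigma$, so a value is not an outlier when: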

$\mu - 2.70\sigma \leq x \leq \mu + 2.70\sigma$

Important

Any value outside this interval is flagged as a potential outlier. For a normal distribution, approximately 0.7% of values (about 7 in 1000) will be classified as outliers using this criterion.
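The constants 0.67, 1.35, 2.70 and 0.7% can all be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, so results are in units of sigma

q1 = Z.inv_cdf(0.25)           # first quartile as a z-score
q3 = Z.inv_cdf(0.75)           # third quartile as a z-score
iqr = q3 - q1                  # interquartile range in units of sigma
upper_fence = q3 + 1.5 * iqr   # outlier boundary above the mean
lower_fence = q1 - 1.5 * iqr   # outlier boundary below the mean

# Probability of falling outside the two fences
outlier_fraction = Z.cdf(lower_fence) + (1 - Z.cdf(upper_fence))

print(round(q3, 4), round(iqr, 4), round(upper_fence, 4))
print(round(outlier_fraction, 4))
```

For a general normal distribution, each z-score here converts to a raw score with $x = \mu + \sigma z$.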

Applying the empirical rule in practice

Example

Worked Example: Applying the empirical rule

A dataset with 1000 scores is known to be sampled from a normally distributed variable $X$ with mean $\mu = -32.6$ and standard deviation $\sigma = 5.7$ .

(a) Describe what the empirical rule predicts about the data.

(b) Predict roughly how many scores will: (i) lie in $[-30, \infty)$ , (ii) lie in $(-\infty, -40]$ , (iii) lie in $[-40, -30]$ .

(c) Using the IQR criterion, roughly how many outliers would you expect?

Solution:

(a) Applying the empirical rule:

Within 1 SD: $[-32.6 - 5.7, -32.6 + 5.7] = [-38.3, -26.9]$

About $0.68 \times 1000 = 680$ scores will fall in this range.

Within 2 SDs: $[-32.6 - 2(5.7), -32.6 + 2(5.7)] = [-44.0, -21.2]$

About $0.95 \times 1000 = 950$ scores will fall in this range.

Within 3 SDs: $[-32.6 - 3(5.7), -32.6 + 3(5.7)] = [-49.7, -15.5]$

About $0.997 \times 1000 = 997$ scores will fall in this range.

(b)(i) For $x = -30$ :

$z = \frac{-30 - (-32.6)}{5.7} = \frac{2.6}{5.7} = 0.456$

$P(X \geq -30) = P(Z \geq 0.456) = 0.324$

Predicted number of scores: $0.324 \times 1000 = 324$ scores

(b)(ii) For $x = -40$ :

$z = \frac{-40 - (-32.6)}{5.7} = \frac{-7.4}{5.7} = -1.298$

$P(X \leq -40) = P(Z \leq -1.298) = P(Z \geq 1.298) = 1 - 0.903 = 0.097$

Predicted number of scores: $0.097 \times 1000 = 97$ scores

(b)(iii) For the interval $[-40, -30]$ :

$P(-40 \leq X \leq -30) = 1 - P(X \leq -40) - P(X \geq -30)$

$= 1 - 0.097 - 0.324 = 0.579$

Predicted number of scores: $0.579 \times 1000 = 579$ scores

(c) The IQR criterion identifies values outside $\mu - 2.70\sigma$ to $\mu + 2.70\sigma$ as outliers.

For any normal distribution, approximately 7 in 1000 scores (0.7%) are classified as outliers by this criterion.

Therefore, we expect about 7 outliers in this dataset.
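The predicted counts in part (b) can be cross-checked in a few lines. This is a sketch using `NormalDist` from Python's standard `statistics` module, with illustrative variable names; it gives expected counts, and a real sample of 1000 scores would vary around them.

```python
from statistics import NormalDist

X = NormalDist(mu=-32.6, sigma=5.7)
n = 1000  # number of scores in the dataset

at_least_minus_30 = n * (1 - X.cdf(-30))     # scores in [-30, infinity)
at_most_minus_40 = n * X.cdf(-40)            # scores in (-infinity, -40]
between = n * (X.cdf(-30) - X.cdf(-40))      # scores in [-40, -30]

print(round(at_least_minus_30))
print(round(at_most_minus_40))
print(round(between))
```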

Key takeaways

Summary

Key Points to Remember:

Any normal distribution can be created by stretching and shifting the standard normal distribution.
The general normal PDF is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ or equivalently $f(x) = \frac{1}{\sigma}\phi\left(\frac{x - \mu}{\sigma}\right)$
Z-scores allow us to standardise any normal distribution: $z = \frac{x - \mu}{\sigma}$ and $x = \mu + \sigma z$
The empirical rule (68-95-99.7) tells us that approximately 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean respectively.
For a normal distribution: $Q_1 \approx \mu - 0.67\sigma$, $Q_3 \approx \mu + 0.67\sigma$, and $\text{IQR} \approx 1.35\sigma$
The IQR criterion identifies outliers as values outside $\mu - 2.70\sigma$ to $\mu + 2.70\sigma$
