Writing a Program From Scratch

Learning Outcomes

  • How to create a program from scratch

  • How to create the program outline

  • How to fill in the outline

  • How to debug and check the code

Prerequisites

Writing a program to calculate the mean and standard deviation of a set of numbers

Establishing the basics

Now we can pull together all we have learnt and write our own program.

We will create a program to calculate the mean and standard deviation of a set of numbers. We will therefore need the following equations:

For the mean:

\[\langle x \rangle = \frac{1}{N}(x_1 + x_2 + x_3 + \dots + x_N) = \frac{1}{N}\sum_{i=1}^{N} x_i \]

So the mean is the sum of the N numbers from \(x_1\) to \(x_N\), divided by N.

For the population variance:

\[ v(x,x) = \langle (x-\bar{x})^2 \rangle = \frac{1}{N}\sum_{i=1}^{N} (x_i -\bar{x})^2 \]

The variance is the mean of the squared differences between each of the N numbers and the mean \(\bar{x}\) defined above.

For the standard deviation:

\[\sigma(\bar{x}) = \sqrt{v(x,x)}\]

The standard deviation is the square root of the variance.

We will take these equations and turn them into a working program.

First steps

The first thing we need is some numbers to work with. We will use 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 as our numbers. We have chosen these because they give us a simple test case.

We can easily calculate the sum, the mean, the variance and the standard deviation of this list by hand, which will allow us to test the code as we go along.

Sum = 55,

Mean = 5.5,

Variance = 8.25,

Standard deviation = 2.87 (to two decimal places)
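As an optional cross-check of these hand-calculated values, Python's standard statistics module can compute the same quantities. This is only a sanity check on the test data, not the program we are about to write. Note that pvariance and pstdev are the population versions, which divide by N.

```python
import statistics

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

total = sum(nums)                      # built-in sum of the list
mean = statistics.mean(nums)           # arithmetic mean
variance = statistics.pvariance(nums)  # population variance (divides by N)
std = statistics.pstdev(nums)          # population standard deviation

# Round the standard deviation so it can be compared with the hand value
print(total, mean, variance, round(std, 2))
```

If the printed values match the four numbers above, we can trust them as targets when testing our own code.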

Outline 1

Our next step is to create an initial outline of the program and how it will work. This is our chance to make sure we are clear in our own heads about how the program and its algorithm will work.

First we build up the outline using comments:


# Start and import any libraries

# Set up key variables needed later

# Input data to work with

# Do calculations on data

# Output the results

This is a very sparse, bare-bones outline, but the key points are covered.


Outline 2

We will now fill in our outline with some more detail. We will not add any code yet; first we work out what is going to happen.

The main part of the program is going to be under the Do calculations comment, so we will look at that next.

Let's expand it. We know the aim of this program is to calculate the mean, variance and standard deviation of a population:

# Do calculations on data
## Calculate Mean

## Calculate Variance

## Calculate std

Next, let's think about how we will calculate the mean.

\[\langle x \rangle = \frac{1}{N}(x_1 + x_2 + x_3 + \dots + x_N) = \frac{1}{N}\sum_{i=1}^{N} x_i \]

So we need to sum our numbers and then divide the sum by how many numbers there are. Let us make that clear in our outline:

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance

## Calculate std

Next, how do we calculate the variance?

\[ v(x,x) = \langle (x-\bar{x})^2 \rangle = \frac{1}{N}\sum_{i=1}^{N} (x_i -\bar{x})^2 \]

So, for each number we take its difference from the mean, square that difference, and add it to a running total of squared differences. Finally we divide this total by the number of values. We will add that to our outline:

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std

Finally, for the calculations, how do we get the standard deviation?

\[\sigma(\bar{x}) = \sqrt{v(x,x)}\]

So we just need to take the square root of the variance. Let's add this to the outline:

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std
    ## Take square root of variance - get std

We can now go back and fill in the rest of our outline.

For the imports, we now know we need the math library, because it provides the square root function:

# Start and import any libraries
## Import math library for square root

For the variables we don’t know what we’ll need just yet

For inputting data we know we will need a list of numbers from somewhere

# Input data to work with
## Get list of numbers

For the output of results, we need to put the values of the mean, variance and standard deviation somewhere

# Output the results
## Print out mean, variance and standard deviation

Putting that all together gives us a final outline. Note that we have done no coding yet; we have just laid out a plan to help us code.


# Start and import any libraries
## Import math library for square root

# Set up key variables needed later

# Input data to work with
## Get list of numbers

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std
    ## Take square root of variance - get std


# Output the results
## Print out mean, variance and standard deviation

Coding 1: Inputs

We will now start our coding.

We need to import the square root function

from math import sqrt

So we add that into our outline

We need to input our data. We shall create a list. In the real world these numbers could be loaded from a data file or from the command line. But it is always quicker and easier when setting up a programme and debugging to hard code a simple example. Later when we are sure the code works we can extend it.

print("Load Numbers")
nums = [1,2,3,4,5,6,7,8,9,10]
print("Loaded")

We add this into our outline under the Get list of numbers comment. Note that we have added print statements so that when we run this program we know what is happening.

Once we have finished writing the code, many of these statements can be commented out or removed.

We will also add the following:

print(nums)

so we can check the numbers have loaded correctly.

Run the code so far in the box below. Be sure it is working correctly.

The output should be:

Load Numbers
Loaded
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Add it to the outline

# Start and import any libraries
## Import math library for square root


# Set up key variables needed later

# Input data to work with
## Get list of numbers

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std
    ## Take square root of variance - get std


# Output the results
## Print out mean, variance and standard deviation


Coding 2: The mean

Now we calculate the mean. This means first calculating the sum of the numbers we have loaded.

The fact that we have to sum tells us that we need a loop.

So we will create a loop in which to calculate the mean. But first we will add the following code below the Sum up all the numbers comment (watch out for your indent). Then run and test.

for x in nums:
    print(x)

When we run this we should get all the numbers printed out one by one. Doing simple tests like this as we construct our code means we can trust our results and spot errors sooner.

We will now add the following after the print(x) command, remembering to indent it so that it is included in the loop. It adds the value of x to the variable mean:

mean = mean + x
print(mean)

Run this

You should get an error saying that mean is not defined. We are asking the program to add a number to mean before mean has a value, which cannot work.
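The error can be reproduced in isolation. The short sketch below uses a three-number list and wraps the loop in try/except purely so the message can be inspected; the name error_message is only for illustration.

```python
nums = [1, 2, 3]

try:
    for x in nums:
        mean = mean + x  # mean has not been given a value yet
except NameError as err:
    # Python raises NameError because mean is read before it is assigned
    error_message = str(err)
    print("Python stopped with:", error_message)
```

The loop stops on its very first pass, because the right-hand side of the assignment is evaluated before anything is stored in mean.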

We need to define mean before using it. More subtly, we must set it to 0 when we start. This is not a syntax issue, as Python would accept any starting value. Think about why it must start at 0 for a sum.

We can place the line mean = 0 in two different places in our code. It can go either inside the for loop, before the mean = mean + x line, or above the for loop. Try both positions and then consider which is the correct place.

Run and test it. The final value printed for mean should be 55. At this point mean holds the sum, and we know from earlier that the sum should be 55, so our code is correct so far.

The final step in the calculation of the mean is to divide it by the number of elements. So now after the loop (so we reduce the indent by one) we need the following lines

n_points = len(nums)
mean = mean / n_points
print("The mean is ", mean)

The first line gets the length of the list using a function built into Python called len(). We could also calculate this by counting how many times we go around the loop.

We do this division outside the loop. Why? What would we calculate if we did it inside the loop?
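As a sketch of the counting alternative to len(), a second variable can be increased by one on every pass through the loop. The names total and count are chosen here for illustration and are not used in the main program.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

total = 0
count = 0
for x in nums:
    total = total + x   # running sum of the numbers
    count = count + 1   # one more value seen

# The division happens once, after the loop has finished
mean = total / count
print(count == len(nums), mean)
```

Both approaches give the same number of values; len() is simply shorter when the data is already in a list.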

# Start and import any libraries
## Import math library for square root
from math import sqrt

# Set up key variables needed later

# Input data to work with
## Get list of numbers
print("Load Numbers")
nums = [1,2,3,4,5,6,7,8,9,10]
print("Loaded")
print(nums)
# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
    ##divide the sum by the number of values - get mean

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std
    ## Take square root of variance - get std


# Output the results
## Print out mean, variance and standard deviation

We now have a program that will calculate the mean for a given list of numbers.

# Start and import any libraries
## Import math library for square root
from math import sqrt

# Set up key variables needed later
mean = 0

# Input data to work with
## Get list of numbers
print("Load Numbers")
nums = [1,2,3,4,5,6,7,8,9,10]
print("Loaded")
print(nums)
# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
for x in nums:
    print(x)
    mean = mean + x
    print(mean)
    ##divide the sum by the number of values - get mean

n_points = len(nums)
mean = mean / n_points
print("The mean is ", mean)

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
    ## divide the total diff by the number of values - get variance
    
## Calculate std
    ## Take square root of variance - get std


# Output the results
## Print out mean, variance and standard deviation


Coding 3: The standard deviation

Next we need to calculate the standard deviation. This means another for loop, this time to sum the squared differences from the mean. Again, any time we know we need to sum a list of numbers there must be a loop.

var = 0
for x in nums:
    var = var + ((x-mean)*(x-mean))
    print(var)
    
var = var /n_points
print("The variance is ",var)

In this case we take the square by multiplying the value by itself; we could also use the power operator **. For a simple square, multiplying this way can be more efficient. Note that we have defined the variable var and set it to zero. We have also removed some of the earlier print statements we used for checking our code.
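A small sketch comparing the two ways of squaring on our test data is shown below. The mean is hard-coded to 5.5 and the names var_mult and var_pow are illustrative only.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean = 5.5  # value calculated earlier for this list

var_mult = 0
var_pow = 0
for x in nums:
    diff = x - mean
    var_mult = var_mult + diff * diff  # square by multiplication
    var_pow = var_pow + diff ** 2      # square with the power operator

# Both running totals hold the summed squared differences
print(var_mult, var_pow)
```

For this data the two totals are identical, so the choice between them is a matter of style and speed rather than correctness.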

Run and test. You should get a value of 8.25. This agrees with the value we calculated by hand earlier, so our code is doing what we want.

Our last step in the calculation is to take the square root of the variance to get the standard deviation

std = sqrt(var)
print("Standard Deviation ",std)

Run and test. The value of the standard deviation should be approximately 2.87 (Python prints more decimal places), as we expect.
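Because the square root is not a round number, it helps to compare it with the hand-calculated value at a fixed number of decimal places or within a small tolerance. The sketch below assumes the variance of 8.25 found above; the tolerance of 0.005 is an arbitrary choice that corresponds to two decimal places.

```python
from math import sqrt, isclose

var = 8.25       # variance calculated above
std = sqrt(var)

print(std)            # full floating-point value
print(f"{std:.2f}")   # formatted to two decimal places

# True when std is within 0.005 of the hand-calculated 2.87
print(isclose(std, 2.87, abs_tol=0.005))
```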

We have outlined and constructed a program that calculates the sum, the mean, the variance and the standard deviation of a dataset.

We have tested each part as we went, so we are happy everything is working correctly.

So that is our first program.

The final step is just to move the print statements into the output section of the code (if we want).


Final Program
# Start and import any libraries
## Import math library for square root
from math import sqrt

# Set up key variables needed later
mean = 0
var = 0

# Input data to work with
## Get list of numbers
print("Load Numbers")
nums = [1,2,3,4,5,6,7,8,9,10]
print("Loaded")

# Do calculations on data
## Calculate Mean
    ## Sum up all the numbers - get sum
for x in nums:
    mean = mean + x

    ## divide the sum by the number of values - get mean
n_points = len(nums)
mean = mean / n_points

## Calculate Variance
    ## For each number
        ## take difference from the mean - get diff
        ## square that diff - get square diff
        ## add to total square diff - get total diff
for x in nums:
    var = var + ((x-mean)*(x-mean))

    ## divide the total diff by the number of values - get variance
var = var / n_points

## Calculate std
    ## Take square root of variance - get std
std = sqrt(var)


# Output the results
## Print out mean, variance and standard deviation
print("The mean is ", mean)
print("The variance is ",var)
print("Standard Deviation ",std)


Though our program works it could be improved.

Try to alter the code in the following ways

  • Improve the commenting of the code

  • Improve the formatting of the output print statements

  • Add functions

    • For loading data

    • For Mean

    • For variance

    • For standard deviation

    • For outputting the results

  • Alter the program so it calculates the sample variance and standard deviation rather than the population versions (divide by N-1 instead of N)

  • Advanced - Improve the efficiency by calculating the variance and the mean in a single loop. The equation to do that is below

    \[v(x,x) = \frac{1}{N}\sum{(x_i - \bar{x})^2} = \frac{\sum{x_i^2}-\frac{1}{N}(\sum{x_i})^2}{N} \]
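One possible shape for some of these exercises is sketched below; try them yourself first. The function names (mean_of, variance_of, std_of, single_pass_stats) and the sample flag are illustrative choices, not part of the original program.

```python
from math import sqrt


def mean_of(nums):
    """Return the mean of a list of numbers."""
    total = 0
    for x in nums:
        total = total + x
    return total / len(nums)


def variance_of(nums, sample=False):
    """Return the population variance, or the sample variance if sample=True."""
    m = mean_of(nums)
    total_sq_diff = 0
    for x in nums:
        total_sq_diff = total_sq_diff + (x - m) * (x - m)
    # Sample variance divides by N-1, population variance by N
    divisor = len(nums) - 1 if sample else len(nums)
    return total_sq_diff / divisor


def std_of(nums, sample=False):
    """Return the standard deviation as the square root of the variance."""
    return sqrt(variance_of(nums, sample))


def single_pass_stats(nums):
    """Return (mean, population variance) using one loop over the data."""
    n = 0
    sum_x = 0
    sum_x2 = 0
    for x in nums:
        n = n + 1
        sum_x = sum_x + x
        sum_x2 = sum_x2 + x * x
    mean = sum_x / n
    # (sum of squares - (sum)^2 / N) / N, as in the equation above
    variance = (sum_x2 - (sum_x * sum_x) / n) / n
    return mean, variance


nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(mean_of(nums), variance_of(nums), std_of(nums))
print(variance_of(nums, sample=True))
print(single_pass_stats(nums))
```

The single-loop formula subtracts two potentially large numbers, so for data with very large values it can lose precision compared with the two-loop version. For our simple test list both give the same answer.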