Writing files#

Learning Outcomes#

  • Saving data into a newly created file

  • Adding data to an existing file

Prerequisites#

Writing files#

We have seen how Python can process chemical data and perform calculations. Once you have run your program, you will probably want to save your data to a file for later use. This is called ‘writing’. To write to a file, you change the mode in open() from "r" to one of the following:

  • "w" is write mode. If the file doesn’t already exist, it is created. If the file already exists, any data already in it is removed, and data is added as if it was an empty file.

  • "a" is append mode. If the file doesn’t already exist, it is created. Any data written to the file is added after the data already in the file.

  • Adding + to the end of the letter means the file is also opened for reading.
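A minimal sketch of the difference between "w" and "a" is below. The file name modes_demo.txt is made up for illustration, and the file is created in a temporary folder (using Python's tempfile module) so that none of your own files are touched.

```python
import os
import tempfile

# Work in a temporary folder so no real files are overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")  # hypothetical file name

# "w" creates the file (or empties it if it already exists)
file = open(path, "w")
file.write("first\n")
file.close()

# "a" keeps the existing data and adds to the end
file = open(path, "a")
file.write("second\n")
file.close()

# Read the file back to see what it contains now
file = open(path, "r")
after_append = file.read()
file.close()

# Opening in "w" mode again removes everything written so far
file = open(path, "w")
file.write("third\n")
file.close()

file = open(path, "r")
after_write = file.read()
file.close()

print(repr(after_append))
print(repr(after_write))
```

After the append step the file holds both lines. After the second "w" step only the last line is left.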

The function .write() allows any content within the brackets to be written to the file.

For example:

file = open("new_file.txt", "w")
file.write("content" + str(variable_name) + "content")
file.write(f"content {variable_name} \n content")
file.close()

Things to note:

  • Since the file “new_file.txt” does not already exist, the first line creates a file with this name.

    • If the file already existed, this first line would wipe the data of that file ready to write over with the next lines. You need to be careful that you don’t do this to files you want to keep!

  • The function .write() allows content within the brackets to be written to the file.

    • You can write using concatenation (as in line 2). Here, strings are combined into one long string using the ‘+’ symbol.

    • You can write using f-strings, as we have seen in the past.

    • Putting two file.write() statements one after the other does not place your items on separate lines. You need to add the special character "\n" to write a new line. Other special characters you might use are "\r", the carriage return (moves the cursor to the beginning of the current line), and "\t", which inserts a tab.

  • You can iterate using ‘for’ to print a phrase over and over again.

  • Once you are done, you close the file. Just like with reading, this is good practice.
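The notes above about "\n" and "\t" can be checked with a short sketch. The file name is hypothetical and the file is written to a temporary folder, so nothing of yours is overwritten.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "special_characters.txt")  # hypothetical file name

file = open(path, "w")
file.write("no new line,")                           # no "\n" here...
file.write(" so this continues on the same line\n")  # ...so this follows on directly
file.write("this starts a new line\n")
file.write("left\tright\n")                          # "\t" inserts a tab
file.close()

file = open(path, "r")
contents = file.read()
file.close()

print(contents)
```

The first two write() calls end up on one line in the file, because only the second one ends with "\n".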

Writing data to a new file

If the file you want to write to doesn’t exist, Python will create it.

In the below example, we want to write the gas constant to a file called “gas_const.txt”.

file = open('gas_const.txt', 'w')

R = 8.314
file.write("The gas constant is: " + str(R) + " J/K.mol")

file.close()

The text inside the file now reads:

The gas constant is: 8.314 J/K.mol

Try it for yourself.


Writing data to an existing file

If the file you want to write to already exists, Python will erase the contents of the file (this is called truncation) and write the new contents in that space.

Run the above code to create the file “gas_const.txt”, then run the code below.

file = open('gas_const.txt', 'w')

R = 8.206e-5
file.write("The gas constant is: " + str(R) + " m3.atm/K.mol")

file.close()

You will see that the previous value of R is no longer in the file. This is why we need to be careful when writing files: we don’t want to lose any important data!
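One possible safeguard, shown here only as a sketch and not as part of this lesson's required material, is to check whether a file already exists before opening it in "w" mode. The function os.path.exists() from Python's standard library returns True if the path is already there. The example works in a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "gas_const.txt")

# First create a file that we want to keep
file = open(path, "w")
file.write("The gas constant is: 8.314 J/K.mol")
file.close()

# Before opening in "w" mode again, check whether the file is already there
if os.path.exists(path):
    wrote_new_file = False
    print("File already exists, so nothing was written.")
else:
    wrote_new_file = True
    file = open(path, "w")
    file.write("new data")
    file.close()

file = open(path, "r")
contents = file.read()
file.close()
print(contents)
```

Because the file was found, the original contents are still intact.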


Writing using 'with' statements

Just like when we are reading files, we can use a with statement to write to files.

with open("gas_const.txt", "w") as file:
    R = 1.987
    file.write("The gas constant is: " + str(R) + " cal/K.mol")

Just like with reading, the file is closed automatically once the indented code inside the with statement has finished running.
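One way to see the automatic closing for yourself is the .closed attribute that every Python file object has. The sketch below uses a temporary folder so that it does not touch your own gas_const.txt.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "gas_const.txt")

with open(path, "w") as file:
    R = 1.987
    file.write("The gas constant is: " + str(R) + " cal/K.mol")
    closed_inside = file.closed   # still inside the with statement

closed_after = file.closed        # the with statement has finished

print(closed_inside)
print(closed_after)
```

The first value is False (the file is still open) and the second is True.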


Exercise: Writing files

Using a with statement and an f-string, write code that writes the following statement to a new text file using the variables given below:

Lanthanum, with the elemental symbol La, has 57 protons and a mass of 138.91 amu.

Lanthanum oxide is used in making some optical glasses.

Use these variables:

element = "Lanthanum"
symbol = "La"
number = 57
mass = 138.91

Click to reveal answer
element = "Lanthanum"
symbol = "La"
number = 57
mass = 138.91

with open("lanth_facts.txt", "w") as file:
    file.write(f"{element}, with the elemental symbol {symbol}, has {number} protons and a mass of {mass} amu.\nLanthanum oxide is used in making some optical glasses.")

Don’t forget the \n character to create a new line!


Iteration to write to a file#

Writing one statement at a time is a very slow way to write a file. Instead, we could use for loops to write lots of lines quickly.

In the simplest case we can write out the same thing over and over. The code below writes the same phrase on 100 new lines (using the special character "\n" - try with and without it and see what happens).

with open("gas_const.txt", "w") as file:
    R = 8.314
    for i in range(100):
        file.write(f"The gas constant is: {R} J/K.mol \n")

But more crucially, we can iterate through a list and store the values. The code below will create a file “spectrum.txt” and write each absorbance value on a new line.

absorbance = [0.123, 0.132, 0.346, 0.563, 0.998, 0.377, 0.021]

with open("spectrum.txt", "w") as file:
    for value in absorbance:
        file.write(str(value) + "\n")

Note! When we read files, we read everything (including numbers) as a string. When we write files, we need to remember to convert them back to a string, or the concatenation will not work!
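The sketch below illustrates this note. Adding a float directly to a string raises a TypeError, so the value has to be converted with str() or placed inside an f-string. The try/except is only there to catch the error so that the rest of the example still runs, and the file name is hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "absorbance_demo.txt")  # hypothetical file name

value = 0.123

# Adding a float to a string is not allowed, so this raises a TypeError
try:
    line = value + "\n"
    failed = False
except TypeError:
    failed = True

with open(path, "w") as file:
    file.write(str(value) + "\n")   # convert to a string, then concatenate
    file.write(f"{value}\n")        # an f-string does the conversion for us

with open(path) as file:
    lines = file.read().split("\n")

print(failed)
print(lines)
```

Both write() calls put the same text, 0.123, into the file.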

This is the simplest example of adding structure to a file, by writing each new value on a new line. But this isn’t particularly useful. If we could write multiple columns, or add headers, this would become much more flexible.

Exercise: Write data to a file

For the following list of random background noise, write each item to a text file on a new line. Include a header that describes what the data is.

background = [1.48, 1.23, 1.39, 1.46, 1.31, 1.55, 1.25, 1.36, 1.44, 1.29, 1.58, 1.34, 1.41, 1.53, 1.27, 1.38, 1.52, 1.22, 1.33, 1.49, 1.24, 1.56, 1.26, 1.45, 1.51, 1.28, 1.35, 1.57, 1.43, 1.21]


Click to view answer
background = [1.48, 1.23, 1.39, 1.46, 1.31, 1.55, 1.25, 1.36, 1.44, 1.29, 1.58, 1.34, 1.41, 1.53, 1.27, 1.38, 1.52, 1.22, 1.33, 1.49, 1.24, 1.56, 1.26, 1.45, 1.51, 1.28, 1.35, 1.57, 1.43, 1.21]

with open("data_record.txt", "w") as file:
    file.write("random background measurements\n")
    for i in background:
        file.write(str(i) + "\n")

Exercise: Debugging code

Below is a piece of code with numerous errors. Identify as many as possible. Try running the code and look at the error message to more quickly figure out what is wrong. When you are writing a code project, you will encounter lots of error messages, so it is good to get used to recognising them now.

data_points = 1.2, 5.6, 3.4, 0.5, 4.1, 3.2

file = open(data_record)

write("record of data points")
for point in data_points
   file.write(point + "\n")

Click to view answer
data_points = [1.2, 5.6, 3.4, 0.5, 4.1, 3.2]

file = open("data_record.txt", "w")

file.write("record of data points\n")
for point in data_points:
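    file.write(str(point) + "\n")

file.close()  # always close a file opened with open() once you are done

Here were the errors:

  • data_points is missing the square brackets that make it a list. (Without them, Python creates a tuple, which would still run here, but a list is what was intended.)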

  • When we open the file, the file name should be in speech marks "".

  • There should be a file extension on the file name, e.g. "data_record.txt". It is not so important when creating simple files, but you will find your files get confusing if you don’t start doing this, especially if you have two files with the same name but different extensions!

  • The function open() should take a second argument. As we want to write to this file, this will be "w".

  • When writing to the file, the correct syntax is file.write(), not just write(). We need to specify which file we are writing to. This becomes important if you have multiple files open.

  • There should be a "\n" at the end of the header line, otherwise the first data point will also appear on the header line.

  • There should always be a colon : at the end of the for line.

  • When writing an integer or float to a file using concatenation (the + sign), it must be converted to a string using str(). In this case, it should read file.write(str(point) + "\n").


Adding structure to a file#

The way in which data is written to a file depends on the type of data itself. We could format it with commas separating values (CSV), with tabs separating values, spaces, or anything else we want.

We can also add headers to describe the data. When we do this, we write the headers before the loops.

CSV Structure

In this example, we take two lists of data and write them to a CSV file as two columns. We use a for loop with the range() function, and use square brackets to pick out the value at index i from each list as we iterate.

When you are doing this, remember \n to start a new line!

energies_1 = [1.01e-19, 1.43e-19, 1.85e-19, 2.28e-19, 2.7e-19, 3.13e-19, 3.55e-19, 3.98e-19, 4.4e-19, 4.83e-19, 5.25e-19, 5.68e-19]
energies_2 = [1.82e-19, 2.11e-19, 2.4e-19, 2.69e-19, 2.99e-19, 3.28e-19, 3.57e-19, 3.86e-19, 4.15e-19, 4.44e-19, 4.74e-19, 5.03e-19]

with open("data_record.csv", "w") as file:
    file.write("run1,run2\n")
    for i in range(len(energies_1)):
        file.write(f"{energies_1[i]},{energies_2[i]}\n")

Using zip()

We don’t have to use the range() function. In the example below, the two lists are associated item-wise using the function zip().

energies_1 = [1.01e-19, 1.43e-19, 1.85e-19, 2.28e-19, 2.7e-19, 3.13e-19, 3.55e-19, 3.98e-19, 4.4e-19, 4.83e-19, 5.25e-19, 5.68e-19]
energies_2 = [1.82e-19, 2.11e-19, 2.4e-19, 2.69e-19, 2.99e-19, 3.28e-19, 3.57e-19, 3.86e-19, 4.15e-19, 4.44e-19, 4.74e-19, 5.03e-19]

with open("data_record.csv", "w") as file:
    file.write("run1,run2\n")
    for i, j in zip(energies_1,energies_2):
        file.write(f"{i},{j}\n")
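One thing to be aware of when swapping range() for zip(): if the lists have different lengths, zip() stops at the end of the shortest list without raising an error. The values below are made up purely to show this.

```python
wavelength = [240, 250, 260]               # three made-up values
absorbance = [0.123, 0.132, 0.346, 0.563]  # four made-up values

pairs = list(zip(wavelength, absorbance))
print(pairs)

# A quick check that is worth doing before writing paired data to a file
same_length = len(wavelength) == len(absorbance)
print(same_length)
```

Only three pairs are produced, so the fourth absorbance value would never reach the file.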

Zip() and the unpacking operator *

If we had lots and lots of lists that we want to write to the document, both range() and zip() seem to require lots of repetitive naming of lists. However, there are ways around this.

When we first learned about zip(), you may remember that if you use only one temporary variable for multiple lists (e.g. for i in zip(list1,list2,list3):), the items in the lists are still associated with each other, but as a tuple, in this case with the variable name i. Tuples are very similar to lists, with the difference that they are immutable: their contents cannot be changed once they have been created. Once you have the tuple, you can iterate through it, writing each item in the tuple to your document.

energies_1 = [1.01e-19, 1.43e-19, 1.85e-19, 2.28e-19, 2.7e-19, 3.13e-19, 3.55e-19, 3.98e-19, 4.4e-19, 4.83e-19, 5.25e-19, 5.68e-19]
energies_2 = [1.82e-19, 2.11e-19, 2.4e-19, 2.69e-19, 2.99e-19, 3.28e-19, 3.57e-19, 3.86e-19, 4.15e-19, 4.44e-19, 4.74e-19, 5.03e-19]
energies_3 = [5.17e-19, 5.45e-19, 5.74e-19, 6.02e-19, 6.30e-19, 6.58e-19, 6.86e-19, 7.14e-19, 7.43e-19, 7.71e-19, 7.99e-19, 8.27e-19]

with open("data_record.csv", "w") as file:
    file.write("run1,run2,run3\n")
    for energy_tuple in zip(energies_1,energies_2,energies_3):
        print(energy_tuple)
        for item in energy_tuple:
            file.write(f"{item},")
        file.write("\n")

At the moment, this isn’t particularly quicker than range(), or the first examples using zip(). But what if you had 100 lists of data you want to write to a file? You don’t want to write zip(energies_1,energies_2,energies_3,...) all the way to 100! This is where a powerful tool called the unpacking operator * comes in handy.

The unpacking operator literally ‘unpacks’ objects from inside other objects. For example, a list list_0 = [1,2,3,4] which has been acted on by the unpacking operator *list_0, has its contents unpacked and turned into individual objects, in this case a group of integers 1,2,3,4. A nested list n_list = [[1,2],[5,6],[8,9]] when unpacked will just become three lists: [1,2],[5,6],[8,9].
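A small sketch of what the unpacking operator does, using the same two lists as in the paragraph above:

```python
list_0 = [1, 2, 3, 4]
print(list_0)    # the list as one object
print(*list_0)   # the four integers as separate objects

n_list = [[1, 2], [5, 6], [8, 9]]

# zip(*n_list) is the same as writing zip([1, 2], [5, 6], [8, 9])
zipped = list(zip(*n_list))
print(zipped)
```

The first print shows [1, 2, 3, 4], while the second shows 1 2 3 4, because print() receives four separate arguments.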

In the context of tuples and zip(), this means we could have a list containing our lists of data, e.g. main_list = [energies_1, energies_2, energies_3], and inside our zip() function unpack it into separate lists like so: for energy_tuple in zip(*main_list):

An example is below.

energies_1 = [1.01e-19, 1.43e-19, 1.85e-19, 2.28e-19, 2.7e-19, 3.13e-19, 3.55e-19, 3.98e-19, 4.4e-19, 4.83e-19, 5.25e-19, 5.68e-19]
energies_2 = [1.82e-19, 2.11e-19, 2.4e-19, 2.69e-19, 2.99e-19, 3.28e-19, 3.57e-19, 3.86e-19, 4.15e-19, 4.44e-19, 4.74e-19, 5.03e-19]
energies_3 = [5.17e-19, 5.45e-19, 5.74e-19, 6.02e-19, 6.30e-19, 6.58e-19, 6.86e-19, 7.14e-19, 7.43e-19, 7.71e-19, 7.99e-19, 8.27e-19]
main_list = [energies_1,energies_2,energies_3]

with open("data_record.csv", "w") as file:
    file.write("run1,run2,run3\n")
    for energy_tuple in zip(*main_list):
        print(energy_tuple)
        for item in energy_tuple:
            file.write(f"{item},")
        file.write("\n")

In this case, we have still had to define the list manually, which is annoying. However, if you are reading data from a file, you can choose to store it in any way you want - you can store it straight into a nested list! Go back and look at the reading files lesson for examples where we pull data straight into a nested list!
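The nested loops above write a comma after every item, including the last one on each line. If that trailing comma is a problem for the program that will read the file, one possible alternative is the string method .join(), which places the separator only between items. This is a sketch with shortened, made-up lists and a hypothetical file name.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "join_demo.csv")  # hypothetical file name

run_1 = [1.01e-19, 1.43e-19]
run_2 = [1.82e-19, 2.11e-19]
run_3 = [5.17e-19, 5.45e-19]
main_list = [run_1, run_2, run_3]

with open(path, "w") as file:
    file.write("run1,run2,run3\n")
    for energy_tuple in zip(*main_list):
        # .join() only works on strings, so convert each item first
        str_items = []
        for item in energy_tuple:
            str_items.append(str(item))
        file.write(",".join(str_items) + "\n")

with open(path) as file:
    lines = file.read().split("\n")

print(lines)
```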


Using tab

Just like with commas as shown above, you can separate items using tab. The special character for this is \t.

In this example, we open and close the file manually, and we use range(). Run the code and have a look at the file spectrum.txt. Notice that on either side of the tab there is an additional space. This is because there is a space on either side of the special character \t in the code. If you want a specific format, you need to watch out for this when writing to your file! Notice also that absorbance contains one more value than wavelength. Because the loop runs over len(wavelength), the final absorbance value is never written.

file = open('spectrum.txt', 'w')

wavelength = [240, 250, 260, 270, 280, 290]
absorbance = [0.123, 0.132, 0.346, 0.563, 0.998, 0.377, 0.021]
file.write(f"nm \t abs \n")
for i in range(len(wavelength)):
  file.write(f"{wavelength[i]} \t {absorbance[i]}\n")

file.close()
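To see why the extra spaces matter, the sketch below builds one line with spaces around \t and one without, then splits each on the tab as you would when reading the file back.

```python
line_with_spaces = f"{240} \t {0.123}\n"
line_without_spaces = f"{240}\t{0.123}\n"

# Remove the new line character, then split on the tab
parts_with = line_with_spaces.strip("\n").split("\t")
parts_without = line_without_spaces.strip("\n").split("\t")

print(parts_with)
print(parts_without)
```

With the spaces, each item keeps a stray space ('240 ' and ' 0.123'). Without them, the items come back clean.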

Using other delimiters

So far, we have used pretty standard delimiters that make sense to us intuitively, such as commas, spaces, tabs, or semicolons. But this does not have to be the case. You can use anything as a delimiter. You could use a certain word, a combination of letters and numbers, or other symbols. In the example below the string “&#&#” has been used as a delimiter. Although it looks strange, you could still read this file just as easily as using a tab, space, or comma, by writing .split("&#&#").

file = open('spectrum.txt', 'w')

wavelength = [240, 250, 260, 270, 280, 290]
absorbance = [0.123, 0.132, 0.346, 0.563, 0.998, 0.377, 0.021]
file.write(f"nm &#&# abs \n")
for i in range(len(wavelength)):
  file.write(f"{wavelength[i]} &#&# {absorbance[i]}\n")

file.close()
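As a sketch of the full round trip, the code below writes a few made-up values with the “&#&#” delimiter and then reads them back with .split("&#&#"). The file name is hypothetical. Note that float() ignores the spaces left on either side of each number.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "delimiter_demo.txt")  # hypothetical file name

wavelength = [240, 250, 260]
absorbance = [0.123, 0.132, 0.346]

with open(path, "w") as file:
    file.write("nm &#&# abs \n")
    for w, a in zip(wavelength, absorbance):
        file.write(f"{w} &#&# {a}\n")

read_wavelength = []
read_absorbance = []
with open(path) as file:
    lines = file.read().split("\n")

for line in lines[1:]:       # skip the header line
    if line == "":           # skip the empty string after the last "\n"
        continue
    parts = line.split("&#&#")
    read_wavelength.append(float(parts[0]))
    read_absorbance.append(float(parts[1]))

print(read_wavelength)
print(read_absorbance)
```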

Exercise: Read, transform, write

You have 3 columns of random background data in the file “measurements_2.csv”. After calibrating your instrument, you have realised that every data point needs to be increased by 4.35. Write a piece of code that reads the file, adds 4.35 to every item, and writes the new data into a new CSV file. Use zip() and the unpacking operator.

Hint: When tackling an open-ended question like this, break the problem into chunks and do not get overwhelmed by the overall problem. Start with reading the file and storing the data, remembering that if we want to use the unpacking operator later on, we want to store the data in a nested list.

Once you have written your code, think about how you might want to adjust this code later to take many more (even hundreds or thousands) of columns of data.


Click to view answer

This is one way of solving this problem. If you have used another way and it works, that’s great!

# This will be a nested list of each column
# data = [[data from column 0], [data from column 1], [data from column 2]]
data = []
with open("measurements_2.csv") as file:
    file = file.read()
    contents = file.split()
    # We have 3 columns in the CSV, so we will extract each column using range(3)
    for i in range(3):
        # The temporary list stores a column, and gets reset to empty when moving to the next column
        temp_list = []
        for line in contents:
            line = line.split(",")
            temp_list.append(float(line[i]))
        # Once the entire column has been recorded in temp_list, the whole list is appended to the list data
        data.append(temp_list)

# Take each value and add 4.35
for i in range(len(data[0])):
    for column in data:
        column[i] = column[i] + 4.35

# Write the new values to a new CSV file
# Using zip() and the unpacking operator
with open("data_record.csv", "w") as file:
    file.write("column0,column1,column2\n")
    for temp_tuple in zip(*data):
        for item in temp_tuple:
            file.write(str(item) + ",")
        file.write("\n")

If I then wanted to do the same thing, but on a file that has 80 columns of data, all I need to change is the line for i in range(3): in the first section, and replace 3 with 80. Check it works for yourself by changing it to 2.
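Going one step further, the number of columns does not have to be typed in at all. One possible approach, sketched below, is to count the items in the first line of the file. The small CSV file here is made up for the example (it is not “measurements_2.csv”) and assumes there is no header line.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "measurements_demo.csv")  # hypothetical file

# Create a small made-up CSV file with four columns and no header
with open(path, "w") as file:
    file.write("1.48,1.23,1.39,1.46\n")
    file.write("1.31,1.55,1.25,1.36\n")

data = []
with open(path) as file:
    contents = file.read().split()

# Count the items in the first line to find the number of columns
n_columns = len(contents[0].split(","))

for i in range(n_columns):
    temp_list = []
    for line in contents:
        temp_list.append(float(line.split(",")[i]))
    data.append(temp_list)

print(n_columns)
print(data)
```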


Appending to an existing file#

In all of the examples above, when we run the code over and over again, the data that was there previously is truncated (cut off/deleted) and replaced with the output of the current program.

Append mode, "a", is a bit different. It still creates a new file if one doesn’t already exist. For existing files, however, it simply adds the new data to the bottom of the file.

Run the code below a few times and see how data appears at the bottom of the file.

file = open('spectrum.txt', 'a')

wavelength = [300, 310]
absorbance = [0.007, 0.002]

for i in range(len(wavelength)):
  file.write(f"{wavelength[i]} \t {absorbance[i]}\n")

file.close()
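The effect of running the append code several times can be mimicked in a single sketch by wrapping it in a loop. The file name is hypothetical and lives in a temporary folder, so the file starts out empty.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "append_demo.txt")  # hypothetical file name

wavelength = [300, 310]
absorbance = [0.007, 0.002]

# Each pass through this loop stands in for one run of the program
for run in range(3):
    file = open(path, "a")
    for i in range(len(wavelength)):
        file.write(f"{wavelength[i]} \t {absorbance[i]}\n")
    file.close()

file = open(path)
contents = file.read()
file.close()

n_lines = contents.count("\n")
print(n_lines)
```

Two lines are added on every run, so three runs give six lines. Keep this in mind: appending the same data twice by accident duplicates it in the file.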

Further Practice#

Question 1#

Three moles of an ideal gas are contained within a frictionless piston at 298.15 K. Use Python to calculate the volume of the gas at the following four different pressures:
1.00 kPa
10.00 kPa
50.00 kPa
100.00 kPa

Output the results in a file, formatted as two columns of numbers (to two d.p.), with the first column being pressure and the second being volume. Choose your own delimiter.

Ideal gas equation:

\(V = \frac{nRT}{p}\)

Click to view answer
pressure_kPa = [1.00, 10.00, 50.00, 100.00] # kPa

# Convert to SI units, Pa
pressure_Pa = [] # Pa
for i in pressure_kPa:
    press = i * 1000
    pressure_Pa.append(press)
print(pressure_Pa)

# Calculate the volumes
volume = []
for p in pressure_Pa:
    vol = (3*8.314*298.15)/p
    volume.append(vol)
print(volume)

# Write to a new file
with open("data_record.txt", "w") as file:
    file.write("Pressure / kPa \t Volume / m^3 \n")
    for p, V in zip(pressure_kPa, volume):
        file.write(f"{p} \t {V:.2f} \n")
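The :.2f inside the curly brackets is what rounds a number to two decimal places when it is written. A short sketch of its effect, using the first pressure from the question:

```python
pressure = 1.0                           # kPa
volume = (3 * 8.314 * 298.15) / 1000.0   # m^3, at 1.00 kPa = 1000 Pa

unformatted = f"{pressure} \t {volume}"
formatted = f"{pressure:.2f} \t {volume:.2f}"

print(unformatted)
print(formatted)
```

Without a format specifier the pressure appears as 1.0 and the volume with all of its decimal places. With :.2f they appear as 1.00 and 7.44.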

Question 2#

Now calculate the volume at four additional pressures. Copy and adapt your script above to append the following pressures and their associated volumes to the end of your file.

200.00 kPa
500.00 kPa
1000.00 kPa
5000.00 kPa

Click to view answer

This code is very similar to the one above. The only changes are changing "w" to "a" when opening our file, and removing the line inserting a header.

pressure_kPa = [200.00, 500.00, 1000.00, 5000.00] # kPa

# Convert to SI units, Pa
pressure_Pa = [] # Pa
for i in pressure_kPa:
    press = i * 1000
    pressure_Pa.append(press)
print(pressure_Pa)

# Calculate the volumes
volume = []
for p in pressure_Pa:
    vol = (3*8.314*298.15)/p
    volume.append(vol)
print(volume)

# Append to the existing file
with open("data_record.txt", "a") as file:
    for p, V in zip(pressure_kPa, volume):
        file.write(f"{p} \t {V:.2f} \n")

However, if we wanted to use both chunks of code in the same program, and over and over again, there are a few changes we could make.

  • Watch out for repeated variable names. This answer and the answer from above share lots of variable names. If placed in the same program, this could get confusing quickly. Another way to combat this is using functions, remembering our discussion of local and global variables.

  • Use functions

    • To avoid writing the same code twice, we could write a function to calculate the volume from SI units. We could also write a function to convert from kPa to Pa, especially if we know that a lot of our data will be given in kPa.

    • We could write a function that writes a new file, so we can call it whenever we want a new file.

    • We could also write a function that just appends to the named file.
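The suggestions above could be sketched as follows. All three function names (kPa_to_Pa, ideal_gas_volume and write_rows) are made up for this example, and a single writing function takes the mode as an argument rather than having separate functions for writing and appending. This is one possible arrangement, not the only one. The output goes to a temporary folder.

```python
import os
import tempfile

def kPa_to_Pa(pressures_kPa):
    """Convert a list of pressures from kPa to Pa."""
    pressures_Pa = []
    for p in pressures_kPa:
        pressures_Pa.append(p * 1000)
    return pressures_Pa

def ideal_gas_volume(pressures_Pa, n=3, T=298.15, R=8.314):
    """Return volumes in m^3 from V = nRT/p, with p in Pa."""
    volumes = []
    for p in pressures_Pa:
        volumes.append(n * R * T / p)
    return volumes

def write_rows(file_name, mode, pressures_kPa, volumes, header=""):
    """Write pressure and volume columns; mode is "w" or "a"."""
    with open(file_name, mode) as file:
        file.write(header)
        for p, V in zip(pressures_kPa, volumes):
            file.write(f"{p:.2f} \t {V:.2f} \n")

folder = tempfile.mkdtemp()
path = os.path.join(folder, "data_record.txt")

first = [1.00, 10.00, 50.00, 100.00]         # kPa
second = [200.00, 500.00, 1000.00, 5000.00]  # kPa

# Create the file with a header, then append the second set without one
write_rows(path, "w", first, ideal_gas_volume(kPa_to_Pa(first)),
           header="Pressure / kPa \t Volume / m^3 \n")
write_rows(path, "a", second, ideal_gas_volume(kPa_to_Pa(second)))

with open(path) as file:
    lines = file.read().split("\n")
print(lines)
```

Because the variables inside each function are local, the two sets of pressures no longer compete for the same names.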

Question 3#

We have the coordinates of two hydrogen atoms in an XYZ file called “hydrogen_atoms.xyz”. Translate these coordinates +10 Angstroms in the x direction, -5 Angstroms in the y direction, and +1 Angstrom in the z direction. Then, create a new XYZ file to store the translated coordinates. Make sure it is stored in an XYZ format!

Click here to view an answer
atoms = []
x_coords = []
y_coords = []
z_coords = []

# Open XYZ file and extract the atoms and x, y, and z coordinates
with open("hydrogen_atoms.xyz") as file:
    file = file.read()
    contents = file.split("\n")
    for line in contents:
        line = line.split()
        if len(line) < 4:
            continue
        else:
            atoms.append(line[0])
            x_coords.append(float(line[1]))
            y_coords.append(float(line[2]))
            z_coords.append(float(line[3]))

# Translate in space
for i in range(len(atoms)):
    x_coords[i] += 10
    y_coords[i] -= 5
    z_coords[i] += 1

# Write to an XYZ file
with open("data_record.xyz", "w") as file:
    file.write("2\n")
    file.write("two hydrogen atoms translated in space\n")
    for a, x, y, z in zip(atoms, x_coords, y_coords, z_coords):
        file.write(f"{a}\t{x:.3f}\t{y:.3f}\t{z:.3f}\n")
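In the answer above the number of atoms on the first line is typed in by hand as 2. A small variation, sketched below with made-up coordinates instead of the contents of “hydrogen_atoms.xyz”, writes len(atoms) instead, so the same code would work for a molecule of any size. The output file name is hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "translated_demo.xyz")  # hypothetical file name

# Made-up starting coordinates in Angstroms
atoms = ["H", "H"]
x_coords = [0.0, 0.74]
y_coords = [0.0, 0.0]
z_coords = [0.0, 0.0]

# Translate in space
for i in range(len(atoms)):
    x_coords[i] += 10
    y_coords[i] -= 5
    z_coords[i] += 1

# XYZ format: number of atoms, a comment line, then one line per atom
with open(path, "w") as file:
    file.write(f"{len(atoms)}\n")
    file.write("hydrogen atoms translated in space\n")
    for a, x, y, z in zip(atoms, x_coords, y_coords, z_coords):
        file.write(f"{a}\t{x:.3f}\t{y:.3f}\t{z:.3f}\n")

with open(path) as file:
    lines = file.read().split("\n")
print(lines)
```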

Extension Practice#

You have many columns of data from the lab in a CSV file, all in Joules. You want to be able to take this data, convert all values to eV, and write it all back into a CSV file. You want to write this code such that it works for an arbitrary number of columns in our initial CSV file, and can write an arbitrary number of columns (potentially of different lengths) to our CSV file.

Let’s break down the steps together:

  1. Extract the data from the file by reading.

    • What do we need to consider in this step?

    • What inputs will we take?

    • What will our output(s) be?

    • What problems might we face?

    • How have we done this in the past?

  2. Convert all the data from J to eV:

    • What format is our data in?

    • What format will we output it as?

    • What problems might we face?

    • How have we done this in the past?

  3. Write all our lists of data into a new CSV file:

    • What do we need to consider?

    • What inputs will we take?

    • What will our output be?

    • What problems might we face?

    • How have we done something similar in the past?

For step 1, we have written code in the past to allow us to extract columns of data. We can repurpose this for this code.

For step 2, we have written code in the past to convert items in a list from J to eV. We can repurpose this to work for many lists.

For step 3, we have written code to append lists to a CSV file before. How can we repurpose this to take lists of different lengths? How can we repurpose this to take an arbitrary number of lists?

# Take data, transform it, write to a CSV file

# Read a certain number of columns and return the data in a nested list
def read_file(data_file, columns):
    """
    Read a file and output any of the columns
    """
    column_index = []
    data = []
    
    for column in columns:
        column_index.append(column)

    with open(data_file) as file:
        file = file.read()
        contents = file.split()
        for i in column_index:
            temp_list = []
            for line in contents:
                line = line.split(",")
                if line[i] == '':
                    break
                else:
                    temp_list.append(float(line[i]))
            data.append(temp_list)
    return data

def J_to_eV(data):
    """ 
    Convert list values in J to eV.

    Parameters:
        data : LIST
            A nested list containing an arbitrary number of lists
    Returns:
        final_list : LIST
            A nested list of all the input lists
            Now converted to eV
    """
    final_list = []
    for measurements in data:
        temp_list = []
        for value in measurements:
            eV_val = value / 1.602e-19
            temp_list.append(eV_val)
        final_list.append(temp_list)
    return final_list

def CSV_data_write(file_name, lists):
    """
    Create a CSV file of data with an arbitrary number of columns of arbitrary length.

    Parameters:
        file_name : STR
            The name of the file we want to create or append to.
        lists : LIST
            A nested list containing an arbitrary number of lists of data of arbitrary length. 
            Each list of data will become a column in the CSV file
    """
    longest = []
    final_lists = []
    # Identifies the longest data list
    for i in lists:
        if len(i) > len(longest):
            longest = i
        else:
            continue
    # For all lists shorter than the longest, appends "" to make them all equal length
    # Nests all equal length lists in variable final_lists
    for j in lists:
        for k in range(len(longest)-len(j)):
            j.append("")
        final_lists.append(j)

    file = open(file_name, "a+")
    # Unpacks and iterates through each of the lists in final_lists
    # Storing each value item-wise in a tuple
    # For each tuple, appends data to file in CSV format
    for tuple in zip(*final_lists):
        for item in tuple:
            file.write(f"{item},")
        file.write("\n")
    file.close()
    return

# List of the 100 columns I want to extract from
columns = []
for i in range(100):
    columns.append(i)

# take the data from the CSV file and store as a list
J_data = read_file("./practice_files/energy_data_large.csv", columns)

# Convert to eV
eV_data = J_to_eV(J_data)

# Store back into a CSV file
CSV_data_write("data_record.csv", eV_data)
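The function CSV_data_write() above pads the shorter lists by appending "" to them, which also changes the lists that were passed in. One possible alternative is zip_longest() from Python's built-in itertools module, which fills the gaps itself and leaves the original lists untouched. This is a sketch with made-up data and a hypothetical file name, not a replacement for the worked answer.

```python
import os
import tempfile
from itertools import zip_longest

folder = tempfile.mkdtemp()
path = os.path.join(folder, "zip_longest_demo.csv")  # hypothetical file name

# Made-up columns of different lengths
columns = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

with open(path, "w") as file:
    # fillvalue is used wherever a shorter list has run out of items
    for row in zip_longest(*columns, fillvalue=""):
        for item in row:
            file.write(f"{item},")
        file.write("\n")

with open(path) as file:
    lines = file.read().split("\n")

print(lines)
print(columns)
```

The empty positions appear as nothing between two commas, just as they do with the manual padding.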

Learning Outcomes#

  • Write to a file

  • Add structure to a file

Summary#

  • Use “w” when you open a file to change to write mode, and “a” for append mode.

  • If the file does not exist, it will be created.

  • In “w” mode, if data already exists in a file that you open, it will be erased and the new data written in its place.

  • In “a” mode, if data already exists in the file that you open, any new data will be added at the bottom.

  • You can write to files either using file = open("file.txt", "w"), or with open("file.txt", "w") as file:.

  • Use iteration to write many things to a file.

  • Add structure to a file using special characters like \t (tab) and \n (new line).