Files and File Types#

Prerequisites#

  • None

Learning Outcomes#

  • File Types

  • Organising your files

Organising files#

Files are organised using directories. These will often appear on your computer as folder symbols.

Directories branch downwards from the root directory, and can contain both files and subdirectories. For example, you may have a directory called ‘Lab Documents’, which might contain a subdirectory ‘Module A Practical B’, in which you hold all your results, risk assessments, and writeups for Practical B of Module A. The directory ‘Lab Documents’ is itself a subdirectory of something else, stemming back until you reach the root directory, usually C:\ on a Windows operating system, or / on an Apple Mac. This way, we can cluster together groups of related files to keep ourselves better organised.

When you are programming, it is important to know which directory you are in for a number of reasons.

  • To use libraries like NumPy and Matplotlib, they need to be installed where Python can find them from the directory you are working in. If you are using an environment like Anaconda, you only need to install a library once; you then import it in each program that uses it. An explanation of downloading libraries can be found in the importing modules lesson.

  • If you are running code from the command line (more information in another lesson), you need to be able to change directories forwards and backwards to find your code.

  • If you want to read from or write to another file, you need to know where that file is relative to the directory you are working in, so that you can give the correct filepath.
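A quick way to check which directory Python is working in is sketched below, using the standard-library pathlib module. The variable names are illustrative only, and the output will depend on your own computer.

```python
from pathlib import Path

# The current working directory: the starting point for relative filepaths
current_directory = Path.cwd()
print("Working directory:", current_directory)

# The directory one level up (the parent)
print("Parent directory:", current_directory.parent)

# Everything (files and subdirectories) directly inside the working directory
for item in sorted(current_directory.iterdir()):
    kind = "directory" if item.is_dir() else "file"
    print(kind, item.name)
```

Running this at the top of a program is a simple check that you are where you think you are before trying to open any files.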

Visualising Directories with a tree diagram

One way of visualising directories is with a tree or branch diagram, like below.

Each black box is a directory, represented by folders on your computer. Each blue box is a Python file, and each green box is a data file.

When we say “the files are in the same directory”, we mean two files like final_project.py and my_functions.py in the diagram below, which are both in the directory Project. Going up one directory from Project takes you to Python_Files, which is itself one step down from the directory Tara.

../_images/directory_visualisation.png

Filepaths#

To access a file from a Python program, or to run a program from the command line, you need to specify a filepath. This tells the computer where specifically to look for a certain file. Otherwise, there might be files in different directories with the same name, and the computer would not know where to look.

Absolute (full) Filepath#

An absolute or full filepath specifies the complete address of a file or directory in a system, starting from the root directory and including all directories leading to the specific file or folder.

Directories are separated by a backslash, \, on Windows, and by a forward slash, /, on macOS and Linux. Python also accepts forward slashes in filepaths on Windows.

For example, the filepath of my CSV data might be:

C:\Users\Tara\Data_Files\my_data.csv

Sometimes, your filepaths might get very long. To access a data file from a Year 3 lab your filepath might be:

C:\Users\Bella\OneDrive\Documents\University\Year_3\Labs\Synthesis\Module0020\Part_1\results.CSV

Firstly, giving your directories and files sensible names and locations will help reduce the amount of time you spend figuring out where your files are located. Secondly, you might want to consider using a relative filepath instead.
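As a minimal sketch, the first example filepath above can be taken apart with the standard-library pathlib module. PureWindowsPath is used here so that the example behaves the same on any operating system; it only handles the text of the path and never touches the disk.

```python
from pathlib import PureWindowsPath

# A raw string (r"...") stops Python treating backslashes as escape characters
data_path = PureWindowsPath(r"C:\Users\Tara\Data_Files\my_data.csv")

print(data_path.is_absolute())  # True: the path starts from the root, C:\
print(data_path.parts)          # every step from the root down to the file
print(data_path.parent)         # the directory that contains the file
print(data_path.name)           # the file name on its own
```

Note the r before the opening quotation mark. Without it, Python would try to interpret sequences such as \U as special characters.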

Relative Filepaths#

Sometimes we may be working deep within multiple directories, and the full filepath would be very long to specify. This is where relative filepaths become useful. These allow you to specify a filepath relative to the directory you are already in.

There are two main symbols used to specify a relative filepath.

  • . refers to the directory you are currently in, so a path beginning with ./ goes forward (down) from there.

  • .. indicates that we are going to go backwards (up) a directory.

For example, using our tree diagram as a reference:

../_images/directory_visualisation.png

If you were writing code in the file my_code.py and you wanted to access the file my_functions.py, you would first have to go forward into the directory “Project”. The relative filepath would be: ./Project/my_functions.py.

If you were instead writing code in the file final_project.py and wanted to access the file my_code.py, you would have to go backwards up to the directory “Python_Files”. The relative filepath would be: ../my_code.py.

If you were writing code in the file final_project.py, and wanted to access the file my_data.csv, you would have to go backwards and then forwards. The relative filepath would be: ../../Data_Files/my_data.csv. The first set of dots takes you backwards into “Python_Files”, the second backwards into “Tara”, then you move forwards into “Data_Files”, and finally access the file “my_data.csv”.
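The sketch below shows how relative filepaths like these resolve to full filepaths. The starting location /Users/Tara/Python_Files/Project is a hypothetical placement of the tree diagram on a computer; only the text of each path is processed, so none of the files need to exist. Keep in mind that Python resolves relative filepaths against the current working directory, which is usually, but not always, the directory containing your code file.

```python
import posixpath

# Hypothetical location of the "Project" directory from the tree diagram
project_directory = "/Users/Tara/Python_Files/Project"

relative_paths = [
    "./my_functions.py",             # stay in Project
    "../my_code.py",                 # up one level, into Python_Files
    "../../Data_Files/my_data.csv",  # up two levels, then down into Data_Files
]

resolved = {}
for relative in relative_paths:
    # Join the starting directory and the relative path, then simplify . and ..
    joined = posixpath.join(project_directory, relative)
    resolved[relative] = posixpath.normpath(joined)
    print(relative, "->", resolved[relative])
```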

File types#

If you are running a practical in the lab, you might take a measurement once a minute or multiple times a second, for example if you are running cyclic voltammetry. You then need somewhere to store all those data points. Different file types are marked by the extension at the end of the name, for example .txt, .dat, .cif, or .xyz. If you have used Excel for data processing in the past, you might have noticed that Excel data is often exported as .csv. Each file type has a purpose, and each has its own formatting depending on that purpose.

| Name | Extension | Uses |
| --- | --- | --- |
| Text file | .txt | Can take any values, stores plain text |
| Comma-separated values (CSV) | .csv | Data storage, delimiter is a comma |
| Data file (DAT) | .dat | Can contain any file type (plain text, PDF, audio, etc.). Often not human-readable. |
| Crystallographic Information File (CIF) | .cif | Standard format for storing crystallographic structural data |
| XYZ file | .xyz | Standard format for storing atomic coordinates |
| Python file (PY) | .py | Text file containing Python code, to be opened and used by a Python IDE |
| Jupyter Notebook (IPYNB) | .ipynb | Text-based file used by Jupyter Notebook |
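A program can read the extension from a file name and use it to decide how to treat the file. The sketch below is one way to do this with pathlib; the function name describe_file and the example file names are hypothetical, and the extension is only a label, so this does not check what the file actually contains.

```python
from pathlib import PurePath

# File type names from the table above, keyed by lowercase extension
FILE_TYPES = {
    ".txt": "Text file",
    ".csv": "Comma-separated values (CSV)",
    ".dat": "Data file (DAT)",
    ".cif": "Crystallographic Information File (CIF)",
    ".xyz": "XYZ file",
    ".py": "Python file (PY)",
    ".ipynb": "Jupyter Notebook (IPYNB)",
}


def describe_file(file_name):
    """Return the file type name for a file, based only on its extension."""
    # .suffix gives the extension including the dot; lower() makes .CSV match .csv
    extension = PurePath(file_name).suffix.lower()
    return FILE_TYPES.get(extension, "Unknown file type")


# Hypothetical file names for illustration
for name in ["results.CSV", "water.xyz", "final_project.py", "photo.png"]:
    print(name, "->", describe_file(name))
```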

Each file type serves a different purpose. Try opening the files with different editors. For example, open a .py file in your Python IDE (e.g. Spyder, IDLE, or VSCode), then open it in a plain text editor (e.g. Notepad) and look at the difference. The Python IDE is able to interpret the contents in a useful way, and then save any changes you make. Open one of the data files as well. You will be able to extract data from a file (e.g. a .xyz file) and make it usable by your program.

If you want to use information from a file in your program, the file must be in the same directory as your program, or you must give the filepath to it.

Delimiters#

Objects in different kinds of files are separated in different ways. In text files, words are separated by spaces. In .csv files, cells are separated with a comma, or sometimes a semicolon. The delimiter is the character or sequence of characters used to specify the boundary between separate, independent values.

Other kinds of delimiter might be new lines, appearing as “\n”, a tab, appearing as “\t” (a single character, often displayed at the width of several spaces), or occasionally a series of characters, such as “&#&#”, used when items may contain the usual delimiter within them (for example, commas inside Excel cells in a CSV file can cause errors when the file is read).

You will see these appear in the files below.
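As a small illustration of delimiters in practice, Python's built-in str.split method breaks a string apart wherever a chosen delimiter occurs. The example strings below are made up for demonstration.

```python
# Made-up example strings, each using a different delimiter
comma_line = "time,voltage,current"
tab_line = "O\t0.000\t0.000\t0.117"
custom_line = "Smith, J.&#&#Jones, A.&#&#Patel, R."
two_lines = "first line\nsecond line"

print(comma_line.split(","))      # split on commas
print(tab_line.split("\t"))       # split on tab characters
print(custom_line.split("&#&#"))  # the commas inside each item are left alone
print(two_lines.split("\n"))      # split on new lines
```

The third example shows why a multi-character delimiter can be useful: the items themselves contain commas, so a comma could not be used to separate them.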

Common file type structures#

Different file types look different when opened. You can view your files using an IDE (like VSCode or Spyder), or a text editor (your computer may have a built-in one called ‘Notepad’ or similar).

If you are reading data from a file, you should always open up your file and check that the data looks correct, whether there are any headers, and that it has been arranged in the format you are expecting.

Common formats are shown below:

Text files

Text files can contain lots of types of content, but always in plain text. Any data read into Python from a text file will be a string, not an integer or float, until you convert it.

../_images/text_file.png
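The sketch below illustrates the point about strings. It writes a small, made-up text file called measurements.txt to a temporary directory so that the example is self-contained, reads it back, and converts each line to a float. With your own data you would skip the writing step and open your existing file.

```python
import tempfile
from pathlib import Path

# Create a small example file in a temporary directory (made-up values)
with tempfile.TemporaryDirectory() as temp_dir:
    text_path = Path(temp_dir) / "measurements.txt"
    text_path.write_text("1.5\n2.25\n3.0\n")

    # Read the whole file and split it into a list of lines
    with open(text_path) as text_file:
        lines = text_file.read().splitlines()

print(lines)  # every value is a string at this point

# Convert each string to a float before doing any arithmetic
values = [float(line) for line in lines]
print(sum(values))
```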

CSV files

.csv files are some of the more common file types. You will often see Excel documents saved in this format.

../_images/CSV_file.png

As you can see, each cell is separated by a comma to its sides, and a new line above and below. The delimiter is a comma. Empty cells are marked by just a comma with no contents.

If you want to take data values from an Excel sheet, you need to make sure it is saved with the .csv format. If it isn’t, you can simply click “save as” and change the file type to .csv. However, make sure you do not save it as “CSV UTF-8”, or you will get unexpected formatting when you try to extract information from your file.
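One way to read this structure is with Python's standard-library csv module, sketched below. The CSV text is made up for illustration and is held in a string (via io.StringIO) so the example runs without a file on disk; with a real file you would pass the opened file to csv.reader instead.

```python
import csv
import io

# Made-up CSV content: a header row, a quoted cell containing a comma,
# and an empty cell (two commas with nothing between them)
csv_text = (
    "sample,mass_g,notes\n"
    'A,1.25,"clear, colourless"\n'
    "B,,not weighed\n"
)

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # the first row holds the column names
rows = list(reader)    # the remaining rows hold the data

print(header)
for row in rows:
    print(row)
```

The csv module keeps the quoted cell in one piece even though it contains a comma, and the empty cell comes back as an empty string.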


XYZ file

XYZ files store atomic coordinates.

The first line indicates how many atomic coordinates are in the file.

The second line describes the molecule or collection of molecules.

Successive lines first contain the element, then its x, y, z coordinates in Angstroms. The delimiter is a tab, although you can also separate values with spaces.

../_images/XYZ_file.png
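The layout described above can be read with a few lines of Python. The sketch below uses approximate, illustrative coordinates for a water molecule, held in a string rather than a file so that it is self-contained. Calling split() with no arguments splits on any whitespace, so it handles both tabs and spaces.

```python
# Illustrative XYZ content (approximate coordinates, not measured data)
xyz_text = """3
Water molecule (illustrative coordinates)
O   0.000   0.000   0.117
H   0.000   0.757  -0.469
H   0.000  -0.757  -0.469
"""

lines = xyz_text.splitlines()
atom_count = int(lines[0])  # first line: number of atoms
comment = lines[1]          # second line: description

atoms = []
for line in lines[2:2 + atom_count]:
    # Each remaining line holds the element followed by x, y, z
    element, x, y, z = line.split()
    atoms.append((element, float(x), float(y), float(z)))

print(atom_count, comment)
for atom in atoms:
    print(atom)
```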

Python file

Python files have the extension .py. If opened with an IDE, text will be automatically highlighted. Otherwise, it will be in plain text, like a text file.

../_images/py_file.png

Summary#

  • Files are grouped in directories to make navigation more convenient.

  • Directories branch down from a root directory.

  • Absolute filepaths list every directory from the root down to the file you want.

  • Relative filepaths use . and .. to indicate a relative path.

    • . refers to the current directory, so ./ goes forward (down) from it.

    • .. indicates to go backwards (up) a directory.

  • Different file types have different extensions. These include, but are not limited to:

    • .py for Python files

    • .txt for text files

    • .csv for CSV files

    • .xyz for XYZ files

    • .ipynb for Jupyter Notebook files

    • .cif for CIF files

    • .dat for DAT files