Change Path In Python: A Detailed Introduction


Best Practice: Working With Paths In Python (Part 1)

Stefan Seltmann

Published on

12.2.2019

Updated on

8.5.2025

Data Science & AI


The problem: listing folders and drives

Recently, while working on a project, a colleague asked whether one could list the contents of drives in Python. Of course you can. Since this isn’t at all complicated, I’d like to use this case to illustrate key best practices for working with paths on drives.

Step 1: How do I input the right path?

To keep the example reproducible, we start with a user directory on a Windows 10 system whose contents we want to list:

path_dir: str = "C:\Users\sselt\Documents\blog_demo"

Executing this assignment immediately causes an error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The interpreter cannot decode the character sequence \U, because in a normal string literal it starts a Unicode escape of the form \UXXXXXXXX, which must be followed by eight hexadecimal digits. This problem arises because Windows uses the backslash “\” as a path separator, whereas Linux uses the slash “/”. Unfortunately, the backslash is also the character that introduces escape sequences in Python string literals, so the two uses collide. Just as we shouldn’t expect countries to agree on a decimal separator anytime soon, our only choice is to go for one of three solutions.
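To see what goes wrong, a small sketch may help. It compiles the offending assignment from a string so that the SyntaxError can be caught, and it also shows a second, quieter failure mode. The path C:\temp\new is a made-up example and not part of the project above.

```python
# Source text of the failing line. The backslashes are doubled here only so
# that this demo itself stays valid; the compiled text contains single ones.
source = 'path_dir = "C:\\Users\\sselt"'

try:
    compile(source, "<demo>", "exec")
    compiled_ok = True
except SyntaxError as err:
    compiled_ok = False
    print("SyntaxError:", err.msg)

# Hypothetical path: \t and \n are valid escapes, so there is no error,
# but the string silently contains a tab and a newline instead of backslashes.
silent = "C:\temp\new"
print(repr(silent))
print(len(silent))
```

The second case is arguably the more dangerous one, because nothing fails until the path is actually used.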

Solution 1 – The Hideous Variant

Simply avoid the Windows separator and instead write the path using Linux separators only:

path_dir: str = "C:/Users/sselt/Documents/blog_demo"

The interpreter then accepts the string without complaint, and since Windows itself accepts forward slashes as separators in most file operations, the path still works.

Solution 2 – The Even More Hideous Variant

Use escape sequences.

path_dir: str = "C:\\Users\sselt\Documents\\blog_demo"

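What bothers me, besides the illegibility, is that the escape is not even needed at every separator, only before the “U” and the “b”. The sequences \s and \D are not recognized escapes, so Python keeps the backslash as is, although recent versions emit a warning for them.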

Solution 3 – The Elegant One

Use raw strings with “r” as a prefix to indicate that special characters should not be evaluated.

path_dir: str = r"C:\Users\sselt\Documents\blog_demo"
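As a quick check, the following sketch compares the three spellings. It uses only the example path from above; for Solution 2 it shows the consistently escaped variant rather than the partially escaped one.

```python
forward = "C:/Users/sselt/Documents/blog_demo"      # Solution 1
escaped = "C:\\Users\\sselt\\Documents\\blog_demo"  # Solution 2, every backslash escaped
raw = r"C:\Users\sselt\Documents\blog_demo"         # Solution 3

# Solutions 2 and 3 produce exactly the same string.
print(escaped == raw)

# Solution 1 differs only in the separator character.
print(forward == raw.replace("\\", "/"))

# Caveat: a raw string cannot end in a single backslash, so a trailing
# separator has to be appended separately.
with_trailing = raw + "\\"
print(with_trailing.endswith("\\"))
```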

Step 2: Scanning the files

Back to our task of wanting to list all elements in a folder. We already know the path.

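The simple function os.listdir returns the entries of a directory as a list of strings, i.e., only the names, not the full paths. Here and in all other examples, I use type hinting for additional code documentation. The typing module became available with Python 3.5; the variable annotations used here require Python 3.6 or later.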

import os
from typing import List

path_dir: str = r"C:\Users\sselt\Documents\blog_demo"
content_dir: List[str] = os.listdir(path_dir)

The list of names is okay, but I’m more interested in file statistics, for which we have os.stat.
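A minimal, self-contained sketch of both calls follows. So that it runs on any system, it works on a temporary directory rather than the Documents folder above; the generated file and its five bytes of content are made up for illustration.

```python
import os
import tempfile
from typing import List

with tempfile.TemporaryDirectory() as path_dir:
    # Create one throwaway file; mkstemp returns an open descriptor
    # and the full path of the new file.
    fd, path_file = tempfile.mkstemp(dir=path_dir, suffix=".txt")
    os.write(fd, b"hello")
    os.close(fd)

    # os.listdir returns bare names, not full paths.
    content_dir: List[str] = os.listdir(path_dir)
    print(content_dir)

    # os.stat needs the full path and returns an os.stat_result.
    stats = os.stat(path_file)
    print(stats.st_size, stats.st_mtime)
```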

Step 3: Concatenating paths

To pass the file path to os.stat, we must first combine the directory path and the filename. I have often seen the following constructs in the wild, and even used them when starting out. For example:

path_file: str = path_dir + "/" + filename  # A
path_file: str = path_dir + "\\" + filename  # B
path_file: str = "{}/{}".format(path_dir, filename)  # C
path_file: str = f"{path_dir}/{filename}"  # D

A and B are hideous, because they concatenate strings with a “+” sign – which is unnecessary in Python.

B is especially hideous, because the backslash has to be doubled for Windows; otherwise it would be evaluated as an escape sequence for the closing quotation mark.

C and D are somewhat better, since they use string formatting, but they still do not resolve the system-dependence problem. If I run this under Windows, I get a functional but inconsistent path with a mixture of separators.

filename = "some_file"
print("{}/{}".format(path_dir, filename))
# prints: C:\Users\sselt\Documents\blog_demo/some_file

A Solution Independent of the OS

Python offers os.sep and os.path.sep for this. Both hold the path separator of the current system. They are identical, but the second, more explicit spelling immediately shows which kind of separator is meant.

This means, one can write:

path_file = "{}{}{}".format(path_dir, os.sep, filename)

The result is better, but the code becomes complicated as soon as you combine several path segments.

Therefore, the convention is to combine path elements by joining them on the separator. This is even shorter and more generic:

path_file = os.sep.join([path_dir, filename])
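The advantage shows once more than two segments are involved. A short sketch, with made-up segment names:

```python
import os

# Hypothetical path segments; any number of elements can be joined the same way.
segments = ["blog_demo", "2019", "reports", "summary.txt"]
path_file = os.sep.join(segments)
print(path_file)

# Splitting on the same separator recovers the original segments.
print(path_file.split(os.sep) == segments)
```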


The first full run

Let’s walk through the directory:

for filename in os.listdir(path_dir):
    path_file = os.sep.join([path_dir, filename])
    print(os.stat(path_file))

Among the results (not shown) are st_atime, the time of the last access, st_mtime, the time of the last modification, and st_ctime, the creation time on Windows (on Unix, it is the time of the last metadata change). In addition, st_size gives the file size in bytes. At the moment, all I want to know is the size and the last modification date, so I store them in a simple list of tuples.

import os
from typing import List, Tuple

filesurvey: List[Tuple] = []
content_dir: List[str] = os.listdir(path_dir)
for filename in content_dir:
    path_file = os.sep.join([path_dir, filename])
    stats = os.stat(path_file)
    filesurvey.append((path_dir, filename, stats.st_mtime, stats.st_size))
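The st_mtime value is a POSIX timestamp in seconds, which is not very readable. The following sketch shows one way to convert such an entry; the tuple values are invented and only mimic the layout of the entries in filesurvey.

```python
from datetime import datetime, timezone

# Invented entry with the same layout as the tuples in filesurvey:
# (directory, filename, st_mtime, st_size)
entry = ("blog_demo", "some_file", 1549929600.0, 2048)

path_dir, filename, mtime, size = entry

# Convert the timestamp to a timezone-aware datetime in UTC.
modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
size_kib = size / 1024
print(filename, modified.isoformat(), f"{size_kib:.1f} KiB")
```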


The Final Function With Recursion

The result appears satisfactory at first, but two new problems arise: os.listdir does not differentiate between files and folders, and it covers only a single folder level without descending into subfolders. Hence, we need a recursive function that differentiates between files and folders. os.path.isdir tells us whether a path points to a folder.
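The function itself is not reproduced in this text. A minimal sketch of what it could look like, assuming the tuple layout from the previous step, is given here. The name survey_dir and the temporary test folders are my own choices for illustration, not the original listing.

```python
import os
import tempfile
from typing import List, Tuple


def survey_dir(path_dir: str, filesurvey: List[Tuple]) -> None:
    """Collect (folder, filename, mtime, size) for all files below path_dir."""
    for name in os.listdir(path_dir):
        path_item = os.sep.join([path_dir, name])
        if os.path.isdir(path_item):
            # Folder: descend one level deeper.
            survey_dir(path_item, filesurvey)
        else:
            stats = os.stat(path_item)
            filesurvey.append((path_dir, name, stats.st_mtime, stats.st_size))


# Usage on a throwaway folder tree with one file per level.
with tempfile.TemporaryDirectory() as root:
    sub = os.sep.join([root, "sub"])
    os.mkdir(sub)
    for folder, name in [(root, "a.txt"), (sub, "b.txt")]:
        with open(os.sep.join([folder, name]), "w") as handle:
            handle.write("demo")
    result: List[Tuple] = []
    survey_dir(root, result)

print(sorted(entry[1] for entry in result))
```

This sketch does not guard against symbolic link cycles or unreadable folders; both would need extra handling on a real drive.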

Making the Results Useful as a Data Frame

Done! We have solved the problem in fewer than 10 lines. Since I planned filesurvey as a list of tuples, I can easily transfer the result into a pandas DataFrame and analyze it there, for example to compute the total size stored per folder.
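A possible sketch of that step, assuming pandas is installed. The column names and the sample tuples are made up; in practice, filesurvey would come from the recursive scan.

```python
import pandas as pd

# Invented sample in the (folder, filename, st_mtime, st_size) layout.
filesurvey = [
    ("blog_demo", "a.txt", 1549929600.0, 100),
    ("blog_demo", "b.txt", 1549933200.0, 250),
    ("blog_demo/sub", "c.txt", 1549936800.0, 50),
]

df = pd.DataFrame(filesurvey, columns=["path", "filename", "modified", "size"])

# Turn the POSIX timestamps into pandas datetimes.
df["modified"] = pd.to_datetime(df["modified"], unit="s")

# Total size in bytes per folder.
size_per_folder = df.groupby("path")["size"].sum()
print(size_per_folder)
```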

...But Unfortunately, It’s Not The Very Best Practice

I know, the blog promised to solve the problem using best practices.

A few years ago, this solution would have held up well, but because Python keeps being developed, even such simple use cases can now be improved.

In the next part, I’m going to address this use case again and solve it elegantly.

