My Train Tracker

ยท 1288 words ยท 7 minute read

During May 2024, I started tracking my journeys by train, to get a sense of where I was travelling, and on which trains. I was heavily inspired by people online that try to visit as many stations as they can, such as Nick Badley and Jen On the Move. I was also interested in what type of trains I take, and how much time I spend on them.

I started off with a very simple table, taken from everylaststation.co.uk, which had a list of all the stations in the country that you could manually tick off. Being an excel nerd, of course, I decided that this wasn’t enough. I wanted more data. This was when I started to track individual journeys. This journeys table contains all the basic information: the start and end stations, departure and arrival times, and the distance between stations. I also tracked more niche data points: the train identity number as found on real time trains, and the individual unit number of multiple units/locomotives I was on. This data was enough to gather lots of insight into my journeys, and gave me lots of functionality to implement.

Station Tracker ๐Ÿ”—

The first thing I wanted to do was to integrate these two tables: at this point on visiting a new station, I would have to log it in my journey tracker, and then in my station tracker. This was quite easy to implement. I first added an additional column, for the number of times I had visited. This was a simple COUNTIF function, that counted the number of times the station appeared in either the departure or arrival columns. Note that this value considers changing trains at the station as two changes, so hub stations tend to have large numbers. Filling in the ‘Visited’ column was then easy: I simply checked if the number of times visited was greater than 0. If it was, then I had visited the station. This design did lead to some weirdness however: stations I had visited before I began tracking journeys are listed in the journeys table as journeys with only a departure station. This means there are several rows (in my case 7) before the true journeys start.

What type of trains do I ride? ๐Ÿ”—

Now knowing which stations I frequent, I was interested in how I got between those stations. Obviously I had some intuition for this, I knew that I often took Class 323 trains between Birmingham New Street and Selly Oak, but other than that I had no idea.

By this time I had standardised a format for my unit number column in the case of multiple units in a train. I chose the format, for example, ‘323206 + 323201’, where 323206 was the leading unit. This format is very easy to enter, however it isn’t great for parsing in excel. My first thought was to use the TEXTSPLIT and TEXTJOIN functions, but these did not get what I was looking for. This initial approach generated a list of all trains I had ridden, however it led to overcounting. In the example above, the list would include

323206

323201

This meant if I counted the number of times ‘323’ was the first three characters, I would count this journey twice, when in reality I had only ridden a Class 323 once. This meant I had to implement my own way to detect the class ridden.

The Filtering Lambda Functions ๐Ÿ”—

In order to do this, I created the following lambda function

=LAMBDA(class,
FILTER(Journeys,
ISNUMBER(
SEARCH(CONCAT(class,"???"),Journeys[Unit Number])
)))

This filter takes the class, and searches for Unit Numbers containing 323xxx. If none are found, the SEARCH function returns an #N/A error. Hence, we check if the SEARCH function returns a number. If it does, we know that that journey involved a Class 323. We then use this value to filter by. If a train consists of multiple different classes, this function will return that journey for both classes.

I also created similar functions for filtering by individual train and by train operating company.

How I used these functions ๐Ÿ”—

Now having a way to get all journeys where I rode a particular class, I wanted to gain insights, rather than just looking at a list. I therefore wanted a table of all classes I had ridden, and some details about how much I had ridden them. However, the functions I defined above do not give me a way to easily see data for each train class, since it requires me to input the train class. Therefore, I needed to generate a list of all trains I have ridden using the TEXTJOIN and TEXTSPLIT functions. For this I used the following formula:

=LET(joined, TEXTJOIN("+",TRUE, Journeys[Unit Number]),
split, TEXTSPLIT(joined,,"+",TRUE,0),
TRIM(split))

This formula first joins all of the entries in the Unit Number column with a +, before splitting them again using + as the delimiter. It then uses the TRIM function to remove any leading or trailing spaces. This function leads to a list of all the individual trains I have ridden. This solution may not work permanently, since the TEXTJOIN function can only take 252 entries (Source: ablebits.com), so I am limited to 252 journeys. Labelling this cell as trains_ridden, I can then use the formula below to get a list of all unique train classes I have ridden.

=VALUE(SORT(UNIQUE(LEFT(trains_ridden,3))))

I can then use the MAP function to apply a function to each entry in this dynamic array. For these functions, I chose to get the number of times I have ridden the class, how long for, and for what distance. I also extracted the most recent date I rode the class, using the TAKE function to get the last entry in the filtered table. I did similarly to get the same information, but by individual unit rather than train class.

Information about each train class ๐Ÿ”—

In the screenshot above, you can see that the train classes are not just denoted by their number: some additionally have names. For example, the Class 350 is known as a Desiro, and the Class 390 is known as a Pendolino. To add this information, I created a new table where I could note the class numbers and names, as well as the traction type (DMU, EMU, etc.). Using a simple CONCAT function, I was able to add the class name to the graphs. This also enabled me to investigate the amount of time I spend on different traction types.

The Summary Statistics ๐Ÿ”—

The other interesting worksheet in the excel file is the summary tab. This contains some single number measures of my train journeys. Most of them were easy to implement, such as the total distance travelled, the average speed and the percentage of stations visited. The more complicated summary statistic is the most common class, which uses the formula

=LET(common_class, XLOOKUP(MAX('By Train'!L4#),'By Train'!L4#,'By Train'!H4#,,0,1),
class_label, XLOOKUP(common_class, Class_labels[Class Number],Class_labels[Known As],"",0),
"Class "&common_class&" "&class_label)

I will explain this function in parts. First, we find the most common class number, using the formula

=XLOOKUP(MAX('By Train'!L4#),'By Train'!L4#,'By Train'!H4#,,0,1)

The dynamic array L4# is the column representing the time ridden for, displayed per train class, and H4# is the corresponding array of classes. The inner MAX function here therefore finds the maximum time that a class has been ridden for. We then use this value as a lookup value in the table, and use XLOOKUP to return the corresponding class number.

Then, we use this value to find the class name in the lookup table, and finally combine the class number and class name to get the final result.

How to get the tracker ๐Ÿ”—

Should you want to use this tracker yourself, you can view/download a copy from my OneDrive: Railway Tracker Blank (May 24).xlsx