❓ What is my thesis actually about?

🚉 On Trains: Post #6

Fabio Barbero published on

Categories: Random

Introduction

If you've been following my past blogposts, you've probably figured out that, for my master's thesis, I'm working on the European Passenger Rail Network. I am indeed. You might have also seen that I am travelling a lot, and might be wondering whether my thesis is actually progressing or if I'm just enjoying spending a lot of money on Interrail tickets. Well, let me reassure you that I do feel like I'm progressing pretty well.

I now have a much clearer picture of what I'm doing, and this blogpost is an attempt to concretize the scope of my thesis.

This is not a final plan for my thesis. Things might still change a lot, and I have no problem with that.

Why am I doing this?

I am not backed by any organization. My university does not have a department related to transport that has been pushing for my findings. The truth is, I just wanted to work on something I'm passionate about.

Now that I have immersed myself in the field, by reading and speaking with countless actors and experts, I can confidently say: there is a lack of independent research on rail data in Europe. The main reason is that the state of open rail data is unclear, and the data itself is hard to set up and understand.

All the relevant research I have found was commissioned by large organizations (European Commission, train operators) and relies on closed-source data (in particular UIC's MERITS dataset, which costs from 8,000 to 50,000 EUR/year to access).

The biggest impact I hope to have with my master's thesis is popularizing independent research on the European-wide passenger rail network, by releasing statistics and tools to reproduce those findings.

Part 1: Open data

The first goal of my thesis is therefore to give an overview of the state of open rail data, with metrics on its completeness and quality. This includes timetable data (GTFS open access points, as NeTEx is not currently widely adopted) and rail infrastructure (OpenStreetMap (OSM), RINF, and Wikidata for train stops and tracks). I am also going to look into geographical and geopolitical information about areas (NUTS).

Concretely, I aim to:

  1. release a set of tools to easily download and manage GTFS files in Europe. This will include:
    • An efficient way to track/download/process all files
    • Visualization tools for estimating completeness
  2. contribute to open source projects (such as OSM and Wikidata) when finding incomplete or incorrect information about rail infrastructure
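
To give an idea of what "estimating completeness" could look like at the simplest level, here is a minimal sketch that opens one GTFS zip and reports which core files are missing and how many rows each one has. This is not the tool I'm planning to release: the function name `summarize_feed` and the shape of the returned dictionary are made up for illustration, the list of required files is a simplification of the GTFS reference, and it assumes the files sit at the root of the zip.

```python
import csv
import io
import zipfile

# Files a conventional GTFS Schedule feed is expected to contain
# (simplified from the GTFS reference; assumption for this sketch).
REQUIRED_FILES = ["agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"]
# At least one of these two is needed to define the days of service.
CALENDAR_FILES = ["calendar.txt", "calendar_dates.txt"]


def summarize_feed(feed):
    """Return basic completeness metrics for one GTFS zip (path or file-like object)."""
    row_counts = {}
    with zipfile.ZipFile(feed) as archive:
        names = set(archive.namelist())
        for name in REQUIRED_FILES + CALENDAR_FILES:
            if name in names:
                with archive.open(name) as raw:
                    # utf-8-sig strips the byte order mark some feeds start with
                    reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
                    row_counts[name] = sum(1 for _ in reader)
    missing = [name for name in REQUIRED_FILES if name not in names]
    if not any(name in names for name in CALENDAR_FILES):
        missing.append("calendar.txt or calendar_dates.txt")
    return {"missing": missing, "row_counts": row_counts}
```

Running this over every feed and tabulating the `missing` lists would already give a rough first picture of which feeds are unusable, before looking at anything more subtle like stop coverage.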

In practice, I hope to be releasing a first version of my GTFS processing tool by the end of February.

Part 2: Visualization

Given the data at hand, I am going to explore ways to visualize the timetable data for long-distance passenger travel in Europe. Concretely, I aim to replicate the website chronotrains.com (potentially in collaboration with the original creator), making isochrones with open access data.

I'd then like to compare or overlay it with an isochrone map of the theoretical minimum time by train given the current infrastructure, and the time by car.
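
For readers who haven't met isochrones before: the idea is to compute the travel time from one origin to every other station, then group stations by the time limit that reaches them. The sketch below does this on a toy graph with fixed travel times between stations. That is a big simplification, since a real timetable has departure times and waiting at changes, and the function names and graph format are my own assumptions for illustration.

```python
import heapq


def travel_times(graph, origin):
    """Shortest travel time in minutes from origin to every reachable station (Dijkstra)."""
    best = {origin: 0}
    queue = [(0, origin)]
    while queue:
        elapsed, station = heapq.heappop(queue)
        if elapsed > best[station]:
            continue  # stale queue entry, a faster path was already found
        for neighbour, minutes in graph.get(station, {}).items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


def isochrone_bands(times, limits):
    """Group stations by the smallest time limit (in minutes) that reaches them."""
    bands = {limit: [] for limit in sorted(limits)}
    for station, minutes in sorted(times.items()):
        for limit in bands:
            if minutes <= limit:
                bands[limit].append(station)
                break
    return bands
```

With a graph like `{"A": {"B": 60, "C": 200}, "B": {"C": 90}}`, station C ends up in the 180-minute band because going via B (150 minutes) beats the direct edge. Drawing the bands on a map is then "only" a matter of turning each group of stations into a shape.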

For computing the optimal time by car and theoretical minimum time by train given the current infrastructure, I will be using the Open Source Routing Machine (OSRM). OSRM doesn't support rail routing, but a very kind researcher shared with me a long, custom OSRM Lua profile for considering train tracks instead of roads. It should hopefully be released soon, but it works exactly like signal.eu.org/osm/ (and it's really fast!).
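
As a rough illustration of how a travel time can be pulled out of OSRM, here is a small sketch around its HTTP route service. It only builds the request URL and reads the duration from the JSON response, so it runs without a server. The base URL is an assumption (a locally running `osrm-routed`, which listens on port 5000 by default), and the helper names are mine. Which network gets routed on (roads or train tracks) depends on the profile the data was prepared with, not on this code.

```python
import json
from urllib.parse import urlencode


def osrm_route_url(base_url, origin, destination, profile="driving"):
    """Build an OSRM route request. Points are (longitude, latitude) pairs."""
    coordinates = ";".join(f"{lon},{lat}" for lon, lat in (origin, destination))
    query = urlencode({"overview": "false"})  # skip the route geometry, we only want the time
    return f"{base_url}/route/v1/{profile}/{coordinates}?{query}"


def duration_minutes(response_text):
    """Extract the duration of the first route, which OSRM reports in seconds."""
    payload = json.loads(response_text)
    if payload.get("code") != "Ok":
        raise ValueError(f"OSRM returned {payload.get('code')}")
    return payload["routes"][0]["duration"] / 60
```

Note that OSRM expects longitude first, then latitude, which is the opposite of what most people type by habit.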

For computing the optimal travel time from the GTFS files, I am planning on using existing trip planners designed to take OSM and GTFS data as input. I had the intention of making an in-depth comparison of different trip planners, but so far the only one I was able to run on my consumer hardware is the underrated MOTIS project. I am looking forward to meeting the developer behind MOTIS at FOSDEM tomorrow.

I might make visualizations other than isochrones, but for now this is the only concrete one I have in mind.

Part 3: Indicators for pairs of European cities

To focus more on cross-border/long-distance travelling, I will be taking pairs of European cities and computing the travel time (at different time periods), weighted by some characteristics of the two cities. These factors include: population, number of jobs, travel data (if I get access to any), language similarity between the two cities...

The goal is to find which pairs of cities are lacking connections. Other factors, such as the number of changes or the waiting time between changes, could also be taken into account.
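
I haven't settled on a formula yet, so the following is only one possible shape such an indicator could take, not the definition my thesis will use. It computes the straight-line speed of a rail connection (great-circle distance divided by travel time) and ranks pairs by the product of their populations divided by that speed, so that big cities joined by slow trains come out on top. The weighting, the function names, and the dictionary layout for a city are all assumptions.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def straight_line_speed(city_a, city_b, travel_hours):
    """Distance as the crow flies divided by rail travel time, in km/h."""
    return great_circle_km(city_a["pos"], city_b["pos"]) / travel_hours


def rank_pairs(pairs):
    """Sort (city_a, city_b, travel_hours) tuples, worst-connected large pairs first."""
    def priority(pair):
        a, b, hours = pair
        weight = a["population"] * b["population"]  # hypothetical weighting
        return weight / straight_line_speed(a, b, hours)
    return sorted(pairs, key=priority, reverse=True)
```

Other factors (number of jobs, language similarity, number of changes) would slot into the `weight` or the speed term in the same way, which is why I find this kind of structure appealing as a starting point.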

The data for different cities will come from open sources such as Eurostat, Wikidata, and UN data.

This is the topic I have so far dedicated the least time to, and it therefore needs to be expanded further.

In my backlog

The following are things I'd like to explore if I have more time or if they end up being easy to do:

  1. Comparing open access data with the Deutsche Bahn (DB) Navigator data: I am currently unsure whether that would break the Terms of Service.
  2. Comparing different trip planners (MOTIS, r5py, ...)
  3. Estimating track capacity per pair of cities (how many trains/hour at different times of the day)
  4. Looking at the state of RINF data, compared with OSM

Closing thoughts

My last post accidentally broke the streak of shouting out someone else's work in the closing thoughts! So this week I'd like to once again thank Jon Worth for being such an enthusiastic train person. I was lucky to meet him this week to discuss my thesis, and he provided very valuable insights.

This blogpost was written while I was sick, so I hope what I wrote makes sense!

Note: no text in this document has been generated or rewritten by a Large Language Model.