In my last blog post I talked about what data I would need to go about building a Split Ticketing website, where you split up your train journey into multiple segments to reduce the overall cost.
To do this I first needed to find out information about train stations, particularly where they all are! This is because geographic distances play a part in the simple heuristic function employed by the A* search algorithm when plotting the cheapest route.
I started with a list of all the train stations. Luckily this is provided by the ‘Office of Rail Regulation’, which currently lists 2352 stations in the UK and conveniently provides a spreadsheet with their names and postcodes, which can be saved as a CSV file using Google Docs.
Using Ordnance Survey’s OpenData project I can convert these postcodes, for free, into grid references that I can use to easily plot locations and calculate distances. You have to request the postcode data, and 10-20 minutes later you get huge CSV files full of Easting/Northing data for each postcode.
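Because Eastings and Northings are both measured in metres on a flat grid, the distance between two points is just Pythagoras. Here is a minimal sketch of that calculation; the function name and the coordinates are made up for illustration and are not taken from my actual parser.

```python
import math


def grid_distance_m(e1, n1, e2, n2):
    """Straight-line distance in metres between two grid references.

    Assumes both points use the same Easting/Northing grid, in metres.
    """
    return math.hypot(e2 - e1, n2 - n1)


# Illustrative coordinates only, not real station positions.
point_a = (530000, 180000)
point_b = (533000, 184000)
print(grid_distance_m(*point_a, *point_b) / 1000, "km")
```

This ignores the small distortion of the map projection, which should not matter much for a rough heuristic.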
I wrote a series of parsers to go through the data, pick out the information I need and match it up between the two data sets:
- Train Station Code
- Train Station Name
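The matching step can be sketched roughly as below. This is not my actual parser: the column names (`code`, `name`, `postcode`, `easting`, `northing`) are assumptions for illustration, and the real files use their own headers and layout.

```python
import csv
import io


def normalise_postcode(postcode):
    """Strip spaces and upper-case so 'zz1 1zz' and 'ZZ11ZZ' compare equal."""
    return postcode.replace(" ", "").upper()


def join_stations(stations_file, postcodes_file):
    """Attach Easting/Northing to each station by matching on postcode.

    Column names here are hypothetical, not the real file headers.
    """
    lookup = {}
    for row in csv.DictReader(postcodes_file):
        key = normalise_postcode(row["postcode"])
        lookup[key] = (int(row["easting"]), int(row["northing"]))

    joined = []
    for row in csv.DictReader(stations_file):
        coords = lookup.get(normalise_postcode(row["postcode"]))
        if coords is None:
            continue  # no grid reference found for this postcode
        joined.append({"code": row["code"], "name": row["name"],
                       "easting": coords[0], "northing": coords[1]})
    return joined


# Tiny made-up sample standing in for the two CSV files.
stations = io.StringIO("code,name,postcode\nAAA,Example Town,zz1 1zz\nBBB,Nowhere,ZZ9 9ZZ\n")
postcodes = io.StringIO("postcode,easting,northing\nZZ11ZZ,400000,300000\n")
print(join_stations(stations, postcodes))
```

Stations whose postcode has no match are skipped here; in practice you would probably want to log them and fix them up by hand.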
Once done I enumerated through the matched stations and, using the relative Easting/Northing data, plotted each one on a bitmap along with its station code. The resulting PNG can be seen at the top of the blog post. Click on it to get the full effect!
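Plotting comes down to scaling each grid reference into pixel coordinates. A minimal sketch of that mapping is below, assuming a known bounding box and image size; the function name and the numbers are illustrative, and the actual drawing (dots and labels) is left to whatever imaging library you prefer.

```python
def to_pixel(easting, northing, bounds, width, height):
    """Map a grid reference to an (x, y) pixel position.

    bounds is (min_e, min_n, max_e, max_n). One scale is used for both
    axes so the map is not stretched, and y is flipped because Northings
    grow upwards while pixel rows grow downwards.
    """
    min_e, min_n, max_e, max_n = bounds
    scale = min((width - 1) / (max_e - min_e), (height - 1) / (max_n - min_n))
    x = round((easting - min_e) * scale)
    y = (height - 1) - round((northing - min_n) * scale)
    return x, y


# Illustrative bounding box and image size, not the real ones.
bounds = (0, 0, 1000, 2000)
print(to_pixel(500, 1000, bounds, width=101, height=201))
```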
For the nerds out there, you can download the CSV file of the generated data here, which contains the categories described above.
National Rail and ATCO have routing guides, which are used to generate train routes when finding tickets. Although these are good, we can use the A* algorithm to see if there are better routes.
We need a mechanism for determining which stations are connected. To do this we need to mine the National Public Transport Data Repository (NPTDR) data from ATCO. Luckily we can get all of this (250 MB+) from data.gov.uk.
This provides a HUGE amount of data on every stop (bus, train, tram, ferry, airport etc.) in the country, each of which is given a special ATCO code. By stripping out the other forms of transport we don’t need, we can match up each Train Station to its ATCO code (RLY code) based on the Easting/Northing information, which can then be used in the provided stopping data to plot journeys.
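One simple way to do that matching is to pick, for each station, the nearest rail stop within some cut-off distance. The sketch below assumes the rail stops have already been filtered out of the full data set; the 500 m cut-off, the function name and the identifiers are all made up for illustration rather than taken from the real data.

```python
import math


def match_nearest(stations, stops, max_distance_m=500):
    """Pair each station code with the closest stop code.

    stations and stops both map a code to (easting, northing).
    Stations with no stop within max_distance_m are left unmatched.
    The 500 m default is an arbitrary assumption.
    """
    matches = {}
    for station_code, (se, sn) in stations.items():
        best_code, best_dist = None, max_distance_m
        for stop_code, (e, n) in stops.items():
            d = math.hypot(e - se, n - sn)
            if d <= best_dist:
                best_code, best_dist = stop_code, d
        if best_code is not None:
            matches[station_code] = best_code
    return matches


# Hypothetical codes and coordinates.
stations = {"AAA": (1000, 1000), "BBB": (50000, 50000)}
stops = {"stop-1": (1100, 1000), "stop-2": (5000, 5000)}
print(match_nearest(stations, stops))
```

This is a brute-force comparison of every station against every stop, which is fine for a couple of thousand stations but would want a spatial index for anything bigger.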
Below is the map with the ATCO routing information overlaid (showing direct routes between stations). It looks a bit rough and some of the journeys don’t seem right at the moment, but I will look into this further.
So there we have part 1. I have (nearly) everything in place to start searching for possible routes. Join me in part 2, where I will connect it to one of the many ticket information & timetable APIs to drive the A* search algorithm.
Well, I tinkered away and fixed most of the bizarre connections (due mainly to invalid coordinate data). I added the A* search algorithm, and using a heuristic of straight-line distance to the destination plus the actual distance travelled so far, we get the route of shortest distance across the rail network. I chose CPM and YRK as the endpoints.
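For anyone curious, here is a minimal sketch of A* over a station graph. It is not my exact code: it assumes the cost of each direct connection is the straight-line distance between the two stations, and the graph, names and coordinates are a toy example.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Return (path, distance) for the shortest route, or (None, inf).

    graph maps a station to the stations it connects to directly.
    coords maps a station to (easting, northing) in metres.
    Edge cost is assumed to be straight-line distance between stations.
    """
    def dist(a, b):
        (e1, n1), (e2, n2) = coords[a], coords[b]
        return math.hypot(e2 - e1, n2 - n1)

    # Heap entries: (estimated total, distance so far, station, path so far)
    open_heap = [(dist(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        _, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if g > best_g.get(node, math.inf):
            continue  # a shorter way to this station was already found
        for nxt in graph.get(node, ()):
            new_g = g + dist(node, nxt)
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                estimate = new_g + dist(nxt, goal)
                heapq.heappush(open_heap, (estimate, new_g, nxt, path + [nxt]))
    return None, math.inf


# Toy network with made-up stations and coordinates.
coords = {"A": (0, 0), "B": (3000, 4000), "C": (6000, 0), "D": (3000, 0)}
graph = {"A": ["B", "D"], "B": ["C"], "D": ["C"]}
print(a_star(graph, coords, "A", "C"))
```

Because the straight-line distance never overestimates the remaining distance along the network, the heuristic is admissible and the route found is the shortest under these assumptions.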