• How to design a Split Ticketing website algorithm

    by  • May 15, 2012 • .Net, Articles, Programming • 2 Comments

    I recently got introduced to the concept of Split Ticketing, where you split up your train journey into multiple segments to reduce the overall cost.

    This may seem strange, but it is in part due to reducing the flexibility of your route, and by doing this you can reduce the cost.

    This got me thinking, how would you design such a system; there are some currently available, such as Money Saving Experts demo or Split Your Ticket.

    Getting from A to B

    In order to work out the cheapest route, you need to know the route to begin with. This area of Computer Science is pretty much covered in terms of the basics (although research continues in the areas of optimising large data sets). The front runners are A* and Dijkstra’s algorithm. A* seems the best for us as the dataset (train stations) can be put in a logical order where a heuristic can be easily designed.

    A* works by attempting to plot a route between 2 points via intermediate points by calculating a weighting between each point, A* excels because it follows the path of least cost (by utilising a process queue).

    The points will be the train stations that can be visited, next we need to work out the weighting heuristic. Things get a bit more interesting though as you can get trains that traverse multiple stations in ‘one leap’ rather than buying tickets per station as well. However this gets handled by A* as the weight would be automatically adjusted and computed.

    In real life…

    According to the ‘Office of Rail Regulation’ there are currently 2352 stations in the UK, and they conveniently provide a spreadsheet with their names and postcodes. By using Ordance Surveys OpenData project I can for free convert these postcodes into grid references that I can use to easily plot locations and calculate distances.

    How much will it cost?

    There is not just the factor of cost that needs (or can) be factored in, overall there are several factors to consider:

    • Cost of ticket between two stations
    • Journey time between stations
    • Wait time at each station connection

    The first two are fixed costs however the 3rd is dependent on the connecting train, therefore it’s weight would be placed on the n+1 segment of that particular route.

    For the purpose and entire concept of split ticketing, the cost of the ticket is the major factor. I feel the weight time then becomes the 2nd most important (no one wants a 8hour wait overnight because the tickets are cheap)


    • 60% – Cost of ticket (pounds) between two stations (C) – Lower is better
    • 30% – Wait time (minutes) at each station (W) – Lower is better
    • 10% – Journey time (minutes) (T) – Lower is better

    Hn = 0.6 * Cn * W1 + 0.3 * Wn-1 * W2 + 0.1 * Tn * W3

    W1, W2 and W3 would be additional weighting which would need to be determined when testing to accommodate differentials between the costings of each.
    Creating the queue of next stations two process from a particular point would be done via a birds eye view distance between the two stations and a weighting of how many intermediate stops are needed to get there. – the simple Heuristic.
    Hn = Sn  * WDn * W2 
    The more complex heuristic is done when required – to save computational workload.
    The simple heuristic would favour the final station or stations close to the final station with minimal intermediate stations on route, it would then work back to the origin station.
    As this simple heuristic targets the final(f) station first, we can work out the ‘maximum’ standard fare for the route – the cost (C) most people would face going into the station.

    C = Hf 

    When performing the complex heuristic on each item in the queue, we only add the result if the computed heuristic is less than C, this means the computation will remain reasonably time/space efficient and hopefully won’t take silly routes across the country.

    Hn < C 

    The nice thing about all this, is that both of these heuristics (and journey plots) can be cached when they are performed. By adding a ‘time to live’, the values can be refreshed periodically. Other information which would be cached is the cost and journey times/lengths when travelling at different times of the day.

    Putting it all together

    The cost of train fairs between stations can be worked out using the open source librailfare API which most train operators use. The journey times, and availability of trains (to work out the wait times at the station) would need to be gathered from the train operators.

    All this could be knocked up using ASP.Net, but Azure would seem like a great oppurtunity (parrallel processing anyone!)


    Software engineer. Tea drinker


    2 Responses to How to design a Split Ticketing website algorithm

    1. Alan Hart
      January 29, 2013 at 1:21 pm

      Did you ever think any more about this? I was interested in doing something similar for the broader problem of international business travel a few years back. What I had in mind then was optimising the journey based on “where specifically you need to be and when you need to be there”, and human concepts like “I need to know whether I can get home the same day”, rather than the present online booking interfaces which still seem to be focused on expecting you to enter airport names and times.

      I think this problem seems very useful and much more tractable, although I am not sure how you would integrate advance tickets, as availability of these does not seem to be available programatically.

    Leave a Reply

    Your email address will not be published. Required fields are marked *