This is an example analysis of some bus data from Kingston's Open Data catalogue using Python.

In [1]:
# make plots appear inline in the notebook and load key libraries
import re
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

%matplotlib inline
In [2]:
# check out some available datasets
!ls /home/datasets/transit
agency.txt	    feed_info.txt  routes.txt  stops.txt       trips.txt
calendar_dates.txt  real-time	   shapes.txt  stop_times.txt
In [3]:
# load a dataset of all bus trips and take a look
trips = pd.read_csv('/home/datasets/transit/trips.txt')
trips.head()
Out[3]:
   route_id  service_id  trip_id            trip_headsign                                  shape_id  block_id
0        12         189  866469:38369:5853  Highway 15                                          339      5853
1       501         188  864006:38371:4705  Express - Cataraqui Centre via Front/Bayridge       290      4705
2         2         223  859920:38372:5345  Division Street                                     345      5345
3         3         224  860985:38373:5365  Downtown via Queen Mary Rd                          367      5365
4        10         192  864736:38374:5227  Cataraqui Centre                                    267      5227
In [4]:
# get the number of trips for each individual bus route
num_trips = trips.groupby('route_id').count()
num_trips = num_trips.sort_values('trip_id', ascending=False)
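
To make the counting step concrete, here is a minimal sketch on a small made-up table (the rows in `toy_trips` are hypothetical, not taken from the Kingston data). It shows that `groupby(...).count()` tallies non-null values in each remaining column, while `groupby(...).size()` counts rows per group directly; when `trip_id` has no missing values the two agree.

```python
import pandas as pd

# Toy stand-in for trips.txt; these rows are made up for illustration.
toy_trips = pd.DataFrame({
    'route_id': [501, 501, 2, 501, 2, 10],
    'trip_id': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# count() tallies the non-null values in every remaining column
by_count = (toy_trips.groupby('route_id').count()['trip_id']
            .sort_values(ascending=False))

# size() counts rows per group, whether or not any values are missing
by_size = (toy_trips.groupby('route_id').size()
           .sort_values(ascending=False))

print(by_size)
```

Either form would feed the bar plot; `size()` simply avoids depending on one particular column being fully populated.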

Now let's make a bar plot of the number of trips on each route, sorted from most to fewest. It looks like the 501/502 are the most frequent routes, with most other routes having about the same amount of service:

In [5]:
num_trips['trip_id'].plot(kind='bar')
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fdcef693940>
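
The observation that most routes have about the same amount of service can also be checked numerically rather than read off the chart. The sketch below is one possible approach, using hypothetical counts in `toy_counts` as a stand-in for `num_trips['trip_id']`: it tabulates how many routes share each trip count and picks out the routes above the median.

```python
import pandas as pd

# Hypothetical trips-per-route counts, standing in for num_trips['trip_id'].
toy_counts = pd.Series({501: 120, 502: 120, 1: 60, 2: 60, 3: 60, 10: 45},
                       name='trip_id')

# How many routes share each trip count, most common count first
routes_per_count = toy_counts.value_counts()

# Routes whose trip count is above the median stand out from the rest
busiest = toy_counts[toy_counts > toy_counts.median()]

print(routes_per_count)
print(busiest)
```

On the real data, the same two lines applied to `num_trips['trip_id']` would show whether a single trip count dominates and which routes sit above it.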