This is an example analysis of some bus data from Kingston's Open Data catalogue using Python.
# make plots appear inline in the notebook and load key libraries
import re
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
%matplotlib inline
# check out some available datasets
!ls /home/datasets/transit
# load a dataset of all bus trips and take a look
trips = pd.read_csv('/home/datasets/transit/trips.txt')
trips.head()
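# get the number of trips for each individual bus route
# (count() tallies non-null values in every column, so the 'trip_id' column holds the trip count per route)
num_trips = trips.groupby('route_id').count()
# sort routes from most to fewest trips
num_trips = num_trips.sort_values('trip_id', ascending=False)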
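To see what the grouping step does without the full dataset, here is a minimal sketch on a made-up table. The `toy_trips` frame and its values are hypothetical; only the `route_id` and `trip_id` column names are taken from the code above. It uses `size()` rather than `count()`, which returns a single series of row counts per group instead of one count per column:

```python
import pandas as pd

# Hypothetical stand-in for trips.txt (values are invented for illustration)
toy_trips = pd.DataFrame({
    'route_id': ['501', '501', '502', '1', '501', '502'],
    'trip_id': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# size() counts rows per route, regardless of missing values in other columns
toy_counts = toy_trips.groupby('route_id').size().sort_values(ascending=False)
print(toy_counts)
```

If the real `trip_id` column has no missing values, `size()` and `count()['trip_id']` should give the same numbers; `count()` is kept above because the later cells index the result by `'trip_id'`.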
Now let's make a bar plot of the number of trips on each bus route, ordered from most to least frequent. It looks like the 501/502 is the most frequent bus, with most other buses having the same amount of service:
num_trips['trip_id'].plot(kind='bar')