Using Power BI to Analyse 1.9 Million Rows of Call A Bike Data

Dan Gray

2019/02/19

Summary

The personal-transportation revolution is well underway - the last decade has seen the massive expansion of “hire-to-ride” schemes for cars, bicycles, e-bikes and even scooters.

The Deutsche Bahn has its own car, car-sharing, and bicycle rental scheme. Here I use the powerful model-building, and visualization capabilities of Microsoft’s Power BI to load, clean and build a visual dashboard of bicycle trips in the city of Hamburg over the period summer 2016.

Implementation

The data for the brief analysis is freely accessible under the Deutsche Bahn Connect website.

The data is stored using a fact and dimension table schema - containing tariffs, station, booking and vehicle data. I downloaded the 4GB dataset for bookings over the years 2014-2017, as well as the corresponding vehicle table.

Pulling the raw data into Power BI Desktop was simple using the csv data-import portal. In the Query Editor section, I adjusted columns, used the Time/Date functions to extract useful derived columns such as Day of Week, Week No, Month Name etc. from the booking start and end columns. I also filtered down on location to just Hamburg, and restricted the time interval to cover the period April-September 2016.

The vehicle (dimension) table holds the key(s) to match vehicle type, manufacturer, company to the individual bookings. I wanted to only look at bicycle trips so I needed to merge this query with my other, using a left outer join - binding all items from the first table, to those the match from the second.

Building a Dashboard

Power BI Desktop’s real strength to the casual user is the very intuitive ‘drag-and-drop’ interface for building up numerous tables. In addition to the standard palette of bar, column, scatter, mapping, matrix/table visualizations the user gets cross-filtering, slicers and “drill-down” capabilities on the fly.

To illustrate these possibilities I recorded a short screen-cast (2min) of the charts and minor customization I made to build a simple interactive dashboard highlighting peak hiring times, stations and use patterns.