A data-driven web application.
Before beginning the project, I hypothesized that the larger an airport’s ‘size’, the more delays would occur.
The qualitative approach.
The first visualization, a map, takes a qualitative approach and aims to do two things: show where the airports are, and show how their size and delays compare. It plots the 29 airports at their latitude and longitude as circles with varying radii and fills. The radius of each circle is calculated using a log scale of the size* of the airport. The fill color of each circle is calculated using a linear gradient of the percentage of air traffic delayed: blue symbolizes a lower share of delayed flights, while red symbolizes a higher share. I labeled a few notable airports on the map for user reference.
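The two encodings described above can be sketched in a few lines. This is only a rough illustration, not the code behind the map: the function names, the pixel range for the radius, and the pure blue-to-red interpolation are all assumptions made for the example.

```python
import math


def radius_for(size, min_size, max_size, r_min=4.0, r_max=30.0):
    """Map an airport's size (total flights) to a circle radius on a log scale.

    r_min and r_max are hypothetical pixel values chosen for illustration.
    """
    span = math.log(max_size) - math.log(min_size)
    t = (math.log(size) - math.log(min_size)) / span
    return r_min + t * (r_max - r_min)


def delay_color(pct_delayed, lowest, highest):
    """Linearly blend from blue (fewest delays) to red (most delays).

    Returns a hex color string. The endpoints are assumed to be pure blue
    and pure red; the real map may use different shades.
    """
    t = (pct_delayed - lowest) / (highest - lowest)
    t = max(0.0, min(1.0, t))  # clamp values outside the expected range
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"
```

With a log scale, an airport with ten times the flights gets a fixed increase in radius rather than a circle ten times as wide, which keeps the largest hubs from swamping the map.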
This qualitative, geographic approach did not give me substantial proof of what causes delays. However, it did show me that the major coastal airport hubs (SFO, JFK, EWR, LGA) see significant delays, while the interior of the US does not have nearly as many. Maybe it has to do with metro hubs? Let’s continue with a more quantitative approach.
*Size is defined as the total number of departing and arriving flights. There is only a very loose correlation between an airport’s physical area and the number of flights it services.
The quantitative approach.
Since the qualitative map didn’t produce clear results, let’s get numerical. The second visualization consists of two bar graphs. The one on the left shows the breakdown by cause of delay (i.e., National Aviation System, Security, Weather, Late Aircraft, Carrier) for each year of the dataset. As you can see, the breakdown stayed roughly consistent throughout the years, with delays peaking in 2007 and bottoming out in 2012. No single cause of delays has been increasing over the years…hmmm.
Maybe my perception of the number of delays is biased? Maybe the airports I fly out of (primarily PHL, ORD, and SFO) are prone to delays? The second bar graph, on the right, shows each airport in the dataset and displays the total time for each cause of delay (in days) over the timeframe of the data. From this visualization, you can see that Chicago (ORD) and Atlanta (ATL) had the vast majority of delays. This is most likely because they service the largest number of connecting flights: ORD serves East-West travel, and ATL serves the major East Coast North-South travel.
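The per-airport totals in the right-hand graph come down to a simple group-and-sum. Here is a minimal sketch of that aggregation, assuming the raw records carry delay time in minutes (the write-up only states that the chart is in days); the record layout, the function name, and the sample values are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def delay_days_by_airport(records):
    """Sum delay time per airport and cause, converted from minutes to days.

    `records` is an iterable of (airport, cause, delay_minutes) tuples,
    a hypothetical layout chosen for this example.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for airport, cause, minutes in records:
        totals[airport][cause] += minutes / MINUTES_PER_DAY
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {airport: dict(causes) for airport, causes in totals.items()}


# Made-up sample rows, not real figures from the dataset.
sample = [
    ("ORD", "Weather", 1440),
    ("ORD", "Weather", 720),
    ("ORD", "Carrier", 2880),
    ("ATL", "Late Aircraft", 1440),
]
result = delay_days_by_airport(sample)
```

Each airport then maps to one stacked bar, with one segment per cause.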
So what's causing the delays?
Now, looking at the results as a whole, could I arrive at a solid conclusion that shows the primary cause of delays? Well, not really...but I could still extract some useful information. For example, the qualitative approach showed me that the coastal metro hubs produced many more delays, while the quantitative approach showed me that no single cause of delays has been a major factor, yet airports with more connecting flights create more delays.