Status: Delayed

A data-driven web application.

Project overview.

I was tired of seeing the ominous ‘DELAYED’ symbol next to so many of my flights, so this small project aims to find the causes of airport delays and display the results in a visually pleasing, data-driven webpage. The FAA dataset covers 29 major domestic airports, which I explore through the two visualizations detailed below. The project is written in HTML, JavaScript, and CSS. I relied primarily on D3.js, a JavaScript library geared toward dynamic, interactive data visualizations in web browsers.

Before beginning the project, I hypothesized that the larger an airport’s ‘size’, the more delays would occur.

 

The qualitative approach.

The first visualization, a map, takes a qualitative approach and aims to do two things. First, it plots the 29 airports at their latitude and longitude as circles. Second, it encodes each airport’s size and delay rate in the circle’s radius and fill. The radius of each circle is calculated using a log scale of the size* of the airport. The fill color of the circle is calculated using a linear gradient of the percentage of air traffic delayed: blue symbolizes a lower share of delayed flights, while red symbolizes a higher share. I labeled a few notable airports on the map for user reference.
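
The two encodings described above can be sketched without any library. This is only an illustration of the arithmetic, not the project’s actual code: in D3 the same mappings would typically come from `d3.scaleLog` and `d3.scaleLinear`, and every constant and function name below is a hypothetical stand-in.

```javascript
// Assumed bounds for illustration; the real values would come from the dataset.
const MIN_FLIGHTS = 100000;   // hypothetical smallest airport "size"
const MAX_FLIGHTS = 10000000; // hypothetical largest airport "size"
const MIN_RADIUS = 4;         // hypothetical smallest circle radius, in pixels
const MAX_RADIUS = 24;        // hypothetical largest circle radius, in pixels

// Log scale: equal ratios of flight counts map to equal steps in radius,
// so the busiest airports do not dwarf every other circle on the map.
function radiusFor(flights) {
  const t = (Math.log(flights) - Math.log(MIN_FLIGHTS)) /
            (Math.log(MAX_FLIGHTS) - Math.log(MIN_FLIGHTS));
  return MIN_RADIUS + t * (MAX_RADIUS - MIN_RADIUS);
}

// Linear gradient: blue at the lowest delay percentage, red at the highest.
function colorFor(pctDelayed, minPct, maxPct) {
  const raw = (pctDelayed - minPct) / (maxPct - minPct);
  const t = Math.min(1, Math.max(0, raw)); // clamp to [0, 1]
  const red = Math.round(255 * t);
  const blue = Math.round(255 * (1 - t));
  return `rgb(${red}, 0, ${blue})`;
}
```

With these assumed bounds, an airport ten times busier than the smallest one lands halfway up the radius range, which is the point of using a log scale for sizes that span orders of magnitude.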

This qualitative, geographic approach did not give me substantial proof of what causes delays. However, it did show me that the major coastal airport hubs (SFO, JFK, EWR, LGA) see significant delays, while airports in the interior of the US did not have nearly as many. Maybe it has to do with metro hubs? Let’s continue with a more quantitative approach.

*Size is defined as the total number of departing and arriving flights. There is only a very loose correlation between an airport’s area and the number of flights it serves.

 

The quantitative approach.

Since the qualitative map didn’t produce clear results, let’s get numerical. The second visualization consists of two bar graphs. The one on the left shows the breakdown by cause of delay (National Aviation System, Security, Weather, Late Aircraft, Carrier) for each year of the dataset. As you can see, the breakdown stayed roughly consistent throughout the years, with delays peaking in 2007 and bottoming out in 2012. No single cause of delay has been increasing over the years…hmmm.

Maybe my perception of the number of delays is biased? Maybe the airports I fly out of (primarily PHL, ORD, and SFO) are prone to delays? The second bar graph, on the right, shows each airport in the dataset and displays the total time for each cause of delay (in days) over the timeframe of the data. From this visualization, you can see that Chicago (ORD) and Atlanta (ATL) had by far the most delays. This is most likely because they serve the largest number of connecting flights: ORD serves East-West travel, and ATL serves the major East Coast North-South travel.
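
Both bar graphs boil down to the same aggregation: sum delay time per cause, grouped either by year or by airport. Here is a minimal sketch of that step in plain JavaScript. It assumes each record carries a delay in minutes, and the field names (`airport`, `year`, `cause`, `minutes`) and sample rows are made up for illustration rather than taken from the FAA data.

```javascript
const MINUTES_PER_DAY = 60 * 24;

// Sum delay minutes per group (e.g. "year" or "airport") and per cause.
// Returns an object shaped like { ORD: { Weather: 4320, ... }, ... }.
function totalDelayByCause(records, groupKey) {
  const totals = {};
  for (const rec of records) {
    const group = rec[groupKey];
    if (!totals[group]) totals[group] = {};
    totals[group][rec.cause] = (totals[group][rec.cause] || 0) + rec.minutes;
  }
  return totals;
}

// The right-hand graph reports delay time in days rather than minutes.
function minutesToDays(minutes) {
  return minutes / MINUTES_PER_DAY;
}

// Made-up sample rows, only to show the shape of the input.
const sampleRecords = [
  { airport: "ORD", year: 2007, cause: "Weather", minutes: 1440 },
  { airport: "ORD", year: 2008, cause: "Weather", minutes: 2880 },
  { airport: "ATL", year: 2007, cause: "Carrier", minutes: 720 },
];

const delayByAirport = totalDelayByCause(sampleRecords, "airport");
const delayByYear = totalDelayByCause(sampleRecords, "year");
```

Grouping by `"year"` feeds the left graph and grouping by `"airport"` feeds the right one, so a single helper covers both views.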

 

So what's causing the delays?

Now, looking at the results as a whole, could I arrive at a solid conclusion about the primary cause of delays? Well, not really...but I could still extract some useful information. For example, the qualitative approach showed me that the coastal metro hubs produced many more delays, while the quantitative approach showed me that no single cause of delay has been a major factor, yet airports with more connecting flights create more delays.

Still, despite the lack of practical results, I learned an extensive amount about building data-driven, visually pleasing, and stable web pages. I had nearly no experience with the D3.js library before this project, and I think it still turned out quite well. I would definitely suggest using this library if you need to build something similar.

 


Codebase can be found here.