US Airport Network Exploration and Visualization Using Networkx and Basemap

Xinqian Zhai
6 min read · Jan 27, 2022

Photo by Erik Odiin on Unsplash

Today, I’ll use the Networkx and Basemap libraries to do some simple network exploration and visualization of the airports and routes in the United States. By the end, we will see how the airport network is distributed and have a better understanding of which airports have the most influence on the whole network.

Framework

  • Download data
  • Manipulate data
  • Make an airport graph using Networkx
  • Overlay airport network on a map using Basemap
  • Show the airport network map

This is the final network map of flight routes in the US mainland.

the US mainland airport network map

0. Import useful libraries

First, let’s import all the libraries we’ll be using.
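The original import cell is not reproduced in this text. A plausible minimal set, assuming the standard import aliases and that Basemap is installed separately as the `basemap` package, might look like this:

```python
import pandas as pd               # load and reshape the airport and route tables
import networkx as nx             # build and analyze the airport graph
import matplotlib.pyplot as plt   # figures and axes for the plots

# Basemap is a separate install and is only needed for the map overlay
# in step 4, so this sketch tolerates a missing installation.
try:
    from mpl_toolkits.basemap import Basemap
except ImportError:
    Basemap = None
```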

1. Download datasets

Next, let’s download the airport data and the route data from OpenFlights. It provides worldwide datasets on airports, airlines, routes, planes, train stations, and even ferry terminals. There are over 10,000 records in the airports database and 67,664 route records in the routes database. Unfortunately, the OpenFlights data is not very up to date, but the current datasets are good enough for practice.

We will use the airports.dat dataset and the routes.dat dataset. Here is the code to download and clean the two datasets.
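As a rough sketch of this step (not the original code): the download URL, the column names, and the helper names `load_openflights`, `clean_airports`, and `clean_routes` are all assumptions made for illustration. The sample rows only mimic the OpenFlights layout, so the snippet runs without a network connection.

```python
import io
import pandas as pd

# Assumed location of the OpenFlights files; it is not given in this post.
BASE_URL = "https://raw.githubusercontent.com/jpatokal/openflights/master/data/"

# The .dat files have no header row, so column names are supplied here.
AIRPORT_COLS = ["airport_id", "name", "city", "country", "iata", "icao",
                "latitude", "longitude", "altitude", "timezone", "dst",
                "tz_database", "type", "source"]
ROUTE_COLS = ["airline", "airline_id", "source_airport", "source_airport_id",
              "dest_airport", "dest_airport_id", "codeshare", "stops", "equipment"]


def load_openflights(source, columns):
    """Read one OpenFlights .dat file from a URL, path, or file-like object."""
    # OpenFlights writes missing values as \N, so map that token to NaN.
    return pd.read_csv(source, header=None, names=columns, na_values="\\N")


def clean_airports(df):
    """Keep the columns used later and drop airports without an IATA code."""
    keep = ["name", "city", "country", "iata", "latitude", "longitude"]
    return df[keep].dropna(subset=["iata"]).reset_index(drop=True)


def clean_routes(df):
    """Keep only the source and destination airport codes of each route."""
    keep = ["source_airport", "dest_airport"]
    return df[keep].dropna().reset_index(drop=True)


# Illustrative rows in the OpenFlights layout (the last airport has no IATA code).
SAMPLE_AIRPORTS = r"""3682,"Hartsfield Jackson Atlanta International Airport","Atlanta","United States","ATL","KATL",33.6367,-84.4281,1026,-5,"A","America/New_York","airport","OurAirports"
3830,"Chicago O'Hare International Airport","Chicago","United States","ORD","KORD",41.9786,-87.9048,672,-6,"A","America/Chicago","airport","OurAirports"
9999,"Example Heliport","Example City","United States",\N,\N,40.0,-100.0,10,-6,"A","America/Chicago","airport","OurAirports"
"""
SAMPLE_ROUTES = r"""AA,24,ATL,3682,ORD,3830,\N,0,738
UA,5209,ORD,3830,ATL,3682,\N,0,320
"""

airports_df = clean_airports(load_openflights(io.StringIO(SAMPLE_AIRPORTS), AIRPORT_COLS))
routes_df = clean_routes(load_openflights(io.StringIO(SAMPLE_ROUTES), ROUTE_COLS))
print(airports_df)
print(routes_df)

# With network access, the same helpers could read the real files:
# airports_df = clean_airports(load_openflights(BASE_URL + "airports.dat", AIRPORT_COLS))
# routes_df = clean_routes(load_openflights(BASE_URL + "routes.dat", ROUTE_COLS))
```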

Below is a small portion of each of the two data frames.

first 3 rows of airports.dat dataset
first 5 rows of routes.dat dataset

2. Manipulate data

Once the data is downloaded, we will use these two data frames to generate four useful data frames.

  • US airport data frame. Narrow down the global airport data to only the US airport data, and separate the US airport data into two scenarios: a) US mainland airport data (excluding Alaska and Puerto Rico), b) all the US airport data (including Alaska and Puerto Rico).
  • US route data frame. Generate the US route data frame to contain only the routes in the US airport data frame above.
  • US flight count data frame. Use the US route data frame to calculate the total number of flights (routes) for each US airport and organize the data in this data frame.
  • US combined data frame. Combine US airport data with US flight count data to generate a dense US airport data frame.
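The four steps above can be sketched on a toy dataset as follows. The column names, the country labels used to pick US airports, and the latitude/longitude bounding box used to approximate the mainland are assumptions for illustration, not necessarily what the full code does.

```python
import pandas as pd

# Tiny illustrative inputs; the real data frames come from OpenFlights.
airports_df = pd.DataFrame({
    "iata": ["ATL", "ORD", "ANC", "SJU", "YYZ"],
    "country": ["United States", "United States", "United States",
                "Puerto Rico", "Canada"],
    "latitude": [33.64, 41.98, 61.17, 18.44, 43.68],
    "longitude": [-84.43, -87.90, -149.99, -66.00, -79.63],
})
routes_df = pd.DataFrame({
    "source_airport": ["ATL", "ORD", "ATL", "ANC", "ATL", "YYZ"],
    "dest_airport":   ["ORD", "ATL", "ANC", "ATL", "YYZ", "ORD"],
})

# 1) US airports in two scenarios: all airports, and mainland only.
us_all_df = airports_df[airports_df["country"].isin(["United States", "Puerto Rico"])]
# Assumed bounding box for the mainland (excludes Alaska and Puerto Rico).
in_box = (us_all_df["latitude"].between(24, 50)
          & us_all_df["longitude"].between(-125, -66.5))
us_mainland_df = us_all_df[(us_all_df["country"] == "United States") & in_box]

# 2) Routes whose source and destination are both US airports.
us_codes = set(us_all_df["iata"])
us_route_df = routes_df[routes_df["source_airport"].isin(us_codes)
                        & routes_df["dest_airport"].isin(us_codes)]

# 3) Route count per airport: departures plus arrivals.
counts = pd.concat([us_route_df["source_airport"],
                    us_route_df["dest_airport"]]).value_counts()
us_count_df = counts.rename_axis("iata").reset_index(name="route_count")

# 4) Airport attributes joined with route counts (0 for unconnected airports).
us_combined_df = (us_all_df.merge(us_count_df, on="iata", how="left")
                  .fillna({"route_count": 0})
                  .astype({"route_count": int}))
print(us_combined_df)
```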

For now, let’s take a peek at the four data frames. You can find the full code in my GitHub here.

3. Make airport graph using networkx

After preparing all the needed data, we are ready to make a network graph using networkx. We will use the us_route_df data frame and from_pandas_edgelist() function to generate a directed graph for the US airport network.

In network terms, each airport is a node and each flight route is an edge. Since flying from airport A to airport B is not the same route as flying from airport B to airport A, we will use a directed network to represent the airport network. That is, each node (airport) can be both a source node (where a route starts) and a target node (where a route ends).
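A minimal sketch of building such a directed graph with from_pandas_edgelist(), using a toy stand-in for us_route_df (the column names `source_airport` and `dest_airport` are assumptions):

```python
import networkx as nx
import pandas as pd

# Toy stand-in for us_route_df: one row per directed route.
us_route_df = pd.DataFrame({
    "source_airport": ["ATL", "ORD", "ATL", "DEN"],
    "dest_airport":   ["ORD", "ATL", "DEN", "ORD"],
})

# create_using=nx.DiGraph() keeps the direction of each route.
G = nx.from_pandas_edgelist(us_route_df, source="source_airport",
                            target="dest_airport", create_using=nx.DiGraph())

print(G.number_of_nodes(), G.number_of_edges())             # 3 4
print(G.has_edge("ATL", "DEN"), G.has_edge("DEN", "ATL"))   # True False
```

ATL to DEN exists as an edge but DEN to ATL does not, which is exactly the distinction an undirected graph would lose.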

Below are the US airport network graphs. The left one is a directed graph showing all the US airports and routes (including Alaska and Puerto Rico), and the right one is a directed graph showing only the US mainland airports and routes (excluding Alaska and Puerto Rico). One thing to note is that the shape of the graph does not matter. Each time you rerun the plotting code, the connections (edges) between nodes (airports) do not change, but the node positions and the overall shape of the graph can.
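To illustrate why the shape can change: assuming the plots rely on the spring layout (the default that nx.draw() falls back to when no positions are given), node positions start from random coordinates. Fixing the `seed` argument is one way to make the picture reproducible, while the edge set never depends on the layout.

```python
import networkx as nx
import numpy as np

G = nx.DiGraph([("ATL", "ORD"), ("ORD", "ATL"), ("ATL", "DEN"), ("DEN", "ORD")])
edges_before = set(G.edges())

# The same seed reproduces the same coordinates.
pos_a = nx.spring_layout(G, seed=1)
pos_a_again = nx.spring_layout(G, seed=1)
# A different seed generally gives different coordinates.
pos_b = nx.spring_layout(G, seed=2)

reproducible = all(np.allclose(pos_a[n], pos_a_again[n]) for n in G)
print("same seed, same positions:", reproducible)
print("edges unchanged by layout:", set(G.edges()) == edges_before)
```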

directed graphs (left: all the US airports and routes; right: US mainland airports and routes)

4. Overlay airport network on a map using Basemap

The logic for making the airport network map is as follows:

  • First, set up a map using Basemap. Since we have two scenarios here, we need to set up two base maps: one focused on the US mainland, and the other expanded to include Alaska and Puerto Rico.
  • Second, project the latitude and longitude onto the base map so that we can draw the nodes (airports) and edges (routes) that match the correct locations on the map.
  • Third, overlay the airport network graph on the map. Here we will draw the airport network nodes, edges, and labels on the map. To be more informative and visually appealing, we will segment airports into different levels based on the routes they have and show them on the final visualization.
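The three steps above might be sketched as follows for the mainland scenario. The map bounds, the route-count thresholds, the colors, and the helper names `route_level` and `draw_airport_map` are illustrative assumptions, not the values used in the full code. Basemap is imported inside the drawing function so the styling helper can be used even where Basemap is not installed.

```python
import networkx as nx


def route_level(route_count):
    """Map a route count to (color, node size); thresholds are illustrative."""
    if route_count >= 150:
        return "darkorange", 300
    if route_count >= 50:
        return "orange", 120
    return "lightblue", 30


def draw_airport_map(G, coords, route_counts, ax=None):
    """Draw graph G over a US-mainland base map.

    coords: {iata: (longitude, latitude)}, route_counts: {iata: int}.
    """
    from mpl_toolkits.basemap import Basemap  # requires the basemap package

    # Step 1: base map; these corner coordinates roughly frame the mainland.
    m = Basemap(projection="merc", llcrnrlon=-130, llcrnrlat=22,
                urcrnrlon=-62, urcrnrlat=52, resolution="l", ax=ax)
    m.drawcoastlines(linewidth=0.5)
    m.drawcountries(linewidth=0.5)
    m.drawstates(linewidth=0.2)

    # Step 2: project longitude/latitude into map coordinates per airport.
    pos = {code: m(lon, lat) for code, (lon, lat) in coords.items() if code in G}
    H = G.subgraph(pos)  # draw only airports that have coordinates
    nodes = list(H.nodes())

    # Step 3: overlay edges, nodes styled by level, and labels for big hubs.
    styles = [route_level(route_counts.get(n, 0)) for n in nodes]
    nx.draw_networkx_edges(H, pos, alpha=0.2, arrows=False, ax=ax)
    nx.draw_networkx_nodes(H, pos, nodelist=nodes,
                           node_color=[color for color, _ in styles],
                           node_size=[size for _, size in styles], ax=ax)
    hubs = {n: n for n in nodes if route_counts.get(n, 0) >= 150}
    nx.draw_networkx_labels(H, pos, labels=hubs, font_size=8, ax=ax)
    return m
```

For the second scenario, the same function could be reused with wider corner coordinates so that Alaska and Puerto Rico fall inside the map.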

You can find the full code in my GitHub here.

5. Show the airport network map and report

Finally, we’re here to show the network map of airports in the US!

The network diagram shows directly that ATL, ORD, DEN, DFW, and MSP are the five largest airports (dark orange nodes), with the most routes in the US mainland. In addition, more airports and local routes are concentrated on the East Coast, and DEN seems to be an important link connecting the East Coast and the West Coast.

Evaluated with different centrality metrics (degree centrality, betweenness centrality, and closeness centrality), the same top 5 airports also appear in the top lists for all three metrics. This means that these airports not only have the most flight routes, but also act as “influencers”, “bridges”, and “broadcasters” across the whole airport network. If these airports stopped functioning properly, passengers would need more transit points to get to their destinations, or might not be able to get there at all unless they took other transportation.
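To make the three metrics concrete, here is a small sketch on a toy hub-and-spoke network (not the real airport data), where every route runs through ATL. The helper `top_n` is introduced only for this illustration.

```python
import networkx as nx

# Toy network: four small airports, each connected to ATL in both directions.
spokes = ["BHM", "CHS", "SAV", "TYS"]
G = nx.DiGraph()
for code in spokes:
    G.add_edge("ATL", code)
    G.add_edge(code, "ATL")

metrics = {
    "degree": nx.degree_centrality(G),            # share of direct connections
    "betweenness": nx.betweenness_centrality(G),  # how often a node sits on shortest paths
    "closeness": nx.closeness_centrality(G),      # how near a node is to all others
}


def top_n(scores, n=3):
    """Return the n airports with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


for name, scores in metrics.items():
    print(name, top_n(scores, n=1))

# Removing the hub disconnects the spokes from each other.
H = G.copy()
H.remove_node("ATL")
print(nx.has_path(H, "BHM", "SAV"))
```

In this toy case ATL ranks first on all three metrics, and once it is removed no spoke can reach another, which mirrors the point about hub airports failing.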

the US mainland airport network map

This is the network map of all flight routes in the US, including Alaska and Puerto Rico. As we can see, all airports in Alaska and Puerto Rico are relatively small. There are more routes to Puerto Rico than to Alaska, and the majority of the airports directly connected to it are on the West Coast, which means these airports can serve as transfer points for other airports to reach Puerto Rico or Alaska.

all the US airport network map

There are more interesting and deeper topics in network analysis, such as community discovery and link prediction using the networkx library, each of which is a big topic. I hope this post gives you a basic idea of network analysis :)

Written by Xinqian Zhai

Graduate student at the University of Michigan and a new learner on the road to data science.
