Open In Colab

colab

17. Geo-Visualization#

In this notebook we will explore geo-visualization using:

  • GeoPandas

  • matplotlib

  • plotly.express

This notebook was largely built by Usman Alim (ualim@ucalgary.ca)

Let’s load in the libraries that we’ll need.

import numpy as np
import pandas as pd
import geopandas as gpd

import matplotlib as mpl
import matplotlib.pyplot as plt

import plotly.express as px

17.1. GeoPandas#

  • Inherits from pandas. Provides the following two data structures: GeoSeries and GeoDataFrame.

  • A GeoSeries is a Series where each entry describes a geometric shape in vector format. The following shapes are supported:

    • Points / Multi-points

    • Lines / Multi-lines

    • Polygons / Multi-polygons

  • A GeoDataFrame is a pandas DataFrame with a special ‘geometry’ column that contains a GeoSeries.

  • Provides methods to convert between different geospatial coordinate systems.

17.2. Read GeoJSON Data#

GeoJSON is a commonly used format for storing data associated with regions defined through geographic coordinates. In this notebook, we’ll look at the Census by Community dataset for Calgary.

Download the dataset (in GeoJSON format) from the following URL: Census by Community 2019

# Let's read the dataset through geopandas as a GeoDataFrame

census_file = '/content/Census_by_Community_2019_20250904.geojson' #you'll have to change this to your path!
cendf = gpd.read_file(census_file)
cendf.head()

We can see that we created a geopandas dataframe

type(cendf)

There’s a geometry column which holds the geometric information as a GeoSeries

type(cendf.geometry)

It looks like the coordinates for the community polygons are in lat/long format. The projection can be setup appropriately. For further details on managing projections, see: https://geopandas.org/en/stable/docs/user_guide/projections.html

#check the crs
cendf.crs

Let’s convert to world mercator which has units of meters

You can check out the CRS at: https://epsg.io/3395

#change crs
cendf.geometry = cendf.geometry.to_crs(epsg=3395)

#check that the new crs is setup
cendf.crs

Finally let’s change the index so that it uses the name of the polygon

# Let's index by community name
cendf.set_index('name', inplace=True)

#take a look
cendf.head()

17.3. Plot GeoJSON Data#

To visualize all the geometric entities, we can use the built-in plot command.

#create an empty plot
fig = plt.figure()

#create one drawing box within that box
axis = fig.add_subplot(1,1,1) #(nrow,ncol,index)

#plot the map within that box
cendf.plot(ax=axis, edgecolor='black')

#make sure the x and y axies are equal
plt.axis('equal')

#add a tile
plt.title('YYC Community Map')

#set the size of the plot
fig.set_size_inches(8,8)

colab

Try out the above with the lat/long projection (use EPSG 4326)

Now let’s look at how we can plot individual polygons.

# If we select a particular geometry cell, Jupyter will display it
cendf.loc['INGLEWOOD','geometry']
# Let's take a look at a few!
display(cendf.loc['INGLEWOOD','geometry'], cendf.loc['ROCKY RIDGE', 'geometry'],
        cendf.loc['VARSITY', 'geometry'], cendf.loc['FISH CREEK PARK','geometry'])
# Basic methods to compute areas and bounds are also provided.
cendf['geometry'].area

We can even calculate the distance between polygons.

# Euclidean distance to Downtown for all the communities (based on centroid distances)
cendf['geometry'].distance(cendf.loc['DOWNTOWN COMMERCIAL CORE', 'geometry'])

17.4. Choropleth mapping#

When we want to visualize how a value changes between polugons it is often useful to build a choropleth map. In these maps we colour code the polygons by a variable value. Let’s take a look below.

First let’s install a library that colab is missing.

# Have to install a new library to color code polygons
!pip install mapclassify

We can then specify a column from the GeoDataFrame to visualize. Let’s use the count of residences within each polygon: res_cnt

# if we try to get summary stats we see something strange...
cendf.res_cnt.describe()

Essentially, the variable has been assigned the datatype ‘Object’ not a number… let’s change that.

#convert to float first then integer
cendf['res_cnt'] = cendf['res_cnt'].astype(float).astype(int) # convert to an integer

#try again
cendf['res_cnt'].describe()

Now that we have a numeric varaible let’s create a plot where each polygon is coloured based on the residential count.

#empty plot with a box to add a plot to
fig = plt.figure()
axis = fig.add_subplot(1,1,1)

#use the plot function: specifcy numeric column, how to classify residential counts to colours (i.e., quantiles, k is the number of classes)
cendf.plot(ax=axis, column='res_cnt', scheme='quantiles', k=6, cmap='Blues', legend=True)

#keep the x and y axes equal
plt.axis('equal')

#title and size
plt.title('YYC Resident Count (2019)')
fig.set_size_inches(8,8)

In the plot above we’ve used quantiles to help break the polygons up into different classes. Here are some more options we could have tried and the use cases.

Scheme

Description

When to use

quantiles

Equal number of polygons in each class (e.g., 20% each for quintiles).

Good for showing rank or relative standing; can exaggerate differences if data are skewed.

equal_interval

Divides range into equal-width bins.

Easy to interpret, but may leave some bins with few/no polygons if data are skewed.

natural_breaks (Jenks)

Finds breaks that minimize variance within classes and maximize variance between them.

Good default for skewed data; emphasizes “natural” groupings.

fisher_jenks

Similar to natural breaks but optimized using Fisher-Jenks algorithm.

More computationally intensive, often very clear groupings.

std_mean

Bins based on standard deviations from the mean.

Good for data with normal-ish distribution, emphasizes how far from average a polygon is.

boxplot

Uses quartiles and interquartile range (IQR).

Nice when you want consistency with boxplot outlier rules.

percentiles

Similar to quantiles, but lets you specify exact percent thresholds.

Custom flexibility.

maximum_breaks

Finds breakpoints that maximize the differences between groups.

Similar use case to Jenks.

headtail_breaks

For heavy-tailed distributions; recursively splits head/tail.

Great for city-size or population-type data.

colab

Experiment with setting scheme=’equal_interval’ or another classification method. How does changing the scheme affect the way the polygons are grouped? Does it change your interpretation of where most residences are concentrated?

Bonus

See if you can do the same with Edmonton! You should be able to download the geojson file here: https://data.edmonton.ca/City-Administration/City-of-Edmonton-Neighbourhoods/65fr-66s6/

17.5. Adding a scatterplot on a map#

  • We can also plot points (given in geographic coordinates) on a base map.

  • Let’s display micromobility locations on YYC base map (i.e., locations of rentable e-bikes or scooters)

  • Download the dataset from the following URL (for offline use): Micromobility Locations

    There is also a live feed which at the time of writing this notebook is accessible through the following URL: https://data.calgary.ca/resource/8mci-3bzr.csv

# Let's use the live feed
mmlocs = pd.read_csv("https://data.calgary.ca/resource/8mci-3bzr.csv", #read_csv is very flexible... here we are reading directly from a URL!
                     usecols=["timestamp","latitude", "longitude"],    #only read in certain columns
                     parse_dates=['timestamp'])  #automatically convert this column to dates!

#take a look
mmlocs

We can see that we have a dataframe with latitude and longitude coordinates!

Let’s use this to create a geodataframe object that we can then plot.

from shapely.geometry import Point

#add longitude and latitude together into a list and place it into a new column called coordinates
mmlocs['coordinates'] = list(zip(mmlocs["longitude"], mmlocs["latitude"]))

#once in a list we can convert the list to a point for each row
mmlocs['coordinates'] = mmlocs['coordinates'].apply(Point)

#take a look
mmlocs

You should now see a new column with Points in it!

Now we can convert it to a geodataframe.

#specify which column to find the geometry
locsdf = gpd.GeoDataFrame(mmlocs, geometry='coordinates')

#set the coordinate reference system
locsdf = locsdf.set_crs(epsg=4326) # interpret the coordinates as lat/long

#take a look
locsdf.head()

Now that we have a geodataframe let’s plot it!

This time, let’s add a base map so we can see where the points fall within the city. To do this we’ll call the plot function twice adding layers to the map.

#create an empty plot
fig = plt.figure()
axis = fig.add_subplot(1,1,1)

# set the projection to mercator (always good to make sure your data is in the same crs)
cendf = cendf.to_crs(epsg=3395) 
locsdf = locsdf.to_crs(epsg=3395)

#plot the layers we want (order matters here)
locsdf.plot(ax=axis, markersize=5, color="red")
cendf.plot(ax=axis, edgecolor='black')


#title/size
plt.axis('equal')
plt.title('YYC Micromobility Locations Snapshot (' + str(locsdf['timestamp'].mean()) + ')')
fig.set_size_inches(8,8)

colab

See if you can do the same for Edmonton again! Below is some code for the micromobility data in Edmonton, just to get you started.

csv_url = "https://data.edmonton.ca/resource/vq44-ni9f.csv"
df = pd.read_csv(csv_url)
print(df.head())

17.6. Mapping in Plotly#

  • Matplotlib maps are good for offline use, lack interactivity.

  • Plotly uses tiled maps provided by mapbox as the underlying interactive base maps.

    • Some mapbox base maps require an access token but there are also free ones.

    • Untiled (outline) maps are also supported.

  • We’ll look at the following examples that work with geopandas dataframes:

#Let's use a simple geographic projection (WGS84)
locsdf = locsdf.to_crs(epsg=4326)

#Create a spatial scatter plot
fig = px.scatter_mapbox(locsdf,
                        lat=locsdf.geometry.y,
                        lon=locsdf.geometry.x,
                        hover_name="timestamp",
                        center={"lat": 51.0486, "lon": -114.0708}, # Calgary
                        mapbox_style='open-street-map', #this will be the background
                        zoom=9, #try changing this to zoom in or out
                        title='YYC Micromobility Locations Snapshot (' + str(locsdf['timestamp'].mean()) + ')')

#add some margins around the plotting area (helps avoid labels getting cut off)
fig.update_layout(margin={"r":50,"t":50,"l":50,"b":50},
                  autosize=True,
                  height=600 )
fig.show()

Let’s do the same thing, but this time generate a choropleth map.

#Let's use a simple geographic projection (WGS84)
cendf = cendf.to_crs(epsg=4326)

#Create a choropleth plot
fig = px.choropleth_mapbox(data_frame = cendf, #dataframe with polygons and attributes
                           geojson=cendf, #geometry to draw the polygons
                           locations=cendf.index, #which rows in the data_frame correspond to polygons in the geojson data
                           color="res_cnt", #variable used to fill colours
                           color_continuous_scale = 'YlGn', #yellow to green color map (to see more options: https://colorbrewer2.org/)
                           center={"lat": 51.0486, "lon": -114.0708}, # Calgary
                           mapbox_style="carto-positron", #free background base map
                           opacity=0.5, #how transparent the polygons are
                           zoom=9,
                           title = 'YYC Resident Count (2019)')

#add some margins around the plotting area (helps avoid labels getting cut off)
fig.update_layout(margin={"r":50,"t":50,"l":50,"b":50},
                  autosize=True,
                  height=600 )
fig.show()

17.7. Search for your own geo-datastes (Optional!)#

colab

Feel free to go to www.naturalearthdata.com and in the search bar, search for some spatial data that might be of interest to you!

E.g., Antarctic ice shelves, antarctic boundaries, protected areas, etc…

Just to get you started here’s an example with coral reefs around the world!

Download reef data:

https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-reefs/

Download country data:

https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/

import geopandas as gpd
import requests
import zipfile
import io

#import the data into a geopandas object
# Note: here we are reading directly into a geodataframe from a zip file that contains a shapfile within! very convenient...
reefs = gpd.read_file("/content/ne_10m_reefs.zip")

print(reefs.head())
print(reefs.crs)  # check the coordinate reference system
reefs.plot()
# Load a world background (built into GeoPandas: naturalearth_lowres)
world = gpd.read_file("/content/ne_10m_admin_0_countries_lakes.zip")

# Make sure both are in the same CRS (coordinate reference system)
reefs = reefs.to_crs(world.crs)

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
world.plot(ax=ax, color="lightgrey", edgecolor="white")
reefs.plot(ax=ax, color="red", markersize=1, alpha=0.7)

ax.set_title("World Reefs (Natural Earth 1:10m)", fontsize=14)
plt.show()

17.8. Further reading#

In this notebook, we’ve explored libraries that support geographic data visualization. In particular, we’ve looked at GeoPandas and its integration with matplotlib.

Plotly supports interactive geo-visualization capabilities on top of base maps provided by mapbox.