SDS 271 Final Project: Geopandas and Folium Tutorial¶
Group Members: Flora, Karolina, and Katarina¶
Geopandas expands the pandas library by allowing the user to work with spatial data. Spatial data, or geospatial data, is information that identifies the geographic location and characteristics of natural or constructed features. This includes lines, polygons, and points. This can be combined with coordinate systems to create maps.
In pandas, you have dataframes, but in Geopandas, you have geodataframes, which contain geometric information. Series are similar, where we now have a geoseries with geographic features. We can manipulate the data in the dataframes using pandas, and then we can map that data using geopandas.
Installing Geopandas and Importing Necessary Packages.¶
In the terminal line, run uv pip install geopandas to install the geopandas package. Now we can import the geopandas package, as well as numpy, and pandas.
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
Collecting and Reading in Data¶
We will use GeoJSON files that we retrieved from the MassGIS website. Using gpd.read_file, we can input the name of the data file and read the data in. These datasets contain all bike trails/paths in Massachusetts, MA county boundaries, and MA town boundaries. GeoJSON is a file format used to represent basic geographic features (points, lines, polygons). The read_file() command of geopandas works with other vector-based file formats, meaning that objects are represented with coordinate pairs.
data = gpd.read_file("Bike_Trails_(Feature_Service).geojson")
data.head()
mass = gpd.read_file("Massachusetts_Counties.geojson")
towns = gpd.read_file("Massachusetts_Municipalities_Hosted_3976135619956439721.geojson")
Below is the dataset of all bike trails in Massachusetts
data.head()
| OBJECTID | MDOT_ROUTE_ID | LOCAL_NAME | ALT_NAME | DATA_SOURCE | LINE_SOURCE | ROUTE_SOURCE | FAC_TYPE_NAME | SURFACE_TYPE | NAMESOURCE | Shape__Length | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | B0001 | Ashuwillticook Rail Trail | Western New England Greenway | BRPC | NaN | MassDOT | Shared Use Path | NaN | NaN | 2767.946996 | LINESTRING (-73.20356 42.48869, -73.20358 42.4... |
| 1 | 2 | B0001 | Ashuwillticook Rail Trail | Western New England Greenway | BRPC | NaN | MassDOT | Shared Use Path | NaN | NaN | 615.806771 | LINESTRING (-73.19346 42.51186, -73.19346 42.5... |
| 2 | 3 | B0001 | Ashuwillticook Rail Trail | Western New England Greenway | BRPC | NaN | MassDOT | Shared Use Path | NaN | NaN | 444.870559 | LINESTRING (-73.19324 42.5174, -73.19322 42.51... |
| 3 | 4 | B0001 | Ashuwillticook Rail Trail | Western New England Greenway | BRPC | NaN | MassDOT | Shared Use Path | NaN | NaN | 2087.723529 | LINESTRING (-73.1932 42.5214, -73.1932 42.5214... |
| 4 | 5 | B0001 | Ashuwillticook Rail Trail | Western New England Greenway | BRPC | NaN | MassDOT | Shared Use Path | NaN | NaN | 6.366835 | LINESTRING (-73.18222 42.53781, -73.18218 42.5... |
When we look at the dataset, we have a new type of column. Geometry contains the shape of bikepath.
Creating Basic Maps¶
We can start by creating basic maps of Massachusetts using the plot() method. It is really simple and easy to get basic shapes to show up in your output. Notice how different polygons are displayed, either the counties or towns being clearly marked by boundaries.
mass.plot().set_title("Massachusetts with County Boundaries")
Text(0.5, 1.0, 'Massachusetts with County Boundaries')
# We can also create maps using only the boundary of the states using .boundary.plot()
towns.boundary.plot().set_title("Massachusetts with Town Boundaries")
Text(0.5, 1.0, 'Massachusetts with Town Boundaries')
The column and legend arguments of the plot() method allow us to display data that corresponds with each polygon. Population data was included in the GeoDataFrames we created from the GeoJSON files.
Mapping Variables on Map¶
The column and legend arguments of the plot() method allow us to display data that corresponds with each polygon. Population data was included in the GeoDataFrames we created from the GeoJSON files.
mass.plot(column = "POP2010", legend=True).set_title("Massachusetts 2010 Population")
Text(0.5, 1.0, 'Massachusetts 2010 Population')
Filtering polygons on a map¶
Let’s say we want to create a map of just specific towns surrounding the five colleges, with town boundaries included. Using a boolean mask we can filter the towns GeoDataFrame to only include the towns of interest. We can then create a new and smaller GeoDataFrame called fivecol.
amh = (towns["TOWN"] == "AMHERST")
nor = (towns["TOWN"] == "NORTHAMPTON")
shad = (towns["TOWN"] == "SOUTH HADLEY")
had = (towns["TOWN"] == "HADLEY")
ehamp = (towns["TOWN"] == "EASTHAMPTON")
# apply mask to the towns data
mask2 = (amh | nor | had | shad | ehamp)
fivecol = towns[mask2]
fivecol.plot().set_title("Massachusetts Five College Area Map")
Text(0.5, 1.0, 'Massachusetts Five College Area Map')
Plotting Lines and Bike Trails¶
We can plot things other than counties or states as long as we have geometry data to use to plot. We can plot all the bike trails in Massachusetts below. The plot() method also works to display lines.
data.plot().set_title("Bike Trails In Massachusetts")
Text(0.5, 1.0, 'Bike Trails In Massachusetts')
Joining Spatial Data to Get the Bike Trails in the 5 College Area¶
The sjoin() method allows you to combine GeoDataFrames based on spatial relationships. The how argument specified whether you want to perform a left, right or inner join. The predicate argument allows you to specify the spatial relationship you want to join by.
Here are the different possible predicates for a spatial join:
right.sjoin(left, how = ..., predicate = ...)
- contains: True if the right object contains the entire left object.
- intersects: True if the left object intersects with any part of the right object.
- within: True if the left object is within the entire right object.
- touches: True if both objects have at least one point in common, but their interiors do not intersect.
- crosses: True if the interior of the right object intersects the interior of the left (but does not contain it).
- overlaps: True if the objects have more than one but not all points in common.
In our example, we want to only display bike trails and paths that are in a specific set of towns, defined in the fivecol GeoDataFrame (which was created using a similar boolean mask technique as seen in step 4). We use predicate = intersects, to display parts of the bike trails that intersect with the polygons (or town boundaries) defined in fivecol.
bikewcol = data.sjoin(fivecol, how = "inner", predicate = "intersects")
bikewcol.plot().set_title("Bike Trails Surrounding the Five Colleges")
Text(0.5, 1.0, 'Bike Trails Surrounding the Five Colleges')
Labeling On the Map¶
If we want to label the counties on the map, first we have to find a central point to place the label. We can find a point inside the polygon by using .representative_point.coords[:]
This code extracts the coordinate tuple of a point guaranteed to be inside a Shapely geometry. It uses representative_point() to find a safe location, coords[:] to convert that point object into a list of tuples, usually resulting in [(x, y)]
fivecol['coords'] = fivecol['geometry'].apply(lambda x: x.representative_point().coords[:])
fivecol['coords'] = [coords[0] for coords in fivecol['coords']]
When we look at the dataset again, we see that the column coords contains the langitude and longistude point that we found from above. This is a point that is garunteed to be inside the shape and where our labels will appear.
fivecol.head()
| OBJECTID | TOWN | TOWN_ID | TYPE | COUNTY | FIPS_STCO | FOURCOLOR | POP1960 | POP1970 | POP1980 | POP1990 | POP2000 | POP2010 | POP2020 | POPCH10_20 | AREA_ACRES | AREA_SQMI | geometry | coords | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 149 | 150 | SOUTH HADLEY | 275 | T | HAMPSHIRE | 25015 | 3 | 14956 | 17033 | 16399 | 16685 | 17196 | 17514 | 18150 | 636 | 11797.438 | 18.433 | POLYGON ((-72.5399 42.30413, -72.53515 42.3042... | (-72.58313806385038, 42.2585839185225) |
| 163 | 164 | NORTHAMPTON | 214 | C | HAMPSHIRE | 25015 | 2 | 30058 | 29664 | 29286 | 29289 | 28978 | 28549 | 29571 | 1022 | 22878.833 | 35.748 | POLYGON ((-72.63522 42.35606, -72.63619 42.355... | (-72.67259867023176, 42.3305649268947) |
| 170 | 171 | HADLEY | 117 | T | HAMPSHIRE | 25015 | 1 | 3099 | 3750 | 4125 | 4231 | 4793 | 5250 | 5325 | 75 | 15752.120 | 24.613 | POLYGON ((-72.54678 42.40009, -72.53159 42.401... | (-72.56238413463919, 42.35602788803365) |
| 188 | 189 | EASTHAMPTON | 87 | C | HAMPSHIRE | 25015 | 3 | 12326 | 13012 | 15580 | 15537 | 15994 | 16053 | 16211 | 158 | 8707.337 | 13.605 | POLYGON ((-72.67213 42.29167, -72.67215 42.291... | (-72.67138655051201, 42.2575813016879) |
| 226 | 227 | AMHERST | 8 | T | HAMPSHIRE | 25015 | 4 | 13718 | 26331 | 33229 | 35228 | 34874 | 37819 | 39263 | 1444 | 17762.606 | 27.754 | POLYGON ((-72.48369 42.40748, -72.48369 42.407... | (-72.50545950460702, 42.3701699268835) |
Looping through Labels to Plot them onto Map¶
We can label the counties by looping through the rows in our dataset that we have plotted. We then use the string from the TOWN column and place the text at the central point that we found in the earlier code. We can adjust the alignment and font size as well by using the appropirate arguments.
ax = fivecol.plot()
for idx, row in fivecol.iterrows():
plt.annotate(text=row["TOWN"], xy=row['coords'], horizontalalignment = "center", fontsize = 7, color = "black", fontweight = "bold")
ax.set_title("Five College Area Map with labels")
Text(0.5, 1.0, 'Five College Area Map with labels')
Adding Coordinates¶
A method of adding points to the map is to convert a set of coordinates with labels into a GeoDataFrame. First, we created a regular pandas dataframe with the five colleges and their x/y coordinates. Next, using the gpd.GeoDataFrame() method we can convert this dataframe into a geodataframe. Essentially, the only difference is the addition of geometry column.
The geometry argument is where we use the points_from_xy() method to convert the x and y columns in our original dataframe into geographic point objects. Then, we set the crs argument, or coordinate reference system, to EPSG:4326 which is the standard latitude/longitude system.
# Collect Coordinates of the Locations you want to map
colleges = {
"Smith": [42.318, -72.640],
"UMass": [42.386, -72.529],
"Amherst College": [42.372, -72.518],
"Hampshire College": [42.3250, -72.5325],
"Mount Holyoke": [42.2556, -72.5744]
}
# convert to DataFrame
df = pd.DataFrame(colleges).T
# Label the column, making sure that
df.columns = ["y", "x"]
# create geometry column
#geometry = [gpd.points_from_xy(df["x"],df["y"]) for xy in zip(df["x"], df["y"])]
geometry = gpd.points_from_xy(df["x"], df["y"])
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["x"], df["y"]), crs = "EPSG:4326")
gdf
| y | x | geometry | |
|---|---|---|---|
| Smith | 42.3180 | -72.6400 | POINT (-72.64 42.318) |
| UMass | 42.3860 | -72.5290 | POINT (-72.529 42.386) |
| Amherst College | 42.3720 | -72.5180 | POINT (-72.518 42.372) |
| Hampshire College | 42.3250 | -72.5325 | POINT (-72.5325 42.325) |
| Mount Holyoke | 42.2556 | -72.5744 | POINT (-72.5744 42.2556) |
# Setting these points as geometry so they can be plotted using geopandas
gdf = gdf.set_geometry("geometry")
gdf.plot().set_title("Preliminary Five College Map")
Text(0.5, 1.0, 'Preliminary Five College Map')
Applying them on top of map¶
ax = fivecol["geometry"].plot()
gdf["geometry"].plot(ax=ax, color = "black").set_title("Five College Map with Towns")
Text(0.5, 1.0, 'Five College Map with Towns')
Worked Example¶
We can use the functions that we have explained to create a map of the bike paths between the five colleges.
# gives us the coordinate points to place the labels
fivecol['coords'] = fivecol['geometry'].apply(lambda x: x.representative_point().coords[:])
fivecol['coords'] = [coords[0] for coords in fivecol['coords']]
# Creates a plot of our five chosen towns
ax = fivecol.plot(color = "pink")
# Adds the town names to the plot
for idx, row in fivecol.iterrows():
plt.annotate(text=row["TOWN"], xy=row["coords"], horizontalalignment = "center", fontsize = 7, color = "black", fontweight = "bold")
# Plots the bike trails we joined with the fivecol data set
bikewcol.plot(ax = ax, color = "green")
# Plots the points of the five colleges
gdf.plot(ax=ax)
gdf["geometry"].plot(ax=ax, color = "purple").set_title("Five College Area Bike Trail Connectivity")
Text(0.5, 1.0, 'Five College Area Bike Trail Connectivity')
In this map, we can see the five colleges represented by purple dots and the bike trails in the area represented by the green lines. We can easily see which colleges have a bike route between them. We can easily get from Smith College to UMass Amherst and Amherst College in Amherst. There are no bike routes connecting Hampshire or Mount Holyoke to the three other colleges. This map can tell someone who is planning to take a bike route where they can go. This map can also assist people taking classes as part of the 5 College Consortium, as they can see if they can bike to the other colleges. This map can also be used to identify where there is a lack of bike routes. We see many between Northampton and Amherst, but little to none between South Hadley and Hadley.
We decided to create a map of the bike routes between the 5 colleges becasue we enjoy biking and wanted to see which routes we could use to get from one college to another.
Interactive Mapping¶
For the interactive map, we used Folium to visualize the Five College campuses alongside regional bike paths. First, we initialized a base map centered on the Northampton area using latitude and longitude coordinates. We then added markers for each of the Five College campuses (Smith, UMass, Amherst College, Hampshire, and Mount Holyoke), allowing users to click on each location and view its name.
Next, we loaded bike trail data from a GeoJSON file using GeoPandas, which allows us to work with spatial data in Python. Because web maps require coordinates in the standard GPS format (WGS84 / EPSG:4326), we converted the dataset to this coordinate system to ensure proper alignment on the map.
We then added the bike paths as a GeoJSON layer on top of the base map, enabling us to visualize how the trail network connects the different campuses. Finally, we included a layer control, which allows users to toggle the bike paths on and off for easier exploration.
import folium
import geopandas as gpd
# Create a base interactive map centered on the Five College area
# location = [latitude, longitude]
# zoom_start controls how zoomed in the map is initially
m = folium.Map(location=[42.32, -72.64], zoom_start=12)
# ------------------------------------------------------------
# Add campus locations as clickable markers
# ------------------------------------------------------------
# Dictionary mapping campus names to their coordinates
locations = {
"Smith": [42.318, -72.640],
"UMass": [42.386, -72.529],
"Amherst College": [42.372, -72.518],
"Hampshire": [42.325, -72.531],
"Mount Holyoke": [42.255, -72.574]
}
# Loop through each campus and add a marker to the map
# popup=name means the campus name appears when you click the marker
for name, coords in locations.items():
folium.Marker(coords, popup=name).add_to(m)
# ------------------------------------------------------------
# Load bike path data using GeoPandas
# ------------------------------------------------------------
# Read a GeoJSON file containing bike trail geometries
# (lines representing bike paths)
bike = gpd.read_file("Bike_Trails_(Feature_Service).geojson")
# Convert coordinate system to WGS84 (EPSG:4326) is the standard coordinate
# system used by GPS and most web maps
# This ensures compatibility with web maps like Folium (lat/lon format)
bike = bike.to_crs(epsg=4326)
# ------------------------------------------------------------
# Add bike paths to the map
# ------------------------------------------------------------
# Convert the GeoDataFrame into a GeoJSON layer that Folium can display
# This overlays the bike paths on top of the base map
folium.GeoJson(
bike,
name="Bike Paths" # Name used in the layer control
).add_to(m)
# ------------------------------------------------------------
# Add layer control (interactive toggle)
# ------------------------------------------------------------
# Allows users to turn layers (like bike paths) on/off
folium.LayerControl().add_to(m)
# ------------------------------------------------------------
# Display the final interactive map
# ------------------------------------------------------------
m
Voronoi Polygons¶
We created an interactive map of the Five College campuses and visualized the regions closest to each campus using Voronoi polygons. We began by storing campus names and their latitude/longitude coordinates in a dictionary, then converted that into a pandas DataFrame for easier handling.
Next, we turned the DataFrame into a GeoDataFrame by adding a geometry column with point objects and specifying the coordinate reference system (EPSG:4326), which is the standard GPS format. Because Voronoi diagrams rely on accurate distance calculations, we reprojected the data into a projected coordinate system (EPSG:3857) where distances are measured in meters. We then generated Voronoi polygons, which divide the map into regions such that each region contains all points closest to a given campus.
After creating these polygons, we converted them back to latitude/longitude so they could be displayed properly in Folium. Finally, we built an interactive map centered on the Five College area, added markers for each campus, overlaid the Voronoi regions as a GeoJSON layer, and included a layer control for interactivity. The result is a map that not only shows where each campus is located, but also illustrates the geographic area that is closest to each one.
import geopandas as gpd
import pandas as pd
import folium
# ------------------------------------------------------------
# 1. Store campus coordinates
# ------------------------------------------------------------
# Dictinary Format: "Campus name": [latitude, longitude]
locations = {
"Smith": [42.318, -72.640],
"UMass": [42.386, -72.529],
"Amherst College": [42.372, -72.518],
"Hampshire": [42.325, -72.531],
"Mount Holyoke": [42.255, -72.574]
}
# ------------------------------------------------------------
# 2. Convert the dictionary into a regular pandas DataFrame
# ------------------------------------------------------------
# This creates a table with three columns:
df = pd.DataFrame([
{"name": name, "lat": coords[0], "lon": coords[1]}
for name, coords in locations.items()
])
# ------------------------------------------------------------
# 3. Convert the regular DataFrame into a GeoDataFrame
# ------------------------------------------------------------
# A GeoDataFrame is like a pandas DataFrame, but with a geometry column.
# The geometry column stores spatial objects such as points, lines, or polygons.
# gpd.points_from_xy() creates point geometries from longitude and latitude.
# crs="EPSG:4326" tells GeoPandas that these coordinates are in WGS84,
# the standard latitude/longitude coordinate system used by GPS and web maps.
campuses = gpd.GeoDataFrame(
df,
geometry=gpd.points_from_xy(df["lon"], df["lat"]),
crs="EPSG:4326"
)
# ------------------------------------------------------------
# 4. Reproject the data before doing geometric calculations
# ------------------------------------------------------------
# Voronoi polygons are distance-based geometric operations.
# Latitude/longitude coordinates are measured in degrees, not meters.
# We temporarily convert the campus points to EPSG:3857, a projected
# coordinate system commonly used for web mapping.
campuses_proj = campuses.to_crs(epsg=3857)
# ------------------------------------------------------------
# 5. Create Voronoi polygons
# ------------------------------------------------------------
# Voronoi polygons divide space based on nearest points.
# Each polygon represents the area closest to one campus compared with all others.
voronoi = campuses_proj.geometry.voronoi_polygons()
# ------------------------------------------------------------
# 6. Convert geometric results back to latitude/longitude
# ------------------------------------------------------------
# Folium needs data in EPSG:4326 latitude/longitude format.
# Since we created the Voronoi objects in EPSG:3857,
# we convert them back to EPSG:4326 before mapping.
voronoi = gpd.GeoSeries(voronoi, crs=campuses_proj.crs).to_crs(epsg=4326)
# ------------------------------------------------------------
# 7. Create the interactive Folium map
# ------------------------------------------------------------
# This creates a base map centered around the Five College area.
# zoom_start controls the initial zoom level.
m = folium.Map(location=[42.32, -72.58], zoom_start=11)
# ------------------------------------------------------------
# 8. Add campus markers to the map
# ------------------------------------------------------------
# This loops through each campus in the GeoDataFrame.
# Each row contains a campus name and a point geometry.
# row.geometry.y = latitude
# row.geometry.x = longitude
# Folium markers need coordinates in [latitude, longitude] order.
for row in campuses.itertuples():
folium.Marker(
[row.lat, row.lon],
popup=row.name
).add_to(m)
# ------------------------------------------------------------
# 9. Add Voronoi polygons to the map
# ------------------------------------------------------------
# This adds the Voronoi regions as a GeoJSON layer.
# name="Voronoi polygons" gives the layer a label in the layer control.
# fillOpacity controls how transparent the polygons are.
# weight controls the thickness of the polygon borders.
folium.GeoJson(
voronoi,
name="Voronoi polygons",
style_function=lambda x: {
"fillOpacity": 0.2,
"weight": 2
}
).add_to(m)
# ------------------------------------------------------------
# 10. Add layer control
# ------------------------------------------------------------
# This adds a small control box to the map.
folium.LayerControl().add_to(m)
# ------------------------------------------------------------
# 11. Display the final interactive map
# ------------------------------------------------------------
m
The end. Thank you!