Plotly Tutorial for SDS 271 Students¶
Authors: Amber Zhang, Anna Zhao, Helen Zhou
Part 1: Introduction to Plotly and Basic Functions¶
1. What is Plotly and When to Use It?¶
Plotly is an open-source Python graphing library for creating interactive visualizations for the web. Plotly supports many common chart types, including line charts, bar charts, scatter plots, histograms, box plots, heatmaps, maps, and 3D charts. Unlike static plots from matplotlib or seaborn, Plotly charts allow you to interact with the graphs directly by hovering over data points, zooming in and out, selecting parts of the plot, and turning legend categories on or off.
When to use Plotly:¶
- You want to create interactive visualizations.
- You are building a dashboard.
- You need 3D or scientific plots.
- You are working with geospatial data.
- You want to share an interactive report.
How it compares to tools you've already learned in class:¶
| Library | Main Strength | Typical Output | When We Might Use It |
|---|---|---|---|
| matplotlib | Very flexible and highly customizable | Mostly static plots | When we need detailed control over every part of a figure, especially for publication-quality visualizations |
| seaborn | Statistical visualizations with simple syntax and attractive defaults | Mostly static plots | When we want quick statistical plots such as box plots, histograms, heatmaps, or regression plots |
| plotly | Interactive visualizations for exploration and dashboards | Interactive plots | When we want users to hover, zoom, filter, and explore the data visually, especially in web-based projects |
A simple way to think about the difference is:
- matplotlib gives detailed control and customization.
- seaborn makes statistical plotting easier and cleaner.
- plotly makes plots interactive and easier to explore.
2. Installation and Imports¶
If you haven't installed Plotly yet, run uv pip install plotly in your terminal.
import plotly.io as pio
pio.renderers.default = "notebook"
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
3. Two Core Interfaces in Plotly¶
Plotly has two main ways to create visualizations:
- plotly.express (usually imported as px)
- plotly.graph_objects (usually imported as go)
Both create interactive Plotly figures, but they are designed for different levels of complexity and control.
Plotly Express (px)¶
plotly.express is the high-level interface in Plotly. It is designed to create common statistical visualizations with very little code.
A single Plotly Express function call can often generate a complete interactive figure with:
- data
- axes
- title
- legends
- colors
- hover information
This makes Plotly Express a good starting point for beginners and for exploratory data analysis.
Graph Objects (go)¶
plotly.graph_objects is the lower-level interface in Plotly. It provides much more detailed control over every part of the figure, but it usually requires more code.
With Graph Objects, we build a figure more manually by:
- creating an empty figure
- adding traces, such as scatter points, bars, or lines
- updating the layout
- customizing details more directly
Plotly Express vs. Graph Objects¶
| Feature | Plotly Express (px) | Graph Objects (go) |
|---|---|---|
| Difficulty Level | Easier | More advanced |
| Amount of Code | Short and concise | More verbose |
| Best For | Quick analysis and exploration | Full customization |
| Works Well With | pandas DataFrames | Manual figure building |
| Learning Curve | Beginner-friendly | Steeper |
| Typical Use Case | Exploratory data analysis | Complex dashboards and custom visualizations |
To see the difference clearly, we will create the same scatter plot using both interfaces.
iris = px.data.iris()
iris.head()
| | sepal_length | sepal_width | petal_length | petal_width | species | species_id |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |
Creating a Scatter Plot with Plotly Express¶
The following code creates a scatter plot using Plotly Express.
Notice that we only need one main function: px.scatter().
fig = px.scatter(
    iris,
    x="sepal_width",
    y="sepal_length",
    color="species",
    title="Iris Dataset Scatter Plot: Plotly Express"
)
fig.show()
In this example, px.scatter() automatically:
- creates the scatter plot
- maps sepal_width to the x-axis
- maps sepal_length to the y-axis
- colors the points by species
- creates a legend
- allows hovering and zooming
- adds hover information
- returns an interactive Plotly figure
This is why Plotly Express is useful for quick exploratory data analysis.
Creating the Same Scatter Plot with Graph Objects¶
Now we will create a similar scatter plot using Graph Objects.
The final plot is similar, but the code is longer because we have to manually create one trace for each species.
fig = go.Figure()
# Add one scatter trace per species so each gets its own color and legend entry
for species_name in iris["species"].unique():
    species_data = iris[iris["species"] == species_name]
    fig.add_trace(
        go.Scatter(
            x=species_data["sepal_width"],
            y=species_data["sepal_length"],
            mode="markers",
            name=species_name,
            text=species_data["species"],
            hovertemplate=(
                "Species: %{text}<br>"
                "Sepal Width: %{x}<br>"
                "Sepal Length: %{y}<extra></extra>"
            )
        )
    )
fig.update_layout(
    title="Iris Dataset Scatter Plot: Graph Objects",
    xaxis_title="Sepal Width",
    yaxis_title="Sepal Length"
)
fig.show()
Instead of using one function call, we have to:
- create an empty figure with go.Figure()
- loop through each species
- add one scatter trace for each species
- manually define hover information
- manually update the title and axis labels
This gives us more control, but it also makes the code longer.
Why This Tutorial Focuses on Plotly Express¶
In this tutorial, we will mainly use Plotly Express.
There are several reasons for this:
- It is beginner-friendly. Plotly Express has a simpler syntax that is easier to learn if you are new to Plotly.
- It works naturally with pandas. In SDS 271, we often clean and manipulate data using pandas before visualization. Plotly Express integrates directly with pandas DataFrames.
- It is excellent for exploratory data analysis.
- It requires much less code. Compared to Graph Objects, Plotly Express can create high-quality interactive visualizations with only a few lines of code.
Even though we focus on Plotly Express in this tutorial, it is important to know that both interfaces are part of the same Plotly library. Many advanced users start with Plotly Express and later use Graph Objects when they need additional customization.
Connection to SDS 271¶
This tutorial connects directly to what we have learned in SDS 271. In this class, we often use pandas to load, clean, filter, group, and summarize data before making visualizations. Plotly fits into this same workflow. The main difference is that after we prepare the data with pandas, Plotly helps us turn the cleaned DataFrame into an interactive visualization.
Plotly also builds on ideas we have already practiced with seaborn. For example, using color in Plotly is similar to using hue in seaborn because both arguments let us compare groups within the same plot. However, Plotly adds interactivity, such as hover labels, zooming, and clickable legends. Because of this, Plotly is useful when we want the audience to explore the data more actively instead of only looking at a static graph.
4. Basic Plotly Express Functions¶
In this section, we will introduce several commonly used Plotly Express functions.
Most Plotly Express functions follow a similar structure:
fig = px.function_name(
    data_frame,
    x = "column_name",
    y = "column_name",
    color = "grouping_variable",
    title = "plot title"
)
fig.show()
- data_frame: the pandas DataFrame containing the data
- x: the variable shown on the x-axis
- y: the variable shown on the y-axis
- color: optional grouping variable shown using color
- title: the plot title
4.1 Scatter Plots with px.scatter()¶
Basic Syntax¶
px.scatter(
    data_frame,
    x,
    y,
    color = None,
    size = None,
    hover_name = None,
    hover_data = None,
    title = None
)
Important arguments:
- size: optional variable controlling marker size
- hover_name: main label shown when hovering
- hover_data: additional columns shown in the hover label
When would we use px.scatter()?¶
We would use px.scatter() when:
- comparing two numerical variables
- exploring relationships or patterns
- identifying clusters
- visualizing groups with color
Example¶
iris = px.data.iris()
iris.head()
| | sepal_length | sepal_width | petal_length | petal_width | species | species_id |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |
fig = px.scatter(
    iris,
    x = "sepal_width",
    y = "sepal_length",
    color = "species",
    hover_name = "species",
    title = "Sepal Width vs. Sepal Length"
)
fig.show()
This scatter plot explores the relationship between sepal width and sepal length in the iris dataset. The argument color = "species" tells Plotly to color points by species. This is similar to using hue in seaborn.
Because this is a Plotly figure, the plot is interactive by default:
- hovering over the points shows detailed information
- clicking color legend categories hides or shows groups
- dragging over an area zooms into that region
- double-clicking returns to the original view
Adding More Information with hover_data¶
The hover_data argument lets us choose which additional columns should appear in the tooltip. This is useful because we can keep the plot visually simple while still letting readers see more detailed values.
fig = px.scatter(
    iris,
    x = "sepal_width",
    y = "sepal_length",
    color = "species",
    hover_name = "species",
    hover_data = ["petal_width", "petal_length"],
    title = "Sepal Width vs. Sepal Length with Extra Hover Information"
)
fig.show()
In this version, the hover label also includes petal width and petal length.
This is especially useful for exploratory data analysis because we can include more information without making the graph visually crowded.
4.2 Line Charts with px.line()¶
Basic Syntax¶
px.line(
    data_frame,
    x,
    y,
    color = None,
    markers = False,
    title = None
)
Important arguments:
markers: whether to show points on the line
When would we use px.line()?¶
We would use px.line() when:
- studying trends over time
- visualizing change across observations
- comparing trajectories between groups
Example¶
gapminder = px.data.gapminder()
usa = gapminder[gapminder["country"] == "United States"]
usa.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 1608 | United States | Americas | 1952 | 68.44 | 157553000 | 13990.48208 | USA | 840 |
| 1609 | United States | Americas | 1957 | 69.49 | 171984000 | 14847.12712 | USA | 840 |
| 1610 | United States | Americas | 1962 | 70.21 | 186538000 | 16173.14586 | USA | 840 |
| 1611 | United States | Americas | 1967 | 70.76 | 198712000 | 19530.36557 | USA | 840 |
| 1612 | United States | Americas | 1972 | 71.34 | 209896000 | 21806.03594 | USA | 840 |
fig = px.line(
    usa,
    x = "year",
    y = "lifeExp",
    markers = True,
    title = "Life Expectancy in the United States Over Time"
)
fig.show()
This line chart shows that life expectancy in the United States generally increased over time. The argument markers=True adds points to the line. This makes each individual observation easier to see.
Comparing Multiple Groups with color¶
We can use color to draw separate lines for different groups. This is useful when we want to compare trends across groups.
gapminder_small = gapminder[
    gapminder["country"].isin(["United States", "Canada", "Mexico"])
]
fig = px.line(
    gapminder_small,
    x = "year",
    y = "lifeExp",
    color = "country",
    markers = True,
    title = "Life Expectancy Over Time for Three Countries"
)
fig.show()
color = "country" creates one line for each country. This is similar to using hue in seaborn. The difference is that Plotly also makes the legend interactive, so users can click countries on or off.
4.3 Bar Charts with px.bar()¶
Basic Syntax¶
px.bar(
    data_frame,
    x,
    y,
    color = None,
    barmode = "relative",
    title = None
)
Important arguments:
barmode: controls whether grouped bars are stacked or side-by-side
When would we use px.bar()?¶
We would use px.bar() when:
- comparing categories
- visualizing counts or frequencies
- displaying summary statistics
- showing grouped or aggregated data
Example¶
tips = px.data.tips()
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
# Average bill for each day of the week
avg_bill = (
    tips.groupby("day")["total_bill"]
    .mean()
    .reset_index()
)
avg_bill
| | day | total_bill |
|---|---|---|
| 0 | Fri | 17.151579 |
| 1 | Sat | 20.441379 |
| 2 | Sun | 21.410000 |
| 3 | Thur | 17.682742 |
fig = px.bar(
    avg_bill,
    x = "day",
    y = "total_bill",
    title = "Average Restaurant Bill by Day"
)
fig.show()
This bar chart compares the average restaurant bill across days.
Grouped and Stacked Bar Charts¶
Bar charts can also compare groups within categories.
The color argument creates different colored groups. The barmode argument controls how those groups are displayed:
- barmode = "group" places bars side by side
- barmode = "stack" stacks bars on top of each other
This is useful when we want to compare both categories and subgroups.
# Average bill for each day and sex combination
bill_by_day_sex = (
    tips.groupby(["day", "sex"])["total_bill"]
    .mean()
    .reset_index()
)
bill_by_day_sex
| | day | sex | total_bill |
|---|---|---|---|
| 0 | Fri | Female | 14.145556 |
| 1 | Fri | Male | 19.857000 |
| 2 | Sat | Female | 19.680357 |
| 3 | Sat | Male | 20.802542 |
| 4 | Sun | Female | 19.872222 |
| 5 | Sun | Male | 21.887241 |
| 6 | Thur | Female | 16.715312 |
| 7 | Thur | Male | 18.714667 |
# barmode = "group"
fig = px.bar(
    bill_by_day_sex,
    x = "day",
    y = "total_bill",
    color = "sex",
    barmode = "group",
    title = "Average Restaurant Bill by Day and Sex"
)
fig.show()
The side-by-side bars make it easier to compare the two groups within each day.
# barmode = "stack"
fig = px.bar(
    bill_by_day_sex,
    x = "day",
    y = "total_bill",
    color = "sex",
    barmode = "stack",
    title = "Stacked Bar Chart Example"
)
fig.show()
A stacked bar chart is useful when we care about how smaller groups contribute to a larger total.
4.4 Histograms with px.histogram()¶
Basic Syntax¶
px.histogram(
    data_frame,
    x,
    color = None,
    nbins = None,
    marginal = None,
    title = None
)
Important arguments:
- nbins: controls the number of bins
- marginal: adds a supporting plot such as a boxplot or rug plot
When would we use px.histogram()?¶
We would use px.histogram() when:
- exploring numerical distributions
- checking skewness
- identifying unusual observations
Example¶
fig = px.histogram(
    tips,
    x = "total_bill",
    nbins = 20,
    title = "Distribution of Restaurant Bills"
)
fig.show()
Adding a Marginal Plot¶
The marginal argument adds a small supporting plot to the histogram. For example:
- marginal = "box" adds a boxplot
- marginal = "rug" adds small marks for individual observations
- marginal = "violin" adds a violin plot
fig = px.histogram(
    tips,
    x = "total_bill",
    nbins = 20,
    marginal = "box",
    title = "Distribution of Restaurant Bills with a Marginal Boxplot"
)
fig.show()
In this plot, the histogram shows the distribution of restaurant bills, while the marginal boxplot summarizes the median, spread, and possible outliers.
5. What Does a Plotly Function Return?¶
One important idea in Plotly is that a plot is stored as a figure object.
When we write code like fig = px.scatter(...), Plotly Express creates a complete interactive figure and stores it in the variable fig.
This fig object contains two main parts:
- Data: the visual marks shown in the plot, such as points, bars, or lines
- Layout: the non-data parts of the plot, such as the title, axis labels, legend, font size, and theme
This matters because we can first create a basic figure using Plotly Express, and then customize it using methods such as:
- fig.show(): displays the figure
- fig.update_layout(): changes the overall layout, such as title, theme, and axis labels
- fig.update_traces(): changes the visual marks, such as marker size or opacity
- fig.update_xaxes() and fig.update_yaxes(): customize the axes
A common Plotly workflow may look like this:
fig = px.some_plot(...)
fig.update_layout(...)
fig.update_traces(...)
fig.show()
Example¶
fig = px.scatter(
    iris,
    x = "sepal_width",
    y = "sepal_length",
    color = "species",
    title = "Basic Plotly Express Figure"
)
fig.update_layout(
    template = "plotly_white",
    title_x = 0.5,
    xaxis_title = "Sepal Width",
    yaxis_title = "Sepal Length"
)
fig.update_traces(
    marker = dict(size = 9, opacity = 0.75)
)
fig.show()
In this example, px.scatter() creates the original figure. Then update_layout() changes the theme, centers the title, and customizes the axis labels; update_traces() changes the appearance of the points; and fig.show() displays the final interactive figure.
This is different from seaborn, where we often create a plot and then use matplotlib commands to customize it. In Plotly, the figure object itself stores the data and layout, so we can keep modifying the same object.
6. Summary¶
In this part, you've learned the basic ideas about Plotly:
- Plotly is used to create interactive visualizations.
- Plotly Express is the high-level interface and is the best starting point for beginners.
- Graph Objects gives more control, but usually requires more code.
- Plotly Express functions usually take a pandas DataFrame and column names as arguments.
- Plotly functions return a figure object, usually stored as fig.
- We display figures using fig.show().
- We can customize figures with methods such as update_layout() and update_traces().
- Interactive features, such as tooltips, zooming, and legends, are built into Plotly by default.
In the next part of the tutorial, we will apply these ideas to a real dataset.
Part 2: Dataset Introduction and Preparation¶
The dataset comes from data.cityofnewyork.us and is collected by the Mayor’s Office of Media and Entertainment (MOME), New York. The data has been collected since July 27, 2015; the version used here contains NYC film permit records from 2023 to early 2026. Each row represents one permit, for a total of 16,793 rows.
This dataset is meaningful because it allows our group to examine how film-related activities are distributed across New York City. Using the Plotly package, we aim to explore how film activity changed over time, which boroughs have the most permits, and what types of production are most active in each borough.
In this section, we use Plotly to explore patterns in New York City film permit activity across time, location, and production type. Compared with static visualization tools, Plotly allows readers to interact directly with the figures by hovering over data points, filtering categories through the legend, and viewing exact values without cluttering the chart itself. This makes it especially useful for exploring a dataset like this one, where multiple dimensions of information—such as date, borough, and category—can be examined together.
Why Plotly Works Well for This Dataset¶
Overall, Plotly is a strong fit for this dataset because it allows us to examine changes across time, compare borough-level patterns, and explore category composition within boroughs in a more interactive way. In our analysis, hover labels made it easy to inspect exact monthly and borough-level values, while the interactive legend was especially helpful for comparing category composition without overwhelming the reader.
Taken together, these features made it easier to move between broad patterns and precise values, which is especially useful for a dataset that combines time, location, and production type.
Data Cleaning¶
import pandas as pd
import plotly.express as px
df = pd.read_csv("Film_Permits_20260410.csv")
df.head()
| | EventID | EventType | StartDateTime | EndDateTime | EnteredOn | EventAgency | ParkingHeld | Borough | CommunityBoard(s) | PolicePrecinct(s) | Category | SubCategoryName | Country | ZipCode(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 911404 | Theater Load in and Load Outs | 01/09/2026 05:00:00 AM | 01/11/2026 01:00:00 PM | 01/06/2026 11:36:04 PM | Mayor's Office of Media & Entertainment | AMSTERDAM AVENUE between WEST 73 STREET and ... | Manhattan | 7, | 20, | Theater | Theater | United States of America | 10023, |
| 1 | 911038 | Shooting Permit | 01/08/2026 06:00:00 AM | 01/08/2026 11:00:00 PM | 01/06/2026 10:15:51 AM | Mayor's Office of Media & Entertainment | EAST 37 STREET between PARK AVENUE and MADIS... | Manhattan | 5, 6, | 14, 17, | Television | Cable-episodic | United States of America | 10016, 10018, |
| 2 | 911009 | Shooting Permit | 01/08/2026 07:30:00 AM | 01/08/2026 11:00:00 PM | 01/06/2026 09:11:12 AM | Mayor's Office of Media & Entertainment | MALCOLM X BOULEVARD between WEST 118 STREET a... | Manhattan | 10, | 28, | Television | Episodic series | United States of America | 10026, 10027, |
| 3 | 910739 | Shooting Permit | 01/08/2026 09:00:00 AM | 01/09/2026 01:00:00 AM | 01/05/2026 01:47:01 PM | Mayor's Office of Media & Entertainment | JOHNSON AVENUE between WHITE STREET and BOGART... | Brooklyn | 1, | 90, | Television | Episodic series | United States of America | 11206, 11237, |
| 4 | 910636 | Shooting Permit | 01/08/2026 07:00:00 AM | 01/08/2026 08:00:00 PM | 01/05/2026 11:35:43 AM | Mayor's Office of Media & Entertainment | EAST 66 STREET between MADISON AVENUE and PA... | Manhattan | 8, | 19, | Commercial | Commercial | United States of America | 10065, |
The dataset contains fourteen variables:
EventID: Auto-generated unique event identification number
EventType: Type of Activity for this approved permit
StartDateTime: Activity scheduled to begin
EndDateTime: Activity scheduled to be completed
EnteredOn: Date permit request submitted to MOME
EventAgency: Agency associated with the permit (the Mayor's Office of Media & Entertainment in the rows shown above)
ParkingHeld: Locations of request to hold parking in advance for permitted filming activity
Borough: First borough of activity for the day
CommunityBoard(s): First Community Board of activity for the day
PolicePrecinct(s): First Police precinct of activity for the day
Category: Description of production as selected by permit applicant
SubCategoryName: More specific description of production as selected by permit applicant
Country: Project origin
ZipCode(s): First zip code of production activity
print(df.isna().sum())
# Convert StartDateTime to datetime
df["StartDateTime"] = pd.to_datetime(df["StartDateTime"], errors="coerce")
# Clean text fields (.str.strip() keeps missing values missing,
# whereas .astype(str) would turn them into the literal string "nan")
df["Borough"] = df["Borough"].str.strip()
df["Category"] = df["Category"].str.strip()
# Replace empty strings with missing values
df["Borough"] = df["Borough"].replace("", pd.NA)
df["Category"] = df["Category"].replace("", pd.NA)
EventID 0
EventType 0
StartDateTime 326
EndDateTime 151
EnteredOn 1806
EventAgency 0
ParkingHeld 0
Borough 0
CommunityBoard(s) 10
PolicePrecinct(s) 10
Category 0
SubCategoryName 0
Country 0
ZipCode(s) 10
dtype: int64
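The errors="coerce" option used above is worth seeing on a small scale. The sketch below uses a few hypothetical strings written in the same style as the StartDateTime column; the explicit format string is an assumption inferred from the rows displayed earlier, not something the dataset documentation confirms.

```python
import pandas as pd

# Hypothetical sample values in the same style as StartDateTime.
raw = pd.Series(
    ["01/09/2026 05:00:00 AM", "01/08/2026 11:00:00 PM", "not a date", None]
)

# errors="coerce" turns anything unparseable into NaT instead of raising.
# The explicit format is an assumption based on the rows shown above.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

# Counting NaT values afterwards shows how many entries failed to parse.
n_missing = parsed.isna().sum()
print(parsed)
print("Missing after conversion:", n_missing)
```

Checking isna().sum() again after the conversion is a simple way to see whether coercion silently created additional missing dates.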
Part 3: Interactive Visualizations With NYC Film Permit Dataset¶
Visualizing Permit Activity Over Time¶
To understand how film permit activity changes over time, we aggregated permit records by month using the permit start date. This provides a clearer view of broader temporal patterns than looking at individual permit entries. A monthly perspective is useful for identifying periods of relatively high or low activity and for examining whether filming patterns remain stable across the years covered by the dataset.
Because the most recent month in the dataset (January 2026) contains only partial observations, we exclude it from the figure to avoid misleading comparisons with earlier months. December 2025 is retained because it represents a full month of observations, even though its permit count is noticeably lower than the preceding months.
The interactive line chart below also demonstrates one of Plotly’s main strengths. Instead of overcrowding the figure with labels, Plotly allows readers to inspect exact monthly values through hover interactions, making the chart both readable and detailed.
# Data cleaning
# Drop rows with missing start dates
df_time = df.dropna(subset=["StartDateTime"]).copy()
# Create a year-month variable
df_time["year_month"] = df_time["StartDateTime"].dt.to_period("M").dt.to_timestamp()
# Count permits per month
monthly_counts = (
    df_time.groupby("year_month")
    .size()
    .reset_index(name="permit_count")
    .sort_values("year_month")
)
# Drop the most recent month if it looks partial
# (no permit in that month starts on or after the 25th)
last_month = monthly_counts["year_month"].max()
df_last_month = df_time[df_time["year_month"] == last_month]
if df_last_month["StartDateTime"].dt.day.max() < 25:
    monthly_counts = monthly_counts[monthly_counts["year_month"] != last_month]
fig = px.line(
    monthly_counts,
    x="year_month",
    y="permit_count",
    markers=True,
    title="Monthly Film Permit Counts in New York City",
    labels={
        "year_month": "Month",
        "permit_count": "Number of Permits"
    },
    hover_data={
        "year_month": False,
        "permit_count": True
    }
)
fig.update_traces(
    hovertemplate="<b>%{x|%B %Y}</b><br>Permits: %{y}<extra></extra>"
)
fig.update_layout(
    template="plotly_white",
    hovermode="x unified",
    title_x=0.5,
    xaxis_title="Month",
    yaxis_title="Number of Permits",
    width=1100,
    height=500,
    font=dict(size=13)
)
fig.update_xaxes(range=["2023-01-01", "2025-12-31"])
fig.show()
For the monthly trend figure, we excluded the partial January 2026 observations (57 records), leaving 16,736 records in the plotted time series.
This figure shows that film permit activity in New York City varies meaningfully across time rather than staying at a constant level. Permit counts rise and fall across the years in the dataset, with several visible peaks followed by lower-activity periods. This suggests that filming activity may reflect broader seasonal, logistical, or administrative patterns rather than occurring evenly month by month.
At the same time, the figure also shows that some declines are sharper than others. For example, the drop at the end of 2025 is substantial even though December 2025 is a complete month in the dataset. This means the decline should not automatically be treated as a data artifact, although it would require additional context to explain fully. Plotly is especially useful here because hover labels allow readers to inspect exact monthly counts without reducing the overall readability of the chart.
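The monthly aggregation and the partial-month check used for this figure can be traced on a tiny example. The sketch below uses hypothetical start dates, not the real permit data, and repeats the same rule as the code above: the last month is dropped when none of its permits start on or after the 25th.

```python
import pandas as pd

# Hypothetical start dates: two months with late-month activity and one partial month.
starts = pd.to_datetime(
    ["2025-11-03", "2025-11-28", "2025-12-02", "2025-12-30", "2026-01-05", "2026-01-08"]
)
toy = pd.DataFrame({"StartDateTime": starts})

# Snap each date to the first day of its month.
toy["year_month"] = toy["StartDateTime"].dt.to_period("M").dt.to_timestamp()

monthly = toy.groupby("year_month").size().reset_index(name="permit_count")

# Treat the last month as partial if no permit starts on or after the 25th.
last_month = monthly["year_month"].max()
last_day_seen = toy.loc[toy["year_month"] == last_month, "StartDateTime"].dt.day.max()
if last_day_seen < 25:
    monthly = monthly[monthly["year_month"] != last_month]

print(monthly)
```

This rule is a heuristic: a complete but quiet month with no permits after the 24th would also be dropped, so it is worth checking which month was removed.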
Comparing Filming Activity Across Boroughs¶
After examining how permit activity changes over time, we next compare filming activity across boroughs. Because each permit record includes location information, borough-level counts provide a useful way to see whether filming activity is concentrated in particular parts of New York City or distributed more broadly across the city.
The bar chart below uses Plotly to compare total permit counts by borough. In addition to showing the overall ranking visually, the chart allows readers to inspect exact counts and each borough’s share of total permits through hover interactions.
df_borough = df.dropna(subset=["Borough"]).copy()
# Count permits per borough, largest first
borough_counts = (
    df_borough.groupby("Borough")
    .size()
    .reset_index(name="permit_count")
    .sort_values("permit_count", ascending=False)
)
# Each borough's share of all permits, in percent
borough_counts["share_percent"] = (
    borough_counts["permit_count"] / borough_counts["permit_count"].sum() * 100
).round(1)
fig = px.bar(
    borough_counts,
    x="Borough",
    y="permit_count",
    title="Film Permit Counts by Borough",
    labels={"Borough": "Borough", "permit_count": "Permit Count"},
    text="permit_count"
)
fig.update_traces(
    textposition="outside",
    hovertemplate=(
        "<b>%{x}</b><br>"
        "Permits: %{y}<br>"
        "Share of total: %{customdata[0]}%<extra></extra>"
    ),
    customdata=borough_counts[["share_percent"]].values
)
fig.update_layout(
    template="plotly_white",
    title_x=0.5,
    xaxis_title="Borough",
    yaxis_title="Permit Count",
    height=550,
    font=dict(size=13)
)
fig.show()
This figure shows that film permit activity is highly concentrated in Manhattan and Brooklyn. Manhattan has by far the largest number of permits, followed by Brooklyn, while Queens accounts for a much smaller total. The Bronx and Staten Island represent only a very small share of permit activity in comparison.
This pattern suggests that filming is not distributed evenly across New York City. Instead, it appears to cluster in boroughs that may offer denser commercial areas, more recognizable urban settings, stronger production infrastructure, or locations that are especially attractive to media producers. Plotly is useful here because it makes the ranking immediately visible while also allowing readers to inspect precise counts and shares through hover interactions.
Exploring Production Categories Interactively¶
Finally, we examine production categories to better understand what kinds of filming activity appear in the permit data. Looking only at total permit counts can show where and when permit activity occurs, but category-level analysis helps explain what types of production are driving those broader patterns.
To keep the figure readable, we focus on the five most common categories in the dataset. Instead of comparing raw counts alone, the chart below shows the composition of these categories within each borough. This makes it easier to compare category patterns across boroughs without letting the largest boroughs dominate the visual comparison. Plotly’s color encoding and interactive legend also allow readers to isolate categories and explore the chart more flexibly.
df_cat = df.dropna(subset=["Borough", "Category"]).copy()
df_cat = df_cat[(df_cat["Borough"] != "") & (df_cat["Category"] != "")]
# Five most common categories overall
top_categories = (
    df_cat["Category"]
    .value_counts()
    .head(5)
    .index
    .tolist()
)
df_cat_top = df_cat[df_cat["Category"].isin(top_categories)].copy()
# Fix the borough order used on the x-axis
borough_order = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"]
df_cat_top["Borough"] = pd.Categorical(
    df_cat_top["Borough"],
    categories=borough_order,
    ordered=True
)
category_counts = (
    df_cat_top.groupby(["Borough", "Category"])
    .size()
    .reset_index(name="permit_count")
)
# Share of each category within its borough
category_counts["share"] = (
    category_counts.groupby("Borough")["permit_count"]
    .transform(lambda x: x / x.sum())
)
category_counts["share_percent"] = (category_counts["share"] * 100).round(1)
category_order = (
    df_cat_top["Category"]
    .value_counts()
    .loc[top_categories]
    .index
    .tolist()
)
category_counts["Category"] = pd.Categorical(
    category_counts["Category"],
    categories=category_order,
    ordered=True
)
category_counts = category_counts.sort_values(["Borough", "Category"])
fig = px.bar(
    category_counts,
    x="Borough",
    y="share",
    color="Category",
    title="Composition of Top Film Permit Categories by Borough",
    labels={
        "Borough": "Borough",
        "share": "Share of Permits",
        "Category": "Category"
    },
    barmode="stack",
    # custom_data keeps each category's own values attached to its trace;
    # setting customdata in update_traces would hand every trace the same full array
    custom_data=["share_percent", "permit_count"]
)
fig.update_traces(
    hovertemplate=(
        "<b>%{x}</b><br>"
        "Category: %{fullData.name}<br>"
        "Share within borough: %{customdata[0]}%<br>"
        "Permits: %{customdata[1]}<extra></extra>"
    )
)
fig.update_layout(
    template="plotly_white",
    title_x=0.5,
    xaxis_title="Borough",
    yaxis_title="Share of Permits",
    height=600,
    font=dict(size=13),
    legend_title_text="Category"
)
fig.update_yaxes(tickformat=".0%")
fig.show()
This figure shows that the category composition of film permit activity differs across boroughs. Television represents a substantial share of permits in every borough, but the relative importance of the other categories varies. Manhattan and Brooklyn appear to have a more mixed profile, while Queens and the Bronx are more heavily dominated by Television permits. Staten Island appears to have a relatively large share of Film permits, although its total permit volume is much smaller than that of the other boroughs.
This composition-based view adds an important layer to the earlier results. The previous borough chart showed where permit activity is concentrated overall, while this figure helps explain what kinds of productions make up that activity in each location. Plotly is especially useful here because the interactive legend allows readers to focus on one category at a time, making multi-category comparisons easier than they would be in a static chart.
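The within-borough shares behind this figure come from a groupby followed by transform. The sketch below reproduces that step on hypothetical counts (boroughs "X" and "Y" and their numbers are invented) so the arithmetic is easy to follow. It uses transform("sum") rather than a lambda, which is an equivalent way to write the same calculation.

```python
import pandas as pd

# Hypothetical counts, not the real permit data.
counts = pd.DataFrame(
    {
        "Borough": ["X", "X", "Y", "Y"],
        "Category": ["Television", "Film", "Television", "Film"],
        "permit_count": [30, 10, 5, 15],
    }
)

# transform("sum") returns each group's total aligned to every original row,
# so the division below is done row by row.
group_total = counts.groupby("Borough")["permit_count"].transform("sum")
counts["share"] = counts["permit_count"] / group_total

print(counts)
```

In borough X, Television is 30 of 40 permits (0.75) and Film is 10 of 40 (0.25); in borough Y the shares are 0.25 and 0.75. The shares within each borough sum to 1, which is why every stacked bar in a composition chart has the same height.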
Conclusion¶
Overall, this tutorial shows that Plotly is a useful library for creating interactive visualizations in Python. Compared with static plots from libraries like matplotlib or seaborn, Plotly makes it easier for readers to explore the data by hovering over points, zooming in, and clicking legend items to show or hide groups. At the same time, Plotly still fits naturally into the data science workflow we have learned in SDS 271 because we can use pandas to clean, organize, and summarize the data before passing it into Plotly.
Through the NYC film permit dataset, we used Plotly to examine permit activity over time, compare the number of permits across boroughs, and explore how different production categories are distributed across New York City. The visualizations showed that film permit activity is not evenly distributed. Manhattan and Brooklyn have much higher permit counts than the other boroughs, while the types of permitted productions also vary by location.
The main takeaway is that Plotly is very helpful when we want visualizations to be both informative and interactive. It does not replace pandas, seaborn, or matplotlib, but it adds another layer that makes data exploration more flexible and reader-friendly. For this reason, Plotly is a good tool to learn after building basic skills in data cleaning and static visualization.