# Import the pandas library, which is useful for data manipulation and analysis.
import pandas as pd
# Import the geopandas library, which extends pandas to support spatial data operations.
import geopandas as gpd
# Import the Point class from the shapely.geometry module, used for handling geometric points.
from shapely.geometry import Point
# Import the osmnx library, which simplifies the process of downloading and analyzing street networks and other geospatial data from OpenStreetMap.
import osmnx as ox
# Import the contextily library, which is used for adding basemaps (like raster tiles) to geospatial plots.
import contextily as cx
# Import the pyplot module from matplotlib, a library for creating static, animated, and interactive visualizations in Python.
import matplotlib.pyplot as plt
# Import the CRS class from pyproj, which provides tools for defining and transforming coordinate reference systems.
from pyproj import CRS
# Operating systems
import os
# Interactive maps
import folium
# Import the display function
from IPython.display import display
Lab in Python
In this lab, we will learn how to load, manipulate and visualize spatial data. In some senses, spatial data are usually included simply as “one more column” in a table. However, spatial is special sometimes and there are few aspects in which geographic data differ from standard numerical tables. In this session, we will extend the skills developed in the previous one about non-spatial data, and combine them. In the process, we will discover that, although with some particularities, dealing with spatial data in Python
largely resembles dealing with non-spatial data.
Importing modules
Datasets
Today we are going to go to London. We will be playing around with different datasets loading them both locally and dynamically from the web. You can download data manually, keep a copy on your computer, and load them from there.
Creating geographic data
First we will use the following commands create geographic datasets from scratch representing coordinates of some famous locations in London. Most projects start with pre-generated data, but it’s useful to create datasets to understand data structures.
# Create the DataFrame
= {
data 'name': ["The British Museum", "Big Ben", "King's Cross", "The Natural History Museum"],
'lon': [-0.1459604, -0.1272057, -0.1319481, -0.173734],
'lat': [51.5045975, 51.5007325, 51.5301701, 51.4938451]
}= pd.DataFrame(data)
poi_df
# Convert DataFrame to GeoDataFrame
= [Point(xy) for xy in zip(poi_df['lon'], poi_df['lat'])]
geometry = gpd.GeoDataFrame(poi_df, geometry=geometry)
poi_gdf
# Set the coordinate reference system (CRS)
=4326, inplace=True)
poi_gdf.set_crs(epsg
print(poi_gdf)
name lon lat geometry
0 The British Museum -0.145960 51.504598 POINT (-0.14596 51.50460)
1 Big Ben -0.127206 51.500732 POINT (-0.12721 51.50073)
2 King's Cross -0.131948 51.530170 POINT (-0.13195 51.53017)
3 The Natural History Museum -0.173734 51.493845 POINT (-0.17373 51.49385)
Types of Data
Now let’s look at the different types of geographical data starting with polygons. We will use a dataset that contains the boundaries of the districts of London. We can read it into an object named districts.
We first import the district shapefile use gpd.read_file
, we then plot it to make sure we are seeing it ‘correctly’.
# Read the shapefile for districts
= gpd.read_file("data/London/Polygons/districts.shp")
districts
# Create a simple plot
districts.plot()
# Display the plot
plt.show()
We them import a file of roads in London and plot it.
# Read the shapefile for A roads
= gpd.read_file("data/London/Lines/a_roads.shp")
a_roads
# If you needed to import a `geojson` file, this would be the function:
# a_roads = gpd.read_file("data/London/Lines/a_roads.geojson")
# Create a simple plot of the roads
a_roads.plot()
# Display the plot
plt.show()
We can also import point files. So far, we have imported shapefiles
and geojsons
, but we can also obtain data from urls like in the Open Science DIY session or from other sources like OpenStreetMap. Both R
and Python
have libraries that allow us to query OpenStreetMap.
Note that we use the method features_from_place
, which queries for points in a particular place (London in this case) and creates a GeoDataFrame of OSM features.
# Create an OSM query for "Greater London, U.K."
= "London, United Kingdom"
query = ox.features_from_place(query, tags={"amenity": ["restaurant", "bar", "pub"]})
restaurants # Create a simple plot of the roads
restaurants.plot()# Display the plot
plt.show()
And to inspect the data queried:
restaurants.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 12197 entries, ('node', 451152) to ('relation', 17299869)
Columns: 572 entries, addr:city to ways
dtypes: geometry(1), object(571)
memory usage: 53.6+ MB
You do not need to know at this point what happens behind the scenes when we run these lines but, if you are curious, we are making a query to OpenStreetMap (almost as if you typed “restaurant in London, UK” within Google Maps) and getting the response as a table of data, instead of as a website with an interactive map. Pretty cool, huh?
Note: the code cells above requires internet connectivity. For more about querying from osm see here.
Important: Be careful, if you query too much data, your environment is likely to get stuck.
Inspecting Spatial Data
Inspecting
Just like a dataframe
(see the OpenScience Lab), we can inspect the data (attributes table) within a spatial object. The most direct way to get from a file to a quick visualization of the data is by loading it and calling the plot
command. Let’s start by inspecting the data like we did for non spatial dataframes
.
We can see our data is very similar to a traditional, non-spatial dataFrame
, but with an additional column called geometry.
# Read the first 5 rows of the data
print(districts.head())
DIST_CODE DIST_NAME \
0 00AA City of London
1 00AB Barking and Dagenham
2 00AC Barnet
3 00AD Bexley
4 00AE Brent
geometry
0 POLYGON ((531028.507 181611.160, 531036.062 18...
1 POLYGON ((550817.007 184195.999, 550814.000 18...
2 POLYGON ((526830.313 187535.453, 526830.302 18...
3 POLYGON ((552373.534 174606.900, 552372.893 17...
4 POLYGON ((524661.688 184631.047, 524665.261 18...
We can inspect the object in different ways :
# Read the first row
print(districts.iloc[0])
# Read the first column
print(districts.iloc[:, 0])
# Read the first row, first column
print(districts.iloc[0, 0])
# Read the column "DIST_NAME"
print(districts['DIST_NAME'])
DIST_CODE 00AA
DIST_NAME City of London
geometry POLYGON ((531028.5069610038 181611.159611177, ...
Name: 0, dtype: object
0 00AA
1 00AB
2 00AC
3 00AD
4 00AE
5 00AF
6 00AG
7 00AH
8 00AJ
9 00AK
10 00AL
11 00AM
12 00AN
13 00AP
14 00AQ
15 00AR
16 00AS
17 00AT
18 00AU
19 00AW
20 00AX
21 00AY
22 00AZ
23 00BA
24 00BB
25 00BC
26 00BD
27 00BE
28 00BF
29 00BG
30 00BH
31 00BJ
32 00BK
Name: DIST_CODE, dtype: object
00AA
0 City of London
1 Barking and Dagenham
2 Barnet
3 Bexley
4 Brent
5 Bromley
6 Camden
7 Croydon
8 Ealing
9 Enfield
10 Greenwich
11 Hackney
12 Hammersmith and Fulham
13 Haringey
14 Harrow
15 Havering
16 Hillingdon
17 Hounslow
18 Islington
19 Kensington and Chelsea
20 Kingston upon Thames
21 Lambeth
22 Lewisham
23 Merton
24 Newham
25 Redbridge
26 Richmond upon Thames
27 Southwark
28 Sutton
29 Tower Hamlets
30 Waltham Forest
31 Wandsworth
32 Westminster
Name: DIST_NAME, dtype: object
We can read or create subsets:
# dataframe can be subsetted using conditional statement
# read the rows which have "City of London" as value for DIST_NAME
# Filter rows where 'DIST_NAME' is 'City of London'
= districts[districts['DIST_NAME'] == 'City of London']
filtered_districts
print(filtered_districts)
DIST_CODE DIST_NAME geometry
0 00AA City of London POLYGON ((531028.507 181611.160, 531036.062 18...
Quick visualisation
Let’s start by plotting London in a colour and adding Hackney (a district) in a different colour.
# Plot London in grey
= plt.subplots()
fig, ax =ax, color='lightgrey')
districts.plot(ax
# Add city of London (Hackney) in turquoise to the map
= districts[districts['DIST_NAME'] == 'Hackney']
hackney =ax, color='turquoise')
hackney.plot(ax
plt.show()
Some guidance on colours in Python
can be found here.
Styling plots
It is possible to tweak many aspects of a plot to customize if to particular needs. In this section, we will explore some of the basic elements that will allow us to obtain more compelling maps.
Note: some of these variations are very straightforward while others are more intricate and require tinkering with the internal parts of a plot. They are not necessarily organized by increasing level of complexity.
Plotting different layers
We first start by plotting one layer over another
# Plotting the geometries
= plt.subplots()
fig, ax
# Plot districts with no fill (transparent fill)
=ax, edgecolor='black', facecolor='none') # No fill, only border
districts.plot(ax# Plot roads with transparency
=ax, color='brown') # Roads in brown
a_roads.plot(ax
plt.show()
Changing transparency
The intensity of color of a polygon can be easily changed through the alpha attribute in plot. This is specified as a value betwee zero and one, where the former is entirely transparent while the latter is the fully opaque (maximum intensity):
# Plotting the geometries
= plt.subplots()
fig, ax
# Plot districts with no fill (transparent fill)
=ax, edgecolor='black', facecolor='none') # No fill, only border
districts.plot(ax# Plot roads with transparency
=ax, color='brown', alpha=0.5) # Roads in brown
a_roads.plot(ax
plt.show()
Removing axes
Although in some cases, the axes can be useful to obtain context, most of the times maps look and feel better without them. Removing the axes involves wrapping the plot into a figure, which takes a few more lines of aparently useless code but that, in time, it will allow you to tweak the map further and to create much more flexible designs.
# Plotting the geometries
= plt.subplots()
fig, ax
# Plot districts with no fill (transparent fill)
=ax, edgecolor='black', facecolor='none') # No fill, only border
districts.plot(ax# Plot roads with transparency
=ax, color='brown', alpha=0.5) # Roads with 50% transparency
a_roads.plot(ax
# Remove the axis
'off')
ax.axis(
plt.show()
Let us stop for a second a study each of the previous lines:
We have first created a figure named fig
with one axis named ax
by using the command plt.subplots
(part of the library matplotlib, which we have imported at the top of the notebook). Note how the method is returning two elements and we can assign each of them to objects with different name (fig
and ax
) by simply listing them at the front of the line, separated by commas.
Second, we plot the geographies as before, but this time we tell the function that we want it to draw the polygons on the axis we are passing, ax
. This method returns the axis with the geographies in them, so we make sure to store it on an object with the same name, ax
.
On the third line, we effectively remove the box with coordinates.
Finally, we draw the entire plot by calling plt.show()
.
Adding a title
Adding a title is an extra line, if we are creating the plot within a figure, as we just did. To include text on top of the figure:
# Plotting the geometries
= plt.subplots()
fig, ax
# Plot districts with no fill (transparent fill)
=ax, edgecolor='black', facecolor='none') # No fill, only border
districts.plot(ax# Plot roads with transparency
=ax, color='brown', alpha=0.5) # Roads with 50% transparency
a_roads.plot(ax# Remove the axis
'off')
ax.axis(# Add figure title
"Main roads in London")
fig.suptitle(# Display
plt.show()
Changing what border lines look like
Border lines sometimes can distort or impede proper interpretation of a map. In those cases, it is useful to know how they can be modified. Let us first see the code to make the lines thicker and black, and then we will work our way through the different steps:
# Convert CRS from WGS 84 (EPSG:4326) to British National Grid (EPSG:27700)
= poi_gdf.to_crs(epsg=27700)
poi_gdf_bng
# Plotting the geometries
= plt.subplots()
fig, ax # Plot districts with no fill, black borders
=ax, edgecolor='black', facecolor='none')
districts.plot(ax# Plot roads with brown color and 50% transparency
=ax, color='brown', alpha=0.5)
a_roads.plot(ax# Plot restaurants with blue color and adjusted size
=ax, edgecolor='blue', facecolor='blue', markersize=100)# Adjust size accordingly
poi_gdf_bng.plot(ax# Remove the axis for a clean look
'off')
ax.axis(# Add figure title
"Main roads in London")
fig.suptitle(# Display the plot
plt.show()
Labelling
Labeling maps is of paramount importance as it is often key when presenting data analysis and visualization. Properly labeled maps enables readers to effectively analyze and interpret spatial data.
iterrows()
is a pandas function that iterates over DataFrame rows as (index, Series) pairs. Here,idx
is the index of the row, and row is apandas
series containing the data for that particular district.centroid = row['geometry'].centroid
gets the centroid of each district’s geometry.row['geometry']
refers to the geometry of the district, which could be a polygon or a multipolygon..centroid
computes the geometric center (centroid) of this polygon.ax.text()
is a method fromMatplotlib
, used to place text at specific coordinates on the plot.centroid.x
andcentroid.y
provide the x and y coordinates of the centroid, which determine where the text will be placed.row['DIST_NAME']
is the name of the district that will be displayed as the label.fontsize=6
sets the size of the text to 8 points.ha='center'
ensures that the text is horizontally aligned to the center of the specified coordinates.
# Plot the districts with a gray fill
= plt.subplots()
fig, ax =ax, edgecolor="black", facecolor='none')
districts.plot(ax
# Add text labels at the centroids of the districts
for idx, row in districts.iterrows():
= row['geometry'].centroid
centroid 'DIST_NAME'], fontsize=6, ha='center')
ax.text(centroid.x, centroid.y, row[
# Remove axis
ax.set_axis_off()
plt.show()
Coordinate reference Systems
CRSs in Python
Coordindate reference systems (CRS) are the way geographers and cartographers represent a three-dimentional objects, such as the round earth, on a two-dimensional plane, such as a piece of paper or a computer screen. If the source data contain information on the CRS of the data, we can modify this.
First we need to retrieve the CRS from the vector data.
# Retrieve the CRS from the GeoDataFrame
= districts.crs
crs # Print the CRS information
print(crs)
EPSG:27700
We can also retrieve some additional information about the used CRS. For example, try to run:
# Check if the CRS is geographic or not
= CRS(crs).is_geographic
is_geographic # Find out the CRS units
= CRS(crs).axis_info[0].unit_name if CRS(crs).axis_info else None
units_gdal # Extract the SRID
= CRS(crs).to_epsg()
srid # Extract the proj4string representation
= CRS(crs).to_proj4()
proj4string
# Print results
print(f"Is Geographic: {is_geographic}")
print(f"Units (GDAL): {units_gdal}")
print(f"SRID: {srid}")
print(f"Proj4 String: {proj4string}")
Is Geographic: False
Units (GDAL): metre
SRID: 27700
Proj4 String: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs +type=crs
/Users/carmen/anaconda3/envs/geo-env-new/lib/python3.10/site-packages/pyproj/crs/crs.py:1293: UserWarning:
You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
As we can see, there is information stored about the reference system: it is using the standard British projection (British National Grid EPSG:27700), which is expressed in meters. There are also other less decipherable parameters but we do not need to worry about them right now.
If we want to modify this and “reproject” the polygons into a different CRS, the quickest way is to find the EPSG code online (epsg.io is a good one, although there are others too). For example, if we wanted to transform the dataset into lat/lon coordinates, we would use its EPSG code, 4326 (CRS’s name “WGS84”):
In cases when a coordinate reference system (CRS) is missing or the wrong CRS is set, the .to_crs
function can be used:
# Transform the CRS to EPSG:4326
= districts.to_crs(epsg=4326)
districts_4326
# Optionally, print the new CRS to verify
print(districts_4326.crs)
EPSG:4326
From coordinates to spatial objects
CRSs are also very useful if we obtain data that is in a csv, has coordinates but needs to be transformed to a GeoDataFrame
. For example we have some London housing transactions we want to import and use.
We want to transform the .csv in a GeoDataFrame
using the coordinates stored in columns 17 and 18, and then we set the GeoDataFrame
CRS to the British National Grid (EPSG:27700).
# Import housesales data from CSV
= pd.read_csv("data/London/Tables/housesales.csv")
housesales
# Filter housesales to include only those with price less than 500000
= housesales[housesales['price'] < 500000]
housesales_f
# Assume columns 17 and 18 are 'longitude' and 'latitude' respectively
= gpd.GeoDataFrame(
housesales_gdf =gpd.points_from_xy(housesales_f.greastings, housesales_f.grnorthing), crs="EPSG:27700"
housesales_f, geometry
)
print(housesales_gdf.head())
propid price lnprice advance age bathroom bedroom buyage centheat \
0 1 105000 12 85000 18 1 2 25 1
1 2 84000 11 79800 31 1 1 47 0
2 3 89000 11 84550 16 1 1 30 0
3 4 136000 12 106000 31 2 4 32 2
4 5 103550 12 88000 31 1 2 38 1
chelec ... psemi pterrace popdens prkdouble prknone prksingle \
0 0 ... 0 0 3 0 0 0
1 0 ... 0 0 0 0 0 0
2 0 ... 0 1 23 0 1 0
3 1 ... 0 1 23 0 1 0
4 0 ... 0 0 39 0 0 1
prkspace tenfree tenlease geometry
0 1 0 1 POINT (504998.000 188930.000)
1 1 0 1 POINT (505026.000 177041.000)
2 0 1 0 POINT (505049.000 189030.000)
3 0 1 0 POINT (505087.000 189238.000)
4 0 0 1 POINT (505132.000 183300.000)
[5 rows x 34 columns]
Zooming in or out
It’s important to know what CRS your data is in if you want to create zoomed versions of your maps. BBox finder is a useful tool to identify coordinates in EPSG:4326
.
Here for example we are zooming in to some of the point we created at the beginning of the lab.
# Create a plot
= plt.subplots(figsize=(10, 10))
fig, ax # Plot the districts
=ax, color='none', edgecolor='black')
districts_4326.plot(ax# Plot the points of interest
=ax, color='blue', markersize=50)
poi_gdf.plot(ax# Set the coordinate limits
-0.180723, -0.014212)
ax.set_xlim(51.476668, 51.532337)
ax.set_ylim(# Remove axis labels and ticks
ax.set_axis_off()# Show the plot
plt.show()
Manipulating Spatial Tables
Once we have an understanding of how to visually display spatial information contained, let us see how it can be combined with the operations related to manipulating non-spatial tabular data. Essentially, the key is to realize that a GeoDataFrame
contain most of its spatial information in a single column named geometry, but the rest of it looks and behaves exactly like a non-spatial DataFrame
(in fact, it is). This concedes them all the flexibility and convenience that we saw in manipulating, slicing, and transforming tabular data, with the bonus that spatial data is carried away in all those steps. In addition, GeoDataFrame
also incorporate a set of explicitly spatial operations to combine and transform data. In this section, we will consider both.
GeoDataFrames
come with a whole range of traditional GIS operations built-in. Here we will run through a small subset of them that contains some of the most commonly used ones.
One of the spatial aspects we often need from polygons is their area. “How big is it?” is a question that always haunts us when we think of countries, regions, or cities. To obtain area measurements, first make sure the GeoDataFrame
you are working with is projected. If that is the case, you can calculate areas as follows:
We had already checked that district was projected to the British National Grid
= districts.area
districts_areas
districts_areas.head()
= districts_areas / 1000000 #convert into squared kilometres
areas_in_sqkm areas_in_sqkm.head()
0 3.151465
1 37.778900
2 86.736434
3 64.263476
4 43.235288
dtype: float64
Similarly, an equally common question with lines is their length. Also similarly, their computation is relatively straightforward, provided that our data are projected.
= a_roads.length
street_length street_length.head()
0 7.695310
1 374.714434
2 417.493662
3 45.221697
4 1748.445461
dtype: float64
If you check the dataframe
you will see the lengths.
Sometimes it is useful to summarize a polygon into a single point and, for that, a good candidate is its centroid (almost like a spatial analogue of the average).
# Create a dataframe with centroids
= districts.centroid
cents cents.head()
0 POINT (532464.075 181219.688)
1 POINT (548021.148 184939.599)
2 POINT (524027.452 192316.258)
3 POINT (548927.608 175720.862)
4 POINT (520176.906 185829.122)
dtype: geometry
# Create a plot
= plt.subplots(figsize=(10, 10))
fig, ax # Plot the districts
=ax, color='lightgrey', edgecolor='black')
districts.plot(ax# Plot the centroids
=ax, color='red', markersize=10, edgecolor='none')
cents.plot(ax# Set minimal theme by removing axes and grid
ax.set_axis_off()# Show the plot
plt.show()
To create a buffer using geopandas
, simply call the buffer method, passing in the radious. For example, to draw a 1000m. buffer around every centroid of every district:
# buffer
= cents.buffer(1000)
buf buf.head()
0 POLYGON ((533464.075 181219.688, 533459.260 18...
1 POLYGON ((549021.148 184939.599, 549016.333 18...
2 POLYGON ((525027.452 192316.258, 525022.637 19...
3 POLYGON ((549927.608 175720.862, 549922.793 17...
4 POLYGON ((521176.906 185829.122, 521172.091 18...
dtype: geometry
# Create a plot
= plt.subplots(figsize=(10, 10))
fig, ax # Plot the districts
=ax, color='none', edgecolor='black')
districts.plot(ax# Plot the centroids
=ax, color='black', markersize=20, edgecolor='none')
cents.plot(ax# Plot the centroid buffers
=ax, color='none', alpha=0.5, edgecolor='red')
buf.plot(ax# Set minimal theme by removing axes and grid
ax.set_axis_off()# Show the plot
plt.show()
Joins
Join districts with educational level data
# Import qualifications data from CSV
= pd.read_csv("data/London/Tables/qualifications2001_2.csv")
qualifications2001_df
# Take a quick look at the table by reading the first 5 lines
qualifications2001_df.head()
Zone_Code | Zone_Name | Population1674 | Noquals | Level1 | Level2 | Level3 | Level4 | |
---|---|---|---|---|---|---|---|---|
0 | 00AA | City of London | 6067 | 607 | 359 | 634 | 665 | 3647 |
1 | 00AB | Barking and Dagenham | 113579 | 44873 | 21654 | 20564 | 6626 | 11615 |
2 | 00AC | Barnet | 228123 | 44806 | 25558 | 41118 | 24695 | 80907 |
3 | 00AD | Bexley | 156172 | 44887 | 32110 | 35312 | 10759 | 20704 |
4 | 00AE | Brent | 198712 | 48915 | 23913 | 33280 | 21121 | 60432 |
Check the data
# Join check that the code you are joining the data on is the same.
# First check the GeoDataFrame
districts.head()
# Then the DataFrame
qualifications2001_df.head()
## rename(columns={'Zone-Code': 'Dist_code'}) specifies that the column name Zone-Code should be renamed to Dist_code
={'Zone_Code': 'DIST_CODE'}, inplace=True) qualifications2001_df.rename(columns
Merge the qualification dataset to the districts GeoDataFrame
:
- Merge the
qualifications2001_df
DataFrame with thedistricts
GeoDataFrame
. - The merge is done on the
DIST_CODE
column, which must be present in both DataFrames. - By default, this performs an inner join, but you can specify different types of joins (e.g., left, right) with the ‘how’ parameter.
gpd.GeoDataFrame(...)
converts the resulting merged DataFrame into a GeoDataFrame using the ‘geopandas’ library.district_qual
is the newGeoDataFrame
that contains the combined data from ‘qualifications2001_df’ and ‘districts’.districts.plot()
plots theGeoDataFrame
andcolumn='level 4'
specifies the column to plot.legend=True
adds a legend to the plot andcmap='OrRd'
applies a color map.
= districts.merge(qualifications2001_df, on='DIST_CODE')
district_qual # Prove it worked
="Level4", cmap="OrRd")
district_qual.plot(column# Show the plot
plt.show()
Calculation
Now, let’s create the share of people with level 4 qualification, i.e. create the new variable Level4p
equal to the number of people with level4 qualification divided by total population:
# Assuming 'Level4' and 'Pop' columns exist in the merged GeoDataFrame district_qual
'Level4p'] = district_qual['Level4'] / district_qual['Population1674'] district_qual[
Saving maps to figures
Create a file to put your maps:
# Create directory "maps" if it doesn't exist
"maps2", exist_ok=True) os.makedirs(
Let’s create a simple map with the variable we just created and save to external file:
# Create a figure and axes for the plot
= plt.subplots()
fig, ax # Plot the districts' geometry with no fill color and black edges
=ax, color='none', edgecolor='black')
districts.plot(ax# Plot the district qualifications with a color map
=ax, column="Level4", cmap="OrRd", legend=True)
district_qual.plot(ax# Save the plot to a PDF file
"maps/london_test2.pdf")
plt.savefig(# Save the plot as a JPG file
"maps/london_test2.jpg", format='jpg', dpi=300)
plt.savefig(# Close the plot
plt.close()
/Users/carmen/anaconda3/envs/geo-env-new/lib/python3.10/site-packages/fontTools/misc/py23.py:11: DeprecationWarning:
The py23 module has been deprecated and will be removed in a future release. Please update your code.
Adding baselayers
Many single datasets lack context when displayed on their own. A common approach to alleviate this is to use web tiles, which are a way of quickly obtaining geographical context to present spatial data. In Python, we can use contextily
to pull down tiles and display them along with our own geographic data.
We can begin by creating a map in the same way we would do normally, and then use the add_basemap
command to add a basemap:
= restaurants.plot(color="black")
ax =restaurants.crs, source=cx.providers.CartoDB.Voyager);
cx.add_basemap(ax, crs
# Show the plot
plt.show()
Note that we need to be explicit when adding the basemap to state the coordinate reference system (crs) our data is expressed in, contextily
will not be able to pick it up otherwise. Conversely, we could change our data’s CRS into Pseudo-Mercator
, the native reference system for most web tiles:
= restaurants.to_crs(epsg=3857)
restaurants_wm = restaurants_wm.plot(color="black")
ax =cx.providers.OpenStreetMap.Mapnik);
cx.add_basemap(ax, source
# Show the plot
plt.show()
Note how the coordinates are different but, if we set it right, either approach aligns tiles and data nicely.
Now, contextily offers a lot of options in terms of the sources and providers you can use to create your basemaps. For example, we can use satellite imagery instead. A lot more about basemap options with contextly
here.
= plt.subplots(1, figsize=(9, 9))
f, ax =0.5, ax=ax)
districts.plot(alpha
cx.add_basemap(
ax, =districts.crs,
crs=cx.providers.Esri.WorldImagery
source
)
ax.set_axis_off()# Show the plot
plt.show()
Interactive maps
Everything we have seen so far relates to static maps. These are useful for publication, to include in reports or to print. However, modern web technologies afford much more flexibility to explore spatial data interactively.
We wil use the folium
library, which is a Python wrapper for Leaflet
. Here’s how you can do it:
# Define the locations and popups
= {
locations "The British Museum": (51.5045975, -0.1459604),
"Big Ben": (51.5007325, -0.1272057),
"King's Cross": (51.5301701, -0.1319481),
"The Natural History Museum": (51.4938451, -0.173734)
}
# Create a base map
= folium.Map(location=[51.505, -0.127], zoom_start=14, tiles='CartoDB positron')
m
# Add markers
for popup, (lat, lng) in locations.items():
=[lat, lng], popup=popup).add_to(m)
folium.Marker(location
# Display the map
display(m)