Geocoding with GeoPy¶

In [45]:
# install geopy
!pip install geopy
Requirement already satisfied: geopy in c:\users\colto\anaconda3\lib\site-packages (2.4.0)
Requirement already satisfied: geographiclib<3,>=1.52 in c:\users\colto\anaconda3\lib\site-packages (from geopy) (2.0)
In [46]:
import pandas as pd

# Import Nominatim ("Name" in Latin) and initialize
# Though most of these services require an API key, Nominatim, which uses OpenStreetMap data, does not,
# which is why we’re going to use it here. But we still need to create a unique application name.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Wenyi Shang's mapping app", timeout=2)
In [47]:
# Locate a specific location
location = geolocator.geocode("East Daniel Street, Champaign")
location
Out[47]:
Location(East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States, (40.1079572, -88.2319022, 0.0))
In [48]:
# Full information
location.raw
Out[48]:
{'place_id': 27867261,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 745458640,
 'lat': '40.1079572',
 'lon': '-88.2319022',
 'class': 'highway',
 'type': 'residential',
 'place_rank': 26,
 'importance': 0.10000999999999993,
 'addresstype': 'road',
 'name': 'East Daniel Street',
 'display_name': 'East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States',
 'boundingbox': ['40.1079500', '40.1079691', '-88.2335000', '-88.2303450']}
In [49]:
print(location.address)
print(location.raw['lat'])
print(location.raw['lon'])
print(location.raw['class'])
print(location.raw['type'])
East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States
40.1079572
-88.2319022
highway
residential
In [50]:
possible_locations = geolocator.geocode("East Daniel Street", exactly_one=False)
for location in possible_locations:
    print(location.address)
East Daniel Street, Bloomington, Monroe County, Indiana, 47401, United States
East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States
East Daniel Street, Seniorland, Champaign, Champaign County, Illinois, 61820, United States
East Daniel Street, Collinwood, Wayne County, Middle Tennessee, Tennessee, 38450, United States
East Daniel Street, Hinton, Boone County, Missouri, United States
East Daniel Street, Greene County, Missouri, 65803, United States
East Daniel Street, Uvalde, Uvalde County, Texas, 78801, United States
East Daniel Street, North Uvalde Colonia, Uvalde, Uvalde County, Texas, 78801, United States
East Daniel Street, Albany, Gentry County, Missouri, 64402, United States
East Daniel Street, Silver City, Grant County, New Mexico, 88062, United States
In [51]:
Illini_Union = geolocator.geocode('Illini Union')
Illini_Union.address
Out[51]:
'Illini Union, 1401, West Green Street, Urbana, Champaign County, Illinois, 61801, United States'

Task 1¶

In [52]:
# Create a dataframe to store the geographical locations of a list of places in Champaign.
# The dataframe "Champaign_df" should contain 6 columns:
# "Place", "Address" (obtained by ".address"), "Latitude", "Longitude", "Class", "Type" (obtained by corresponding keys in ".raw")

Champaign_places = ['Foellinger Auditorium', 'Altgeld Hall', 'Krannert Center for the Performing Arts', 
                    'University of Illinois Urbana-Champaign University Library', 'Japan House']

geolocator = Nominatim(user_agent="champaign_locator")

place_list = []
address_list = []
latitude_list = []
longitude_list = []
class_list = []
type_list = []

Cham = geolocator.geocode('Champaign, Illinois')

for place in Champaign_places:
    location = geolocator.geocode(place)
    place_list.append(place)
    address_list.append(location.address)
    latitude_list.append(location.latitude)
    longitude_list.append(location.longitude)
    location_raw = location.raw
    class_list.append(location_raw.get('class'))
    type_list.append(location_raw.get('type'))

Champaign_df = pd.DataFrame({
    'Place': place_list,
    'Address': address_list,
    'Latitude': latitude_list,
    'Longitude': longitude_list,
    'Class': class_list,
    'Type': type_list
})

print(Champaign_df)
                                               Place  \
0                              Foellinger Auditorium   
1                                       Altgeld Hall   
2            Krannert Center for the Performing Arts   
3  University of Illinois Urbana-Champaign Univer...   
4                                        Japan House   

                                             Address   Latitude  Longitude  \
0  Foellinger Auditorium, 709, South Mathews Aven...  40.105941 -88.227182   
1  Altgeld Hall, 1409, West Green Street, Urbana,...  40.109328 -88.228328   
2  Krannert Center for the Performing Arts, 500, ...  40.108011 -88.222528   
3  Main Library, 1408, West Gregory Drive, Urbana...  40.104681 -88.228990   
4  Japan House, 101-111, Kensington High Street, ...  51.501350  -0.191682   

      Class              Type  
0   amenity           theatre  
1  building        university  
2   amenity       arts_centre  
3   amenity           library  
4   amenity  community_centre  

Making Interactive Maps with folium¶

In [54]:
!pip install folium
Requirement already satisfied: folium in c:\users\colto\anaconda3\lib\site-packages (0.14.0)
Requirement already satisfied: requests in c:\users\colto\anaconda3\lib\site-packages (from folium) (2.27.1)
Requirement already satisfied: branca>=0.6.0 in c:\users\colto\anaconda3\lib\site-packages (from folium) (0.6.0)
Requirement already satisfied: numpy in c:\users\colto\anaconda3\lib\site-packages (from folium) (1.21.5)
Requirement already satisfied: jinja2>=2.9 in c:\users\colto\anaconda3\lib\site-packages (from folium) (2.11.3)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\colto\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.0.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
In [55]:
import folium
In [58]:
Champaign = geolocator.geocode('Champaign')
Champaign
Out[58]:
Location(Champaign, Champaign County, Illinois, United States, (40.1164841, -88.2430932, 0.0))
In [59]:
Champaign = geolocator.geocode('Champaign')
champaign_map
Out[59]:
Location(Champaign, Champaign County, Illinois, United States, (40.1164841, -88.2430932, 0.0))
In [66]:
# Add a marker
folium.Marker(location=([Illini_Union.raw['lat'], Illini_Union.raw['lon']]), tooltip = 'click me', 
              popup="Illini Union").add_to(champaign_map)
champaign_map
Out[66]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [65]:
champaign_map.save("Data/Champaign-map.html")

Task 2¶

In [67]:
# First, reload the champaign_map to drop the added Marker of Illini Union
# Then for each place in Champaign_df you created for Task 1, add it as a Marker, with location defined by the
# latitude and longitude values in the DataFrame, popup value as the place names in the dataframe



champaign_map = folium.Map(location=[Cham.latitude, Cham.longitude], zoom_start=14)

for i in range(len(Champaign_df)):
    folium.Marker(
        location=[Champaign_df['Latitude'][i], Champaign_df['Longitude'][i]],
        popup=Champaign_df['Place'][i],
    ).add_to(champaign_map)

champaign_map
Out[67]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Add a Circle Marker¶

In [68]:
# Scottish witchcraft dataset
# (http://witches.hca.ed.ac.uk/#:~:text=The%20database%20contains%20all%20people,to%20social%20and%20cultural%20history)
df = pd.read_csv('data/accused_witches.csv')
df
Out[68]:
AccusedRef AccusedSystemId AccusedID FirstName LastName M_Firstname M_Surname Alias Patronymic DesTitle ... SocioecStatus Occupation Notes Createdby Createdate Lastupdatedby Lastupdatedon Family of Accused Cases Date
0 A/EGD/10 EGD 10.0 Mareon Quheitt Marion White NaN NaN NaN ... NaN NaN NaN SMD 5/15/01 11:06 jhm 8/9/02 11:40 NaN C/EGD/21 3/4/1661
1 A/EGD/100 EGD 100.0 Thom Cockburn Thomas Cockburn NaN NaN NaN ... NaN NaN NaN SMD 5/15/01 11:06 jhm 10/2/02 10:32 NaN C/EGD/111 1591
2 A/EGD/1000 EGD 1000.0 Christian Aitkenhead Christine Aikenhead NaN NaN NaN ... NaN NaN NaN SMD 5/15/01 11:06 jhm 10/1/02 10:48 AF/LA/150 C/EGD/1011 6/5/1628
3 A/EGD/1001 EGD 1001.0 Janet Ireland Janet Ireland NaN NaN NaN ... NaN NaN NaN SMD 5/15/01 11:06 jhm 10/1/02 10:49 AF/LA/151 C/EGD/1012 6/5/1628
4 A/EGD/1002 EGD 1002.0 Agnes Hendersoun Agnes Henderson NaN NaN NaN ... NaN NaN NaN SMD 5/15/01 11:06 jhm 10/1/02 10:50 NaN C/EGD/1013 3/7/1628
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4378 20/10/1637 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4379 17/2/1642 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4380 24/10/1628 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4381 25/10/1651 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4382 4/1568 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

4383 rows × 36 columns

In [69]:
# Count the number of witches accused in each county
county_witches = df.groupby("Res_county")["AccusedRef"].count()
county_witches
Out[69]:
Res_county
Aberdeen         175
Argyll             6
Ayr              153
Banff              9
Berwick          126
Bute              54
Caithness         52
Clackmannan       18
Cromarty           2
Dumfries          78
Dunbarton         25
Edinburgh        374
Elgin             28
Fife             382
Forfar            82
Haddington       543
Inverness         45
Kincardine         2
Kinross            7
Kirkcudbright     35
Lanark            77
Linlithgow       114
Nairn             55
Orkney            72
Peebles           91
Perth            109
Renfrew          124
Ross              74
Roxburgh          60
Selkirk           21
Shetland          28
Stirling          53
Sutherland        15
Wigtown           15
Name: AccusedRef, dtype: int64
In [70]:
# Obtain the latitude and longitude of each county by getting the median of the all latitude and longitude records
# under the name of that county. Median is chose so that incorrect record will be ignored
# (the incorrect data can only be the outliers and will not influence the median),
# and convert them to dictionaries.
county_latitude = df.groupby("Res_county")["latitude"].median()
county_longitude = df.groupby("Res_county")["longitude"].median()
county_latitude_dict = county_latitude.to_dict()
county_longitude_dict = county_longitude.to_dict()
print(county_latitude_dict, county_longitude_dict)
{'Aberdeen': 57.146262, 'Argyll': 56.4006, 'Ayr': 55.457393, 'Banff': 57.666252, 'Berwick': 55.781124, 'Bute': 55.8274, 'Caithness': 58.4389, 'Clackmannan': 56.108159, 'Cromarty': 57.6806, 'Dumfries': 55.068632, 'Dunbarton': 56.83301, 'Edinburgh': 55.92826, 'Elgin': 57.648022, 'Fife': 56.253397, 'Forfar': 56.639836, 'Haddington': 55.955755, 'Inverness': 57.479549, 'Kincardine': 56.068975, 'Kinross': 56.211141, 'Kirkcudbright': 54.835888, 'Lanark': 55.674898, 'Linlithgow': 55.9716, 'Nairn': 57.585033, 'Orkney': 58.9809, 'Peebles': 55.651467, 'Perth': 56.395704, 'Renfrew': 55.874645, 'Ross': 56.83301, 'Roxburgh': nan, 'Selkirk': 55.548073, 'Shetland': 60.305229, 'Stirling': 56.281067, 'Sutherland': 58.249999, 'Wigtown': 54.868426} {'Aberdeen': -2.136575, 'Argyll': -5.4807, 'Ayr': -4.628716, 'Banff': -2.52426, 'Berwick': -2.011552, 'Bute': -5.0936, 'Caithness': -3.0937, 'Clackmannan': -3.747183, 'Cromarty': -4.0347, 'Dumfries': -3.608237, 'Dunbarton': -4.180209, 'Edinburgh': -3.275582, 'Elgin': -3.320025, 'Fife': -3.134239, 'Forfar': -2.893115, 'Haddington': -2.783795, 'Inverness': -4.237208, 'Kincardine': -3.715198, 'Kinross': -3.425103, 'Kirkcudbright': -4.049153, 'Lanark': -3.773292, 'Linlithgow': -3.6026, 'Nairn': -3.869079, 'Orkney': -2.9605, 'Peebles': -3.191026, 'Perth': -3.435877, 'Renfrew': -4.389173, 'Ross': -4.180209, 'Roxburgh': nan, 'Selkirk': -2.839351, 'Shetland': -1.294066, 'Stirling': -4.436736, 'Sutherland': -4.499998, 'Wigtown': -4.442783}
In [71]:
# Create a dataframe to record the county names, number of cases, and their latitudes and longitudes
county_name = []
county_witch_num = []
county_latitude = []
county_longitude = []
for i in range(len(county_witches)):
    current_county_name = county_witches.index[i]
    county_name.append(current_county_name)
    county_witch_num.append(county_witches[i])
    county_latitude.append(county_latitude_dict[current_county_name])
    county_longitude.append(county_longitude_dict[current_county_name])
data_df = pd.DataFrame({
    'name': county_name,
    'case number': county_witch_num,
    'latitude': county_latitude,
    'longitude': county_longitude,
})
data_df['case number'] = data_df['case number'].astype(float)
data_df = data_df.dropna().reset_index(drop=True) # remove the counties with na values and reset the index
data_df
Out[71]:
name case number latitude longitude
0 Aberdeen 175.0 57.146262 -2.136575
1 Argyll 6.0 56.400600 -5.480700
2 Ayr 153.0 55.457393 -4.628716
3 Banff 9.0 57.666252 -2.524260
4 Berwick 126.0 55.781124 -2.011552
5 Bute 54.0 55.827400 -5.093600
6 Caithness 52.0 58.438900 -3.093700
7 Clackmannan 18.0 56.108159 -3.747183
8 Cromarty 2.0 57.680600 -4.034700
9 Dumfries 78.0 55.068632 -3.608237
10 Dunbarton 25.0 56.833010 -4.180209
11 Edinburgh 374.0 55.928260 -3.275582
12 Elgin 28.0 57.648022 -3.320025
13 Fife 382.0 56.253397 -3.134239
14 Forfar 82.0 56.639836 -2.893115
15 Haddington 543.0 55.955755 -2.783795
16 Inverness 45.0 57.479549 -4.237208
17 Kincardine 2.0 56.068975 -3.715198
18 Kinross 7.0 56.211141 -3.425103
19 Kirkcudbright 35.0 54.835888 -4.049153
20 Lanark 77.0 55.674898 -3.773292
21 Linlithgow 114.0 55.971600 -3.602600
22 Nairn 55.0 57.585033 -3.869079
23 Orkney 72.0 58.980900 -2.960500
24 Peebles 91.0 55.651467 -3.191026
25 Perth 109.0 56.395704 -3.435877
26 Renfrew 124.0 55.874645 -4.389173
27 Ross 74.0 56.833010 -4.180209
28 Selkirk 21.0 55.548073 -2.839351
29 Shetland 28.0 60.305229 -1.294066
30 Stirling 53.0 56.281067 -4.436736
31 Sutherland 15.0 58.249999 -4.499998
32 Wigtown 15.0 54.868426 -4.442783
In [72]:
data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         33 non-null     object 
 1   case number  33 non-null     float64
 2   latitude     33 non-null     float64
 3   longitude    33 non-null     float64
dtypes: float64(3), object(1)
memory usage: 1.2+ KB
In [73]:
# Get the location of Scotland
Scotland = geolocator.geocode('Scotland')
Scotland
Out[73]:
Location(Alba / Scotland, United Kingdom, (56.7861112, -4.1140518, 0.0))
In [74]:
# Create the map of Scotland. Zoom_start is smaller than Champaign map because we want to cover a larger area
Scotland_map = folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']], zoom_start=6)
Scotland_map
Out[74]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [75]:
# Add circles to the map, each representing the witch accused data of a county
for i in range(len(data_df)):
    folium.Circle(location = (data_df['latitude'][i], data_df['longitude'][i]),
                  radius = data_df['case number'][i]*100,
                  tooltip = data_df['name'][i]).add_to(Scotland_map)
Scotland_map
Out[75]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Task 3¶

In [95]:
# Recreate the map of Scottish witchcraft. First, reload the map with the location of Scottish capital "Edinburgh",
# and set the zoom_start as 7 to focus on the surrounding area of the city. Then add the data of each county into the new map,
# but the radius will be 1000*the square root (this can be obtained by importing "math" and use "math.sqrt") of the case number,
# to get a smoothed result. Besides, add popup according to the format "County_name Case Number: number"
# (e.g., Edinburgh Case Number: 374.0), and set "fill" as True. Finally display the new map.

import math

Edinburgh = geolocator.geocode('Edinburgh, Scotland')


Scotland_map = folium.Map(location=[Edinburgh.latitude, Edinburgh.longitude], zoom_start=7)

for i in range(len(data_df)):
    radius = 1000 * math.sqrt(data_df['case number'][i])
    popup_text = f"{data_df['name'][i]} Case Number: {data_df['case number'][i]}"
    folium.Circle(location=(data_df['latitude'][i], data_df['longitude'][i]), radius=radius, popup=popup_text, fill=True,).add_to(Scotland_map)

Scotland_map
Out[95]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Customize Map Backgrounds¶

In [82]:
# Using titles in folium
folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']], tiles = 'cartodbpositron', zoom_start=6)
Out[82]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [83]:
# Using external background
folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']],
           zoom_start=4,
           tiles='http://services.arcgisonline.com/arcgis/rest/services/NatGeo_World_Map/MapServer/tile/{z}/{y}/{x}',
           attr="Sources: National Geographic, Esri, Garmin, HERE, UNEP-WCMC, USGS, NASA, ESA, METI, NRCAN, GEBCO, NOAA, INCREMENT P")
Out[83]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [84]:
# And this not necessarily need to be real world...
folium.Map(location=[0, 30],
           zoom_start=4, min_zoom=4, max_zoom=10,
           max_bounds=True,
           min_lon=0, max_lon=70, min_lat=-40, max_lat=40,
           tiles='https://cartocdn-gusc.global.ssl.fastly.net//ramirocartodb/api/v1/map/named/tpl_756aec63_3adb_48b6_9d14_331c6cbc47cf/all/{z}/{x}/{y}.png',
           attr='Textures and Icons from https://www.textures.com/ & https://thenounproject.com/')
Out[84]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Geojson¶

In [85]:
# US states geojson file (obtained from https://github.com/python-visualization/folium/blob/main/examples/data/us-states.json)
US_states = "Data/us-states.json"
In [86]:
# Include the US States boundaries in the US Map
US_map = folium.Map(location=[42, -102], zoom_start=4)

folium.Choropleth(
    geo_data = US_states,
).add_to(US_map)

US_map
Out[86]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [87]:
# US unemployment rate data (obtained from https://www.kaggle.com/datasets/aniruddhasshirahatti/us-unemployment-dataset-2010-2020?resource=download)
US_unemployment = pd.read_csv("Data/unemployment_data_us_state.csv")
US_unemployment
Out[87]:
State Unemployment_Rate_Jan_20 Unemployment_Rate_Feb_20 Unemployment_Rate_Mar_20
0 Alabama 2.7 2.7 3.5
1 Alaska 6.0 5.8 5.6
2 Arizona 4.5 4.5 5.5
3 Arkansas 3.5 3.9 4.8
4 California 3.9 2.5 5.3
5 Colorado 2.5 2.8 4.5
6 Connecticut 3.7 3.9 3.7
7 Delaware 4.0 5.2 5.1
8 D.C. 5.2 2.8 6.0
9 Florida 2.8 3.1 4.3
10 Georgia 3.1 2.7 4.2
11 Hawaii 2.7 2.7 2.6
12 Idaho 2.8 3.4 2.6
13 Illinois 3.5 3.1 4.6
14 Indiana 3.1 2.8 3.2
15 Iowa 2.8 3.1 3.7
16 Kansas 3.1 4.2 3.1
17 Kentucky 4.3 5.2 5.8
18 Louisiana 5.1 3.2 6.1
19 Maine 3.1 3.3 3.2
20 Maryland 3.3 2.8 3.3
21 Massachusetts 2.8 3.6 2.9
22 Michigan 3.8 3.1 4.1
23 Minnesota 3.2 5.4 3.1
24 Mississippi 5.5 3.5 5.3
25 Missouri 3.5 3.5 4.5
26 Montana 3.5 2.9 3.5
27 Nebraska 3.9 3.6 4.2
28 Nevada 3.6 3.6 6.3
29 New Hampshire 2.6 3.8 2.6
30 New Jersey 3.8 4.8 3.8
31 New Mexico 4.8 3.7 5.9
32 New York 3.8 3.6 4.5
33 North Carolina 3.6 2.2 4.4
34 North Dakota 2.3 4.2 2.2
35 Ohio 4.1 3.2 5.5
36 Oklahoma 3.3 3.3 3.1
37 Oregon 3.3 4.7 3.3
38 Pennsylvania 4.7 3.4 6.0
39 Rhode Island 3.1 2.5 4.6
40 South Carolina 2.4 3.3 2.6
41 South Dakota 3.4 3.4 3.3
42 Tennessee 3.3 3.5 3.5
43 Texas 4.5 2.5 4.7
44 Utah 2.5 3.4 3.6
45 Vermont 2.4 2.6 3.2
46 Virginia 2.7 3.8 3.3
47 Washington 3.9 4.9 5.1
48 West Virginia 5.0 3.7 6.1
49 Wisconsin 3.5 2.7 3.4
50 Wyoming 3.7 5.8 3.7
In [88]:
# Change column name to match the geojson data
US_unemployment = US_unemployment.rename(columns={'State': 'name'})
US_unemployment
Out[88]:
name Unemployment_Rate_Jan_20 Unemployment_Rate_Feb_20 Unemployment_Rate_Mar_20
0 Alabama 2.7 2.7 3.5
1 Alaska 6.0 5.8 5.6
2 Arizona 4.5 4.5 5.5
3 Arkansas 3.5 3.9 4.8
4 California 3.9 2.5 5.3
5 Colorado 2.5 2.8 4.5
6 Connecticut 3.7 3.9 3.7
7 Delaware 4.0 5.2 5.1
8 D.C. 5.2 2.8 6.0
9 Florida 2.8 3.1 4.3
10 Georgia 3.1 2.7 4.2
11 Hawaii 2.7 2.7 2.6
12 Idaho 2.8 3.4 2.6
13 Illinois 3.5 3.1 4.6
14 Indiana 3.1 2.8 3.2
15 Iowa 2.8 3.1 3.7
16 Kansas 3.1 4.2 3.1
17 Kentucky 4.3 5.2 5.8
18 Louisiana 5.1 3.2 6.1
19 Maine 3.1 3.3 3.2
20 Maryland 3.3 2.8 3.3
21 Massachusetts 2.8 3.6 2.9
22 Michigan 3.8 3.1 4.1
23 Minnesota 3.2 5.4 3.1
24 Mississippi 5.5 3.5 5.3
25 Missouri 3.5 3.5 4.5
26 Montana 3.5 2.9 3.5
27 Nebraska 3.9 3.6 4.2
28 Nevada 3.6 3.6 6.3
29 New Hampshire 2.6 3.8 2.6
30 New Jersey 3.8 4.8 3.8
31 New Mexico 4.8 3.7 5.9
32 New York 3.8 3.6 4.5
33 North Carolina 3.6 2.2 4.4
34 North Dakota 2.3 4.2 2.2
35 Ohio 4.1 3.2 5.5
36 Oklahoma 3.3 3.3 3.1
37 Oregon 3.3 4.7 3.3
38 Pennsylvania 4.7 3.4 6.0
39 Rhode Island 3.1 2.5 4.6
40 South Carolina 2.4 3.3 2.6
41 South Dakota 3.4 3.4 3.3
42 Tennessee 3.3 3.5 3.5
43 Texas 4.5 2.5 4.7
44 Utah 2.5 3.4 3.6
45 Vermont 2.4 2.6 3.2
46 Virginia 2.7 3.8 3.3
47 Washington 3.9 4.9 5.1
48 West Virginia 5.0 3.7 6.1
49 Wisconsin 3.5 2.7 3.4
50 Wyoming 3.7 5.8 3.7
In [89]:
# Visualize based on unemployment rate of January 2020
US_map = folium.Map(location=[42, -102], zoom_start=4)
folium.Choropleth(
    geo_data = US_states, # Geo_data to be used
    data = US_unemployment, # Data used for visualization
    columns = ['name', 'Unemployment_Rate_Jan_20'], # First column is the key to match, second column is the value to display
    key_on = 'feature.properties.name', # The matched key in geo_data
    fill_color = 'OrRd', # Seelct a color scheme
    line_opacity = 0.2, # Select line opacity
    legend_name= 'Unemployment Rate by State in January 2020', # Choose a name for the legend
).add_to(US_map)

US_map
Out[89]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [90]:
# Add tooltip to display the state names
tooltip = folium.features.GeoJson(
    US_states,
    tooltip=folium.features.GeoJsonTooltip(fields=['name'], localize=True)
                                )
US_map.add_child(tooltip)
US_map
Out[90]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Task 4¶

In [97]:
# Recreate the map of US unemployment. First, create a column 'Unemployment_Rate_2020_spring' in the DataFrame, as the average
# of the unemployment rates of January, Feburary, and March. Then, reload the map using the geocode of "USA" in gelocator.
# Next, create folium.Choropleth, and set the values of the newly-created column Unemployment_Rate_2020_spring, fill_color as
# "GnBu" (green and blue), and line_opacity as 0.3. Also, change the legend name to reflect the change of data.
# Finally, add the State names as tooltip, and display the new map.

US_unemployment['Unemployment_Rate_2020_spring'] = (US_unemployment['Unemployment_Rate_Jan_20']+
                                                  US_unemployment['Unemployment_Rate_Feb_20']+
                                                  US_unemployment['Unemployment_Rate_Mar_20'])/3
USA = geolocator.geocode('USA')

US_map = folium.Map(location=[USA.latitude, USA.longitude], zoom_start=4)


folium.Choropleth(
    geo_data = US_states,
    data = US_unemployment, 
    columns = ['name', 'Unemployment_Rate_2020_spring'],
    key_on = 'feature.properties.name', 
    fill_color = 'GnBu', 
    line_opacity = 0.3, 
    legend_name= 'Unemployment Rate spring 2020',).add_to(US_map)

tooltip = folium.features.GeoJson(
    US_states,
    tooltip=folium.features.GeoJsonTooltip(fields=['name'], localize=True)
                                )
US_map.add_child(tooltip)
US_map
Out[97]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Task 5¶

This task is designed to enhance your critical analysis and interpretation skills when dealing with visualizations.
Write a short paragraph in the following cell for Task 3 to draw conclusions from the data visualization of the geographical distribution of witchcraft cases in Scotland. Additionally, make some assumptions regarding the reasons behind this distribution.
Then, write a separate paragraph in the next cell for Task 4 to examine areas in the US with higher or lower unemployment rates in Spring 2020. Also, make assumptions regarding the reasons for this distribution.
Finally, remember to convert the cell type for both cells to "markdown" and execute it.

For the distribution of witchcraft cases in Scotland we can see that through the congregation of major roads/rivers that these are cities. This including the fact that one of them is for sure the capital of Edinburgh this makes sense. As the increase in cases in these areas would be because of the higher populations compared to the countryside/rural areas of scotland. Individuals tend to congregate more leading to altercations or accusations of witchcraft. Also there may be more individuals interested in practicing witchcraft but many of these cases were probably be false cases of witchcraft. But these larger populations in cities is why we see more cases like in Edinburgh, the surrounding areas, and Glasgow.

The reason for this distribution may be more nuanced. We see areas with the highest rates of unemployment were states like West Virginia, Virginia, Arizona, Missippi, Lousiana, and Alaska. Many of these states with Arizona and Virginia being outliers have a tendency to be pretty rural states and West Virginia/Missippi are the poorest states in the country. Since they are so poor with a lacking in Urban development there may less job opportunities as businesses don't want to set up in these states also with lower populations. This also pertains to Alaska as they lack population and don't share the same benefit of interstate commerce like other states can benefit from due to Alaska's isolated nature. We can see support for this trend as the most urban, wealthy, and large states like California, Texas, New York, and Illinois all are in the lesser portion of states suffering from unemployment.

Reference for some of my state assumptions: https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_GDP#:~:text=GDP%20per%20capita%20also%20varied,recorded%20the%20three%20lowest%20GDP