# install geopy
!pip install geopy
Requirement already satisfied: geopy in c:\users\colto\anaconda3\lib\site-packages (2.4.0)
Requirement already satisfied: geographiclib<3,>=1.52 in c:\users\colto\anaconda3\lib\site-packages (from geopy) (2.0)
import pandas as pd
# Import Nominatim ("by name" in Latin) and initialize it.
# Most geocoding services require an API key, but Nominatim, which uses OpenStreetMap data, does not,
# which is why we're going to use it here. We still need to supply a unique application name as the user_agent.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Wenyi Shang's mapping app", timeout=2)
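Since we will soon geocode in loops, it is worth noting that Nominatim's usage policy allows roughly one request per second. A minimal pacing sketch (not part of the original notebook), using geopy's bundled RateLimiter wrapper; the one-second delay is an assumption based on that policy:
# Optional: wrap geocode so consecutive calls are spaced out
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)  # waits ~1 s between requests
geocode("East Daniel Street, Champaign")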
# Geocode a specific place
location = geolocator.geocode("East Daniel Street, Champaign")
location
Location(East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States, (40.1079572, -88.2319022, 0.0))
# Full information
location.raw
{'place_id': 27867261,
'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',
'osm_type': 'way',
'osm_id': 745458640,
'lat': '40.1079572',
'lon': '-88.2319022',
'class': 'highway',
'type': 'residential',
'place_rank': 26,
'importance': 0.10000999999999993,
'addresstype': 'road',
'name': 'East Daniel Street',
'display_name': 'East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States',
'boundingbox': ['40.1079500', '40.1079691', '-88.2335000', '-88.2303450']}
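Because .raw is the untouched JSON payload from Nominatim, which keys are present depends on the result type; dict.get is the safe way to read optional fields (the same pattern the Champaign_df loop uses later). A small sketch ('wikidata' is a hypothetical key chosen to show the miss case):
# Safe access to optional fields in the raw payload
print(location.raw.get('class', 'unknown'))  # 'highway' for this result
print(location.raw.get('wikidata'))          # hypothetical key; .get returns None instead of raising KeyError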
print(location.address)
print(location.raw['lat'])
print(location.raw['lon'])
print(location.raw['class'])
print(location.raw['type'])
East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States
40.1079572
-88.2319022
highway
residential
possible_locations = geolocator.geocode("East Daniel Street", exactly_one=False)
for location in possible_locations:
    print(location.address)
East Daniel Street, Bloomington, Monroe County, Indiana, 47401, United States
East Daniel Street, Campustown, Champaign, Champaign County, Illinois, 61820, United States
East Daniel Street, Seniorland, Champaign, Champaign County, Illinois, 61820, United States
East Daniel Street, Collinwood, Wayne County, Middle Tennessee, Tennessee, 38450, United States
East Daniel Street, Hinton, Boone County, Missouri, United States
East Daniel Street, Greene County, Missouri, 65803, United States
East Daniel Street, Uvalde, Uvalde County, Texas, 78801, United States
East Daniel Street, North Uvalde Colonia, Uvalde, Uvalde County, Texas, 78801, United States
East Daniel Street, Albany, Gentry County, Missouri, 64402, United States
East Daniel Street, Silver City, Grant County, New Mexico, 88062, United States
Illini_Union = geolocator.geocode('Illini Union')
Illini_Union.address
'Illini Union, 1401, West Green Street, Urbana, Champaign County, Illinois, 61801, United States'
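geopy can also run the lookup in the other direction: geolocator.reverse takes a coordinate pair and returns the nearest address. A quick sketch (not from the original notebook) using the Illini Union coordinates we just resolved; the exact output depends on current OpenStreetMap data:
# Reverse geocoding: coordinates in, nearest address out
rev = geolocator.reverse(f"{Illini_Union.latitude}, {Illini_Union.longitude}")
print(rev.address)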
# Create a dataframe to store the geographical locations of a list of places in Champaign.
# The dataframe "Champaign_df" should contain 6 columns:
# "Place", "Address" (obtained by ".address"), "Latitude", "Longitude", "Class", "Type" (obtained by corresponding keys in ".raw")
Champaign_places = ['Foellinger Auditorium', 'Altgeld Hall', 'Krannert Center for the Performing Arts',
'University of Illinois Urbana-Champaign University Library', 'Japan House']
geolocator = Nominatim(user_agent="champaign_locator")
place_list = []
address_list = []
latitude_list = []
longitude_list = []
class_list = []
type_list = []
Cham = geolocator.geocode('Champaign, Illinois')
for place in Champaign_places:
    location = geolocator.geocode(place)
    place_list.append(place)
    address_list.append(location.address)
    latitude_list.append(location.latitude)
    longitude_list.append(location.longitude)
    location_raw = location.raw
    class_list.append(location_raw.get('class'))
    type_list.append(location_raw.get('type'))
Champaign_df = pd.DataFrame({
    'Place': place_list,
    'Address': address_list,
    'Latitude': latitude_list,
    'Longitude': longitude_list,
    'Class': class_list,
    'Type': type_list
})
print(Champaign_df)
Place \
0 Foellinger Auditorium
1 Altgeld Hall
2 Krannert Center for the Performing Arts
3 University of Illinois Urbana-Champaign Univer...
4 Japan House
Address Latitude Longitude \
0 Foellinger Auditorium, 709, South Mathews Aven... 40.105941 -88.227182
1 Altgeld Hall, 1409, West Green Street, Urbana,... 40.109328 -88.228328
2 Krannert Center for the Performing Arts, 500, ... 40.108011 -88.222528
3 Main Library, 1408, West Gregory Drive, Urbana... 40.104681 -88.228990
4 Japan House, 101-111, Kensington High Street, ... 51.501350 -0.191682
Class Type
0 amenity theatre
1 building university
2 amenity arts_centre
3 amenity library
4 amenity community_centre
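Notice that "Japan House" resolved to Kensington High Street in London (latitude 51.5), not the Japan House on the UIUC campus: short, ambiguous names go to whichever match Nominatim ranks highest. Adding geographic context to the query string usually disambiguates; a sketch, where the exact suffix is my assumption:
# Hypothetical fix for the ambiguous "Japan House" hit: add context to the query
location = geolocator.geocode("Japan House, Urbana, Illinois")
if location is not None:  # geocode returns None when nothing matches
    print(location.address, location.latitude, location.longitude)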
!pip install folium
Requirement already satisfied: folium in c:\users\colto\anaconda3\lib\site-packages (0.14.0)
Requirement already satisfied: requests in c:\users\colto\anaconda3\lib\site-packages (from folium) (2.27.1)
Requirement already satisfied: branca>=0.6.0 in c:\users\colto\anaconda3\lib\site-packages (from folium) (0.6.0)
Requirement already satisfied: numpy in c:\users\colto\anaconda3\lib\site-packages (from folium) (1.21.5)
Requirement already satisfied: jinja2>=2.9 in c:\users\colto\anaconda3\lib\site-packages (from folium) (2.11.3)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\colto\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.0.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\colto\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
import folium
Champaign = geolocator.geocode('Champaign')
Champaign
Location(Champaign, Champaign County, Illinois, United States, (40.1164841, -88.2430932, 0.0))
# Create a map centered on Champaign
champaign_map = folium.Map(location=[Champaign.latitude, Champaign.longitude], zoom_start=14)
champaign_map
# Add a marker
folium.Marker(location=[Illini_Union.raw['lat'], Illini_Union.raw['lon']], tooltip='click me',
              popup="Illini Union").add_to(champaign_map)
champaign_map
champaign_map.save("Data/Champaign-map.html")
# First, re-create champaign_map to drop the Illini Union marker added above.
# Then, for each place in the Champaign_df created for Task 1, add a Marker whose location comes from the
# Latitude and Longitude values in the DataFrame and whose popup is the place name from the DataFrame.
champaign_map = folium.Map(location=[Cham.latitude, Cham.longitude], zoom_start=14)
for i in range(len(Champaign_df)):
    folium.Marker(
        location=[Champaign_df['Latitude'][i], Champaign_df['Longitude'][i]],
        popup=Champaign_df['Place'][i],
    ).add_to(champaign_map)
champaign_map
# Scottish witchcraft dataset
# (http://witches.hca.ed.ac.uk/#:~:text=The%20database%20contains%20all%20people,to%20social%20and%20cultural%20history)
df = pd.read_csv('data/accused_witches.csv')
df
| | AccusedRef | AccusedSystemId | AccusedID | FirstName | LastName | M_Firstname | M_Surname | Alias | Patronymic | DesTitle | ... | SocioecStatus | Occupation | Notes | Createdby | Createdate | Lastupdatedby | Lastupdatedon | Family of Accused | Cases | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A/EGD/10 | EGD | 10.0 | Mareon | Quheitt | Marion | White | NaN | NaN | NaN | ... | NaN | NaN | NaN | SMD | 5/15/01 11:06 | jhm | 8/9/02 11:40 | NaN | C/EGD/21 | 3/4/1661 |
| 1 | A/EGD/100 | EGD | 100.0 | Thom | Cockburn | Thomas | Cockburn | NaN | NaN | NaN | ... | NaN | NaN | NaN | SMD | 5/15/01 11:06 | jhm | 10/2/02 10:32 | NaN | C/EGD/111 | 1591 |
| 2 | A/EGD/1000 | EGD | 1000.0 | Christian | Aitkenhead | Christine | Aikenhead | NaN | NaN | NaN | ... | NaN | NaN | NaN | SMD | 5/15/01 11:06 | jhm | 10/1/02 10:48 | AF/LA/150 | C/EGD/1011 | 6/5/1628 |
| 3 | A/EGD/1001 | EGD | 1001.0 | Janet | Ireland | Janet | Ireland | NaN | NaN | NaN | ... | NaN | NaN | NaN | SMD | 5/15/01 11:06 | jhm | 10/1/02 10:49 | AF/LA/151 | C/EGD/1012 | 6/5/1628 |
| 4 | A/EGD/1002 | EGD | 1002.0 | Agnes | Hendersoun | Agnes | Henderson | NaN | NaN | NaN | ... | NaN | NaN | NaN | SMD | 5/15/01 11:06 | jhm | 10/1/02 10:50 | NaN | C/EGD/1013 | 3/7/1628 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4378 | 20/10/1637 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4379 | 17/2/1642 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4380 | 24/10/1628 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4381 | 25/10/1651 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4382 | 4/1568 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4383 rows × 36 columns
# Count the number of witches accused in each county
county_witches = df.groupby("Res_county")["AccusedRef"].count()
county_witches
Res_county
Aberdeen         175
Argyll             6
Ayr              153
Banff              9
Berwick          126
Bute              54
Caithness         52
Clackmannan       18
Cromarty           2
Dumfries          78
Dunbarton         25
Edinburgh        374
Elgin             28
Fife             382
Forfar            82
Haddington       543
Inverness         45
Kincardine         2
Kinross            7
Kirkcudbright     35
Lanark            77
Linlithgow       114
Nairn             55
Orkney            72
Peebles           91
Perth            109
Renfrew          124
Ross              74
Roxburgh          60
Selkirk           21
Shetland          28
Stirling          53
Sutherland        15
Wigtown           15
Name: AccusedRef, dtype: int64
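For a simple tally like this, pandas' value_counts is a broadly equivalent one-liner (assuming AccusedRef has no missing values in these rows); the main difference is that it sorts by count rather than by county name:
# Broadly equivalent tally, sorted by frequency instead of alphabetically
df["Res_county"].value_counts()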
# Obtain the latitude and longitude of each county by taking the median of all latitude and longitude records
# under that county's name. The median is chosen so that incorrect records are ignored
# (bad coordinates can only appear as outliers, which do not influence the median),
# and convert the results to dictionaries.
county_latitude = df.groupby("Res_county")["latitude"].median()
county_longitude = df.groupby("Res_county")["longitude"].median()
county_latitude_dict = county_latitude.to_dict()
county_longitude_dict = county_longitude.to_dict()
print(county_latitude_dict, county_longitude_dict)
{'Aberdeen': 57.146262, 'Argyll': 56.4006, 'Ayr': 55.457393, 'Banff': 57.666252, 'Berwick': 55.781124, 'Bute': 55.8274, 'Caithness': 58.4389, 'Clackmannan': 56.108159, 'Cromarty': 57.6806, 'Dumfries': 55.068632, 'Dunbarton': 56.83301, 'Edinburgh': 55.92826, 'Elgin': 57.648022, 'Fife': 56.253397, 'Forfar': 56.639836, 'Haddington': 55.955755, 'Inverness': 57.479549, 'Kincardine': 56.068975, 'Kinross': 56.211141, 'Kirkcudbright': 54.835888, 'Lanark': 55.674898, 'Linlithgow': 55.9716, 'Nairn': 57.585033, 'Orkney': 58.9809, 'Peebles': 55.651467, 'Perth': 56.395704, 'Renfrew': 55.874645, 'Ross': 56.83301, 'Roxburgh': nan, 'Selkirk': 55.548073, 'Shetland': 60.305229, 'Stirling': 56.281067, 'Sutherland': 58.249999, 'Wigtown': 54.868426} {'Aberdeen': -2.136575, 'Argyll': -5.4807, 'Ayr': -4.628716, 'Banff': -2.52426, 'Berwick': -2.011552, 'Bute': -5.0936, 'Caithness': -3.0937, 'Clackmannan': -3.747183, 'Cromarty': -4.0347, 'Dumfries': -3.608237, 'Dunbarton': -4.180209, 'Edinburgh': -3.275582, 'Elgin': -3.320025, 'Fife': -3.134239, 'Forfar': -2.893115, 'Haddington': -2.783795, 'Inverness': -4.237208, 'Kincardine': -3.715198, 'Kinross': -3.425103, 'Kirkcudbright': -4.049153, 'Lanark': -3.773292, 'Linlithgow': -3.6026, 'Nairn': -3.869079, 'Orkney': -2.9605, 'Peebles': -3.191026, 'Perth': -3.435877, 'Renfrew': -4.389173, 'Ross': -4.180209, 'Roxburgh': nan, 'Selkirk': -2.839351, 'Shetland': -1.294066, 'Stirling': -4.436736, 'Sutherland': -4.499998, 'Wigtown': -4.442783}
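To see why the comment above prefers the median, here is a toy example with invented numbers: one corrupted latitude drags the mean far off, while the median barely moves:
# Toy illustration: one bad record pulls the mean, not the median
s = pd.Series([55.9, 55.9, 56.0, 0.0])  # 0.0 stands in for a hypothetical bad latitude
print(s.mean())    # 41.95 -- pulled toward the outlier
print(s.median())  # 55.9  -- essentially unaffected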
# Create a dataframe to record the county names, number of cases, and their latitudes and longitudes
county_name = []
county_witch_num = []
county_latitude = []
county_longitude = []
for i in range(len(county_witches)):
    current_county_name = county_witches.index[i]
    county_name.append(current_county_name)
    county_witch_num.append(county_witches.iloc[i])  # positional access; plain [i] on a Series is deprecated
    county_latitude.append(county_latitude_dict[current_county_name])
    county_longitude.append(county_longitude_dict[current_county_name])
data_df = pd.DataFrame({
    'name': county_name,
    'case number': county_witch_num,
    'latitude': county_latitude,
    'longitude': county_longitude,
})
data_df['case number'] = data_df['case number'].astype(float)
data_df = data_df.dropna().reset_index(drop=True)  # drop counties with NaN values (here Roxburgh, whose coordinates are missing) and reset the index
data_df
| | name | case number | latitude | longitude |
|---|---|---|---|---|
| 0 | Aberdeen | 175.0 | 57.146262 | -2.136575 |
| 1 | Argyll | 6.0 | 56.400600 | -5.480700 |
| 2 | Ayr | 153.0 | 55.457393 | -4.628716 |
| 3 | Banff | 9.0 | 57.666252 | -2.524260 |
| 4 | Berwick | 126.0 | 55.781124 | -2.011552 |
| 5 | Bute | 54.0 | 55.827400 | -5.093600 |
| 6 | Caithness | 52.0 | 58.438900 | -3.093700 |
| 7 | Clackmannan | 18.0 | 56.108159 | -3.747183 |
| 8 | Cromarty | 2.0 | 57.680600 | -4.034700 |
| 9 | Dumfries | 78.0 | 55.068632 | -3.608237 |
| 10 | Dunbarton | 25.0 | 56.833010 | -4.180209 |
| 11 | Edinburgh | 374.0 | 55.928260 | -3.275582 |
| 12 | Elgin | 28.0 | 57.648022 | -3.320025 |
| 13 | Fife | 382.0 | 56.253397 | -3.134239 |
| 14 | Forfar | 82.0 | 56.639836 | -2.893115 |
| 15 | Haddington | 543.0 | 55.955755 | -2.783795 |
| 16 | Inverness | 45.0 | 57.479549 | -4.237208 |
| 17 | Kincardine | 2.0 | 56.068975 | -3.715198 |
| 18 | Kinross | 7.0 | 56.211141 | -3.425103 |
| 19 | Kirkcudbright | 35.0 | 54.835888 | -4.049153 |
| 20 | Lanark | 77.0 | 55.674898 | -3.773292 |
| 21 | Linlithgow | 114.0 | 55.971600 | -3.602600 |
| 22 | Nairn | 55.0 | 57.585033 | -3.869079 |
| 23 | Orkney | 72.0 | 58.980900 | -2.960500 |
| 24 | Peebles | 91.0 | 55.651467 | -3.191026 |
| 25 | Perth | 109.0 | 56.395704 | -3.435877 |
| 26 | Renfrew | 124.0 | 55.874645 | -4.389173 |
| 27 | Ross | 74.0 | 56.833010 | -4.180209 |
| 28 | Selkirk | 21.0 | 55.548073 | -2.839351 |
| 29 | Shetland | 28.0 | 60.305229 | -1.294066 |
| 30 | Stirling | 53.0 | 56.281067 | -4.436736 |
| 31 | Sutherland | 15.0 | 58.249999 | -4.499998 |
| 32 | Wigtown | 15.0 | 54.868426 | -4.442783 |
data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   name         33 non-null     object
 1   case number  33 non-null     float64
 2   latitude     33 non-null     float64
 3   longitude    33 non-null     float64
dtypes: float64(3), object(1)
memory usage: 1.2+ KB
# Get the location of Scotland
Scotland = geolocator.geocode('Scotland')
Scotland
Location(Alba / Scotland, United Kingdom, (56.7861112, -4.1140518, 0.0))
# Create the map of Scotland. zoom_start is smaller than for the Champaign map because we want to cover a larger area
Scotland_map = folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']], zoom_start=6)
Scotland_map
# Add circles to the map, each representing the number of accused witches in one county
for i in range(len(data_df)):
    folium.Circle(location=(data_df['latitude'][i], data_df['longitude'][i]),
                  radius=data_df['case number'][i]*100,
                  tooltip=data_df['name'][i]).add_to(Scotland_map)
Scotland_map
# Recreate the map of Scottish witchcraft. First, reload the map centered on the Scottish capital, Edinburgh,
# and set zoom_start to 7 to focus on the area surrounding the city. Then add the data of each county to the new map,
# but with the radius set to 1000 * the square root of the case number (obtained by importing "math" and using "math.sqrt")
# to get a smoothed result: scaling the radius by the square root makes each circle's area, rather than its radius,
# proportional to the case count, so the busiest counties no longer dwarf everything else.
# Besides, add a popup in the format "County_name Case Number: number"
# (e.g., Edinburgh Case Number: 374.0), and set "fill" to True. Finally, display the new map.
import math
Edinburgh = geolocator.geocode('Edinburgh, Scotland')
Scotland_map = folium.Map(location=[Edinburgh.latitude, Edinburgh.longitude], zoom_start=7)
for i in range(len(data_df)):
    radius = 1000 * math.sqrt(data_df['case number'][i])
    popup_text = f"{data_df['name'][i]} Case Number: {data_df['case number'][i]}"
    folium.Circle(location=(data_df['latitude'][i], data_df['longitude'][i]),
                  radius=radius, popup=popup_text, fill=True).add_to(Scotland_map)
Scotland_map
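A quick sanity check on that square-root scaling, using the largest and smallest counts in data_df: Haddington (543 cases) has about 272 times as many cases as Cromarty (2), but its circle's radius is only about 16.5 times larger.
# Sanity check of the radius scaling (counts taken from data_df above)
print(543 / 2)                        # ~271.5x more cases in Haddington than Cromarty
print(math.sqrt(543) / math.sqrt(2))  # ~16.5x larger circle radius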
# Using tiles in folium
folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']], tiles = 'cartodbpositron', zoom_start=6)
# Using external background
folium.Map(location=[Scotland.raw['lat'], Scotland.raw['lon']],
zoom_start=4,
tiles='http://services.arcgisonline.com/arcgis/rest/services/NatGeo_World_Map/MapServer/tile/{z}/{y}/{x}',
attr="Sources: National Geographic, Esri, Garmin, HERE, UNEP-WCMC, USGS, NASA, ESA, METI, NRCAN, GEBCO, NOAA, INCREMENT P")
# And this doesn't necessarily need to be the real world...
folium.Map(location=[0, 30],
zoom_start=4, min_zoom=4, max_zoom=10,
max_bounds=True,
min_lon=0, max_lon=70, min_lat=-40, max_lat=40,
tiles='https://cartocdn-gusc.global.ssl.fastly.net//ramirocartodb/api/v1/map/named/tpl_756aec63_3adb_48b6_9d14_331c6cbc47cf/all/{z}/{x}/{y}.png',
attr='Textures and Icons from https://www.textures.com/ & https://thenounproject.com/')
# US states geojson file (obtained from https://github.com/python-visualization/folium/blob/main/examples/data/us-states.json)
US_states = "Data/us-states.json"
# Include the US state boundaries in the US map
US_map = folium.Map(location=[42, -102], zoom_start=4)
folium.Choropleth(
    geo_data = US_states,
).add_to(US_map)
US_map
# US unemployment rate data (obtained from https://www.kaggle.com/datasets/aniruddhasshirahatti/us-unemployment-dataset-2010-2020?resource=download)
US_unemployment = pd.read_csv("Data/unemployment_data_us_state.csv")
US_unemployment
| | State | Unemployment_Rate_Jan_20 | Unemployment_Rate_Feb_20 | Unemployment_Rate_Mar_20 |
|---|---|---|---|---|
| 0 | Alabama | 2.7 | 2.7 | 3.5 |
| 1 | Alaska | 6.0 | 5.8 | 5.6 |
| 2 | Arizona | 4.5 | 4.5 | 5.5 |
| 3 | Arkansas | 3.5 | 3.9 | 4.8 |
| 4 | California | 3.9 | 2.5 | 5.3 |
| 5 | Colorado | 2.5 | 2.8 | 4.5 |
| 6 | Connecticut | 3.7 | 3.9 | 3.7 |
| 7 | Delaware | 4.0 | 5.2 | 5.1 |
| 8 | D.C. | 5.2 | 2.8 | 6.0 |
| 9 | Florida | 2.8 | 3.1 | 4.3 |
| 10 | Georgia | 3.1 | 2.7 | 4.2 |
| 11 | Hawaii | 2.7 | 2.7 | 2.6 |
| 12 | Idaho | 2.8 | 3.4 | 2.6 |
| 13 | Illinois | 3.5 | 3.1 | 4.6 |
| 14 | Indiana | 3.1 | 2.8 | 3.2 |
| 15 | Iowa | 2.8 | 3.1 | 3.7 |
| 16 | Kansas | 3.1 | 4.2 | 3.1 |
| 17 | Kentucky | 4.3 | 5.2 | 5.8 |
| 18 | Louisiana | 5.1 | 3.2 | 6.1 |
| 19 | Maine | 3.1 | 3.3 | 3.2 |
| 20 | Maryland | 3.3 | 2.8 | 3.3 |
| 21 | Massachusetts | 2.8 | 3.6 | 2.9 |
| 22 | Michigan | 3.8 | 3.1 | 4.1 |
| 23 | Minnesota | 3.2 | 5.4 | 3.1 |
| 24 | Mississippi | 5.5 | 3.5 | 5.3 |
| 25 | Missouri | 3.5 | 3.5 | 4.5 |
| 26 | Montana | 3.5 | 2.9 | 3.5 |
| 27 | Nebraska | 3.9 | 3.6 | 4.2 |
| 28 | Nevada | 3.6 | 3.6 | 6.3 |
| 29 | New Hampshire | 2.6 | 3.8 | 2.6 |
| 30 | New Jersey | 3.8 | 4.8 | 3.8 |
| 31 | New Mexico | 4.8 | 3.7 | 5.9 |
| 32 | New York | 3.8 | 3.6 | 4.5 |
| 33 | North Carolina | 3.6 | 2.2 | 4.4 |
| 34 | North Dakota | 2.3 | 4.2 | 2.2 |
| 35 | Ohio | 4.1 | 3.2 | 5.5 |
| 36 | Oklahoma | 3.3 | 3.3 | 3.1 |
| 37 | Oregon | 3.3 | 4.7 | 3.3 |
| 38 | Pennsylvania | 4.7 | 3.4 | 6.0 |
| 39 | Rhode Island | 3.1 | 2.5 | 4.6 |
| 40 | South Carolina | 2.4 | 3.3 | 2.6 |
| 41 | South Dakota | 3.4 | 3.4 | 3.3 |
| 42 | Tennessee | 3.3 | 3.5 | 3.5 |
| 43 | Texas | 4.5 | 2.5 | 4.7 |
| 44 | Utah | 2.5 | 3.4 | 3.6 |
| 45 | Vermont | 2.4 | 2.6 | 3.2 |
| 46 | Virginia | 2.7 | 3.8 | 3.3 |
| 47 | Washington | 3.9 | 4.9 | 5.1 |
| 48 | West Virginia | 5.0 | 3.7 | 6.1 |
| 49 | Wisconsin | 3.5 | 2.7 | 3.4 |
| 50 | Wyoming | 3.7 | 5.8 | 3.7 |
# Change column name to match the geojson data
US_unemployment = US_unemployment.rename(columns={'State': 'name'})
US_unemployment
| | name | Unemployment_Rate_Jan_20 | Unemployment_Rate_Feb_20 | Unemployment_Rate_Mar_20 |
|---|---|---|---|---|
| 0 | Alabama | 2.7 | 2.7 | 3.5 |
| 1 | Alaska | 6.0 | 5.8 | 5.6 |
| 2 | Arizona | 4.5 | 4.5 | 5.5 |
| 3 | Arkansas | 3.5 | 3.9 | 4.8 |
| 4 | California | 3.9 | 2.5 | 5.3 |
| 5 | Colorado | 2.5 | 2.8 | 4.5 |
| 6 | Connecticut | 3.7 | 3.9 | 3.7 |
| 7 | Delaware | 4.0 | 5.2 | 5.1 |
| 8 | D.C. | 5.2 | 2.8 | 6.0 |
| 9 | Florida | 2.8 | 3.1 | 4.3 |
| 10 | Georgia | 3.1 | 2.7 | 4.2 |
| 11 | Hawaii | 2.7 | 2.7 | 2.6 |
| 12 | Idaho | 2.8 | 3.4 | 2.6 |
| 13 | Illinois | 3.5 | 3.1 | 4.6 |
| 14 | Indiana | 3.1 | 2.8 | 3.2 |
| 15 | Iowa | 2.8 | 3.1 | 3.7 |
| 16 | Kansas | 3.1 | 4.2 | 3.1 |
| 17 | Kentucky | 4.3 | 5.2 | 5.8 |
| 18 | Louisiana | 5.1 | 3.2 | 6.1 |
| 19 | Maine | 3.1 | 3.3 | 3.2 |
| 20 | Maryland | 3.3 | 2.8 | 3.3 |
| 21 | Massachusetts | 2.8 | 3.6 | 2.9 |
| 22 | Michigan | 3.8 | 3.1 | 4.1 |
| 23 | Minnesota | 3.2 | 5.4 | 3.1 |
| 24 | Mississippi | 5.5 | 3.5 | 5.3 |
| 25 | Missouri | 3.5 | 3.5 | 4.5 |
| 26 | Montana | 3.5 | 2.9 | 3.5 |
| 27 | Nebraska | 3.9 | 3.6 | 4.2 |
| 28 | Nevada | 3.6 | 3.6 | 6.3 |
| 29 | New Hampshire | 2.6 | 3.8 | 2.6 |
| 30 | New Jersey | 3.8 | 4.8 | 3.8 |
| 31 | New Mexico | 4.8 | 3.7 | 5.9 |
| 32 | New York | 3.8 | 3.6 | 4.5 |
| 33 | North Carolina | 3.6 | 2.2 | 4.4 |
| 34 | North Dakota | 2.3 | 4.2 | 2.2 |
| 35 | Ohio | 4.1 | 3.2 | 5.5 |
| 36 | Oklahoma | 3.3 | 3.3 | 3.1 |
| 37 | Oregon | 3.3 | 4.7 | 3.3 |
| 38 | Pennsylvania | 4.7 | 3.4 | 6.0 |
| 39 | Rhode Island | 3.1 | 2.5 | 4.6 |
| 40 | South Carolina | 2.4 | 3.3 | 2.6 |
| 41 | South Dakota | 3.4 | 3.4 | 3.3 |
| 42 | Tennessee | 3.3 | 3.5 | 3.5 |
| 43 | Texas | 4.5 | 2.5 | 4.7 |
| 44 | Utah | 2.5 | 3.4 | 3.6 |
| 45 | Vermont | 2.4 | 2.6 | 3.2 |
| 46 | Virginia | 2.7 | 3.8 | 3.3 |
| 47 | Washington | 3.9 | 4.9 | 5.1 |
| 48 | West Virginia | 5.0 | 3.7 | 6.1 |
| 49 | Wisconsin | 3.5 | 2.7 | 3.4 |
| 50 | Wyoming | 3.7 | 5.8 | 3.7 |
# Visualize based on unemployment rate of January 2020
US_map = folium.Map(location=[42, -102], zoom_start=4)
folium.Choropleth(
    geo_data = US_states,                            # GeoJSON boundaries to draw
    data = US_unemployment,                          # Data used for visualization
    columns = ['name', 'Unemployment_Rate_Jan_20'],  # First column is the key to match, second column is the value to display
    key_on = 'feature.properties.name',              # The matching key in geo_data
    fill_color = 'OrRd',                             # Select a color scheme
    line_opacity = 0.2,                              # Select line opacity
    legend_name = 'Unemployment Rate by State in January 2020',  # Choose a name for the legend
).add_to(US_map)
US_map
# Add tooltip to display the state names
tooltip = folium.features.GeoJson(
    US_states,
    tooltip=folium.features.GeoJsonTooltip(fields=['name'], localize=True)
)
US_map.add_child(tooltip)
US_map
# Recreate the map of US unemployment. First, create a column 'Unemployment_Rate_2020_spring' in the DataFrame, as the average
# of the unemployment rates of January, February, and March. Then, reload the map using the geocode of "USA" from the geolocator.
# Next, create a folium.Choropleth, setting its values to the newly created column Unemployment_Rate_2020_spring, fill_color to
# "GnBu" (green and blue), and line_opacity to 0.3. Also, change the legend name to reflect the change of data.
# Finally, add the state names as tooltips, and display the new map.
US_unemployment['Unemployment_Rate_2020_spring'] = (US_unemployment['Unemployment_Rate_Jan_20'] +
                                                    US_unemployment['Unemployment_Rate_Feb_20'] +
                                                    US_unemployment['Unemployment_Rate_Mar_20']) / 3
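Pandas can compute the same row-wise average in one call; a sketch, assuming only these three monthly columns should be averaged:
# Equivalent row-wise mean over the three monthly columns
month_cols = ['Unemployment_Rate_Jan_20', 'Unemployment_Rate_Feb_20', 'Unemployment_Rate_Mar_20']
US_unemployment['Unemployment_Rate_2020_spring'] = US_unemployment[month_cols].mean(axis=1)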
USA = geolocator.geocode('USA')
US_map = folium.Map(location=[USA.latitude, USA.longitude], zoom_start=4)
folium.Choropleth(
    geo_data = US_states,
    data = US_unemployment,
    columns = ['name', 'Unemployment_Rate_2020_spring'],
    key_on = 'feature.properties.name',
    fill_color = 'GnBu',
    line_opacity = 0.3,
    legend_name = 'Unemployment Rate spring 2020',
).add_to(US_map)
tooltip = folium.features.GeoJson(
    US_states,
    tooltip=folium.features.GeoJsonTooltip(fields=['name'], localize=True)
)
US_map.add_child(tooltip)
US_map
These tasks are designed to enhance your critical analysis and interpretation skills when dealing with visualizations.
Write a short paragraph in the following cell for Task 3 drawing conclusions from the visualization of the geographical distribution of witchcraft cases in Scotland, and offer some assumptions regarding the reasons behind this distribution.
Then, write a separate paragraph in the next cell for Task 4 examining areas in the US with higher or lower unemployment rates in spring 2020, again with assumptions regarding the reasons for that distribution.
Finally, remember to convert the cell type of both cells to "markdown" and execute them.
For the distribution of witchcraft cases in Scotland, we can see from the convergence of major roads and rivers that the largest clusters sit in cities; one of them is certainly the capital, Edinburgh, so this makes sense. The increase in cases in these areas would be due to their higher populations compared with the rural countryside of Scotland: where people congregate, altercations and accusations of witchcraft follow. There may also have been more individuals interested in practicing witchcraft there, though many of these cases were probably false accusations. Either way, these larger urban populations are why we see more cases in Edinburgh, its surrounding areas, and Glasgow.
The reasons for this distribution may be more nuanced. The areas with the highest unemployment rates were states like West Virginia, Virginia, Arizona, Mississippi, Louisiana, and Alaska. With Arizona and Virginia as outliers, these tend to be fairly rural states, and West Virginia and Mississippi are among the poorest in the country. Because they are so poor and lack urban development, there may be fewer job opportunities: businesses do not want to set up in these states, which also have lower populations. This also pertains to Alaska, which lacks population and, given its isolated location, cannot benefit from interstate commerce the way other states can. We can see support for this trend in that the most urban, wealthy, and populous states, such as California, Texas, New York, and Illinois, all fall in the lower portion of states suffering from unemployment.
Reference for some of my state assumptions: https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_GDP#:~:text=GDP%20per%20capita%20also%20varied,recorded%20the%20three%20lowest%20GDP