Can someone please help, I’ve tried searching the internet how to do what I need, looking at my course materials, and using chat gpt. I have to use MongoDB Compass to hold my csv data. I made a database called energy_data, within that are the following collections: biomass, geothermal, hydropower, solar, and wind. In those collections the data is columns of years from 2000 to 2021 and the rows are the 50 states. I have the following code to input the data into a map using python and jupyter notebook and the folium library.
import requests
import pandas as pd
import folium
from ipywidgets import widgets
from IPython.display import display
def load_energy_data(file_path, energy_type):
data = pd.read_csv(file_path)
data.set_index(‘State’, inplace=True)
data.rename(columns=lambda x: f"{energy_type}_{x}", inplace=True)
return data
biomass_data = load_energy_data(‘biomass_energy_consump2.csv’, ‘Biomass’)
geothermal_data = load_energy_data(‘geothermal_energy_consump2.csv’, ‘Geothermal’)
hydro_data = load_energy_data(‘hydropower_energy_consump2.csv’, ‘Hydropower’)
solar_data = load_energy_data(‘solar_energy_consump2.csv’, ‘Solar’)
wind_data = load_energy_data(‘wind_energy_consump2.csv’, ‘Wind’)
df_pivoted = pd.concat([biomass_data, geothermal_data, hydro_data, solar_data, wind_data], axis=1)
kw = {“location”: [41.52, -90.58], “zoom_start”: 5}
def create_map():
return folium.Map(**kw)
midwest_map = create_map()
out = widgets.Output(layout={‘border’: ‘1px solid black’})
midwest_dropdown = widgets.Dropdown(
options=[str(year) for year in df_pivoted.columns],
value=str(df_pivoted.columns[0]),
description=‘Year:’,
disabled=False,
)
def update_map(year):
global midwest_map
midwest_map = create_map()
states = requests.get(
"https://raw.githubusercontent.com/python-visualization/folium-example-data/main/us_states.json"
).json()
ids = ['IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'MO', 'NE', 'ND', 'OH', 'SD', 'WI']
filtered = [feature for feature in states['features'] if feature['id'] in ids]
for f in filtered:
state_id = f['id']
consumption_value = df_pivoted.loc[state_id, year] if state_id in df_pivoted.index else "Data not available"
f['properties']['consumption'] = consumption_value
folium.GeoJson(
data={'type': 'FeatureCollection', 'features': filtered},
name="US States",
style_function=lambda x: {'fillColor': 'green', 'color': 'black', 'weight': 2, "dashArray": "5, 5"},
highlight_function=lambda x: {'weight': 3, 'color': 'green'},
tooltip=folium.GeoJsonTooltip(fields=['name'], aliases=['State:']),
popup=folium.GeoJsonPopup(fields=['name', 'consumption'], aliases=['State:', 'Consumption in BTUs:'])
).add_to(midwest_map)
with out:
display(midwest_map)
def on_dropdown_change(change):
out.clear_output(wait=True)
year = change[‘new’]
update_map(year)
midwest_dropdown.observe(on_dropdown_change, names=‘value’)
display(midwest_dropdown)
display(out)
update_map(midwest_dropdown.value)
when I run this code it works exactly how I need it to. The drop down menu gives you options for each energytype_year. ex. biomass_2000
But this data import is using pandas and for the sake of the project and “repeatability” factor, my instructor wants us to import the data into MongoDB Compass. As stated above I made the database and collections. However when I try to join them, every attempt fails. This is an example of help I received from chatgpt but the when I click on the map, it reads “Data note available”
import pandas as pd
from pymongo import MongoClient
Connect to MongoDB
client = MongoClient(‘mongodb://localhost:27017/’) # Replace ‘localhost’ with your MongoDB host
db = client.energy_data # Select the ‘energy_data’ database
def load_energy_data(collection_name, energy_type, year):
# Retrieve data from MongoDB collection excluding the _id field
data = list(db[collection_name].find({}, {‘_id’: 0}))
# Convert data to DataFrame
df = pd.DataFrame(data)
# Add ‘EnergyType’ and ‘Year’ columns
df[‘EnergyType’] = energy_type
df[‘Year’] = year
return df
Load energy data for each type
biomass_data = load_energy_data(‘biomass’, ‘Biomass’, 2000)
geothermal_data = load_energy_data(‘geothermal’, ‘Geothermal’, 2000)
hydropower_data = load_energy_data(‘hydropower’, ‘Hydropower’, 2000)
solar_data = load_energy_data(‘solar’, ‘Solar’, 2000)
wind_data = load_energy_data(‘wind’, ‘Wind’, 2000)
Concatenate DataFrames along rows axis
dfs = [biomass_data, geothermal_data, hydropower_data, solar_data, wind_data]
df_concatenated = pd.concat(dfs, axis=0, ignore_index=True)
Pivot the DataFrame to combine years with energy types
df_pivoted = df_concatenated.pivot_table(index=‘State’, columns=[‘EnergyType’, ‘Year’], aggfunc=‘first’)
Flatten the multi-index columns
df_pivoted.columns = [‘‘.join(map(str, col)).strip(’’) for col in df_pivoted.columns.values]
Reset index to make ‘State’ a column again
df_pivoted.reset_index(inplace=True)
Display the pivoted DataFrame
print(df_pivoted)
I feel like I am close because it does give some semblance of what the pandas concatenated data looked like, but not completely.
Please help!
-Frustrated Data Analyst Student