School Project Code

Michael_Hesprich · February 8, 2024, 6:01am

Can someone please help, I’ve tried searching the internet how to do what I need, looking at my course materials, and using chat gpt. I have to use MongoDB Compass to hold my csv data. I made a database called energy_data, within that are the following collections: biomass, geothermal, hydropower, solar, and wind. In those collections the data is columns of years from 2000 to 2021 and the rows are the 50 states. I have the following code to input the data into a map using python and jupyter notebook and the folium library.

import requests
import pandas as pd
import folium
from ipywidgets import widgets
from IPython.display import display

def load_energy_data(file_path, energy_type):
data = pd.read_csv(file_path)
data.set_index(‘State’, inplace=True)
data.rename(columns=lambda x: f"{energy_type}_{x}", inplace=True)
return data

biomass_data = load_energy_data(‘biomass_energy_consump2.csv’, ‘Biomass’)
geothermal_data = load_energy_data(‘geothermal_energy_consump2.csv’, ‘Geothermal’)
hydro_data = load_energy_data(‘hydropower_energy_consump2.csv’, ‘Hydropower’)
solar_data = load_energy_data(‘solar_energy_consump2.csv’, ‘Solar’)
wind_data = load_energy_data(‘wind_energy_consump2.csv’, ‘Wind’)

df_pivoted = pd.concat([biomass_data, geothermal_data, hydro_data, solar_data, wind_data], axis=1)

kw = {“location”: [41.52, -90.58], “zoom_start”: 5}

def create_map():
return folium.Map(**kw)

midwest_map = create_map()

out = widgets.Output(layout={‘border’: ‘1px solid black’})

midwest_dropdown = widgets.Dropdown(
options=[str(year) for year in df_pivoted.columns],
value=str(df_pivoted.columns[0]),
description=‘Year:’,
disabled=False,
)

def update_map(year):
global midwest_map
midwest_map = create_map()

states = requests.get(
    "https://raw.githubusercontent.com/python-visualization/folium-example-data/main/us_states.json"
).json()

ids = ['IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'MO', 'NE', 'ND', 'OH', 'SD', 'WI']
filtered = [feature for feature in states['features'] if feature['id'] in ids]
for f in filtered:
    state_id = f['id']
    consumption_value = df_pivoted.loc[state_id, year] if state_id in df_pivoted.index else "Data not available"
    f['properties']['consumption'] = consumption_value

folium.GeoJson(
    data={'type': 'FeatureCollection', 'features': filtered},
    name="US States",
    style_function=lambda x: {'fillColor': 'green', 'color': 'black', 'weight': 2, "dashArray": "5, 5"},
    highlight_function=lambda x: {'weight': 3, 'color': 'green'},
    tooltip=folium.GeoJsonTooltip(fields=['name'], aliases=['State:']),
    popup=folium.GeoJsonPopup(fields=['name', 'consumption'], aliases=['State:', 'Consumption in BTUs:'])
).add_to(midwest_map)
with out:
    display(midwest_map)

def on_dropdown_change(change):
out.clear_output(wait=True)
year = change[‘new’]
update_map(year)

midwest_dropdown.observe(on_dropdown_change, names=‘value’)

display(midwest_dropdown)
display(out)
update_map(midwest_dropdown.value)

when I run this code it works exactly how I need it to. The drop down menu gives you options for each energytype_year. ex. biomass_2000

But this data import is using pandas and for the sake of the project and “repeatability” factor, my instructor wants us to import the data into MongoDB Compass. As stated above I made the database and collections. However when I try to join them, every attempt fails. This is an example of help I received from chatgpt but the when I click on the map, it reads “Data note available”

import pandas as pd
from pymongo import MongoClient

Connect to MongoDB

client = MongoClient(‘mongodb://localhost:27017/’) # Replace ‘localhost’ with your MongoDB host
db = client.energy_data # Select the ‘energy_data’ database

def load_energy_data(collection_name, energy_type, year):
# Retrieve data from MongoDB collection excluding the _id field
data = list(db[collection_name].find({}, {‘_id’: 0}))
# Convert data to DataFrame
df = pd.DataFrame(data)
# Add ‘EnergyType’ and ‘Year’ columns
df[‘EnergyType’] = energy_type
df[‘Year’] = year
return df

Load energy data for each type

biomass_data = load_energy_data(‘biomass’, ‘Biomass’, 2000)
geothermal_data = load_energy_data(‘geothermal’, ‘Geothermal’, 2000)
hydropower_data = load_energy_data(‘hydropower’, ‘Hydropower’, 2000)
solar_data = load_energy_data(‘solar’, ‘Solar’, 2000)
wind_data = load_energy_data(‘wind’, ‘Wind’, 2000)

Concatenate DataFrames along rows axis

dfs = [biomass_data, geothermal_data, hydropower_data, solar_data, wind_data]
df_concatenated = pd.concat(dfs, axis=0, ignore_index=True)

Pivot the DataFrame to combine years with energy types

df_pivoted = df_concatenated.pivot_table(index=‘State’, columns=[‘EnergyType’, ‘Year’], aggfunc=‘first’)

Flatten the multi-index columns

df_pivoted.columns = [‘‘.join(map(str, col)).strip(’’) for col in df_pivoted.columns.values]

Reset index to make ‘State’ a column again

df_pivoted.reset_index(inplace=True)

Display the pivoted DataFrame

print(df_pivoted)

I feel like I am close because it does give some semblance of what the pandas concatenated data looked like, but not completely.

Please help!
-Frustrated Data Analyst Student

Mark_Smith · February 8, 2024, 11:12am

Hello Michael!

There’s a lot of code here that I’m not really familiar with - I’m not a big Pandas user, and I’ve never used Folium. But I think that’s partly the key to solving your problem - there’s a lot of code here, and you appear to be finding it difficult to locate the source of your problem.

You need to break down the problem into smaller steps. I’m not sure I’d recommend using ChatGPT to write your code for you, but maybe it can help by producing small, understandable code snippets.

If you already have working code that brings CSV data into your Pandas dataframe, and then manipulates that data successfully, then maybe you can continue to use that code, but change the step where the data is loaded into the dataframe. This then gives you a very specific area of code that you need to get working - how do you get your data from MongoDB into a dataframe with the correct structure? If you’re working in a Jupyter notebook, then inspecting this is made easier than if you are writing a big Python script and wanting to check things in the middle.

Anyway, I hope this advice, although general, is helpful. It’s helped me solve many problems in the past with code that isn’t doing what I needed it to do.

One extra thing: MongoDB has a really powerful query language that can do many of the things that you can do inside a dataframe, but potentially over much larger data sets. Once you’ve got this working with Pandas, you could have a look to see if an aggregation query could do some of the work for you.

Mark