I am currently working on a web scraping project. The goal is to retrieve, each day, the price of the different types of fuel available at 900+ gas stations. If a price changes, the script should append the new price to my MongoDB database.
The data collected looks like this:
Price_post_api = {
    "station_id": 31200009,
    "price_detail": [
        {
            "fuel_id": 1,
            "fuel_name": "Gazole",
            "fuel_cost": 1.959,
            "update_date": {
                "$date": "2022-05-30T10:05:22Z"
            }
        },
        {
            "fuel_id": 2,
            "fuel_name": "SP95",
            "fuel_cost": 2.049,
            "update_date": {
                "$date": "2022-05-30T10:05:23Z"
            }
        },
        {
            "fuel_id": 5,
            "fuel_name": "E10",
            "fuel_cost": 2.009,
            "update_date": {
                "$date": "2022-05-30T10:05:23Z"
            }
        }
    ]
}
I'm having a hard time figuring out how to properly $push the data into MongoDB based on the "fuel_cost" field. Here is an example of the expected output in the database:
Mongodb_price_data = {
    "station_id": 31200009,
    "price_detail": [
        {
            "fuel_id": 1,
            "fuel_name": "Gazole",
            "fuel_cost": 1.959,
            "update_date": {
                "$date": "2022-05-30T10:05:22Z"
            }
        },
        {
            "fuel_id": 1,
            "fuel_name": "Gazole",
            "fuel_cost": 35.87,
            "update_date": {
                "$date": "2022-05-31T10:09:22Z"
            }
        },
        {
            "fuel_id": 2,
            "fuel_name": "SP95",
            "fuel_cost": 2.049,
            "update_date": {
                "$date": "2022-05-30T10:05:23Z"
            }
        },
        {
            "fuel_id": 2,
            "fuel_name": "SP95",
            "fuel_cost": 1.59,
            "update_date": {
                "$date": "2022-07-14T00:10:19Z"
            }
        },
        {
            "fuel_id": 5,
            "fuel_name": "E10",
            "fuel_cost": 2.009,
            "update_date": {
                "$date": "2022-05-30T10:05:23Z"
            }
        }
    ]
}
So far, I have the following function:
# `db` (a pymongo Database) and `CL_PRICE` (the collection name) are defined elsewhere.
def update_new_price(station_id, fuel_id, fuel_name, cost):
    # Look for a station document that already holds this fuel at this cost.
    query = {'station_id': station_id, 'price_detail.fuel_id': fuel_id,
             'price_detail.fuel_name': fuel_name, 'price_detail.fuel_cost': cost}
    result = db[CL_PRICE].find(query)
    if not list(result):
        # Nothing found: push the new price onto the price_detail array,
        # creating the document if the filter matches nothing (upsert).
        db[CL_PRICE].update_one(
            {'station_id': station_id, 'price_detail.fuel_id': fuel_id,
             'price_detail.fuel_name': fuel_name},
            {'$push': {'price_detail': {'$each': [
                {'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost}]}}},
            upsert=True)
        print('new value added: ', {'station_id': station_id, 'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost})
    else:
        print('Already exists: ', {'station_id': station_id, 'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost})
The function works fine until it raises the following error:
pymongo.errors.WriteError: The field 'price_detail' must be an array but is of type object in document {no id}, full error: {'index': 0, 'code': 2, 'errmsg': "The field 'price_detail' must be an array but is of type object in document {no id}"}
Any idea why this happens and how I can fix it?
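
A possible explanation, offered as a sketch rather than a confirmed diagnosis: with upsert=True, when no document matches the filter, MongoDB builds the new document from the equality conditions in the filter. The dotted paths 'price_detail.fuel_id' and 'price_detail.fuel_name' would then create price_detail as an embedded object, and $push fails because it expects an array. This typically shows up the first time a station has no entry for a given fuel. Separately, dotted paths in a query are matched independently, so the existence check can succeed when the fuel_id and the fuel_cost sit in two different array elements; $elemMatch ties them to the same element. The version below is one way to work around both points. The helper name build_price_push, the explicit collection parameter, and the when parameter are assumptions introduced for illustration, and the collection is expected to behave like a pymongo Collection.

```python
from datetime import datetime, timezone


def build_price_push(station_id, fuel_id, fuel_name, cost, when=None):
    """Return (filter, update) documents for a conditional $push.

    Hypothetical helper, not part of the original code.
    """
    entry = {
        "fuel_id": fuel_id,
        "fuel_name": fuel_name,
        "fuel_cost": cost,
        "update_date": when or datetime.now(timezone.utc),
    }
    # Match the station only if no single array element already holds this
    # fuel_id together with this cost. $elemMatch applies both conditions
    # to the same element, unlike two separate dotted paths.
    query = {
        "station_id": station_id,
        "price_detail": {
            "$not": {"$elemMatch": {"fuel_id": fuel_id, "fuel_cost": cost}}
        },
    }
    update = {"$push": {"price_detail": entry}}
    return query, update


def update_new_price(collection, station_id, fuel_id, fuel_name, cost, when=None):
    """Append a price entry unless the same fuel/cost pair is already stored."""
    # Step 1: ensure the station document exists and price_detail is an array.
    # The filter has no dotted paths, so the upsert cannot create an object.
    collection.update_one(
        {"station_id": station_id},
        {"$setOnInsert": {"price_detail": []}},
        upsert=True,
    )
    # Step 2: conditional push without upsert, so an already-stored price
    # does not produce a second document for the same station.
    query, update = build_price_push(station_id, fuel_id, fuel_name, cost, when)
    result = collection.update_one(query, update)
    return result.modified_count == 1
```

It would be called as update_new_price(db[CL_PRICE], 31200009, 1, "Gazole", 1.959) and returns True when a new entry was pushed. Like the original function, this compares against every stored price for the fuel, not only the most recent one, so a price that returns to an earlier value would not be recorded again. A unique index on station_id is probably worth adding if several scrapers can run the upsert at the same time.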