Issue with $push operator

I am currently working on a web scraping project. The goal is to retrieve the price of the different types of fuel available in the gaz stations (900+) each day . If the price changes, the script will be able to append the new price to my Mongodb database.

The data collected looks like this:

Price_post_api = {

    "station_id": 31200009,
    "price_detail": [
      {
        "fuel_id": 1,
        "fuel_name": "Gazole",
        "fuel_cost": 1.959,
        "update_date": {
          "$date": "2022-05-30T10:05:22Z"
        }
      },
      {
        "fuel_id": 2,
        "fuel_name": "SP95",
        "fuel_cost": 2.049,
        "update_date": {
          "$date": "2022-05-30T10:05:23Z"
        }
      },
      {
        "fuel_id": 5,
        "fuel_name": "E10",
        "fuel_cost": 2.009,
        "update_date": {
          "$date": "2022-05-30T10:05:23Z"
        }
      }
    ]
  },

I’m having a hard time to figure out how to $push properly the data in Mongodb based on the "fuel_cost" field. Here an example of the expected output in the db.

Mongodb_price_data ={
    "station_id": 31200009,
    "price_detail": [
      {
        "fuel_id": 1,
        "fuel_name": "Gazole",
        "fuel_cost": 1.959,
        "update_date": {
          "$date": "2022-05-30T10:05:22Z"
        }
      },
      {
        "fuel_id": 1,
        "fuel_name": "Gazole",
        "fuel_cost": 35.87,
        "update_date": {
          "$date": "2022-05-31T10:09:22Z"
        }
      },
      {
        "fuel_id": 2,
        "fuel_name": "SP95",
        "fuel_cost": 2.049,
        "update_date": {
          "$date": "2022-05-30T10:05:23Z"
        }
      },
      {
        "fuel_id": 2,
        "fuel_name": "Gazole",
        "fuel_cost": 1.59,
        "update_date": {
          "$date": "2022-07-14T00:10:19Z"
        }
      },
      {
        "fuel_id": 5,
        "fuel_name": "E10",
        "fuel_cost": 2.009,
        "update_date": {
          "$date": "2022-05-30T10:05:23Z"
        }
      }
    ]
}

So far, I have the following function:

def update_new_price(station_id, fuel_id, fuel_name, cost):
    query = {'station_id': station_id, 'price_detail.fuel_id': fuel_id,
             'price_detail.fuel_name': fuel_name, 'price_detail.fuel_cost': cost}

    result = db[CL_PRICE].find(query)

    if not list(result):
        db[CL_PRICE].update_one(
            {'station_id': station_id, 'price_detail.fuel_id': fuel_id,
                'price_detail.fuel_name': fuel_name},
            {'$push': {'price_detail': {'$each': [
                {'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost}]}}},upsert=True)
        print('new value added: ', {'station_id': station_id, 'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost})
    else:
        print('Already exists: ', {'station_id': station_id, 'fuel_id': fuel_id, 'fuel_name': fuel_name, 'fuel_cost': cost})

The function works great until I get an error message

pymongo.errors.WriteError: The field 'price_detail' must be an array but is of type object in document {no id}, full error: {'index': 0, 'code': 2, 'errmsg': "The field 'price_detail' must be an array but is of type object in document {no id}"}

Any idea why and how can I fix it?

Hi @franck_ishemezwe - Welcome to the community.

Not too sure if you’ve solved this yet but I just wanted to confirm a few items:

  1. Based off the “data collected”, please confirm the two documents are the ones you wish to add to the "price_detail" array:
/// Documents to be pushed / added
{
  "fuel_id": 2,
  "fuel_name": "Gazole",
  "fuel_cost": 1.59,
  "update_date": {
    "$date": "2022-07-14T00:10:19Z"
  }
},
{
  "fuel_id": 1,
  "fuel_name": "Gazole",
  "fuel_cost": 35.87,
  "update_date": {
    "$date": "2022-05-31T10:09:22Z"
  }
}
  1. Are there multiple documents that are returned using the query portion of your update? I’m curious to see if any of those documents have the "price_detail" field as an object rather than type array based off the error you provided. Example of this below:
DB> db.stations2.find()
[
  { _id: ObjectId("630fee2ba0eae719ee140850"), price_detail: { a: 1 } }
]
DB> db.stations2.updateOne({},{$push:{"price_detail":{$each:[{a:2}]}}})

MongoServerError: The field 'price_detail' must be an array but is of type object in document {_id: ObjectId('630fee2ba0eae719ee140850')}
  1. Driver version in use
  2. MongoDB version in use

Note: The above examples are performed / shown in mongosh

Additionally, from an initial glance, this type of schema design and work flow may lead to the “price_detail” array growing indefinitely which means that it will most likely encounter issues in future if the application runs long enough due to the fact that it may hit the BSON document size limit. One alternative to this would be perhaps having the "price_detail" as a collection instead of an array. However, it may not reach this point depending on your environment or use case :slightly_smiling_face:

Regards,
Jason

1 Like

Hi @Jason_Tran. Thanks for your answer.

To reply to your questions:

1 - Yes that is correct. Here it’s only a snippet but in production environment we have up to 6 documents to add in "price_detail" array.

2 - Yes multiple documents and they are returning an array.

3 - I am using Pymongo

4 - The latest one, MongoDB 5.0.9 community

5 - You raised a good point here. As I’ m novice to Mongodb world, I designed the schema based on my “understanding”. But basically, I want to build a webapp that give the ability to the end user to:

  • Based on the user location and fuel type, choose the cheapest gaz price around
  • Display the latest fuel price based on his/her location
  • Display the fuel price trend whether it s up or down.

The data provided are from an OpenData website in xml format and are refreshed every 10 minutes. Since I want to keep track of the fuel prices for each station_ID, I came up with this schema design. In a nutshell, I have 2 collections, one for the station information as below:

and the second one for the fuel prices per station

At the end of the day I’m not sure to have the right approach…

Thanks for providing the details requested @franck_ishemezwe

Upon further inspection, it may be possible the error that’s being returned is due to the upsert behaviour as mentioned in the Upsert with Operator Expressions documetation:

If no document matches the query criteria and the <update> parameter is a document with update operator expressions, then the operation creates a base document from the equality clauses in the <query> parameter and applies the expressions from the <update> parameter.

I believe due to the query criteria, the "price_detail" field is of type object rather than an array when it is created as part of the “base document” which will then cause the error when the $push operator is applied.

If it suits your use case, would it be possible for you to utilise $elemMatch at the query portion? The below example is performed via mongosh:

DB>db.collection.find()
/// Empty collection to start
DB> db.collection.updateOne({"price_detail.fuel_id":1},{$push:{"price_detail":{"$each":[1,2,3]}}},{upsert:true})
MongoServerError: The field 'price_detail' must be an array but is of type object in document {no id}
/// Same error returned. Note the query value.
DB> db.collection.updateOne({"price_detail":{$elemMatch:{"fuel_id":1}}},{$push:{"price_detail":{"$each":[1,2,3]}}},{upsert:true})
/// Using $elemMatch instead
{
  acknowledged: true,
  insertedId: ObjectId("6317d956c3d5b1b653dc09bf"),
  matchedCount: 0,
  modifiedCount: 0,
  upsertedCount: 1
}
DB> db.collection.find()
[
  {
    _id: ObjectId("6317d956c3d5b1b653dc09bf"),
    price_detail: [ 1, 2, 3 ]
  }
]

You can alter your current update_one() method accordingly with $elemMatch although this would only be if it suits your use case as I understand your current <query> within the update operation is checking for if the "price_detail.fuel_name" and "price_detail.fuel_id" separately regardless if they exist in the same object within the array. It is highly recommended to test thoroughly on a test environment after making the required changes to verify it suits all your use case(s) and requirements.

Regards,
Jason

1 Like