MongoDB Java and Spark Driver Upgrade before MongoDB 4.4 Upgrade

Hello

I plan to upgrade six RHEL servers that run MongoDB 4.4 from RHEL 7.9 to 8.8 in a few days. They host my 5-member replica set, the OS version is compatible, and I will run all of the pre- and post-upgrade checks for a seamless upgrade.


After that I need to upgrade MongoDB to 5.0, since 4.4 is reaching end of life. Before doing so, however, I found that the application drivers connecting to the replica set are on legacy versions:

"driver": { "name": "mongo-java-driver|legacy|mongo-spark", "version": "3.12.3|2.4.1" }, "os": { "type": "Linux", "name": "Linux", "architecture": "amd64", "version": "3.10.0-1160.105.1.el7.x86_64" }, "platform": "Java/Red Hat, Inc./1.8.0_392-b08|Scala/2.11.12:Spark/2.4.8.7.1.9.0-387" },

and I am not 100% sure of the procedure for upgrading these drivers for compatibility with 5.0 (the compatibility matrix in the Java driver documentation is what I found so far), or whether the Spark connector and the Scala version need to be updated too.
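
For reference, this is roughly where those versions sit in our build, plus my guess at an upgrade target (a minimal sbt sketch; the target artifact names and version numbers are my assumptions, not a confirmed upgrade path):

    // build.sbt sketch - the legacy coordinates we are on today
    libraryDependencies ++= Seq(
      "org.mongodb"       %  "mongo-java-driver"     % "3.12.3", // legacy uber-jar driver
      "org.mongodb.spark" %% "mongo-spark-connector" % "2.4.1"   // connector line built for Spark 2.4 / Scala 2.11
    )

    // My guess at an upgrade target (versions are assumptions, left commented out):
    // libraryDependencies ++= Seq(
    //   "org.mongodb"       %  "mongodb-driver-sync"   % "4.11.0", // newer sync driver line
    //   "org.mongodb.spark" %% "mongo-spark-connector" % "10.2.0"  // 10.x line, which I believe needs Spark 3.x / Scala 2.12+
    // )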

Any assistance on this would be greatly appreciated.

Kind Regards
Gareth Furnell

Hi @Gareth_Furnell,

The icon in the Java driver compatibility guide indicates that the driver will connect to those versions of MongoDB but will not support all of the features of that version. You may be able to upgrade the server and catch up on the driver later, but it is always recommended to run the latest version of the driver. This is best tested in a test/staging environment.

The upgrade guide linked below provides some guidance for driver upgrades.
https://www.mongodb.com/docs/drivers/java/sync/current/upgrade/
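
As one illustration of the kind of change involved (a sketch of my own, not taken from that guide; the host, replica set name, and database are placeholders), moving off the legacy 3.x API generally means creating clients through the MongoClients factory:

    import com.mongodb.client.{MongoClient, MongoClients}
    import org.bson.Document

    object DriverUpgradeSketch {
      def main(args: Array[String]): Unit = {
        // Legacy 3.x style, for comparison:
        //   val client = new com.mongodb.MongoClient("mongodb-host", 27017)

        // Modern sync driver style (available since 3.7 and in all 4.x/5.x releases):
        val client: MongoClient = MongoClients.create("mongodb://mongodb-host:27017/?replicaSet=rs0")
        try {
          // Run a simple ping to confirm the new client can reach the replica set
          val ping = client.getDatabase("admin").runCommand(new Document("ping", 1))
          println(ping.toJson)
        } finally {
          client.close()
        }
      }
    }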

What versions are currently in use?

Hi @chris
I see. Since the driver currently connecting to MongoDB 4.4 is in the "will connect but not support all features" category, there is one case I want to flag: data extracts run through aggregation pipelines sent from Spark.
Version: Scala/2.11.12:Spark/2.4.8.7.1.9.0-387
and what actually runs on the server looks like this:

            {
                "$match": {
                    "_id": {
                        "$lt": "65b09952fc92b6115f6b07b4"
                    }
                }
            },
            {
                "$match": {
                    "date_index": 202401,
                    "day_index": 24,
                    "hour_index": 7,
                    "field.field": "value"
                }
            },
            {
                "$project": {
                    "field.field": 0
                }
            }

However, the only pipeline we build in Spark ourselves is:

                "$match": {
                    "date_index": 202401,
                    "day_index": 24,
                    "hour_index": 7,
                    "field.field": "value"
                }
            },
            {
                "$project": {
                    "field.field": 0
                }

Therefore I’m not sure why the

{ "$match": { "_id": { "$lt": "65b09952fc92b6115f6b07b4" } } },

is being added to the pipeline, which sometimes makes the aggregation run for over an hour…
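
For context, this is roughly how that pipeline is handed to the connector in our job (a simplified sketch, assuming mongo-spark-connector 2.4.x; the URI, database, collection, and object names are placeholders):

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession
    import org.bson.Document

    object ExtractSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hourly-extract-sketch")
          .config("spark.mongodb.input.uri", "mongodb://mongodb-host:27017/mydb.mycoll") // placeholder URI
          .getOrCreate()

        // The only stages we define ourselves: the $match on the index fields and the $project
        val pipeline = Seq(
          Document.parse("""{ "$match": { "date_index": 202401, "day_index": 24, "hour_index": 7, "field.field": "value" } }"""),
          Document.parse("""{ "$project": { "field.field": 0 } }""")
        )

        // Load through the connector with our pipeline applied server-side
        val rdd = MongoSpark.load(spark.sparkContext).withPipeline(pipeline)
        println(s"Documents extracted: ${rdd.count()}")

        spark.stop()
      }
    }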

Thanks for the link resource, I will look into it.
