SQL -> count(distinct sessionid) | count(*) |

I am more used to using SQL. I watched many mongodb videos in the hope to find a solution to something that is very easy to do in SQL. I thought it should be simple in MongoDb but it does not appear to be. For example I have three documents:

{
    "_id" : ObjectId("613858b58bd4d7710a0a6c2b"),
    "SessionID" : "5o55spqrhbrtkrd87gjb5rsts4",
    "CreatedDateTime" : ISODate("2020-05-17T13:04:52.000Z")
}

/* 2 */
{
    "_id" : ObjectId("613858b58bd4d7710a0a6c2c"),
    "SessionID" : "5o55spqrhbrtkrd87gjb5rsts4",
    "CreatedDateTime" : ISODate("2020-05-17T13:04:53.000Z")
}

/* 3 */
{
    "_id" : ObjectId("613858b58bd4d7710a0a6c34"),
    "SessionID" : "ocdipkqqb502dut0jg2b3irrj5",
    "CreatedDateTime" : ISODate("2020-05-17T13:13:19.000Z")
}

etc…

I want to get the count of distinct SessionID and also the count of the documents.
In SQL we do this very easy using a count of distinct:

// SELECT Count(DISTINCT SessionID), count(*) FROM reports_stats...

// Result is:
count(distinct sessionid)  --> 2
allCount(*) --> 3

I cannot figure out how to do this in MongoDb! Simple grouping is easy in Mongo. But if I want to count the distinct SessionID and also count ALL of the docs and return it, like the above SQL result - Mongo is not easy to do this!

Is this solvable?

1 Like

You will need to use

One facet for the distinct part using $group and $first.

The other for the total count using $count.

1 Like

Hi Steeve,

Thanks for picking up my post! Are you familiar with Facet? I tried to read it before but I didn’t seem clear to me. Are you able to do an example using the my post? I’ll do some more searches, but it would be helpful f you could do a moch or an example with how I might go about htis using my own example data.

Kind regards,
Daniel

1 Like

Hi @Daniel_Gillett ,

The $facet recieve many pipelines to execute on a single data set. In your case it should look like:

[{$facet : { 
 dist_count : [{$group : {_id : "$sessionId" , count : {$sum : 1}}}],
count :[ {$count : "count"}]},
{$addFields : {dist_count : {$size : "$dist_count"}}}
]

Thanks
Pavel

1 Like

thanks Pavel! The $addFields gives me the following error:
FieldPath field names may not start with ‘$’

The rest of the query looks great!

I bought the mongo course from mongodb .com but for now I am still stuck.

Here is the query, a little easier to read. The $addFields throws an error:

FieldPath field names may not start with ‘$’.

[{
    $facet: {
      dist_count: [
        {
          $group: {
            _id: "$SessionID",
            count: { $sum: 1 },
          },
        },
      ],

      count: [{ $count: "count" }],

      $addFields: {
        dist_count: { $size: "$dist_count" },
      },
    },
  }]

Hi @Daniel_Gillett ,

Ok I had some typos sorry for that.

Here is a working pipeline:

[{$facet : { dist_count : [{$group : {_id : "$SessionID"
}}], count :[ {$count : "count"}]}}, {$addFields : {dist_count : {"$size" :
"$dist_count"}, count : { $first :  "$count.count" }}} ]

Added some tweaks,

Thanks Pavel! Thank you very much! That’s sorted it!

The source data is currently 8.6 Million documents.

Without any indexing or execution plan analysis the Atlas query takes 7.5 seconds.
My local (windows) machine takes 4.15 seconds.

Would you say this query is quite optimized and I should focus on the indexing?

Thanks again for your time and consideration!

Best,
Daniel

Ughhh sounds like the best optimization is to change your data model or precompute the counts …

Try to index and maybe add initial sort stage on SessionID…

Thanks Pavel,
I created an index on two of the fields and got it down to 0.111 seconds! A very big improvement.

I’m not running a query on the Date field → ISODdate → 2020-05-17T13:04:52.000+00:00.
When I added teh indexing, it took much longer!

what is the best way to query a date field (for speed of course)? With quotes, without quotes? I only need the year-month-day and not the time.

Thanks for your help!
Daniel

What is the query? Can you share it?