Word cloud of a field using aggregation

Hi everyone

I was trying to retrieve all occurrences of a headline field in my collection, but I only got an array of my headlines. Could you tell me whether it is possible to get a single string from this array?

My pipeline is below:

[{$match: {
  dtNews: {$gt: "2019-10-01"}
}}, {$project: {
  str_title: 1
}}, {$group: {
  _id: null,
  tex: {
    $addToSet: "$str_title"
  }
}}]

Ezequias.

Could you please give us a sample document?

[{
        $match: {
            dtNews: {
                $gt: "2019-10-01"
            }
        }
    }, {
        "$group": {
            "_id": {
                "__alias_0": "$str_title"
            },
            "__alias_1": {
                "$sum": 1
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "__alias_0": "$_id.__alias_0",
            "__alias_1": 1
        }
    },
    {
        "$project": {
            "text": "$__alias_0",
            "size": "$__alias_1",
            "_id": 0
        }
    }
]
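
The pipeline above counts whole titles. For a word cloud, a per-word count may be closer to what is needed. The following is only a sketch under assumptions: it reuses the `dtNews` and `str_title` field names from this thread, the name `wordCloudPipeline` is made up for illustration, and it treats a single space as the only word separator.

```javascript
// Sketch only: assumes the field names used in the thread (dtNews, str_title)
// and that words inside a title are separated by spaces.
const wordCloudPipeline = [
  // Keep only news after the cutoff date (string comparison, as in the thread).
  { $match: { dtNews: { $gt: "2019-10-01" } } },
  // Split each lower-cased title into an array of words.
  { $project: { _id: 0, words: { $split: [{ $toLower: "$str_title" }, " "] } } },
  // Emit one document per word.
  { $unwind: "$words" },
  // Drop empty strings produced by repeated spaces or empty titles.
  { $match: { words: { $ne: "" } } },
  // Count how many times each word occurs.
  { $group: { _id: "$words", size: { $sum: 1 } } },
  // Rename to the text/size shape used above.
  { $project: { _id: 0, text: "$_id", size: 1 } },
  // Most frequent words first.
  { $sort: { size: -1 } }
];

// In the mongo shell this could be run as (collection name is an assumption):
//   db.news.aggregate(wordCloudPipeline)
```

Lower-casing before the split is a design choice so that "World" and "world" are counted together; remove `$toLower` if case should be preserved.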

I have sample news documents like this:

{ _id: 1, title: "hello world", content: "some content 1" },
{ _id: 2, title: "Hello World", content: "some content 2" },
{ _id: 3, title: "Wonderful World", content: "some content 3" },
{ _id: 4, title: "Lovely World", content: "some content 4" }

and, using this aggregation query:

db.news.aggregate( [
  { 
      $group: { 
          _id: null, 
          titlesArr: { $push: "$title" } 
      } 
  },
  { 
      $project: {
          _id: 0, 
          tex: { 
              $reduce: {
                  input: "$titlesArr",
                  initialValue: "",
                  in: {
                      $concat : ["$$value", "$$this", " "]
                  }
              }
          }
      } 
  }
] )

I get this result:

{ "tex" : "hello world Hello World Wonderful World Lovely World " }
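
Note the trailing space at the end of the result. If that matters, one possible variant is to add the separator before every title except the first. This is a hedged sketch, not the only way to do it: `projectStage`, `foldTitles` and `sampleTitles` are illustrative names, and it assumes every title is a non-empty string (`$concat` yields null if any argument is null).

```javascript
// Variant of the $project stage: the separator is added before each title
// except the first, so the result has no trailing space.
const projectStage = {
  $project: {
    _id: 0,
    tex: {
      $reduce: {
        input: "$titlesArr",
        initialValue: "",
        in: {
          $concat: [
            "$$value",
            // Skip the separator while the accumulated string is still empty.
            { $cond: [{ $eq: ["$$value", ""] }, "", " "] },
            "$$this"
          ]
        }
      }
    }
  }
};

// Plain JavaScript mirror of the same fold, for checking the logic locally
// without a database.
function foldTitles(titles) {
  return titles.reduce(
    (value, current) => value + (value === "" ? "" : " ") + current,
    ""
  );
}

const sampleTitles = ["hello world", "Hello World", "Wonderful World", "Lovely World"];
console.log(foldTitles(sampleTitles));
// prints: hello world Hello World Wonderful World Lovely World
```

In the shell, `projectStage` would replace the second stage of the query above.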

I think I misunderstood you before. It looks like you are looking for distinct values:

db.getCollection("news").distinct("title");

result:

[
    "Hello World", 
    "Lovely World", 
    "Wonderful World", 
    "hello world"
]

Is this the output you are looking for?

Thank you so much. It worked perfectly.

Dear @Prasad_Saya

Your solution was perfect. I don’t know if it is appropriate to ask here, but I would like to know why this concatenation does not appear in the MongoDB Compass Aggregation tab (not even in a view).

Could you give me some advice?

I meant with a large dataset. With your sample data it looks fine, but with more data Compass does not show the full result.

I can only see it in the console.

Sincerely
Ezequias Rocha

With a larger data set (about 10,000 documents), I found that in Compass’s Aggregation tab the $group stage output showed all the data, but the $project stage’s result view showed only the first few words of the text, as in the attached picture.

I am using MongoDB version 4.2 and Compass 1.19.

Have you tried disabling the “SAMPLE MODE” and “AUTO PREVIEW” options? Maybe Compass does that to make building the pipeline more responsive.

I am using 1.20.5 and the visualization is not quite right there either.

I noticed that if I create a view from this aggregation, I can see the text in the second type of visualization, as you can see below:

I think that is a solution, but it would wrap the text if it gets over 200 columns.

Sincerely
Ezequias Rocha

How can I modify these options in Compass, @steevej?

Ezequias.

Thank you @coderkid. I was trying to get a single string with all the words collected. I already have the correct aggregation.

Best regards
Ezequias

If you look at Prasad_Saya’s screenshot, you will see two options on the right, almost at the top.

Thank you @steevej it does not change anything for me.