How to get a random document favoring a higher value?

Hey!

Let’s say I have a raffle with documents that look like this:

{
  name: "John",
  tickets: 5
}

I want to get a random document, but favoring the documents with the highest tickets value.

Hi @Victor_Back1 and welcome in the MongoDB Community :muscle: !

Here is my solution using the aggregation pipeline:

[
  {
    '$group': {
      '_id': '$tickets', 
      'maxis': {
        '$push': '$$ROOT'
      }
    }
  }, {
    '$sort': {
      '_id': -1
    }
  }, {
    '$limit': 1
  }, {
    '$unwind': {
      'path': '$maxis'
    }
  }, {
    '$replaceRoot': {
      'newRoot': '$maxis'
    }
  }, {
    '$sample': {
      'size': 1
    }
  }
]
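To make the stage-by-stage logic easier to follow, here is a rough pure-Python equivalent of the pipeline above. It needs no MongoDB and assumes the documents are plain dicts. `top_group_sample` is a hypothetical helper name used only for illustration, not part of any driver.

```python
import random
from collections import defaultdict


def top_group_sample(docs, rng=random):
    """Mimic $group -> $sort -> $limit -> $unwind -> $replaceRoot -> $sample."""
    # $group: bucket whole documents ($$ROOT) by their 'tickets' value
    groups = defaultdict(list)
    for doc in docs:
        groups[doc['tickets']].append(doc)
    if not groups:
        return None
    # $sort {_id: -1} + $limit 1: keep only the bucket with the largest key
    best = groups[max(groups)]
    # $unwind + $replaceRoot + $sample {size: 1}: pick one document uniformly
    return rng.choice(best)


docs = [
    {'name': 'Jim', 'tickets': 3},
    {'name': 'Joe', 'tickets': 5},
    {'name': 'Jane', 'tickets': 5},
]
print(top_group_sample(docs))
```

Note that only the documents sharing the top tickets value can ever be returned; every other document has zero chance.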

Here it is in action in a Python 3 example:

from pprint import pprint

from faker import Faker
from pymongo import MongoClient

fake = Faker()


def rand_tickets():
    return [{
        'firstname': fake.first_name(),
        'tickets': fake.pyint(min_value=1, max_value=10)
    } for _ in range(10000)]


if __name__ == '__main__':
    client = MongoClient()
    db = client.get_database('test')
    tickets = db.get_collection('tickets')

    tickets.drop()
    tickets.create_index("tickets")
    tickets.insert_many(rand_tickets())

    pipeline = [
        {
            '$group': {
                '_id': '$tickets',
                'maxis': {
                    '$push': '$$ROOT'
                }
            }
        }, {
            '$sort': {
                '_id': -1
            }
        }, {
            '$limit': 1
        }, {
            '$unwind': {
                'path': '$maxis'
            }
        }, {
            '$replaceRoot': {
                'newRoot': '$maxis'
            }
        }, {
            '$sample': {
                'size': 1
            }
        }
    ]

    for raffle in range(1, 6):
        print("Raffle #" + str(raffle))
        for res in tickets.aggregate(pipeline):
            pprint(res)
            print()

This prints, for each of the 5 raffles, one random document selected among those with the maximum number of tickets.

Raffle #1
{'_id': ObjectId('602fac3b9fe949b1c71beb73'),
 'firstname': 'Jeremy',
 'tickets': 10}

Raffle #2
{'_id': ObjectId('602fac3b9fe949b1c71bcf79'),
 'firstname': 'James',
 'tickets': 10}

Raffle #3
{'_id': ObjectId('602fac3b9fe949b1c71bd5c6'),
 'firstname': 'Sarah',
 'tickets': 10}

Raffle #4
{'_id': ObjectId('602fac3b9fe949b1c71be15c'),
 'firstname': 'Jocelyn',
 'tickets': 10}

Raffle #5
{'_id': ObjectId('602fac3b9fe949b1c71bd898'),
 'firstname': 'Crystal',
 'tickets': 10}

Note: You could optimize this query by limiting the number of documents from the start, if you have an idea of the ticket counts. For example, if you know that there are always some people with at least 5 tickets, you could add a {$match: {tickets: {$gte: 5}}} stage directly at the top.
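A minimal sketch of that optimization in Python, reusing the pipeline from the example above. A leading $match stage can use the index on tickets that the script creates, and it shrinks the number of documents $group has to push. `with_prefilter` is a hypothetical helper, and `min_tickets` is an assumed lower bound: it is only safe if at least one document is known to reach it, otherwise the query returns nothing.

```python
def with_prefilter(pipeline, min_tickets):
    """Return a copy of `pipeline` with a $match stage prepended.

    Assumption: at least one document has tickets >= min_tickets,
    otherwise the filtered pipeline returns no document at all.
    """
    return [{'$match': {'tickets': {'$gte': min_tickets}}}] + list(pipeline)


pipeline = [
    {'$group': {'_id': '$tickets', 'maxis': {'$push': '$$ROOT'}}},
    {'$sort': {'_id': -1}},
    {'$limit': 1},
    {'$unwind': {'path': '$maxis'}},
    {'$replaceRoot': {'newRoot': '$maxis'}},
    {'$sample': {'size': 1}},
]
filtered = with_prefilter(pipeline, 5)
# With the `tickets` collection from the script above:
# for res in tickets.aggregate(filtered): pprint(res)
```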

Cheers,
Maxime.

Hello @Victor_Back1, welcome to the MongoDB Community forum!

Here is an aggregation query that returns a random document among those with the highest tickets value. Note that the query runs from the mongo shell, because it uses the shell's _rand() helper.

Let's take these six sample documents, three of which share the highest ticket value of 5. Each time the query runs, the aggregation returns one of these three documents at random:

{
  name: "Jim",
  tickets: 3
},
{
  name: "Joe",
  tickets: 5     // <- highest
},
{
  name: "Jack",
  tickets: 1
},
{
  name: "Jane",
  tickets: 5    // <- highest
},
{
  name: "John",
  tickets: 2
},
{
  name: "Jon",
  tickets: 5    // <- highest
}

The aggregation:

db.collection.aggregate([ 
{ 
  $group: { 
      _id: null, 
      docs: { $push: "$$ROOT" }, 
      max: { $max: "$tickets" } 
  } 
}, 
{ 
  $addFields: { 
      max_docs: { 
          $filter: { 
              input: "$docs", 
              cond: { 
                  $eq: [ "$$this.tickets", "$max" ]
              }
          }
      }
  }
},
{ 
  $project: {
      _id: 0, 
      random_doc: { 
          $arrayElemAt: [ 
              "$max_docs", 
              { $floor: { $multiply: [ _rand(), { $size: "$max_docs" } ] } } 
          ] 
      }
  }
},
{
  $replaceWith: "$random_doc" 
}
]).pretty()
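The key trick is the index computation floor(rand * size): because the random value lies in [0, 1), the index always falls between 0 and size - 1. A rough pure-Python sketch of the same logic on the six sample documents follows; `pick_among_max` is a hypothetical name for illustration. On MongoDB 4.4.2 or newer, the server-side { $rand: {} } expression should be able to stand in for the shell's _rand(), though that variant is not shown here.

```python
import math
import random


def pick_among_max(docs, rand=random.random):
    """Mimic the $group/$filter/$arrayElemAt pipeline on a list of dicts."""
    if not docs:
        return None
    # $group with $max: highest tickets value in the collection
    top = max(doc['tickets'] for doc in docs)
    # $filter: keep only the documents that reach that maximum
    max_docs = [doc for doc in docs if doc['tickets'] == top]
    # $arrayElemAt + $floor($multiply: [rand, $size]): rand() is in [0, 1),
    # so the index is always in the range 0 .. len(max_docs) - 1
    index = math.floor(rand() * len(max_docs))
    return max_docs[index]


sample = [
    {'name': 'Jim', 'tickets': 3},
    {'name': 'Joe', 'tickets': 5},
    {'name': 'Jack', 'tickets': 1},
    {'name': 'Jane', 'tickets': 5},
    {'name': 'John', 'tickets': 2},
    {'name': 'Jon', 'tickets': 5},
]
print(pick_among_max(sample)['name'])  # one of Joe, Jane, Jon
```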

The example output:

{
        "_id" : ObjectId("602fae75603389f49bd5533d"),
        "name" : "Jon",
        "tickets" : 5
}

Thank you for your responses, but I think you’ve misunderstood. I want everyone to have a chance to be selected, but people with more tickets should have a higher chance of being selected.
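What is being asked for here is weighted random selection (sometimes called "roulette wheel" selection): each document's chance is proportional to its tickets. A minimal client-side sketch follows, assuming the documents fit in memory, for example the result of tickets.find({}, {'tickets': 1}) with PyMongo. `weighted_pick` is a hypothetical helper, and it assumes every tickets value is a non-negative integer.

```python
import bisect
import itertools
import random


def weighted_pick(docs, rng=random):
    """Pick one document with probability proportional to its 'tickets'.

    Assumes every 'tickets' value is a non-negative integer.
    """
    # Running totals, e.g. tickets [5, 3, 1] -> [5, 8, 9]
    cumulative = list(itertools.accumulate(doc['tickets'] for doc in docs))
    if not cumulative or cumulative[-1] <= 0:
        return None
    # Draw one "ticket number" in [0, total) ...
    ticket = rng.randrange(cumulative[-1])
    # ... and return the first document whose running total exceeds it
    return docs[bisect.bisect_right(cumulative, ticket)]


docs = [
    {'name': 'John', 'tickets': 5},
    {'name': 'Jim', 'tickets': 3},
    {'name': 'Jack', 'tickets': 1},
]
print(weighted_pick(docs)['name'])
```

With these three documents, John wins about 5 draws in 9, Jim 3 in 9 and Jack 1 in 9, so everyone can win but more tickets means better odds. A document with 0 tickets can never be picked. The standard library one-liner random.choices(docs, weights=[d['tickets'] for d in docs])[0] gives the same distribution; the explicit version above just shows the mechanics.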