Matching arrays

Martin_Schneider · October 5, 2020, 7:52pm

Hi everyone!
I’m looking for a solution to find the matching documents in a collection.

I’ve got two types of arrays:

Type A is a storage, whit an array field containing more elements, for example let’s say letters, but not necessary all possible.

Docs of type B containing less elements in the array, also characters but in any variation, order and size. Every character appears just once in a document.

I’m looking for a quick way to find the documents of type B which containing just elements showing up in A.

Example:

storage: [a,b,c,e,f,h,I,j,k,l,o,p,q,z]

Type B docs: 1: [a,b,d]
2: [a,b,c]
3: [p,h,l,j,q]
4: [a,b,d,g]

result: 2,3

Is there any way to define, how many differences (1,2,3…) are allowed to still showing up as a matching result? Any ideas?

steevej · October 5, 2020, 9:42pm

May be you want

Martin_Schneider · October 6, 2020, 8:04am

Not exactly. I dot want to merge them or masking out elements or get an array back!

Following example:

A barkeeper or mixer has several drinks on hes shelf, but not all posible in the world. He is a document of type A.

And there are also documents like recipes for longdriks and cocktails. Those are the type B s. And in the array fild you can find all the ingredients that are needed to mix one of them.

I’m looking for a way, to find out, which one the barkeeper is able to mix from hes stock on the shelf. I need a function which returns these documents from the database.

In the second step I wanna find out, which one he could mix, if he had one or two types of drinks more on hes self. (Ranking them?)

steevej · October 6, 2020, 11:26am

The $setIntersection

A and drink 1 is [ a , b ] != size of drink 1 => cannot be mixed
A and drink 2 is [ a , b , c ] == size of drink 2 => can be mixed
A and drink 3 is [ p , h , l , j , q ] == size of drink 3 => can be mixed
A and drink 4 is [ a , b ] != size of drink 4 => cannot be mixed

And then you may sort on the size of the drink array ascending ( to get the simplest drink first ) or descending ( to get the most expensive drink first ).

Martin_Schneider · October 6, 2020, 4:01pm

Okay, I’m understanding what you’re doing, Steeven! And it’s a nice way. But I’m looking for a way to handle all the stuff inside Mongo.

Example:

A:
“name”:“Barkeeper”,
“store”:[“vodka”,“gin”,“soda”,“syrup”,“cherry”,“martini”,“olive”,“lemon”,“bacardi”,“coke”,“apple juice”,“orange”,“tonic”]

B:
“name”:“vodka soda”
“ingredients”:[“vodka”,“soda”]

“name”:“martini”
“ingredients”:[“martini”,“olive”]

“name”:“bacari coke”
“ingredients”:[“bacardi”,“coke”,“lime”,“ice”]

i found an ugly, but working solution for the first step (return just perfect matchs):

var filter = [“vodka”,“gin”,“soda”,“syrup”,“cherry”,“martini”,“olive”,“lemon”,“bacardi”,“coke”,“apple juice”,“orange”,“tonic”]

db.b.fing({ingredients: {"$not": {"$elemMatch": {"$nin" : filter }}}})

This query wil only return “vodka soda” and “martini”, cause there is no ice.

For the second part i’m also looking for a solution likely the upper, with theoportunity to define, how many differences i’m allowing.
In case it would be 1, “bacardi coke” would also appear as match, cause there is just one item missing.

I know, it could be done “manualy” sorting them out and ranking them with a function like in your solution! But i don’t want to generate unnecessary traffic by reading out nearly all documents -there could be 1000+ matchs in many cases-, and serching on my own way in my server application for matching data and ranking them creating indexes and doing stuff thats normaly done by the DB.

But i’ll keep your solution in mind as it’s the olny one I have yet!

steevej · October 6, 2020, 4:22pm

When publishing sample data, I would recommend that you post them as real JSON document as it is much more simpler and faster for people trying to help to enter the data into their own database.

To match documents from one collection (A) to documents of the other one (B) you will need to use an aggregation pipeline. See https://docs.mongodb.com/manual/aggregation/
You would start by $match stage for the A document that gives you your source array. You then $lookup into B collection.
Bass

Martin_Schneider · October 7, 2020, 6:57pm

Hello!

I figured it out, its ugly… it works! Thanks a lot for the help!

testdata:
{ 
    "_id" : ObjectId("5f7c5d430d4d5883c0c4b278"), 
    "name" : "A", 
    "values" : ["x","c","b","o","m"]
}
{ 
    "_id" : ObjectId("5f7c5d630d4d5883c0c4b279"), 
    "name" : "B", 
    "values" : ["f","c","m"]
}
{ 
    "_id" : ObjectId("5f7c5d630d4d5883c0c4b27a"), 
    "name" : "C", 
    "values" : ["x","c","b","5"]
}

The code im running on it:

var filter = ["a","b","c","e","f","k","j","l","p","s","m","x","y"]

db.test.aggregate([

{$project:{id: "$_id",
name:"$name",
values: "$values",
originSize: {$size:"$values"}, 
common:{$setIntersection:[ filter,"$values"]}}},

{$project:{id: "$_id",
  name:"$name",
  values: "$values",
  common:"$common",
  originSize: {$size:"$values"}, 
  commonSize:{$size:{$setIntersection:[ filter,"$values"]}}}},
        
{$project:{ 
    id:"$_id", 
    name: "$name", 
    values:"$values", 
    originSize: "$originSize",
    common:"$common",
    commonSize: "$commonSize", 
    difference:{ $subtract:["$originSize","$commonSize"]}}},

{$project:{ 
    name: "$name", 
    values:"$values", 
    originSize: "$originSize",
    common:"$common", 
    commonSize: "$commonSize",
    difference:"$difference", 
    missing:{$setDifference:["$values","$common"]}}},
    
{$sort:{difference:1,originSize:-1}},

])

just one of the outputs:

{   "_id" : ObjectId("5f7c5d630d4d5883c0c4b27a"), 
    "name" : "C", 
    "values" : ["x","c","b","5"], 
    "originSize" : NumberInt(4), 
    "common" : ["b","c","x"], 
    "commonSize" : NumberInt(3), 
    "difference" : NumberInt(1), 
    "missing" : ["5"]
}

My question is: how can i store data generated inside the aggregate (var d= difference or something like this), so i don’t have to $project them every time, to use it later?

Is it possible to declare functions from these $project sequences end nesting them into each other?

Can i transfer data (result) from one to the next aggregate sequence?

system · October 12, 2020, 6:57pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.