Query (equivalente to left join == null)

Thiago_Bernardes1 · January 28, 2023, 11:13am

Hi Guys,

Let’s say I have two collections:

User Entries: (user Id, entry Id)
Community Entries (entry Id)

I want to run a batch to populate the User Entries with, let’s say, the first 50 random data from the Community Entries. However, on the following runs, I would like to query and add only entries that haven’t been added to the User Entries.

In SQL, I used to create a left join, and set the other collection field == to null, which I would get only the remaining item. What would be the equivalent in MongoDB?

Thiago_Bernardes1 · January 28, 2023, 11:18pm

I was wondering about using $nin and the array of IDs already set to the user, but my concern is that this array will grow daily, and the $nin is going to be pretty big.

Yilmaz_Durmaz · January 29, 2023, 12:19pm

that is a problem of schema design: embedding or referencing.

if your documents are, and will stay, small enough, then prefer embedding all related data in one place.

one or more data fields are big in size or grow over time, give them their own document and put references in others.

this referencing is similar to relations in SQL, in this case the “left join” you refer. and the equivalent is a “$lookup” operation in an “aggregation” pipeline. (please search these keywords).

Thiago_Bernardes1 · January 30, 2023, 6:28am

Hi Yilmaz,

As mentioned, the collection will grow a lot - I understand that the NoSQL DB can handle embedded documents; however, you need to be very careful when using it.

When you have lots of transactions like reading and writing the embedded array, the document itself will grow drastically. In this scenario, the embedded document is not a good approach for performance.

Despite being a non-relational database, it costs much less if you add those references in another collection and need to access it often, runs consolidations, etc.

Find below the query that I have created and works exactly life the “left join == null”:



	const $match = {
		userId: {
			$ne: userId
		},
		isBlocked: false
	}

	const promptsCollection = this.getCollectionName('promo-tbc-prompt-entries');

	const $lookupPrompts = {
		from: promptsCollection,
		let: {
			promptUserId: '$userId',
			promptId: '$_id'
		},
		pipeline: [
			{
				$match: {
					$expr: {
						$and: [
							{ $eq: ['$promptUserId', '$$promptUserId'] },
							{ $eq: ['$promptId', '$$promptId'] },
						]
					}
				}
			}
		],
		as: 'prompts'
	}

	const $project = {
		_id: 0,
		id: '$_id',
		userId: 1,
	}

	return this.aggregate()
		.match($match)
		.lookup($lookupPrompts)
		.append({
			$set: {
				hasEntries: { '$ne': [{ '$size': '$prompts' }, 0] }
			}
		})
		.match({
			hasEntries: false
		})
		.project($project)
		.sample(4)
		;

I had to append a new column called “hasEntries” and retrieve the size of its prompts, and next I had to add another match to return only those that had size Zero.

Maybe it’s not the best performance yet, however this saves the document grow with embedded arrays.

Yilmaz_Durmaz · January 30, 2023, 9:07am

Yep, there are many ways to do the same thing in MongoDB, so it is possible to have a better and faster query.

By the way, you wrote “populate the User Entries with, let’s say, the first 50 random data from the Community Entries”, so you have a size limit on this array. It is just not clear when you are populating it; always or only when you read!? this may also change how you store data.

If you are still developing, I believe you may have a different angle on the problem if you can state it in SQL as you used to have. MongoDB, in the referencing sense, can be a mirror to SQL. And then, explore how embedding may change that behavior.

As you may appreciate, it is not always easy to understand what is asked. If you can provide some example data and your SQL query, Forum readers may find another solution to your problem which, in this case, requires you un-mark “solution” from your above post.

Thiago_Bernardes1 · January 30, 2023, 9:37am

Thanks Yilmaz,

No worries, I’ve just tried to summarize the scenario as changing the data structure is not an option.

There is a reason for the data to be structured in that way, and I was just looking for this similar approach to SQL.

The challenge I’ve been always facing is when doing a lookup for an exact match 1:1, if I unwind the data that specific field is null/undefined, however I couldn’t find a way to match this scenario neither $eq: [prop, null] nor $eq: [prop, undefined]

The way I’ve found is to leave it as an array, without unwinding, and match with the length of the array which would be zero or 1.

Thanks for taking time on it, I really appreciated your suggestion, I was just trying to share something that I hear very often by developers, trying to embed everything in the document, and in a long term find performance issues.

system · February 4, 2023, 9:38am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.