Which one is faster? Pointer List Object VS Filters

Kira_Resari · November 16, 2020, 10:18am

I have to decide about how to best save Data Objects for different Workspaces in an application in a database in the most performant way. For the sake of this example, let us assume that there are 1000 DataObjects, and 10 of them are in a Workspace. It is a given that I have a DataObject collection containing all DataObjects.

I can think of two different approaches:

Creating an additional WorkspaceObject collection that saves a WorkspaceObject containing pointers to the DataObjects contained within that Workspace.
Adding the WorkspaceId to the DataObject.

In the first scenario, I would first look up the WorkspaceObject by its WorkspaceId , and then look up the DataObjects contained within their IDs. That would mean 11 database calls, each of which target an object by its _id.

In the second scenario, I would look up the DataObjects using a filter for the WorkspaceId. That would be only one database call, but without any _id.

So here’s the question: Which one of these two options is faster? I know that generally fewer database calls are better, so I lean towards the second option, but I don’t know if using filters makes it slower again, and how those two scenarios compare.

Prasad_Saya · November 16, 2020, 1:27pm

Hello @Kira_Resari,

You have two entities workspace and dataobject, with One-to-Many relationship - that is a workspace has many data objects.

In the first scenario, the workspace has the datobject references embedded within it. The query would be a $lookup aggregation which is a singe call but accessess two collectons on the server. The query retrieves the information from the workspace and details from the related data objects.

You can also consider storing some information related to the data objects within the workspace - instead of just the _id. You can include additional info which doesn’t change often and is queried along with the workspace often. This way, you can get workspace and related dataobject information with a simple query - access one collection only. For example, the workspace document can be like this:

{
  _id: 12, 
  workspaceName: "string", 
  dataObjects: [ 
      { _id: 34, name: "string", anotherFld: "string" },
      {...},
      ...
  ]
 }

Note that there is some data duplication, in this case - the "dataObjects.name" is stored in the dataobject collection also.

In the second scenario, having the workspace _id reference in the dataobject lets you query with the workspace’s _id. This is also a single call to the server, but accesses the dataobject collection only. An index on the workspace’s _id within the dataobject collection will help the query perform faster.

NOTES:

When you make a query from a client application like mongo shell, Compass or a program using your favorite language (like, Java, Python, etc.), the query is sent to the MongoDB Server as a single call. The query gets executed on the server. When the query is executed, it may access one or more collections - once or multiple times to access its documents on the server itself.

In general, accessing one collection instead of two collections with a single query, performs better. Also, the query will be simpler.

system · November 21, 2020, 1:28pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.