Proposal for Enhancing Collection Initialization Control in MongoDB Insertions

Hello MongoDB Community,

I’m reaching out to discuss and propose an enhancement for MongoDB, specifically targeting the way we handle collection creation during document insertion processes. As we navigate through the development and maintenance of MongoDB applications, one aspect that often comes up is the automatic creation of collections when new documents are inserted. This feature, while convenient in many scenarios, can sometimes obscure certain operational nuances, particularly when dealing with specialized collections such as timeseries.

The core of the proposal is the introduction of a new parameter within the insertion command options (e.g., require_existing_collection on the optional arguments to a MongoDB insert operation) that would disable the automatic creation of collections upon the insertion of new documents. The motivation behind this suggestion stems from the need to ensure that collections are initialized properly before use, especially in cases where specific indexes, metaFields, timeFields, and other configurations are crucial for the collection’s operation.

Currently, without this parameter, developers might find themselves needing to verify the existence of a collection before each insertion or maintaining this information in a cache. This approach, however, is prone to errors and inconsistencies, particularly if changes are made to the collection outside the current operational context. For timeseries collections, which require a different initialization process than regular collections, the ability to explicitly control when and how a collection is created becomes even more important.
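To make the workaround above concrete, here is a minimal sketch in Python. It assumes a PyMongo-style `Database` object (`list_collection_names`, `create_collection`, and item access returning a collection). The helper names and the `ts` / `sensor` field names are hypothetical and only serve the illustration.

```python
from typing import Any, Mapping


def ensure_timeseries_collection(db: Any, name: str) -> bool:
    """Create `name` as a time series collection if it is missing.

    Returns True when this call created the collection.
    `db` is assumed to behave like a PyMongo Database.
    """
    # One extra round trip on every call: this is the overhead described above.
    if name in db.list_collection_names(filter={"name": name}):
        return False
    db.create_collection(
        name,
        timeseries={"timeField": "ts", "metaField": "sensor"},  # hypothetical fields
    )
    # Secondary index on the (hypothetical) meta and time fields.
    db[name].create_index([("sensor", 1), ("ts", -1)])
    return True


def insert_with_check(db: Any, name: str, doc: Mapping[str, Any]) -> None:
    """Check-then-insert workaround. The check and the insert are not atomic."""
    ensure_timeseries_collection(db, name)
    db[name].insert_one(doc)
```

Because the existence check and the insert are separate operations, another client can still create or drop the collection in between, which is the inconsistency risk mentioned above.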

Implementing such a parameter would offer several benefits:

  1. Improved Initialization Control: It would allow developers to detect insert failures due to non-existent collections, prompting a proper initialization sequence that includes necessary index creation and other setup procedures.
  2. Enhanced Efficiency: By avoiding the need to check for a collection’s existence or maintain a potentially inconsistent cache, operations can be streamlined, focusing on handling exceptions meaningfully.
  3. Consistency in Timeseries Collections: For timeseries collections, which demand specific initialization steps, this control is invaluable. Ensuring that collections are set up correctly before any data insertion can significantly impact performance and functionality.
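The flow I have in mind can be sketched as follows. This is only an in-memory simulation in Python: `require_existing_collection`, `CollectionNotFound`, `FakeDatabase`, and `insert_strict` are all hypothetical names, and none of them exist in MongoDB or its drivers today.

```python
from typing import Callable, Dict, List


class CollectionNotFound(Exception):
    """Hypothetical error raised when the target collection does not exist."""


class FakeDatabase:
    """In-memory stand-in that models the PROPOSED behaviour only."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[dict]] = {}

    def create_collection(self, name: str) -> None:
        self.collections.setdefault(name, [])

    def insert_one(self, name: str, doc: dict,
                   require_existing_collection: bool = False) -> None:
        if name not in self.collections:
            if require_existing_collection:
                # Proposed behaviour: fail instead of creating implicitly.
                raise CollectionNotFound(name)
            # Current default behaviour: implicit creation.
            self.collections[name] = []
        self.collections[name].append(doc)


def insert_strict(db: FakeDatabase, name: str, doc: dict,
                  initialize: Callable[[FakeDatabase, str], None]) -> None:
    """Insert, running `initialize` once if the collection is missing."""
    try:
        db.insert_one(name, doc, require_existing_collection=True)
    except CollectionNotFound:
        # The proper setup (indexes, timeField, metaField, ...) would go here.
        initialize(db, name)
        db.insert_one(name, doc, require_existing_collection=True)
```

The common path is a single insert with no existence check, and the initialization code runs only when the insert reports that the collection is missing.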

This proposal does not aim to detract from the flexibility and ease of use that MongoDB offers. Instead, it seeks to augment the database’s capability to handle more complex scenarios where collection initialization plays a critical role in the application’s performance and reliability.

I am eager to hear your thoughts on this proposal. Your feedback, insights, and any further suggestions would be immensely valuable in assessing the feasibility and potential impact of this enhancement on MongoDB’s future versions.

Thank you for considering this proposal, and I look forward to our fruitful discussions.

Having permissions to insert into a collection implicitly gives the ability to create it. My thoughts on this would be to separate out createCollection from insert as a permission so explicit creation would fail for a correctly scoped application user. The advantage here is you don’t need to change code everywhere you use insert and you can call out that creating collections, like creating indexes, should be an explicit task in production.
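As a rough sketch of what a "correctly scoped" application user could look like, the Python snippet below builds a `createRole` command document that grants data access but leaves out the `createCollection` action. The role name `appWriter` and the helper `build_writer_role` are hypothetical. Note the caveat from the first sentence above: as things stand, holding `insert` still allows implicit creation, so this role would only have the intended effect once the two permissions are separated.

```python
from typing import Any, Dict, List


def build_writer_role(db_name: str, role_name: str = "appWriter") -> Dict[str, Any]:
    """Build a createRole command document without the createCollection action."""
    # Data access only; createCollection and createIndex are deliberately absent.
    actions: List[str] = ["find", "insert", "update", "remove"]
    return {
        "createRole": role_name,  # the command name must be the first key
        "privileges": [
            # An empty collection name means "every collection in db_name".
            {"resource": {"db": db_name, "collection": ""}, "actions": actions}
        ],
        "roles": [],
    }
```

With PyMongo, the document could be sent by an administrative user with something like `db.command(build_writer_role("iot"))`, where `"iot"` is a placeholder database name.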

Thank you for your thoughtful response and suggestion to separate createCollection permissions from insert permissions. This approach indeed introduces a clear separation of concerns and enhances security by ensuring that collection creation is an explicit action, particularly valuable in production environments.

However, while this solution offers significant benefits in terms of security and operational clarity, it may not align well with the requirements of all application contexts, especially those that are dynamic and user-driven, such as in IoT applications that I am currently developing. In these environments, the ability to create collections dynamically in response to user interactions is crucial for flexibility and scalability.

The main challenges with the proposed permission separation approach in such dynamic environments include:

  1. Loss of Granularity: The need to manage multiple users or roles (one for insertion and another for collection creation) introduces complexity, especially when the application logic requires the dynamic creation of collections. This granularity is essential for IoT backends, where users might need to create their own “buckets” of information on-the-fly.
  2. Operational Complexity: Implementing separate permissions complicates the application’s backend logic, requiring additional logic to handle permission errors and switch between users or roles depending on the operation. This increases the risk of bugs and makes the system harder to maintain.
  3. Reduced Flexibility: For applications that thrive on being able to dynamically adjust to user needs by creating new collections with specific configurations (e.g., timeseries collections with particular indexes), the need to predefine and explicitly create these collections before insertion can hinder responsiveness and agility.
  4. Increased Development Overhead: The requirement to detect insertion failures due to permission errors, followed by initiating collection creation with a different user, introduces significant complexity into the development process. This approach necessitates establishing a new communication session with the database, handling authentication for a different user or role, and managing the context switch between different operations. Such steps not only complicate the codebase but also impact the developer experience negatively. This added complexity can lead to a more cumbersome development process, detracting from the efficiency of error handling and recovery mechanisms, and resulting in additional overhead for maintaining and debugging the application.

Given these considerations, while I appreciate the security and clarity benefits of the suggested approach, I propose maintaining the flexibility to control collection creation through the insertion process itself. Introducing an optional parameter (e.g., require_existing_collection) could provide a more nuanced control, allowing developers to specify when collections should be created dynamically and when they should exist beforehand. This would offer a balanced solution that accommodates both the need for explicit collection management in production and the flexibility required in dynamic, user-centric applications.

I believe a solution that offers flexibility while still enabling strict control when needed could be more beneficial across a wider range of MongoDB use cases.

The issue of automatic creation of databases and collections in MongoDB is a recurring concern within the community, as evidenced by numerous discussions and questions across various forums.

Any other thoughts on this topic?

I’ve only just found this thread, so I hope it’s not too late. I’ve found myself banging my head against a keyboard on multiple occasions when I’ve forgotten to change database before executing an insert, only to realise an hour later why my change isn’t working as expected.

We have existing processes to manage our indexes/overall schema, so adding an option require_existing_collection to thousands of operations in code could take a significant amount of effort and would not address the way most mistakes actually happen: making manual changes on the command line.

Could the proposed createCollection workflow be made a config-level behaviour? From my perspective, the “challenges” you list above are all good things for deployments with well-established collection formats and management mechanisms.