Roadmap Advice from Seniors

I am not new to MongoDB, although I have used SQL Server throughout my career in IT.

Mongo is great. However, we are now getting into bigger data: currently 8.6 million, but it will be 500M in a year. We cannot use Mongo as-is (doing simple queries) and are preparing for the next step. I’ve been trying to evaluate query times and the like to decide if Atlas Search is right for us, but that means learning Atlas Search, which will take time. That seems like a lot just to decide if Atlas or Atlas Search is right.

Using mongo
Setting up the ‘right’ mongodb
Weighing up Atlas Search vs using something else like Sphinx
Setting up performant indexes including the use of wildcards
Reporting → should I be looking into Data Lakes

As you can see, I need a roadmap! I’ve been reading and reading. Some things are relevant and some things are not.

Would you please offer some suggestions for where to begin learning in order to cover these topics, and also where I can get advice, fast, when I need it?

I’ve bought the Mongo Aggregation Framework course but what other courses are more suitable for my situation?

Many thanks!

Hi @Daniel_Gillett,

Your ask for advice is overly broad but if you can provide more details on your use case we may have some suggestions. Some of these questions will be better as separate discussion topics with specific examples.

If you need more advanced text-based search with features like fuzzy matching and autocomplete, Atlas Search is nicely integrated with MongoDB Atlas. An example of your desired search features would be helpful context for any suggestions.
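As a rough illustration of what a fuzzy text query looks like, here is a minimal sketch that builds an aggregation pipeline with the Atlas Search `$search` stage. The helper name, the `title` field, and the `articles` collection are hypothetical; the index name `default` is an assumption and should match whatever search index you create.

```python
def build_fuzzy_search_pipeline(term, path, index_name="default", max_edits=1, limit=10):
    """Return an aggregation pipeline that runs a fuzzy Atlas Search text query."""
    if max_edits not in (1, 2):
        # Atlas Search accepts 1 or 2 single-character edits for fuzzy matching.
        raise ValueError("max_edits must be 1 or 2")
    return [
        {
            "$search": {
                "index": index_name,
                "text": {
                    "query": term,
                    "path": path,
                    "fuzzy": {"maxEdits": max_edits},
                },
            }
        },
        {"$limit": limit},
    ]


# "mongdb" is a deliberate typo that fuzzy matching is meant to tolerate.
pipeline = build_fuzzy_search_pipeline("mongdb", "title")

# With PyMongo and an Atlas cluster that has a search index (not run here):
#   results = list(db.articles.aggregate(pipeline))
```

The pipeline is plain data, so it can be inspected or unit tested before it is ever sent to a cluster.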

Setting up the ‘right’ mongodb

Do you mean choosing a deployment topology (use at least a replica set for production), version (start with the latest GA release), or something else?

Weighing up Atlas Search vs using something else like Sphinx

If you are already using Atlas, Atlas Search is the obvious choice. Using Sphinx or another search solution will require finding (or creating) a connector to MongoDB. You can sync data using change streams, but I expect building a sync solution isn’t core to scaling your application.
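To give a sense of what a change stream based sync involves, here is a minimal sketch of the translation step only: turning one change event into an action for an external search index. The function name and the action tuples are hypothetical, and a real sync would also need resume tokens, retries, and an initial bulk load.

```python
def to_sync_action(event):
    """Translate one change stream event into a (verb, id, document) tuple."""
    op = event["operationType"]
    # Collection-level events such as "drop" carry no documentKey.
    doc_id = event.get("documentKey", {}).get("_id")
    if op in ("insert", "replace", "update"):
        # For updates, fullDocument is only present when the stream is
        # opened with full_document="updateLookup".
        return ("upsert", doc_id, event.get("fullDocument"))
    if op == "delete":
        return ("delete", doc_id, None)
    return ("ignore", doc_id, None)


# With PyMongo against a replica set (not run here):
#   with collection.watch(full_document="updateLookup") as stream:
#       for event in stream:
#           verb, doc_id, doc = to_sync_action(event)
#           ...  # push the action to the external search engine
```

Everything outside this function (delivery guarantees, ordering, recovery after downtime) is where most of the effort in a custom sync solution tends to go.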

Setting up performant indexes including the use of wildcards

If you need help optimising a query, I would start a new forum discussion topic with relevant details (server version, query, sample indexes, full explain output). Atlas Search has features for partial and fuzzy matching and uses different indexing strategies to improve performance (the underlying implementation is Apache Lucene).

Reporting → should I be looking into Data Lakes

That really depends on your use case, but if you are just getting started I would focus on your core understanding of MongoDB before adding too many other services or optimisations.

You can ask questions in community forums, but there are no guarantees of timely responses or that someone with the relevant expertise will see (and have time to answer) a more niche question. If you can start with shorter and more specific questions, I expect you are more likely to get faster responses. For example, this post is asking a half dozen questions which are a larger than average time commitment if someone wants to respond to all of them.

I would also focus on creating clear titles for your discussion topics as that is the main context another community member sees before looking into a discussion. For example, this topic might have been more appealing as “Advice on scaling from 8.6 million to 500 million documents”.

If you want predictable support with SLAs for a business use case, I would look into Atlas Support Plans or Flex Consulting.

Since you are relatively new to MongoDB but want to scale a production use case quickly, I highly recommend getting professional advice or training. You can certainly take the time to learn everything yourself, but a consultant can provide more holistic advice about your use case and future needs than is practical in community forums. Discussions in community forums are better suited to specific questions that do not require context of your roadmap or overall plans.

MongoDB University courses are free. Are you referring to taking M121: The MongoDB Aggregation Framework or a course from another provider?

There are some course recommendations on the DBA Learning Path and Developer Learning Path, but it sounds like your questions are broader.

I suggest having a look through the white papers and presentations available in MongoDB Resources. White papers are likely more relevant, but if you provide more details on your situational context that would help narrow down the suggestions.

Taking a few guesses at white papers that may be of interest:

Hope that helps!

Regards,
Stennie


Hi Stennie!

Thank you so much for all of the time you put into answering my ‘broad’ question(s) in this forum. I will try and do better as I go along. :wink:

I suppose, broadly speaking, I need to figure out where I can get the right kind of support - the kind of support from the mongo forum and perhaps also via a mongo support plan option as well. I did speak to a representative about Flex Consulting, but as I am pretty much a sole developer, there is no budget for that. My feeling is that it would be very comforting to have a mongo support engineer be aware of what I am doing and offer insights and consensus - the point being that we need to get this right.

Atlas Search is something I am looking at, although the deployment we are going with is on AWS. Does that mean that I need to dive into clusters and sharding?

I will create a new forum post/topic to ask specific questions relating to querying on the date field, as my day unfolds. I was given some SQL queries, which I recreated in Mongo so I could compare execution times between MySQL and Mongo. However, their current MySQL implementation is starting to take too long, ergo the comparison is really between Mongo and Sphinx. I am not sure that is a fair evaluation, and so the topic of Atlas Search, or having a Data Lake, etc. has come up to be considered. I am very new to this project, so I am also learning the data/platform and seeking the best possible setup with Mongo or Mongo tools.

Would it help to say that we want to run ‘reports’ consisting of 20+ queries? In my experience with MS SQL Server, we would not run this on the live data - but I come from a time when Data Warehouses were used for this, and often report data was a copy from the day before so we did not tax the live database. But times change, and we didn’t have the cloud or Mongo, etc. back then. We now want to report on ‘live’ data but also keep the system from being heavily taxed.

Clustering, Replication, and Sharding have come up recently which is why you see my learning activity on the mongo university. I’m just about to get into this topic (hopefully today). If this is where the solution is then I hope to come across it today. However, I will try and post/ask to more appropriate categories.

BTW the course I referred to earlier (in addition to all of the mongo university courses) is from coursera.org. I paid for this course before I found the ones directly from yourselves, although the Coursera site says it is a MongoDB University course (MongoDB Aggregation Framework, by MongoDB Inc.). Please let me know if you want a direct link. I compared this course to the other courses and the content is quite different, so I have decided that I will attempt this course after I have finished all of the courses directly from yourselves. :wink:

Thank you very much for the resource links you shared. I will start looking into them today also.

Best wishes,
Daniel

Hi Daniel,

MongoDB team members are active in the Community Forums but we cannot offer any SLAs or guarantees of response and there are limitations in terms of what can easily be discussed in a public forum. Community channels are a helpful starting point for advice as long as you are prepared to possibly wait for folks with the desired expertise to provide feedback.

If you are deploying MongoDB in your own AWS instances (i.e. self-hosted and not using Atlas), you will also be responsible for operational aspects like security, backup, monitoring, and scaling. That also precludes the use of Atlas services like Atlas Search, Charts, Data Lake, or Realm.

For an on-premises production deployment you should start with a replica set (three data-bearing mongod processes). Sharding may be useful for scaling later, but is a much bigger investment in infrastructure and admin that you should grow into based on need.
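For reference, a three-member replica set is described by a small configuration document. The sketch below only builds that document; the replica set name `rs0` and the `example.net` hostnames are placeholders, and the helper name is hypothetical.

```python
def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts."""
    if len(hosts) < 3:
        raise ValueError("use at least three data-bearing members for production")
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


config = replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
)

# With PyMongo connected directly to one of the members, each started with
# --replSet rs0 (not run here):
#   client.admin.command("replSetInitiate", config)
```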

I would give careful consideration to all the operational tasks you will have to learn for an on-premises deployment versus a managed deployment on Atlas. Your team (which may be a team of 1 to start with) will also have to split attention between development and admin for an on-premises install.

On whether it helps to know that your reports consist of 20+ queries: unfortunately, no. The number of queries is not meaningful without context on the nature of the queries, especially if you are after recommendations on search solutions.

For suggestions on your most common queries you could start a discussion topic including as much known/applicable detail as possible for someone to provide suggestions:

  • some sample documents to test with
  • desired output
  • version of MongoDB server/driver
  • what you’ve tried (e.g. output of query explain with allPlansExecution verbosity)
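As one way to capture the explain output mentioned in the last item, here is a minimal sketch that builds an `explain` command document for a date-range `find`. The `orders` collection and `createdAt` field are hypothetical stand-ins for your own names.

```python
from datetime import datetime, timezone

ALLOWED_VERBOSITY = ("queryPlanner", "executionStats", "allPlansExecution")


def explain_find_command(collection, query_filter, sort=None, verbosity="allPlansExecution"):
    """Build an explain command document that wraps a find command."""
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"verbosity must be one of {ALLOWED_VERBOSITY}")
    find = {"find": collection, "filter": query_filter}
    if sort:
        find["sort"] = sort
    # "explain" must be the first key so the server treats it as the command name.
    return {"explain": find, "verbosity": verbosity}


command = explain_find_command(
    "orders",
    {"createdAt": {"$gte": datetime(2021, 1, 1, tzinfo=timezone.utc)}},
    sort={"createdAt": -1},
)

# With PyMongo (not run here):
#   plan = db.command(command)
#   print(plan["executionStats"]["totalDocsExamined"])
```

Comparing documents examined against documents returned in the output is usually a quick first check of whether an index is being used effectively.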

If you have special usage patterns (for example, reporting or analytics workloads that are vastly different from your main application use case), you can isolate those within your deployment using a hidden secondary.
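A hidden secondary is an ordinary replica set member with two extra settings. The sketch below adds one to an existing configuration document; the hostnames and helper name are hypothetical, and voting is left at its default, which you may want to review so the set keeps an odd number of voting members.

```python
def add_hidden_reporting_member(config, host):
    """Return a copy of a replica set config with one extra hidden member."""
    members = [dict(member) for member in config["members"]]
    next_id = max(member["_id"] for member in members) + 1
    # A hidden member must have priority 0 so it can never become primary.
    members.append({"_id": next_id, "host": host, "priority": 0, "hidden": True})
    new_config = dict(config, members=members)
    # A reconfiguration needs a higher version number than the current config.
    new_config["version"] = config.get("version", 1) + 1
    return new_config


current = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        {"_id": 2, "host": "db3.example.net:27017"},
    ],
}
updated = add_hidden_reporting_member(current, "reports.example.net:27017")

# With PyMongo connected to the primary (not run here):
#   client.admin.command("replSetReconfig", updated)
```

Hidden members are not advertised to normal application clients, so reporting tools would connect to that member directly.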

There are other options for workload isolation in MongoDB deployments, including replica set tags and zone sharding, but I would look into these after you have your base deployment configured and tuned.

The few Coursera courses from MongoDB Inc should also be free. There is some different material compared to M121: The MongoDB Aggregation Framework at MongoDB University (particularly on machine learning), but M121 has been more recently updated.

I also recommend reading @Paul_Done’s free Practical Aggregations Book which has some great insights & examples.

Regards,
Stennie

Hi Stennie,

Thanks again for your clarifications. They are a big help.

I just wanted to take a second to show you my purchase from Coursera, as it was not free. I don’t know if that matters, but you said courses on Coursera from MongoDB Inc should be free.

[Image: screenshot of the Coursera purchase]

There are so many good things I’m learning from your mongodb university ‘free’ courses that I may take the refund whilst I still have a chance.

Best wishes,
Daniel

Hi Daniel,

I rechecked the Coursera information: the Aggregation course is free if you choose “Audit Only” but a fee applies if you want to earn a completion certificate.

Apologies for any confusion there! I recall Coursera used to offer completion certificates for free as well, but that no longer seems to be the case.

Regards,
Stennie
