Mongodb Alphabetical Sorting

Hello,
Currently I am facing a issue in sorting my database. I have a database that contains 8k documents with a title field in it. I want to sort data in alphabetical order i.e. title with first letter A comes first, then so on to Z. But I am getting documents that starts with symbols in the beginning. like #facebook,&google etc. Please provide me the solution on how to achive this usecase using mongodb queries and that should be fast too.

Hi @Ashutosh_Mishra1 and welcome in the MongoDB Community :muscle: !

When you want to alter the sorting algorithm / behaviour, you use collation. I’m not sure if there is actually a solution for exactly your use case, but if it exists, it’s in here.

Cheers,
Maxime.

I couldn’t find any… Please someone help me to provide exact solution.

What do you want to do with those special titles?

You wan to keep them and move them at the end of the sort or completely remove them from your result set?

1 Like

It sounds like you want to strip any leading special character and you would have to do that in the pipeline before you performed your sort.

As for how things are currently, the sort is working exactly as it’s supposed to. The results are being sorted in the proper lexicographical order.

I don’t have time to test right now, but you should be able to reshape the data to strip out the special character using a regex pattern. One other thing to think about is that by default MongoDB will sort all upper case values before sorting in lower case values. That means that Google would come before facebook. If you have mixed case data, that where collations would come in to play.

1 Like

Actually if you just want to remove a certain set of characters from the start/end of a string, take a look at the $trim operator. I was able to use that to do what I believe you want to do. It’s easy enough to do so I will let you work through the solution to build up skill and retention. You will still need to set a collation for your aggregate function to sort properly if you have mixed case entries.

2 Likes

Suppose I have 10 documents which are, Cat ,Grapes, #zebra,#monkey, Apple, Bat, !snake, &house,110test etc.
I want to sort these documents alphabetically in such a way like, Apple, Bat, Cat, Grapes (ignoring special characters, or keeping them in last). But by doing normal alphabetical sorting I am getting, !snake,#zebra,#monkey, &house ,110test , Apple, Bat, Cat, Grapes etc. Actually My database consists of about 800k startups database in which some companies have their names begiining with special characters, but I want to get that data in Alphabetical order ignoring such startups with special characters.

  • Can you show how you would like the results of the sort to appear?
  • Do you want the displayed results to show the special character?
  • Have you tried using the $trim function I mentioned above? If so what results did you get?
  • Do you have any sample code from what you attempted?
2 Likes

Unfortunately you’re not showing the full query, but from what you do show, I see you’re running a .find() command. That is not going to work, and is why I keep suggesting to look at the aggregation framework (.aggregate() command) and the $trim operator. I can’t tell from your screen shot if all companies are proper-cased or not, but I will assume you will want to add the collation option as well just to make sure that A and a get sorted together instead of having a after Z.

Play around with the above suggestions for a bit and if you’re still not got it I will provide a sample query after I get some sleep.

3 Likes

Hi @Ashutosh_Mishra1,

As @Doug_Duncan mentioned, aggregation would be a straightforward way to dynamically calculate a field for custom sort orders but I would definitely add a pre-calculated field to your documents with an appropriate Index to Support Sorted Query Results. You want to avoid the performance overhead and limitations of an in-memory sort.

I set up some test data:

db.company.insert([
	{ InvestorName: '#Angels'},
	{ InvestorName: '&Vest'},
	{ InvestorName: '+Impact'},
	{ InvestorName: '0 Ventures'},
	{ InvestorName: 'A Plus Finance'}
])

I used $trim to remove ignorable characters and $toLower to make a case-insensitive pre-calculated sort field:

db.company.aggregate([

	// This field could also be calculated in client code
	// when documents are inserted or updated

	{ $addFields: {
		sortOrder: {
			$toLower: {
				$trim: {
					input: "$InvestorName",
					chars: "#&*+1234567890 "
				}
			}
		}
	}},

	// Update collection with calculated sortOrder field
	{ $out: "company" }
])

Now I can get the expected order using my pre-calculated custom sortOrder field (and fast, with an appropriate index):

db.company.find({}, {_id:0}).sort({sortOrder:1})	
[
  { InvestorName: 'A Plus Finance', sortOrder: 'a plus finance' },
  { InvestorName: '#Angels', sortOrder: 'angels' },
  { InvestorName: '+Impact', sortOrder: 'impact' },
  { InvestorName: '0 Ventures', sortOrder: 'ventures' },
  { InvestorName: '&Vest', sortOrder: 'vest' }
]

The sortOrder field could be projected out, but I included it here to show the outcome of my earlier calculation.

Regards,
Stennie

3 Likes

Hey this looks cool, I will try implement. Thanks! Just one last thing, The characters you mentioned i.e. chars: “#&*+1234567890” are the only characters which needs to be ignored according to mongodb rules or you used this only for testing!

Hi @Ashutosh_Mishra1,

I just used those as an example for testing since you provided screenshots rather than sample documents or expected output based on the input.

Your second screenshot doesn’t show where the initial documents would land (i.e. #Angels listed before or after Angels, 0 Ventures vs Ventures, etc), but you can adjust the calculation to achieve your desired sort order.

Regards,
Stennie

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.