Multi-language data

Hey,
I’m working with multi-language text and currently I store it like this:

    {
    	"code": "01S",
    	"name": [{
    		"content": "España",
    		"language": "CAS"
    	},{
    		"content": "Spain",
    		"language": "ENG"
    	}],
    	"zones": [{
    		"zoneCode": 4,
    		"name": [{
    			"content": "Madrid",
    			"language": "CAS"
    		},{
    			"content":"Madrid",
    			"language":"ENG"
    		}],
    		"description": [{
    			"content": "Madrid ciudad capital",
    			"language": "CAS"
    		}, {
    			"content":"Madrid capital city",
    			"language": "ENG"
    		}]
    	}, {
    		"zoneCode": 7,
    		...
    	}]
    }

This data is not going to be updated frequently, but I need fast search and I’m not sure how to implement indexes here.

The query result should contain only the requested language, like this example:

    {
    	"code": "01S",
    	"name": "España",
    	"zones": [{
    		"zoneCode": 4,
    		"name": "Madrid",
    		"description": "Madrid ciudad capital"
    	}, {
    		"zoneCode": 7,
    		...
    	}]
    }

I don’t know what the best way to do this is, so I appreciate every suggestion.

In order to know how to search fast we would have to know what you will be searching for.

How will you be querying these documents? You show how you want the document returned but transforming the document for return is quick, it’s identifying which document matches your query filter that can be slow without the right index…

Asya

Hey thanks for the answer.

I would do a simple case-insensitive substring search for a specified language. For example, if I search for /adri/i, it has to find “Madrid”. Something like this:

    .find({
        name: {
          $elemMatch: {
            language,
            content: {$regex: new RegExp(query, "i")}
          }
        }
    });

I may also search by zone name with the same approach, and then project the country name and only the matching zones.
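
As a rough sketch of how the filter above could be built for either field, the helper below takes the field path, the language, and the search text. The names `escapeRegExp` and `buildTextFilter` are made up for illustration. Escaping the input is an extra step not in the snippet above; it assumes the search text should be matched literally rather than as a regex pattern.

```javascript
// Escape regex metacharacters so the user's text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter that matches documents where `field` (an array of
// {content, language} objects) has a single element in the given
// language whose content contains `query`, ignoring case.
function buildTextFilter(field, language, query) {
  return {
    [field]: {
      $elemMatch: {
        language: language,
        content: { $regex: escapeRegExp(query), $options: "i" },
      },
    },
  };
}
```

With this, `buildTextFilter("name", "ENG", "adri")` produces the same shape as the filter above, and passing `"zones.name"` as the field should give the zone-name variant.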

By the way, I’m not sure why my code is not colored. I’m using preformatted text

Hi @dimitar_vasilev,

For code formatting you need to surround your blocks with triple backticks (```) similar to GitHub. You can optionally specify a language to use, but the default language detection is usually fine.

For more tips see: Formatting code and log snippets in posts.

Regards,
Stennie

If you want to search by fields X and Y, you need a compound index on X and Y. So if you search on name language and content, I would suggest an index on {"name.language":1, "name.content":1}. If you also plan to search on zone names, you would want a second index on {"zones.name.language":1, "zones.name.content":1}.
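
A minimal sketch of creating those two indexes, assuming `collection` is any object with a `createIndex(keys)` method (a mongosh collection or a Node.js driver `Collection`, where the calls return promises). The helper name `createNameIndexes` is hypothetical.

```javascript
// Key patterns copied from the suggestion above.
const nameIndex = { "name.language": 1, "name.content": 1 };
const zoneNameIndex = { "zones.name.language": 1, "zones.name.content": 1 };

// Create both compound indexes and return whatever createIndex returns
// for each (index names in mongosh, promises in the Node.js driver).
function createNameIndexes(collection) {
  return [
    collection.createIndex(nameIndex),
    collection.createIndex(zoneNameIndex),
  ];
}
```

In mongosh this could be called as `createNameIndexes(db.countries)`, where `countries` is only a placeholder for the real collection name.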

What portion of the document you return is a separate question from which index would efficiently support finding the matching document(s). As of version 4.4, projection in find supports all the aggregation expressions, so it should be possible to make any transformations you want on the fields you want to return. That is done in memory and, compared to finding the documents, is very fast.
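
One possible shape for such a projection, as a sketch only: it uses `$filter`, `$arrayElemAt`, `$let` and `$map` to replace each translation array with the `content` of the requested language. The helper names `pickLanguage` and `buildProjection` are invented here, dropping `_id` is an assumption based on the desired output shown earlier, and this version returns all zones rather than only the matching ones.

```javascript
// Expression that yields the `content` of the first {content, language}
// element of `arrayExpr` whose language equals `language`.
function pickLanguage(arrayExpr, language) {
  return {
    $let: {
      vars: {
        match: {
          $arrayElemAt: [
            {
              $filter: {
                input: arrayExpr,
                as: "t",
                // $literal keeps the language value from being parsed as a field path.
                cond: { $eq: ["$$t.language", { $literal: language }] },
              },
            },
            0,
          ],
        },
      },
      in: "$$match.content",
    },
  };
}

// Projection for find() on MongoDB 4.4+ that flattens name, zone name
// and zone description to the requested language.
function buildProjection(language) {
  return {
    _id: 0,
    code: 1,
    name: pickLanguage("$name", language),
    zones: {
      $map: {
        input: "$zones",
        as: "z",
        in: {
          zoneCode: "$$z.zoneCode",
          name: pickLanguage("$$z.name", language),
          description: pickLanguage("$$z.description", language),
        },
      },
    },
  };
}
```

In mongosh the result would be passed as the second argument to `find`, for example `find(filter, buildProjection("CAS"))`. The Node.js driver expects it under the `projection` option instead.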

Asya