Escaping HTML chars in collection

Hi folks, I have imported a large amount of data, and unfortunately it contains a ton of escape sequences such as &amp;amp; and &amp;lt;. The original dump did this, and I did not check before importing.

Is there any function that can be used to process all documents in the collection and unescape those back to regular characters?

Thanks

Hi @Vinicius_Carvalho,

I have only briefly tested this on a small collection containing 3 sample documents. If you believe it may work for your use case / environment, I would recommend testing it thoroughly on a test or duplicated copy of what you have imported, to verify the result is correct.

As an example, I have the following documents containing &amp; and &lt;:

DB> db.collection.find({},{_id:0,text:1})
[
  { text: '&amp;this is some&amp; text&lt;' },
  { text: 'text&lt;123' },
  { text: 'this &amp;text&lt; 123' }
]

Running the below update against this collection:

DB> db.collection.updateMany( {},
[
  {
    '$set': {
      text: {
        '$replaceAll': { input: '$text', find: '&amp;', replacement: '' }
      }
    }
  },
  {
    '$set': {
      text: {
        '$replaceAll': { input: '$text', find: '&lt;', replacement: '' }
      }
    }
  }
])
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 3,
  modifiedCount: 3,
  upsertedCount: 0
}

Documents in the same collection after the above update:

DB> db.collection.find({},{_id:0,text:1})
[
  { text: 'this is some text' },
  { text: 'text123' },
  { text: 'this text 123' }
]
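
Note that the update above removes the matched sequences, because the replacement is an empty string. If the goal is to put the original characters back instead, a similar pipeline can map each entity to its character. The following is only a rough sketch under some assumptions: the field is called `text`, the entity list is limited to the five common ones, and `buildUnescapePipeline` / `previewUnescape` are hypothetical helper names, not MongoDB functions. `$replaceAll` needs MongoDB 4.4 or later.

```javascript
// Hypothetical helper (not part of MongoDB): builds one $set/$replaceAll
// stage per [entity, character] pair, for use as an update pipeline.
function buildUnescapePipeline(field, entities) {
  return entities.map(([find, replacement]) => ({
    $set: {
      [field]: {
        $replaceAll: { input: `$${field}`, find, replacement },
      },
    },
  }));
}

// Assumed entity list. '&amp;' is handled last so that text such as
// '&amp;lt;' becomes '&lt;' instead of being unescaped twice into '<'.
const ENTITIES = [
  ["&lt;", "<"],
  ["&gt;", ">"],
  ["&quot;", '"'],
  ["&#39;", "'"],
  ["&amp;", "&"],
];

// Applies the same replacements to a plain string, in the same order,
// so the mapping can be sanity-checked without touching the database.
function previewUnescape(text, entities) {
  return entities.reduce(
    (acc, [find, replacement]) => acc.split(find).join(replacement),
    text
  );
}

const pipeline = buildUnescapePipeline("text", ENTITIES);

// In mongosh, assuming the field is named `text`:
//   db.collection.updateMany({ text: { $type: "string" } }, pipeline)
```

The `{ text: { $type: "string" } }` filter is there because `$replaceAll` returns an error on non-string input and returns null when the field is missing, so an unfiltered update could fail or write `text: null` into documents that never had the field. As with the update above, it is worth running this against a copy of the collection first.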

If you find that this does not suit your use case, please provide the following:

  • Sample documents
  • MongoDB Version in use
  • What you have attempted so far
  • Expected output

Hope this helps.

Regards,
Jason
