How to remove all non-alphanumeric characters while projecting in MongoDB?

I want to remove all non-alphanumeric characters while using project in an aggregation. How do I do this?

{
  $project: {
    channel: { $removeAll: "$channel" }, // How?
  }
}

My data is

[
  {
    "channel": "abc def"
  },
  {
    "channel": "a..bc"
  },
  {
    "channel": " ad. edf "
  }
]

The result should be

[
  {
    "channel": "abcdef"
  },
  {
    "channel": "abc"
  },
  {
    "channel": "adedf"
  }
]

Hi @khat33b,

Depending on your use case, you may wish to consider filtering out special characters on the client side (if these fields require user input for example) before inserting the documents into the database. Alternatively, you can also consider filtering out the special characters on the client side after the documents are retrieved.

I want to remove all non-alphanumeric characters while using project in an aggregation. How do I do this?

Please see the below example below which should help you achieve a alphanumeric only version, called filteredString, of the channel field:

Split the individual characters within the channel field and add each character to the channelArray field using $map:

[primary] regexdb> stage1 =
{
  '$addFields': {
    channelArray: {
      '$map': {
        input: { '$range': [ 0, { '$strLenCP': '$channel' } ] },
        in: { '$substrCP': [ '$channel', '$$this', 1 ] }
      }
    }
  }
}

Using $filter to filter on channelArray and return back only elements that match the regex condition:

[primary] regexdb> stage2 =
{
  '$addFields': {
    filteredArray: {
      '$filter': {
        input: '$channelArray',
        cond: { '$regexMatch': { input: '$$this', regex: '[0-9a-zA-Z]' } }
      }
    }
  }
}

Using $reduce to combine the filteredArray back into a string field filteredString:

[primary] regexdb> stage3 =
{
  '$addFields': {
    filteredString: {
      '$reduce': {
        input: '$filteredArray',
        initialValue: '',
        in: { '$concat': [ '$$value', '$$this' ] }
      }
    }
  }
}

The example set of results using the above stages:

[primary] regexdb> db.regexcoll.aggregate([stage1,stage2,stage3])
[
  {
    _id: ObjectId("61baaac7ce53d17c6dba7c55"),
    channel: 'abc def',
    channelArray: [
      'a', 'b', 'c',
      ' ', 'd', 'e',
      'f'
    ],
    filteredArray: [ 'a', 'b', 'c', 'd', 'e', 'f' ],
    filteredString: 'abcdef'
  },
  {
    _id: ObjectId("61baaac7ce53d17c6dba7c56"),
    channel: 'a..bc',
    channelArray: [ 'a', '.', '.', 'b', 'c' ],
    filteredArray: [ 'a', 'b', 'c' ],
    filteredString: 'abc'
  },
  {
    _id: ObjectId("61baaac7ce53d17c6dba7c57"),
    channel: ' ad. edf ',
    channelArray: [
      ' ', 'a', 'd',
      '.', ' ', 'e',
      'd', 'f', ' '
    ],
    filteredArray: [ 'a', 'd', 'e', 'd', 'f' ],
    filteredString: 'adedf'
  },
  {
    _id: ObjectId("61bac5a7301ede56b4d1f975"),
    channel: 'abc',
    channelArray: [ 'a', 'b', 'c' ],
    filteredArray: [ 'a', 'b', 'c' ],
    filteredString: 'abc'
  }
]

Feel free to combine the 3 stages into a single stage as the example above was to highlight what was occurring at each stage in the pipeline and the resulting field values from each stage.

Hope this helps.

Regards,
Jason

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.