How to loop over data in a field during aggregation

I have the following code that works fine at finding the first occurrence of within the response field. How do I get it to loop and add each of the values found into the new field?

/** extractResponseFields - extracts the tag from the response field into a new field
 * response will look like:
 *   "response": "Let me get someone to talk to you about your account.<metric name=\"Missing_accounts\" 
 *               stage=\"Handover_to_agent_IB\" /> <metric name=\"Speak_to_agent_DB\" stage=\"entrypointIB\" />
 *               <pause wait=2000 /><metric name=\"Speak to agent_DB\" stage=\"context_set_IB\" /> 
 *               <pause wait=2000 />I'm handing you over to one of my team who'll be happy to help. 
 *               They'll be with you as soon as they can.<metric name=\"DB_Speak_to_agent\" 
 *               stage=\"in_hours_handover_IB\" /> <pause wait=2000 /><handover skill=\"DS_Banking_IB\" />"
 * 
 * So we need to extract and store each of the values of the tag in a new field e.g.
 *   metricName: [ "Missing_accounts", "Speak_to_agent_DB", "Speak_to_agent_DB", "DB_Speak_to_agent" ]
 *   stage: [ "Handover_to_agent_IB", "entrypointIB", "context_set_IB", "in_hours_handover_IB" ]
 *   handoverSkill: [ "DS_General_IB" ]
 * 
 * 1. match all records than have '<tag>' in the response field
 * 2. use the substr command to find the start/end of the tag data
 * 3. create an additional field to hold the tag data
 * 4. merge the new field data into the original collection
 * @param {*} tag - tag to find in response field.
 * @param {*} fieldName - name of field to create.
 * @returns {Object} A distinct list of the values for this field sorted alphabetically.
 * @returns {Integer} The count of items in the list.
 */
const extractResponseFields = async (tag, fieldName) => {
  console.log('extractResponseFields:', tag, fieldName, tag.length)
  const hs = await Conversations.aggregate([
    { // stage 1 - match all records than have '<tag>' in the response field
      $match: {
        response: {
          $regex: tag
        }
      }
    },
    { // stage 2 - use the substr command to find the start/end of the tag data
      $project: {
        _id: 1,
        tagData: {
          $let: {
            vars: {
              startIndex: { $add: [{ $indexOfBytes: ['$response', tag] }, { $add: [ tag.length, 2]} ] }
            },
            in: {
              $substr: ['$response',
                '$$startIndex',
                { $subtract: [{ $indexOfBytes: ['$response', '\"', '$$startIndex'] }, '$$startIndex'] }]
            }
          }
        }
      }
    },
    { // stage 3 - create an additional field to hold the tag data
      $set: {
        [fieldName]: '$tagData'
      }
    },
    { // stage 4 - merge the new field data into the original collection
      $merge: { into: { db: 'chops', coll: 'conversations' }, on: '_id', whenMatched: 'merge', whenNotMatched: 'insert' }
    }
  ])
  //return [hs, hs.length]
  console.log(hs, hs.length)
  // return a distinct collection of the newly created field
  const results = await Conversations.distinct(fieldName).sort()
  return [results, results.length]
}

Hello @Ryan_Moore, Welcome to the MongoDB Community forum!

In general, you can loop (can call ‘iterate’) over some types of data. Two obvious types are the array data and the string data. The field reponse is a string and you can “split” it - and the split operation returns an array. For example, if you split a string “Hello-welcome-goodbye”, by “-” (this is a delimiter), you get an array of strings: [ 'Hello', 'welcome', 'goodbye' ].

Now that we have an array of strings, you can easily loop over it using a for-loop, and do something with each element of the array. For example:, in JavaScript (in mongosh):

var words = "Hello-welcome-goodbye"
var array = words.split("-")    // [ 'Hello', 'welcome', 'goodbye'  ]
for (str of array) {
    console.log(str.length)
}

This will print, 5, 7 and 7.

This functionality can also be achieved using the aggregation query. There are various operators which can be applied on array and string data types - and these can be applied in your query.

Solved it - this is the updated function…

/** extractResponseFields - extracts the tag from the response field into a new field
 * response will look like:
 *   "response": "Let me get someone to talk to you about your account.<metric name=\"Missing_accounts\"
 *               stage=\"Handover_to_agent_IB\" /> <metric name=\"Speak_to_agent_DB\" stage=\"entrypointIB\" />
 *               <pause wait=2000 /><metric name=\"Speak to agent_DB\" stage=\"context_set_IB\" />
 *               <pause wait=2000 />I'm handing you over to one of my team who'll be happy to help.
 *               They'll be with you as soon as they can.<metric name=\"DB_Speak_to_agent\"
 *               stage=\"in_hours_handover_IB\" /> <pause wait=2000 /><handover skill=\"DS_Banking_IB\" />"
 *
 * So we need to extract and store each of the values of the tag in a new field e.g.
 *   metrics: [ "Missing_accounts", "Speak_to_agent_DB", "Speak_to_agent_DB", "DB_Speak_to_agent" ]
 *   stages: [ "Handover_to_agent_IB", "entrypointIB", "context_set_IB", "in_hours_handover_IB" ]
 *   handoverSkills: [ "DS_General_IB" ]
 *
 * 1. match all records than have '<tag>' in the response field - pass in like stage="
 * 2. create an additional field to hold the tag data
 * 3. merge the new field data into the original collection
 * @param {*} tag - tag to find in response field.
 * @param {*} fieldName - name of field to create.
 * @returns {Object} A distinct list of the values for this field sorted alphabetically.
 * @returns {Integer} The count of items in the list.
 */
const extractResponseFields = async (tag, fieldName) => {
  console.log('extractResponseFields:', tag, fieldName, tag.length)
  const hs = await Conversations.aggregate([
    { // stage 1 - match all records than have '<tag>' in the response field
      $match: {
        response: {
          $regex: tag
        }
      }
    },
    { // stage 2 - create an additional field to hold the tag data
      $set: {
        [fieldName]:
          {
            $function:
            {
              body: function (response, tag) {
                // split the function by the tag - creates an array with
                const arr = response.split(tag)
                const tags = []
                for (let i = 1; i < arr.length; i++) { // ignore first element as it is the text preceding tag
                  // take the text up until the next "
                  tags.push(arr[i].split('"')[0])
                }
                return tags
              },
              args: ['$response', tag], // $var would come from collection, var is the variable passed
              lang: 'js'
            }
          }
      }
    },
    { // stage 3 - merge the new field data into the original collection
      $merge: { into: { db: 'chops', coll: 'conversations' }, on: '_id', whenMatched: 'merge', whenNotMatched: 'insert' }
    }
  ])
  // return [hs, hs.length]
  console.log(hs, hs.length)
  // return a distinct collection of the newly created field
  const results = await Conversations.distinct(fieldName).sort()
  return [results, results.length]
}

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.