Garbled text from context.http.get as source of AWS SES emails

Hello, I’ve been using Realm to send AWS SES emails for 3 years now, for everything from user auth to newsletters. Starting on April 13th 2023 (or a few days before), all emails have garbled characters appearing everywhere. Mostly strange characters appear instead of accented characters, but sometimes the garbled text appears on spaces and common characters too!

I’ve spent quite a lot of time debugging and narrowed it down to “possibly” the context.http.get method having changed somehow, so that it now has problems with the UTF-8 charset.

I’d like to stress that the error appeared without any changes to the Realm dependencies and no changes at all on our server. The only coincidence is that you upgraded the Atlas Shared clusters to MongoDB 6.0.5.

I will share some screenshots to illustrate the issue, and working code below.

Before:
Screenshot 2023-04-19 at 09.48.53

After:
Screenshot 2023-04-19 at 09.48.43

from this HTML email: The world in you - typo/graphic posters

Before (error on spaces):
Screenshot 2023-04-19 at 09.48.31

After (error before spaces):
Screenshot 2023-04-19 at 09.47.57

Before: (error before spaces):
Screenshot 2023-04-19 at 09.48.28

After: (error before spaces):
Screenshot 2023-04-19 at 09.48.20

I’m using AWS SDK v3 as a dependency, but tried with v2 just now and the error is the same.
Tried with @aws-sdk/client-sesv2 3.261.0, 3.267.0, 3.67.0; none of them work.
Tried with aws-sdk 2.1360.0, too.

**What could be the cause? Were Realm Functions downgraded somehow?**
Is there a charset issue on context.http now?

Thanks!

Here is the code:

exports = async function() {
  const from = 'FROM_EMAIL'
  const to = 'TO_EMAIL'

  // get email HTML  
  const emailHtml = await context.http.get({
    url: 'https://www.typographicposters.com/newsletters/the-world-in-you'
  })
  const body = emailHtml.body.text()
  
  // send email
  const { SESv2Client, SendEmailCommand } = require('@aws-sdk/client-sesv2')
  
  const client = new SESv2Client({
    region: 'us-east-1',
    credentials: {
      accessKeyId: context.values.get('AwsSesKey'),
      secretAccessKey: context.values.get('AwsSesSecret'),
    },
  })
  const command = new SendEmailCommand({
    FromEmailAddress: from,
    Destination: {
      ToAddresses: [to],
    },
    Content: {
      Simple: {
        Subject: {
          Data: 'Test from Realm',
          Charset: 'UTF-8',
        },
        Body: {
          Html: {
            Data: body,
            Charset: 'UTF-8', // tried with ISO-8859-1, the garbled characters just change
          },
        },
      },
    },
  })
  await client.send(command)


 // for AWS SDK v2 here is the code
 // const Ses = require("aws-sdk/clients/ses");
  
 // const client = new Ses({
 //   region: 'us-east-1',
 //   credentials: {
 //     accessKeyId: context.values.get('AwsSesKey'),
 //     secretAccessKey: context.values.get('AwsSesSecret'),
 //   },
 //  })
 //  const params = {
 //   Source: from,
 //   Destination: { ToAddresses: [to] },
 //   Message: {
 //     Body: {
 //       Html: {
 //        Charset: "UTF-8",
 //        Data: body
 //       }
 //      },
 //      Subject: {
 //       Charset: 'UTF-8',
 //       Data: 'Test from Realm with AWS SDK v2'
 //     }
 //   }
 // }
 // await client.sendEmail(params).promise(); 
  
};

Did you confirm the jumbled text is not generated directly from the HTML response before the email is sent?

Is the email jumbled for everyone who receives it, or did you only test with one recipient?

Yes, I did. The source HTML is fine.

In fact I noticed the error in one internal admin email which is sent regularly. It was always fine, and then suddenly all emails have garbled text.

Yes, I did test with different recipients too.

Tested with a completely different source, from another server:
https://www.eartheclipsed.com/newsletters/new-episode-available-today

Before:
Screenshot 2023-04-20 at 07.59.34

After:
Screenshot 2023-04-20 at 07.59.37

And another, completely different source too:
https://www.sp-arte.com/newsletters/casa-sp-arte-2/

Before:

After:

I have really exhausted my options, as it’s neither AWS SES nor the HTML sources. The only thing in the middle is context.http.get and the App Services Function itself.

So that’s the reason for my guess:
did App Services change recently? Is it using a different charset?

So if the HTML looks fine immediately after context.http.get what makes you think it is the culprit?
Did you actually do a console.log right after this statement?
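
A log that renders correctly in a UI may not be enough on its own to rule out a decoding problem, since the viewer chooses how to display the text. A minimal sketch of a stricter check, assuming only standard JavaScript string methods (`describeNonAscii` is a hypothetical helper name, not part of App Services):

```javascript
// Hypothetical helper: list the code point of every non-ASCII character
// in a string, so a decoding problem shows up as numbers, not glyphs.
function describeNonAscii(text) {
  const found = [];
  for (const ch of text) { // for...of iterates by code point
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0x7f) {
      found.push('U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  return found;
}

// A correctly decoded trademark sign is a single code point.
console.log(describeNonAscii('Posters\u2122')); // [ 'U+2122' ]

// The same character after a wrong single-byte decode becomes three.
console.log(describeNonAscii('Posters\u00e2\u0084\u00a2')); // [ 'U+00E2', 'U+0084', 'U+00A2' ]
```

Inside the function from this thread, the equivalent call would be `console.log(describeNonAscii(emailHtml.body.text()))`.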

Yes, here are the results:

The console.log is indeed fine when looking at the Realm UI:

But very curious is this test, sending that paragraph inline to SES:

The email gets received correctly!

And again, going back to the original case, if I get the HTML from here:

const emailHtml = await context.http.get({
    url: 'https://www.typographicposters.com/newsletters/atlas-test'
    // created this new URL with only that thank you note
})
const body = emailHtml.body.text()

const command = new SendEmailCommand({
    FromEmailAddress: from,
    Destination: { ToAddresses: [to] },
    Content: {
      Simple: {
        Subject: {
          Data: 'Test from Realm',
          Charset: 'UTF-8',
        },
        Body: {
          Html: {
            Data: body, // <---------------
            Charset: 'UTF-8',
          },
        },
      },
    },
  })
  await client.send(command)

The email gets received with all the garbled characters again:

And yes, that log at the Realm UI looks fine:

OK, now, what could be the problem?

The source HTML is fine, and as I said, I have been sending these emails for years. I just started getting the errors suddenly last week.

(I will reference this source HTML from now on, it’s shorter: Atlas Test - typo/graphic posters)

To rule out another possibility, I tried removing all CSS and the HTML head.

But the result is the same:


Now take a look here. This is the source of the email which was sent inline, where the characters were correct:

So the question is, who is fiddling with the characters? AWS or Atlas Functions?

My understanding is that it’s not AWS, because in the test above, sending the HTML inline, the email was received just fine.

Could it be the Atlas Functions dependencies? Something at the transport level?
Please understand that this is my guess, but it makes sense somehow.

Here is the final test: sending the exact same HTML of the thank you note, which was inlined before, but now fetching it with context.http.get.

Is context.http.get supposed to be used as a general purpose HTTP client like you’re using it?
Based on review of the docs, I’m not so sure. In all the examples, BSON.Binary is the expected response.

exports = async function() {
  const response = await context.http.get({ url: "https://www.example.com/users" })
  // The response body is a BSON.Binary object. Parse it and return.
  return EJSON.parse(response.body.text());
};

Also, look here:

body: The binary-encoded body of the HTTP response.

Can you try to use axios or fetch instead of context.http.get?

Atlas Functions don’t have fetch.

Axios 1.3.6 is installed, but just requiring it errors out with:

const axios = require('axios');

failed to execute source for 'node_modules/axios/index.js': FunctionError: failed to execute source for 'node_modules/axios/lib/axios.js': FunctionError: failed to execute source for 'node_modules/axios/lib/core/Axios.js': FunctionError: failed to execute source for 'node_modules/axios/lib/core/dispatchRequest.js': FunctionError: failed to execute source for 'node_modules/axios/lib/adapters/adapters.js': FunctionError: failed to execute source for 'node_modules/axios/lib/adapters/http.js': FunctionError: failed to execute source for 'node_modules/axios/lib/helpers/formDataToStream.js': TypeError: Value is not an object: undefined
	at node_modules/axios/lib/helpers/formDataToStream.js:41:19(118)

Like I said, I have been using context.http.get to fetch text just fine for more than 3 years. response.body.text() returns the body as a string.

The latest version of Axios doesn’t appear to work with Atlas functions, per the below post. Version 1.2.0 appears to work fine.

OK, installed Axios 1.2.0, but the response is coming as binary.

Of course I tried many different params and methods and researched on GitHub too, but we are missing the point here. I don’t see the reason to start testing Axios now. The issue is in MongoDB Atlas Functions, and as I said, this has been working for 3+ years.

Now we need the MongoDB team to jump in this issue.

Here is Axios code:

const url = 'https://www.typographicposters.com/newsletters/atlas-test'
const axios = require('axios').default;

const response = await axios.get(url);
return response.data;

And this is the binary response. I also tried response.data.toString() and responseType: 'text':

@Try_Catch_Do_Nothing if you find real solutions, tested inside the Atlas Functions, let me know.

So submit a support request then.
If it is an issue that was introduced within Atlas Functions, then I highly doubt a fix will be made for your specific use case (which is why I suggested an alternative like axios).

The bottom line is, it’s not working now, so something changed to break the existing functionality. Was it context.http.get? Was it the AWS library? You need to test different scenarios to pinpoint where the issue occurs.

Hello :wave: @andrefelipe,

Thank you for raising the concerns and discussing this. I’m looking into this. Please allow us some time.

We appreciate your patience and understanding.

Regards,
Kushagra

Everything right up to the email render is correct: the web page is using UTF-8 and returning the ™ symbol as the three bytes E2 84 A2 (Unicode character inspector: ™).

In your final email you can see these three bytes as â ¢. These are the same three bytes being rendered using either ISO 8859-1 or ISO 8859-15, the single-byte representation of non-ASCII characters we used before Unicode was a common thing. E2 is â and A2 is ¢; 84 is not defined as a printable character in there (ISO/IEC 8859-15 - Wikipedia).
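
The byte-level effect can be reproduced in plain Node.js. This is only a sketch: it assumes Node's global `Buffer` (not verified here for Atlas Functions), and `misreadAsLatin1` is a hypothetical helper name.

```javascript
// Hypothetical helper: encode text as UTF-8, then decode those bytes as if
// they were ISO 8859-1 (one byte per character).
function misreadAsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

const trademark = '\u2122'; // the trademark sign

// UTF-8 stores it as three bytes.
console.log(Buffer.from(trademark, 'utf8').toString('hex')); // e284a2

// Read one byte at a time, the same data turns into three characters:
// U+00E2 (a with circumflex), U+0084 (a control code), U+00A2 (cent sign).
const garbled = misreadAsLatin1(trademark);
console.log([...garbled].map((ch) => ch.codePointAt(0).toString(16))); // [ 'e2', '84', 'a2' ]
```

The middle character is an invisible control code, which would explain why only two visible glyphs show up around a gap in the received email.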

The issue is that the email is not correctly defining its contents as UTF-8, so the renderer is falling back to an 8-bit charset. At least that’s how it looks. If you look at the source of the email: (a) is it really these three bytes, or has it done some odd translation to three Unicode characters, and (b) what charset is actually defined in the raw email content?
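
For question (b), the declared charset can be read from the `Content-Type` headers in the raw message source. A rough sketch, assuming the raw source is available as a string; `findDeclaredCharsets` and the sample message are hypothetical, and the regular expressions are a simplification rather than a full MIME parser:

```javascript
// Hypothetical helper: collect every Content-Type header in a raw email
// and report the charset parameter it declares, if any.
function findDeclaredCharsets(rawEmail) {
  // Join folded header lines (continuations start with a space or tab).
  const unfolded = rawEmail.replace(/\r?\n[ \t]+/g, ' ');
  const headerPattern = /^Content-Type:\s*([^;\r\n]+)([^\r\n]*)/gim;
  const results = [];
  let match;
  while ((match = headerPattern.exec(unfolded)) !== null) {
    const charsetMatch = /charset\s*=\s*"?([^";\s]+)"?/i.exec(match[2]);
    results.push({
      type: match[1].trim().toLowerCase(),
      charset: charsetMatch ? charsetMatch[1].toLowerCase() : null,
    });
  }
  return results;
}

// Made-up sample message, for illustration only.
const sampleEmail = [
  'Subject: Test from Realm',
  'Content-Type: multipart/alternative;',
  ' boundary="b1"',
  '',
  '--b1',
  'Content-Type: text/html; charset=UTF-8',
  'Content-Transfer-Encoding: quoted-printable',
  '',
  '<p>Thank you=E2=84=A2</p>',
  '--b1--',
].join('\r\n');

console.log(findDeclaredCharsets(sampleEmail));
```

A `text/html` part reporting `null` or `iso-8859-1` would support the fallback explanation; `utf-8` would point toward the bytes themselves having been altered.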

It’s possible the issue is that when calling the SES endpoint the request is using ISO 8859-1 rather than UTF-8; the web service would then think you were sending it three characters. That seems the most likely issue (we will see this if the internal email contents are quite different).
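
If that is what happens, the three misread characters would be re-encoded as UTF-8 and the email body would contain six bytes instead of three. A small sketch for telling the two cases apart, assuming Node's global `Buffer` and a body that has already been decoded from quoted-printable or base64; `classifyTrademarkBytes` is a hypothetical helper name:

```javascript
// The trademark sign as correct UTF-8, and as UTF-8 that was misread as
// ISO 8859-1 and then encoded as UTF-8 a second time.
const UTF8_TRADEMARK = Buffer.from([0xe2, 0x84, 0xa2]);
const DOUBLE_ENCODED_TRADEMARK = Buffer.from([0xc3, 0xa2, 0xc2, 0x84, 0xc2, 0xa2]);

// Hypothetical helper: report which form of the trademark sign a body holds.
function classifyTrademarkBytes(bodyBytes) {
  if (bodyBytes.includes(DOUBLE_ENCODED_TRADEMARK)) return 'double-encoded';
  if (bodyBytes.includes(UTF8_TRADEMARK)) return 'utf8-intact';
  return 'not-found';
}

const correctBody = Buffer.from('Posters\u2122', 'utf8');
// Simulate the suspected fault: misread as latin1, then encode again.
const brokenBody = Buffer.from(correctBody.toString('latin1'), 'utf8');

console.log(classifyTrademarkBytes(correctBody)); // utf8-intact
console.log(classifyTrademarkBytes(brokenBody)); // double-encoded
```

A 'double-encoded' result on the received message would mean the characters were translated before SES built the email, rather than mislabeled afterwards.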

When you send your explicit test, you aren’t sending all the HTML, you are only sending part of it (the text part), so it’s possible that somewhere in all the rest of the HTML there is a problem.

I just tested this using the deprecated built-in SES client and it worked correctly.


exports = async function (emailAddress, Content, Subject) {
    const ses = context.services.get('email').ses();

    const emailHtml = await context.http.get({
        url: 'https://www.typographicposters.com/newsletters/atlas-test'
        // created this new URL with only that thank you note
    })
    const body = emailHtml.body.text()

    const result = await ses.SendEmail({
        Source: "ps-noreply@mongodb.com",
        Destination: {ToAddresses: ["john.page@mongodb.com"]},
        Message: {
            Body: {
                Html: {
                    Charset: "UTF-8",
                    Data: body
                }
            },
            Subject: {
                Charset: "UTF-8",
                Data: "trst Email"
            }
        }
    }).then(r => console.log(EJSON.stringify(r))).catch(e => console.log(e));
};

Thank you very much @John_Page for narrowing the issue and sharing your knowledge.

Yes, using the deprecated built-in SES client solved the issue and I already changed my App to use that for now.

But it will be deprecated on August 1st. What’s the solution going forward? I understand your point that when calling the SES endpoint the request may be using ISO 8859-1. But how can I overcome this?

Note:
- I’ve tried both AWS SDK JS v2 and v3; both have the same issue. Is this an App Services dependency issue?
- Sorry for repeating myself, but I used to send emails just fine using the solution I shared above (with AWS SDK v3 as a dependency, using the @aws-sdk/client-sesv2 package). Why did it suddenly stop working?
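
One possible stopgap, not confirmed by anyone in this thread, would be to make the HTML pure ASCII before handing it to the SDK, so that a charset mismatch in transport has nothing to corrupt. A minimal sketch, assuming the body is ordinary HTML text and attributes; `toAsciiHtml` is a hypothetical helper name:

```javascript
// Hypothetical helper: replace every non-ASCII character with an HTML
// numeric character reference, leaving plain ASCII untouched.
function toAsciiHtml(html) {
  // The "u" flag makes the pattern match whole code points, so characters
  // outside the Basic Multilingual Plane are replaced as one unit.
  return html.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

console.log(toAsciiHtml('Posters\u2122 caf\u00e9')); // Posters&#8482; caf&#233;
```

In the function shared above this would mean passing `toAsciiHtml(body)` as `Data`. Character references are not interpreted inside `<style>` or `<script>` blocks, so any non-ASCII content there would need separate handling.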

Thank you very much again.

I suspect this is an issue with a dependency. It looks like in the config for the SDK you can set an optional _HttpHandler value, which AWS describes as “Fetch in browser and Https in Nodejs.” I’ve asked the App Services development team to comment on why / if that might be passing the wrong content-type charset in the request.

If you can see a way to explicitly set it to @aws-sdk/node-http-handler | AWS SDK for JavaScript v3, it might help.

I quickly browsed @aws-sdk/node-http-handler but haven’t found a charset setting yet; I will try to look into it later today.

Again, thanks for your effective support.