Moving existing MongoDB Atlas snapshots to external storage

Bernardo_Garcia · March 8, 2021, 9:42pm

Dear MongoDB community,
I have a cluster on the MongoDB Atlas cloud service. It is deployed on Azure, and I have enabled Cloud Backups.

As a result, my cluster takes snapshots frequently. My objective is to move those snapshots outside the cluster's scope, because if the cluster is ever deleted, its snapshots are deleted as well.

I see that we can upload mongodump or mongoexport output to Amazon S3 buckets, and there are plenty of tutorials for that. However, I don't want to run mongodump against my collections from a script, since the snapshots already exist in Atlas Cloud Backup.

I have been reading the documentation and found several ways to move data to external storage; using the MongoDB API to get the existing snapshots and transfer them seems like a good option here. However, I am confused about which API to use:

I found this Cloud Manager documentation page on getting the snapshots for a cluster, but it uses the endpoint https://cloud.mongodb.com/api/public/v1.0
I also found this Atlas documentation page on downloading snapshots via the API Resources, but it uses the endpoint https://cloud.mongodb.com/api/atlas/v1.0

The two endpoints are completely different. Since my cluster runs on Atlas, could either of them work for interacting with my cluster via the API?
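For an Atlas cluster, the base URL to use is the Atlas Administration API, https://cloud.mongodb.com/api/atlas/v1.0 (the /api/public/v1.0 base is the Cloud Manager API); the restore job request later in this thread uses the Atlas base successfully. Below is a minimal Python sketch, standard library only, that lists a cluster's Cloud Backup snapshots with HTTP digest authentication. It assumes the `GET .../backup/snapshots` path and that the list response wraps items in a `results` array; the helper names are hypothetical.

```python
import json
import urllib.request

ATLAS_BASE = "https://cloud.mongodb.com/api/atlas/v1.0"


def atlas_url(group_id: str, cluster: str, *parts: str) -> str:
    """Build an Atlas API v1.0 URL under /groups/{GROUP-ID}/clusters/{CLUSTER-NAME}."""
    return "/".join([ATLAS_BASE, "groups", group_id, "clusters", cluster, *parts])


def digest_opener(public_key: str, private_key: str) -> urllib.request.OpenerDirector:
    """Opener that answers the API's 401 digest challenge, like curl --digest."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, ATLAS_BASE, public_key, private_key)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))


def list_snapshots(opener: urllib.request.OpenerDirector, group_id: str, cluster: str) -> list:
    """GET .../backup/snapshots and return the 'results' array (assumed response shape)."""
    req = urllib.request.Request(
        atlas_url(group_id, cluster, "backup", "snapshots"),
        headers={"Accept": "application/json"},
    )
    with opener.open(req) as resp:
        return json.load(resp).get("results", [])
```

Usage would look like `list_snapshots(digest_opener(PUBLIC_KEY, PRIVATE_KEY), GROUP_ID, CLUSTER_NAME)`; each item's `id` is the value passed as `snapshotId` when creating a restore job.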

Having said this, I am also wondering about the following:

By calling the APIs I can get all the snapshots, but it is not clear to me whether they are transferred directly to AWS S3 buckets when the GET/POST and S3 upload operations take place.

Can I transfer an existing snapshot from MongoDB to S3 without downloading it first?
Do we need to provision temporary storage somewhere along the way?
I ask because I am dealing with gigabytes of data: would a direct transfer from MongoDB to S3 be reliable?
If so, I suspect a private network link or service endpoint would be needed to get adequate bandwidth for the transfer. Am I right?
I am not sure whether it is possible to avoid the download step; I would guess not.

What is the best way to move my snapshots to an external storage service such as S3 or Azure Storage accounts?

Pavel_Duchovny · March 9, 2021, 6:23am

Hi @Bernardo_Garcia

Welcome to the MongoDB community.

I believe the way to go is to create a restore job and get an HTTP download link.

Use the “download” delivery type.

After that, you will need a server that streams the file behind the download link to your external storage.

Of course, this server needs the firewall access and credentials to download the backup tar.gz and to upload it.
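The streaming step described above can be sketched in Python. This assumes the restore job's download link can be fetched with a plain HTTP GET from that server, and, for the S3 variant, that boto3 (the third-party AWS SDK for Python) is installed with AWS credentials configured. The copy works in fixed-size chunks, so a multi-gigabyte tar.gz never has to fit in memory; the S3 variant hands the HTTP response straight to boto3's `upload_fileobj`, so the bytes pass through the server without being stored on its disk. Function names, bucket and key are hypothetical.

```python
import io
import urllib.request
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read (arbitrary choice)


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        dst.write(chunk)
        total += len(chunk)


def download_to_file(delivery_url: str, local_path: str) -> int:
    """Two-step variant: save the snapshot archive locally, upload it afterwards."""
    with urllib.request.urlopen(delivery_url) as resp, open(local_path, "wb") as out:
        return stream_copy(resp, out)


def stream_to_s3(delivery_url: str, bucket: str, key: str) -> None:
    """One-step variant: pipe the HTTP response into S3 without touching local disk."""
    import boto3  # third-party; imported here so the rest of the module runs without it

    with urllib.request.urlopen(delivery_url) as resp:
        # upload_fileobj only needs an object with read(); boto3 reads it in parts.
        boto3.client("s3").upload_fileobj(resp, bucket, key)


if __name__ == "__main__":
    # Offline self-check of the chunked copy using in-memory buffers.
    out = io.BytesIO()
    assert stream_copy(io.BytesIO(b"snapshot-bytes"), out, chunk_size=4) == 14
    assert out.getvalue() == b"snapshot-bytes"
```

Either way, the data still flows through the server's network connection; what the streaming variant avoids is temporary storage on that server.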

Thanks
Pavel

Bernardo_Garcia · March 9, 2021, 9:35am

Dear @Pavel_Duchovny
Thanks for your advice and the welcome.
I am going to read up carefully on restore jobs, and I may have some questions afterwards.

Bernardo_Garcia · March 9, 2021, 12:30pm

I am trying to download a snapshot this way:

curl --user "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest --include \
     --header "Accept: application/json" \
     --header "Content-Type: application/json" \
     --request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}/backup/restoreJobs" \
     --data '{
         "snapshotId" : "{SNAPSHOT-ID}",
         "deliveryType" : "download"
       }'
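The same request can be issued from a script. A minimal Python sketch using only the standard library, assuming the caller supplies an opener built with `urllib.request.HTTPDigestAuthHandler` and the API key pair (the Python counterpart of curl's `--user ... --digest`); the function names are hypothetical. Building the body with `json.dumps` instead of by hand guarantees it is valid JSON.

```python
import json
import urllib.request

ATLAS_BASE = "https://cloud.mongodb.com/api/atlas/v1.0"


def restore_job_request(group_id: str, cluster: str, snapshot_id: str) -> urllib.request.Request:
    """Build the POST .../backup/restoreJobs request for a "download" delivery."""
    # json.dumps always emits valid JSON, so hand-typing mistakes such as
    # trailing commas cannot slip into the body.
    body = json.dumps({"snapshotId": snapshot_id, "deliveryType": "download"}).encode("utf-8")
    return urllib.request.Request(
        f"{ATLAS_BASE}/groups/{group_id}/clusters/{cluster}/backup/restoreJobs",
        data=body,
        method="POST",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )


def create_download_job(opener: urllib.request.OpenerDirector, group_id: str,
                        cluster: str, snapshot_id: str) -> dict:
    """Send the request; opener must carry an HTTPDigestAuthHandler with the API keys."""
    with opener.open(restore_job_request(group_id, cluster, snapshot_id)) as resp:
        return json.load(resp)
```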

But I got the following error:

HTTP/2 401
www-authenticate: Digest realm="MMS Public API", domain="", nonce="3X1bUO+5LBP3jVBffeHG0ZHprzcw8MQO", algorithm=MD5, qop="auth", stale=false
content-type: application/json
content-length: 106
x-envoy-upstream-service-time: 1
date: Tue, 09 Mar 2021 12:25:52 GMT
server: envoy

HTTP/2 403
date: Tue, 09 Mar 2021 12:25:53 GMT
content-type: application/json
strict-transport-security: max-age=31536000
x-frame-options: DENY
content-length: 187
x-envoy-upstream-service-time: 23
server: envoy

{"detail":"This resource requires access through an access list of ip ranges.","error":403,"errorCode":"RESOURCE_REQUIRES_ACCESS_LIST","parameters":["213.127.5.142"],"reason":"Forbidden"}

I am not sure about the first HTTP/2 401 error, since I am using the correct public and private keys.
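The HTTP/2 401 is expected with digest authentication and is not a credentials failure: with `--digest`, curl first sends the request without credentials, the server replies 401 with a `WWW-Authenticate` challenge carrying a nonce, and curl then repeats the request with a response hashed from the keys and that nonce. Because of `--include`, the headers of both responses are printed, and the second status (403 here) is the real result. As an illustration only, the sketch below uses two helpers from Python's `urllib.request` to parse the challenge header shown above into the fields the client uses; the function name is hypothetical.

```python
import urllib.request


def parse_digest_challenge(header: str) -> dict:
    """Split a 'Digest k=v, ...' WWW-Authenticate value into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "digest":
        raise ValueError(f"not a digest challenge: {scheme!r}")
    return urllib.request.parse_keqv_list(urllib.request.parse_http_list(params))


challenge = parse_digest_challenge(
    'Digest realm="MMS Public API", domain="", '
    'nonce="3X1bUO+5LBP3jVBffeHG0ZHprzcw8MQO", algorithm=MD5, qop="auth", stale=false'
)
# The client hashes the API keys with this realm/nonce/qop and retries the request.
print(challenge["realm"], challenge["algorithm"], challenge["qop"])  # MMS Public API MD5 auth
```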

Regarding the second HTTP/2 403 error, I have whitelisted my home IP address in the IP Access List section, but I am not sure why it does not work.

Pavel_Duchovny · March 9, 2021, 1:02pm

Hi @Bernardo_Garcia,

Please make sure to whitelist the IP in the API key's access list as well.

Here is the guide; make sure to follow each step:

https://docs.atlas.mongodb.com/configure-api-access/

Best regards,
Pavel

Bernardo_Garcia · March 9, 2021, 2:41pm

I got it, thanks for the clue. In addition, I had to go to:
Organization > Access Manager > API Keys,
select the existing API key I was using, choose Edit Permissions,
and under Private Key & Access List add my home IP address.

I got the following output with my command:

curl --user "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest --include \
     --header "Accept: application/json" \
     --header "Content-Type: application/json" \
     --request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}/backup/restoreJobs" \
     --data '{
         "snapshotId" : "{SNAPSHOT-ID}",
         "deliveryType" : "download"
       }'
HTTP/2 401
www-authenticate: Digest realm="MMS Public API", domain="", nonce="1kjxFzn5t6UROx5NmotVsgE6wcLe1zw0", algorithm=MD5, qop="auth", stale=false
content-type: application/json
content-length: 106
x-envoy-upstream-service-time: 1
date: Tue, 09 Mar 2021 13:37:48 GMT
server: envoy

HTTP/2 200
date: Tue, 09 Mar 2021 13:37:48 GMT
x-mongodb-service-version: gitHash=e9b00d560d9ff15b4dd614bb75d640577ac4f44f; versionString=v20210217
content-type: application/json
strict-transport-security: max-age=31536000
x-frame-options: DENY
content-length: 851
x-envoy-upstream-service-time: 48
server: envoy

{
	"cancelled": false,
	"deliveryType": "download",
	"deliveryUrl": [],
	"expired": false,
	"failed": false,
	"id": "60477a2d3d7834598fbee31a",
	"links": [{
		"href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}/backup/restoreJobs/60477a2d3d7834598fbee31a",
		"rel": "self"
	}, {
		"href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}/backup/snapshots/604705afbb33ec63425c7553",
		"rel": "http://cloud.mongodb.com/snapshot"
	}, {
		"href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}",
		"rel": "http://cloud.mongodb.com/cluster"
	}, {
		"href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}",
		"rel": "http://cloud.mongodb.com/group"
	}],
	"snapshotId": "{SNAPSHOT-ID}",
	"timestamp": "2021-03-09T05:21:33Z"
}
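For automation, the fields that matter in this response are `id` (the restore job to poll) and `deliveryUrl`, which is still an empty array here because Atlas has not finished preparing the download. A minimal Python sketch, assuming the response shape shown above; the helper names are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def job_id_and_urls(response_body: str) -> Tuple[str, List[str]]:
    """Return (restore job id, deliveryUrl list) from a restoreJobs response."""
    job = json.loads(response_body)
    return job["id"], job.get("deliveryUrl") or []


def self_link(response_body: str) -> Optional[str]:
    """Return the 'self' link, i.e. the URL for polling this restore job."""
    for link in json.loads(response_body).get("links", []):
        if link.get("rel") == "self":
            return link["href"]
    return None


sample = """{"cancelled": false, "deliveryType": "download", "deliveryUrl": [],
  "expired": false, "failed": false, "id": "60477a2d3d7834598fbee31a",
  "links": [{"href": "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{SOURCE-CLUSTER-NAME}/backup/restoreJobs/60477a2d3d7834598fbee31a",
             "rel": "self"}]}"""

job_id, urls = job_id_and_urls(sample)
print(job_id, urls)  # 60477a2d3d7834598fbee31a []
```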

I don't know why I get the first HTTP 401 Unauthorized status code, since the public and private keys I am using have permissions on the project and cluster I am taking the snapshot from.

As for the second response (HTTP/2 200), I can see that a restore job was created, with a direct download link to the snapshot's *.tar.gz file, like this:

https://restore-604782d892e66d4c432ca912.xxxxx.azure.mongodb.net:port/vxxxxxxxxxxx/restore-{SNAPSHOT-ID}.tar.gz

That is good, but given my objective of moving the existing snapshots outside the MongoDB cluster's scope, I am not sure whether creating a restore job fits the goal of streaming or moving my snapshots directly from MongoDB to external storage…

I mean, of course it does: I got a link to download the {SNAPSHOT-ID}.tar.gz directly, and I can use that link to upload the snapshot to my AWS S3 bucket or Azure Storage account. But that brings me back to my question above:

Will I need to download every snapshot and then, in a separate step, upload it to my external storage?
Is that the way to proceed, then? (I know the question is fairly obvious, but I want to double-check with you.)

If so, is it not possible to transfer directly from MongoDB to AWS S3, for example?
In that case, how can I retrieve the https://restore-604782d892e66d4c432ca912.xxxxx.azure.mongodb.net:port/vxxxxxxxxxxx/restore-{SNAPSHOT-ID}.tar.gz link value at runtime while executing the curl command (or whatever tool I use), so that I can use it in a subsequent step to upload the file to my external storage? I ask because my final goal is to automate moving the snapshots from MongoDB to somewhere else.
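One way to get the link at runtime is to poll the restore job's "self" URL (the `.../backup/restoreJobs/{JOB-ID}` link in the response above) until `deliveryUrl` is non-empty, then hand each URL to the upload step. A minimal Python sketch: `fetch_job` is a hypothetical stand-in for an authenticated GET of that URL returning the parsed JSON, injected so the loop can be tested offline, and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable, List


def wait_for_delivery_urls(fetch_job: Callable[[], dict],
                           interval_s: float = 30.0,
                           timeout_s: float = 3600.0,
                           sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll a restore job until its deliveryUrl array is non-empty.

    fetch_job: hypothetical callable returning the job JSON as a dict,
    e.g. an authenticated GET of .../backup/restoreJobs/{JOB-ID}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        # Stop early if Atlas reports the job can no longer produce a link.
        if job.get("failed") or job.get("cancelled") or job.get("expired"):
            raise RuntimeError(f"restore job {job.get('id')} did not complete")
        urls = job.get("deliveryUrl") or []
        if urls:
            return urls
        if time.monotonic() >= deadline:
            raise TimeoutError("deliveryUrl still empty after timeout")
        sleep(interval_s)
```

Injecting `sleep` as well keeps tests instant; in production the defaults simply wait between polls.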

Pavel_Duchovny · March 9, 2021, 3:06pm

Hi @Bernardo_Garcia,

Unfortunately, you will need to script it.

There might be a third-party or AWS tool to do this, but we only provide a link.

Thanks
Pavel

Bernardo_Garcia · March 9, 2021, 3:12pm

Yes, that is what I was thinking.

Bernardo_Garcia · March 9, 2021, 3:44pm

@Pavel_Duchovny, sorry, it's me again.
Do you know how I can retrieve the deliveryUrl parameter from the output of the curl command?

The documentation says:

deliveryUrl: array of strings
If empty, Atlas is processing the restore job. Use the Get All Cloud Backup Restore Jobs endpoint to periodically check for a deliveryUrl download value for the restore job.

It is not clear to me what parameters I should put in the request to get it.
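As far as the quoted documentation goes, the Get All Cloud Backup Restore Jobs endpoint is a GET on the same `.../backup/restoreJobs` path used for the POST, with no request body. A minimal Python sketch of picking one job's `deliveryUrl` out of that list by its job id, assuming the list response wraps the jobs in a `results` array (the usual shape of Atlas v1.0 list responses) and that the job appears on the page returned; the helper name and sample URL are hypothetical.

```python
import json
from typing import List


def delivery_urls_for(list_response_body: str, job_id: str) -> List[str]:
    """Find one job in a GET .../backup/restoreJobs response and return its deliveryUrl.

    An empty list means Atlas is still processing the job; check again later.
    """
    for job in json.loads(list_response_body).get("results", []):
        if job.get("id") == job_id:
            return job.get("deliveryUrl") or []
    raise KeyError(f"restore job {job_id} not found")


body = json.dumps({"results": [
    {"id": "60477a2d3d7834598fbee31a",
     "deliveryUrl": ["https://example.invalid/restore-{SNAPSHOT-ID}.tar.gz"]},
]})
print(delivery_urls_for(body, "60477a2d3d7834598fbee31a"))
```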

Pavel_Duchovny · March 9, 2021, 7:27pm

Hi @Bernardo_Garcia,

I guess you can pipe the output to a tool like jq to get the value, but good old grep and awk can probably help out too.

If you have a sample request, maybe I can help more.

Thanks

system · March 14, 2021, 7:27pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.