Part #2: Create Your Model Endpoint With Amazon SageMaker, AWS Lambda, and AWS API Gateway
This is Part 2 of the Amazon SageMaker + Atlas Vector Search series. In Part 1, I showed you how to set up an architecture that uses both tools to create embeddings for your data and how to use those embeddings to semantically search through your data.
- Extract and analyze data: Automatically extract, process, and analyze documents for more accurate investigation and faster decision-making.
- Fraud detection: Automate detection of suspicious transactions faster and alert your customers to reduce potential financial loss.
- Churn prediction: Predict the likelihood of customer churn and improve retention by homing in on likely abandoners and taking remedial actions such as promotional offers.
- Personalized recommendations: Deliver customized, unique experiences to customers to improve customer satisfaction and grow your business rapidly.
RStudio (more on that later) and JumpStart. You can check both on the Amazon SageMaker pricing page by checking if your desired region appears in the On-Demand Pricing list.
Set up for a single user. This will set up a domain and a quick-start user.
- The domain itself, which holds an Amazon EC2 instance that models will be deployed onto. It also contains a list of authorized users and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
- The UserProfile, which represents a single user within a domain that you will be working with.
- A shared space, which consists of a shared JupyterServer application and shared directory. All users within the domain have access to the same shared space.
- An App, which represents an application that supports the reading and execution experience of the user’s notebooks, terminals, and consoles.



All MiniLM L6 v2 by Hugging Face.
Click Deploy and SageMaker will get everything ready for you.

Once the status shows In service, everything is ready to be used.
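The endpoint status can also be checked from code instead of the console. The sketch below assumes the boto3 SageMaker client, whose `describe_endpoint` call returns an `EndpointStatus` field (for example `Creating` or `InService`). The helper names are hypothetical, and the client is passed in so the logic can be tried without AWS credentials.

```python
def endpoint_status(sagemaker_client, endpoint_name):
    """Return the status string SageMaker reports for an endpoint."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


def is_in_service(sagemaker_client, endpoint_name):
    """True once the endpoint reports the InService status."""
    return endpoint_status(sagemaker_client, endpoint_name) == "InService"


def check_with_boto3(endpoint_name):
    """Run the check against a real account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work offline

    return is_in_service(boto3.client("sagemaker"), endpoint_name)
```

Calling `check_with_boto3` with your own endpoint name should return `True` once the deployment has finished.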
The endpoint name in this example is jumpstart-dft-hf-textembedding-all-20240117-062453. Note down your endpoint name; you will need it in the next step.
Go to AWS Lambda and click Create function.
Choose Author from scratch, give your function a name (sageMakerLambda, for example), and choose the runtime. For this example, we’ll be running on Python.

Replace <YOUR_ENDPOINT_NAME> with your actual endpoint name from the previous section.
lambda_handler returns a status code and a body. It’s ready to be exposed as an endpoint using AWS API Gateway.

import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    try:
        # Extract the query parameter 'query' from the event.
        # API Gateway sends null for queryStringParameters when the request
        # has no query string, so fall back to an empty dict.
        query_param = (event.get('queryStringParameters') or {}).get('query', '')

        if query_param:
            embedding = get_embedding(query_param)
            return {
                'statusCode': 200,
                'body': json.dumps({'embedding': embedding})
            }
        else:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'No query parameter provided'})
            }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }


def get_embedding(synopsis):
    # Send the text to the SageMaker endpoint and return its embedding
    input_data = {"text_inputs": synopsis}
    response = sagemaker_runtime_client.invoke_endpoint(
        EndpointName="<YOUR_ENDPOINT_NAME>",
        Body=json.dumps(input_data),
        ContentType="application/json"
    )
    result = json.loads(response["Body"].read().decode())
    embedding = result["embedding"][0]
    return embedding
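With Lambda proxy integration, API Gateway passes the query string to the function in `event["queryStringParameters"]`, and that field is null (`None` in Python) when the request carries no query string at all. The sketch below uses a hypothetical helper, `extract_query`, and two trimmed-down sample events to show how the handler's input can be exercised locally. Real proxy events contain many more fields.

```python
def extract_query(event, name="query"):
    """Return the named query-string value from a proxy event, or ''."""
    # The field is None when no query string was sent, so fall back to
    # an empty dict before looking up the parameter.
    params = event.get("queryStringParameters") or {}
    return params.get(name, "")


# Trimmed-down sample events (assumed shapes for illustration only).
event_with_query = {"httpMethod": "GET", "queryStringParameters": {"query": "foo"}}
event_without_query = {"httpMethod": "GET", "queryStringParameters": None}

print(extract_query(event_with_query))
print(repr(extract_query(event_without_query)))
```

An empty result corresponds to the branch of the handler that returns status code 400.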
Click Deploy!
Go to the Configuration part of your Lambda function and then to Permissions. You can just click on the Role Name link to get to the associated role in AWS Identity and Access Management (IAM).
Click Add permissions.
Choose Attach policies to attach pre-created policies from the IAM policy list.
For this example, we use AmazonSageMakerFullAccess, but keep in mind to select only those permissions that you need for your specific application.
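AmazonSageMakerFullAccess grants far more than this Lambda function uses. A narrower alternative, sketched here as an assumption rather than the setup used in this tutorial, is a policy that allows only the `sagemaker:InvokeEndpoint` action on the one endpoint. The region, account ID, and endpoint name below are placeholders, and `invoke_only_policy` is a hypothetical helper.

```python
import json


def invoke_only_policy(region, account_id, endpoint_name):
    """Build an IAM policy document limited to invoking one endpoint."""
    endpoint_arn = (
        f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arn,
            }
        ],
    }


# Placeholder values: substitute your own region, account, and endpoint.
policy = invoke_only_policy("us-east-1", "123456789012", "my-embedding-endpoint")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor and attached to the Lambda role in place of the managed policy.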
Click Create API, and then Build on the REST API.
Give the API a name, such as sageMakerApi.
Click Create API.

Create a resource under the root path /. Pick a name like sageMakerResource.
Click Create method. We need a GET method that integrates with a Lambda function.
Enable Lambda proxy integration and choose the Lambda function that you created in the previous section. Then, create the method.

For the stage name, TEST might be a good choice.
The Resources tab should look something like this.
In Resources, click on GET again, and head to the Method request tab. Click Edit.
In the URL query string parameters section, add a new query string by giving it a name. We chose query here. Set it to Required but not cached and save it.
Find your endpoint URL in the Stages tab by opening the stage and endpoint and clicking on GET. For this example, it’s https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource. Your URL should look similar.
curl -X GET 'https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource?query=foo'
{"embedding": [0.01623343490064144, -0.007662375457584858, 0.01860642433166504, 0.031969036906957626,................... -0.031003709882497787, 0.008777940645813942]}
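The same request can be made from Python. This is a minimal sketch using only the standard library. `build_url` and `fetch_embedding` are hypothetical helper names, and the base URL is whatever your own Stages tab shows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, query):
    """Append a URL-encoded 'query' parameter to the endpoint URL."""
    return f"{base_url}?{urlencode({'query': query})}"


def fetch_embedding(base_url, query, timeout=10):
    """Call the API and return the embedding list from the JSON body."""
    with urlopen(build_url(base_url, query), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["embedding"]


# Example with a placeholder host; only the URL is built here, no request is sent.
print(build_url("https://example.com/TEST/sageMakerResource", "hello world"))
```

Note that `urlopen` raises `urllib.error.HTTPError` when the API answers with a 400 or 500 status, so callers may want to catch it and read the error body.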
Top Comments in Forums
Hello, it is working with the code below:
import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    try:
        # Extract the query parameter 'query' from the event
        query_param = event.get('queryStringParameters', {}).get('query', '')
        if query_param:
            embedding = get_embedding(query_param)
            return {
                'statusCode': 200,
                'body': json.dumps({'embedding': embedding})
            }
        else:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'No query parameter provided didi'})
            }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

def get_embedding(synopsis):
    input_data = {"inputs": synopsis}
    response = sagemaker_runtime_client.invoke_endpoint(
        EndpointName="jumpstart-dft-hf-textembedding-all-20250417-123155",
        Body=json.dumps(input_data),
        ContentType="application/json"
    )
    result = json.loads(response["Body"].read().decode())
    embedding = result[0]
    return embedding
I am getting this error:
curl -X GET 'https://lpeg644c9b.execute-api.us-east-1.amazonaws.com/proddidi/sageMakerResource?query=foo'
{"error": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message "Failed to deserialize the JSON body into the target type: missing field `inputs` at line 1 column 22". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/jumpstart-dft-hf-textembedding-all-20250417-123155 in account 102570980430 for more information."}
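The article's code and the comment above use different payloads: the article sends `text_inputs` and reads `result["embedding"][0]`, while the comment sends `inputs` and reads `result[0]`. Which shape a given endpoint expects appears to depend on the deployed model version, so the following is only a sketch of a response parser that accepts both shapes. `extract_embedding` is a hypothetical helper, and the handling of a flat list is an assumption.

```python
def extract_embedding(result):
    """Pull one embedding vector out of either response shape."""
    # Shape used in the article: {"embedding": [[...floats...]]}
    if isinstance(result, dict) and "embedding" in result:
        result = result["embedding"]
    if not isinstance(result, list) or not result:
        raise ValueError("Unexpected response shape from the endpoint")
    # Shape used in the comment: [[...floats...]], a batch of one vector
    if isinstance(result[0], list):
        return result[0]
    # Assumed fallback: the response is already a flat list of floats
    return result


print(extract_embedding({"embedding": [[0.1, 0.2]]}))
print(extract_embedding([[0.1, 0.2]]))
```

Checking the raw response of your own endpoint once, for example in the CloudWatch logs, is the most reliable way to confirm which request key and response shape it uses.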