AWS Machine Learning Blog

Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart

With the advent of high-speed 5G mobile networks, enterprises are better positioned than ever to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers to reduce latency and increase the responsiveness of their applications. As an example, smart venue solutions can use near-real-time computer vision for crowd analytics over 5G networks, all while minimizing investment in on-premises hardware and networking equipment. Retailers can deliver more frictionless experiences on the go with natural language processing (NLP), real-time recommendation systems, and fraud detection. Even ground and aerial robotics can use ML to unlock safer, more autonomous operations.

To reduce the barrier to entry of ML at the edge, we wanted to demonstrate an example of deploying a pre-trained model from Amazon SageMaker to AWS Wavelength, all in less than 100 lines of code. In this post, we demonstrate how to deploy a SageMaker model to AWS Wavelength to reduce model inference latency for 5G network-based applications.

Solution overview

Across AWS’s rapidly expanding global infrastructure, AWS Wavelength brings the power of cloud compute and storage to the edge of 5G networks, unlocking more performant mobile experiences. With AWS Wavelength, you can extend your virtual private cloud (VPC) to Wavelength Zones corresponding to the telecommunications carrier’s network edge in 29 cities across the globe. The following diagram shows an example of this architecture.

AWS Wavelength Reference Architecture

You can opt in to the Wavelength Zones within a given Region via the AWS Management Console or the AWS Command Line Interface (AWS CLI). To learn more about deploying geo-distributed applications on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength.

Building on the fundamentals discussed in that post, we use ML at the edge as our sample workload for AWS Wavelength, deploying a pre-trained model from Amazon SageMaker JumpStart.

SageMaker is a fully managed ML service that allows developers to easily deploy ML models into their AWS environments. AWS offers a number of options for sourcing and training models—from AWS Marketplace models to SageMaker built-in algorithms—as well as a number of techniques for deploying open-source ML models.

JumpStart provides access to hundreds of built-in algorithms with pre-trained models that can be seamlessly deployed to SageMaker endpoints. From predictive maintenance and computer vision to autonomous driving and fraud detection, JumpStart supports a variety of popular use cases with one-click deployment on the console.
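
In addition to the console experience, recent versions of the SageMaker Python SDK let you enumerate the available JumpStart model IDs programmatically. The following is a minimal sketch; the task filter shown is an assumption, so adjust it (or omit the filter entirely) depending on your SDK version:

# A sketch: list JumpStart model IDs (requires a recent SageMaker Python SDK).
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Filter to text classification ("tc") models; omit the filter to list every model.
for model_id in list_jumpstart_models(filter="task == tc"):
    print(model_id)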

Because SageMaker is not natively supported in Wavelength Zones, we demonstrate how to extract the model artifacts from the Region and re-deploy to the edge. To do so, you use Amazon Elastic Kubernetes Service (Amazon EKS) clusters and node groups in Wavelength Zones, followed by creating a deployment manifest with the container image generated by JumpStart. The following diagram illustrates this architecture.

Reference architecture for Amazon SageMaker JumpStart on AWS Wavelength

Prerequisites

To make this as easy as possible, ensure that your AWS account has Wavelength Zones enabled. Note that this integration is only available in us-east-1 and us-west-2, and you will be using us-east-1 for the duration of the demo.

To opt in to AWS Wavelength, complete the following steps:

  1. On the Amazon VPC console, choose Zones under Settings and choose US East (Verizon) / us-east-1-wl1.
  2. Choose Manage.
  3. Select Opted in.
  4. Choose Update zones.
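
Alternatively, you can script the opt-in. The following is a minimal boto3 sketch equivalent to the console steps above; it assumes you are targeting us-east-1 and the us-east-1-wl1 zone group:

# A sketch: opt in to the Wavelength Zone group programmatically with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Opt the account in to the us-east-1-wl1 Wavelength Zone group.
ec2.modify_availability_zone_group(GroupName="us-east-1-wl1", OptInStatus="opted-in")

# Confirm the opt-in status of the zones in the group.
response = ec2.describe_availability_zones(
    AllAvailabilityZones=True,
    Filters=[{"Name": "group-name", "Values": ["us-east-1-wl1"]}],
)
for zone in response["AvailabilityZones"]:
    print(zone["ZoneName"], zone["OptInStatus"])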

Create AWS Wavelength infrastructure

Before you convert the local SageMaker model inference endpoint to a Kubernetes deployment, create an EKS cluster in a Wavelength Zone. To do so, deploy an Amazon EKS cluster with an AWS Wavelength node group. To learn more, you can visit this guide on the AWS Containers Blog or Verizon’s 5GEdgeTutorials repository for one such example.

Next, using an AWS Cloud9 environment or integrated development environment (IDE) of your choice, install the requisite SageMaker packages and Docker Compose, a key dependency of SageMaker local mode.

# Install the SageMaker Python SDK with local mode support
pip install sagemaker
pip install 'sagemaker[local]' --upgrade
# Install Docker Compose, required by SageMaker local mode
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

Create model artifacts using JumpStart

First, make sure that you have an AWS Identity and Access Management (IAM) execution role for SageMaker. To learn more, visit SageMaker Roles.

  1. Using this example, create a file called train_model.py that uses the SageMaker Software Development Kit (SDK) to retrieve a pre-built model (replace <your-sagemaker-execution-role> with the Amazon Resource Name (ARN) of your SageMaker execution role). In this file, you deploy a model locally by setting instance_type="local" in the model.deploy() function, which starts a Docker container in your IDE with all the requisite model artifacts you defined:
# train_model.py
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = "<your-sagemaker-execution-role>"
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

# model_version="*" fetches the latest version of the model.
infer_model_id = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
infer_model_version = "*"
endpoint_name = name_from_base(f"jumpstart-example-{infer_model_id}")

# Retrieve the inference Docker container URI.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=infer_model_id,
    model_version=infer_model_version,
    instance_type="local",
)

# Retrieve the inference script URI.
deploy_source_uri = script_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, script_scope="inference"
)

# Retrieve the base model URI.
base_model_uri = model_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, model_scope="inference"
)

model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

print(deploy_image_uri, deploy_source_uri, base_model_uri)

# Deploy the model locally (SageMaker local mode).
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="local",
    endpoint_name=endpoint_name,
)
  2. Next, set infer_model_id to the ID of the SageMaker model that you would like to use.

For a complete list, refer to Built-in Algorithms with pre-trained Model Table. In our example, we use the Bidirectional Encoder Representations from Transformers (BERT) model, commonly used for natural language processing.

  3. Run the train_model.py script to retrieve the JumpStart model artifacts and deploy the pre-trained model to your local machine:
python train_model.py

Should this step succeed, your output may resemble the following:

763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu
s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/tensorflow/inference/tc/v2.0.0/sourcedir.tar.gz
s3://jumpstart-cache-prod-us-east-1/tensorflow-infer/v2.0.0/infer-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2.tar.gz

In the output, you will see three artifacts in order: the base image for TensorFlow inference, the inference script that serves the model, and the artifacts containing the trained model. Although you could create a custom Docker image with these artifacts, another approach is to let SageMaker local mode create the Docker image for you. In the subsequent steps, we extract the container image running locally and push it to Amazon Elastic Container Registry (Amazon ECR), and push the model artifact separately to Amazon Simple Storage Service (Amazon S3).
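
Before extracting anything, you can optionally confirm that the local endpoint responds. The following sketch assumes it runs in the same Python session as train_model.py (so that base_model_predictor is in scope) and uses the same content types as the remote invocation later in this post:

# A sketch: invoke the locally deployed model to confirm it serves predictions.
text = "a quietly moving story , beautifully told".encode("utf-8")
response = base_model_predictor.predict(
    text,
    {"ContentType": "application/x-text", "Accept": "application/json;verbose"},
)
print(response)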

Convert local mode artifacts to remote Kubernetes deployment

Now that you have confirmed that SageMaker is working locally, let’s extract the deployment manifest from the running container. Complete the following steps:

Identify the location of the SageMaker local mode deployment manifest: To do so, search the temporary directories under /tmp for any files named docker-compose.yaml.

docker_manifest=$( find /tmp/tmp* -name "docker-compose.yaml" -printf '%T+ %p\n' | sort | tail -n 1 | cut -d' ' -f2-)
echo $docker_manifest

Identify the location of the SageMaker local mode model artifacts: Next, find the underlying volume mounted to the local SageMaker inference container, which will be used by each EKS worker node after we upload the artifact to Amazon S3.

model_local_volume=$(grep -A1 -w "volumes:" $docker_manifest | tail -n 1 | tr -d ' ' | awk -F: '{print $1}' | cut -c 2-)
# Returns something like: /tmp/tmpcr4bu_a7

Create a local copy of the running SageMaker inference container: Next, find the container currently running your ML inference model and make a local copy of its contents. This ensures you have your own copy of the container image to push to, and later pull from, Amazon ECR.

# Find the container ID of the running SageMaker local mode container
mkdir sagemaker-container
container_id=$(docker ps --format "{{.ID}} {{.Image}}" | grep "tensorflow" | awk '{print $1}')
# Copy the container's file system locally
docker cp $container_id:/ sagemaker-container/

Before acting on the model_local_volume, which we’ll push to Amazon S3, push a copy of the running Docker image, now in the sagemaker-container directory, to Amazon Elastic Container Registry. Be sure to replace region, aws_account_id, docker_image_id and my-repository:tag or follow the Amazon ECR user guide. Also, be sure to take note of the final ECR Image URL (aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag), which we will use in our EKS deployment.

# Authenticate Docker to your ECR registry
aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
# Build an image from the copied container contents (run from the directory containing your Dockerfile)
docker build .
# Tag and push the image to your ECR repository
docker tag <docker-image-id> aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
docker push aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
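
If the target repository doesn’t exist yet, create it before pushing. The following is a small boto3 sketch; my-repository mirrors the placeholder repository name used above:

# A sketch: create the ECR repository if it doesn't already exist.
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")
try:
    ecr.create_repository(repositoryName="my-repository")
except ecr.exceptions.RepositoryAlreadyExistsException:
    pass  # The repository already exists; nothing to do.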

Now that we have an ECR image corresponding to the inference endpoint, create a new Amazon S3 bucket and copy the SageMaker local mode artifacts (model_local_volume) to this bucket. In parallel, create an AWS Identity and Access Management (IAM) policy that provides Amazon EC2 instances access to read objects within the bucket. Be sure to replace <unique-bucket-name> with a globally unique name for your Amazon S3 bucket, and update the bucket ARNs in the policy document (which uses sagemaker-wavelength-demo-app as an example) to match.

# Create S3 Bucket for model artifacts
aws s3api create-bucket --bucket <unique-bucket-name>
aws s3api put-public-access-block --bucket <unique-bucket-name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Step 2: Create IAM attachment to Node Group
cat > ec2_iam_policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::sagemaker-wavelength-demo-app/*",
        "arn:aws:s3:::sagemaker-wavelength-demo-app"
      ]
    }
  ]
}
EOF

# Create IAM policy and attach it to the EKS worker node role
policy_arn=$(aws iam create-policy --policy-name sagemaker-demo-app-s3 --policy-document file://ec2_iam_policy.json --query Policy.Arn --output text)
aws iam attach-role-policy --role-name wavelength-eks-Cluster-wl-workers --policy-arn $policy_arn

# Push model artifacts to S3
cd $model_local_volume
tar -cvf sagemaker_model.tar .
aws s3 cp sagemaker_model.tar s3://<unique-bucket-name>/

Next, to ensure that each EC2 instance pulls a copy of the model artifact on launch, edit the user data for your EKS worker nodes. In your user data script, ensure that each node retrieves the model artifacts using the S3 API at launch. Be sure to replace the bucket name (sagemaker-wavelength-demo-app in this example) with the name of your Amazon S3 bucket. Given that the node’s user data will also include the EKS bootstrap script, the complete user data may look something like this:

#!/bin/bash
mkdir /tmp/model
cd /tmp/model
aws s3api get-object --bucket sagemaker-wavelength-demo-app --key sagemaker_model.tar sagemaker_model.tar
tar -xvf sagemaker_model.tar
set -o xtrace
/etc/eks/bootstrap.sh <your-eks-cluster-id>

Now, you can inspect the existing Docker manifest and translate it to Kubernetes-friendly manifest files using Kompose, a well-known conversion tool. Note: if you get a version compatibility error, change the version attribute in line 27 of docker-compose.yml to “2”.

# Download Kompose and convert the Docker Compose manifest to Kubernetes manifests
curl -L https://github.com/kubernetes/kompose/releases/download/v1.26.0/kompose-linux-amd64 -o kompose
chmod +x kompose && sudo mv ./kompose /usr/local/bin/kompose
cd "$(dirname "$docker_manifest")"
kompose convert

After running Kompose, you’ll see four new files: a Deployment object, Service object, PersistentVolumeClaim object, and NetworkPolicy object. You now have everything you need to begin your foray into Kubernetes at the edge!

Deploy SageMaker model artifacts

Make sure you have kubectl and aws-iam-authenticator installed in your AWS Cloud9 IDE. If not, follow the installation guide for each tool.

Now, complete the following steps:

Modify the service/algo-1-ow3nv object to switch the service type from ClusterIP to NodePort. In our example, we have selected port 30007 as our NodePort:

# algo-1-ow3nv-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  type: NodePort
  ports:
    - name: "8080"
      port: 8080
      targetPort: 8080
      nodePort: 30007
  selector:
    io.kompose.service: algo-1-ow3nv
status:
  loadBalancer: {}

Next, you must allow the NodePort in the security group for your node. To do so, retrieve the security group ID and allow-list the NodePort:

node_group_sg=$(aws ec2 describe-security-groups --filters Name=group-name,Values='wavelength-eks-Cluster*' --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $node_group_sg --ip-permissions IpProtocol=tcp,FromPort=30007,ToPort=30007,IpRanges='[{CidrIp=0.0.0.0/0}]'

Next, modify the algo-1-ow3nv-deployment.yaml manifest to mount the /tmp/model hostPath directory to the container. Replace <your-ecr-image> with the ECR image you created earlier:

# algo-1-ow3nv-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: algo-1-ow3nv
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.26.0 (40646f47)
      creationTimestamp: null
      labels:
        io.kompose.network/environment-sagemaker-local: "true"
        io.kompose.service: algo-1-ow3nv
    spec:
      containers:
        - args:
            - serve
          env:
            - name: SAGEMAKER_CONTAINER_LOG_LEVEL
              value: "20"
            - name: SAGEMAKER_PROGRAM
              value: inference.py
            - name: SAGEMAKER_REGION
              value: us-east-1
            - name: SAGEMAKER_SUBMIT_DIRECTORY
              value: /opt/ml/model/code
          image: <your-ecr-image>
          name: sagemaker-test-model
          ports:
            - containerPort: 8080
          resources: {}
          stdin: true
          tty: true
          volumeMounts:
            - mountPath: /opt/ml/model
              name: algo-1-ow3nv-claim0
      restartPolicy: Always
      volumes:
        - name: algo-1-ow3nv-claim0
          hostPath:
            path: /tmp/model
status: {}

With the manifest files you created from Kompose, use kubectl to apply the configs to your cluster:

$ kubectl apply -f algo-1-ow3nv-deployment.yaml -f algo-1-ow3nv-service.yaml
deployment.apps/algo-1-ow3nv created
service/algo-1-ow3nv created

Connect to the 5G edge model

To connect to your model, complete the following steps:

On the Amazon EC2 console, retrieve the carrier IP of the EKS worker node or use the AWS CLI to query the carrier IP address directly:

aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=eks-EKSNodeGroup*" --query 'Reservations[*].Instances[*].[Placement.AvailabilityZone,NetworkInterfaces[].Association.CarrierIp]' --output text
# Example Output: 155.146.1.12

Now that you have the carrier IP address, you can connect to the model directly using the NodePort. Create a file called invoke.py that invokes the BERT model by providing text input to be run against the sentiment analyzer, which determines whether the tone is positive or negative:

# invoke.py
import requests

# Text input to classify with the BERT text classification model
request_body = "simply stupid , irrelevant and deeply , truly , bottomlessly cynical ".encode("utf-8")

# Replace the IP address below with the carrier IP of your EKS worker node
r2 = requests.post(url="http://155.146.1.12:30007/invocations", data=request_body,
                   headers={"Content-Type": "application/x-text", "Accept": "application/json;verbose"})
print(r2.text)

Your output should resemble the following:

{"probabilities": [0.998723, 0.0012769578], "labels": [0, 1], "predicted_label": 0}

Clean up

To destroy all application resources created, delete the AWS Wavelength worker nodes, the EKS control plane, and all the resources created within the VPC. Additionally, delete the ECR repository used to host the container image, the S3 bucket used to host the SageMaker model artifacts, and the sagemaker-demo-app-s3 IAM policy.

Conclusion

In this post, we demonstrated a novel approach to deploying SageMaker models to the network edge using Amazon EKS and AWS Wavelength. To learn about Amazon EKS best practices on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength. Additionally, to learn more about JumpStart, visit the Amazon SageMaker JumpStart Developer Guide or the JumpStart Available Model Table.


About the Authors

 Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS Edge Computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking and the edge cloud.

Mohammed Al-Mehdar is a Senior Solutions Architect in the Worldwide Telecom Business Unit at AWS. His main focus is to help enable customers to build and deploy Telco and Enterprise IT workloads on AWS. Prior to joining AWS, Mohammed has been working in the Telco industry for over 13 years and brings a wealth of experience in the areas of LTE Packet Core, 5G, IMS and WebRTC. Mohammed holds a bachelor’s degree in Telecommunications Engineering from Concordia University.

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.

Justin St. Arnauld is an Associate Director – Solution Architects at Verizon for the Public Sector with over 15 years of experience in the IT industry. He is a passionate advocate for the power of edge computing and 5G networks and is an expert in developing innovative technology solutions that leverage these technologies. Justin is particularly enthusiastic about the capabilities offered by Amazon Web Services (AWS) in delivering cutting-edge solutions for his clients. In his free time, Justin enjoys keeping up-to-date with the latest technology trends and sharing his knowledge and insights with others in the industry.