AWS Machine Learning Blog

Configure an AWS DeepRacer environment for training and log analysis using the AWS CDK

This post is co-written by Zdenko Estok, Cloud Architect at Accenture, and Selimcan Sakar, AWS DeepRacer SME at Accenture.

With the increasing use of artificial intelligence (AI) and machine learning (ML) across the vast majority of industries (ranging from healthcare to insurance, from manufacturing to marketing), the primary focus shifts to efficiency when building and training models at scale. The creation of a scalable and hassle-free data science environment is key. It can take a considerable amount of time to launch and configure an environment tailored for a specific use case, and it is even harder to onboard colleagues to collaborate.

According to Accenture, companies that manage to efficiently scale AI and ML can achieve nearly triple the return on their investments. Still, not all companies meet their expected returns on their AI/ML journey. Toolkits to automate the infrastructure become essential for horizontal scaling of AI/ML efforts within a corporation.

AWS DeepRacer is a simple and fun way to get started with reinforcement learning (RL), an ML technique where an agent discovers the optimal actions to take in a given environment. In our case, the agent is an AWS DeepRacer vehicle trying to race fast around a track. You can get started with RL quickly with hands-on tutorials that guide you through the basics of training RL models and testing them in an exciting, autonomous car racing experience.

This post shows how companies can use infrastructure as code (IaC) with the AWS Cloud Development Kit (AWS CDK) to accelerate the creation and replication of highly transferable infrastructure and easily compete in AWS DeepRacer events at scale.

“IaC combined with a managed Jupyter environment gave us best of both worlds: repeatable, highly transferable data science environments for us to onboard our AWS DeepRacer competitors to focus on what they do the best: train fast models fast.”

– Selimcan Sakar, AWS DeepRacer SME at Accenture.

Solution overview

Orchestrating all the necessary services takes a considerable amount of time when it comes to creating a scalable template that can be applied to multiple use cases. In the past, AWS CloudFormation templates were created to automate the creation of these services. As IaC tools have advanced toward higher levels of abstraction for setting up different environments, the AWS CDK has been widely adopted across enterprises. The AWS CDK is an open-source software development framework for defining your cloud application resources. It uses the familiarity and expressive power of programming languages to model your applications, while provisioning resources in a safe and repeatable manner.

In this post, we use AWS CDK constructs to provision the components required to perform log analysis of AWS DeepRacer models with Amazon SageMaker.

Although the analysis graph provided in the AWS DeepRacer console is effective and straightforward for reviewing the rewards granted and progress achieved, it doesn't give insight into how fast the car moves through the waypoints or what kind of line the car prefers around the track. This is where advanced log analysis comes into play. Our advanced log analysis makes training more efficient by helping racers understand, retrospectively, which reward functions and action spaces work better than others when training multiple models, and whether a model is overfitting, so that they can train smarter and achieve better results with less training.

Our solution describes an AWS DeepRacer environment configuration using the AWS CDK to accelerate the journey of users experimenting with SageMaker log analysis and reinforcement learning on AWS for an AWS DeepRacer event.

An administrator can run the AWS CDK script provided in the GitHub repo via the AWS Management Console or in the terminal after loading the code in their environment. The steps are as follows:

  1. Open AWS Cloud9 on the console.
  2. Load the AWS CDK module from GitHub into the AWS Cloud9 environment.
  3. Configure the AWS CDK module as described in this post.
  4. Open the cdk.context.json file and inspect all the parameters.
  5. Modify the parameters as needed and run the AWS CDK command with the intended persona to launch the configured environment suited for that persona.
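The post doesn't list the parameters stored in cdk.context.json. The following TypeScript sketch therefore assumes two hypothetical keys, instanceType and vpcCidr, which correspond to the notebook instance type and VPC CIDR mentioned later in this post. It shows one way to sanity-check them before deploying. Inside the CDK app itself, such values are typically read with this.node.tryGetContext("instanceType"). The key names and checks here are illustrative, not taken from the repository.

```typescript
// Hypothetical shape of the parameters in cdk.context.json; the real file in
// the repository may use different key names.
interface DeepRacerContext {
  instanceType: string; // SageMaker notebook instance type, e.g. "ml.t3.medium"
  vpcCidr: string; // IPv4 CIDR block for the VPC, e.g. "10.0.0.0/16"
}

// Returns a list of human-readable problems; an empty list means the
// parameters look usable.
function validateContext(ctx: Partial<DeepRacerContext>): string[] {
  const problems: string[] = [];

  // SageMaker notebook instance types are prefixed with "ml.".
  if (typeof ctx.instanceType !== "string" || !ctx.instanceType.startsWith("ml.")) {
    problems.push("instanceType must be a SageMaker instance type such as ml.t3.medium");
  }

  const cidr = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(ctx.vpcCidr ?? "");
  if (cidr === null) {
    problems.push("vpcCidr must look like a.b.c.d/n");
  } else {
    const octets = cidr.slice(1, 5).map(Number);
    const prefix = Number(cidr[5]);
    if (octets.some((o) => o > 255)) problems.push("vpcCidr has an octet above 255");
    // Amazon VPC accepts IPv4 CIDR blocks between /16 and /28.
    if (prefix < 16 || prefix > 28) problems.push("vpcCidr prefix must be between /16 and /28");
  }
  return problems;
}

const sample = JSON.parse('{"instanceType": "ml.t3.medium", "vpcCidr": "10.0.0.0/16"}');
console.log(validateContext(sample)); // []
```

Returning a list of problems instead of throwing on the first one lets an administrator fix every parameter in a single pass before running the AWS CDK command.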

The following diagram illustrates the solution architecture.


With the help of the AWS CDK, we can version control our provisioned resources and have a highly transportable environment that complies with enterprise-level best practices.

Prerequisites

In order to provision ML environments with the AWS CDK, complete the following prerequisites:

  1. Have access to an AWS account and permissions within the Region to deploy the necessary resources for different personas. Make sure you have the credentials and permissions to deploy the AWS CDK stack into your account.
  2. We recommend following certain best practices that are highlighted through the concepts detailed in the following resources:
  3. Clone the GitHub repo into your environment.

Deploy the portfolio into your account

In this deployment, we use AWS Cloud9 to create a data science environment using the AWS CDK.

  1. Navigate to the AWS Cloud9 console.
  2. Specify your environment type, instance type, and platform.

  3. Specify your AWS Identity and Access Management (IAM) role, VPC, and subnet.

  4. In your AWS Cloud9 environment, create a new folder called DeepRacer.
  5. Run the following command to install the AWS CDK, and make sure you have the right dependencies to deploy the portfolio:
npm install -g aws-cdk
  6. To verify that the AWS CDK has been installed and to access the docs, run the following command in your terminal (it should redirect you to the AWS CDK documentation):
cdk docs
  7. Now we can clone the AWS DeepRacer repository from GitHub.
  8. Open the cloned repo in AWS Cloud9:
cd DeepRacer_cdk

The DeepRacer_cdk directory contains a file called package.json that defines all the required modules and dependencies. This is where you can define your resources in a module.

  9. Next, install all required modules and dependencies for the AWS CDK app:
npm install
  10. Synthesize the app:
cdk synth

This synthesizes the corresponding CloudFormation template.

  11. To run the deployment, either edit the parameter values in the cdk.context.json file or pass them explicitly at runtime with the --context (-c) option:
cdk deploy
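When parameters are passed at runtime, for example cdk deploy -c instanceType=ml.t3.large, the runtime value replaces the one stored in the file. The AWS CDK CLI does this merging itself. The following sketch only illustrates that precedence, using the same hypothetical key names as above: it parses -c/--context key=value pairs and lays them over the values loaded from the file.

```typescript
// Minimal sketch of how runtime "-c key=value" pairs override values loaded
// from cdk.context.json. Key names are hypothetical.
type Context = Record<string, string>;

function parseContextArgs(args: string[]): Context {
  const result: Context = {};
  for (let i = 0; i < args.length; i++) {
    if ((args[i] === "-c" || args[i] === "--context") && i + 1 < args.length) {
      const pair = args[++i];
      // Split on the first "=" only, so values may themselves contain "=".
      const eq = pair.indexOf("=");
      if (eq > 0) result[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return result;
}

function mergeContext(fileContext: Context, cliContext: Context): Context {
  // Later spreads win, so command-line values replace file values.
  return { ...fileContext, ...cliContext };
}

const fromFile: Context = { instanceType: "ml.t3.medium", vpcCidr: "10.0.0.0/16" };
const fromCli = parseContextArgs(["deploy", "-c", "instanceType=ml.t3.large"]);
console.log(mergeContext(fromFile, fromCli));
// { instanceType: 'ml.t3.large', vpcCidr: '10.0.0.0/16' }
```

Keeping defaults in the file and overriding only what differs at runtime lets one checked-in configuration serve several personas.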

The script creates the following components for AWS DeepRacer log analysis:

  • An IAM role for the SageMaker notebook with a managed policy
  • A SageMaker notebook instance whose instance type is either passed explicitly as a cdk context parameter or taken from the default value stored in the cdk.context.json file
  • A VPC with the CIDR specified in the cdk.context.json file, with four public subnets
  • A new security group for the SageMaker notebook instance that allows communication within the VPC
  • A SageMaker lifecycle configuration with a bash script that preloads the content of another GitHub repository, which contains the files we use to run the log analysis on the AWS DeepRacer models
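The VPC and security group items rely on CIDR arithmetic. The following dependency-free TypeScript sketch shows how a /16 VPC block could be divided into four equal public subnets, and how to check whether an address falls inside the VPC. The second check is the kind of test a security group rule performs when it uses the VPC CIDR as its source. This is only an illustration. It is not the allocation algorithm used by the AWS CDK Vpc construct, which sizes and places subnets on its own.

```typescript
// Illustration only: equal-size IPv4 subnet splitting and containment checks.
// This is not how the AWS CDK Vpc construct allocates subnets.

function ipToInt(ip: string): number {
  // Plain arithmetic (not bitwise ops) avoids 32-bit sign issues in JavaScript.
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

function parseCidr(cidr: string): { base: number; prefix: number } {
  const [ip, prefix] = cidr.split("/");
  return { base: ipToInt(ip), prefix: Number(prefix) };
}

// Splits `cidr` into `count` equal subnets; `count` must be a power of two.
function splitCidr(cidr: string, count: number): string[] {
  const { base, prefix } = parseCidr(cidr);
  const extraBits = Math.log2(count);
  if (!Number.isInteger(extraBits)) throw new Error("count must be a power of two");
  const newPrefix = prefix + extraBits;
  const size = 2 ** (32 - newPrefix); // addresses per subnet
  const subnets: string[] = [];
  for (let i = 0; i < count; i++) {
    subnets.push(`${intToIp(base + i * size)}/${newPrefix}`);
  }
  return subnets;
}

// True if `ip` lies inside the block described by `cidr`.
function cidrContains(cidr: string, ip: string): boolean {
  const { base, prefix } = parseCidr(cidr);
  const addr = ipToInt(ip);
  return addr >= base && addr < base + 2 ** (32 - prefix);
}

console.log(splitCidr("10.0.0.0/16", 4).join(" "));
// 10.0.0.0/18 10.0.64.0/18 10.0.128.0/18 10.0.192.0/18
console.log(cidrContains("10.0.0.0/16", "10.0.200.7")); // true
```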

  1. You can run the AWS CDK stack as follows:
cdk deploy
  2. Go to the AWS CloudFormation console in the Region where the stack is deployed to verify the resources.

Now users can start using those services to work with log analysis and deep RL model training on SageMaker for AWS DeepRacer.

Module testing

You can also run unit tests before deploying the stack to verify that you didn't accidentally remove any required resources. The unit tests are located in DeepRacer/test/deep_racer.test.ts and can be run with the following command:

npm run test
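CDK projects commonly write such tests with the Template class from aws-cdk-lib/assertions, for example Template.fromStack(stack).resourceCountIs(...). Because this post doesn't show the contents of deep_racer.test.ts, the following is a dependency-free sketch of the same idea rather than the repository's tests. It checks a synthesized CloudFormation template, such as the JSON that cdk synth writes to the cdk.out directory, for the resource types listed earlier. The minimum counts in EXPECTED are assumptions based on that component list. They are expressed as "at least" because the Vpc construct may add supporting resources of the same types.

```typescript
// Dependency-free sketch: report expected CloudFormation resource types that
// are missing from a synthesized template. Minimum counts are assumptions.
interface CfnTemplate {
  Resources?: Record<string, { Type: string }>;
}

const EXPECTED: Record<string, number> = {
  "AWS::IAM::Role": 1,
  "AWS::SageMaker::NotebookInstance": 1,
  "AWS::SageMaker::NotebookInstanceLifecycleConfig": 1,
  "AWS::EC2::VPC": 1,
  "AWS::EC2::Subnet": 4,
  "AWS::EC2::SecurityGroup": 1,
};

function missingResources(template: CfnTemplate, expected: Record<string, number>): string[] {
  // Count resources by CloudFormation type.
  const resources = template.Resources ?? {};
  const counts: Record<string, number> = {};
  for (const logicalId of Object.keys(resources)) {
    const type = resources[logicalId].Type;
    counts[type] = (counts[type] ?? 0) + 1;
  }
  // Report every type whose count falls below the expected minimum.
  const problems: string[] = [];
  for (const type of Object.keys(expected)) {
    const found = counts[type] ?? 0;
    if (found < expected[type]) {
      problems.push(`${type}: found ${found}, expected at least ${expected[type]}`);
    }
  }
  return problems;
}

// Usage with a tiny hand-written template (a real one would come from cdk.out).
const sampleTemplate: CfnTemplate = {
  Resources: {
    NotebookRole: { Type: "AWS::IAM::Role" },
    Notebook: { Type: "AWS::SageMaker::NotebookInstance" },
  },
};
console.log(missingResources(sampleTemplate, { "AWS::IAM::Role": 1, "AWS::EC2::VPC": 1 }).join("\n"));
// AWS::EC2::VPC: found 0, expected at least 1
```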

Generate diagrams using cdk-dia

To generate diagrams, complete the following steps:

  1. Install Graphviz using your operating system's package manager.
  2. Install cdk-dia:
npm install -g cdk-dia

This installs the cdk-dia application.

  3. Now run the following command in the project directory:
cdk-dia

A graphical representation of your AWS CDK stack will be stored in .png format.

After you deploy the stack, you should be able to see the notebook instance being created with status Pending. When the status of the notebook instance is InService (as shown in the following screenshot), you can proceed with the next steps.
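Instead of refreshing the console, you can wait from a terminal with the AWS CLI waiter aws sagemaker wait notebook-instance-in-service --notebook-instance-name <name>. You can also poll the NotebookInstanceStatus field returned by the SageMaker DescribeNotebookInstance API. The following TypeScript sketch shows the polling logic. The fetchStatus callback is hypothetical and stands in for an SDK call, so the sketch has no AWS dependencies. The interval and attempt limit are arbitrary defaults.

```typescript
// Notebook instance statuses reported in the NotebookInstanceStatus field of
// the SageMaker DescribeNotebookInstance API.
type NotebookStatus =
  | "Pending" | "InService" | "Stopping" | "Stopped" | "Failed" | "Deleting" | "Updating";

type Decision = "wait" | "ready" | "error";

// Decide what to do after observing a status while waiting for a new instance.
function decide(status: NotebookStatus): Decision {
  if (status === "InService") return "ready";
  if (status === "Pending" || status === "Updating") return "wait";
  return "error"; // Failed, or the instance is stopping or being deleted.
}

// Polls `fetchStatus` (hypothetical, e.g. a wrapper around
// DescribeNotebookInstance) until the instance is InService.
async function waitForInService(
  fetchStatus: () => Promise<NotebookStatus>,
  intervalMs = 30000,
  maxAttempts = 60,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    const decision = decide(status);
    if (decision === "ready") return;
    if (decision === "error") throw new Error(`Notebook instance entered status ${status}`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for InService");
}

// Usage with a fake status source that becomes InService on the third check.
const observed: NotebookStatus[] = ["Pending", "Pending", "InService"];
let calls = 0;
waitForInService(async () => observed[Math.min(calls++, observed.length - 1)], 0)
  .then(() => console.log(`InService after ${calls} checks`)); // InService after 3 checks
```

Keeping the status decision in a small pure function makes it easy to unit test separately from the network call.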

  1. Choose Open Jupyter to start running the Python script for performing the log analysis.

For additional details on log analysis using AWS DeepRacer and associated visualizations, refer to Using log analysis to drive experiments and win the AWS DeepRacer F1 ProAm Race.

Clean up

To avoid ongoing charges, complete the following steps:

  1. Use cdk destroy to delete the resources created via the AWS CDK.
  2. On the AWS CloudFormation console, confirm that the CloudFormation stack was deleted, and delete it manually if it still appears.

Conclusion

AWS DeepRacer events are a great way to raise interest and increase ML knowledge across all pillars and levels of an organization. In this post, we shared how you can configure a dynamic AWS DeepRacer environment and set up selected services to accelerate users' journey on the AWS platform. We discussed how to use the AWS CDK to create services such as an Amazon SageMaker notebook instance, IAM roles, a SageMaker notebook lifecycle configuration that follows best practices, a VPC, and Amazon Elastic Compute Cloud (Amazon EC2) instances based on the provided context, and how to scale the environment for different AWS DeepRacer users.

Configure the AWS CDK environment and run the advanced log analysis notebook to make training more efficient, helping racers achieve better results in less time and gain granular insights into reward functions and action spaces.

References

More information is available at the following resources:

  1. Automate Amazon SageMaker Studio setup using AWS CDK
  2. AWS SageMaker CDK API reference

About the Authors

 Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in infrastructure as code and cloud security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.

Selimcan “Can” Sakar is a cloud first developer and solution architect at Accenture with a focus on artificial intelligence and a passion for watching models converge.

Shikhar Kwatra is an AI/ML specialist solutions architect at Amazon Web Services, working with a leading Global System Integrator. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.