Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances
AWS Machine Learning
JULY 24, 2023
A client doesn’t need to know the name or version of the model deployed on the server; the endpoint name is now just a proxy to a function that loads and runs the model. By contrast, when the model’s name and version are part of the endpoint name, a model change on the server side means the client has to know about it and change its API call to the new endpoint accordingly.
3 Inf1.24xlarge 16 64 1.5
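The stable-endpoint idea in the excerpt can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's implementation: `ModelServer`, `load_model`, the `/predict` path, and the version strings are all hypothetical names, and the "model" is a stand-in function rather than a compiled PyTorch model.

```python
from typing import Callable, Dict

Model = Callable[[str], Dict[str, str]]


def load_model(version: str) -> Model:
    """Hypothetical loader; a real server would load a model file here."""
    def run(text: str) -> Dict[str, str]:
        # Stand-in for inference: echo the input and report who served it.
        return {"input": text, "served_by": version}
    return run


class ModelServer:
    """Maps a stable endpoint path to whichever model is currently deployed."""

    def __init__(self) -> None:
        self._routes: Dict[str, Model] = {}

    def deploy(self, path: str, version: str) -> None:
        # Swapping the model only changes what the path points to.
        self._routes[path] = load_model(version)

    def handle(self, path: str, text: str) -> Dict[str, str]:
        # The client supplies only the path and the payload.
        return self._routes[path](text)


server = ModelServer()
server.deploy("/predict", "model-v1")
first = server.handle("/predict", "hello")

server.deploy("/predict", "model-v2")        # server-side model change
second = server.handle("/predict", "hello")  # client call is unchanged
```

In a FastAPI application, the role of `handle` would typically be played by a route function registered with a decorator such as `@app.post("/predict")`; the client keeps calling the same path while the function behind it loads a different model.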