Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers
AWS Machine Learning
APRIL 8, 2024
In January 2024, Amazon SageMaker launched a new version (0.26.0) of its containers. Be mindful that LLM token probabilities are generally overconfident without calibration. Before this API was introduced, the KV cache was recomputed for any newly added requests.