Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers
AWS Machine Learning
APRIL 8, 2024
Be mindful that LLM token probabilities are generally overconfident without calibration.

Transformers-NeuronX backend

The updated release of Transformers-NeuronX included in the LMI NeuronX DLC now supports models that use the grouped-query attention mechanism, such as Mistral-7B and Llama-2-70B.
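As an illustrative sketch (not from this post), serving such a model on the LMI NeuronX DLC is typically configured through a serving.properties file; the model ID and option values below are assumptions for illustration and should be checked against the current LMI container documentation:

```properties
# Sketch of a serving.properties for the Transformers-NeuronX backend.
# Values (model ID, parallel degree, sequence length) are examples only.
engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.model_id=mistralai/Mistral-7B-v0.1
option.tensor_parallel_degree=8
option.n_positions=4096
option.rolling_batch=auto
option.max_rolling_batch_size=8
```

The tensor_parallel_degree setting should match the number of Neuron cores available on the chosen instance type.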