Odin: Machine Learning Operations at Scale
December 15, 2020 • 3 minute read

Introducing Odin: Machine Learning Operations at Scale

We’re pleased to announce that we are open-sourcing Odin, a lightweight framework for automating machine learning workflows. Odin is a compact, efficient codebase that orchestrates and runs ML workflows. It supports parallel execution of pipeline steps and uses a simple Git-backed configuration to describe processing pipelines. Odin provides a simple API for launching and monitoring jobs and resources across a cluster. It is a deliberately minimal effort, built on the DRY ("don't repeat yourself") philosophy, and it relies on industrial-strength tools such as Git, PostgreSQL, and Kubernetes underneath. It also optionally supports commonly used Kubernetes operators for distributed training, including à la carte operators provided by the Kubeflow and PyTorch Elastic projects, which make it easy to train across multiple devices or machines. Odin can run on Google Kubernetes Engine in the cloud or on local Kubernetes clusters, providing a simple, unified interface for scheduling ML jobs.

We built Odin with simplicity and reproducibility in mind. You define your workflows declaratively in a compact YAML configuration language and submit them to Odin with a single command or HTTP request. Training machine learning models at scale is already challenging, so the tooling for running these jobs should be as simple and transparent as possible. Odin is written entirely in Python and communicates with Kubernetes using the official API. It is also very hackable: the code provides a native Python tier that lets developers embed the graph executor directly. It also provides a thin, lightweight WebSocket tier that allows alternative sub-systems to be developed in other programming languages. Finally, it offers a simple HTTP layer that provides full access to user, job and pipeline management, all defined. We also provide Python client code for HTTP and WebSockets to make integration straightforward at any level.
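To make the idea of a declarative pipeline with parallel steps concrete, here is a minimal sketch in plain Python. It is not Odin's code, configuration schema, or API: the `PIPELINE` dictionary, the step names, and the `run_pipeline` helper are all hypothetical, and the sketch uses only the standard library (`graphlib` requires Python 3.9 or later). It shows the general technique of starting each step as soon as the steps it depends on have finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline description: step name -> steps it depends on.
# This is NOT Odin's configuration format, only an illustration.
PIPELINE = {
    "preprocess": [],
    "train-a": ["preprocess"],
    "train-b": ["preprocess"],
    "evaluate": ["train-a", "train-b"],
}


def run_pipeline(pipeline, run_step, max_workers=4):
    """Run every step once, starting a step when all its dependencies are done.

    Returns the step names in the order in which they completed.
    """
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    completed = []
    running = {}  # maps each in-flight future to its step name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch every step whose dependencies have all finished.
            for name in sorter.get_ready():
                running[pool.submit(run_step, name)] = name
            # Block until at least one in-flight step finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception raised by the step
                sorter.done(name)  # unblocks steps that depend on this one
                completed.append(name)
    return completed


if __name__ == "__main__":
    print(run_pipeline(PIPELINE, lambda name: print(f"running {name}")))
```

In this sketch `train-a` and `train-b` can run at the same time once `preprocess` finishes, so their relative order in the result may vary between runs, while `evaluate` always comes last. A real orchestrator would schedule each step as a job on the cluster instead of a local thread.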

We have found numerous use cases for the Odin framework across Interactions. Initially, we proved out the framework by building natural language processing (NLP) models for our social customer care product and for digital Intelligent Virtual Assistants. The framework is now applied more broadly to the automation and tuning of the automatic speech recognition (ASR) and natural language understanding (NLU) models that power our Intelligent Virtual Assistant. We have also found the framework useful in other application areas, such as conversation analytics, where we use it to orchestrate the many stages of data transformation needed to process spoken or textual conversations and provide detailed analytics of dialog interaction patterns. Odin plays a key role in enabling our AutoML and hyperparameter optimization jobs, large-scale pretraining of Transformers and related models across many CPU and GPU resources, and continuous deployment of models.

Outside of Interactions, we see potential applications to a variety of machine learning tasks. Nothing in the framework is specific to the speech and text processing tasks we have applied it to. It could be applied to image and document processing and, beyond machine learning, to other processing workflows that orchestrate multiple processing steps and need to make good use of available resources. To this end, we are contributing Odin to the open source community under the Apache License. You can access the Odin codebase and documentation here.

Want to learn more? Let’s talk.