1. Introduction
Machine learning applications have undoubtedly become ubiquitous. Smart home devices are powered by natural language processing and speech recognition models, computer vision models serve as backbones in autonomous driving, and recommender systems help us discover new content as we explore. Observing the rich environments where AI applications run is also quite interesting. Recommender systems are usually deployed on cloud platforms by the companies that provide the services. When we talk about autonomous driving, the natural things that come to mind are powerful GPUs or specialized computing devices on vehicles. We use intelligent applications on our phones to recognize flowers in our garden and learn how to tend them. An increasing number of IoT sensors also come with AI built into tiny chips.

If we drill down deeper into these environments, there is even greater diversity involved. Even for environments that belong to the same category (e.g., cloud), there are questions about the hardware (ARM or x86), the operating system, the container execution environment, runtime library variants, and the kinds of accelerators involved. Quite a bit of heavy lifting is needed to bring a machine learning model from the development phase to these production environments. Even for the environments we are most familiar with (e.g., GPUs), extending machine learning models to use a non-standard set of operations involves a good amount of engineering.

Many of the above examples relate to machine learning inference, the process of making predictions after obtaining model weights. We are also starting to see an important trend of deploying the training process itself onto different environments. These applications arise from the need to keep model updates local to users' devices for privacy reasons, or to scale model training onto a distributed cluster of nodes. The different modeling choices and inference/training scenarios add even more complexity to the productionization of machine learning.