Why is AI so difficult to scale?
Converting an exploratory model into an enterprise capability is called scaling, and it can be remarkably difficult with AI. The simplest approach might be spinning up copies of the model in the cloud, but such a system can be fragile and expensive to run, especially when each copy requires significant computing resources. Solving these problems when scaling across the enterprise is an important next step toward realizing the potential of AI/ML. It’s an active area of research at Leidos, where data scientists like Tifani O'Brien work on scalability problems every day. To learn more, we welcome Tifani, an expert in scaling AI for the intelligence community in particular.
Q: Take us inside the process of scaling AI. What’s the big challenge our customers are facing?
O'Brien: When a data scientist creates a model on their own computer, running on a set of data collected on a local hard drive, it works well for answering a specific, time-bounded question. Model training happens once, and the results apply to the data already collected. However, if an enterprise wants to apply that model to the data it collects going forward, and to improve the model as more data arrives, it needs a way to scale access to data and computing resources without breaking the bank.
Let’s say you want to search across all your data, including text, video, and audio. Machine learning models can extract speech from a video and convert it to text, and even machine-translate content that is available in only one language. Your data scientists have demonstrated they can do this on a month’s worth of data already collected, and now you want to scale up the same processing to all your historical data and all the new data streaming into your enterprise, so your researchers can search across time and media type.
Simply embedding the model in a browser application limits each user to the resources on their own machine, and it would be very inefficient to extract and translate speech from scratch for every search. So we choose instead to pre-process all the audio and save the results for later searching. We deploy in the cloud with the ability to spin up as many resources as needed for the current task. That flexibility can come at a high cost, though, if it isn't done efficiently.
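As a rough illustration of that pre-processing step, the sketch below walks an archive of video, pulls out the audio with ffmpeg, and hands it to transcription and translation models before writing the text to a search index. The transcribe, translate, and index_document helpers are placeholders for whatever models and index an enterprise actually runs.

```python
"""Minimal sketch of a batch pre-processing pipeline: extract audio from
archived video, transcribe it, translate it, and store the text for search.
Model and index calls are placeholders, not a specific product's API."""

import subprocess
from pathlib import Path

def extract_audio(video_path: Path) -> Path:
    """Pull the audio track out of a video file with ffmpeg."""
    audio_path = video_path.with_suffix(".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video_path), "-vn", "-ac", "1", str(audio_path)],
        check=True,
    )
    return audio_path

def transcribe(audio_path: Path) -> str:
    """Placeholder for a speech-to-text model (e.g., a containerized ASR service)."""
    raise NotImplementedError

def translate(text: str, target_lang: str = "en") -> str:
    """Placeholder for a machine-translation model."""
    raise NotImplementedError

def index_document(doc_id: str, fields: dict) -> None:
    """Placeholder for writing to the enterprise search index."""
    raise NotImplementedError

def preprocess_archive(video_dir: Path) -> None:
    """Run once over historical data, then on each new file as it streams in."""
    for video in video_dir.glob("**/*.mp4"):
        audio = extract_audio(video)
        original = transcribe(audio)
        english = translate(original)
        index_document(video.stem, {
            "source": str(video),
            "transcript_original": original,
            "transcript_english": english,
        })
```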
Q: What particular methods make scalability more efficient?
O'Brien: One of the most effective ways we've found to scale AI is to deploy microservices in containers, managed by a container orchestration system. Orchestration makes it easy to spin up additional containers whenever demand rises, but this can quickly become costly if it isn't throttled correctly. For our purposes, we reserved the more expensive processing resources for monthly model retraining; once the production model was deployed, we could dial usage back down.
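In orchestration terms this usually means capping replica counts and reserving the expensive GPU capacity for the training job. The toy sketch below just makes that throttling decision explicit; the thresholds and the monthly schedule are illustrative, not an actual production configuration.

```python
"""Illustrative throttling logic: scale inference workers with queue depth,
but cap the count, and request expensive GPU capacity only for the monthly
retraining run. All numbers here are made up for the example."""

from datetime import date

MAX_INFERENCE_WORKERS = 20      # hard ceiling so autoscaling can't run away on cost
ITEMS_PER_WORKER = 500          # rough throughput target per container

def desired_inference_workers(queue_depth: int) -> int:
    """Spin up containers to match demand, but never past the cost ceiling."""
    needed = max(1, -(-queue_depth // ITEMS_PER_WORKER))  # ceiling division
    return min(needed, MAX_INFERENCE_WORKERS)

def retraining_due(today: date) -> bool:
    """Expensive GPU nodes are only requested for the monthly retraining job."""
    return today.day == 1

if __name__ == "__main__":
    print(desired_inference_workers(queue_depth=3200))  # -> 7 workers
    print(retraining_due(date(2024, 6, 1)))             # -> True
```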
Determining the appropriate API design and architecture is another scaling technique. Correctly delineating the boundaries of a service allows it to be inserted flexibly at different points in the processing pipeline, potentially avoiding bottlenecks. Locating the API layer inside the containerized microservice itself also avoids losing valuable time to repeated container startup, hardware allocation, and session setup. Carefully managing data as it moves through a pipeline of AI models is another opportunity to reduce traffic and the cost of scaling up: we use a fast cache for input data that multiple models will access, then move that data to cheaper, slower storage.
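A minimal sketch of that pattern is below: the API layer lives inside the microservice, the model is loaded once per container rather than once per request, and input data is read from a fast cache before falling back to cheaper storage. The endpoint, cache host, and SpeechModel stub are assumptions made for the example.

```python
"""Sketch of an API layer inside the containerized microservice itself,
with a fast cache in front of cheaper, slower storage."""

from fastapi import FastAPI
import redis

class SpeechModel:
    """Stand-in for the real ASR model; the trained weights would load here."""
    def transcribe(self, audio_bytes: bytes) -> str:
        return "<transcript placeholder>"

def fetch_from_cold_storage(item_id: str) -> bytes:
    """Placeholder: pull input data from cheaper, slower object storage."""
    raise NotImplementedError

app = FastAPI()
cache = redis.Redis(host="cache", port=6379)   # fast tier shared by the pipeline
model = SpeechModel()                          # loaded once per container, not per request

@app.post("/transcribe/{item_id}")
def transcribe(item_id: str) -> dict:
    # Fast path: an earlier pipeline stage staged this input in the cache
    # because several models need to read the same bytes.
    data = cache.get(item_id)
    if data is None:
        data = fetch_from_cold_storage(item_id)    # slow path
        cache.set(item_id, data, ex=3600)          # keep it hot for the next model
    return {"item_id": item_id, "text": model.transcribe(data)}
```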
Q: What are some of the biggest challenges you normally face?
O'Brien: One challenge is filtering out noise and selecting the right model to run on different data based on its type. If you run AI models on data they were not trained to handle, you spend resources on irrelevant processing that also produces poor results. For example, running a model that was trained to detect firearms in photos on something different, like application icons, can produce many false positives while burning expensive GPU computing time.
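A simple router by data type goes a long way here. The sketch below guesses a file's media type and dispatches it only to the models trained for that type, skipping tiny images such as icons; the model names and the size threshold are invented for illustration.

```python
"""Sketch of routing data to the right model by type, so an image detector
never burns GPU time on icons or documents."""

import mimetypes
from pathlib import Path

# Map coarse data types to the pipelines trained for them (names are illustrative).
ROUTES = {
    "video": ["speech_to_text", "frame_object_detection"],
    "audio": ["speech_to_text"],
    "image": ["firearm_detector"],
    "text":  ["language_id", "translation"],
}

MIN_IMAGE_BYTES = 20_000  # crude noise filter: skip thumbnails and application icons

def route(path: Path) -> list[str]:
    """Return the list of models this file should be sent to (possibly none)."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        return []
    kind = mime.split("/")[0]
    if kind == "image" and path.stat().st_size < MIN_IMAGE_BYTES:
        return []  # too small to be a real photo; don't waste GPU cycles
    return ROUTES.get(kind, [])
```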
Another challenge is dealing with data proliferation. We often work with customers who have terabytes or even petabytes of data, and every time you process data with machine learning, you create new artifacts based on it. Take a situation in which we pull speech out of video. We have the original video file, then the audio file, and then text files based on the audio: one in the original language and one translated to English. Add the extraction of all the still images pulled from the video, and you've created thousands of images that must run through your processing pipeline as well.
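A toy calculation shows how quickly this multiplies: one 20-minute video can fan out into an audio track, two transcripts, and over a thousand still frames, each of which re-enters the pipeline. The file names and counts below are invented for illustration.

```python
"""Toy illustration of how one video fans out into derived artifacts that
each re-enter the pipeline. The counts are made up; the point is the multiplier."""

from dataclasses import dataclass, field

@dataclass
class Artifact:
    name: str
    kind: str
    derived: list["Artifact"] = field(default_factory=list)

    def count(self) -> int:
        return 1 + sum(child.count() for child in self.derived)

video = Artifact("briefing.mp4", "video", derived=[
    Artifact("briefing.wav", "audio", derived=[
        Artifact("briefing.orig.txt", "text"),
        Artifact("briefing.en.txt", "text"),
    ]),
    # One still frame per second of a 20-minute video: 1,200 new images,
    # each of which goes back through the image models.
    *[Artifact(f"frame_{i:04d}.jpg", "image") for i in range(1200)],
])

print(video.count())  # the one original file becomes 1,204 artifacts to process and store
```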
Q: What’s your best advice for overcoming these challenges?
O'Brien: To work well, scalable AI depends not only on strong AI capabilities, but also on an understanding of the computational infrastructure and how it handles resource contention, data transport and availability, and service orchestration. Many people say they can build microservices but then are challenged when they try to process truly large collections efficiently. We’ve been successful in scalable AI because we establish careful service boundaries, design analytics to best use the compute resources available, apply knowledge of the domain to customize the processing workflow, and apply the right models to the right data.