AI Infra Stories

REAL stories about REAL ML challenges from REAL companies. In each episode, our hosts Almog Baku and Amit BenDor bring you a real-world story of deploying ML models to production. We'll learn from ML practitioners about the challenges companies are facing, how they solve them, and what the future holds for us. --- Discord community - Twitter - Linkedin - Almog - Amit -

Model Commoditization with Patrick Barker (One Medical)

Wed, 21 Dec 2022 22:41

There's nothing better than pre-trained, open-source models! You can use them to solve a variety of common problems, such as image classification, speech-to-text, or even text generation.

But the problem is - they come in different formats and shapes. Between native PyTorch and TensorFlow, Hugging Face, and PyTorch Lightning, it can take quite some time to set them up for inference, fine-tuning, or evaluation.

In this episode, we hosted Patrick Barker from One Medical, who shared his unique solution to the problem, how they shaped it, and how this is all connected to Kubernetes.


Join our Discord community -
Twitter -
Linkedin -

Almog -
Amit -
Patrick Linkedin -

Listen to Episode

Copyright © Almog Baku & Amit BenDor

Read Episode Transcript

[The opening of the recording is unintelligible.]

...I took a job in ML at VMware, doing some ML use cases around NLP as well as building out ML infrastructure. We were building an ML platform that their engineers could use internally, as well as one we could sell out to all of their connected Kubernetes servers. Then I took a job with One Medical, and at One Medical I've been building out, again, an ML platform, but building it in a more portable, Python-centric way than what I'd previously built. We're making it really, really compositional. Now I'm continuing to work on that, as well as some end-user stuff at One Medical.

That sounds cool. It might be a silly question, because I hear so many definitions for ML engineer, but how does your typical day look? What do you do as an ML engineer?
Yeah, so as an ML engineer I split up my time. We do full-stack ML engineering, so we own everything from the data to the infrastructure to the actual models. At One Medical we're full-stack ML engineers, the same as at VMware; you have to own the whole stack all the way down. Day to day, that involves a couple of different projects. We have this platform project that's been ongoing for about a year, which is to revamp the machine learning platform. It's currently ECS-based and pretty painful to work on, and we're trying to move it all over to Kubernetes, utilize all the awesome open-source tooling we're seeing in the ecosystem right now, and make it really easy to consume for people. So part of it is that, and another part is just working on the ML use cases. One of the things we do at One Medical is routing patient messages. If you type a message to One Medical and say "I need a COVID shot" or whatever, historically we had really long response times, because the messages were getting routed to the wrong people. Now we do NLP on the message that comes in, label it, and basically route it to, say, a doctor, an admin, or the appropriate person, and we've improved response times by 70% by using an NLP-based solution.

That sounds really interesting, Patrick. Let's jump into the main part of the episode. I'd like to ask: what is the challenge you're going to talk about, and what's the story behind it?

Yeah, so both at VMware and One Medical, one of the big problems I've seen starts with us getting a basic task. A lot of our tasks at these companies are pretty simple; we're talking about things like multi-class image classification or multi-label text classification. These are pretty known problems, right? We've been doing them for a while now in ML, and there are a lot of solutions in the market today.
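The message-routing flow described above can be sketched in a few lines: classify the incoming text into a label, then map the label to the role that should handle it. The real One Medical model isn't public, so `classify` below is a hypothetical keyword stand-in for an actual trained NLP classifier; the labels and routing table are illustrative only.

```python
# Route patient messages by predicted label (sketch, not the real system).
ROUTES = {
    "clinical_question": "doctor",
    "appointment": "admin",
    "billing": "billing_team",
}

def classify(message: str) -> str:
    """Hypothetical stand-in for a trained multi-class text classifier."""
    text = message.lower()
    if any(w in text for w in ("shot", "vaccine", "symptom", "pain")):
        return "clinical_question"
    if any(w in text for w in ("appointment", "schedule", "reschedule")):
        return "appointment"
    return "billing"

def route(message: str) -> str:
    """Map the predicted label to the role that should handle the message."""
    return ROUTES[classify(message)]

print(route("I need a COVID shot"))              # doctor
print(route("Can I reschedule my appointment?")) # admin
```

Swapping the keyword stub for a real model leaves `route` unchanged, which is the point: the routing logic only depends on the label space, not on how the label was produced.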
So one of the things I saw both of these teams spending a lot of time on was: okay, we've got this basic problem, let's say multi-class text classification, and I want to explore all the models that might work well for this particular type of data. Now we have Replicate, we have Hugging Face, we have all the TensorFlow and PyTorch libraries; there are tons and tons of open-source models you can utilize for a given problem, and for any given distribution of, say, text data, some of these models might work much better than others. A bidirectional LSTM might work better than a transformer on a certain data set. I kept seeing these teams spending a lot of time exploring what I call known space. These aren't unique models; it's a bidirectional LSTM, it's a transformer pre-trained on some common tasks. This is all known space; these are all open-source models that are out there to consume. But the engineers were spending a whole lot of time just trying to connect up to them. Like: I have this multi-class text classification problem and I'm trying to use this Hugging Face model, so now I need to convert my data to a form Hugging Face can consume, spin up the Hugging Face library, and find a way of training that model using a GPU. There's a lot of integration that needs to happen, and if I then switch to a PyTorch model, I have to do it all over again. So I've been spending a lot of time working on a project I call org, which aims at solving this problem. It's a little bit like one-way AI, if you've seen that, but it tries to define very generic machine learning tasks at the schema level, and from there it basically connects everything up for you to these common backends, like Hugging Face and TensorFlow, to all this known space we have. So for a given task, like multi-class text classification, I can now spin up a bunch of models across all these different frameworks, train them in a distributed way, and evaluate them in a consistent way.

Another problem I was seeing was that evaluation was highly inconsistent across the board. If one data scientist picks up a model and evaluates it in, say, TensorFlow, and then someone else picks up a Hugging Face model and evaluates it for what is the same task, they'll often use different data and different metrics, and it was really hard to compare the two. Likewise, we also want to compare them against SaaS APIs, say AWS Comprehend for text classification: how does the model I've spent time working on actually compare against the SaaS model we could be using? I found it really hard to get an honest comparison inside organizations. There's a lot of slipperiness around data scientists presenting things in ways that might be favorable to the work they've done, versus what is truly the best model for this problem. So the product I'm trying to create is a little more like an internal Kaggle, where it says: this is a competition, right? The SaaS solutions in the ecosystem are definitely players that can compete on your task, you can create your own models, and you can also leverage these commoditized models the ecosystem has for the task, and at last see which one actually wins. Sorry, that was a very long rant, but that's roughly what I'm after.

Yeah, that sounds super interesting, and not common, I would say. It seems like your solution has a few different layers. You talked about the Hugging Face Hub, and there are probably other hubs, like Torch Hub and TensorFlow Hub; the models are generally really simple to use through those hubs, but if I'm a data scientist using PyTorch, maybe I'd like to consume a model through Torch Hub. So was that the main problem, or what exactly was the problem there?

Yeah, so the exact problem was, let's take the Hugging Face example. I want to use a transformer, so I go to Hugging Face. We've got this problem, multi-class classification; I'm going to grab some pre-trained transformer, fine-tune it to my task, and do the job, or try to. So that data scientist needs to go to the data set we're using and massage it into the Hugging Face library, because every library takes different inputs in different ways. That's part of the problem. They need to try a bunch of different models, and Hugging Face has a bunch of models to try from. They need to do this in a distributed way, and they need to train with GPUs. Deploying a Hugging Face model in a way that I can train it on a GPU is a whole set of work in itself, and developers spend quite a bit of time on it: okay, how do I train this Hugging Face model on a GPU somewhere, evaluate it in a consistent way, and maybe send my evaluation back to some sort of model store, or MLflow, or something like that. I saw it both at VMware and One Medical: teams spending months on these tasks. And then, on top of trying out single models, you have ensembles, and oftentimes ensembles perform better than single models. We've definitely seen at One Medical that ensembles of various types can outperform a single model. So now you want to be able to combine all these open-source models and see how they perform in an ensemble as well, and that's a whole bunch of work. All of this could be automated, and that's the system org is working towards: all you need to do is define your task, which in this case is multi-class text classification, provide me a data set, and we provide the connectors for you to use any Hugging Face model, any Torch model, any TensorFlow model, without needing to do any of that massaging into each framework. Then you can train them all distributed on Kubernetes, so you could theoretically train a hundred models at a time, all on GPUs, all on Kubernetes, at the same time, with consistent evaluation and a consistent report coming back to compare the models.

Yeah, that makes sense, definitely. It sounds like standardization all over: the tooling of how to train, to be able to utilize everything from the different platforms, and also afterwards to evaluate everything the same way. Sounds really good, and it's clearly been thought through.

It's been interesting. The other piece of the problem that I've spent time on, and that fits into this, is that at One Medical our core apps are in Ruby, another part of the app is written in Go, and we have a ton of machine learning problems. There are so many machine learning problems in medical data it's unbelievable, because most of it is doctors' handwriting, and there's a lot of valuable data we can pull out of that, but to get it we obviously need machine learning. There's a strong desire at One Medical to enable all of our engineers to use ML, and ideally this shouldn't happen only in Python; it should be possible in any language. So along the lines of standardizing the model tasks, right? It's multi-class text classification, and now there's this bridge to use it with all the open-source models and all the SaaS models. Likewise, I should be able to do that from any language, not just Python: I should be able to train a hundred models from Ruby, I should be able to train a hundred models from Go, and this should all be possible by creating this sort of interface bridge between the model and the task you're trying to accomplish.

So you're saying Ruby, and you're saying Go, and you haven't said PHP, but you've said so many languages; I don't know, maybe someone wants to build a model
with PHP, or, for God's sake, JavaScript? I definitely want to support JavaScript too. But who's using it? Is it the data scientist, a software engineer, a backend engineer?

Yeah, any of the above. In this world the data scientist creates models, right? But models are commodities. If a data scientist wants to create a custom model, they can do that, but that custom model basically implements this API, and once they create a model that implements this API, any developer can use it from any language to both train and deploy that model for any given solution. It just becomes one more model you can use for a set of data. And the ultimate goal, which I'm trying to work on right now, is integrating something like whylogs, which would look at the distribution of the data you're taking in and help you select models based on that distribution.

Great. So how does it look? Am I going into some UI and clicking "hey, this is the model I want," and getting the code I should use? How does it look?

Not a UI yet; we haven't gotten that far. Currently the interface is only Python. Like I said, I'm starting to look at expanding this out to other languages. I've really focused on the model-creation experience to start, because obviously we need a bunch of models; my goal is pulling in all the Hugging Face models, the Replicate models, all of Torch Hub, and so on. The other part is to make it really easy to create models on Kubernetes, which I think is pretty hard right now. We have a lot of tools in the ecosystem, but they don't glue together. I have another project, which I call WAPBags, that aims at making these tools a lot easier to consume, but even with that I think the overhead is still pretty high. So right now a data scientist or MLE goes in and just implements a Python class, and that class can wrap any model, so you can wrap, say, a Keras model, and it just implements a basic generic type. For multi-class text classification you just have an x and a y: the x is text and the y is a class. Your input and output always have to follow these standard API types, and that's enforced by the class. Then, as you iterate on it, it syncs your code up into Kubernetes, so it's a way of really easily using Kubernetes without any new backend dependencies. Currently it stores all your artifacts as OCI artifacts, so there are very few dependencies; it's super lightweight. You can just write code in Python and it'll sync it up to your Kubernetes cluster and run it up there, and it happens pretty quickly: your model is running up in Kubernetes, with GPUs when you need them.

So as a data scientist, I'm building my model, I'm wrapping my project with this class, and then how does it go to Kubernetes? I'm a little bit confused here.

Yeah, so this is the interesting part. Obviously there are other libraries that do this kind of stuff, like Ray, and I'm a big fan of Ray, but I'm not a fan of all of it. The Ray Kubernetes implementation, I think, could use some work, and part of me starting this project was actually a reaction to Ray, because I felt the Ray model of syncing dependencies and local code just wasn't what I wanted, and I thought there was a much simpler way to do it on Kubernetes. Basically, how it works right now is: when you say "hey, I want to run this model in Kubernetes, sync it up," it'll look at your current repo and find the closest Python project configuration, so a requirements.txt, a conda environment yaml, or a Poetry project. It'll grab that and create a container based on your environment, meaning the Python environment you're currently executing on your local machine, and then it'll create a container that replicates that environment up in Kubernetes.
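The "wrap any model in a class with a generic task type" idea described above can be sketched as follows. The class and method names here are illustrative, not the project's actual API: the point is that the task fixes the input/output types (x: text, y: class label), and every wrapped model, however trivial, implements the same interface so it can be trained and evaluated interchangeably.

```python
# Sketch of a generic task type enforced by a class interface.
from abc import ABC, abstractmethod

class MultiClassTextClassifier(ABC):
    """Generic task type: x is text, y is one class label per input."""

    @abstractmethod
    def fit(self, x: list[str], y: list[str]) -> None: ...

    @abstractmethod
    def predict(self, x: list[str]) -> list[str]: ...

class MajorityClassModel(MultiClassTextClassifier):
    """A trivial baseline model wrapped in the task interface."""

    def fit(self, x: list[str], y: list[str]) -> None:
        # Remember the most frequent label seen during training.
        self.label = max(set(y), key=y.count)

    def predict(self, x: list[str]) -> list[str]:
        return [self.label for _ in x]

model = MajorityClassModel()
model.fit(["msg a", "msg b", "msg c"], ["spam", "spam", "ham"])
print(model.predict(["anything"]))  # ['spam']
```

Because every backend (a Hugging Face transformer, a PyTorch LSTM, or this baseline) presents the same `fit`/`predict` surface, the platform can train and evaluate them uniformly without caring what sits behind the interface.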
Then, as you iterate on your code, it syncs it straight up into Kubernetes, basically by copying your code directly into that container. That starts a server, and then you're just interacting with that server to train, currently over WebSockets. I've certainly debated using gRPC here; I've used a lot of gRPC in the past, but I really want to be able to access this from the browser as well, and there's gRPC-Web and things like that, but they don't work super well, so I just want to use plain old HTTP and WebSockets. That's currently how it functions.

Patrick, do you feel the framework is more intended for experimentation and training, or is it more production-oriented?

Currently experimentation and training. The original goal was just to explore all the model space as fast as possible, and to be able to create models in a consistent way so they can be evaluated consistently, very rapidly. The initial goal of the framework was this rapid iteration over known space, being able to add to that known space, and doing it in a consistent manner. I do want to get to productionizing this stuff, and we certainly have a means of taking these models today and running them, but I haven't fleshed that part out to where I'd say I'm proud of it yet.

Yeah, that sounds awesome. By the way, I want to share that we're using something a bit similar, an open-source framework called ClearML. It reminds me of it a little; I think maybe some of the same benefit is here, in that it's very native to Kubernetes and the infrastructure. What do you think is different between ClearML and yours?

Yeah, there are definitely some similarities, but this runs on plain open-source Kubernetes; there are no backends. The reason is that I've tried to create an ultra-minimalist framework that's still super powerful. I'm taking a lot of inspiration from Go, which is: let's create the minimal set of things but build a really strong toolchain that's easy to use. Everything's totally open source, there are no servers, there's nothing to deploy; you just need Kubernetes and an image registry, that's it. There's nothing to maintain. It's all what I call just-in-time infrastructure: it spins up what it needs as it needs it, and then it destroys it, so you don't have to maintain it. I saw this as a big barrier to entry for a lot of ML tools: they're so hard to deploy and configure. I have another project that tries to make these things easier to deploy and configure, like how to use KServe with MLflow, with Prefect or Flyte, connect those things, and make them really easy for developers to spin up and utilize. That project tries to make it easier, but I can tell you it's very hard to consume these open-source ML tools and connect them; it takes a ton of time. So a lot of org is a reaction to that, just saying: can we create something much simpler that gets the job done for 90% of the use cases? Maybe it doesn't cover everything, and if you have certain crazy production requirements it won't work for those, but at least at both VMware and One Medical it solves all the problems I've seen.

Yeah, that sounds super interesting. Looking at the future of this project, what do you see that's maybe still missing? What would you like to implement next?

Yeah, I definitely want to have a nice UI to make it super friendly, and that's a whole world, implementing a UI. My wife is a former UI engineer, so I'm trying to convince her to write a UI for me; if she doesn't feel up for it, I'm going to write it myself. But definitely the long-term goal here is to be able to connect every model to every language: from every major language, you should be able to train and productionize every model.

That's awesome. I think this is one of the most ambitious and really cool projects I've heard about in the ML infrastructure area. Another question: do you have thoughts about open sourcing this project?

Yeah, it's all going to be open source. I'm a couple of weeks away from open sourcing the very start of it. I was hoping it would be ready by today, but I hit some hang-ups, so I'm a couple of weeks out; I think I'll open source it around the end of the month and get the initial version out there. It's still pretty bare bones, but I think people will hopefully find it useful just for running and training models in Kubernetes, and it will expand out from there.

Yeah, we're waiting for that, definitely. That sounds pretty exciting.

Yeah, I think it'll be pretty fun. It's a pretty interesting idea, and it's the stuff that's got me energized right now, so I'm hoping to build a little bit of a community around it. One of the things I've been trying to pull out, and I was chatting about it in the Kubernetes group I'm part of, is separating out this notion of ML tasks. What is multi-class text classification? What does that API look like? Is there a way we can get the full MLOps ecosystem to agree on an API, nothing beyond that, nothing in the implementation realm, just something people could implement, in a very Kubernetes way? Kubernetes created this one API that you can use across all the clouds and provides a set of functionality to do all these things. Can we do a similar thing for machine learning? So far it's a little challenging: there are a lot of vendors in this space, and all the vendors want to lock you into their own thing, and I'd really like to try to break that. Kubernetes was able to do it with the clouds, which no one thought was possible, but it was. If we could just create an API that everyone agrees on, that this is the API for text classification, then if you're a client and you implement it, you have a choice between a bunch of backends. The thing I'm building could be one of those backends, but there could hopefully be many, and people could choose.

Oh, this is definitely very ambitious.

Yeah, pretty ambitious.

So that was really fun, and before we finish, there's this question that I prepared all over the week, actually since last week: a surprise question. I'm going to ask you a flash question, something really fun, so you should answer as fast as you can. There's no right or wrong here, just your honest truth; don't worry, we won't tell your friends, unless they're going to hear it, because it's being recorded. What is the first thing you do when you get to the office?

I get to the office, I sit down at my desk upstairs, I eat a full bar of 95% dark chocolate, and that's how I wake up in the mornings, and usually I drink a little tea.

Oh, that's very specific.

Yeah, every morning.

That sounds like a good way to start the morning, with some energy.

Yeah, I can't do coffee for some reason; I don't know why, coffee gives me the jitters super bad, and I've just never been able to get over it. But dark chocolate works.

So you're basically saying, "I'm going to do science all day long, so just sugar me up, give me every bit of sugar I need to be energized," or something like that?

Yeah, this actually has no sugar; it's just pure dark chocolate, with stevia. I also do the keto diet, so I don't do any sugar; this is just pure cocoa, basically.

That's cool, I never heard about chocolate without sugar, but that's fun. Thank you very much, Patrick, that was very fun. This is definitely a very new concept and problem that we don't hear about that much. I'm not talking about the chocolate anymore, right? Although that's a new thing too. Thank you very much, Patrick, it was fascinating, and I'm going to use the open source for sure as it comes out, so you have at least a first user, maybe a contributor.

That would be awesome. Yeah, thank you so much for having me; I really appreciate it. This was fun.

And thank you for listening to AI Infra Stories. Almog, did you have a good time?

Yes, definitely, Amit. How can our listeners connect with us?

On LinkedIn you can like our show and invite us to connect.

Wait, wait, wait, what about our Discord community?

Yes, you are welcome to join our community; the link is in the episode description.

Wait, wait, wait, wait, what about our Twitter?

Yes, please subscribe on Twitter to hear when a new episode comes out.

Wait, wait, wait, wait, wait, wait, what about rating us five out of five stars on Apple and Spotify?

Yes, please rate us with a nice five stars on Apple Podcasts and Spotify. See you next episode, bye bye!