AI Infra Stories

REAL stories about REAL ML challenges from REAL companies. In each episode, our hosts Almog Baku and Amit BenDor bring you a real-world story of deploying ML models to production. We'll learn from ML practitioners about the challenges their companies face, how they solve them, and what the future holds for us.

Model Monitoring Platform with Simarpal Khira (Intuit)


Sun, 23 Oct 2022 19:12

Deploying a model is not the end of the story. Most of the time, it's only the beginning of a new one. Model tracking, data drift, inference performance, and many more challenges are hiding in the journey of rolling out a model to users.

In this episode, we hosted Simarpal Khira from Intuit, who shared Intuit's story of building their internal Monitoring Platform with us. We learned about their challenge, why they decided to build something themselves, and how people use it internally.


Join our Discord community -

Simarpal Linkedin -



Copyright © Almog Baku & Amit BenDor

Episode Transcript

You are listening to AI Infra Stories: real stories behind real ML challenges from real companies. This is Almog, I'm a cloud and ML infrastructure innovator. This is Amit, I'm an AI research leader. In each episode, we will host a different guest who will share a story from their journey: the challenge they faced, how they solved it, and what the future holds for us. Let's get started. Hey, so we have Simarpal here today. Simarpal Khira is the leading product manager for ML platform products at Intuit. Simarpal, tell us please, what brought you to ML infrastructure in the beginning? Sure. Currently, I'm a product manager at Intuit, working on several areas. I started off with feature management, but recently I've been working on model monitoring and explainability of models. I love building tools and technology for data scientists, machine learning engineers, and so on. I really feel that with these tools, all these different types of engineers can accelerate the development of new technologies. I always found it very useful: if you build tools, somebody can accelerate a lot of work, and using those tools, people can build another set of more advanced technologies. As for my background, I started off as an engineer, but I wanted to expand my scope to actually help other engineers, so I transitioned to a product role a few years ago. I've been working with different companies in the data management space: I've worked in analytics, I've worked in machine learning startups, and recently in a FinTech company. Now I am at Intuit, working on the machine learning platform to help accelerate the development of AI in the company. What does your typical day as a product manager for ML platform tools look like? Sure. As a product manager, one of the most important things for me is to really understand my users.
For the tools that we are building, I usually connect with different types of personas, who are mainly engineers, like machine learning engineers or data scientists. On a routine basis, I connect with them and try to understand how they use these tools and whether they run into any challenges. What difficulties do they face? How can these tools be further improved? What are the shortcomings? How are these tools really helping them? I want to understand how things are working for them and how things can be further improved, so I'm constantly iterating on making the product better. Then I come back to my whiteboard, look at which things are most important for users, and prioritize them. I'm a little bit confused: are your users the employees of Intuit, or is it the end users that use Intuit's products? My current set of users are mainly internal users, engineers within the company. But these engineers work on products for the end users, so I have to help them to help the end user. My immediate users are internal users within the company, and my job is to help them be more productive so that they can deliver these products easily and fast to the end user. This is awesome. Intuit is a big organization; there's a lot of machine learning going on. It is so, yeah. We have different types of products: we have tax-related products, and we have small-business-related products. All these different types of use cases have specific benefits that we want to drive for the end users. How do we get people to save more on taxes? How do we help small businesses work more profitably, or save money, and all those types of things? Machine learning is being used to help these end users do better economically in their lives. So all of this is driven through machine learning. Awesome.
So, Simarpal, in this podcast we focus on real stories, right, from companies, from the real people, about one challenge that you had and the solution that you found for that specific problem. So I want to hear from you: what is the challenge, what is the story that you chose to present to us? Sure. So I wanted to talk to you guys about the area that I'm focusing on right now, which is model monitoring. Model monitoring is a very important area, especially because in the last few years, maybe you can say two years, the development of models has really accelerated. It is getting used for more and more use cases, right? Everybody's jumping on: okay, let's build more models. But what is not being thought about is the effect of these models once they're live in production. Because models, as you know, degrade very quickly over time, because user behavior can change. And especially with all the events that happened recently, like COVID, user behavior has changed significantly, right? So all the models which were deployed before that time are no longer valid, or no longer produce good results. But most of the time machine learning engineers and data scientists are already moving on to the next set of models, so nobody is monitoring the models that were worked on earlier. And there are many aspects to the problems with that. One problem is that the cost could increase suddenly over time for these models, so we have to constantly monitor that costs are under control. The second thing is around issues with mean time to recovery: sometimes models start degrading, and it could take several days, even hundreds of days, before the issue is found and it's clear that the model has really gone bad, because nobody was paying attention to it.
So what we want to do with this model monitoring is proactively tell people that the models are not performing well so that they can take corrective action; the same engineers who deployed those models can come back and make a better model. So that is an important area. So the motivation came from actual things that happened in the history of the company? Like, when models were deployed a few years ago, there was no monitoring and something happened? Well, yeah, from what I can discuss with you, there have been some instances where the mean time to recovery for issues, where models had gone bad, could be many days, even in the hundreds, which is not good, right? Because for 100 days or so, your model is not performing well, the end user is suffering, and the results are not good. So historically, yeah, there have been instances where models did not perform well for some time and nobody realized it. What this model monitoring does is help you figure these things out; it tells you that there is something wrong, so go and take action on it. And because alerts can be set up, people will get alerted if something goes wrong and then they can act on it. I can say that in my company as well, we had the problem that we first deployed many, many different models and hadn't thought much about what comes next. After a while, looking at a dashboard that we had, with no alerts set up, we saw some shifts. So yeah, I can say that this is something which is super important generally. But I also want to ask you: is it not enough to have monitoring on business metrics or logs, like what we usually do in software or project management? Yeah, that's a good question. In traditional software, most of the time we built unit tests and those types of things to constantly check whether the right output is delivered for an input, right?
But what happens in the machine learning space, and I think it's not a surprise, is that over time data changes. Models are built on data, right? So there is always data drift happening. Let's say in the COVID environment, people started spending less money going outside and all that. Because of that changing user behavior, the data that is being used to make predictions is now different from the data that was used for training, and this can affect your model. In traditional software, we didn't depend too much on data, but in machine learning we depend a lot on data, so detecting data drift or any kind of model performance drift becomes very important. In addition, we also keep a check on the operational health of the pipeline that is being built: you are checking the SLAs, the latencies, input/output configurations, and all that. That is the operational health, but we have to keep a check on the data and model drift aspects of the model as well. So I know that model monitoring is a huge problem; just a few months ago someone introduced this extensive website with many vendors, many startups, that are working to solve this problem. So I guess it's also a big problem at your company. But when we try to find the personas, who actually suffers from this problem? Can you tell us a little bit more about it? Is it only the data scientists? Is it engineers, product managers, business people? Yeah, that's great. I think there are many personas involved in this, but at a very fundamental level, the way to look at these personas is to start with model developers. In many companies, model developers can be many different people: in a very technical company, model developers are data scientists and machine learning engineers, but these days you also have analysts or product managers who act as model developers.
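The data-drift check described above can be sketched in a few lines. One common industry statistic is the Population Stability Index (PSI), which compares the serving-time distribution of a feature against its training-time distribution; the episode does not say which statistic Intuit uses, so the metric choice, bin count, and thresholds below are illustrative assumptions, not Intuit's implementation.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time (expected) and serving-time (actual) sample.

    A common rule of thumb (an industry convention, not from the episode):
    PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.
    """
    # Bin edges come from the training distribution's quantiles, so each bin
    # holds roughly equal training mass; open the outer edges to +/- infinity.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    expected_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) on empty bins.
    expected_frac = np.clip(expected_frac, 1e-6, None)
    actual_frac = np.clip(actual_frac, 1e-6, None)
    return float(np.sum((actual_frac - expected_frac)
                        * np.log(actual_frac / expected_frac)))

rng = np.random.default_rng(seed=7)
train = rng.normal(0.0, 1.0, 5_000)       # feature at training time
post_shift = rng.normal(0.8, 1.0, 1_000)  # user behavior changed

print(population_stability_index(train, train))       # 0.0, no drift
print(population_stability_index(train, post_shift))  # well above 0.25
```

A scheduled job could compute this per feature over a serving window and raise an alert when the score crosses the chosen threshold.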
So it depends on who is developing the model. That is one persona; for them, we need to provide model monitoring to detect any issues with the models, and the ability to set up alerts to take action. And then there are other types of personas, like legal and compliance, which is also very important. There is another area that we are currently working on, around explainability of models. These legal and compliance people want to make sure that there are no fairness or bias issues with the models, because if you think about it, we work in the financial services space, where we also issue loans. So it becomes important from a regulatory standpoint as well that the models do not have any issues and are not making biased decisions against specific segments of society. Model monitoring becomes important from that aspect too. So the other set of personas is the legal and compliance people: we have to surface information to them about how models are built, what the purpose is, what type of data is being used, what kind of predictions they are producing, and that they are not impacting any specific segments. So those are the different personas, I would say: model developers, compliance, and then there are senior executives who also want to take a look at their models overall. So when you tried to look for a solution, did you look for a one-stop-shop solution, or is it more like a suite of tools? What did you actually build? Maybe jump right ahead to this. Sure. Our process around building a solution is, first of all, to look at whether there are any open source solutions. If there are, we would prefer those, but then we also compare them with any vendor solutions available in the market, because they could be better than the open source; if they perform well, we can prefer that.
And if neither of them is available, then we try to build our own. So in this case, for some of these model monitoring solutions, we looked at the different vendors that were available, and we also started off by looking at certain open source solutions. You can think about open source packages for, let's say, feature importance, like what features are important to a given model; for that there are certain open source packages available, like model explainers and all that. And then there are some vendor solutions in the market that we evaluated. Eventually we decide on the basis of which is more economical and which has more features, more capabilities. In this case, what we ended up deciding is that we build a front layer for the data scientists to interact with, and we hide the vendor details behind the scenes, so that we can always change the vendors for our users at any time. We take a look at what is the best vendor available; for some of the solutions we have vendors that we use, and we abstract them behind another layer for our users. We don't change that layer, so the data scientists are always interacting with our front layer, and we have the vendor solution behind the scenes. And the kind of solution we have built is one where people can use a certain set of, you can say, GitHub-based packages to configure their monitoring. They can define what data they want to monitor, and they can define what metrics they want to calculate on top of that data. And then we also have another solution to visualize these. If you think about model monitoring, there are two aspects: one is tracking and the other is visualization. For visualization we also provide them a vendor solution, where they can look into the different changes that are happening.
And then they can set up alerts on it, and accordingly they get alerts and insights and take action on them. So we take a combination: either open source, a vendor, or we build our own. Yeah. So you told us that you use a set of tools behind the scenes and you built a front solution on top of that. But how does it look? Say I am a data scientist and I want to introduce a new model, or monitor my existing model. What am I doing? Am I going into some control panel or console and configuring it? Am I using YAML files? Sure. So basically we have something called a developer portal, which is a centralized solution for everyone at Intuit. For model building, there is a paved path where people can go and build any kind of model. When they are about to build models, we generate certain default GitHub repositories to write code in. Within that, there is a repository created which is called the configuration repository for model monitoring. In it, they write code in our own DSL, which we provide for people to write their model monitoring configuration. They define the data through code; it's all GitHub operated. Through the code they can specify what data they want to monitor and what metrics. We have certain default metrics, but they can always contribute any new metrics that are missing that they think are important for their specific models. For example, say we are talking about classification models: we have precision, recall, and all those metrics. But let's say tomorrow they come up with some more advanced metrics which are not available; they can always contribute those metrics, and they become part of our library. And then anybody can use them, not just that team. So it's all through code.
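As an illustration of the configuration flow described above: the episode does not show Intuit's actual DSL, so every field name, metric name, and value below is a hypothetical sketch of the kind of declaration a data scientist might check into the generated configuration repository.

```python
# Hypothetical monitoring configuration, expressed as plain Python for
# illustration. All names and values are invented; Intuit's DSL is not public.
MONITORING_CONFIG = {
    "model": "smb-cashflow-classifier",            # hypothetical model name
    "data": {
        "source": "inference-logs",                # what data to monitor
        "features": ["income", "expenses", "account_age_days"],
    },
    # Default classification metrics plus one team-contributed metric.
    "metrics": ["precision", "recall", "feature_psi", "custom_cost_per_call"],
    "alerts": [
        {"metric": "feature_psi", "above": 0.25},  # data drift alert
        {"metric": "recall", "below": 0.80},       # performance alert
    ],
}

def validate_config(config):
    """Minimal sanity check such a platform might run when the PR is merged."""
    required = {"model", "data", "metrics", "alerts"}
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return True

print(validate_config(MONITORING_CONFIG))  # True
```

Keeping the configuration in a repository gives the review, history, and contribution workflow Simarpal describes, e.g. a new metric arrives as a pull request that any team can then reuse.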
And after that, behind the scenes, we run these pipelines. You can think about AWS-based EMR clusters that constantly run analysis on top of these data pipelines to generate those metrics, and we publish them on the visualization tool I was talking about. Then the users can monitor them constantly. So Simarpal, is this mandatory today for a data scientist who wants to publish a model? Is it a blocker? Is it something that they have to do? Yeah, so right now it's not a hard blocker; they are not totally restricted. But in the future we are thinking of making this a hard requirement: every model has to be monitored. This is all part of something we call responsible AI, and as part of responsible AI, every model needs to be monitored. Right now it's not a hard requirement, it's still a soft requirement that models need to be monitored, but that can be made part of the whole pipeline-release-to-production requirement. Right now it's not. And another question, taking us a bit back: when you created this ensemble of different tools, what were the main ideas that guided you in choosing them? Were some things more important or less important? Sure. I always think of a guiding point as first trying to understand what type of users we are dealing with. Right now our users are, as I was saying, data scientists or machine learning engineers, and I feel more and more these people are getting familiar with Python code and with technologies that involve dealing with code. They're very technical. So I wanted to start off with something they are comfortable with; one of my guiding points is always to think of a certain set of APIs or some kind of code-based solution as a starting point.
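The scheduled metric jobs described above, pipelines that compute default metrics such as precision and recall and feed the alerting layer, might reduce to something like this minimal sketch. The thresholds and function names are illustrative assumptions; the episode does not specify how Intuit's pipelines compute or publish metrics.

```python
def precision_recall(y_true, y_pred):
    """Default classification metrics over one batch of labeled predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def check_alerts(y_true, y_pred, min_precision=0.8, min_recall=0.8):
    """Return alert messages for any metric below its (illustrative) floor."""
    precision, recall = precision_recall(y_true, y_pred)
    alerts = []
    if precision < min_precision:
        alerts.append(f"precision {precision:.2f} below {min_precision}")
    if recall < min_recall:
        alerts.append(f"recall {recall:.2f} below {min_recall}")
    return alerts

# One batch where the model has degraded: both metrics come out at 0.50,
# so both alerts fire and can be routed to the owning team.
batch_true = [1, 1, 0, 0]
batch_pred = [1, 0, 1, 0]
print(check_alerts(batch_true, batch_pred))
```

In practice the batch would be a windowed join of predictions against delayed ground-truth labels, and the output would go to a dashboard and paging channel rather than stdout.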
Down the line, I would like to make it a little bit more UI based, where you can just say: this is my data source, and we populate a list of metrics and they can select from them. But that's still work we will do in the future, because in the future we might face a new set of users, like product managers or senior executives, who are not so technically savvy but who know what metrics to monitor. So one of my guiding principles is always: who are your users right now? I like to start with code-based APIs and then slowly transition to UI-based to make it more user friendly for any type of user. That is always one guiding principle. The other guiding principle is that when we were selecting vendors and all that, we had to keep cost in mind. We always try to see what the return on investment is: how much is it going to cost to buy a solution from a given vendor, and how much benefit are we going to drive from a business standpoint for our end users, how much revenue is coming in. So that's another consideration I keep in mind. Yeah, that's awesome. Another question: how do you see data scientists actually using the solution? Are they happy with it? What are your observations? Oh, yeah, so one way I would like to build that into the product is a net promoter score, an NPS score; that's still work in progress. When people are using the product, we can prompt them with a flyer saying, hey, would you recommend this product to others, and those types of things. That's a net promoter score, right? So I want to build that experience.
It's not there yet, but that's one way to get feedback from users. Right now I'm reaching out to different users manually on my own, trying to see whether they like the product or not. One way to find this out is through the net promoter score; the other aspect to look at is user engagement, like how much they are using this on a constant basis. One thing we want to instrument is the visualization tools I was talking about, where they see all this monitoring and the alerts. We want to instrument them to see whether people are interacting with them or not. So that's another way I can find out whether users are engaging with the product, because monitoring is a one-time setup: you set it up once, and then all you care about is that when something goes wrong, it informs you. It's great to hear how much you invest in users, even though they are internal users; I think that's really exceptional. So Simarpal, looking ahead at the future of this project and of this whole area, and how you approach it at Intuit: what do you see is missing today that you'd like to focus on? Yeah, so one area that I find is missing today is how to connect business metrics. When I say business metrics, I mean the actual end user benefit: let's say this is the amount of money that you save for the end user, or this is the amount of time that the end user saves. That is a business benefit, whereas the model benefits are more around model performance, right?
The data scientists already know a lot about how the model is performing, but they don't know whether the model, after it makes its predictions, is actually benefiting the end user or not. Because I could make really good predictions, but at the end of the day this is being used by a human, the end user, and the question is whether that human gets a benefit or not. So what I want to do in the future is make this connection between a given model and the end user benefit it is actually driving. Right now we do not have a way to visualize that side by side: this is the model, this is the benefit. I want to make that connection; that is one area. The second is around experiment tracking. If you think about it, teams have grown a lot over time, and in a given team there are multiple engineers or multiple data scientists working on the same project, doing different types of experiments using different settings, hyperparameters and whatnot. But there is no central way to track how these experiments are doing across all these different projects. Somebody might be using different tools: somebody might be using, let's say, a Jupyter notebook; somebody might be using an IDE like IntelliJ or whatever your favorite IDE is. They might be running on their own computer, or somewhere else. Everything needs to be brought together to see that these all point to the same project, and to show that this experiment is better than that one, so that they can make comparisons between different experiments.
So what I want to do in the future is also include experiment tracking with this whole monitoring, so that people can see the results of these different experiments in one place and quickly decide which one to go ahead with. That's another area. Yeah, that sounds very fascinating and interesting, and there is a real need. There is another question which might be obvious, and many people are probably asking it right now: are you thinking about open sourcing your framework? Is there anything like that in your plans? Sure, the team is always thinking about that; we always keep it in mind. Before we open source anything, we would first like to make sure it works really well for our own set of users, so that people find value in it; only then can we think about open sourcing anything. But our team is definitely open to making it available to the broader community for the benefit of everyone. Right now, though, I would like to focus on the internal set of users really driving value from it, because internally we also have hundreds of users. So first let's see if people really find value in this product; once they do, then we can obviously make it available for everyone, the broader community. Awesome, we're waiting for that. Yeah, that's pretty exciting. And now for the question I prepared myself the whole week for. We are going to ask you a surprise question you haven't prepared for. Are you ready? Yes, I'm ready. All right, we're going to randomly choose one question from the bank, and you should answer it as fast as you can: what is the worst job perk you got in your career?
I would say maybe a bag I got. I don't like that bag, the one for getting to the office. What do you mean by bag? You should describe it; I want to imagine it. Is it like a nylon bag? It's a regular bag, but those bags usually have a lot of logos on them, company logos and stuff, and sometimes I don't want to show off that I worked at this company or that place. Sometimes I feel like the company's name should be hidden; it should just be a plain bag. The other problem I had with this bag was that it didn't have many pockets in it, so I couldn't even use it for going on a trip or something. So I feel like it was more of a talisman than a bag. Correct, I would like a bag with at least six or seven pockets in it, so that I can carry it on a trip somewhere. Yeah, I must say, sometimes you feel like a walking advertisement for your company, or whatever company. Yeah, that's right. Thank you very much, Simarpal, that was super interesting and inspiring. It was great chatting with you guys. And thank you for listening to AI Infra Stories. Almog, did you have a good time? Yes, definitely, Amit. How can our listeners connect with us? On LinkedIn: you can like our show and invite us to connect. Wait, wait, wait, what about our Discord community? Yes, you are welcome to join our community; the link is in the episode description. Wait, wait, wait, what about our Twitter? Yes, please follow us on Twitter to hear when a new episode comes out. Wait, wait, wait, what about rating us 15 out of five stars on Apple and Spotify? Yes, please rate us with a nice five stars on Apple Podcasts and Spotify. See you next episode, bye bye.