What You Need to Know About MLOps

Posted by Tirthankar RayChaudhuri on Jan 15, 2024

Most heavy-duty industrial operations today have monitoring systems in place that gather data records (logs) of the daily operational cycle. Such data is stored, archived for a while, and then destroyed. When an operational incident occurs, the team engaged to resolve it needs to refer to these monitoring logs to identify the root cause.

More recently, the popular field of MLOps has emerged, in which AI/ML agents analyze the operational monitoring logs of malfunctioning systems, without human intervention, to identify the root cause of an incident. As per the IBM definition, “MLOps enables IT operations teams to respond more quickly, even proactively to slowdowns and outages, with end-to-end visibility and context”.

Within this discussion of MLOps, we would like to present the AI/ML mechanism used to proactively identify, and even predict, a system slowdown or an outage. It is known as “Anomaly Detection” and works as follows:

AI/ML systems are first trained on time-series data from numerous monitored parameters to recognize the ‘normal’ operational levels of these parameters. Having learned these normal boundaries, such a system is then given a continuous feed of incoming monitoring data and instructed to identify significant deviations from those boundaries. Automated identification of such deviations is known as anomaly detection in MLOps. When an MLOps monitoring system detects an anomaly, the incident management team is alerted and investigates the root cause of the anomaly, so that an outage or a slowdown can be prevented proactively.

In addition, MLOps systems can be trained on past data from anomalies and outages whose root causes have already been identified and resolved. Agents trained in this way can then conduct automated root cause analysis of new incidents and thereby shorten the time to incident resolution.
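The train-then-monitor idea of anomaly detection described above can be illustrated with a minimal sketch. It assumes a single monitored parameter and a simple statistical band (mean plus or minus three standard deviations) as the learned ‘normal’ range. Production systems typically use richer models, and the function names and sample readings below are hypothetical.

```python
from statistics import mean, stdev

def learn_normal_band(training_values, k=3.0):
    """Learn a 'normal' band as mean +/- k sample standard deviations."""
    mu = mean(training_values)
    sigma = stdev(training_values)
    return mu - k * sigma, mu + k * sigma

def detect_anomalies(stream, band):
    """Return (index, value) pairs for readings that fall outside the band."""
    low, high = band
    return [(i, v) for i, v in enumerate(stream) if v < low or v > high]

# Hypothetical CPU-utilisation readings (percent) from a healthy period.
history = [41, 43, 40, 42, 44, 39, 41, 43, 42, 40]
band = learn_normal_band(history)

# Hypothetical incoming feed; the 95 is an injected spike.
incoming = [42, 41, 95, 43]
anomalies = detect_anomalies(incoming, band)
print(anomalies)  # [(2, 95)]
```

In practice each flagged reading would trigger an alert to the incident management team rather than a print statement.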

Anomaly Detection within MLOps thus benefits an Operations Manager in two ways: it proactively improves the uptime and reliability of business systems, and it provides an intelligent, efficient means of analyzing incident data to quickly identify the root causes of system failures. MLOps also includes mechanisms for assessing and scoring the risk of planned operational changes. A machine learning agent is trained on data from past change records and is then used to evaluate the risk score of an upcoming planned change to operational parameters.
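As a rough sketch of change-risk scoring, the example below ‘trains’ on past change records by counting how often each type of change led to an incident, then scores a planned change with a smoothed incident rate. This is a deliberately simple stand-in for a real machine learning model. The record format, change-type labels and function names are all assumptions made for illustration.

```python
from collections import defaultdict

def train_risk_model(past_changes):
    """Count outcomes per change type from (change_type, caused_incident) records."""
    counts = defaultdict(lambda: [0, 0])  # change_type -> [incidents, total]
    for change_type, caused_incident in past_changes:
        counts[change_type][0] += int(caused_incident)
        counts[change_type][1] += 1
    return dict(counts)

def risk_score(model, change_type):
    """Laplace-smoothed incident rate in [0, 1]; unseen change types score 0.5."""
    incidents, total = model.get(change_type, (0, 0))
    return (incidents + 1) / (total + 2)

# Hypothetical past change records: (change type, did it cause an incident?)
past = [
    ("db_schema", True), ("db_schema", True), ("db_schema", False),
    ("config", False), ("config", False), ("config", False),
]
model = train_risk_model(past)

print(risk_score(model, "db_schema"))  # 0.6
print(risk_score(model, "config"))     # 0.2
print(risk_score(model, "firmware"))   # 0.5 (no history for this type)
```

The smoothing keeps the score away from 0 or 1 when only a few past records exist, which seems a sensible default when history is sparse.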

MLOps systems can also be configured to generate daily reports on the operational performance of the systems being monitored, including health check details, anomalies, incidents and risks of changes. An Operations Manager can make these reports available to executive leaders as needed for detailed tracking and review of the overall performance of business systems in continuous operation.
