Turing-point

What You Need to Know About MLOps

Posted by Tirthankar RayChaudhuri on Jan 15, 2024

Most heavy-duty industrial operations today have monitoring systems in place that gather data records (logs) of the daily operational cycle. Such data is stored, archived for a while and then destroyed. When an operational incident occurs, the team engaged to resolve it refers to these monitoring logs to identify the root cause.

More recently the popular field of MLOps has emerged, in which AI/ML agents are tasked with analyzing the operational monitoring logs of malfunctioning systems, without human intervention, to identify the root cause of an incident. As per the IBM definition, “MLOps enables IT operations teams to respond more quickly, even proactively to slowdowns and outages, with end-to-end visibility and context”.

Within this discussion of MLOps we would like to present to the reader “Anomaly Detection”, the AI/ML mechanism used to proactively identify, and even predict, a system slowdown or an outage. It is described as follows:

AI/ML systems are first trained on time-series data from numerous monitored parameters to recognize the ‘normal’ operational levels of these parameters. Having learned these normal operating boundaries, such a system is set to identify significant deviations from them when provided with a continuous feed of incoming monitoring data. Automated identification of such deviations is known as anomaly detection in MLOps. When an MLOps monitoring system detects an anomaly, the incident management team is alerted and an investigation begins to discover its root cause, so as to proactively prevent an outage or a slowdown.

In addition, MLOps systems can be trained on past data from anomalies and outages whose root causes have previously been identified and resolved. Agents trained this way can then be employed to conduct automated root cause analysis of new incidents and thereby shorten the time to incident resolution.
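To make the idea of learning normal levels and then flagging deviations concrete, here is a minimal sketch in Python. It is not the method of any particular product: it assumes a single monitored parameter, treats "normal" as the training mean plus or minus `k` standard deviations, and uses made-up CPU readings. The function names `learn_normal_band` and `detect_anomalies`, the threshold `k`, and all of the data are hypothetical.

```python
from statistics import mean, stdev


def learn_normal_band(training_values, k=3.0):
    """Learn a 'normal' band as mean +/- k sample standard deviations."""
    mu = mean(training_values)
    sigma = stdev(training_values)
    return mu - k * sigma, mu + k * sigma


def detect_anomalies(stream, band):
    """Return (index, value) pairs for readings that fall outside the band."""
    low, high = band
    return [(i, v) for i, v in enumerate(stream) if v < low or v > high]


# Hypothetical CPU-utilisation readings (percent) recorded during normal operation.
training = [41.0, 39.5, 40.2, 42.1, 38.9, 40.7, 41.3, 39.8, 40.4, 41.9]
band = learn_normal_band(training)

# Hypothetical incoming feed; the reading at index 3 is deliberately abnormal.
incoming = [40.1, 41.5, 39.7, 78.0, 40.9]
anomalies = detect_anomalies(incoming, band)

for index, value in anomalies:
    print(f"ALERT: reading {index} = {value} is outside the normal band")
```

A production system would typically model many parameters at once and account for trends and seasonality; a fixed band is only the simplest possible stand-in for the trained model described in the text.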

Anomaly Detection within MLOps is thus a highly beneficial technique for an Operations Manager, both for proactively enhancing the uptime and reliability of business systems and for providing a means of intelligent, efficient incident data analysis that quickly identifies the root causes of system failures. MLOps also includes mechanisms for assessing and scoring the risk of upcoming planned operational changes to systems. A machine learning agent is trained on past data from earlier change records and is thereafter employed to evaluate the risk score of an upcoming planned change to operational parameters.
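The change-risk scoring described above can be illustrated with a very small sketch. It swaps the machine learning agent for a much simpler stand-in: a smoothed failure rate per type of change, computed from past change records. The record format (a change type plus a flag for whether the change caused an incident), the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def train_risk_model(past_changes):
    """Count failures and totals per change type from past change records."""
    counts = defaultdict(lambda: [0, 0])  # change_type -> [failures, total]
    for change_type, caused_incident in past_changes:
        counts[change_type][1] += 1
        if caused_incident:
            counts[change_type][0] += 1
    return dict(counts)


def risk_score(model, change_type):
    """Laplace-smoothed failure rate in (0, 1); unseen change types score 0.5."""
    failures, total = model.get(change_type, (0, 0))
    return (failures + 1) / (total + 2)


# Hypothetical past change records: (change type, did it cause an incident?)
past_changes = [
    ("db_schema", True), ("db_schema", True), ("db_schema", False),
    ("config", False), ("config", False), ("config", False), ("config", True),
]
model = train_risk_model(past_changes)

for planned in ("db_schema", "config", "network"):
    print(f"{planned}: risk score {risk_score(model, planned):.2f}")
```

The smoothing keeps a change type with very few past records from being scored as certainly safe or certainly risky. A real agent would usually learn from many more attributes of each change record than its type alone.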

MLOps systems can also be configured to generate daily reports on the operational performance of the systems being monitored, including health check details, anomalies, incidents, risks of changes, etc. An Operations Manager can make such reports available to executive leaders as needed for detailed tracking and review of the overall performance of various business systems in continuous operation.

Related Posts:

Do We Need Ethical Considerations for AI?

How Did We Come So Far? - Taking a Few Steps Back and Tracing AI's Journey

Structuring and Resourcing for a Killer AI Project

Linguistics and its Relation to Machine Learning