Most production failures don’t strike without warning; they leave behind weak signals that teams learn to ignore. In this talk, I’ll share how the operation of large cloud-native systems moved beyond dashboards, alert tuning, and postmortems to identify and act on those signals before incidents escalated to major severity. Using real production examples, I’ll walk through how logs, metrics, and traces can be combined with simple machine-learning techniques to surface early indicators of failure. This isn’t about replacing engineers with AI; it’s about evolving Ops practices to reduce alert fatigue, improve operational intuition, and give teams more quiet nights on call.
Adithya Jakkaraju is a Technical Architect at HCLTech and Founder of Fit Choice World, with prior experience leading engineering initiatives at USAA and consulting for Boston University. He brings a blend of enterprise technical leadership and entrepreneurial execution across hybrid and on-site environments.