AxeCrafted Blog

Wandering thoughts from a wandering mind.

Simple Anomaly Detection in Databricks Using Mean and Standard Deviation

Posted on June 23rd, 2025

Data quality and monitoring are becoming critical challenges in modern data platforms. Whether you’re detecting silent failures after a deploy or uncovering unexpected user behavior, being able to spot anomalies quickly can prevent lost revenue, degraded user experience, or compliance risks. Here’s how we tackled this problem using a simple statistical approach combined with generative AI.

In the land of data, regular patterns are comforting companions. When your morning coffee brews at exactly 7:30, or your cat expects to be fed at precisely 6:00, it's the regularity and predictability that make the experience delightful. But occasionally these friendly patterns break: a missed coffee alarm, or a loud meow announcing that you messed up. Welcome to the intriguing world of anomalies.

At our company, Consumidor Positivo, we've had problems with deploys that would unexpectedly affect our systems. We would only find out what happened much later, despite having robust unit testing and integration testing. Even worse, identifying the cause and correlating it with a deploy took considerable time. This led us to think about simple ways to detect when data patterns shift suddenly so we could be alerted and diagnose problems faster.

Since then, we've built a system called ADULA, which detects anomalies in our event data and has evolved to also diagnose them automatically, as best it can. (The name is an internal joke that I sadly can't share here, unless we hire you, of course.)
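To make the idea in the title concrete, here is a minimal sketch of mean-and-standard-deviation anomaly detection in plain Python. It is not ADULA's actual code: the function name `find_anomalies`, the 7-point trailing window, and the 3-sigma threshold are all assumptions chosen for illustration, and a real Databricks job would more likely run the same arithmetic over a table of event counts.

```python
from statistics import mean, stdev


def find_anomalies(counts, window=7, threshold=3.0):
    """Return indices of points that sit far from their recent history.

    Hypothetical helper: each point is compared against the mean and
    sample standard deviation of the `window` points just before it.
    """
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as an anomaly.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        z_score = (counts[i] - mu) / sigma
        if abs(z_score) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily event counts with a sudden drop at index 8.
daily_events = [100, 102, 98, 101, 99, 103, 97, 100, 20, 101]
print(find_anomalies(daily_events))  # prints [8]
```

Comparing each point only against the window before it keeps the check cheap and lets the baseline follow slow drifts. The trade-off is that a large outlier inflates the standard deviation for the next few points, which can hide follow-up anomalies until it leaves the window.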

Leonardo Machado

Some guy from Brazil. Loves his wife, cats, coffee, and data. Often found trying to make sense of numbers or cooking something questionable.