Thursday, December 4, 2025

Drift Detection across Distinct Review Datasets

Model drift leads to invalid results from AI/ML inference models in production. Drift can have various causes, such as concept drift, structural changes and ingestion-pipeline issues in upstream data sources, domain change, and prompt injections and other model exploits. In each case, an AI/ML model that was trained on one kind of data ends up running inference on drifted data that looks quite different, which produces incorrect results. Drift detection (periodic, near real-time, etc.) is therefore crucial for any productionized model.

As mentioned previously, Evidently is a handy library for drift detection. It offers features such as Metrics, Descriptors and Eval that can be plugged in to detect drift in the current data relative to a reference baseline (roughly, the training data).
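To make the reference-vs-current comparison concrete, here is a minimal, stdlib-only sketch of the kind of per-feature test a drift report runs on text descriptors. It is not Evidently's API or its exact method. It computes two hypothetical descriptors (`text_length`, `word_count`) for each review and compares their reference and current distributions with a two-sample Kolmogorov-Smirnov statistic, one common choice for numeric features. The review strings, function names and the 0.05 significance level are assumptions made for illustration.

```python
import math
from bisect import bisect_right


def text_length(text: str) -> int:
    """Hypothetical descriptor: number of characters in a review."""
    return len(text)


def word_count(text: str) -> int:
    """Hypothetical descriptor: number of whitespace-separated words in a review."""
    return len(text.split())


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so the largest gap
    # is reached at one of the observed values.
    for x in ref + cur:
        gap = abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        d = max(d, gap)
    return d


def ks_drift(reference, current, c_alpha: float = 1.358):
    """Return (D, critical value, drift flag). Uses the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)); c_alpha = 1.358 corresponds to alpha = 0.05."""
    n, m = len(reference), len(current)
    d = ks_statistic(reference, current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical


# Hypothetical toy reviews made up for illustration (not the IMDb or code-review data).
reference_reviews = [
    "The characters felt flat and the last twenty minutes dragged on far too long.",
    "A beautifully shot film where every frame tells you something about the lead character.",
    "I walked out after forty minutes because the plot made no sense at all.",
    "Great performances from the whole cast and a soundtrack I keep listening to.",
]
current_reviews = [
    "Readable method, nice.",
    "Please add a unit test.",
    "Rename this variable.",
    "Test fails on null input.",
]

for name, descriptor in [("text_length", text_length), ("word_count", word_count)]:
    d, critical, drifted = ks_drift(
        [descriptor(r) for r in reference_reviews],
        [descriptor(r) for r in current_reviews],
    )
    print(f"{name}: D={d:.3f}, critical={critical:.3f}, drift={drifted}")
```

Here every current review is shorter than every reference review, so D reaches its maximum of 1.0 for both descriptors, a toy-scale version of the length drift between movie and code reviews. With only four reviews per side the large-sample critical value is rough; on real data a library routine such as `scipy.stats.ks_2samp` returns a p-value directly.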

In DriftTextReviews.py, drift detection is done for an existing PyTorch text classification model originally trained on the IMDb movie review dataset. For the reference data, a sample of the same IMDb movie data is used. For the current data, code reviews from a completely different domain are used. As expected, significant drift was detected between these two datasets from completely different domains. The Evidently reports below make the drift evidently clear!

  • The characteristic words differ across the two domains. The movie domain includes words like frame, character and minutes, while the coding domain has words like readable, test and method.
  • In terms of review text length, IMDb reviews are much longer and contain many more words than the code reviews. The review length and word count features, hooked in as Descriptors, are duly flagged as drifted and shown in the reports.
  • Interestingly, the label, either Positive (1) or Negative (0), shows no drift: each dataset contains equal numbers of Positive and Negative reviews.
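The vocabulary and label observations can also be sketched in plain Python. The snippet below is a hypothetical stand-in, not how Evidently computes its reports. `characteristic_words` ranks words by an add-one smoothed log ratio of their relative frequency in current versus reference text, and `label_drift_chi2` runs a chi-square test of homogeneity on binary label counts (critical value 3.841 for alpha = 0.05 with one degree of freedom). The toy reviews, labels and stopword list are all made up for illustration.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stopword list so function words do not dominate the ranking.
STOPWORDS = {"a", "an", "and", "the", "i", "to", "on", "at", "of", "this", "is", "it", "you"}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def characteristic_words(reference, current, top_k=3, alpha=1.0):
    """Rank words by the add-alpha smoothed log ratio of their relative frequency
    in current vs reference; returns (most current-like, most reference-like)."""
    ref_counts = Counter(t for doc in reference for t in tokenize(doc))
    cur_counts = Counter(t for doc in current for t in tokenize(doc))
    vocab = set(ref_counts) | set(cur_counts)
    ref_total = sum(ref_counts.values()) + alpha * len(vocab)
    cur_total = sum(cur_counts.values()) + alpha * len(vocab)
    score = {
        w: math.log((cur_counts[w] + alpha) / cur_total)
        - math.log((ref_counts[w] + alpha) / ref_total)
        for w in vocab
    }
    # Ties are broken alphabetically so the output is deterministic.
    top_current = sorted(vocab, key=lambda w: (-score[w], w))[:top_k]
    top_reference = sorted(vocab, key=lambda w: (score[w], w))[:top_k]
    return top_current, top_reference


def label_drift_chi2(ref_labels, cur_labels, critical=3.841):
    """Chi-square test of homogeneity on the 2 x 2 table of binary label counts
    (rows: reference, current; columns: label 0, label 1)."""
    table = [
        [ref_labels.count(0), ref_labels.count(1)],
        [cur_labels.count(0), cur_labels.count(1)],
    ]
    total = sum(map(sum, table))
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / total
            if expected > 0:  # an empty column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat, stat > critical


# Hypothetical toy (text, label) pairs; 1 = Positive, 0 = Negative.
reference = [
    ("The characters felt flat and the last twenty minutes dragged.", 0),
    ("Every frame of this film is gorgeous and the lead character shines.", 1),
    ("I walked out after forty minutes, the plot made no sense.", 0),
    ("Great cast and a soundtrack I keep listening to.", 1),
]
current = [
    ("Readable method, nice work.", 1),
    ("Please add a unit test.", 0),
    ("Clean, well named method.", 1),
    ("Test fails on null input.", 0),
]

top_cur, top_ref = characteristic_words([t for t, _ in reference], [t for t, _ in current])
print("current-leaning words:", top_cur)
print("reference-leaning words:", top_ref)

stat, drifted = label_drift_chi2([y for _, y in reference], [y for _, y in current])
print(f"label chi2={stat:.3f}, drift={drifted}")
```

Because each toy dataset has two Positive and two Negative reviews, the chi-square statistic is 0 and no label drift is flagged, the same pattern as in Fig 2, while the vocabulary ranking separates coding words such as method and test from movie words such as minutes. With real data, a p-value from a statistics library would replace the fixed critical value.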


Fig 1: Drift Review Length & Word Count

Fig 2: No Drift in Label

Fig 3: Characteristic Words - Current

Fig 4: Characteristic Words - Reference
