The Technical Excellence Award

The Forest

PhD | Data Analyst | Neuroscientist

Valentin Ritou

TLDR

The Forest: Multi-Modal AI for High-Precision Data Curation

Scaling scientific data validation by mirroring expert decision-making through a "Committee of Experts" AI architecture.

  • The Challenge: In neuroscience, automated spike sorting still requires experts to manually "quality control" thousands of neural units. This subjective, slow, and non-reproducible step creates a massive bottleneck for scaling large-scale brain studies.
  • The Solution: Built a modular AI assistant that evaluates neural data using a family of specialized CNNs—each focused on a different modality (waveforms, ISI distributions, etc.). These "experts" are unified by a Random Forest meta-model that weights their predictions to output a calibrated probability of data quality, flagging ambiguous cases for human review.
  • The Result: Transformed manual curation from a subjective bottleneck into a scalable, reproducible workflow with a clear audit trail. While built for neuroscience, the architecture generalizes to any high-stakes validation task—from clinical signal triage to industrial inspection—where algorithms propose and experts must dispose.

Project Introduction

Manual curation is the hidden tax of modern data science. In neuroscience spike sorting, automated algorithms (e.g., Kilosort/Phy workflows) can detect candidate neurons, but final quality control still relies on experts visually inspecting waveforms, ISI violations, and correlograms—often thousands of units per dataset. This step is slow, subjective, hard to reproduce, and a major bottleneck to scaling studies.

This project, The Forest, is a practical AI assistant for manual curation. Instead of replacing experts with a single “black box,” it mirrors how experts decide: by cross-checking multiple complementary views of the same unit. We trained a family of specialized CNN models—each focused on one modality (waveform shape, ISI distributions, cross-/auto-correlograms)—plus a multi-modal model. Their predictions are then combined by a lightweight meta-model (a Random Forest), which learns how to weight each “expert” depending on the unit’s signature and uncertainty. The output is a calibrated probability of “good unit” vs “noise/artifact,” plus the ability to flag ambiguous cases for human review.

The result is faster, more consistent curation, with explicit uncertainty and a clear audit trail of which evidence drove decisions. Beyond spike sorting, the same architecture generalizes to any pipeline where algorithms produce candidates but humans must validate them—single-cell QC, imaging segmentation QC, artifact rejection in EEG/MEG, clinical signal triage, or industrial inspection. The Forest turns manual curation from a subjective bottleneck into a scalable, reproducible, and measurable component of the data workflow.
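A toy sketch of the committee idea described above, assuming three modality experts that each emit a probability of “good unit” (all numbers are illustrative, not real model outputs):

```python
import numpy as np

# Hypothetical specialist probabilities for 4 units (rows) from three
# modality experts (columns): waveform, ISI, correlogram. In the real
# pipeline these would come from the trained CNNs.
expert_probs = np.array([
    [0.95, 0.90, 0.92],   # clean unit: experts agree it is good
    [0.10, 0.05, 0.12],   # clear noise: experts agree it is bad
    [0.85, 0.20, 0.55],   # conflicting evidence across modalities
    [0.55, 0.50, 0.45],   # borderline on every view
])

# Combine the experts: mean probability as a simple ensemble score,
# variance across experts as a disagreement (uncertainty) signal.
p_good = expert_probs.mean(axis=1)
disagreement = expert_probs.var(axis=1)

# Flag a unit for human review if the score is ambiguous or the
# experts disagree strongly (thresholds here are illustrative).
needs_review = (np.abs(p_good - 0.5) < 0.2) | (disagreement > 0.05)
```

In the full system the combination step is a trained Random Forest rather than a plain mean, but the review-flagging logic follows this shape: ambiguity and cross-expert disagreement route a unit to a human.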

What client problem does this project solve?

Organizations can now generate massive datasets, but they still can’t trust them without a slow, manual “last mile” of curation. That last mile is the client’s pain.

In spike sorting, automated pipelines (e.g., Kilosort/Phy) produce candidate neuronal units, but experts must manually validate each unit by inspecting multiple evidence views (waveforms, ISI violations, correlograms). At Neuropixels scale, that becomes thousands of subjective decisions per session—hours to days of specialized work. The consequences are consistent across clients and labs:

  • Throughput bottleneck: manual review becomes the rate-limiting step for analysis, iteration, and publication.
  • Inconsistency and bias: decisions drift across curators, days, and sites (fatigue, differing thresholds), undermining reproducibility.
  • Poor auditability: it’s hard to quantify confidence, document criteria, or justify why borderline items were kept or rejected.

This project solves that by transforming curation into a scalable, standardized, and measurable step. It outputs calibrated probabilities and flags ambiguous cases for targeted human review, reducing workload while preserving expert control. Importantly, it combines multiple “views” of each item—mirroring how experts reason—so decisions are more robust than a single black-box classifier.

The same client problem appears far beyond neuroscience: whenever automation produces candidates but humans must verify quality and biological meaning. Examples include single-cell and bulk RNA-seq QC, variant calling and annotation triage, multi-omics integration, biomarker discovery pipelines (filtering artefacts vs true signals), medical imaging segmentation QC, and clinical signal artifact rejection (EEG/ECG). In all these domains, the pain is identical: manual curation is slow, subjective, and hard to reproduce. The solution is a general “curation assistant” architecture that scales expert judgment while making it consistent and auditable.
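To make the “calibrated probability plus targeted review” workflow concrete, a triage step might look like the sketch below. The thresholds and the `triage` helper are hypothetical illustrations, not the project’s actual operating points:

```python
# Illustrative triage: turn a calibrated probability into a decision
# plus an audit record. Thresholds are assumptions for this sketch.
def triage(unit_id, p_good, accept_at=0.9, reject_at=0.1):
    if p_good >= accept_at:
        decision = "auto-accept"      # high-confidence good unit
    elif p_good <= reject_at:
        decision = "auto-reject"      # high-confidence noise/artifact
    else:
        decision = "human-review"     # ambiguous: route to an expert
    # Keeping the probability alongside the decision gives the
    # audit trail the document describes: why each call was made.
    return {"unit": unit_id, "p_good": p_good, "decision": decision}

log = [triage(u, p) for u, p in [(1, 0.97), (2, 0.04), (3, 0.55)]]
```

Only the middle band reaches a human, which is how exhaustive review shrinks to targeted validation while every decision stays documented.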

AI Solution Implemented (technical details)

The solution is a stacked ensemble (“The Forest”) designed for tasks where humans curate outputs from automated pipelines.

Inputs (per item/unit): we extract multiple complementary representations that experts typically use for quality control. In spike sorting these include (1) waveform/template shape features, (2) inter-spike interval (ISI) distributions and refractory-period violations, and (3) auto-/cross-correlograms capturing contamination and bursting structure. Each representation is standardized into fixed-size tensors so models can operate consistently across recordings.

Base models (specialist CNNs): instead of one monolithic network, we train several compact convolutional models, each specialized on a single evidence modality (waveform-only, ISI-only, correlogram-only), plus an optional multi-modal model that learns joint interactions. This “mixture of specialists” improves robustness: if one modality is noisy or atypical, others still contribute reliable signal.

Meta-model (the ensemble): the base models output calibrated probabilities. These probabilities are concatenated into a low-dimensional feature vector and fed to a lightweight Random Forest (the “Forest”) that learns how to weight each specialist depending on unit type and uncertainty. To prevent optimistic bias, the meta-model is trained on out-of-fold base predictions (stacking with cross-validation), ensuring the ensemble learns from genuinely unseen predictions.

Outputs: for each unit, the system returns (1) final probability and label, (2) disagreement/uncertainty signals (e.g., variance across specialists), and (3) priority flags for human review of borderline cases. The result is faster, more consistent curation with an auditable decision trail, while keeping humans in the loop for edge cases.
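The stacking step can be sketched with scikit-learn. Logistic regressions stand in for the specialist CNNs and the features are synthetic, so this shows the out-of-fold mechanics rather than the actual models:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-modality feature tensors (flattened).
# In the real system these are waveform, ISI, and correlogram inputs,
# and the base learners are the specialist CNNs.
n = 400
y = rng.integers(0, 2, size=n)  # label: good unit (1) vs noise (0)
modalities = {
    "waveform":    y[:, None] * 1.0 + rng.normal(0, 1.0, (n, 8)),
    "isi":         y[:, None] * 0.8 + rng.normal(0, 1.0, (n, 8)),
    "correlogram": y[:, None] * 0.6 + rng.normal(0, 1.0, (n, 8)),
}

# Out-of-fold base predictions: each specialist's probability for a
# unit comes from a fold that never saw that unit, so the meta-model
# trains on genuinely unseen predictions (no optimistic bias).
oof = np.column_stack([
    cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                      cv=5, method="predict_proba")[:, 1]
    for X in modalities.values()
])

# The "Forest": a Random Forest meta-model over the concatenated
# specialist probabilities.
meta = RandomForestClassifier(n_estimators=200, random_state=0)
meta.fit(oof, y)
p_final = meta.predict_proba(oof)[:, 1]
```

The key design choice mirrored here is that `oof` is built with `cross_val_predict` before the meta-model ever sees it; fitting the meta-model on in-sample base predictions would let it learn the specialists’ overfitting instead of their true reliability.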

What are the quantifiable results (ROI, KPIs, etc.) of this project?

Before deployment, manual spike-sorting curation required roughly one full working day per animal to review ~1,000 units in Phy, making curation the rate-limiting step for large-scale recordings. With the AI-assisted workflow (probability scoring + borderline flagging), the same volume can be triaged in approximately 30 minutes per animal, shifting human effort from exhaustive review to targeted validation of ambiguous cases.

Key outcomes:

  • Curation time: ~1 day/animal → ~30 minutes/animal for ~1,000 units, i.e. a dramatic reduction in hands-on curation time per dataset.
  • Throughput: enables curating multiple animals per day rather than one, accelerating analysis cycles and time-to-results.
  • Reproducibility: curator disagreement reduced by 96%, substantially improving consistency across operators and sessions—one of the major sources of irreproducibility in manual QC.
  • Operational impact: the time savings translate directly into recovered expert hours that can be reinvested in experiment design, interpretation, and downstream analyses rather than repetitive QC.

These KPIs demonstrate both clear ROI and a concrete quality improvement, which are the two most critical success criteria for any manual-curation bottleneck—whether in spike sorting or extendable domains such as single-cell QC, variant triage, and biomarker discovery pipelines.
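Assuming “one full working day” means 8 hours (an assumption; the source states only ~1 day per animal), the headline numbers work out as:

```python
# Back-of-the-envelope ROI. The 8-hour workday is an assumption;
# the source states ~1 day per animal for ~1,000 units.
manual_hours = 8.0
assisted_hours = 0.5   # ~30 minutes per animal with the assistant

speedup = manual_hours / assisted_hours           # 16x faster
hours_recovered = manual_hours - assisted_hours   # 7.5 expert hours/animal
animals_per_day = manual_hours // assisted_hours  # 16 animals per workday
```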

Proof of excellence: why should you win this award?

This project deserves to win because it solves a real, high-impact bottleneck with a solution that is both technically sound and immediately useful.

First, it targets the “last mile” problem that quietly limits many scientific and industrial pipelines: manual curation. In spike sorting, that last mile is expensive, subjective, and hard to reproduce. We turned it into something scalable and auditable—without removing the expert from the loop. The result is not a marginal improvement: we reduced curation from ~one day per animal (~1,000 units) to ~30 minutes, and cut curator disagreement by 96%.

Second, the approach is excellent engineering, not hype. Instead of a single opaque model, the system uses specialist neural networks trained on complementary evidence views and a stacked meta-model to combine them. This mirrors how experts reason (cross-validating waveform, ISI, and correlograms), provides uncertainty signals, and supports responsible triage—auto-accepting only high-confidence items while flagging ambiguous cases for review. It’s designed to be robust, maintainable, and extensible.

Finally, it generalizes beyond neuroscience. The same architecture applies anywhere automated pipelines still need human validation: single-cell QC, variant interpretation, multi-omics filtering, biomarker discovery triage, imaging segmentation QC, and clinical signal artefact rejection. In other words, this is not a one-off model—it’s a reusable “curation engine” that converts expert judgment into a faster, more consistent, measurable workflow.

We should win because we delivered outsized ROI, improved reproducibility, and built a scalable pattern that can uplift multiple data-heavy fields—not just one dataset or one lab.