The Technical Excellence Award

The Segmentation Bottleneck in Oncology Research, Solved.

Data Scientist

Yanis Emeriau

TLDR

AI-Automated Tumor Segmentation for Oncology Research
Eliminating a critical manual bottleneck at Europe's leading cancer center (Gustave Roussy) to unlock large-scale, life-saving studies.

  • The Problem: Researchers were forced to manually trace tumors in 3D CT scans slice-by-slice. This highly time-consuming process made large-scale oncology studies practically impossible and delayed scientific progress.
  • The Solution: Designed and deployed a fully automated 3D deep learning pipeline for volumetric CT data. Integrated directly into the center's infrastructure, it does in seconds what used to take hours, with zero manual intervention.
  • The Result: Validated across thousands of scans in active production, the system completely removed the annotation bottleneck. It made previously unthinkable, massive-scale cancer research executable.

Project Introduction

Behind every CT scan in an oncology study, there is a patient.
And before researchers at Gustave Roussy could learn anything from that scan, someone had to sit down and manually trace the tumor, slice by slice, in three dimensions.
Up to an hour of precise, skilled work. Per case. Multiplied by thousands.
Not because it was the right way to do it. Because there was no other way.
That kind of bottleneck doesn't just slow down research. It quietly limits what questions can even be asked.
Large-scale studies become unfeasible. Patterns that might exist across thousands of patients stay invisible. The science waits.
I was brought in to change that.
I designed and deployed an automated tumor segmentation pipeline using 3D deep learning, built for volumetric CT data and integrated directly into the center's research infrastructure.
The system does in seconds what previously took hours, with no manual intervention, at any scale.
The result wasn't just faster workflows. It was a different kind of research becoming possible. Studies that required annotating thousands of scans, previously unthinkable, became executable.
The constraint was gone.
This was production work, validated on several thousand CT scans, used in active research at Europe's leading cancer center.
And behind every one of those scans, there was a patient. They may never know this pipeline exists. But the research that might one day change their prognosis is no longer held back by the hours it used to take to begin.

What client problem does this project solve?

Modern oncology research runs on data.

Imaging data in particular holds enormous potential: hidden patterns in CT scans that could predict treatment response, identify new biomarkers, or feed predictive models that help clinicians make better decisions.

But before any of that analysis can begin, there is a mandatory first step: the tumor must be segmented. Precisely delineated, in three dimensions, in every scan.

Image analysis can't run without it. Biomarker extraction can't happen without it. Predictive models can't be trained without it.

At Gustave Roussy, that step was manual. A clinician or trained researcher had to trace the tumor boundary slice by slice on every CT scan, a process taking 30 minutes to an hour per case.

The center had the imaging data, the scientific expertise, and the research ambitions.

What it didn't have was a way to prepare that data at scale.

The result was a hard ceiling on research. Studies requiring hundreds or thousands of annotated scans were simply not executable. Entire lines of investigation were blocked not by scientific limitations, but by the time cost of a preparatory step that had to happen before the real work could start.

That is the problem this project solved. Not a workflow inconvenience, but a structural barrier between raw imaging data and the research it could enable.

AI Solution Implemented (technical details)

The core of the solution is a 3D U-Net architecture, a deep learning model designed specifically for volumetric image segmentation.

Rather than building from scratch, I fine-tuned a pre-trained model, which allowed the network to leverage existing learned representations while adapting precisely to the characteristics of the center's CT data.
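
For illustration, the sketch below shows what this setup can look like in PyTorch. The project's exact architecture, channel configuration, and pretrained checkpoint are not public; the MONAI-based U-Net and the checkpoint path here are assumptions, not the production configuration.

```python
import torch
from monai.networks.nets import UNet  # assumption: MONAI's 3D U-Net implementation

# Illustrative 3D U-Net for binary tumor segmentation on CT volumes.
model = UNet(
    spatial_dims=3,                   # volumetric convolutions
    in_channels=1,                    # single CT intensity channel
    out_channels=2,                   # background / tumor
    channels=(16, 32, 64, 128, 256),  # encoder feature widths (illustrative)
    strides=(2, 2, 2, 2),
    num_res_units=2,
)

# Fine-tuning: start from existing learned representations instead of random
# weights. "pretrained_unet3d.pt" is a hypothetical checkpoint path.
state = torch.load("pretrained_unet3d.pt", map_location="cpu")
model.load_state_dict(state, strict=False)  # tolerate mismatched output heads

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```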

A significant part of the work happened before the model itself: preprocessing. Raw CT scans are not model-ready. They vary in resolution, orientation, intensity range, and acquisition protocol.

I designed and implemented the full preprocessing pipeline: resampling, normalization, and spatial standardization, ensuring the model received consistent, well-structured volumetric input regardless of scan origin.
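
As a concrete sketch of those three steps, assuming SimpleITK as the I/O layer; the isotropic target spacing and the HU window are illustrative defaults, not the center's actual protocol.

```python
import numpy as np
import SimpleITK as sitk

def preprocess_ct(path: str,
                  target_spacing=(1.0, 1.0, 1.0),
                  hu_window=(-200.0, 300.0)) -> np.ndarray:
    """Resample, clip, and normalize a raw CT volume into model-ready input."""
    image = sitk.ReadImage(path)

    # Spatial standardization: resample to isotropic voxels so the model
    # sees consistent geometry regardless of acquisition protocol.
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    new_size = [
        int(round(sz * sp / tsp))
        for sz, sp, tsp in zip(original_size, original_spacing, target_spacing)
    ]
    image = sitk.Resample(
        image, new_size, sitk.Transform(), sitk.sitkLinear,
        image.GetOrigin(), target_spacing, image.GetDirection(),
        0.0, image.GetPixelID(),
    )

    # Intensity normalization: clip to a soft-tissue HU window, scale to [0, 1].
    array = sitk.GetArrayFromImage(image).astype(np.float32)
    array = np.clip(array, *hu_window)
    return (array - hu_window[0]) / (hu_window[1] - hu_window[0])
```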

The model was trained and validated on several thousand CT scans, achieving a Dice score above 0.9. In medical image segmentation, a Dice score measures the overlap between the model's output and the ground truth annotation. Above 0.9 is considered high-quality, clinically relevant performance.
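
For reference, Dice is twice the intersection of prediction and ground truth divided by the sum of their volumes: Dice(P, G) = 2|P ∩ G| / (|P| + |G|). A minimal computation over binary masks:

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) over binary segmentation masks."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    # eps keeps the score defined (and equal to 1.0) when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum().item() + target.sum().item() + eps)
```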

The output of the pipeline integrates directly into downstream research workflows.

Once a scan is processed, the resulting segmentation mask is immediately usable for radiomics analysis, biomarker extraction, or predictive modeling, no manual correction required in the standard case.
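
One detail that makes this hand-off work: the mask must carry the same geometry (origin, spacing, direction) as the source scan, or downstream tools cannot pair the two. A minimal sketch, again assuming SimpleITK:

```python
import numpy as np
import SimpleITK as sitk

def save_mask(mask_array: np.ndarray, reference_scan: sitk.Image, out_path: str) -> None:
    """Write a predicted mask aligned with the source CT, so radiomics or
    biomarker tooling can consume the scan/mask pair directly."""
    # Assumes mask_array lies on the same voxel grid as the reference scan.
    mask = sitk.GetImageFromArray(mask_array.astype(np.uint8))
    mask.CopyInformation(reference_scan)  # copy origin, spacing, direction
    sitk.WriteImage(mask, out_path, useCompression=True)
```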

The system was built in PyTorch and designed for reliability and reproducibility at scale, not for a one-time experiment.

The entire pipeline, from raw scan to validated segmentation, runs in seconds to minutes per case.
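
To make that concrete, here is a hedged inference sketch. Sliding-window inference (shown via MONAI; an assumption, not a confirmed project detail) is one common way to keep GPU memory bounded on full-size volumes while staying in the seconds-to-minutes range per case. The roi_size, overlap, and batch size are illustrative values.

```python
import torch
from monai.inferers import sliding_window_inference  # assumption: MONAI utility

@torch.no_grad()
def segment(volume: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    """Segment one preprocessed CT volume of shape [1, 1, D, H, W]."""
    model.eval()
    logits = sliding_window_inference(
        inputs=volume,
        roi_size=(96, 96, 96),  # patch size fed to the network
        sw_batch_size=4,        # patches scored per forward pass
        predictor=model,
        overlap=0.25,           # patch overlap, blended at the seams
    )
    return logits.argmax(dim=1, keepdim=True)  # [1, 1, D, H, W] binary mask
```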

What are the quantifiable results (ROI, KPIs, etc.) of this project?

Before the pipeline, segmenting a single CT scan required 30 minutes to one hour of manual work.

Processing a cohort of 100 patients meant 50 to 100 hours of annotation before any research could begin.

After deployment: about 5 seconds per scan. A week or more of full-time manual annotation now runs in minutes.

Segmentation accuracy, measured by Dice score, exceeded 0.9, confirming that the speed gain came with no loss in quality.

The pipeline performs at a level the research team considers clinically relevant.

The downstream impact was direct. Studies requiring large annotated datasets, previously unfeasible, became executable.

Radiomics analysis, biomarker extraction, and predictive modeling could finally run at the scale the science required. The team could work with their full imaging dataset rather than subsets constrained by annotation time.

Validated on several thousand real CT scans. In production. At Europe's leading cancer center.

Proof of excellence: why should you win this award?

There are many ways to measure the value of an AI project. Speed gains. Cost reduction. Efficiency metrics.

This project has all of those. But it also has something harder to quantify.

Every patient facing cancer is already fighting the hardest battle of their life.

What they need, what they deserve, is for the science working on their behalf to move as fast as it possibly can.

To ask every question the data allows. To leave no insight on the table because the tools weren't there.

Manual segmentation was leaving insights on the table. There is now a pipeline that isn't.

Every study this unlocks is a study that might one day matter to someone sitting in a waiting room, hoping the science has caught up with their disease.

That is what this project is about. The metrics are real. The stakes could not be higher.