Training Data

4 datasets · 4000 records · 12 features · 10 training rounds
How each dataset shapes what /comprendre returns

The pipeline — data to product

Customer message
    |
    v
Extraction (regex or Haiku) --> 12 features
    |
    +---> routing_model trained on routing dataset --> routage.service
    |
    +---> urgency_model trained on urgency dataset --> routage.urgence
    |
    +---> sub_routing_model trained on sub_routing dataset --> routage.sous_categorie
    |
    +---> churn_model trained on churn dataset --> risque_client.churn
    |
    v
Combined --> priorite + actions_suggerees + trust_score + xai

Each model reads the same 12 features but listens to different signals. Routing cares about intention and keywords. Urgency cares about frustration and escalade. Churn cares about menace and emotion. The datasets teach each model what to focus on.

The 12 features

Feature                 Type         Routing    Urgency    SubCat     Churn
intention_primaire      categorical  primary    moderate   primary    moderate
mots_cles               text         primary    moderate   primary    strong
frustration             float        weak       primary    weak       primary
menace_churn            boolean      weak       primary    weak       primary
escalade                string       weak       strong     weak       strong
retard_jours            integer      moderate   strong     moderate   moderate
urgence_percue          categorical  weak       strong     weak       moderate
intention_secondaire    categorical  moderate   weak       moderate   weak
ref_commande            string       weak       weak       weak       weak
produit_mentionne       string       weak       weak       moderate   weak
canal                   categorical  weak       weak       weak       weak
date_livraison_prevue   string       moderate   weak       weak       weak

Snake doesn't use feature importance — it builds SAT clauses. But the training data encodes which features carry signal via the label-feature correlations. The table above reflects the synthetic data design: routing labels correlate with intention/keywords, urgency labels correlate with frustration/menace.

The 4 datasets

routing_model · strong · AUROC 0.990

Routes to the right department. 1000 records, 5 balanced classes (~20% each).

Class         Count   Distribution   Per-class AUROC
SAV           206     20.6%          0.998
Technique     205     20.5%          0.985
Comptabilite  202     20.2%          0.988
Commercial    194     19.4%          0.981
Logistique    193     19.3%          0.999

Product impact: Drives routage.service — which department handles this request. A wrong routing = client waits, gets bounced between departments, frustration rises. At 0.990 AUROC, misrouting is rare.

Real data potential: Every resolved ticket has a department tag. 500 labeled tickets would build a production-grade model. Current synthetic model is already near-ceiling — real data would refine edge cases (multi-department requests).

urgency_model · strong · AUROC 0.979

Classifies how fast to act. 1000 records, 4 imbalanced classes. Critique is the rare-but-critical class (11.4%).

Class     Count   Distribution   Per-class AUROC
Normale   357     35.7%          0.972
Haute     299     29.9%          0.962
Basse     230     23.0%          0.993
Critique  114     11.4%          0.988

Product impact: Drives routage.urgence and routage.priorite. A missed Critique = chantier bloque, client escalates, churn risk materializes. The model feeds the trust score via model_agreement (urgence + churn alignment).

Real data potential: Use time-to-first-response or SLA tier as proxy label. 800 tickets minimum. The Haute/Normale boundary is where real data helps most — agents themselves disagree on this.

sub_routing_model · needs real data · AUROC 0.853

Classifies the specific problem type. 1000 records, 6 imbalanced classes. "Autre" catch-all dominates at 41%.

Class             Count   Distribution   Per-class AUROC
Autre             410     41.0%          0.700
Erreur produit    191     19.1%          0.839
Facturation       132     13.2%          0.965
Retard livraison  117     11.7%          0.964
Casse transport    91      9.1%          0.814
Info stock         59      5.9%          0.837

Product impact: Drives routage.sous_categorie. At 54% accuracy, this is advisory, not actionable. The model correctly ranks sub-categories (AUROC 0.853) but can't commit to a hard prediction reliably. Facturation and Retard livraison are sharp (0.96+); Autre and Casse transport are blurry.

Real data potential: Highest priority for real data. Real messages distinguish "verre arrive casse" from "commande toujours pas recue" trivially — the synthetic generator can't. Expected jump: 0.853 → 0.92+ AUROC, 54% → 80%+ accuracy. 1500 labeled tickets minimum.

churn_model · strong · AUROC 0.901

Detects customer flight risk. 1000 records, 3 classes. The middle class (Risque modere) is the hard one.

Class           Count   Distribution   Per-class AUROC
Pas de risque   449     44.9%          0.934
Risque modere   297     29.7%          0.859
Risque eleve    254     25.4%          0.911

Product impact: Drives risque_client.churn, risque_client.facteurs, and escalation actions. A "Risque eleve" triggers: immediate callback, commercial gesture, management alert. This is where revenue is protected. Also feeds the trust score via model_agreement (aligned urgence + churn = higher trust).

Real data potential: Ground truth = customers who actually left (6-12 month lookback). 600 labeled records minimum. The payoff: Risque modere (0.859 AUROC) would sharpen with subtle signals — order frequency decline, tone shift across multiple tickets, payment cycle changes.

How datasets interact in the product

The 4 models don't run in isolation — their outputs combine to produce the final response:

Priority = f(urgence, churn)

priorite is computed from urgence + churn predictions. Critique OR Risque eleve → P1. Haute OR Risque modere → P2. This means the urgency and churn datasets jointly determine priority. If both agree (trust_score.model_agreement = 100), the priority is confident.
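The combination rule above can be sketched as a small function. The P1 and P2 rules come from the text; the P3 fallback for messages that are neither urgent nor at risk is an assumption, as is the function name:

```python
def compute_priorite(urgence: str, churn: str) -> str:
    """Combine the urgency and churn predictions into a priority tier.

    Critique OR Risque eleve -> P1; Haute OR Risque modere -> P2
    (rules from the text). P3 fallback is an assumption.
    """
    if urgence == "Critique" or churn == "Risque eleve":
        return "P1"
    if urgence == "Haute" or churn == "Risque modere":
        return "P2"
    return "P3"
```

Note that either model alone can force P1, which is why a missed Critique or a missed Risque eleve is costly on its own.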

Actions = f(service, urgence, churn)

Suggested actions depend on all three: "Rappel immediat par responsable logistique" uses the routing result. "< 2h" uses urgency. "Geste commercial" triggers on churn risk. Wrong routing → wrong department in the action. Wrong urgency → wrong timing.
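A minimal sketch of how the three predictions could combine into suggested actions. The action strings and thresholds are illustrative, not the product's exact copy:

```python
def actions_suggerees(service: str, urgence: str, churn: str) -> list[str]:
    """Sketch: each action pulls from a different model's output.

    The routing result names the responsible department, urgency sets
    the timing, and churn risk triggers the commercial gesture.
    """
    actions = []
    if churn in ("Risque eleve", "Risque modere"):
        # department comes from the routing model
        actions.append(f"Rappel immediat par responsable {service.lower()}")
    if urgence == "Critique":
        # timing comes from the urgency model
        actions.append("Repondre en < 2h")
    if churn == "Risque eleve":
        actions.append("Geste commercial")
    return actions
```

With this structure, a routing error corrupts the first action's department and an urgency error corrupts the timing, exactly the failure modes described above.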

Trust = f(all 4 models)

The trust score's classification_confidence component averages the top probability across all 4 models. If routing is 99% confident but sub_routing is 50/50, confidence drops. model_agreement checks urgence vs churn alignment — datasets that produce contradictory signals lower trust.
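The two components described above can be sketched as follows. The averaging of top probabilities and the aligned-case score of 100 come from the text; the exact alignment rule and the disagreement value of 50 are assumptions:

```python
def trust_components(top_probs: list[float], urgence: str, churn: str) -> dict:
    """Sketch of the two trust-score components.

    classification_confidence: mean of each model's top-class probability.
    model_agreement: 100 when urgency and churn point the same way
    (the alignment rule and the 50-point disagreement value are assumptions).
    """
    classification_confidence = 100 * sum(top_probs) / len(top_probs)
    urgent = urgence in ("Critique", "Haute")
    at_risk = churn in ("Risque eleve", "Risque modere")
    model_agreement = 100 if urgent == at_risk else 50
    return {
        "classification_confidence": classification_confidence,
        "model_agreement": model_agreement,
    }
```

This makes the failure mode concrete: a confident routing model (0.99) averaged with a 50/50 sub_routing model drags classification_confidence down even when the final answer is right.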

XAI = f(extraction + all models)

The audit trail cites each model's decision with the features that drove it. routing_audit references intention + keywords (from extraction). churn_audit references menace + frustration. Better datasets → more precise audits → more trust from the human operator.
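A sketch of what the audit trail might look like. The field names routing_audit and churn_audit come from the text; the dict layout and the helper name are assumptions:

```python
def build_audit(features: dict, predictions: dict) -> dict:
    """Sketch: cite each model's decision with its driving features.

    Drivers per the text: routing_audit cites intention + keywords,
    churn_audit cites menace + frustration. Layout is an assumption.
    """
    return {
        "routing_audit": {
            "decision": predictions["service"],
            "drivers": {k: features[k]
                        for k in ("intention_primaire", "mots_cles")},
        },
        "churn_audit": {
            "decision": predictions["churn"],
            "drivers": {k: features[k]
                        for k in ("menace_churn", "frustration")},
        },
    }
```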

What to label next

1. sub_routing (1500 tickets, AUROC 0.85 → 0.92+): 54% accuracy is advisory, not actionable. Real lexical distinctions ("casse" vs "retard") would unlock 80%+.
2. churn (600 tickets, AUROC 0.90 → 0.94+): Risque modere (0.859 AUROC) is the weak link. Real churn labels (customers who actually left) would sharpen the boundary.
3. urgency (800 tickets, AUROC 0.98 → 0.98, calibration only): the Haute/Normale boundary is noisy. Real SLA data would clarify it.
4. routing (500 tickets, AUROC 0.99 → 0.99, marginal): already at 0.990. Real data helps only for edge cases (multi-department requests).

One NDJSON record

All 4 datasets share the same 12 features. Only the label column changes.

{
  "intention_primaire": "reclamation_livraison",
  "intention_secondaire": "menace_depart",
  "ref_commande": "CMD-2026-1847",
  "produit_mentionne": "feuillete 44.2",
  "date_livraison_prevue": "2026-04-03",
  "retard_jours": 7,
  "frustration": 0.92,
  "urgence_percue": "critique",
  "menace_churn": 1,
  "escalade": "troisieme contact",
  "mots_cles": "attends bloque poseurs concurrent troisieme",
  "canal": "portail",
  "label": "Logistique"
}

One record per dataset; only the label differs:

  routing      → "Logistique"
  urgency      → "Critique"
  sub_routing  → "Retard livraison"
  churn        → "Risque eleve"

To add real data: take a resolved ticket, run it through extraction (regex or Haiku), add the label from your ticket system, append to the NDJSON. Retrain with python3 -m requestclassifier.train_models.
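The append step can be sketched as follows, assuming features is the 12-feature dict produced by extraction (the function name is hypothetical):

```python
import json

def append_labeled_record(path: str, features: dict, label: str) -> None:
    """Append one extracted ticket plus its label to an NDJSON dataset.

    `features` is assumed to be the 12-feature dict from extraction;
    the label comes from your ticket system.
    """
    record = dict(features, label=label)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

After appending enough records this way, rerun python3 -m requestclassifier.train_models to retrain on the extended dataset.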