```
Customer message
      |
      v
Extraction (regex or Haiku) --> 12 features
      |
      +---> routing_model     trained on routing dataset     --> routage.service
      |
      +---> urgency_model     trained on urgency dataset     --> routage.urgence
      |
      +---> sub_routing_model trained on sub_routing dataset --> routage.sous_categorie
      |
      +---> churn_model       trained on churn dataset       --> risque_client.churn
      |
      v
Combined --> priorite + actions_suggerees + trust_score + xai
```
Each model reads the same 12 features but listens to different signals: routing keys on intention and keywords, urgency on frustration and escalation, churn on departure threats and emotional tone. The datasets teach each model what to focus on.
| Feature | Type | Routing | Urgency | SubCat | Churn |
|---|---|---|---|---|---|
| intention_primaire | categorical | primary | moderate | primary | moderate |
| mots_cles | text | primary | moderate | primary | strong |
| frustration | float | weak | primary | weak | primary |
| menace_churn | boolean | weak | primary | weak | primary |
| escalade | string | weak | strong | weak | strong |
| retard_jours | integer | moderate | strong | moderate | moderate |
| urgence_percue | categorical | weak | strong | weak | moderate |
| intention_secondaire | categorical | moderate | weak | moderate | weak |
| ref_commande | string | weak | weak | weak | weak |
| produit_mentionne | string | weak | weak | moderate | weak |
| canal | categorical | weak | weak | weak | weak |
| date_livraison_prevue | string | moderate | weak | weak | weak |
Snake doesn't use feature importance — it builds SAT clauses. But the training data encodes which features carry signal via the label-feature correlations. The table above reflects the synthetic data design: routing labels correlate with intention/keywords, urgency labels correlate with frustration/menace.
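Whether a synthetic dataset actually encodes the intended label-feature correlation can be checked directly before training. A minimal sketch; the toy rows below are invented for illustration (real records carry all 12 features), and this check is not part of the actual pipeline:

```python
from collections import Counter

# Toy rows mimicking the churn dataset: (menace_churn, label).
# Invented values for illustration only.
rows = [
    (1, "Risque eleve"), (1, "Risque eleve"), (1, "Risque modere"),
    (0, "Pas de risque"), (0, "Pas de risque"), (0, "Risque modere"),
]

# If the dataset design is right, messages carrying a churn threat
# should concentrate in the high-risk class.
threatened = [label for flag, label in rows if flag == 1]
dist = Counter(threatened)
share_high = dist["Risque eleve"] / len(threatened)
print(f"P(Risque eleve | menace_churn=1) = {share_high:.2f}")  # 0.67
```

Running the same conditional-distribution check per feature against the real NDJSON files would confirm which columns carry signal for which label.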
The routing dataset routes messages to the right department. 1000 records, 5 balanced classes (~20% each).
| Class | Count | Distribution | Per-class AUROC |
|---|---|---|---|
| SAV | 206 | 20.6% | 0.998 |
| Technique | 205 | 20.5% | 0.985 |
| Comptabilite | 202 | 20.2% | 0.988 |
| Commercial | 194 | 19.4% | 0.981 |
| Logistique | 193 | 19.3% | 0.999 |
Product impact: Drives routage.service, i.e. which department handles the request. A misrouted message means the client waits, gets bounced between departments, and frustration rises. At 0.990 AUROC, misrouting is rare.
Real data potential: Every resolved ticket has a department tag. 500 labeled tickets would build a production-grade model. Current synthetic model is already near-ceiling — real data would refine edge cases (multi-department requests).
The urgency dataset classifies how fast to act. 1000 records, 4 imbalanced classes; Critique is the rare-but-critical class (11.4%).
| Class | Count | Distribution | Per-class AUROC |
|---|---|---|---|
| Normale | 357 | 35.7% | 0.972 |
| Haute | 299 | 29.9% | 0.962 |
| Basse | 230 | 23.0% | 0.993 |
| Critique | 114 | 11.4% | 0.988 |
Product impact: Drives routage.urgence and routage.priorite. A missed Critique means a blocked job site, an escalating client, and churn risk that materializes. The model also feeds the trust score via model_agreement (urgency and churn alignment).
Real data potential: Use time-to-first-response or SLA tier as proxy label. 800 tickets minimum. The Haute/Normale boundary is where real data helps most — agents themselves disagree on this.
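A proxy-labeling pass along those lines could look like the sketch below. The response-time thresholds are invented assumptions, not values from an actual SLA, and `urgency_from_response_time` is a hypothetical helper:

```python
def urgency_from_response_time(minutes_to_first_response: int) -> str:
    """Map time-to-first-response to an urgency proxy label.

    Thresholds are illustrative assumptions -- calibrate them
    against your own ticket system's SLA tiers.
    """
    if minutes_to_first_response <= 30:
        return "Critique"
    if minutes_to_first_response <= 240:
        return "Haute"
    if minutes_to_first_response <= 1440:
        return "Normale"
    return "Basse"

print(urgency_from_response_time(15))   # Critique
print(urgency_from_response_time(600))  # Normale
```

Because agents themselves disagree at the Haute/Normale boundary, it may be worth keeping only tickets whose response time falls clearly inside one band.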
The sub_routing dataset classifies the specific problem type. 1000 records, 6 imbalanced classes; the "Autre" catch-all dominates at 41%.
| Class | Count | Distribution | Per-class AUROC |
|---|---|---|---|
| Autre | 410 | 41.0% | 0.700 |
| Erreur produit | 191 | 19.1% | 0.839 |
| Facturation | 132 | 13.2% | 0.965 |
| Retard livraison | 117 | 11.7% | 0.964 |
| Casse transport | 91 | 9.1% | 0.814 |
| Info stock | 59 | 5.9% | 0.837 |
Product impact: Drives routage.sous_categorie. At 54% accuracy, this is advisory, not actionable. The model correctly ranks sub-categories (AUROC 0.853) but can't commit to a hard prediction reliably. Facturation and Retard livraison are sharp (0.96+); Autre and Casse transport are blurry.
Real data potential: Highest priority for real data. Real messages distinguish "verre arrive casse" (glass arrived broken) from "commande toujours pas recue" (order still not received) trivially; the synthetic generator can't. Expected jump: 0.853 → 0.92+ AUROC, 54% → 80%+ accuracy. 1500 labeled tickets minimum.
The churn dataset detects customer flight risk. 1000 records, 3 classes; the middle class (Risque modere) is the hard one.
| Class | Count | Distribution | Per-class AUROC |
|---|---|---|---|
| Pas de risque | 449 | 44.9% | 0.934 |
| Risque modere | 297 | 29.7% | 0.859 |
| Risque eleve | 254 | 25.4% | 0.911 |
Product impact: Drives risque_client.churn, risque_client.facteurs, and escalation actions. A "Risque eleve" triggers: immediate callback, commercial gesture, management alert. This is where revenue is protected. Also feeds the trust score via model_agreement (aligned urgence + churn = higher trust).
Real data potential: Ground truth = customers who actually left (6-12 month lookback). 600 labeled records minimum. The payoff: Risque modere (0.859 AUROC) would sharpen with subtle signals — order frequency decline, tone shift across multiple tickets, payment cycle changes.
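Deriving that ground truth from order history might look like this sketch. The inactivity thresholds are illustrative assumptions within the 6-12 month lookback, and `churn_label` is a hypothetical helper:

```python
def churn_label(days_since_last_order: int) -> str:
    """Map customer inactivity to the 3 churn classes.

    Cutoffs are assumptions: ~9 months of silence counts as
    effectively churned, 4-9 months as at-risk. Tune against
    the actual purchase cadence of your customer base.
    """
    if days_since_last_order > 270:
        return "Risque eleve"
    if days_since_last_order > 120:
        return "Risque modere"
    return "Pas de risque"

print(churn_label(300))  # Risque eleve
```

Enriching this with the subtler signals mentioned above (order-frequency decline, tone shift, payment-cycle changes) is what would sharpen the Risque modere boundary.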
The 4 models don't run in isolation — their outputs combine to produce the final response:
priorite is computed from urgence + churn predictions. Critique OR Risque eleve → P1. Haute OR Risque modere → P2. This means the urgency and churn datasets jointly determine priority. If both agree (trust_score.model_agreement = 100), the priority is confident.
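The stated rule can be written down directly. A sketch; the P3 fallback tier is an assumption, since only P1 and P2 are specified above:

```python
def compute_priorite(urgence: str, churn: str) -> str:
    """Combine urgency and churn predictions into a priority tier.

    Mirrors the rule above: Critique OR Risque eleve -> P1,
    Haute OR Risque modere -> P2. The P3 default is an assumption.
    """
    if urgence == "Critique" or churn == "Risque eleve":
        return "P1"
    if urgence == "Haute" or churn == "Risque modere":
        return "P2"
    return "P3"

print(compute_priorite("Normale", "Risque eleve"))  # P1
```

Note the OR semantics: a calm-sounding message from a high-churn-risk client still lands in P1, which is exactly why the churn dataset co-determines priority.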
Suggested actions depend on all three: "Rappel immediat par responsable logistique" uses the routing result. "< 2h" uses urgency. "Geste commercial" triggers on churn risk. Wrong routing → wrong department in the action. Wrong urgency → wrong timing.
The trust score's classification_confidence component averages the top probability across all 4 models. If routing is 99% confident but sub_routing is 50/50, confidence drops. model_agreement checks urgence vs churn alignment — datasets that produce contradictory signals lower trust.
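A sketch of those two components; the averaging follows the description above, but the set of urgency/churn label pairs that count as "aligned" is an assumption:

```python
def trust_components(top_probs, urgence_label, churn_label):
    """Sketch of classification_confidence and model_agreement.

    classification_confidence averages the top probability across
    the 4 models. model_agreement checks urgency vs churn alignment;
    the 'aligned' pairs and the 100/50 scoring are assumptions.
    """
    confidence = sum(top_probs) / len(top_probs)
    aligned = {
        ("Critique", "Risque eleve"),
        ("Haute", "Risque modere"),
        ("Normale", "Pas de risque"),
        ("Basse", "Pas de risque"),
    }
    agreement = 100 if (urgence_label, churn_label) in aligned else 50
    return confidence, agreement

# Routing is sure (0.99) but sub_routing is 50/50: confidence drops.
conf, agree = trust_components(
    [0.99, 0.95, 0.50, 0.91], "Critique", "Risque eleve"
)
```

The example mirrors the sub_routing situation described earlier: one weak model drags the averaged confidence down even when the other three agree strongly.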
The audit trail cites each model's decision with the features that drove it. routing_audit references intention + keywords (from extraction). churn_audit references menace + frustration. Better datasets → more precise audits → more trust from the human operator.
| Priority | Dataset | Why | Minimum | Impact |
|---|---|---|---|---|
| 1 | sub_routing | 54% accuracy is advisory, not actionable. Real lexical distinctions ("casse" vs "retard") would unlock 80%+. | 1500 tickets | AUROC 0.85 → 0.92+ |
| 2 | churn | Risque modere (0.859 AUROC) is the weak link. Real churn labels (customers who actually left) would sharpen the boundary. | 600 tickets | AUROC 0.90 → 0.94+ |
| 3 | urgency | Haute/Normale boundary is noisy. Real SLA data would clarify. | 800 tickets | AUROC 0.98 → 0.98 (calibration) |
| 4 | routing | Already 0.990. Real data for edge cases (multi-dept requests) only. | 500 tickets | AUROC 0.99 → 0.99 (marginal) |
All 4 datasets share the same 12 features. Only the label column changes.
```jsonc
{
  "intention_primaire": "reclamation_livraison",
  "intention_secondaire": "menace_depart",
  "ref_commande": "CMD-2026-1847",
  "produit_mentionne": "feuillete 44.2",
  "date_livraison_prevue": "2026-04-03",
  "retard_jours": 7,
  "frustration": 0.92,
  "urgence_percue": "critique",
  "menace_churn": 1,
  "escalade": "troisieme contact",
  "mots_cles": "attends bloque poseurs concurrent troisieme",
  "canal": "portail",
  "label": "Logistique"  // routing dataset
}
```

The same record carries `"label": "Critique"` in the urgency dataset, `"Retard livraison"` in sub_routing, and `"Risque eleve"` in churn.
To add real data: take a resolved ticket, run it through extraction (regex or Haiku), attach the label from your ticket system, append the record to the NDJSON, and retrain with `python3 -m requestclassifier.train_models`.
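The append step might look like the sketch below; `append_labeled_ticket` is a hypothetical helper, with the field layout taken from the example record above:

```python
import json

def append_labeled_ticket(path: str, features: dict, label: str) -> None:
    """Append one extracted ticket plus its ground-truth label to an
    NDJSON training file (one JSON object per line).

    `features` is the 12-field dict produced by extraction; `label`
    comes from the ticket system. Helper name and signature are
    assumptions, not part of the actual codebase.
    """
    record = {**features, "label": label}
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps any accented French text readable.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Since all four datasets share the same features, the same extracted dict can be appended to each file with its dataset-specific label.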