Concierge: SAT-Based Customer Request Routing

Charles Dana · Monce SAS · April 2026
Snake SAT Classifier v5.2.1 · 4 models · 10-round training pipeline

1. The Dana Theorem

Theorem. Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. Decision tree bucketing reduces this to linear time.

The argument is constructive. For each non-member of the target class, there exists a boolean literal (a feature test) that separates it from at least one member. Clauses (OR of literals) cover groups of non-members. The conjunction (AND of clauses) forms a CNF formula that is satisfied exactly by members of the target class.

Snake builds this formula directly from the data. No backtracking, no exponential search. The construction is O(|Fs|^2 * m) per class per bucket, where |Fs| is the number of non-member samples and m is the number of features. With bucketing the total is O(L * n * b * m), where L is the layer count and b the bucket size: linear in the number of samples n.
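The constructive argument can be sketched in a few lines of Python. The `(feature_index, value)` literal encoding and the function names below are illustrative, not Snake's internals:

```python
def build_cnf(non_members, num_features):
    """One clause per non-member x: OR over i of the literal feature_i != x_i.

    A sample falsifies that clause only if it equals x on every feature,
    so the conjunction of all clauses (the CNF) is satisfied by exactly
    the samples that coincide with no non-member."""
    return [[(i, x[i]) for i in range(num_features)] for x in non_members]

def satisfies(sample, cnf):
    # literal (i, v) reads "sample[i] != v"
    return all(any(sample[i] != v for i, v in clause) for clause in cnf)
```

Members of the target class satisfy every clause, while each non-member falsifies the clause built from it, which is the constructive core of the theorem.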

2. Why 4 Independent Models

The Concierge classifies along 4 orthogonal axes:

Model             | Classes                                                      | AUROC | Signal
------------------|--------------------------------------------------------------|-------|-------
routing_model     | 5 (Logistique, Commercial, Technique, Comptabilite, SAV)     | 0.990 | intention_primaire, mots_cles
urgency_model     | 4 (Critique, Haute, Normale, Basse)                          | 0.979 | frustration, menace_churn, retard_jours, escalade
sub_routing_model | 6 (Retard livraison, Casse, Erreur, Facturation, Stock, Autre) | 0.853 | intention + service context
churn_model       | 3 (Risque eleve, Risque modere, Pas de risque)               | 0.901 | menace_churn, frustration, escalade

A single 18-class model (5 * 4 * ... combinations) would require exponentially more data to achieve the same discrimination. Factoring into independent models lets each one focus on its own feature subspace. Routing uses text features (intention, keywords). Urgency uses numeric/boolean signals (frustration, escalade). Churn combines both.
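The arithmetic behind the factoring argument, as a quick check (class counts taken from the model table above):

```python
axes = {"routing": 5, "urgency": 4, "sub_routing": 6, "churn": 3}

# A joint model must discriminate every combination of the four labels,
# while four factored models only separate classes axis by axis.
joint = 1
for k in axes.values():
    joint *= k            # 5 * 4 * 6 * 3 = 360 label combinations
factored = sum(axes.values())  # 5 + 4 + 6 + 3 = 18 classes in total

print(joint, factored)  # 360 18
```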

3. The Learning Curve — 10 Rounds

Each round: generate synthetic data, save as NDJSON, train Snake, evaluate AUROC, adjust hyperparameters.
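The round loop can be sketched as a driver that takes the data generator, trainer, and evaluator as callables. These are stand-ins for project code; none of the names are Snake's API:

```python
import json

def run_round(generate, train, evaluate, n_samples, layers, bucket, noise,
              profile, path="round.ndjson"):
    """One training round: synthesize data, persist as NDJSON, train, score."""
    # 1. generate synthetic data and save as NDJSON (one JSON object per line)
    samples = generate(n_samples, noise)
    with open(path, "w") as f:
        for s in samples:
            f.write(json.dumps(s) + "\n")
    # 2. train the four models on the same file
    models = {name: train(path, name, layers, bucket, profile)
              for name in ("routing", "urgency", "sub_routing", "churn")}
    # 3. evaluate and return macro AUROC per model
    return {name: evaluate(model, path) for name, model in models.items()}
```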

Round | Data | Layers | Bucket | Noise | Profile    | Routing | Urgency | SubCat | Churn
------|------|--------|--------|-------|------------|---------|---------|--------|------
1     | 300  | 5      | 50     | 0.25  | auto       | 0.984   | 1.000   | 0.710  | 0.772
2     | 500  | 5      | 50     | 0.25  | auto       | 0.993   | 1.000   | 0.818  | 0.831
3     | 500  | 10     | 50     | 0.25  | auto       | 1.000   | 1.000   | 0.825  | 0.901
4     | 500  | 10     | 50     | 0.25  | industrial | 1.000   | 0.982   | 0.826  | 0.862
5     | 600  | 10     | 100    | 0.30  | industrial | 0.986   | 0.963   | 0.798  | 0.815
6     | 600  | 15     | 100    | 0.30  | industrial | 0.994   | 0.956   | 0.839  | 0.862
7     | 700  | 15     | 80     | 0.50  | industrial | 0.991   | 0.974   | 0.820  | 0.862
8     | 800  | 15     | 80     | 0.50  | industrial | 0.994   | 0.971   | 0.847  | 0.879
9     | 800  | 20     | 80     | 0.50  | balanced   | 0.983   | 0.989   | 0.834  | 0.863
10    | 1000 | 20     | 100    | 0.40  | industrial | 0.996   | 0.980   | 0.845  | 0.889
FINAL | 1300 | 20     | 100    | 0.40  | industrial | 0.990   | 0.979   | 0.853  | 0.901

Key findings

Industrial profile dominates. Switching from auto to industrial at round 4 pushed sub_routing from 0.825 to 0.826 AUROC — modest, but the real gain was stability. Industrial is designed for mixed tabular+text features, which is exactly what the extraction schema produces. Round 9 tested balanced — it regressed on routing (0.983 vs 0.996) and gained nothing on churn.

Data volume matters most for churn. Churn AUROC jumped from 0.772 (300 samples) to 0.901 (1300 samples). The three-class problem (eleve/modere/pas de risque) needs enough examples of the middle class to learn the boundary. Risque modere is inherently ambiguous — it's the transition zone between the two extremes.

Layers matter for urgency. Urgency was perfect (1.000) with the auto profile but dropped once industrial was adopted (0.982 at round 4, as low as 0.956 at round 6). Raising the layer count to 20 recovered it to ~0.980. The industrial profile's literal type weights are optimized for text features; urgency relies on numeric features (frustration, retard_jours), where auto's defaults are adequate.

4. The Sub-Routing Challenge

Sub-routing has 6 classes including an "Autre" catch-all. AUROC is 0.853 despite only 54% accuracy. This is not a contradiction:

Accuracy penalizes the catch-all. "Autre" absorbs everything that doesn't fit the 5 specific categories. The model correctly ranks "Retard livraison" above "Autre" (per-class AUROC 0.964 vs 0.700), but when forced to make a hard prediction, borderline cases get assigned to Autre and are counted as errors.

AUROC shows the model ranks correctly. A macro AUROC of 0.853 means: averaged over classes, given a random member of a class and a random non-member, the model assigns the higher probability to the member 85.3% of the time (96.4% for Retard livraison specifically). The ranking is sound; the issue is threshold calibration.
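The accuracy/AUROC distinction can be made concrete with a minimal one-vs-rest AUROC in pure Python (a sketch, not Snake's evaluator):

```python
def ovr_auroc(scores, labels, positive):
    """One-vs-rest AUROC for one class: the probability that a random
    positive sample outranks a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(scores_per_class, labels):
    """Unweighted mean of the per-class one-vs-rest AUROCs."""
    classes = sorted(set(labels))
    return sum(ovr_auroc(scores_per_class[c], labels, c)
               for c in classes) / len(classes)
```

A model can score a high AUROC while its argmax predictions land in the catch-all, which is exactly the sub-routing situation described above.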

What would fix it: Real labeled data. The synthetic generator produces sub-categories by random sampling conditioned on service. Real customer messages have subtle lexical patterns — "livraison" vs "expedition", "casse" vs "fissure" — that would give Snake much stronger literals to work with.

5. Churn as a SAT Problem

Churn detection maps naturally to boolean satisfiability. The features are:

Feature      | Type        | Snake literal
-------------|-------------|------------------------------------------
menace_churn | boolean     | menace_churn = 1 (exact match)
frustration  | float [0,1] | frustration > 0.7 (threshold)
escalade     | categorical | escalade CONTAINS "troisieme" (substring)
retard_jours | integer     | retard_jours > 5 (threshold)

Snake discovers the rule:

menace_churn = 1 AND retard_jours > 5 AND frustration > 0.7  →  Risque eleve

Each of these literals is a unit clause; their conjunction is a CNF formula (a clause proper is an OR of literals) that excludes non-members of the "Risque eleve" class. Snake finds this in O(n * b * m) time: no gradient descent, no loss function, just constructive SAT.
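The literal types from the feature table and the discovered rule can be sketched together; the `(feature, op, value)` tuple encoding and the helper names are illustrative, not Snake's internal representation:

```python
def eval_literal(record, literal):
    """One Snake-style literal as a plain boolean test.
    literal = (feature, op, value); the ops mirror the feature table."""
    feature, op, ref = literal
    v = record.get(feature)
    if op == "eq":        # exact match (boolean / categorical)
        return v == ref
    if op == "gt":        # numeric threshold
        return v is not None and v > ref
    if op == "contains":  # substring match on text
        return isinstance(v, str) and ref in v
    raise ValueError(f"unknown op: {op}")

# The discovered rule: a conjunction of three literals.
RISQUE_ELEVE = [
    ("menace_churn", "eq", 1),
    ("retard_jours", "gt", 5),
    ("frustration", "gt", 0.7),
]

def matches(record, rule):
    return all(eval_literal(record, lit) for lit in rule)
```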

The audit trail is human-readable: "Risque eleve because menace_churn AND retard 7 jours AND frustration 0.92". An admin can read it, understand it, and override it.

6. Architecture

Client message (portal / email / transcribed phone call)
  |
  v
Claude Haiku extraction (680ms)
  intention_primaire, intention_secondaire
  entites: ref_commande, produit, retard_jours
  contexte_emotionnel: frustration, urgence_percue, menace_churn, escalade
  mots_cles, action_attendue
  |
  v
Snake SAT classification (<1ms per model)
  routing_model      -> target service (5 classes)
  urgency_model      -> urgency (4 classes)
  sub_routing_model  -> sub-category (6 classes)
  churn_model        -> churn risk (3 classes)
  |
  v
JSON response
  routage + priorite + risque_client + actions_suggerees + xai audit

The extraction step converts unstructured text into a feature vector. Two modes are available:

Full mode ("anthropic": true): Claude Haiku reads the message with semantic understanding. Quality 0.90, latency ~680ms. Understands nuance, sarcasm, implicit references.

Degraded mode ("anthropic": false, default): Regex pattern matching extracts explicit signals — CMD refs, product names, frustration keywords, churn signals. Quality 0.55, latency <5ms. No external dependency. Works without API key.
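A degraded-mode extractor of this shape can be sketched with the stdlib `re` module. The patterns, keyword lists, and scoring below are illustrative assumptions, not the production regexes:

```python
import re

# Illustrative patterns only; the production extractor's regexes differ.
CMD_REF = re.compile(r"\bCMD-?\d{4,}\b", re.IGNORECASE)
FRUSTRATION_WORDS = ("inacceptable", "scandaleux", "furieux", "inadmissible")
CHURN_WORDS = ("resilier", "concurrent", "annuler")

def extract_degraded(text):
    """Regex-only feature extraction: explicit signals, no semantics."""
    low = text.lower()
    ref = CMD_REF.search(text)
    delay = re.search(r"(\d+)\s*jours?", low)
    return {
        "ref_commande": ref.group(0) if ref else None,
        "retard_jours": int(delay.group(1)) if delay else 0,
        "frustration": min(1.0, 0.3 * sum(w in low for w in FRUSTRATION_WORDS)),
        "menace_churn": int(any(w in low for w in CHURN_WORDS)),
    }
```

Sarcasm and implicit references fall through silently, which is precisely the quality gap (0.55 vs 0.90) between the two modes.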

Snake classification is identical in both modes — same models, same AUROC. The quality difference is in extraction fidelity, not classification accuracy. See /genesis for full documentation.

7. API Endpoints

Endpoint         | Method | Description
-----------------|--------|------------
/classify        | POST   | Full classification with client_id, historique_tickets, structured input
/comprendre      | POST   | Casual endpoint: text + factory_id + anthropic flag. Default: regex extraction + Snake SAT (no LLM). Set "anthropic": true for Claude Haiku extraction.
/genesis         | GET    | Full /comprendre documentation: input/output schemas, modes, setup
/health          | GET    | Health check
/                | GET    | Landing page with live demo
/paper           | GET    | This document
/businesssummary | GET    | Non-technical pitch
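A call to /comprendre can be sketched with the stdlib. The base URL, the factory_id value, and the exact JSON key spellings are assumptions (the endpoint table only names the fields):

```python
import json
import urllib.request

def comprendre_request(text, factory_id, anthropic=False,
                       base_url="http://localhost:8000"):
    """Build (without sending) an HTTP request for POST /comprendre."""
    body = json.dumps({
        "text": text,
        "factory_id": factory_id,
        "anthropic": anthropic,   # False = regex + Snake SAT, no LLM
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/comprendre", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
```

Sending the request with `urllib.request.urlopen` returns the JSON response described in section 6 (routage, priorite, risque_client, actions_suggerees, xai audit).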