Concierge: SAT-Based Customer Request Routing

Charles Dana · Monce SAS · April 2026
Snake SAT Classifier v5.2.1 · 4 models · 10-round training pipeline

1. The Dana Theorem

Theorem. Any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. Decision tree bucketing reduces this to linear time.

The argument is constructive. For each non-member of the target class, there exists a boolean literal (a feature test) that separates it from at least one member. Clauses (OR of literals) cover groups of non-members. The conjunction (AND of clauses) forms a CNF formula that is satisfied exactly by members of the target class.

Snake builds this formula directly from the data. No backtracking, no exponential search. The construction is O(|Fs|^2 * m) per class per bucket, where |Fs| is the number of non-member samples and m is the number of features. With bucketing the total is O(L * n * b * m), where L is the layer count and b the bucket size: linear in the number of samples n.
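The constructive argument can be sketched in a few lines of Python. The `(feature_index, value)` literal encoding and the function names below are illustrative, not Snake's internals:

```python
def build_cnf(non_members, num_features):
    """One clause per non-member x: OR over i of the literal feature_i != x_i.

    A sample falsifies that clause only if it equals x on every feature,
    so the conjunction of all clauses (the CNF) is satisfied by exactly
    the samples that coincide with no non-member."""
    return [[(i, x[i]) for i in range(num_features)] for x in non_members]

def satisfies(sample, cnf):
    # literal (i, v) reads "sample[i] != v"
    return all(any(sample[i] != v for i, v in clause) for clause in cnf)
```

Members of the target class satisfy every clause, while each non-member falsifies the clause built from it, which is the constructive core of the theorem.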

2. Why 4 Independent Models

The Concierge classifies along 4 orthogonal axes:

Model             | Classes                                                      | AUROC | Signal
------------------|--------------------------------------------------------------|-------|-------
routing_model     | 5 (Logistique, Commercial, Technique, Comptabilite, SAV)     | 0.990 | intention_primaire, mots_cles
urgency_model     | 4 (Critique, Haute, Normale, Basse)                          | 0.979 | frustration, menace_churn, retard_jours, escalade
sub_routing_model | 6 (Retard livraison, Casse, Erreur, Facturation, Stock, Autre) | 0.853 | intention + service context
churn_model       | 3 (Risque eleve, Risque modere, Pas de risque)               | 0.901 | menace_churn, frustration, escalade

A single 18-class model (5 * 4 * ... combinations) would require exponentially more data to achieve the same discrimination. Factoring into independent models lets each one focus on its own feature subspace. Routing uses text features (intention, keywords). Urgency uses numeric/boolean signals (frustration, escalade). Churn combines both.
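The arithmetic behind the factoring argument, as a quick check (class counts taken from the model table above):

```python
axes = {"routing": 5, "urgency": 4, "sub_routing": 6, "churn": 3}

# A joint model must discriminate every combination of the four labels,
# while four factored models only separate classes axis by axis.
joint = 1
for k in axes.values():
    joint *= k            # 5 * 4 * 6 * 3 = 360 label combinations
factored = sum(axes.values())  # 5 + 4 + 6 + 3 = 18 classes in total

print(joint, factored)  # 360 18
```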

3. The Learning Curve — 10 Rounds

Each round: generate synthetic data, save as NDJSON, train Snake, evaluate AUROC, adjust hyperparameters.
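The round loop can be sketched as a driver that takes the data generator, trainer, and evaluator as callables. These are stand-ins for project code; none of the names are Snake's API:

```python
import json

def run_round(generate, train, evaluate, n_samples, layers, bucket, noise,
              profile, path="round.ndjson"):
    """One training round: synthesize data, persist as NDJSON, train, score."""
    # 1. generate synthetic data and save as NDJSON (one JSON object per line)
    samples = generate(n_samples, noise)
    with open(path, "w") as f:
        for s in samples:
            f.write(json.dumps(s) + "\n")
    # 2. train the four models on the same file
    models = {name: train(path, name, layers, bucket, profile)
              for name in ("routing", "urgency", "sub_routing", "churn")}
    # 3. evaluate and return macro AUROC per model
    return {name: evaluate(model, path) for name, model in models.items()}
```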

Round | Data | Layers | Bucket | Noise | Profile    | Routing | Urgency | SubCat | Churn
------|------|--------|--------|-------|------------|---------|---------|--------|------
1     | 300  | 5      | 50     | 0.25  | auto       | 0.984   | 1.000   | 0.710  | 0.772
2     | 500  | 5      | 50     | 0.25  | auto       | 0.993   | 1.000   | 0.818  | 0.831
3     | 500  | 10     | 50     | 0.25  | auto       | 1.000   | 1.000   | 0.825  | 0.901
4     | 500  | 10     | 50     | 0.25  | industrial | 1.000   | 0.982   | 0.826  | 0.862
5     | 600  | 10     | 100    | 0.30  | industrial | 0.986   | 0.963   | 0.798  | 0.815
6     | 600  | 15     | 100    | 0.30  | industrial | 0.994   | 0.956   | 0.839  | 0.862
7     | 700  | 15     | 80     | 0.50  | industrial | 0.991   | 0.974   | 0.820  | 0.862
8     | 800  | 15     | 80     | 0.50  | industrial | 0.994   | 0.971   | 0.847  | 0.879
9     | 800  | 20     | 80     | 0.50  | balanced   | 0.983   | 0.989   | 0.834  | 0.863
10    | 1000 | 20     | 100    | 0.40  | industrial | 0.996   | 0.980   | 0.845  | 0.889
FINAL | 1300 | 20     | 100    | 0.40  | industrial | 0.990   | 0.979   | 0.853  | 0.901

Key findings

Industrial profile dominates. Switching from auto to industrial at round 4 pushed sub_routing from 0.825 to 0.826 AUROC — modest, but the real gain was stability. Industrial is designed for mixed tabular+text features, which is exactly what the extraction schema produces. Round 9 tested balanced — it regressed on routing (0.983 vs 0.996) and gained nothing on churn.

Data volume matters most for churn. Churn AUROC jumped from 0.772 (300 samples) to 0.901 (1300 samples). The three-class problem (eleve/modere/pas de risque) needs enough examples of the middle class to learn the boundary. Risque modere is inherently ambiguous — it's the transition zone between the two extremes.

Layers matter for urgency. Urgency was perfect (1.000) with the auto profile but dropped once industrial was adopted (0.982 at round 4, as low as 0.956 at round 6). Raising the layer count to 20 recovered it to ~0.980. The industrial profile's literal type weights are optimized for text features; urgency relies on numeric features (frustration, retard_jours), where auto's defaults are adequate.

4. The Sub-Routing Challenge

Sub-routing has 6 classes including an "Autre" catch-all. AUROC is 0.853 despite only 54% accuracy. This is not a contradiction:

Accuracy penalizes the catch-all. "Autre" absorbs everything that doesn't fit the 5 specific categories. The model correctly ranks "Retard livraison" above "Autre" (per-class AUROC 0.964 vs 0.700), but when forced to make a hard prediction, borderline cases get assigned to Autre and are counted as errors.

AUROC shows the model ranks correctly. A macro AUROC of 0.853 means: averaged over classes, given a random member of a class and a random non-member, the model assigns the higher probability to the member 85.3% of the time (96.4% for Retard livraison specifically). The ranking is sound; the issue is threshold calibration.
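The accuracy/AUROC distinction can be made concrete with a minimal one-vs-rest AUROC in pure Python (a sketch, not Snake's evaluator):

```python
def ovr_auroc(scores, labels, positive):
    """One-vs-rest AUROC for one class: the probability that a random
    positive sample outranks a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(scores_per_class, labels):
    """Unweighted mean of the per-class one-vs-rest AUROCs."""
    classes = sorted(set(labels))
    return sum(ovr_auroc(scores_per_class[c], labels, c)
               for c in classes) / len(classes)
```

A model can score a high AUROC while its argmax predictions land in the catch-all, which is exactly the sub-routing situation described above.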

What would fix it: Real labeled data. The synthetic generator produces sub-categories by random sampling conditioned on service. Real customer messages have subtle lexical patterns — "livraison" vs "expedition", "casse" vs "fissure" — that would give Snake much stronger literals to work with.

5. Churn as a SAT Problem

Churn detection maps naturally to boolean satisfiability. The features are:

Feature      | Type        | Snake literal
-------------|-------------|------------------------------------------
menace_churn | boolean     | menace_churn = 1 (exact match)
frustration  | float [0,1] | frustration > 0.7 (threshold)
escalade     | categorical | escalade CONTAINS "troisieme" (substring)
retard_jours | integer     | retard_jours > 5 (threshold)

Snake discovers the rule:

menace_churn = 1 AND retard_jours > 5 AND frustration > 0.7  →  Risque eleve

Each of these literals is a unit clause; their conjunction is a CNF formula (a clause proper is an OR of literals) that excludes non-members of the "Risque eleve" class. Snake finds this in O(n * b * m) time: no gradient descent, no loss function, just constructive SAT.
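The literal types from the feature table and the discovered rule can be sketched together; the `(feature, op, value)` tuple encoding and the helper names are illustrative, not Snake's internal representation:

```python
def eval_literal(record, literal):
    """One Snake-style literal as a plain boolean test.
    literal = (feature, op, value); the ops mirror the feature table."""
    feature, op, ref = literal
    v = record.get(feature)
    if op == "eq":        # exact match (boolean / categorical)
        return v == ref
    if op == "gt":        # numeric threshold
        return v is not None and v > ref
    if op == "contains":  # substring match on text
        return isinstance(v, str) and ref in v
    raise ValueError(f"unknown op: {op}")

# The discovered rule: a conjunction of three literals.
RISQUE_ELEVE = [
    ("menace_churn", "eq", 1),
    ("retard_jours", "gt", 5),
    ("frustration", "gt", 0.7),
]

def matches(record, rule):
    return all(eval_literal(record, lit) for lit in rule)
```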

The audit trail is human-readable: "Risque eleve because menace_churn AND retard 7 jours AND frustration 0.92". An admin can read it, understand it, and override it.

6. Architecture

Client message (portal / email / transcribed phone call)
  |
  v
Claude Haiku extraction (680ms)
  intention_primaire, intention_secondaire
  entites: ref_commande, produit, retard_jours
  contexte_emotionnel: frustration, urgence_percue, menace_churn, escalade
  mots_cles, action_attendue
  |
  v
Snake SAT classification (<1ms per model)
  routing_model      -> target service (5 classes)
  urgency_model      -> urgency (4 classes)
  sub_routing_model  -> sub-category (6 classes)
  churn_model        -> churn risk (3 classes)
  |
  v
JSON response
  routage + priorite + risque_client + actions_suggerees + xai audit

The extraction step converts unstructured text into a feature vector. Two modes are available:

Full mode ("anthropic": true): Claude Haiku reads the message with semantic understanding. Quality 0.90, latency ~680ms. Understands nuance, sarcasm, implicit references.

Degraded mode ("anthropic": false, default): Regex pattern matching extracts explicit signals — CMD refs, product names, frustration keywords, churn signals. Quality 0.55, latency <5ms. No external dependency. Works without API key.
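A degraded-mode extractor of this shape can be sketched with the stdlib `re` module. The patterns, keyword lists, and scoring below are illustrative assumptions, not the production regexes:

```python
import re

# Illustrative patterns only; the production extractor's regexes differ.
CMD_REF = re.compile(r"\bCMD-?\d{4,}\b", re.IGNORECASE)
FRUSTRATION_WORDS = ("inacceptable", "scandaleux", "furieux", "inadmissible")
CHURN_WORDS = ("resilier", "concurrent", "annuler")

def extract_degraded(text):
    """Regex-only feature extraction: explicit signals, no semantics."""
    low = text.lower()
    ref = CMD_REF.search(text)
    delay = re.search(r"(\d+)\s*jours?", low)
    return {
        "ref_commande": ref.group(0) if ref else None,
        "retard_jours": int(delay.group(1)) if delay else 0,
        "frustration": min(1.0, 0.3 * sum(w in low for w in FRUSTRATION_WORDS)),
        "menace_churn": int(any(w in low for w in CHURN_WORDS)),
    }
```

Sarcasm and implicit references fall through silently, which is precisely the quality gap (0.55 vs 0.90) between the two modes.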

Snake classification is identical in both modes — same models, same AUROC. The quality difference is in extraction fidelity, not classification accuracy. See /genesis for full documentation.

7. API Endpoints

Endpoint         | Method | Description
-----------------|--------|------------
/classify        | POST   | Full classification with client_id, historique_tickets, structured input
/comprendre      | POST   | Casual endpoint: text + factory_id + anthropic flag. Default: regex extraction + Snake SAT (no LLM). Set "anthropic": true for Claude Haiku extraction.
/genesis         | GET    | Full /comprendre documentation: input/output schemas, modes, setup
/health          | GET    | Health check
/                | GET    | Landing page with live demo
/paper           | GET    | This document
/businesssummary | GET    | Non-technical pitch
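A call to /comprendre can be sketched with the stdlib. The base URL, the factory_id value, and the exact JSON key spellings are assumptions (the endpoint table only names the fields):

```python
import json
import urllib.request

def comprendre_request(text, factory_id, anthropic=False,
                       base_url="http://localhost:8000"):
    """Build (without sending) an HTTP request for POST /comprendre."""
    body = json.dumps({
        "text": text,
        "factory_id": factory_id,
        "anthropic": anthropic,   # False = regex + Snake SAT, no LLM
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/comprendre", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
```

Sending the request with `urllib.request.urlopen` returns the JSON response described in section 6 (routage, priorite, risque_client, actions_suggerees, xai audit).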