The argument is constructive. For each non-member of the target class, there exists a boolean literal (a feature test) that separates it from at least one member. Clauses (OR of literals) cover groups of non-members. The conjunction (AND of clauses) forms a CNF formula that is satisfied exactly by members of the target class.
Snake builds this formula directly from the data. No backtracking, no exponential search. The construction is O(|Fs|^2 * m) per class per bucket, where |Fs| is the number of non-members and m is the number of features. With bucketing it is O(L * n * b * m) total (L layers, bucket size b): linear in n.
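The constructive argument translates almost directly into code. The sketch below is illustrative only: `build_cnf` and the literal representation are assumptions, not Snake's actual API.

```python
from typing import Callable

# A literal is a named boolean feature test over an extracted record.
Literal = tuple[str, Callable[[dict], bool]]

def build_cnf(members: list[dict], non_members: list[dict],
              literals: list[Literal]) -> list[list[str]]:
    """Constructively build a CNF formula satisfied by every member.

    For each non-member, gather the literals that every member satisfies
    but the non-member violates; they form one clause (an OR) excluding
    that non-member. The AND of all clauses is the class formula.
    Cost is O(|non_members| * |literals| * |members|) -- no search.
    """
    cnf = []
    for neg in non_members:
        clause = [name for name, test in literals
                  if not test(neg) and all(test(pos) for pos in members)]
        if clause:  # separable: at least one literal excludes this non-member
            cnf.append(clause)
    return cnf
```

Every member satisfies each clause by construction, and each non-member falsifies its own clause, so the formula is satisfied exactly by members.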
The Concierge classifies along 4 orthogonal axes:
| Model | Classes | AUROC | Signal |
|---|---|---|---|
routing_model | 5 (Logistique, Commercial, Technique, Comptabilite, SAV) | 0.990 | intention_primaire, mots_cles |
urgency_model | 4 (Critique, Haute, Normale, Basse) | 0.979 | frustration, menace_churn, retard_jours, escalade |
sub_routing_model | 6 (Retard livraison, Casse, Erreur, Facturation, Stock, Autre) | 0.853 | intention + service context |
churn_model | 3 (Risque eleve, Risque modere, Pas de risque) | 0.901 | menace_churn, frustration, escalade |
A single joint model over all class combinations (5 * 4 * ...) would require exponentially more data to achieve the same discrimination. Factoring into independent models lets each one focus on its own feature subspace. Routing uses text features (intention, keywords). Urgency uses numeric/boolean signals (frustration, escalade). Churn combines both.
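At inference time the factoring is just four independent predictions over the same extracted features. A sketch (the model objects and their `predict` method are illustrative, not Snake's actual interface):

```python
# Each model sees the same feature record but keys on its own subspace.
def classify(features, routing, urgency, sub_routing, churn):
    """Run the four independent models and assemble the routing decision."""
    return {
        "service": routing.predict(features),          # text: intention, mots_cles
        "urgence": urgency.predict(features),          # numeric/boolean signals
        "sous_categorie": sub_routing.predict(features),
        "risque_churn": churn.predict(features),       # combines both
    }
```

Because the models are independent, each can be retrained or tuned without touching the others.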
Each round: generate synthetic data, save as NDJSON, train Snake, evaluate AUROC, adjust hyperparameters.
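The NDJSON persistence step is the only file format in the loop; a minimal writer/reader pair (assuming plain dict records) looks like:

```python
import json

def save_ndjson(records: list[dict], path: str) -> None:
    """Write one JSON object per line -- the NDJSON layout the trainer reads."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

def load_ndjson(path: str) -> list[dict]:
    """Read NDJSON back into a list of records, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```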
| Round | Data | Layers | Bucket | Noise | Profile | Routing | Urgency | SubCat | Churn |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 300 | 5 | 50 | 0.25 | auto | 0.984 | 1.000 | 0.710 | 0.772 |
| 2 | 500 | 5 | 50 | 0.25 | auto | 0.993 | 1.000 | 0.818 | 0.831 |
| 3 | 500 | 10 | 50 | 0.25 | auto | 1.000 | 1.000 | 0.825 | 0.901 |
| 4 | 500 | 10 | 50 | 0.25 | industrial | 1.000 | 0.982 | 0.826 | 0.862 |
| 5 | 600 | 10 | 100 | 0.30 | industrial | 0.986 | 0.963 | 0.798 | 0.815 |
| 6 | 600 | 15 | 100 | 0.30 | industrial | 0.994 | 0.956 | 0.839 | 0.862 |
| 7 | 700 | 15 | 80 | 0.50 | industrial | 0.991 | 0.974 | 0.820 | 0.862 |
| 8 | 800 | 15 | 80 | 0.50 | industrial | 0.994 | 0.971 | 0.847 | 0.879 |
| 9 | 800 | 20 | 80 | 0.50 | balanced | 0.983 | 0.989 | 0.834 | 0.863 |
| 10 | 1000 | 20 | 100 | 0.40 | industrial | 0.996 | 0.980 | 0.845 | 0.889 |
| FINAL | 1300 | 20 | 100 | 0.40 | industrial | 0.990 | 0.979 | 0.853 | 0.901 |
Industrial profile dominates. Switching from auto to industrial at round 4 pushed sub_routing from 0.825 to 0.826 AUROC — modest, but the real gain was stability. Industrial is designed for mixed tabular+text features, which is exactly what the extraction schema produces. Round 9 tested balanced — it regressed on routing (0.983 vs 0.996) and gained nothing on churn.
Data volume matters most for churn. Churn AUROC jumped from 0.772 (300 samples) to 0.901 (1300 samples). The three-class problem (eleve/modere/pas de risque) needs enough examples of the middle class to learn the boundary. Risque modere is inherently ambiguous — it's the transition zone between the two extremes.
Layers matter for urgency. Urgency was perfect (1.000) with the auto profile but dropped under industrial: 0.982 in round 4, as low as 0.956 in round 6. More layers (20) and more data recovered it to 0.979-0.980 in the final rounds. The industrial profile's literal type weights are optimized for text features; urgency relies on numeric features (frustration, retard_jours) where auto's defaults are adequate.
Sub-routing has 6 classes including an "Autre" catch-all. AUROC is 0.853 despite only 54% accuracy. This is not a contradiction:
Accuracy penalizes the catch-all. "Autre" absorbs everything that doesn't fit the 5 specific categories. The model correctly ranks "Retard livraison" above "Autre" (per-class AUROC 0.964 vs 0.700), but when forced to make a hard prediction, borderline cases get assigned to Autre and are counted as errors.
AUROC shows the model ranks correctly. The per-class AUROC of 0.964 for Retard livraison means: given a random Retard livraison case and a random non-Retard case, the model assigns higher probability to the correct one 96.4% of the time. The 0.853 macro AUROC averages this pairwise-ranking probability across all six classes. The ranking is sound; the issue is threshold calibration.
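That pairwise-ranking reading of AUROC is directly computable from the standard definition (ties counted as half a win):

```python
def pairwise_auroc(pos_scores, neg_scores):
    """AUROC = P(random positive outranks random negative); ties count 0.5."""
    pairs = len(pos_scores) * len(neg_scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / pairs
```

A macro AUROC is just this quantity computed one-vs-rest per class, then averaged, which is why a strong class (0.964) and a weak catch-all (0.700) can coexist under a 0.853 macro figure.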
What would fix it: Real labeled data. The synthetic generator produces sub-categories by random sampling conditioned on service. Real customer messages have subtle lexical patterns — "livraison" vs "expedition", "casse" vs "fissure" — that would give Snake much stronger literals to work with.
Churn detection maps naturally to boolean satisfiability. The features are:
| Feature | Type | Snake literal |
|---|---|---|
| menace_churn | boolean | menace_churn = 1 (exact match) |
| frustration | float [0,1] | frustration > 0.7 (threshold) |
| escalade | categorical | escalade CONTAINS "troisieme" (substring) |
| retard_jours | integer | retard_jours > 5 (threshold) |
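The three literal shapes in the table can be evaluated by a small dispatcher. This is a sketch: `eval_literal` and the kind names are illustrative, not Snake's actual API.

```python
def eval_literal(record: dict, kind: str, feature: str, arg) -> bool:
    """Evaluate one feature-test literal against an extracted record."""
    value = record.get(feature)
    if kind == "exact":       # e.g. menace_churn = 1
        return value == arg
    if kind == "threshold":   # e.g. frustration > 0.7, retard_jours > 5
        return value is not None and value > arg
    if kind == "contains":    # e.g. escalade CONTAINS "troisieme"
        return isinstance(value, str) and arg in value
    raise ValueError(f"unknown literal kind: {kind}")
```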
Snake discovers a rule along these lines (illustrative, assembled from the literals above): menace_churn = 1 AND frustration > 0.7 AND retard_jours > 5. This is a conjunction of literals, i.e. a CNF formula of unit clauses. Together these literals exclude non-members of the "Risque eleve" class, and Snake finds them in O(n * b * m) time: no gradient descent, no loss function, just constructive SAT.
The audit trail is human-readable: "Risque eleve because menace_churn AND retard 7 jours AND frustration 0.92". An admin can read it, understand it, and override it.
```
Message client (portail / email / telephone transcrit)
        |
        v
Claude Haiku extraction (680ms)
    intention_primaire, intention_secondaire
    entites: ref_commande, produit, retard_jours
    contexte_emotionnel: frustration, urgence_percue, menace_churn, escalade
    mots_cles, action_attendue
        |
        v
Snake SAT classification (<1ms per model)
    routing_model      -> service cible (5 classes)
    urgency_model      -> urgence (4 classes)
    sub_routing_model  -> sous-categorie (6 classes)
    churn_model        -> risque churn (3 classes)
        |
        v
JSON response
    routage + priorite + risque_client + actions_suggerees + xai audit
```
The extraction step converts unstructured text into a feature vector. Two modes are available:
Full mode ("anthropic": true): Claude Haiku reads the message with semantic understanding. Quality 0.90, latency ~680ms. Understands nuance, sarcasm, implicit references.
Degraded mode ("anthropic": false, default): Regex pattern matching extracts explicit signals — CMD refs, product names, frustration keywords, churn signals. Quality 0.55, latency <5ms. No external dependency. Works without API key.
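A degraded-mode extractor of this shape can be sketched with stdlib regex alone. The patterns and keyword lists below are assumptions for illustration, not the service's real ones.

```python
import re

CMD_RE = re.compile(r"CMD[-_ ]?\d+", re.IGNORECASE)            # explicit order refs
FRUSTRATION_WORDS = ("inacceptable", "scandaleux", "furieux")  # assumed keyword list
CHURN_WORDS = ("concurrent", "resilier")                       # assumed churn signals

def extract_degraded(text: str) -> dict:
    """Regex/keyword extraction: explicit signals only, no semantic understanding."""
    low = text.lower()
    m = CMD_RE.search(text)
    hits = sum(w in low for w in FRUSTRATION_WORDS)
    return {
        "ref_commande": m.group(0) if m else None,
        "frustration": min(1.0, 0.4 * hits),   # crude keyword-count score
        "menace_churn": int(any(w in low for w in CHURN_WORDS)),
    }
```

This is why the mode is fast and dependency-free but caps out at lower extraction fidelity: sarcasm and implicit references never match a pattern.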
Snake classification is identical in both modes — same models, same AUROC. The quality difference is in extraction fidelity, not classification accuracy. See /genesis for full documentation.
| Endpoint | Method | Description |
|---|---|---|
/classify | POST | Full classification with client_id, historique_tickets, structured input |
/comprendre | POST | Casual endpoint — text + factory_id + anthropic flag. Default: regex extraction + Snake SAT (no LLM). Set "anthropic": true for Claude Haiku extraction. |
/genesis | GET | Full /comprendre documentation — input/output schemas, modes, setup |
/health | GET | Health check |
/ | GET | Landing page with live demo |
/paper | GET | This document |
/businesssummary | GET | Non-technical pitch |