Uses the supplied account facts instead of guessing.
Prompt optimization before retraining
Walk through a small eval loop: define the baseline, score it against a rubric, try candidate rewrites, and keep the prompt that improves behavior. The model stays fixed throughout.
Account facts: analytics was enabled on May 6, 2026. It adds $49 per seat. The team has 4 seats. No duplicate invoice exists. Customer: My invoice doubled after we added the analytics add-on. What happened?
You are a helpful support assistant. Answer the customer's question clearly.
You are a helpful support assistant. Answer the customer's question clearly.
Score the behavior before changing the model.
Gives the user a concrete next step or decision.
Keeps the response concise, direct, and calm.
Avoids unsupported promises and escalates when needed.
The optimizer keeps the measured variant, not the fanciest one.
Scored output
Baseline / Billing questionYour invoice may have increased because the analytics add-on changed your plan. You can check billing settings or contact support for details.
Target behavior: Explain the add-on charge, show the seat math, and offer to remove or downgrade it.
Rubric-optimized wins this eval with 100 points.
The selected prompt scores 48. The optimizer would keep Rubric-optimized because it better matches the rubric on this eval case.
Analytics was enabled on May 6 and bills at $49 per seat. With 4 seats, that adds $196 to the invoice. I can help remove the add-on or downgrade it before the next billing cycle.
Retraining is a later lever, not the first lever.
If the eval shows that a prompt rewrite reliably fixes the failure, ship the prompt. Save retraining for gaps the prompt cannot express: missing skills, missing domain behavior, or failures that remain after retrieval, rubric design, and optimizer search are exhausted.