Controlled validation — not a general product release

SDVM surfaces silent degradation in long-horizon agentic workflows.

It is being developed to turn workflow traces into structured PRE/POST/DELTA diagnostic reports, helping teams understand where workflows drift, what evidence supports the finding, and what tuning step should be tested next.

SDVM is named for its four diagnostic dimensions: Synchrony, Depth, Vulnerability, and Metacognition.

  1. Traces: raw workflow evidence
  2. SDVM diagnostic layer: evidence organization
  3. PRE/POST/DELTA report: comparable diagnosis
  4. Intervention guidance: what to test next

Problem

In long-horizon agentic workflows, the most expensive failures are not the visible ones. They are the ones that look like normal operation.

Expected steps get skipped without triggering an error. Repairs accumulate across cycles without being recognized as a pattern. Handoffs introduce noise that compounds quietly. Outputs remain plausible while the workflow drifts further from the intended behavior — and the system continues to run, producing results that pass surface-level checks, until the degradation is significant enough to become undeniable.

By that point, you have already run the affected workflow many times. You have no structured record of when the drift began, what changed, or what intervention would address it.

Existing observability tools give you the raw evidence — traces, events, costs, errors. What they do not give you is an interpretation of how the workflow is degrading across cycles. That gap is what SDVM is designed to close.

What SDVM does

SDVM is being developed as a diagnostic and tuning layer for agentic workflows. It is designed to sit on top of existing observability surfaces and convert trace evidence into workflow diagnosis.

Its focus is not simply whether an agent produced a bad answer. The focus is how the workflow behaves across cycles: where it drifts, where it requires repair, where handoffs create friction, where expected steps are skipped, and where available evidence is still too weak for a strong conclusion.

Observability shows what happened. SDVM is designed to help interpret how the workflow is degrading.

Existing observability:

  • Captures traces and events
  • Shows latency, cost and errors
  • Monitors system behavior

SDVM diagnostic layer:

  • Interprets workflow degradation patterns
  • Shows drift, repairs, skips and hotspots
  • Guides workflow tuning decisions

How it works

SDVM takes a set of traces from the same recurring workflow — runs of the same task type across two observation windows — and applies its four diagnostic dimensions to produce a structured PRE/POST/DELTA comparison.

The output is not another trace view. It shows which signals moved between windows, what the magnitude of the shift was, how strong the evidence is for each finding, and which intervention lever is most likely to address the identified pattern.

Where evidence is insufficient for a strong conclusion, the report flags it explicitly. Interpretation limits are a first-class output, not a footnote.
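The windowing mechanics described above can be sketched in a few lines: aggregate per-run signals inside each observation window, compute the relative shift between windows, and flag the interpretation limit when either window is under-sampled. This is a minimal sketch under assumed conventions, not the SDVM implementation; the signal names and the `min_runs` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Run:
    """One execution of the recurring workflow (hypothetical shape)."""
    repairs: int
    skips: int

def window_signals(runs: List[Run]) -> dict:
    """Aggregate per-cycle signals over one observation window."""
    n = len(runs)
    return {
        "repair_pressure": sum(r.repairs for r in runs) / n,
        "skip_rate": sum(r.skips for r in runs) / n,
    }

def delta_report(pre: List[Run], post: List[Run], min_runs: int = 10) -> dict:
    """Compare two windows and mark when the evidence base is too thin."""
    pre_s, post_s = window_signals(pre), window_signals(post)
    findings = {}
    for key, base in pre_s.items():
        # Relative shift between windows; undefined when the baseline is zero
        change: Optional[float] = (post_s[key] - base) / base if base else None
        findings[key] = {"pre": base, "post": post_s[key], "delta": change}
    # Interpretation limit as a first-class output, not a footnote
    limit = "low" if min(len(pre), len(post)) >= min_runs else "high"
    return {"findings": findings, "interpretation_limit": limit}
```

With two runs per window averaging 3.0 repairs before and 1.5 after, the sketch reports a −50% shift in repair pressure and a "high" interpretation limit, because neither window reaches the sample threshold.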

Example diagnostic fragment

A typical SDVM output is a compact diagnostic view of how a workflow changed, what evidence supports the assessment, and what tuning decision should be tested next.

Synthetic example (not client data): PRE/POST/DELTA excerpt

  Signal                PRE                       POST                      DELTA
  Repair pressure       3.1 repairs / cycle       1.2 repairs / cycle       −61%
  Handoff noise         4 of 7 handoffs flagged   1 of 7 handoffs flagged   −75%
  Step skip rate        4 skips observed          1 skip observed           −75%
  Evidence strength     0.51                      0.74                      +0.23
  Interpretation limit  Medium                    Low-medium                Stronger, not definitive
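The DELTA column above appears to follow a simple relative-change rule for the rate signals, while evidence strength is reported as an absolute difference. A minimal sketch, assuming that convention:

```python
def delta_pct(pre: float, post: float) -> int:
    """Relative change from PRE to POST, rounded to a signed whole percent."""
    return round((post - pre) / pre * 100)

# Reproducing the rate rows of the excerpt above:
# repair pressure: delta_pct(3.1, 1.2) -> -61
# step skip rate:  delta_pct(4, 1)     -> -75
# Evidence strength uses an absolute difference instead: 0.74 - 0.51 = +0.23
```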

Recommended next intervention: "Tighten checkpoint summaries and handoff contracts on the flagged edges where repair pressure and handoff noise increased before expanding workflow scope."

SDVM recommendations are designed to focus tuning on the workflow edges where the diagnostic signals concentrate, rather than applying generic fixes to the entire workflow.

Example only. Values are illustrative; actual reports depend on trace quality, workflow structure and available evidence.

Pilot fit

Preferred early pilots are coding or bugfix-style workflows, but the fit is structural rather than domain-specific. Other agentic workflows may qualify if they have recurring task types, traceable multi-step execution, observable revisions, repairs or handoffs, and enough comparable runs for PRE/POST analysis.

  • One recurring workflow type
  • Traceable multi-step execution
  • Observable repairs, revisions or handoffs
  • Enough runs to compare workflow behavior before and after adjustments

What a pilot requires

SDVM pilots are intentionally narrow. The goal is to evaluate whether structured trace diagnosis can identify degradation patterns and guide one controlled workflow intervention.

  • One recurring agentic workflow, preferably coding or bugfix-style for the first validation track
  • Trace access through Langfuse, the current validation path
  • Enough comparable runs for baseline and follow-up analysis
  • Observable repairs, revisions, handoffs or skipped steps
  • One technical owner available to review findings and test an intervention
  • Willingness to share anonymized traces or metadata for analysis

What you receive

  • A structured PRE/POST/DELTA diagnostic report
  • A prioritized view of likely degradation patterns
  • Interpretation limits and evidence-strength boundaries
  • One recommended intervention path to test next
  • A follow-up comparison when enough post-intervention traces are available

Pilot process

  1. Scoping call — confirm workflow fit, trace availability and pilot boundaries
  2. Baseline diagnostic — analyze historical or initial traces and identify candidate degradation patterns
  3. Intervention and follow-up — test one controlled tuning step and compare workflow behavior after adjustment

Typical pilot shape: 2–4 weeks, depending on trace availability and team cadence.

Current validation scope

SDVM is designed to be workflow-engine agnostic. The first validation track is narrow by design: coding and bugfix-style workflows, trace-based diagnosis, and Langfuse-first ingestion. This track was chosen because it provides repeatable, traceable, multi-step agentic runs with clear intervention points.

The broader design is intended to remain surface-agnostic, with future compatibility paths that may include observability surfaces such as Phoenix/OpenInference and LangSmith.

The goal is to validate the diagnostic model on real or semi-real workflows, refine evidence thresholds, and test whether the reports help teams make better workflow tuning decisions.

Origin

SDVM is being developed by Ibrahim José Jamhour, an independent researcher working on Distributed Relational Cognition and the operational risks of agentic systems. The work builds on published research, a formal SDVM V3 technical specification and ongoing controlled validation on AI-assisted workflows.

Jamhour also brings prior executive experience in institutional finance and risk-sensitive operations, and participated in the Stanford Sloan Fellows program.

Pilot conversations

I am currently looking for a small number of controlled pilot conversations with teams running recurring, traceable agentic workflows, with priority for coding or bugfix-style cases in the first validation track. The best candidates have traceable multi-step runs, enough history for PRE/POST comparison, and a clear owner for workflow tuning decisions.

Data handling

Pilot analysis can be performed on anonymized traces and metadata. Data scope, access method and retention expectations are agreed case by case. NDA support is available when required.

If this matches how you work, you can start a scoped conversation using the email link below.

Discuss a private pilot

Pilot conversations are exploratory and scoped case by case.