This page explains Maestro's internal architecture, execution model, and design decisions. It is written for engineers who need to understand the system before trusting it — not for first-time users.
Architecture Overview
Maestro is a distributed, per-station system. Each test station runs a full, autonomous instance. There is no single coordinating server; the only shared infrastructure across stations is a PostgreSQL database used for persistent result storage.
┌────────────────────────────────────────────────────────────────────┐
│                            Client Layer                            │
│           Blazor UI (browser) · REST API consumers · MCP           │
└──────────────────────┬─────────────────────────────────────────────┘
                       │  SignalR (WebSocket) + REST/HTTP
                       ▼
┌────────────────────────────────────────────────────────────────────┐
│                API Layer (ASP.NET Core 10, Kestrel)                │
│                                                                    │
│   YAML Compiler  →  Workflow Executor  →  Measurement              │
│                           │               Evaluator                │
│   SignalR Hub  ←──────────┤                                        │
│   Runner Orchestrator ────┤                                        │
│   Variable Engine ────────┘                                        │
└──────┬──────────────────────┬────────────────────────┬─────────────┘
       │                      │                        │
       ▼                      ▼                        ▼
┌──────────────┐  ┌──────────────────────┐  ┌─────────────────────┐
│    Redis     │  │     gRPC Runners     │  │     PostgreSQL      │
│              │  │                      │  │                     │
│  Variables   │  │  .NET 10 runner      │  │  test_executions    │
│  Pub/Sub     │  │  .NET 4.8 runner     │  │  step_results       │
│  (ephemeral) │  │  Python runner       │  │  measurements       │
└──────────────┘  │  Mock runner         │  │  station_config     │
                  │  Delay runner        │  │  packages           │
                  └────────┬─────────────┘  └─────────────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │   Instruments & DUT    │
               │  VISA · serial · TCP   │
               └────────────────────────┘
The five layers
Client layer. The Blazor UI, any REST consumer (CI/CD, MES), and the MCP server all interact with Maestro through the same public interface — the API layer. There is no privileged internal client. The Blazor UI is a SignalR consumer, just like any other.
API layer. The central process on each station. It receives execution requests, compiles YAML, drives the workflow, orchestrates runner calls via gRPC, evaluates measurements, broadcasts events over SignalR, and persists results. It owns the execution state for the lifetime of a run.
Redis. The variable store and pub/sub bus. Variables created during a test live in Redis keyed by execution ID (realm:{guid}:{name}). Redis is ephemeral — it holds no permanent state. If Redis restarts during a test, the in-progress run is lost; completed records in PostgreSQL are unaffected.
gRPC runners. Separate processes (one per language/runtime) that execute test code and return results. They have no knowledge of the test sequence, no access to the database, and no shared state. They receive a single step's parameters and return output variables, measurements, and log entries. Each runner is independently deployable and versioned.
PostgreSQL. All permanent state. Results, measurements, configuration, and package metadata. The schema is owned by the API layer and migrated automatically on startup via EF Core.
Technology choices
| Decision | Choice | Rationale |
|---|---|---|
| API protocol | REST/HTTP | Firewall-friendly; Swagger UI works out of the box |
| Real-time channel | SignalR (WebSocket with HTTP fallback) | Built into ASP.NET Core; no additional broker |
| Runner IPC | gRPC (Protocol Buffers) | Strongly typed, cross-platform, bidirectional streaming, works across process and container boundaries |
| Variable store | Redis | Sub-millisecond, pub/sub, multi-reader/multi-writer, runs on ARM |
| Persistent store | PostgreSQL | JSONB for flexible metadata, relational tables for structured measurements, native statistical functions for SPC |
| Templating | Scriban | Safe, sandboxed, readable syntax |
| Preconditions | DynamicExpresso | C#-like boolean expressions without allowing arbitrary code |
Execution Model
Step-by-step: what happens when a test runs
- Request received. A POST /api/testexecution/run call arrives with a YAML file path, serial number, operator ID, and optional tags.
- YAML compiled. The YAML compiler loads the file, resolves any type: sequence includes, validates the schema, and produces an in-memory execution plan: an ordered list of steps with their parameters and control fields.
- Execution record created. The API inserts a test_execution row into PostgreSQL with status Running, the Git commit hash of the installed package, and a JSONB snapshot of the merged station configuration at this moment. The execution ID (GUID) is returned to the caller immediately.
- Variable initialisation. All variables declared in the YAML variables: block are written to Redis under the execution's realm key. Station config entries (cfg.*) and execution metadata (exec.*) are merged and also written to Redis.
- Step loop. Steps execute strictly sequentially. For each step:
  a. Precondition evaluated. If enabled: false, the step is immediately marked Skipped. If precondition: is set, the DynamicExpresso engine evaluates it against current variable values. If it evaluates to false, the step is Skipped. Skipped steps do not affect the execution verdict.
  b. StepStarted event broadcast. All connected SignalR clients receive the step name and index.
  c. gRPC ExecuteStep sent. The API sends the step's parameters (with all {{variable}} placeholders resolved from Redis) to the appropriate runner. The runner executes the test code synchronously and streams back a StepResult.
  d. Measurements evaluated. The API evaluates each measurement point returned by the runner against its limits from the YAML definition, not from the runner. The runner reports raw values only; verdict logic is always in the engine.
  e. Output variables written. Return values from the runner are written back to Redis under the execution realm. They are immediately available to subsequent steps, including steps in a different runner.
  f. Step result persisted. A step_result row is written to PostgreSQL with the verdict, duration, and all measurement points.
  g. StepCompleted event broadcast. All SignalR clients receive the verdict, duration, and measurement data.
  h. post_execution_action evaluated. If the step failed and post_execution_action: terminate-on-fail is set, the execution enters abort mode. Steps with run_on_abort: true still execute; others are skipped.
- Execution verdict determined. The worst step verdict propagates: any FAIL makes the execution FAIL; any UNDETERMINED with no fails makes it UNDETERMINED; all PASS makes it PASS. Skipped steps are not counted.
- Cleanup. The Redis realm is deleted. The test_execution row is updated with the final verdict, end time, and duration.
- ExecutionCompleted event broadcast.
- MES reporting. If a MES implementation is configured, the result is posted to the MES. If the post fails, the result is queued in the mes_result_queue table for automatic retry.
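The step loop, abort mode, and verdict propagation described above can be condensed into a short sketch. This is an illustration under assumptions, not Maestro's implementation: the `Step` dataclass, the `run_plan` function, and the plain callables standing in for gRPC runner calls are hypothetical names. Preconditions, events, and persistence are left out; only the skip, abort, and worst-verdict rules are modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    run: Callable[[], str]            # hypothetical stand-in for the runner call; returns a verdict
    enabled: bool = True
    run_on_abort: bool = False
    terminate_on_fail: bool = False   # models post_execution_action: terminate-on-fail


def run_plan(steps: List[Step]) -> Tuple[str, List[Tuple[str, str]]]:
    """Run steps in declaration order; return (execution verdict, per-step verdicts)."""
    results = []
    aborting = False
    for step in steps:
        # Disabled steps, and ordinary steps once abort mode is active, are skipped.
        if not step.enabled or (aborting and not step.run_on_abort):
            results.append((step.name, "SKIPPED"))
            continue
        verdict = step.run()
        results.append((step.name, verdict))
        if verdict == "FAIL" and step.terminate_on_fail:
            aborting = True

    # Worst verdict wins; skipped steps are not counted.
    counted = [v for _, v in results if v != "SKIPPED"]
    if "FAIL" in counted:
        overall = "FAIL"
    elif "UNDETERMINED" in counted:
        overall = "UNDETERMINED"
    else:
        overall = "PASS"
    return overall, results


plan = [
    Step("connect", lambda: "PASS"),
    Step("measure_vout", lambda: "FAIL", terminate_on_fail=True),
    Step("measure_iout", lambda: "PASS"),
    Step("disconnect", lambda: "PASS", run_on_abort=True),
]
overall, results = run_plan(plan)
```

In this plan the failed measurement switches the run into abort mode, the next measurement is skipped, and the teardown step still executes because it declares `run_on_abort`.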
Runner model: one method per step
The architectural contract between the YAML definition and runner code is one-to-one: each YAML step maps to exactly one method or function in runner code. This is a deliberate constraint, not a limitation.
When test logic is written as one large method — connect, measure, disconnect — the engine cannot distinguish between a measurement failure and an initialisation failure. Retry, skip, and run_on_abort have no granularity. The verdict is a single opaque pass/fail.
When each operation is its own method and its own step:
- A flaky instrument read can be retried with retry: without repeating the connect step
- A teardown step runs even after an abort via run_on_abort: true
- Each measurement has its own named step with its own verdict, searchable independently in the database
- A step can be disabled during development with enabled: false without modifying runner code
Variable system
Variables form the state shared between steps. They live in Redis for the duration of the execution.
| Namespace | Injected by | Available in templates | Available in precondition: |
|---|---|---|---|
| Bare | YAML declaration or runner output | ✅ | ✅ |
| cfg.* | Station config at test start | ✅ | ❌ |
| exec.* | Execution request (serial number, operator) | ✅ | ❌ |
| mes.* | MES routing response | ✅ | ❌ |
| | Repeat engine | ✅ | ✅ |
The cfg.*, exec.*, and mes.* namespaces cannot be used in precondition: expressions because DynamicExpresso interprets the dot as C# member access. To branch on a config value, copy it into a bare variable in a preceding step.
Cross-language variable passing is built into this model. A .NET step returns voltage and the engine writes it to Redis; the Python step that follows receives it through the {{voltage}} placeholder. No user code is required, because the engine resolves every step's parameters from the same Redis-backed variable store.
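Placeholder resolution can be pictured with a few lines of Python. This is a minimal sketch under assumptions: a plain dict stands in for the execution's Redis realm, a regular expression stands in for the Scriban templating engine, and the `resolve` function and the `cfg.psu_address` key are hypothetical names chosen for illustration.

```python
import re
from typing import Dict

# Matches {{name}} or {{ cfg.name }}; a simplified stand-in for Scriban syntax.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(template: str, realm: Dict[str, str]) -> str:
    """Replace each {{name}} with its value from the realm; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: realm[m.group(1)], template)


# A plain dict stands in for the Redis realm of one execution.
realm = {"cfg.psu_address": "TCPIP0::10.0.0.5::INSTR"}

# Step 1 (for example a .NET runner) returns an output variable; the engine stores it.
realm["voltage"] = "3.31"

# Step 2 (for example a Python runner) receives parameters with placeholders already resolved.
params = resolve("check {{voltage}} V on {{ cfg.psu_address }}", realm)
```

The runner in step 2 never touches the store itself; it only sees the resolved string.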
Step control fields
These fields control how the execution engine behaves around each step:
| Field | Effect |
|---|---|
| enabled: false | Step is statically skipped: never evaluated, never sent to runner |
| precondition: | Step is skipped if the expression evaluates to false at runtime |
| | Overrides the verdict regardless of what the runner returns |
| retry: | Re-attempts the runner call up to a configured number of attempts |
| | Re-runs the step (or sub-sequence) while the condition holds |
| timeout_ms | Wall-clock deadline for the runner to respond; exceeded → step is aborted |
| run_on_abort: true | Step executes even when the test is in abort mode; required on all cleanup steps |
| | |
Data Flow & Result Handling
Measurement storage design
Measurements are stored as individual relational rows, not as blobs or log strings. Each row contains:
| Column | Type | Description |
|---|---|---|
| measurement_name | text | Name from the YAML definition |
| actual_value | numeric | The value reported by the runner |
| low_limit | numeric | Lower bound from YAML |
| high_limit | numeric | Upper bound from YAML |
| | text | Engineering unit (V, A, °C, etc.) |
| verdict | enum | |
| timestamp | timestamptz | When the measurement was recorded |
| station_id | text | Which station produced this row |
| | text | Which device was being tested |
| | FK | The step that produced this measurement |
This structure enables SQL-native statistical analysis without log parsing or ETL:
-- 3-sigma bounds for a voltage rail over the past week
SELECT
    measurement_name,
    ROUND(AVG(actual_value)::numeric, 4) AS mean,
    ROUND(STDDEV(actual_value)::numeric, 4) AS stddev,
    ROUND((AVG(actual_value) - 3 * STDDEV(actual_value))::numeric, 4) AS lower_3sigma,
    ROUND((AVG(actual_value) + 3 * STDDEV(actual_value))::numeric, 4) AS upper_3sigma
FROM measurements
WHERE measurement_name = 'VOUT_3V3'
  AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY measurement_name;

-- Top failing measurements in the last 24 hours
SELECT measurement_name, COUNT(*) AS fail_count
FROM measurements
WHERE verdict = 'FAIL'
  AND timestamp > NOW() - INTERVAL '24 hours'
GROUP BY measurement_name
ORDER BY fail_count DESC
LIMIT 10;
Configuration snapshot
At the moment a test begins, the engine reads all station configuration (global + station-local), merges them (station-local wins on key collision), and stores the result as JSONB on the test_execution row. This snapshot is immutable.
The consequence: "what configuration was active when UNIT-4421 failed last Tuesday?" is answerable from the database alone, even if the station configuration has been changed since then. The snapshot decouples historical analysis from current configuration state.
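The merge-and-freeze behaviour can be illustrated with plain dictionaries. This is a sketch under assumptions: `snapshot_config` and the configuration keys are hypothetical, and a JSON string stands in for the JSONB column on the execution row.

```python
import json


def snapshot_config(global_cfg: dict, station_cfg: dict) -> str:
    """Merge the two layers (station-local wins on collision) and freeze the result as JSON text."""
    merged = {**global_cfg, **station_cfg}
    return json.dumps(merged, sort_keys=True)


global_cfg = {"psu_address": "TCPIP0::10.0.0.5::INSTR", "site": "plant-1"}
station_cfg = {"psu_address": "TCPIP0::10.0.0.9::INSTR"}

# Taken once, at the moment the test begins.
snapshot = snapshot_config(global_cfg, station_cfg)

# Later edits to the live configuration do not alter the stored snapshot.
station_cfg["psu_address"] = "TCPIP0::10.0.0.77::INSTR"
```

Because the snapshot is serialised at test start, it keeps reporting the address that was active during the run, regardless of what the station configuration says afterwards.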
Limit evaluation: engine, not runner
The runner reports a raw numeric value. The engine evaluates the limit — from the YAML definition — and assigns the verdict. The runner has no knowledge of what constitutes a pass or fail.
This separation has two consequences that matter in regulated environments:
- Limits are in version control. A limit change is a YAML commit. The commit timestamp, author, and diff are the audit trail. There is no question about whether the code change and the limit change happened atomically.
- Test code is limit-agnostic. The same Python or C# method can be used across multiple YAML definitions with different limits without modification. Limit changes do not require code review, recompilation, or redeployment of runner assemblies.
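The split between raw value and verdict can be sketched as a single function on the engine side. The `evaluate` function is a hypothetical name, and two behaviours are assumptions not stated in this document: limits are treated as inclusive, and a missing or NaN reading maps to UNDETERMINED.

```python
import math
from typing import Optional


def evaluate(value: Optional[float], low: float, high: float) -> str:
    """Assign a verdict to a raw runner value using limits taken from the YAML definition."""
    if value is None or math.isnan(value):
        return "UNDETERMINED"      # assumption: no usable reading means no pass/fail decision
    return "PASS" if low <= value <= high else "FAIL"   # assumption: inclusive bounds


raw = 3.42  # the runner reports only this number

# The same raw value judged against two different YAML definitions.
loose = evaluate(raw, low=3.135, high=3.465)
tight = evaluate(raw, low=3.2, high=3.4)
```

The runner code is identical in both cases; only the limits supplied by the YAML definition differ.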
MES result queue
When a MES integration is configured, the engine posts the execution result to the MES after every run. If the MES is unavailable (network fault, scheduled maintenance), the result is written to a mes_result_queue table and the execution is not blocked. A background worker retries queued results on a configurable interval until the post succeeds.
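The post-then-queue behaviour can be sketched as follows. Everything here is illustrative: `report`, `retry_queued`, and `FakeMes` are hypothetical names, an in-memory deque stands in for the mes_result_queue table, and stopping a retry pass at the first failure is an assumption about worker behaviour.

```python
from collections import deque
from typing import Callable, Deque


def report(result: dict, post: Callable[[dict], None], queue: Deque[dict]) -> bool:
    """Try to post; on failure, park the result and return False without raising."""
    try:
        post(result)
        return True
    except ConnectionError:
        queue.append(result)
        return False


def retry_queued(post: Callable[[dict], None], queue: Deque[dict]) -> int:
    """One pass of the background worker: resend in order, stop at the first failure."""
    sent = 0
    while queue:
        try:
            post(queue[0])
        except ConnectionError:
            break
        queue.popleft()
        sent += 1
    return sent


class FakeMes:
    """Hypothetical MES endpoint that can be switched between down and up."""

    def __init__(self) -> None:
        self.up = False
        self.received = []

    def post(self, result: dict) -> None:
        if not self.up:
            raise ConnectionError("MES unreachable")
        self.received.append(result)


mes = FakeMes()
queue: Deque[dict] = deque()

posted_now = report({"serial": "UNIT-4421", "verdict": "FAIL"}, mes.post, queue)
mes.up = True                      # the MES comes back
resent = retry_queued(mes.post, queue)
```

The first call returns immediately with the result parked, which is what keeps the execution from blocking on an unavailable MES.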
Timing, Synchronisation & Determinism
Sequential execution
Steps execute strictly in declaration order. There is no parallel step execution within a single test definition. This is a deliberate design choice: hardware tests frequently have physical dependencies (power a rail before measuring it; close a relay before reading continuity) where out-of-order execution would damage the DUT or produce meaningless results.
Timing guarantees
Maestro provides ordered, sequential execution with no hard real-time guarantees. The timing constraints are:
| Source | Characteristics |
|---|---|
| gRPC call to runner | Network latency to a local process, typically < 5 ms over loopback |
| YAML step overhead | Compilation, variable resolution, database write; typically 5–20 ms per step |
| | Implemented via |
| timeout_ms | Wall-clock timeout; enforced by the API, not the runner |
For tests that require sub-millisecond timing between hardware events, the timing logic must live inside the runner (a single step), not in the YAML sequence. The inter-step overhead is unsuitable for precise waveform generation or synchronised signal acquisition.
timeout_ms as a mandatory safety net
Every step that calls hardware should declare timeout_ms. An instrument that stops responding without throwing an exception will block the runner indefinitely. The timeout_ms field sets a wall-clock deadline on the API side: if the runner does not return within the window, the step is aborted and the test moves to abort mode.
A missing timeout_ms on hardware steps is the most common cause of a run that hangs at a specific step forever. This is treated as an anti-pattern in Maestro's YAML conventions.
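A caller-side deadline of this kind can be sketched with asyncio. This is an illustration under assumptions, not the engine's code: `call_with_deadline` is a hypothetical name, the coroutines stand in for gRPC calls to a runner, and the sleeps simulate a responsive and an unresponsive instrument.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def call_with_deadline(
    step: Callable[[], Awaitable[float]], timeout_ms: int
) -> Tuple[str, Optional[float]]:
    """Wait for the step, but give up once the wall-clock deadline has passed."""
    try:
        value = await asyncio.wait_for(step(), timeout=timeout_ms / 1000)
        return ("COMPLETED", value)
    except asyncio.TimeoutError:
        # The caller stops waiting; the step is treated as aborted.
        return ("ABORTED", None)


async def responsive_read() -> float:
    await asyncio.sleep(0.01)      # instrument answers quickly
    return 3.31


async def hung_read() -> float:
    await asyncio.sleep(5)         # instrument never answers within the window
    return 0.0


ok = asyncio.run(call_with_deadline(responsive_read, timeout_ms=200))
hung = asyncio.run(call_with_deadline(hung_read, timeout_ms=50))
```

The deadline lives with the caller, so it works even when the code being called never raises. What the sketch cannot show is the state of a real instrument after the wait is abandoned, which is why cleanup steps still matter.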
Retry semantics
retry: re-attempts the gRPC call from the API to the runner. It does not replay any state that the runner may have changed before the failure. If a step that partially configures an instrument fails and is retried, the instrument may be in an unknown state on the retry attempt. Test code that is retried must be written to be safe for repeated execution (idempotent or self-resetting).
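The difference a self-resetting step makes can be shown with a simulated instrument. All names here (`FakeInstrument`, `measure_vout`, `with_retry`) are hypothetical, and the retry helper is a simplified stand-in for the engine re-issuing the runner call.

```python
from typing import Callable, Optional


class FakeInstrument:
    """Hypothetical instrument whose first read fails and drops its configuration."""

    def __init__(self) -> None:
        self.range_v: Optional[int] = None
        self.reads = 0

    def configure(self, range_v: int) -> None:
        self.range_v = range_v

    def read(self) -> float:
        self.reads += 1
        if self.reads == 1:
            self.range_v = None                  # the glitch leaves an unknown state behind
            raise OSError("read timed out")
        if self.range_v is None:
            raise RuntimeError("instrument not configured")
        return 3.31


def measure_vout(inst: FakeInstrument) -> float:
    """Self-resetting step: re-issues the full configuration on every call."""
    inst.configure(range_v=10)
    return inst.read()


def with_retry(step: Callable[[], float], attempts: int) -> float:
    """Call the step again after a failure, without undoing anything it did."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


inst = FakeInstrument()
value = with_retry(lambda: measure_vout(inst), attempts=3)
```

Had `measure_vout` assumed the configuration from an earlier step was still in place, the second attempt would have failed with "instrument not configured" instead of returning a reading.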
Variable isolation
Each test execution gets its own Redis realm (realm:{execution_guid}:*). Concurrent executions on the same station do not share variable state. There is no locking or contention between runs.
Global variables (global:* in Redis) are shared across all executions on a station and are not protected by a transaction. Writing a global variable from two concurrent executions produces undefined ordering. Global variables should be treated as read-only from within test code.
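The realm key scheme can be sketched with a dictionary standing in for Redis. The helper names `realm_key` and `delete_realm` are hypothetical; only the `realm:{guid}:{name}` key layout comes from this document.

```python
import uuid
from typing import Dict


def realm_key(execution_id: object, name: str) -> str:
    """Build the per-execution key: realm:{guid}:{name}."""
    return f"realm:{execution_id}:{name}"


def delete_realm(store: Dict[str, str], execution_id: object) -> int:
    """Remove every key belonging to one execution; return how many were removed."""
    prefix = f"realm:{execution_id}:"
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


store: Dict[str, str] = {}          # a plain dict stands in for Redis
run_a, run_b = uuid.uuid4(), uuid.uuid4()

# Both executions use the same variable name without colliding.
store[realm_key(run_a, "voltage")] = "3.31"
store[realm_key(run_b, "voltage")] = "5.02"

# Cleanup of one execution leaves the other untouched.
removed = delete_realm(store, run_a)
```

Keys under `global:` have no execution prefix, which is why they offer none of this isolation.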
Scaling Maestro
Horizontal station scaling
The fundamental scaling unit is the station. Adding capacity means adding stations — each gets its own API, Redis, and runners. There is no central API to scale. The shared bottleneck is PostgreSQL.
Station A    Station B    Station C    ...    Station N
   API          API          API                 API
  Redis        Redis        Redis               Redis
    \            |            |                   /
     \           |            |                  /
      └──────────┴────────────┴─────────────────┘
                      PostgreSQL
                      (shared DB)
This architecture makes a station network outage graceful: if PostgreSQL is unreachable, in-progress tests continue executing; only result persistence fails. Completed tests that could not write their results queue locally (implementation-specific; check your deployment's configuration).
Multi-station fleet monitoring: Orchestra
Orchestra is the multi-station monitoring service. It queries PostgreSQL on a configurable interval and aggregates station health, recent verdicts, yield percentages, and currently-running tests into a fleet dashboard. It does not proxy or route test traffic — it is read-only over the shared database.
Orchestra's dashboard is designed to be readable from across a production floor: large colour-coded status cards per station, yield timeline with a configurable target line, and a Pareto chart of failing steps.
PostgreSQL scaling considerations
At scale, the measurements table grows fastest. At 100 steps per execution, 50 measurements per step, and 500 executions per day across a fleet, the table accumulates approximately 2.5 million rows per day. PostgreSQL handles this volume well with appropriate indexing (measurement_name, timestamp, station_id), but a retention and archival strategy should be planned before deployment at high-volume sites.
Partitioning the measurements table by month using PostgreSQL's declarative partitioning is the recommended approach for sites running more than ~1,000 executions per day across all stations.
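The row-volume estimate for the measurements table is simple multiplication and is easy to re-run for a different site profile. The function name is hypothetical; the inputs are the figures quoted above, and the yearly figure simply extends them over 365 days.

```python
def rows_per_day(steps_per_execution: int, measurements_per_step: int, executions_per_day: int) -> int:
    """Measurement rows written per day across the fleet."""
    return steps_per_execution * measurements_per_step * executions_per_day


daily = rows_per_day(steps_per_execution=100, measurements_per_step=50, executions_per_day=500)
yearly = daily * 365

print(f"{daily:,} rows/day, {yearly:,} rows/year")
```

At that rate the table passes 900 million rows within a year, which is the practical argument for deciding on retention and partitioning before deployment.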
What cannot be scaled horizontally
A single station can run only one test at a time. There is no mechanism within one Maestro station to run two tests in parallel against two devices simultaneously. Adding a second concurrent execution requires a second station instance (API + Redis + runners) pointing at the same PostgreSQL database.
Known Limitations & Design Trade-offs
These are the architectural trade-offs that were made deliberately. Understanding them is necessary for designing tests and deployments that work reliably.
Sequential-only step execution
Limitation: Steps within a single test run sequentially. There is no parallel step execution.
Trade-off made: Hardware tests have physical dependencies. Parallelism at the step level would require explicit dependency tracking and creates ordering hazards when shared instruments or bus resources are involved. The simplicity of a sequential model is worth the constraint.
Workaround: If two independent measurements genuinely have no physical dependencies, combine them into a single step implemented as a multi-measurement function. The runner code can parallelise internally; Maestro sees one step.
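A multi-measurement step of this kind might look like the following in a Python runner. This is a sketch under assumptions: the read functions are hypothetical placeholders for real instrument calls, `VOUT_5V0` is an invented measurement name, and the shape of the returned dictionary is illustrative rather than the runner's actual return contract.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def read_vout_3v3() -> float:
    """Hypothetical instrument read for the 3.3 V rail."""
    return 3.31


def read_vout_5v0() -> float:
    """Hypothetical instrument read for the 5 V rail."""
    return 5.02


def measure_rails() -> Dict[str, float]:
    """One step from the engine's point of view; the two reads run concurrently inside it."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f3 = pool.submit(read_vout_3v3)
        f5 = pool.submit(read_vout_5v0)
        # Both raw values are returned together; the engine still assigns each verdict.
        return {"VOUT_3V3": f3.result(), "VOUT_5V0": f5.result()}


measurements = measure_rails()
```

This only makes sense when the two reads do not share an instrument or bus; otherwise the ordering hazards the sequential model avoids reappear inside the step.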
No sub-millisecond timing
Limitation: The inter-step overhead (5–20 ms) makes Maestro unsuitable as a real-time sequencer.
Trade-off made: Maestro is a test orchestrator, not a real-time controller. It is designed to coordinate steps across runners and record results, not to generate precise waveforms or synchronise events at microsecond granularity.
Workaround: Any sequence that requires tight timing must live inside a single runner step. The runner method controls its own internal timing. Maestro calls the method once and evaluates the result.
cfg.* and exec.* not usable in precondition:
Limitation: Precondition expressions cannot reference cfg.*, exec.*, or mes.* variables directly.
Trade-off made: DynamicExpresso interprets the dot (.) as C# member access, not as a namespace separator. Supporting dot-namespaced variables in preconditions would require a custom expression language or a breaking change to the namespace syntax.
Workaround: Copy the needed config value into a bare variable in a preceding step. This is documented and supported; it costs one extra step but keeps the precondition expression simple.
No built-in authentication (current release)
Limitation: The REST API and MCP server do not enforce authentication in the current release. Any client that can reach the API port can start a test, modify configuration, or read all results.
Trade-off made: Authentication was scoped out of the initial release to allow rapid adoption in controlled LAN environments. The IPermissionService interface is already wired throughout the codebase, so enabling authentication is an implementation swap, not an architectural change.
Mitigation today: Isolate station API ports to a VLAN or VPN. Do not expose ports 7000, 7001, or 7004 to untrusted networks.
Retry does not reset instrument state
Limitation: retry: re-calls the runner method but does not revert any side effects the method may have had before it failed.
Trade-off made: The engine cannot know what physical state a failed step left behind. Only the test code can know that, and only the test developer can decide what reset is appropriate.
Workaround: Write retried steps to be idempotent or self-resetting. A step that configures an instrument before reading it should re-issue the full configuration on every call, not assume the previous configuration is still valid.
Redis ephemeral: in-progress run is lost on Redis restart
Limitation: If Redis restarts or crashes during a test execution, the in-flight run cannot be recovered. Variable state is lost. The test_execution record in PostgreSQL will remain in a Running state indefinitely.
Trade-off made: Redis is used for its speed. Making the variable store durable would add latency to every variable write — unacceptable for a step-by-step hardware test loop.
Mitigation: Use Redis persistence (AOF or RDB snapshots) in production deployments if hardware-test hang recovery is important. Stale Running execution records can be identified and closed with a scheduled cleanup query.
One execution per station at a time
Limitation: A station cannot run two test executions concurrently.
Trade-off made: Hardware test stations are physical systems with one device under test at a time. Concurrent execution is architecturally unnecessary for the primary use case and would complicate instrument resource management significantly.
Workaround: For high-throughput sites testing multiple units in parallel, deploy one station instance per physical test fixture. Each station instance is independent and can run at full speed without interfering with others.
Package install requires manual trigger
Limitation: After pushing a new package version to the GitLab registry, the station does not automatically detect the update. A refresh must be explicitly triggered (UI button or POST /api/packages/refresh).
Trade-off made: Automatic background polling was rejected to avoid silent, uncontrolled package updates on production stations. Operators and automation systems must explicitly initiate an update — this is the intended safety mechanism.
Measurements stored with limits as-declared, not as-evaluated
Limitation: The low_limit and high_limit stored in the database are the values from the YAML definition, not any dynamic limits that might have been computed at runtime via variable substitution (e.g. low_limit: "{{computed_lower}}").
Implication: Queries that compare actual_value against low_limit and high_limit columns may not match the runtime evaluation if dynamic limits were in use. The verdict column is always correct. The stored limit columns reflect the YAML, not the runtime state.