Best Practices & Recommendations

Experience-based guidance for teams building and operating Maestro test suites. These recommendations are drawn from real deployment patterns and failure modes — not from theory.

Recommended Project Structure

Package directory layout

The scaffolder creates this structure. Do not deviate from it — the engine expects certain paths and the validation tooling is wired to them.

{PackageId}_FunctionalTest_v00/
├── package.json              ← Manifest: name, version, lifecycle, supported products
├── README.md                 ← What the package tests, hardware required, known issues
├── validate.py               ← Two-level validation — run before every commit
├── run_local.py              ← Local Python test runner (Python packages only)
│
├── tests/
│   ├── main.yaml             ← Top-level test definition (entry point)
│   └── sub-sequences/
│       ├── power-on.yaml     ← Reusable sub-sequence
│       └── power-off.yaml
│
├── assemblies/               ← Built .NET DLLs — committed to the repo
│   └── BoardTests.dll
│
├── python_modules/           ← Python test code
│   ├── instruments.py
│   └── board_tests.py
│
├── wheels/                   ← Vendored pip wheels for air-gapped production stations
│   └── pyvisa-1.14.1-py3-none-any.whl
│
└── data/                     ← Calibration tables, firmware images, reference files
    └── calibration.bin

The assemblies/ directory must be committed. The built DLLs are not compiled on the station — they are deployed as-is. If assemblies/ is in .gitignore, the package will fail to execute on every station.

The data/ directory uses relative paths. The runner sets the working directory to the package root. Use Path.Combine("data", "calibration.bin") in .NET and os.path.join("data", "calibration.bin") in Python. Never use absolute paths or Assembly.Location — the package installation directory varies between stations and deployments.
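A minimal Python sketch of the relative-path rule, assuming the runner has already set the working directory to the package root as described above. The helper name load_calibration and the absolute-path guard are illustrative assumptions, not part of Maestro.

```python
import os

# Relative to the package root, which the runner is assumed to set as the
# working directory before calling test code.
CALIBRATION_PATH = os.path.join("data", "calibration.bin")


def load_calibration(path: str = CALIBRATION_PATH) -> bytes:
    """Read the raw calibration table (hypothetical helper for illustration)."""
    if os.path.isabs(path):
        # Absolute paths differ between stations, so reject them early.
        raise ValueError(f"absolute paths are not portable between stations: {path}")
    with open(path, "rb") as handle:
        return handle.read()
```

Rejecting absolute paths in the helper turns a station-specific failure into an immediate, local one.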

Naming conventions

Consistent naming makes packages findable in the catalog, measurements queryable in SQL, and reports readable without explanation.

Element	Convention	Example
Package name	`<ProductId>_<TestName>_v<NN>`	`ESH10000121_FunctionalTest_v00`
Test name (YAML)	Title Case, descriptive	`"ESH10000121 Functional Test v00"`
Step name	Active verb phrase	`"Measure 3V3 Rail"`, `"Connect to DMM"`
Measurement name	`UPPER_SNAKE_CASE`	`VCORE`, `RAIL_3V3`, `SUPPLY_CURRENT`
Variable name	`snake_case`	`supply_voltage`, `dmm_ready`
Config key (hardware address)	`UPPER_SNAKE_CASE`	`DMM_VISA`, `PSU_COM_PORT`
Config key (tuning value)	`lower_snake_case`	`dmm_timeout_ms`, `settle_delay_s`
Unit string	Standard abbreviation	`V`, `A`, `mA`, `mV`, `degC`, `dBm`, `MHz`

Measurement names are what you filter on in SQL. You query them by name: WHERE measurement_name = 'RAIL_3V3'. A measurement named "3.3V rail" (with a space and a dot) is harder to work with than RAIL_3V3. Name them as if you were naming a database column.
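The measurement naming rule can be checked mechanically. The following is a small sketch, assuming UPPER_SNAKE_CASE means a leading letter followed by letters, digits, and single underscores; the function name is hypothetical and this check is not part of validate.py.

```python
import re

# UPPER_SNAKE_CASE: starts with a letter; groups separated by single underscores.
MEASUREMENT_NAME = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")


def is_valid_measurement_name(name: str) -> bool:
    """Return True if the name follows the UPPER_SNAKE_CASE convention."""
    return MEASUREMENT_NAME.fullmatch(name) is not None
```

Under this pattern, RAIL_3V3 and SUPPLY_CURRENT are accepted, while "3.3V rail" and rail_3v3 are rejected.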

Package version is in the name and the manifest. The _v00, _v01 suffix in the package name allows two generations to coexist in the catalog simultaneously during rollout. The version field in package.json tracks patch revisions within a generation.

ESH10000121_FunctionalTest_v00   version: 1.0.3   ← Released (running in production)
ESH10000121_FunctionalTest_v01   version: 1.0.0   ← Evaluation (being validated)

Promote by setting _v01 lifecycle to Released and _v00 to Obsolete.

VS Code setup

Every scaffolded package includes schema wiring for VS Code. To activate it:

Install the redhat.vscode-yaml extension
Open the package folder as a workspace root (File → Open Folder — not the parent directory)
YAML files under tests/ immediately get field autocomplete, inline validation (red squiggles for unknown fields), and hover documentation

Do not open individual YAML files directly — the .vscode/settings.json at the package root is what wires the schema.

Writing Maintainable Test Configurations

One step, one operation

The most important structural decision in a test package is step granularity. Write one step for every operation that can fail independently.

Why this matters:

Retry granularity: retry: re-runs one step, not the whole sequence. A test that puts power-on, measurement, and power-off in one step cannot retry just the measurement.
Abort safety: run_on_abort: true can only be set on a step, not on a block inside a method. If cleanup is inside a large method that throws, the cleanup may not run.
Traceability: Each step has its own verdict, duration, and measurements in the database. A report that shows 20 steps with named verdicts is far more actionable than one step with a pass or fail.
Local testing: Each method can be unit-tested independently in MSTest or run_local.py.

The rule: if you would give an operation its own row in a test specification, it gets its own method and its own YAML step.

Keep limits in YAML, not in code

Runner code returns a raw value. The YAML definition holds the limits. This is not optional — it is the architectural contract.

Python

# ✅ Correct — return the value, let the engine evaluate the limit
def measure_3v3_rail(channel: str) -> dict:
    voltage = read_adc(channel)
    return {"voltage": str(voltage)}

Python

# ❌ Wrong — limit hardcoded in code; not version-controlled, not auditable
def measure_3v3_rail(channel: str) -> dict:
    voltage = read_adc(channel)
    if not (3.135 <= voltage <= 3.465):
        raise ValueError(f"RAIL_3V3 out of limits: {voltage}")
    return {"voltage": str(voltage)}

When a limit changes, the correct workflow is: edit the YAML, commit, deploy. No code review, no recompilation, no risk of introducing a bug while touching test logic. The YAML commit is the audit trail.

Keep instrument addresses in Station Config, not in YAML

A YAML file that works on one station and breaks on another is a maintenance hazard.

YAML

# ✅ Correct — address comes from Station Config, YAML is station-agnostic
parameters:
  address: "{{cfg.DMM_VISA}}"

YAML

# ❌ Wrong — hardcoded address breaks on any other station
parameters:
  address: "TCPIP::192.168.1.100::INSTR"

Instrument addresses, COM port assignments, agent hostnames, and calibration offsets all belong in Station Config. The YAML file should be deployable to any station without modification.

Structure steps for the critical path

Use post_execution_action: terminate-on-fail on steps where failure makes subsequent steps meaningless or dangerous. Use continue (the default) where a failure is informational and subsequent steps can still run.

YAML

# Critical path — stop immediately on any failure here
- name: "Power on"
  post_execution_action: terminate-on-fail

- name: "Connect to DUT"
  retry: { count: 2, delay_ms: 1000 }
  post_execution_action: terminate-on-fail

# Main measurements — collect all, don't stop on individual failures
- name: "Measure 3V3 rail"
  # post_execution_action omitted → continue (default)

- name: "Measure 5V rail"
  # continue

# Always runs — power off even if measurements failed
- name: "Power off"
  run_on_abort: true

The pattern: power-on and connect steps terminate on fail (there is no point measuring if the board isn't powered or reachable), measurement steps continue on fail (collect all failures in one run), cleanup steps run on abort.

Use feature-flag variables for conditional test paths

Rather than maintaining separate YAML files for "full test" vs "quick test" or "with calibration" vs "without", use precondition: with YAML-declared variables:

YAML

variables:
  run_calibration: true
  run_extended_thermal: false

steps:
  - name: "Calibration sequence"
    type: sequence
    sequence: "tests/calibration.yaml"
    precondition: "run_calibration == true"

  - name: "Extended thermal soak"
    runner: dotnet
    precondition: "run_extended_thermal == true"
    ...

The caller overrides these at execution time via the API. The YAML file is the single source of truth for all test variants.

Use sub-sequences for reusable instrument interactions

If multiple test files initialise the same instrument, put the init sequence in a shared sub-sequence:

YAML

- name: "Init DMM"
  type: sequence
  sequence: "tests/sub-sequences/dmm-init.yaml"
  parameters:
    dmm_address: "{{cfg.DMM_VISA}}"
  outputs:
    dmm_ready: "{{dmm_ready}}"

Sub-sequences are YAML files that follow the same format as top-level test files. They must have a test: root element with name: and version:. The engine ignores setup: and teardown: blocks inside sub-sequences.

Always declare units on measurements

The unit field appears in reports, in the Test Monitor UI, and in the database. A measurement without a unit string (V, A, degC, dBm) is ambiguous in a shared report and forces the reader to check the YAML to understand the scale.

YAML

measurement:
  name: "SUPPLY_CURRENT"
  value: "{{current}}"
  low_limit: 0.15
  high_limit: 0.90
  unit: "A"    # ← always include this

Debugging Failed Runs Efficiently

Start with the Test Results report, not the logs

When a run fails, open the detailed report in Test Results first. The step-level verdict table tells you immediately which step failed and what the measurement values were. In most cases this is sufficient — you do not need the logs.

Only open the Logs panel when:

The step failed with ABORTED or UNDETERMINED (exception, not a measurement limit)
The measured values look correct but the verdict is wrong (suggests a variable substitution problem)
A step timed out (logs may show what the runner was doing at the moment)

Expand measurements before checking code

When a measurement fails, expand the step row to see the actual stored value and the stored limits. Before touching any code or YAML:

Is the actual value sensible? If it's 0, null, or the default variable value, the variable was not set by the runner — the issue is in the outputs: block, not in the measurement logic.
Are the limits in the right order? low_limit > high_limit will always fail.
Is the value close to a limit? A value like 3.134 against a low_limit: 3.135 is most likely real hardware variation, not a code bug.
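The three questions above can be sketched as a small triage helper. This is an illustration only: the function name, the returned messages, and the 1% "near a limit" band are assumptions, not Maestro behaviour.

```python
import math
from typing import Optional


def triage_measurement(value: Optional[float], low: float, high: float,
                       near_fraction: float = 0.01) -> str:
    """Return a first guess at why a measurement failed (hypothetical helper)."""
    if low > high:
        return "limits inverted: fix low_limit/high_limit in the YAML"
    if value is None or math.isnan(value):
        return "value missing: check the outputs: block"
    if low <= value <= high:
        return "value within limits"
    if value == 0:
        return "value is 0: possibly an unset variable, check the outputs: block"
    # Treat anything within near_fraction of the limit span as marginal.
    margin = (high - low) * near_fraction
    if low - margin <= value < low or high < value <= high + margin:
        return "marginal: just outside a limit, likely hardware variation"
    return "clear failure: value well outside limits"
```

For example, triage_measurement(3.134, 3.135, 3.465) falls into the marginal case, matching the 3V3 rail example above.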

Use the config snapshot for post-mortem analysis

Every execution record stores a JSONB snapshot of the station configuration active at the time. If a test starts failing intermittently after a configuration change, compare the snapshot from a passing run against a failing run:

SQL

SELECT id, started_at, verdict, config_snapshot
FROM test_executions
WHERE serial_number = 'UNIT-042'
ORDER BY started_at DESC
LIMIT 10;

Differences in config_snapshot between a passing and a failing run, even on the same station, point to configuration drift as a likely root cause.
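Comparing two snapshots by eye is error-prone once the configuration has more than a few keys. The sketch below assumes each snapshot has been loaded into a flat dictionary of config key to value; the real JSONB layout may be nested, and the example values are invented.

```python
import json
from typing import Any, Dict, Tuple


def diff_snapshots(passing: Dict[str, Any],
                   failing: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (passing_value, failing_value)} for every key that differs."""
    missing = object()  # sentinel so a key absent on one side counts as a change
    changes = {}
    for key in sorted(set(passing) | set(failing)):
        before = passing.get(key, missing)
        after = failing.get(key, missing)
        if before != after:
            changes[key] = (None if before is missing else before,
                            None if after is missing else after)
    return changes


if __name__ == "__main__":
    # Invented example values; real snapshots come from config_snapshot.
    passing = json.loads('{"DMM_VISA": "TCPIP::10.0.0.5::INSTR", "dmm_timeout_ms": 2000}')
    failing = json.loads('{"DMM_VISA": "TCPIP::10.0.0.5::INSTR", "dmm_timeout_ms": 500}')
    print(diff_snapshots(passing, failing))
```

A key that exists in only one snapshot is reported with None on the other side.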

Develop and debug locally before deploying

The inner loop for test development is: write code → validate locally → deploy → run. Shortening the deploy-and-run step is less valuable than eliminating it.

For .NET packages: Write an MSTest project that calls the same static methods the runner will call. This gives you breakpoints, watch windows, and immediate feedback without a deploy cycle.

For Python packages: Use the run_local.py script at the package root (scaffolded for Python packages). Call the test functions directly in the same sequence as the YAML, evaluating measurement results locally. Run it with python run_local.py --host <agent-ip> --serial DEV-001.

In both cases: declare all system-injected values (cfg.*, exec.*) as local constants at the top of the file, commented with their Maestro source. When the test runs on a real station, those values are injected automatically — your local constants are never deployed.

// MSTest — local constants that Maestro injects at runtime
private const string AgentHost = "http://agent64.local:5000";  // cfg.AGENT_HOST
private const string SerialNumber = "DEV-001";                  // exec.serial_number
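For Python packages, a run_local.py along these lines is one possible shape. Only the --host and --serial arguments come from the command shown above; the function names, the fixed reading, and the local limits table are hypothetical stand-ins. The limits are mirrored here purely for local evaluation, since on a station the engine evaluates the limits declared in the YAML.

```python
import argparse

# Local stand-ins for values Maestro injects at runtime.
DEFAULT_AGENT_HOST = "http://agent64.local:5000"  # cfg.AGENT_HOST
DEFAULT_SERIAL = "DEV-001"                        # exec.serial_number

# Mirror of the YAML limits, used for local evaluation only.
LIMITS = {"RAIL_3V3": (3.135, 3.465)}


def measure_3v3_rail(host: str) -> dict:
    """Placeholder for the real runner function; returns a fixed reading."""
    return {"voltage": "3.30"}


def evaluate(name: str, raw: str) -> bool:
    """Apply the mirrored limits to a raw string value returned by a test function."""
    low, high = LIMITS[name]
    return low <= float(raw) <= high


def main(argv=None) -> bool:
    parser = argparse.ArgumentParser(description="Local test runner sketch")
    parser.add_argument("--host", default=DEFAULT_AGENT_HOST)
    parser.add_argument("--serial", default=DEFAULT_SERIAL)
    args = parser.parse_args(argv)

    # Call the test functions in the same order as the YAML steps.
    result = measure_3v3_rail(args.host)
    passed = evaluate("RAIL_3V3", result["voltage"])
    verdict = "PASS" if passed else "FAIL"
    print(f"{args.serial} RAIL_3V3={result['voltage']} V -> {verdict}")
    return passed


if __name__ == "__main__":
    main()
```

Keeping the injected values as named defaults at the top of the file makes it obvious which ones the station supplies at runtime.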

Validate YAML before every commit

Run py validate.py before every commit. Level 1 validation catches structural errors (wrong field names, missing required fields, inverted limits) without a running stack. Level 2 catches semantic errors (undeclared variables, template resolution failures) and requires the API to be running.

Most YAML errors are caught by the VS Code schema extension in real time — treat a commit with red squiggles in the editor as a build failure.

Bash

# Validate all YAML files in the package
py validate.py

# Validate a specific file
py validate.py tests/main.yaml

Use the YAML Validator before running

For a quick sanity check without the command line, paste or upload your YAML to the Validator page (/yaml-validator) in the Maestro UI. The same validation rules run as in py validate.py. This is useful when making a quick change directly on a station.

Read logs with level filtering

In the Test Results Logs panel, filter by level before reading through the full output. Error entries (red) are almost always the root cause. Warning entries (yellow) identify conditions that preceded the failure. Information entries provide context.

Use [DEBUG] prefixes generously in runner code during development:

Python

print(f"[DEBUG] Raw ADC reading: {raw_value}")
print(f"[DEBUG] Converted voltage: {voltage}V")

Remove [DEBUG] lines before promoting a package to Released — they add noise to production reports and inflate log storage.

Avoiding Common Beginner Mistakes

Forgetting `run_on_abort: true` on cleanup steps

The most common hardware hazard in Maestro. If a test aborts — due to a runner exception, a failed step with terminate-on-fail, or an operator clicking Abort — any step without run_on_abort: true is skipped. A board left powered on, a relay left closed, or an instrument left in a triggered state can damage hardware or corrupt the next test.

Every step that powers off, disconnects, releases a resource, or resets an instrument must have run_on_abort: true.

YAML

# Mark ALL of these with run_on_abort: true
- name: "Power down DUT"
  run_on_abort: true

- name: "Disconnect from agent"
  run_on_abort: true

- name: "Release DMM"
  run_on_abort: true

Omitting `timeout_ms` from hardware steps

A step without timeout_ms that calls hardware will hang indefinitely if the instrument stops responding. The run never transitions to a terminal state. The only recovery is a manual Abort, and the only fix is adding timeout_ms.

Set timeout_ms on every step that communicates with an instrument. The value should be the realistic worst-case duration plus a safety margin — not the typical duration.

YAML

- name: "Flash firmware"
  runner: dotnet
  timeout_ms: 45000    # flashing takes ~30s; 45s gives headroom

Enabling MSTest parallelization

The MSTest project template enables method-level parallelization by default. For hardware tests, this causes multiple test methods to execute simultaneously, issuing concurrent commands to the same instrument. The result is intermittent failures that are hard to reproduce.

Add this to a dedicated MSTestSettings.cs file (not inline in a test class):

[assembly: DoNotParallelize]

Without this, expect to see ObjectDisposedException, instrument communication errors, and incorrect measurement values when running more than one test method.

Using `type: mock` in a deployed package

Mock steps return the static target: value from YAML without calling any runner code. They exist for development and should never appear in a package promoted to Evaluation or Released. A package with mock steps in production always passes — the steps are not testing anything.

Search for mock steps before promotion: grep -r "type: mock" tests/.

Using the legacy Scriban variable syntax

Old Maestro documentation and some generated stubs use {{test.variable}} (a dotted test. prefix) for variable references. The current engine uses flat names: {{variable}}. Using the legacy syntax produces no error: the template is left unresolved and the literal string "{{test.variable}}" is passed as the parameter value.

If a measurement value or parameter appears as a literal {{...}} string in a report, search the YAML for dot-prefixed variable references.
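Because the legacy syntax fails silently, a pre-commit scan can be worth having. The sketch below assumes the legacy prefix is literally test. as in the example above, and that test definitions live in .yaml files under tests/; the function name is hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches {{test.something}} with optional whitespace inside the braces.
LEGACY_REF = re.compile(r"\{\{\s*test\.[A-Za-z_][A-Za-z0-9_]*\s*\}\}")


def find_legacy_refs(root: str = "tests") -> List[Tuple[str, int, str]]:
    """Return (file, line_number, match) for each legacy reference under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.yaml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            for match in LEGACY_REF.finditer(line):
                hits.append((str(path), number, match.group(0)))
    return hits
```

References such as {{cfg.DMM_VISA}} are not flagged, since only the test. prefix is matched.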

Placing `cfg.*` in a `precondition:` expression

YAML

# ❌ This silently fails — DynamicExpresso reads the dot as member access
precondition: "cfg.ENABLE_EXTENDED_TESTS == 'true'"

# ✅ Copy the config value into a bare variable first
variables:
  enable_extended: "false"

# setup step reads cfg.ENABLE_EXTENDED_TESTS into enable_extended...

precondition: "enable_extended == 'true'"

There is no clear error message pointing at the cause: the precondition either throws (aborting the step) or evaluates incorrectly, so steps are unexpectedly skipped or unexpectedly run.
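A text-level check can flag this mistake before a run. The following sketch scans YAML text line by line and reports precondition: lines that mention cfg.<NAME>; it is a simple pattern match under the assumption that each precondition fits on one line, not a YAML parser, and the function name is hypothetical.

```python
import re
from typing import List

# A precondition: line (optionally a list item) whose expression mentions cfg.<NAME>.
PRECONDITION_CFG = re.compile(r"^\s*(?:-\s*)?precondition:\s*.*\bcfg\.[A-Za-z_]\w*")


def find_cfg_preconditions(yaml_text: str) -> List[int]:
    """Return 1-based line numbers of precondition: lines that reference cfg.*."""
    return [number
            for number, line in enumerate(yaml_text.splitlines(), 1)
            if PRECONDITION_CFG.search(line)]
```

Lines that use cfg.* in parameters: templates are left alone, because the pattern only inspects precondition: lines.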

Not bumping the version before deploying

Deploying the same version twice can cause the registry to serve the cached version. Bump the version field in package.json before every commit that will be deployed — even for a one-line limit fix. A patch bump (1.0.0 → 1.0.1) takes five seconds and prevents a category of "why isn't my change taking effect?" debugging sessions.
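The patch bump is easy to script. This sketch assumes the manifest stores the version as a top-level "version" string in MAJOR.MINOR.PATCH form; the function names are hypothetical, and rewriting the file with json.dumps may change its whitespace.

```python
import json
from pathlib import Path


def bump_patch(version: str) -> str:
    """Return the version with its patch component incremented, e.g. 1.0.0 -> 1.0.1."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def bump_manifest(manifest_path: str = "package.json") -> str:
    """Rewrite the version field in the manifest and return the new version."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    manifest["version"] = bump_patch(manifest["version"])
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest["version"]
```

Running bump_manifest() from the package root before each deployable commit keeps the habit cheap.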

Using `Thread.Sleep` inside runner code for settle delays

// ❌ Sleep is invisible in results and non-tuneable without a rebuild
Thread.Sleep(500);
voltage = ReadVoltage();

// ✅ Settle delay is a YAML step — visible in results, tuneable without code change

YAML

- name: "Settle delay"
  type: delay
  duration: 0.5   # seconds

- name: "Measure voltage after settle"
  runner: dotnet
  ...

A type: delay step appears in the execution timeline with its duration, making it visible to anyone reviewing why the test takes as long as it does. The duration is tunable in YAML without rebuilding or redeploying runner code.

Using the Python `logging` module for log output

The runner captures sys.stdout and sys.stderr. The logging module holds a reference to the original stderr and is unaffected by the capture. logging.info(...) output does not appear in Maestro's Logs panel.

Use print() for all log output from Python runner code:

Python

import sys

print(f"[DEBUG] Connecting to {address}")
print(f"Measured voltage: {voltage}V")
print("[WARN] Response slower than expected", file=sys.stderr)
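The effect can be reproduced outside Maestro. The sketch below assumes the runner's capture behaves like replacing sys.stdout and sys.stderr, which is simulated here with contextlib; the actual capture mechanism is not documented in this guide. A logging.StreamHandler created before the capture keeps writing to the original stderr, so only the print() output lands in the captured buffer.

```python
import contextlib
import io
import logging


def demo_capture() -> str:
    """Return what a stdout/stderr capture sees from logging versus print()."""
    logger = logging.getLogger("capture_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Created BEFORE the capture: the handler binds to the current sys.stderr object.
    handler = logging.StreamHandler()
    logger.addHandler(handler)

    captured = io.StringIO()
    with contextlib.redirect_stdout(captured), contextlib.redirect_stderr(captured):
        logger.info("sent through logging")   # goes to the original stderr
        print("sent through print")           # goes to the captured buffer
    logger.removeHandler(handler)
    return captured.getvalue()


if __name__ == "__main__":
    print(repr(demo_capture()))
```

The returned string contains only the print() line, which is why print() is the reliable channel for runner log output.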

Stability & Reliability Best Practices

Retry the connect step, not the measurement steps

Network and instrument connection steps are the most likely to fail transiently (instrument not yet ready, DHCP delay, TCP reset). Measurement steps are far less likely to fail transiently once connected.

Structure your retry configuration accordingly:

YAML

# Connect — retry aggressively; failure here means nothing else can run
- name: "Connect to DUT agent"
  retry:
    count: 3
    delay_ms: 2000
  timeout_ms: 10000
  post_execution_action: terminate-on-fail

# Measurements — retry sparingly or not at all
- name: "Measure VCORE"
  timeout_ms: 5000
  # no retry — a failed measurement is a real failure, not a transient

Adding retry to every step makes the test suite look resilient but hides real failures. Reserve retry for steps where transient failure is a known characteristic of the hardware or network path.

Make retry-enabled steps idempotent

retry: re-calls the runner method from the beginning. If the method partially configured an instrument before it failed, the instrument is in an unknown state on the retry. Write retried steps so that re-running them from scratch produces the same result:

// ✅ Idempotent — always resets before configuring
public static void Connect(string host)
{
    _client?.Dispose();   // dispose any partial connection
    _client = new AgentClient(host);
    _client.Reset();      // put instrument in known state
    _client.Connect();
}

Pin image tags in production

Production stations should run known image versions, not latest:

Bash

# .env
IMAGE_TAG=abc1234   # specific commit SHA

Using latest means every docker compose pull potentially deploys new code. In a production environment, image updates should be a deliberate, tested operation — not an automatic side effect of a container restart.

Validate after every package refresh

After refreshing the registry and downloading a new package version, validate the YAML files before running them on a production station. Run py validate.py from the package directory, or use the YAML Validator UI. A structural error in a deployed YAML file prevents any test from starting.

Use lifecycle status as a promotion gate

The lifecycle states (NotReleased → Evaluation → Released → Obsolete) exist to prevent unvalidated packages from running on production stations. Enforce this:

State	Allowed on
`NotReleased`	Developer stations only
`Evaluation`	Validation/QA stations; never production
`Released`	All stations
`Obsolete`	No station — remove from catalog

Automate the promotion check in your CI pipeline: reject any deployment that changes a package from NotReleased directly to Released without an Evaluation stage.
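A CI gate for this can be a few lines. In the sketch below, only the rejection of NotReleased directly to Released comes from the text above; the broader policy of allowing a package to stay in place or move exactly one stage forward is an assumption, as is the function name.

```python
LIFECYCLE_ORDER = ["NotReleased", "Evaluation", "Released", "Obsolete"]


def is_allowed_transition(current: str, target: str) -> bool:
    """Allow staying in place or moving exactly one stage forward.

    Raises ValueError if either state is not a known lifecycle state.
    """
    step = LIFECYCLE_ORDER.index(target) - LIFECYCLE_ORDER.index(current)
    return step in (0, 1)
```

A pipeline would compare the lifecycle value in the previously deployed manifest with the one in the candidate manifest and fail the job when this function returns False.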

Configure meaningful station names early

The STATION_NAME (or StationId in appsettings.json) is stamped on every execution record, every measurement, and every log entry. Changing it after data has been collected breaks historical queries that filter by station.

Choose station names before running any production tests. Use a consistent format (ST-01, LAB-A3, PROD-LINE-2) and document the naming scheme. A station renamed mid-deployment creates a split in the measurement history that is difficult to reconcile.

Keep global config minimal

Global config entries (those with StationId = NULL) apply to all stations. The more values you put in global config, the harder it becomes to configure stations differently when needed. Prefer station-local entries for any value that could plausibly differ between stations — even if it currently doesn't.

A good rule: only put values in global config that you are confident will never need to be different on any station. Default timeouts and registry URLs are good candidates. Instrument addresses are never global config candidates.

Archive or partition measurements before they affect query performance

At typical production throughput, the measurements table grows by millions of rows per week. Queries that were fast at 10 million rows may become slow at 100 million rows. Plan archival before you need it, not after you notice slowness.

Options, in order of implementation effort:

Retention window: Delete measurements older than N months on a schedule. Appropriate if you do not need long-term trend data.
PostgreSQL declarative partitioning: Partition by month. Old partitions can be archived or dropped without a table lock. Requires a one-time schema migration.
Separate archive database: Move data older than a threshold to a separate PostgreSQL instance or data warehouse.

Start planning when the table exceeds 50 million rows.

Test unattended mode explicitly before enabling it in CI

A test that runs correctly in manual mode may behave differently in unattended mode if it contains prompts. Before wiring a test into CI with unattendedMode: true:

Run the test interactively once and note every prompt step
Verify each prompt step has a Continue or Pass button (for button-only prompts) or an input.default (for value-input prompts)
Run the test with unattendedMode: true and verify the logged auto-responses are appropriate
Mark the test as unattended-validated in a comment in the YAML header:

YAML

test:
  name: "ESH10000121 Functional Test v00"
  version: "1.2.0"
  # unattended-safe: validated 2026-03-15 by jsmith

Without this discipline, a prompt step that only has a Fail button will cause automated runs to fail every time.
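The prompt check in the second step can be sketched in code. The dictionary layout used here (a name, an optional buttons list, and an optional input mapping with a default) is a hypothetical representation of a prompt step, not the actual Maestro schema.

```python
from typing import Any, Dict, List

# Buttons that an unattended run can safely auto-select.
SAFE_BUTTONS = {"Continue", "Pass"}


def unattended_problems(prompts: List[Dict[str, Any]]) -> List[str]:
    """Return the names of prompt steps that cannot be auto-answered safely."""
    problems = []
    for prompt in prompts:
        if "input" in prompt:
            # Value-input prompt: needs a default to auto-fill.
            ok = prompt["input"].get("default") is not None
        else:
            # Button-only prompt: needs a Continue or Pass button.
            ok = bool(SAFE_BUTTONS & set(prompt.get("buttons", [])))
        if not ok:
            problems.append(prompt["name"])
    return problems
```

An empty result means every listed prompt has a safe automatic answer under these assumptions.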
