spacedock-dev/research-workflow-template

---
commissioned-by: spacedock@template
entity-type: experiment
entity-label: experiment
entity-label-plural: experiments
id-style: slug
state: $inline
stages:
  defaults:
    worktree: false
    concurrency: 1
  states:
    - name: concept
      initial: true
    - name: ideate
    - name: expanded
      terminal: true
    - name: hypothesis
      initial: true
    - name: propose
      gate: true
    - name: pilot
      parked: true
      gate: true
    - name: full
      parked: true
    - name: analyze
    - name: conclude
      terminal: true
  transitions:
    - from: concept
      to: ideate
      label: turn a research direction into candidate experiments
    - from: ideate
      to: expanded
      label: the concept has been expanded into hypotheses
    - from: hypothesis
      to: propose
      label: prepare the protocol and evidence plan
    - from: propose
      to: pilot
      label: protocol passes review and can run a pilot
    - from: pilot
      to: full
      label: pilot justifies the full experiment
    - from: pilot
      to: hypothesis
      label: pilot found a revisable flaw
    - from: pilot
      to: conclude
      label: pilot falsified the hypothesis
    - from: full
      to: analyze
      label: full experiment complete; interpret evidence
    - from: analyze
      to: conclude
      label: verdict and follow-up routing recorded
---
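
The states and transitions declared above form a small state machine. As a minimal sketch (not Spacedock's own implementation, whose internals this README does not describe), the table can be encoded in Python to check whether a proposed status change is declared. The names `TRANSITIONS`, `can_advance`, and `is_valid_path` are hypothetical helpers introduced for illustration.

```python
# Transition table copied from the workflow definition above.
TRANSITIONS = {
    "concept": {"ideate"},
    "ideate": {"expanded"},
    "hypothesis": {"propose"},
    "propose": {"pilot"},
    "pilot": {"full", "hypothesis", "conclude"},
    "full": {"analyze"},
    "analyze": {"conclude"},
}
INITIAL = {"concept", "hypothesis"}
TERMINAL = {"expanded", "conclude"}


def can_advance(current: str, target: str) -> bool:
    """Return True if the workflow declares a transition current -> target."""
    return target in TRANSITIONS.get(current, set())


def is_valid_path(path: list[str]) -> bool:
    """A path is valid if it starts at an initial state and every step is declared."""
    if not path or path[0] not in INITIAL:
        return False
    return all(can_advance(a, b) for a, b in zip(path, path[1:]))
```

For example, `can_advance("pilot", "hypothesis")` is true (the revisable-flaw route), while `can_advance("full", "conclude")` is false because a full run must pass through analyze.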

General Science Experiment Research Workflow

This repository is a public Spacedock workflow template for running repeatable scientific research loops. It helps a team turn broad research directions into falsifiable hypotheses, review each protocol, run a pilot before the full experiment, analyze evidence, and preserve lessons for future work.

Use this template when your project needs disciplined experiment tracking across ideas, protocol review, execution, analysis, and conclusion. It is intentionally domain-neutral: adapt the executor commands to your lab, simulator, benchmark, survey, notebook, or evaluation harness.

Use With Spacedock

Reference this public README URL when invoking spacedock:commission:

https://raw.githubusercontent.com/spacedock-dev/research-workflow-template/main/README.md

Example prompt:

Commission a new Spacedock workflow using this public workflow template:

https://raw.githubusercontent.com/spacedock-dev/research-workflow-template/main/README.md

Adapt it to my project. Keep the concept -> ideate -> expanded path for
research directions and the hypothesis -> propose -> pilot -> full -> analyze ->
conclude path for experiments. Preserve the one-independent-variable rule,
proposal review, pilot gate, artifact-level attribution, and durable learning
logs. My research area is: <brief description>. Put the generated workflow in:
docs/research.

Core Discipline

  • One independent variable per hypothesis. Change one protocol element at a time: treatment, prompt, model, reagent, parameter, dataset slice, measurement method, or analysis rule.
  • Fixed controls. Hold controls, sampling plan, runtime, instrument setup, inclusion criteria, randomization, and scoring method constant unless the hypothesis is specifically about one of them.
  • Pilot before full. A focused pilot checks whether the intervention fires, whether safety and validity checks pass, and whether the full run is worth the cost.
  • Clean audit before score. Do not trust a result until provenance, coverage, and exclusion checks are clean.
  • Evidence over assertion. Credit an effect only when the intervention reached the committed artifact or measured system.
  • Learning is an artifact. The experiment entity is the source of truth. Durable cross-experiment lessons belong in a learning log; workflow changes belong in a workflow-refinement log.
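
The one-independent-variable rule can be checked mechanically when a protocol is represented as a flat mapping. The sketch below assumes exactly that representation, which this template does not prescribe; `changed_keys` and `check_single_variable` are hypothetical helper names.

```python
def changed_keys(baseline: dict, proposed: dict) -> set[str]:
    """Return protocol keys whose values differ, including added or removed keys."""
    missing = object()  # sentinel so an absent key never equals a real value
    keys = baseline.keys() | proposed.keys()
    return {k for k in keys if baseline.get(k, missing) != proposed.get(k, missing)}


def check_single_variable(baseline: dict, proposed: dict, declared: str) -> bool:
    """True only when the declared variable is the one and only change."""
    return changed_keys(baseline, proposed) == {declared}
```

A proposal that changes `prompt` and also quietly changes `seed` fails the check, as does a proposal that changes nothing at all.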

Roles

| Role | Responsibility |
| --- | --- |
| Captain | Owns research strategy and approves gates. |
| First officer | Runs the Spacedock workflow, dispatches workers, advances state, and owns waits. |
| Ensign | Performs scoped work: ideation, protocol authoring, pilot execution, analysis, and artifact reads. |
| Gatekeeper | Reviews proposed protocols before pilot execution. |
| Executor | Runs the project-specific experiment, simulation, benchmark, study, or analysis job. |

Entities

Two entity kinds share this workflow directory:

  • Concept (exp<NNNN>-<slug>.md, kind: concept) is a research direction. It follows concept -> ideate -> expanded.
  • Hypothesis (exp<NNNN>-<slug>.md, kind: hypothesis) is one testable protocol change. It follows:
hypothesis -> propose -> pilot -> full -> analyze -> conclude
                         |
                         +-> hypothesis  (revisable flaw)
                         +-> conclude    (cleanly falsified)

This workflow uses id-style: slug, so the filename slug is the Spacedock identity. The exp<NNNN> prefix is part of the slug, not a separate generated frontmatter id.

File Naming

  • Concepts and hypotheses share one exp<NNNN> slug prefix space.
  • Do not set a separate id: field in new entities; the slug is the id.
  • Use folder form (exp<NNNN>-<slug>/index.md) only when evidence becomes too large for a single markdown file.
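
A small parser illustrates the naming rules above. It is a sketch under two assumptions the README does not state: that `<NNNN>` is exactly four digits, and that the rest of the slug is lowercase alphanumerics separated by hyphens. `parse_entity_path` is a hypothetical helper name.

```python
import re

# Assumption: <NNNN> is four digits; the remainder is lowercase words joined by hyphens.
# Accepts both the single-file form and the folder form (.../index.md).
ENTITY_RE = re.compile(
    r"^(exp(\d{4})-([a-z0-9]+(?:-[a-z0-9]+)*))(?:\.md|/index\.md)$"
)


def parse_entity_path(path: str):
    """Return (identity, number, name) for an entity path, or None if it does not match.

    identity is the full slug, including the exp<NNNN> prefix, because the
    slug is the Spacedock identity under id-style: slug.
    """
    m = ENTITY_RE.match(path)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)
```

Both `exp0012-cache-warmup.md` and `exp0012-cache-warmup/index.md` resolve to the same identity, `exp0012-cache-warmup`.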

Schema

| Field | Type | Description |
| --- | --- | --- |
| title | string | Human-readable title. |
| status | enum | concept, ideate, expanded, hypothesis, propose, pilot, full, analyze, conclude. |
| kind | enum | concept or hypothesis. |
| source | string | Where the entity came from. |
| started / completed | ISO 8601 | Start and terminal dates. |
| verdict | enum | PASSED, REJECTED, or INCONCLUSIVE at terminal state. |
| score | number | Optional priority from 0.0 to 1.0. |
| worktree | string | Optional working directory if the experiment needs one. |
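
The schema can be checked with a few lines of Python once the frontmatter has been parsed into a dictionary. This is a sketch, not a validator shipped with the template. It assumes that `title` is required, that each kind is limited to the statuses on its own path (from the Entities section), and that a verdict is mandatory at `conclude`; `validate_frontmatter` is a hypothetical name.

```python
# Statuses each kind may hold, taken from the two paths in the Entities section.
PATHS = {
    "concept": ["concept", "ideate", "expanded"],
    "hypothesis": ["hypothesis", "propose", "pilot", "full", "analyze", "conclude"],
}
VERDICTS = {"PASSED", "REJECTED", "INCONCLUSIVE"}


def validate_frontmatter(fm: dict) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    if not fm.get("title"):
        problems.append("title is missing")  # assumption: title is required
    kind, status = fm.get("kind"), fm.get("status")
    if kind not in PATHS:
        problems.append(f"kind must be concept or hypothesis, got {kind!r}")
    elif status not in PATHS[kind]:
        problems.append(f"status {status!r} is not on the {kind} path")
    verdict = fm.get("verdict")
    if verdict is not None and verdict not in VERDICTS:
        problems.append(f"verdict must be one of {sorted(VERDICTS)}")
    if status == "conclude" and verdict is None:
        problems.append("conclude requires a verdict")
    score = fm.get("score")
    if score is not None:
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            problems.append("score must be a number from 0.0 to 1.0")
    if "id" in fm:
        problems.append("do not set id; the filename slug is the identity")
    return problems
```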

Stages

concept

A broad research direction is filed.

  • Inputs: prior findings, literature gaps, failed experiments, reviewer questions, operator hunches, or a task-gap ranking.
  • Outputs: a concept entity with ## Direction, expected value, and known constraints.
  • Good: concrete enough to generate falsifiable hypotheses.
  • Bad: "improve results" without a suspected mechanism.

ideate

An ensign reads the concept, prior learnings, current baseline protocol, and available evidence, then writes 2-5 hypothesis entities.

  • Inputs: concept entity, prior conclusions, baseline method, constraints, and available executor surface.
  • Outputs: hypotheses with one independent variable, named target outcomes, controls, acceptance criteria, and expected artifact signatures.
  • Good: each hypothesis can be falsified by a pilot.
  • Bad: one large hypothesis containing several unrelated interventions.

expanded

The concept has produced candidate hypotheses and no longer needs active work.

  • Inputs: concept entity and generated hypothesis list.
  • Outputs: concept body updated with links to the generated hypotheses.
  • Good: later readers can see how the direction branched.
  • Bad: marking a concept expanded without creating or linking hypotheses.

hypothesis

A queued, fully formed hypothesis.

Each hypothesis should include:

  • ## Hypothesis with the falsifiable claim and the single change.
  • ## Independent variable naming exactly what changes.
  • ## Held constant naming controls and invariants.
  • ## Target outcomes naming primary and secondary outcomes.
  • ## Acceptance criteria with pass/fail thresholds and audit requirements.
  • ## Risk and validity notes for leakage, confounds, safety, and cost.
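
The section list above lends itself to a quick completeness check on an entity body. The sketch below matches headings by prefix, because the source leaves it unclear whether the last heading reads "Risk and validity" or "Risk and validity notes"; `missing_sections` is a hypothetical helper name.

```python
import re

# Required level-2 headings for a queued hypothesis, matched as prefixes.
REQUIRED_SECTIONS = [
    "Hypothesis",
    "Independent variable",
    "Held constant",
    "Target outcomes",
    "Acceptance criteria",
    "Risk and validity",
]


def missing_sections(body: str) -> list[str]:
    """Return the required sections that have no matching '## ' heading in body."""
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", body, re.MULTILINE)
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(h.startswith(section.lower()) for h in headings)
    ]
```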

propose (gate)

The ensign authors the protocol package, then a gatekeeper reviews it. The captain makes the final call unless autonomous approval is explicitly enabled for a clean happy path.

  • Inputs: hypothesis entity, baseline protocol, prior learnings, and domain constraints.
  • Outputs: protocol diff, pilot plan, full-run plan, frozen or versioned execution artifacts, and a ## Gatekeeper review block.
  • Good: the protocol is mechanically executable and changes only the declared independent variable.
  • Bad: hidden control changes, missing audit path, leakage, undeclared safety risk, or vague success criteria.

pilot (gate)

A focused pilot checks whether the intervention is real, measurable, and worth a full run.

  • Inputs: frozen or versioned pilot protocol.
  • Outputs: pilot run directory, audit/provenance check, outcome delta versus baseline, artifact-level evidence, and a go/revise/reject recommendation.
  • Good: the pilot exercises the changed behavior without damaging controls or canaries.
  • Bad: advancing to full because the result "looks promising" without clean audit and attribution.

full

Run the full experiment using the same protocol that passed pilot, with only the declared sample-size or coverage expansion.

  • Inputs: approved pilot protocol and full-run plan.
  • Outputs: full run directory, raw results, audit/provenance report, and headline score or effect estimate.
  • Good: pilot and full differ only in declared coverage or sample size.
  • Bad: changing method, controls, or scoring between pilot and full.

analyze

Interpret the full experiment quantitatively and mechanistically.

  • Inputs: full run artifacts, audit report, baseline comparison, target outcomes, controls, and canaries.
  • Outputs: analysis answering result, attribution, moved controls, remaining confounds, and recommended next step.
  • Good: conclusions distinguish confirmed effect, noise, underpowered signal, confound, and infrastructure failure.
  • Bad: treating a score delta as truth without checking mechanism and audit.

conclude

Write the verdict and archive or promote.

  • Inputs: analysis, acceptance criteria, audit result, and follow-up routing.
  • Outputs: final verdict, evidence summary, caveats, mechanism, next action, and one-line durable learning.
  • Good: a future researcher can tell why the hypothesis was accepted, rejected, revised, or marked inconclusive.
  • Bad: terminal state without a verdict, evidence pointer, or follow-up.

Gatekeeper Review

At propose, the gatekeeper reviews the protocol before pilot execution. Each rule receives PASS, WARN, or FAIL. A rule that cannot be evaluated is scored FAIL, with evidence naming what was missing.

| Rule | Check |
| --- | --- |
| G1 single independent variable | The proposed protocol changes exactly the variable named in the hypothesis. |
| G2 provenance and leakage guard | The protocol avoids holdout answers, hidden labels, future observations, and unauthorized references. |
| G3 controls held constant | Baseline, control group, sample definition, scoring, environment, randomization, and runtime remain fixed. |
| G4 focused pilot | The pilot includes target cases, stable controls, and canaries for broad changes. |
| G5 reproducibility | Execution artifacts are frozen, versioned, or immutable enough to rerun. |
| G6 measurement validity | Outcomes measure the claim and are not self-anchored. |
| G7 actionability | The executor can mechanically run the proposed change and observe whether it fired. |
| G8 safety, ethics, and cost | Risks are declared and bounded. |
| G9 analysis before results | Acceptance criteria, exclusions, and statistical tests are written before execution. |
| G10 follow-up routing | The proposal names how outcomes route: advance, revise, reject, or escalate. |

Recommendation:

  • APPROVE when no rules fail.
  • REVISE when failures are mechanical and fixable without changing the hypothesis.
  • REJECT when a failure compromises integrity, leakage guard, controls, safety, or the declared scientific claim.
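
The recommendation rules can be sketched as a function over per-rule results. Whether a failure is merely mechanical or compromises the experiment is a reviewer judgment, so the sketch models it as a `compromising` flag set by the gatekeeper; that flag, `RuleResult`, and `recommend` are hypothetical names, not part of Spacedock.

```python
from dataclasses import dataclass

EXPECTED_RULES = [f"G{i}" for i in range(1, 11)]  # G1 through G10


@dataclass
class RuleResult:
    rule: str                   # e.g. "G1"
    status: str                 # "PASS", "WARN", or "FAIL"
    evidence: str = ""
    compromising: bool = False  # hypothetical: failure touches integrity, leakage,
                                # controls, safety, or the scientific claim


def recommend(results: list[RuleResult]) -> str:
    """Map gatekeeper rule results to APPROVE, REVISE, or REJECT."""
    by_rule = {r.rule: r for r in results}
    # A rule with no result is unevaluable, so it counts as FAIL with evidence.
    complete = [
        by_rule.get(rule, RuleResult(rule, "FAIL", "rule was not evaluated"))
        for rule in EXPECTED_RULES
    ]
    failures = [r for r in complete if r.status == "FAIL"]
    if not failures:
        return "APPROVE"  # WARN alone does not block approval
    if any(r.compromising for r in failures):
        return "REJECT"
    return "REVISE"
```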

Executor Contract

Spacedock manages workflow state; the project supplies the command that runs the experiment. The pilot and full tiers use the same command shape and differ only in the --tier and --out arguments:

./scripts/run-experiment <hypothesis-id> --tier pilot --out runs/<hypothesis-id>/pilot
./scripts/run-experiment <hypothesis-id> --tier full  --out runs/<hypothesis-id>/full

Each run writes meta.json, protocol.md, results.json, audit.json, logs/, and artifacts/ under --out. Long-running runs that outlive an agent turn use a detached launcher and a done sentinel.

See EXECUTOR.md for the full contract: required output schemas, the pilot-vs-full rule, the detached-run handle directory, and example foreground and detached wrappers. EXECUTOR.md is the source of truth.
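
Before trusting a run, it can help to confirm that the executor wrote every required output. The sketch below checks only for presence of the files and directories named above; the schemas themselves are defined in EXECUTOR.md and are not reproduced here. `missing_outputs` is a hypothetical helper name.

```python
from pathlib import Path

REQUIRED_FILES = ["meta.json", "protocol.md", "results.json", "audit.json"]
REQUIRED_DIRS = ["logs", "artifacts"]


def missing_outputs(out_dir) -> list[str]:
    """Return the required run outputs that are absent under out_dir."""
    out = Path(out_dir)
    missing = [name for name in REQUIRED_FILES if not (out / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (out / name).is_dir()]
    return missing
```

Calling `missing_outputs("runs/<hypothesis-id>/pilot")` after a pilot returns an empty list only when all six outputs exist.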

Entity Templates

Concept:

---
title: <research direction>
status: concept
kind: concept
source:
started:
completed:
verdict:
score:
worktree:
---

## Direction

<theme, rationale, constraints, and why this direction may improve the target outcome>

Hypothesis:

---
title: <one-line hypothesis>
status: hypothesis
kind: hypothesis
source:
started:
completed:
verdict:
score:
worktree:
---

## Hypothesis

<falsifiable claim>

## Independent variable

<the one thing that changes>

## Held constant

<controls, runtime, sampling, scoring, inclusion criteria, environment>

## Target outcomes

<primary, secondary, controls/canaries>

## Acceptance criteria

<thresholds, audit requirements, attribution requirements>

## Gatekeeper review

## Pilot result

## Run result

## Analysis

## Failure Review

## Follow-up Routing

## Verdict

Learning Logs

Use a self-learning log for portable scientific lessons:

# Self-Learning Log

## Concluded Experiments

- **exp<NNNN> - PASSED/REJECTED/INCONCLUSIVE.** One-line lesson with the mechanism,
  caveat, and evidence pointer.

Use a workflow-refinement log for changes to this workflow's structure:

# Workflow-Refinement Log

## exp<NNNN> - <title>
- layer: <which workflow layer changed>
- refinement type: new-stage | reorder | replace | new-protocol | gate-rule | other
- finding: <what happened across the pilot/full run>
- learning: <transferable workflow lesson>
- bears-on: <related experiment ids>
- evidence: <entity section / run dir / artifact pointer>
- status: open | adopted-into-workflow | rejected-as-written
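
Entries in the workflow-refinement log follow a fixed field order with two enumerated fields, so they can be rendered and checked by a small helper. This is an illustrative sketch; `render_refinement_entry` is a hypothetical name and the template does not ship such a tool.

```python
REFINEMENT_TYPES = {"new-stage", "reorder", "replace", "new-protocol", "gate-rule", "other"}
STATUSES = {"open", "adopted-into-workflow", "rejected-as-written"}
FIELDS = ["layer", "refinement type", "finding", "learning", "bears-on", "evidence", "status"]


def render_refinement_entry(exp_id: str, title: str, fields: dict) -> str:
    """Render one log entry, raising ValueError on missing or invalid fields."""
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if fields["refinement type"] not in REFINEMENT_TYPES:
        raise ValueError(f"unknown refinement type: {fields['refinement type']!r}")
    if fields["status"] not in STATUSES:
        raise ValueError(f"unknown status: {fields['status']!r}")
    lines = [f"## {exp_id} - {title}"]
    lines += [f"- {name}: {fields[name]}" for name in FIELDS]
    return "\n".join(lines) + "\n"
```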

Maintainer Notes

  • Keep this README free of private paths, private benchmark names, and machine-specific commands.
  • Prefer stable branch or versioned tag URLs when sharing the template.
  • If the template changes incompatibly, publish a new versioned URL instead of silently changing old behavior.
