General Science Experiment Research Workflow

---
commissioned-by: spacedock@template
entity-type: experiment
entity-label: experiment
entity-label-plural: experiments
id-style: slug
state: $inline
stages:
  defaults:
    worktree: false
    concurrency: 1
  states:
    - name: concept
      initial: true
    - name: ideate
    - name: expanded
      terminal: true
    - name: hypothesis
      initial: true
    - name: propose
      gate: true
    - name: pilot
      parked: true
      gate: true
    - name: full
      parked: true
    - name: analyze
    - name: conclude
      terminal: true
  transitions:
    - from: concept
      to: ideate
      label: turn a research direction into candidate experiments
    - from: ideate
      to: expanded
      label: the concept has been expanded into hypotheses
    - from: hypothesis
      to: propose
      label: prepare the protocol and evidence plan
    - from: propose
      to: pilot
      label: protocol passes review and can run a pilot
    - from: pilot
      to: full
      label: pilot justifies the full experiment
    - from: pilot
      to: hypothesis
      label: pilot found a revisable flaw
    - from: pilot
      to: conclude
      label: pilot falsified the hypothesis
    - from: full
      to: analyze
      label: full experiment complete; interpret evidence
    - from: analyze
      to: conclude
      label: verdict and follow-up routing recorded
---

This repository is a public Spacedock workflow template for running repeatable scientific research loops. It helps a team turn broad research directions into falsifiable hypotheses, review each protocol, run a pilot before the full experiment, analyze evidence, and preserve lessons for future work.

Use this template when your project needs disciplined experiment tracking across ideas, protocol review, execution, analysis, and conclusion. It is intentionally domain-neutral: adapt the executor commands to your lab, simulator, benchmark, survey, notebook, or evaluation harness.

Use With Spacedock

Reference this public README URL when invoking `spacedock:commission`:

https://raw.githubusercontent.com/spacedock-dev/research-workflow-template/main/README.md

Example prompt:

Commission a new Spacedock workflow using this public workflow template:

https://raw.githubusercontent.com/spacedock-dev/research-workflow-template/main/README.md

Adapt it to my project. Keep the concept -> ideate -> expanded path for
research directions and the hypothesis -> propose -> pilot -> full -> analyze ->
conclude path for experiments. Preserve the one-independent-variable rule,
proposal review, pilot gate, artifact-level attribution, and durable learning
logs. My research area is: <brief description>. Put the generated workflow in:
docs/research.

Core Discipline

One independent variable per hypothesis. Change one protocol element at a time: treatment, prompt, model, reagent, parameter, dataset slice, measurement method, or analysis rule.
Fixed controls. Hold controls, sampling plan, runtime, instrument setup, inclusion criteria, randomization, and scoring method constant unless the hypothesis is specifically about one of them.
Pilot before full. A focused pilot checks whether the intervention fires, whether safety and validity checks pass, and whether the full run is worth the cost.
Clean audit before score. Do not trust a result until provenance, coverage, and exclusion checks are clean.
Evidence over assertion. Credit an effect only when the intervention reached the committed artifact or measured system.
Learning is an artifact. The experiment entity is the source of truth. Durable cross-experiment lessons belong in a learning log; workflow changes belong in a workflow-refinement log.
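
The one-independent-variable rule can be checked mechanically if a protocol is written down as a flat mapping of named elements. The sketch below assumes that representation. The key names (`model`, `prompt`, `scoring`, `seed`) and the helpers `changed_keys` and `changes_only` are illustrative and are not part of Spacedock or this template.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel so an absent key never equals a real value


def changed_keys(baseline: Dict[str, Any], proposed: Dict[str, Any]) -> List[str]:
    """Return the sorted protocol keys whose values differ between two protocols."""
    keys = set(baseline) | set(proposed)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != proposed.get(k, _MISSING)
    )


def changes_only(baseline: Dict[str, Any], proposed: Dict[str, Any],
                 independent_variable: str) -> bool:
    """True when the declared independent variable is the only key that changed."""
    return changed_keys(baseline, proposed) == [independent_variable]


# Hypothetical protocols: only the prompt differs from the baseline.
baseline = {"model": "m-base", "prompt": "v1", "scoring": "exact-match", "seed": 7}
proposed = {**baseline, "prompt": "v2"}

print(changed_keys(baseline, proposed))
print(changes_only(baseline, proposed, "prompt"))
```

A proposal that also changed `seed` would return two keys from `changed_keys`, which is the kind of hidden control change the gatekeeper is meant to catch.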

Roles

Role	Responsibility
Captain	Owns research strategy and approves gates.
First officer	Runs the Spacedock workflow, dispatches workers, advances state, and owns waits.
Ensign	Performs scoped work: ideation, protocol authoring, pilot execution, analysis, and artifact reads.
Gatekeeper	Reviews proposed protocols before pilot execution.
Executor	Runs the project-specific experiment, simulation, benchmark, study, or analysis job.

Entities

Two entity kinds share this workflow directory:

Concept (exp<NNNN>-<slug>.md, kind: concept) is a research direction. It follows concept -> ideate -> expanded.
Hypothesis (exp<NNNN>-<slug>.md, kind: hypothesis) is one testable protocol change. It follows:

hypothesis -> propose -> pilot -> full -> analyze -> conclude
                         |
                         +-> hypothesis  (revisable flaw)
                         +-> conclude    (cleanly falsified)
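
The two paths above, together with the transitions declared by the template, can be written as a lookup table. This is an illustrative sketch, not how Spacedock stores or enforces state; `TRANSITIONS`, `can_advance`, and `is_valid_path` are hypothetical names.

```python
from typing import List

# Allowed moves, copied from the template's declared transitions.
TRANSITIONS = {
    "concept": {"ideate"},
    "ideate": {"expanded"},
    "hypothesis": {"propose"},
    "propose": {"pilot"},
    "pilot": {"full", "hypothesis", "conclude"},
    "full": {"analyze"},
    "analyze": {"conclude"},
}
INITIAL = {"concept", "hypothesis"}
TERMINAL = {"expanded", "conclude"}


def can_advance(current: str, target: str) -> bool:
    """True when the template declares a transition from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_valid_path(path: List[str]) -> bool:
    """True when the path starts at an initial state and every step is declared."""
    if not path or path[0] not in INITIAL:
        return False
    return all(can_advance(a, b) for a, b in zip(path, path[1:]))


# A hypothesis whose pilot found a revisable flaw, then passed on the second try.
revised = ["hypothesis", "propose", "pilot", "hypothesis", "propose", "pilot", "full"]
print(is_valid_path(revised))
print(can_advance("propose", "full"))
```

Terminal states have no entry in `TRANSITIONS`, so `can_advance` returns `False` for any move out of `expanded` or `conclude`.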

This workflow uses id-style: slug, so the filename slug is the Spacedock identity. The exp<NNNN> prefix is part of the slug, not a separate generated frontmatter id.

File Naming

Concepts and hypotheses share one exp<NNNN> slug prefix space.
Do not set a separate id: field in new entities; the slug is the id.
Use folder form (exp<NNNN>-<slug>/index.md) only when evidence becomes too large for a single markdown file.
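
Both file forms resolve to the same slug. The sketch below shows one way to recover it from a path. It assumes `<NNNN>` is exactly four digits and `<slug>` is lowercase words joined by hyphens; the README does not state either constraint, and `entity_slug` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: four-digit number, then lowercase alphanumeric words joined by hyphens.
ENTITY_SLUG = re.compile(r"exp\d{4}-[a-z0-9]+(?:-[a-z0-9]+)*")


def entity_slug(path: str) -> Optional[str]:
    """Return the entity slug for a single-file or folder-form path, else None."""
    p = PurePosixPath(path)
    if p.name == "index.md":
        candidate = p.parent.name  # folder form: exp<NNNN>-<slug>/index.md
    elif p.suffix == ".md":
        candidate = p.stem         # single-file form: exp<NNNN>-<slug>.md
    else:
        return None
    return candidate if ENTITY_SLUG.fullmatch(candidate) else None


# Hypothetical paths for illustration.
print(entity_slug("docs/research/exp0007-warm-start.md"))
print(entity_slug("docs/research/exp0007-warm-start/index.md"))
print(entity_slug("docs/research/README.md"))
```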

Schema

Field	Type	Description
`title`	string	Human-readable title.
`status`	enum	`concept`, `ideate`, `expanded`, `hypothesis`, `propose`, `pilot`, `full`, `analyze`, `conclude`.
`kind`	enum	`concept` or `hypothesis`.
`source`	string	Where the entity came from.
`started` / `completed`	ISO 8601	Start and terminal dates.
`verdict`	enum	`PASSED`, `REJECTED`, or `INCONCLUSIVE` at terminal state.
`score`	number	Optional priority from 0.0 to 1.0.
`worktree`	string	Optional working directory if the experiment needs one.
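
The schema can be checked before an entity is filed. The sketch below works on frontmatter that has already been parsed into a dictionary, so it needs no YAML library. `frontmatter_problems` is a hypothetical helper, and requiring a verdict at `conclude` is one reading of the `verdict` row and the `conclude` stage guidance, not a rule Spacedock is known to enforce.

```python
from typing import Any, Dict, List

STATUSES = {"concept", "ideate", "expanded", "hypothesis", "propose",
            "pilot", "full", "analyze", "conclude"}
KINDS = {"concept", "hypothesis"}
VERDICTS = {"PASSED", "REJECTED", "INCONCLUSIVE"}


def frontmatter_problems(fm: Dict[str, Any]) -> List[str]:
    """Return human-readable schema problems; an empty list means none were found."""
    problems = []
    title = fm.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("title must be a non-empty string")
    if fm.get("status") not in STATUSES:
        problems.append("status is not a known stage")
    if fm.get("kind") not in KINDS:
        problems.append("kind must be concept or hypothesis")
    verdict = fm.get("verdict")
    if verdict is not None and verdict not in VERDICTS:
        problems.append("verdict must be PASSED, REJECTED, or INCONCLUSIVE")
    if fm.get("status") == "conclude" and verdict is None:
        problems.append("conclude requires a verdict")  # assumption, see lead-in
    score = fm.get("score")
    if score is not None:
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            problems.append("score must be a number from 0.0 to 1.0")
    return problems


# Hypothetical entity that reached conclude without recording a verdict.
entity = {"title": "Example hypothesis", "status": "conclude",
          "kind": "hypothesis", "verdict": None, "score": 0.4}
print(frontmatter_problems(entity))
```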

Stages

`concept`

A broad research direction is filed.

Inputs: prior findings, literature gaps, failed experiments, reviewer questions, operator hunches, or a task-gap ranking.
Outputs: a concept entity with `## Direction`, expected value, and known constraints.
Good: concrete enough to generate falsifiable hypotheses.
Bad: "improve results" without a suspected mechanism.

`ideate`

An ensign reads the concept, prior learnings, current baseline protocol, and available evidence, then writes 2-5 hypothesis entities.

Inputs: concept entity, prior conclusions, baseline method, constraints, and available executor surface.
Outputs: hypotheses with one independent variable, named target outcomes, controls, acceptance criteria, and expected artifact signatures.
Good: each hypothesis can be falsified by a pilot.
Bad: one large hypothesis containing several unrelated interventions.

`expanded`

The concept has produced candidate hypotheses and no longer needs active work.

Inputs: concept entity and generated hypothesis list.
Outputs: concept body updated with links to the generated hypotheses.
Good: later readers can see how the direction branched.
Bad: marking a concept expanded without creating or linking hypotheses.

`hypothesis`

A queued, fully formed hypothesis.

Each hypothesis should include:

`## Hypothesis` with the falsifiable claim and the single change.
`## Independent variable` naming exactly what changes.
`## Held constant` naming controls and invariants.
`## Target outcomes` naming primary and secondary outcomes.
`## Acceptance criteria` with pass/fail thresholds and audit requirements.
`## Risk and validity notes` for leakage, confounds, safety, and cost.

`propose` (gate)

The ensign authors the protocol package, then a gatekeeper reviews it. The captain makes the final call unless autonomous approval is explicitly enabled for a clean happy path.

Inputs: hypothesis entity, baseline protocol, prior learnings, and domain constraints.
Outputs: protocol diff, pilot plan, full-run plan, frozen or versioned execution artifacts, and a `## Gatekeeper review` block.
Good: the protocol is mechanically executable and changes only the declared independent variable.
Bad: hidden control changes, missing audit path, leakage, undeclared safety risk, or vague success criteria.

`pilot` (gate)

A focused pilot checks whether the intervention is real, measurable, and worth a full run.

Inputs: frozen or versioned pilot protocol.
Outputs: pilot run directory, audit/provenance check, outcome delta versus baseline, artifact-level evidence, and a go/revise/reject recommendation.
Good: the pilot exercises the changed behavior without damaging controls or canaries.
Bad: advancing to full because the result "looks promising" without clean audit and attribution.

`full`

Run the full experiment using the same protocol that passed pilot, with only the declared sample-size or coverage expansion.

Inputs: approved pilot protocol and full-run plan.
Outputs: full run directory, raw results, audit/provenance report, and headline score or effect estimate.
Good: pilot and full differ only in declared coverage or sample size.
Bad: changing method, controls, or scoring between pilot and full.

`analyze`

Interpret the full experiment quantitatively and mechanistically.

Inputs: full run artifacts, audit report, baseline comparison, target outcomes, controls, and canaries.
Outputs: analysis answering result, attribution, moved controls, remaining confounds, and recommended next step.
Good: conclusions distinguish confirmed effect, noise, underpowered signal, confound, and infrastructure failure.
Bad: treating a score delta as truth without checking mechanism and audit.

`conclude`

Write the verdict and archive or promote.

Inputs: analysis, acceptance criteria, audit result, and follow-up routing.
Outputs: final verdict, evidence summary, caveats, mechanism, next action, and one-line durable learning.
Good: a future researcher can tell why the hypothesis was accepted, rejected, revised, or marked inconclusive.
Bad: terminal state without a verdict, evidence pointer, or follow-up.

Gatekeeper Review

At propose, review the protocol before pilot execution. Each rule receives PASS, WARN, or FAIL. A rule that cannot be evaluated is marked FAIL, with evidence naming what was missing.

Rule	Check
G1 single independent variable	The proposed protocol changes exactly the variable named in the hypothesis.
G2 provenance and leakage guard	The protocol avoids holdout answers, hidden labels, future observations, and unauthorized references.
G3 controls held constant	Baseline, control group, sample definition, scoring, environment, randomization, and runtime remain fixed.
G4 focused pilot	The pilot includes target cases, stable controls, and canaries for broad changes.
G5 reproducibility	Execution artifacts are frozen, versioned, or immutable enough to rerun.
G6 measurement validity	Outcomes measure the claim and are not self-anchored.
G7 actionability	The executor can mechanically run the proposed change and observe whether it fired.
G8 safety, ethics, and cost	Risks are declared and bounded.
G9 analysis before results	Acceptance criteria, exclusions, and statistical tests are written before execution.
G10 follow-up routing	The proposal names how outcomes route: advance, revise, reject, or escalate.

Recommendation:

APPROVE when no rules fail.
REVISE when failures are mechanical and fixable without changing the hypothesis.
REJECT when a failure compromises integrity, leakage guard, controls, safety, or the declared scientific claim.
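
The recommendation logic can be sketched as a small function over per-rule results. `RuleResult` and its `compromising` flag are hypothetical: the flag stands in for the reviewer's judgment that a failure touches integrity, leakage guard, controls, safety, or the declared claim. Treating every other failure as REVISE is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleResult:
    rule: str                   # for example "G1"
    outcome: str                # "PASS", "WARN", or "FAIL"
    evidence: str = ""
    compromising: bool = False  # hypothetical flag set by the gatekeeper


def recommend(results: List[RuleResult]) -> str:
    """Map per-rule outcomes to APPROVE, REVISE, or REJECT."""
    failures = [r for r in results if r.outcome == "FAIL"]
    if not failures:
        return "APPROVE"  # WARN outcomes alone do not block approval
    if any(r.compromising for r in failures):
        return "REJECT"
    return "REVISE"       # assumption: remaining failures are mechanical and fixable


# Hypothetical review: one rule could not be evaluated, so it is recorded as FAIL.
review = [
    RuleResult("G1", "PASS"),
    RuleResult("G4", "WARN", evidence="no canaries listed"),
    RuleResult("G5", "FAIL", evidence="execution artifacts not versioned"),
]
print(recommend(review))
```

Checking compromising failures before mechanical ones means a single integrity failure yields REJECT even when the other failures are fixable.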

Executor Contract

Spacedock manages workflow state; the project supplies the command that runs the experiment. Both tiers call one command shape:

./scripts/run-experiment <hypothesis-id> --tier pilot --out runs/<hypothesis-id>/pilot
./scripts/run-experiment <hypothesis-id> --tier full  --out runs/<hypothesis-id>/full

Each run writes meta.json, protocol.md, results.json, audit.json, logs/, and artifacts/ under --out. Long-running runs that outlive an agent turn use a detached launcher and a done sentinel.

See EXECUTOR.md for the full contract: required output schemas, the pilot-vs-full rule, the detached-run handle directory, and example foreground and detached wrappers. EXECUTOR.md is the source of truth.
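
A run directory can be checked for the outputs listed above before the audit is trusted. The sketch below tests presence only; the required schemas live in EXECUTOR.md and are not reproduced here. `missing_outputs` is a hypothetical helper, not part of the executor contract.

```python
import tempfile
from pathlib import Path
from typing import List

REQUIRED_FILES = ("meta.json", "protocol.md", "results.json", "audit.json")
REQUIRED_DIRS = ("logs", "artifacts")


def missing_outputs(out_dir) -> List[str]:
    """Return the required run outputs that are absent under out_dir."""
    out = Path(out_dir)
    missing = [name for name in REQUIRED_FILES if not (out / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (out / name).is_dir()]
    return missing


# Demonstration against a temporary directory standing in for a real --out path.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "meta.json").write_text("{}")
    (run_dir / "logs").mkdir()
    print(missing_outputs(run_dir))
```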

Entity Templates

Concept:

---
title: <research direction>
status: concept
kind: concept
source:
started:
completed:
verdict:
score:
worktree:
---

## Direction

<theme, rationale, constraints, and why this direction may improve the target outcome>

Hypothesis:

---
title: <one-line hypothesis>
status: hypothesis
kind: hypothesis
source:
started:
completed:
verdict:
score:
worktree:
---

## Hypothesis

<falsifiable claim>

## Independent variable

<the one thing that changes>

## Held constant

<controls, runtime, sampling, scoring, inclusion criteria, environment>

## Target outcomes

<primary, secondary, controls/canaries>

## Acceptance criteria

<thresholds, audit requirements, attribution requirements>

## Gatekeeper review

## Pilot result

## Run result

## Analysis

## Failure Review

## Follow-up Routing

## Verdict

Learning Logs

Use a self-learning log for portable scientific lessons:

# Self-Learning Log

## Concluded Experiments

- **exp<NNNN> - PASSED/REJECTED/INCONCLUSIVE.** One-line lesson with the mechanism,
  caveat, and evidence pointer.

Use a workflow-refinement log for changes to this workflow's structure:

# Workflow-Refinement Log

## exp<NNNN> - <title>
- layer: <which workflow layer changed>
- refinement type: new-stage | reorder | replace | new-protocol | gate-rule | other
- finding: <what happened across the pilot/full run>
- learning: <transferable workflow lesson>
- bears-on: <related experiment ids>
- evidence: <entity section / run dir / artifact pointer>
- status: open | adopted-into-workflow | rejected-as-written
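
Entries in the workflow-refinement log follow a fixed shape, so they can be rendered from structured fields. The sketch below is one way to do that. `render_refinement_entry` is a hypothetical helper, and the example values are placeholders rather than real findings.

```python
from typing import List

REFINEMENT_TYPES = ("new-stage", "reorder", "replace", "new-protocol", "gate-rule", "other")
LOG_STATUSES = ("open", "adopted-into-workflow", "rejected-as-written")


def render_refinement_entry(exp_id: str, title: str, layer: str, refinement_type: str,
                            finding: str, learning: str, bears_on: List[str],
                            evidence: str, status: str = "open") -> str:
    """Render one workflow-refinement entry, rejecting unknown enum values."""
    if refinement_type not in REFINEMENT_TYPES:
        raise ValueError(f"unknown refinement type: {refinement_type}")
    if status not in LOG_STATUSES:
        raise ValueError(f"unknown status: {status}")
    lines = [
        f"## {exp_id} - {title}",
        f"- layer: {layer}",
        f"- refinement type: {refinement_type}",
        f"- finding: {finding}",
        f"- learning: {learning}",
        f"- bears-on: {', '.join(bears_on)}",
        f"- evidence: {evidence}",
        f"- status: {status}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
entry = render_refinement_entry(
    "exp0001", "example title", layer="gatekeeper review",
    refinement_type="gate-rule", finding="<what happened>",
    learning="<transferable lesson>", bears_on=["exp0002"],
    evidence="<run dir or artifact pointer>",
)
print(entry)
```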

Maintainer Notes

Keep this README free of private paths, private benchmark names, and machine-specific commands.
Prefer stable branch or versioned tag URLs when sharing the template.
If the template changes incompatibly, publish a new versioned URL instead of silently changing old behavior.
