
The Ascent: One Training Cycle

Each training cycle is called The Ascent. The model that runs signal extraction on Monday is not the same model that runs it on Friday. By the time the cycle completes, every verified decision from the previous run has been absorbed into the weights. The system compounds.

This page walks through what happens inside a single Ascent — the mechanics that turn market outcomes into a smarter model.


The current fine-tuned model runs inference across the historical window using quantized weights loaded locally. For each candle in the window, three knight agents evaluate independent views of the market state.

The raw inputs are sanitized features — no dates, no absolute timestamps. The model sees price behavior and derived indicators, not calendar context. This prevents the model from learning that specific calendar periods were historically favorable rather than structurally favorable.
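As a rough illustration of the sanitization idea, the sketch below strips calendar-revealing fields from a candle record and converts absolute closes into relative returns. The field names (`timestamp`, `date`, and so on) and both helper functions are hypothetical; the actual feature schema is not published.

```python
from typing import Any

# Hypothetical field names; the real feature schema is an assumption here.
CALENDAR_FIELDS = {"timestamp", "date", "datetime", "open_time", "close_time"}


def sanitize_candle(candle: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the candle without calendar-revealing fields."""
    return {k: v for k, v in candle.items() if k not in CALENDAR_FIELDS}


def relative_returns(closes: list[float]) -> list[float]:
    """Convert absolute closes into period-over-period returns."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


if __name__ == "__main__":
    raw = {"timestamp": 1700000000, "open": 100.0, "close": 110.0, "rsi": 61.2}
    print(sanitize_candle(raw))
    print(relative_returns([100.0, 110.0, 99.0]))
```

Working with returns rather than price levels is one way to keep the model focused on behavior instead of on levels that could identify a specific period.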

Each agent emits a structured vote: BUY, SELL, or HOLD, with reasoning attached. Votes are written to per-role signal files. The extractor is designed to run as a single instance, because concurrent writers corrupt the signal log.
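One way to picture the vote format and the single-writer rule is the sketch below. The `Vote` fields, the JSON Lines file layout, and the lock-file approach are all assumptions made for illustration, not the system's actual implementation.

```python
import json
import os
from contextlib import contextmanager
from dataclasses import asdict, dataclass
from pathlib import Path

ACTIONS = {"BUY", "SELL", "HOLD"}


@dataclass(frozen=True)
class Vote:
    role: str          # hypothetical role name, e.g. "knight_a"
    candle_index: int  # position in the window, not a calendar timestamp
    action: str
    reasoning: str

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


@contextmanager
def single_writer(lock_path: Path):
    """Hold an exclusive lock file; raise FileExistsError if it is taken."""
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def append_vote(signal_dir: Path, vote: Vote) -> Path:
    """Append one vote as a JSON line to the file for the vote's role."""
    path = signal_dir / f"{vote.role}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(vote)) + "\n")
    return path
```

A second extractor that tries to enter `single_writer` while the first is running fails immediately instead of interleaving writes. A lock file left behind by a crashed process would need manual cleanup in this simple version.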


After extraction, market outcomes are attached to each decision. The price movement following the decision window becomes the ground truth label. The labeling layer applies outcome-driven correction rules: correct decisions are reinforced, incorrect decisions are relabeled to reflect what the market actually demanded.

This produces a dataset where the market — not a human annotator — is the judge. The specific thresholds and relabeling logic are the proprietary core of the training loop. What can be said publicly: wrong calls get corrected, right calls get reinforced, and ambiguous calls get resolved toward the action the market would have rewarded.
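The general shape of outcome-driven labeling can be sketched as follows. This is explicitly not the proprietary logic: the single symmetric `threshold`, its default value, and both function names are placeholders chosen only to make the reinforce-or-relabel idea concrete.

```python
def market_action(forward_return: float, threshold: float = 0.01) -> str:
    """Map the price move after the decision window to the action it rewarded.

    The threshold is a placeholder; the real rules are not public.
    """
    if forward_return > threshold:
        return "BUY"
    if forward_return < -threshold:
        return "SELL"
    return "HOLD"


def label_decision(vote: str, forward_return: float, threshold: float = 0.01) -> dict:
    """Attach the market-derived label and record whether the vote was corrected."""
    label = market_action(forward_return, threshold)
    return {
        "vote": vote,
        "label": label,              # what the model is trained toward
        "relabeled": vote != label,  # False means the original call is reinforced
    }


if __name__ == "__main__":
    print(label_decision("BUY", 0.025))   # correct call, reinforced
    print(label_decision("BUY", -0.030))  # wrong call, relabeled to SELL
```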


Labeled signals are split into train and validation sets. A regime-balancing pass corrects for market condition bias in the training window — if the window skews heavily toward one market regime, the dataset is rebalanced before training begins.
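A minimal sketch of regime balancing, assuming each labeled row carries a `regime` tag (a hypothetical field) and that balancing is done by downsampling every regime to the size of the smallest one. The actual pass may use a different strategy, such as reweighting or oversampling.

```python
import random
from collections import defaultdict


def rebalance_by_regime(rows: list[dict], seed: int = 0) -> list[dict]:
    """Downsample so every regime contributes the same number of rows."""
    rng = random.Random(seed)
    by_regime: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_regime[row["regime"]].append(row)
    if not by_regime:
        return []

    # The smallest regime sets the per-regime cap.
    cap = min(len(group) for group in by_regime.values())
    balanced: list[dict] = []
    for group in by_regime.values():
        balanced.extend(rng.sample(group, cap))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    rows = [{"regime": "bull", "id": i} for i in range(6)]
    rows += [{"regime": "bear", "id": i} for i in range(2)]
    print(len(rebalance_by_regime(rows)))  # 2 bull + 2 bear
```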

The preparation stage also applies a difficulty-weighted sampling pass: decisions the model has already mastered contribute less training signal. Compute concentrates on the decisions the model hasn’t solved yet.
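Difficulty-weighted sampling can be sketched like this, assuming each row stores `p_correct`, the probability the current model assigns to the correct label (a hypothetical field). The linear weight and the `floor` value are illustrative choices, not the system's actual weighting.

```python
import random


def difficulty_weight(p_correct: float, floor: float = 0.05) -> float:
    """Weight shrinks as the model's confidence in the right answer grows."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be in [0, 1]")
    # The floor keeps mastered examples from disappearing entirely.
    return max(floor, 1.0 - p_correct)


def sample_batch(rows: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Draw k rows with replacement, favoring decisions not yet solved."""
    rng = random.Random(seed)
    weights = [difficulty_weight(row["p_correct"]) for row in rows]
    return rng.choices(rows, weights=weights, k=k)


if __name__ == "__main__":
    rows = [{"id": "easy", "p_correct": 0.98}, {"id": "hard", "p_correct": 0.30}]
    batch = sample_batch(rows, k=1000)
    print(sum(1 for r in batch if r["id"] == "hard"))  # large majority of draws
```

Keeping a small floor rather than a zero weight is a common guard against forgetting decisions the model has already learned.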


Training runs on a cloud accelerator via a burst pod. The base model is loaded, a LoRA adapter is trained on the labeled dataset, and the adapter is merged back into the full model weights.
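For reference, the standard LoRA formulation shows what "merged back" means. The rank $r$ and scaling factor $\alpha$ used in the Ascent are not stated on this page, so the symbols below are generic:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

During training only $A$ and $B$ are updated while the base weights $W_0 \in \mathbb{R}^{d \times k}$ stay frozen. After the merge, inference uses a single weight matrix, so the adapter adds no extra cost at run time.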

Knights use compact dense models optimized for throughput. The King uses a large thinking-capable model optimized for deliberation. Different roles have permanently different inference profiles — the tracks are built differently from the start.

A watchdog monitors the pod, tracks spend, and tears down the instance automatically when training completes.
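The watchdog's decision rule might look something like the sketch below. The `PodStatus` fields are hypothetical, and the budget cap is an added assumption: the text only states that spend is tracked and that teardown happens when training completes.

```python
from dataclasses import dataclass


@dataclass
class PodStatus:
    training_done: bool
    hours_running: float
    hourly_rate_usd: float

    @property
    def spend_usd(self) -> float:
        return self.hours_running * self.hourly_rate_usd


def should_teardown(status: PodStatus, budget_usd: float) -> tuple[bool, str]:
    """Decide whether to tear the pod down, with a human-readable reason."""
    if status.training_done:
        return True, "training complete"
    # Assumed safety net: stop a runaway pod once spend reaches the budget.
    if status.spend_usd >= budget_usd:
        return True, "budget exceeded"
    return False, "running"


if __name__ == "__main__":
    print(should_teardown(PodStatus(False, 2.0, 3.0), budget_usd=20.0))
    print(should_teardown(PodStatus(True, 2.0, 3.0), budget_usd=20.0))
```

In a real deployment this function would be polled in a loop, and a positive result would trigger the cloud provider's own teardown call.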


The merged model is immediately quantized for local inference. The quantized artifact is what ships back to the local machine. Full-precision weights are not kept — only the quantized model and the adapter checkpoint.

Each cycle’s model is archived with a cycle label before the next cycle overwrites the default. This archive is critical — without it, prior model generations are unrecoverable.
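Archiving before overwrite can be as simple as the following sketch. The file naming scheme (`.cycle-NNN` inserted before the suffix) and the function name are assumptions for illustration.

```python
import shutil
from pathlib import Path


def archive_cycle(default_model: Path, archive_dir: Path, cycle: int) -> Path:
    """Copy the current default model to a cycle-labeled archive file."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{default_model.stem}.cycle-{cycle:03d}{default_model.suffix}"
    # Refuse to clobber an existing archive: prior generations must stay intact.
    if target.exists():
        raise FileExistsError(f"archive already exists: {target}")
    shutil.copy2(default_model, target)
    return target
```

Calling this as the first step of each new cycle, before the fresh artifact is written to the default path, keeps every prior generation recoverable.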


On the next cycle, the new quantized model (a GGUF file) replaces the old one as the extractor and runs signal extraction over the same window. Because the model is now better at identifying the patterns it previously missed, the new round of signals is higher quality: votes are more precise, and fewer relabeling corrections are needed.

Better model → better signal → better training data → better next model.

This is the compounding loop. It does not converge quickly. Early cycles are noisy. By the tenth Ascent, the institutional memory embedded in the weights is meaningfully different from the base model it started from.


The Canon is the institutional record that survives model death. When a model is replaced, the Canon remains: verified decisions, confirmed patterns, the Scribe’s ledger. The next generation inherits the accumulated knowledge, not just the weights.

This is what distinguishes the Tribunal from a standard fine-tuning loop. The loop is not about making one model good. It is about building a self-improving institution — one that grows stronger across generations.