Cortex v1.0: Internal-State Modulation for Robot Policies

Cortex is an experiment in giving robot policies an explicit, interpretable execution state: values such as urgency, caution, uncertainty, attention, progress deficit, recovery phase, and push direction.

The first Go1 result is deliberately narrow. A trained multi-mode Go1 policy can walk normally or enter a low-stance crouch/brace mode from an explicit posture_command. When we add a compact Cortex state, the same policy can use that state as recovery context, and a prototype selector can turn high-risk moments into temporary crouch requests.

The core result is:

Cortex can expose an interpretable internal execution state that modulates how the same trained policy executes the same command.

At the company level, Cortex points toward an adaptive control layer for physical AI: a way for robot policies to change execution style based on risk, uncertainty, recovery state, and task context while keeping those changes legible to engineers and operators.

For Go1, that modulation is:

normal walk <-> crouch / brace walk
[Figure: Cortex execution loop for Go1 push recovery. The demonstrated loop: Cortex exposes a compact execution state, the policy receives a mode request, and the robot returns to normal walking after recovery.]

Why Now

Robot policies are becoming more capable, but deployment still needs more than raw action generation. Real systems need interpretable execution state, adaptive safety behavior, controllable recovery modes, and enough transparency for operators to trust what the robot is doing.

Cortex is Daionics' first step toward that layer: not another single-task controller, but an interface between learned robot competence and the execution context needed to deploy it.

Why Go1

Locomotion is not the final target for Cortex. It is a clean first test because execution state matters: a robot that is walking normally should not behave the same way when it is being hit, tipping, or recovering from a disturbance.

The research question was simple:

Can an interpretable internal state modulate how the same locomotion policy executes a task?

For this experiment, the execution-mode knob was a posture command:

posture_command = 0.0 -> normal walk
posture_command = 1.0 -> low-stance / crouch / brace walk

Put plainly, the question is whether a policy can remain the same policy while changing its execution style as the internal state changes.

What Was Learned

The neural policy learned the motor behavior. It was trained with PPO to map robot observations, posture command, and optionally the Cortex state into joint actions.
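As a rough illustration of that input mapping, the sketch below assembles one policy input from robot observations, the posture command, and an optional Cortex state. The field names follow the values listed in the introduction, but the `CortexState` class, the `build_policy_input` helper, the two-component push direction, the field ordering, and the choice to fill a missing Cortex state with zeros are all assumptions made for this example, not the actual Cortex layout.

```python
from dataclasses import dataclass, astuple
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class CortexState:
    """Hypothetical container for the Cortex values named in the introduction."""
    urgency: float = 0.0
    caution: float = 0.0
    uncertainty: float = 0.0
    attention: float = 0.0
    progress_deficit: float = 0.0
    recovery_phase: float = 0.0
    push_dir_x: float = 0.0  # assumed: push direction encoded as a planar 2D vector
    push_dir_y: float = 0.0


CORTEX_DIM = 8  # number of fields above; an assumption of this sketch


def build_policy_input(
    robot_obs: Sequence[float],
    posture_command: float,
    cortex: Optional[CortexState] = None,
) -> List[float]:
    """Concatenate robot observations, the posture command, and the Cortex slot."""
    # Assumed range: 0.0 requests normal walk, 1.0 requests crouch/brace walk.
    if not 0.0 <= posture_command <= 1.0:
        raise ValueError("posture_command is expected in [0.0, 1.0]")
    # Assumed convention: a missing Cortex state is represented by a zeroed slot.
    slot = [0.0] * CORTEX_DIM if cortex is None else list(astuple(cortex))
    return list(robot_obs) + [posture_command] + slot
```

A call such as `build_policy_input(obs, 1.0, CortexState(caution=0.8))` would request crouch/brace walking while passing a cautious execution context to the same policy.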

There is no hand-coded rule like "if recovery phase is high, move this leg this way." If the robot recovers differently when Cortex is live, that difference comes from the learned action mapping using the Cortex inputs.

The policy learned two important interfaces:

  1. It learned to execute both normal walking and crouch/brace walking from the posture command.
  2. In the Cortex-conditioned checkpoint, it learned to use the additional Cortex state as part of push recovery.

Prototype Boundary

This first result uses a prototype Cortex state and a prototype mode selector. That boundary matters: the policy learned the motor skills and learned to condition on Cortex inputs, while the current selector is still a scaffold for testing the interface.

The useful thing to share here is not the exact implementation recipe. The useful thing is the separation:

Cortex state -> execution context
policy -> learned motor behavior
mode request -> temporary change in execution style

This result does not claim that Cortex has learned the final mode-selection strategy. It shows that learned motor competence can be modulated through an interpretable state layer. Future versions should make more of this layer learned, richer, and task-dependent.
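To make the separation concrete, here is a minimal sketch of what a scaffold selector of this kind could look like. It is not the selector used in the experiment, which this post does not describe. The risk score, the thresholds, the minimum hold time, and the names `risk_from_cortex` and `HysteresisSelector` are all hypothetical.

```python
from dataclasses import dataclass


def risk_from_cortex(caution: float, uncertainty: float, recovery_phase: float) -> float:
    """Hypothetical scalar risk: the largest of three Cortex values (assumed in [0, 1])."""
    return max(caution, uncertainty, recovery_phase)


@dataclass
class HysteresisSelector:
    """Illustrative mode selector: enter crouch on high risk, leave once risk stays low."""
    enter_threshold: float = 0.7  # assumed value
    exit_threshold: float = 0.3   # assumed value
    min_hold_steps: int = 20      # assumed value; avoids rapid mode flapping
    crouching: bool = False
    steps_in_mode: int = 0

    def step(self, risk: float) -> float:
        """Return the posture_command for this control step (0.0 normal, 1.0 crouch)."""
        self.steps_in_mode += 1
        if not self.crouching and risk >= self.enter_threshold:
            self.crouching, self.steps_in_mode = True, 0
        elif (
            self.crouching
            and risk <= self.exit_threshold
            and self.steps_in_mode >= self.min_hold_steps
        ):
            self.crouching, self.steps_in_mode = False, 0
        return 1.0 if self.crouching else 0.0
```

The two thresholds and the hold time are one simple way to get temporary rather than permanent crouching. The selector only emits a mode request, and the motor behavior stays entirely in the learned policy.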

Where This Matters First

The near-term opportunity is not every robot at once. Cortex is most useful in robot tasks where failure is expensive, contacts are messy, and the correct behavior depends on the current execution state.

The first deployment targets are:

  1. Mobile robots and quadrupeds that need better recovery from pushes, slips, and terrain instability.
  2. Humanoid and whole-body systems that need explicit recovery, bracing, and cautious execution modes.
  3. Industrial manipulation where stiffness, compliance, retry behavior, and grip conservatism should change with contact uncertainty.
  4. Robot handoffs and mobile manipulation, where the same policy may need to move confidently, slow down, yield, or recover depending on risk.
  5. Safety and oversight layers for robotics teams that need interpretable state rather than opaque policy activations.

Training Setup

The experiment trained the policy in two stages and compared it against a control policy.

Stage     Setup                                  Result
Stage 1   Multi-mode base policy                 Learned normal walk and crouch/brace walk
Stage 2   Cortex-conditioned fine-tune           Learned to use Cortex inputs during push recovery
Control   Disturbance-trained no-Cortex policy   Used as a comparison point

Stage 1 trained the posture interface: the policy walked normally when posture_command = 0.0 and crouch-walked or braced when posture_command = 1.0.

Stage 2 warm-started from Stage 1, added the Cortex state, and trained under push disturbances.

Result 1: Cortex Helps Recovery

The first paired evaluation compared the same Cortex-trained checkpoint with live Cortex values versus zeroed Cortex values under strong planar pushes.

Metric                 Live Cortex   Zeroed Cortex
Mean steps survived    457.03        229.00
Fall rate              0.25          0.97
Mean reward            16.75         8.96
Final height           0.26          0.20

Live Cortex roughly doubled mean survival duration (457.03 versus 229.00 steps) and reduced the fall rate from 97% to 25%. With the same policy and the same robot observations, behavior changed when only the Cortex slot was zeroed, which is the key evidence that the policy learned to use Cortex as recovery context.
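The zeroing comparison can be sketched as below. The layout of the policy input (where the Cortex slot starts and how wide it is) is an assumption, and `zero_cortex_slot` and `paired_inputs` are hypothetical helper names. The last two lines recompute the headline comparison from the reported table values.

```python
from typing import List, Sequence, Tuple


def zero_cortex_slot(
    policy_input: Sequence[float], cortex_start: int, cortex_dim: int
) -> List[float]:
    """Return a copy of the policy input with only the Cortex slot set to zero."""
    end = cortex_start + cortex_dim
    if cortex_start < 0 or cortex_dim < 0 or end > len(policy_input):
        raise ValueError("Cortex slot does not fit inside the policy input")
    ablated = list(policy_input)
    ablated[cortex_start:end] = [0.0] * cortex_dim
    return ablated


def paired_inputs(
    policy_input: Sequence[float], cortex_start: int, cortex_dim: int
) -> Tuple[List[float], List[float]]:
    """Build the live and zeroed inputs that are fed to the same checkpoint."""
    live = list(policy_input)
    return live, zero_cortex_slot(live, cortex_start, cortex_dim)


# Headline comparison recomputed from the reported table values.
survival_ratio = 457.03 / 229.00  # live vs. zeroed mean steps survived, close to 2x
fall_rate_drop = 0.97 - 0.25      # absolute reduction in fall rate
```

Because everything outside the Cortex slot is identical in the two inputs, any behavioral difference between the paired rollouts can be attributed to the Cortex values.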

[Figure: Live Cortex versus zeroed Cortex recovery results. Paired strong-push evaluation of the Cortex-trained checkpoint. Higher is better for steps and reward; lower is better for fall rate.]

Result 2: Crouch Is a Real Mode

The posture command is not cosmetic. In interactive body-push evaluation, the robot tended to fall during normal walking under horizontal hits applied above ground, but it survived the same style of hit after the operator pressed P to enter crouch/brace mode.

The body-push setting makes low stance physically relevant: the robot is not only being pushed sideways, it has to resist tipping and recover without permanently abandoning normal walking.

Result 3: Cortex Drives Temporary Bracing

The final evaluation turned on the prototype Cortex posture selector under body-push disturbances. The selector used the Cortex state to request temporary bracing.

Metric                 Live Cortex Selector   Zeroed Cortex Selector
Mean steps survived    396.75                 265.94
Fall rate              0.41                   0.91
Mean reward            15.07                  10.40
Mean posture           0.22                   0.00
Max posture            1.00                   0.00
Crouch fraction        0.21                   0.00
Posture transitions    2.97                   0.00

With the live Cortex selector, the robot survived longer and fell less often (a fall rate of 41% versus 91%). It did not simply stay crouched forever: the robot spent about 21% of the rollout in crouch/brace, with roughly three posture transitions per rollout.
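The posture statistics in the table can be computed from a per-step posture trace along the following lines. This is a sketch under assumed definitions: the post does not say how "crouch fraction" or "posture transitions" were measured, so the 0.5 threshold and the threshold-crossing count used here are illustrative, and `posture_metrics` is a hypothetical name.

```python
from typing import Dict, Sequence


def posture_metrics(
    posture_trace: Sequence[float], crouch_threshold: float = 0.5
) -> Dict[str, float]:
    """Summarize one rollout's posture_command trace (one value per control step)."""
    n = len(posture_trace)
    if n == 0:
        raise ValueError("posture_trace must contain at least one step")
    # Assumed definition: a step counts as crouched when the command is at or above the threshold.
    crouched = [p >= crouch_threshold for p in posture_trace]
    # Assumed definition: a transition is any change of crouched state between consecutive steps.
    transitions = sum(1 for a, b in zip(crouched, crouched[1:]) if a != b)
    return {
        "mean_posture": sum(posture_trace) / n,
        "max_posture": max(posture_trace),
        "crouch_fraction": sum(crouched) / n,
        "posture_transitions": float(transitions),
    }
```

Averaging these per-rollout summaries over all evaluation rollouts would give table-style numbers. A selector that never requests crouch yields zeros in every posture row, which matches the zeroed-selector column.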

[Video: Cortex-enabled mode selection in the body-push setting. The tracking camera keeps the robot centered through the rollout.]
[Video: Matched rollout with the Cortex signal zeroed. The robot loses stability earlier under the same evaluation setup.]
[Figure: Live Cortex selector versus zeroed Cortex selector. With body pushes, the live Cortex selector requests temporary bracing and improves survival compared with a zeroed selector.]

What This Proves

These results support four claims:

  1. A Go1 policy can learn an explicit posture-mode interface.
  2. A compact Cortex state is behaviorally useful during disturbance recovery.
  3. A prototype Cortex posture selector can turn that internal state into an explicit posture_command.
  4. Under body pushes, the live Cortex selector produces temporary brace/crouch behavior and improves survival relative to a zeroed Cortex selector.

What It Does Not Prove

This is an early interface-validation result, not a general robotics claim.

It does not prove that Cortex learned posture selection end to end. It also does not prove that this Go1 result alone transfers to manipulation or whole-body control.

A stricter future report should add a timer baseline that crouches for a fixed number of steps after each detected push, fixed-normal versus fixed-crouch headless tables under body pushes, and confidence intervals for the comparisons.
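Two of those additions are easy to sketch. The first function is one possible timer baseline: it requests crouch for a fixed number of steps after each detected push, using no Cortex state at all. The second is a Wilson score interval for a fall rate. The push-detection signal, the hold length, and the function names are assumptions, and the number of rollouts behind the reported fall rates is not stated in this post, so it must be supplied by whoever runs the evaluation.

```python
import math
from typing import List, Sequence, Tuple


def timer_baseline(push_detected: Sequence[bool], hold_steps: int) -> List[float]:
    """Request crouch (1.0) for hold_steps steps starting at each detected push."""
    if hold_steps < 0:
        raise ValueError("hold_steps must be non-negative")
    remaining = 0
    commands = []
    for hit in push_detected:
        if hit:
            remaining = hold_steps  # a new push restarts the timer
        commands.append(1.0 if remaining > 0 else 0.0)
        remaining = max(0, remaining - 1)
    return commands


def wilson_interval(falls: int, rollouts: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a fall rate; z = 1.96 gives roughly 95% coverage."""
    if rollouts <= 0 or not 0 <= falls <= rollouts:
        raise ValueError("need rollouts > 0 and 0 <= falls <= rollouts")
    p = falls / rollouts
    denom = 1.0 + z * z / rollouts
    center = (p + z * z / (2.0 * rollouts)) / denom
    half = z * math.sqrt(p * (1.0 - p) / rollouts + z * z / (4.0 * rollouts ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

If the live selector only matched a timer like this one, the Cortex state would be adding little beyond push detection. A clear margin over it, with intervals that do not overlap, would be stronger evidence.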

Why This Matters

The important part is not "Cortex is a better locomotion policy." The important part is that the policy has a clean execution-mode interface, and Cortex can expose interpretable state that changes how the policy uses that interface.

For Go1, the mode knob is posture:

normal walk <-> crouch / brace walk

In later systems, the knobs can be richer:

speed
compliance
stiffness
grip conservatism
retry behavior
handoff caution
contact recovery

The next target should be a task where execution style matters more than survival locomotion: manipulation with contact uncertainty, handoff recovery, walk-to-reach, or whole-body control. In those settings, Cortex should evolve from a prototype scaffold into a learned module.

Go1 is the scaffold. The larger goal is a robot control stack where policies are not only trained to act, but can be steered through interpretable internal execution state.

Work With Us

Daionics is looking for robotics partners, investors, and engineers who care about deployable physical AI rather than demos alone.

For robotics teams: we want to test Cortex-style execution state on real recovery, manipulation, and handoff tasks.

For investors and technical partners: we can share the deeper technical memo behind this result.

For engineers and researchers: Daionics is building across robot learning, control, orchestration, and operator-facing systems.

Email info@daionics.com to start a conversation.