Simovian Intelligence — Physical Intelligence Data for Healthcare Robotics

Simovian Intelligence builds expert-annotated physical intelligence datasets from real healthcare and care workflows. We package task-level demonstrations for robotics and embodied AI teams training vision-language-action models, generalist manipulation policies, and humanoid systems intended to operate in real-world settings.

The training-data bottleneck for embodied AI is no longer simulation fidelity or model architecture. It is real-world ground truth captured at the interface between people, tools, and procedure-bound environments. That is the data we build.

What we capture

Tool-mediated procedures: applying a blood pressure cuff, taking vitals, drawing blood, transferring a patient with a gait belt.
Equipment operation: raising, lowering, and reconfiguring a hospital bed; adjusting an over-bed table; operating an IV pole.
Bimanual deformable-object handling: making a hospital bed, changing linens, dressing wounds, applying compression sleeves.
Human-interaction tasks under procedural logic: greeting, verbal handoff, consent confirmation, repositioning a patient with cooperation cues.
Failure cases and recovery: dropped tools, missed timing, communication breakdowns — the long tail that determines deployment readiness.

How we capture it

Demonstrations come from real exam rooms, ward bedsides, recovery bays, and care-home settings. Every session runs under consent-first protocols designed for regulated healthcare environments and is annotated by clinical experts who can name the procedural intent behind each motion — not just the trajectory. Annotations include action verbs, sub-task boundaries, tool grasp points, force class, hand-off events, and patient-state context.

Who we serve

Robotics labs building generalist manipulation policies for healthcare and care settings.
Embodied AI teams training vision-language-action (VLA) models that need real-world ground truth, not synthetic demonstrations.
Humanoid robotics companies preparing for clinical or care-adjacent deployment.
Strategic partners and investors building the physical intelligence data layer.

Why care settings

Care settings concentrate the workflows that generalist robots will eventually need to perform: bimanual handling of deformable objects, tool use under procedural logic, and high-repetition tasks involving cooperative human subjects. The data is hard to capture — consent, compliance, and clinical-grade annotation all matter — but it is exactly the data the field needs to bridge from lab demos to real deployments.

Why expert annotation

A nurse can tell you, in one second, whether a patient transfer was textbook, improvised, or risky. A crowdworker cannot. Our annotators are clinical professionals trained to label procedure, intent, and risk class at the resolution policy training requires. That signal is what separates a dataset that teaches a robot to mimic a motion from one that teaches it to do a job.

Data formats and annotation schema

Every capture session produces a structured record with synchronized multi-modal streams: RGB-D video from multiple viewpoints, verbal-context audio, and, where applicable, force and grasp telemetry. Annotations are hierarchical. At the scene level, each session is tagged with procedure family, environment type, and subject count. Within the scene, sub-task boundaries segment the workflow into discrete phases (preparation, execution, verification, cleanup), each with start and end timestamps aligned to the video stream.
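To make the scene-level and sub-task layers concrete, here is a minimal sketch of how such a record could be represented. It is an illustration, not the delivery format: the class names, field names, and the `phase_at` helper are hypothetical, the example values are made up, and it assumes timestamps are expressed in seconds from the start of the video stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four phases named in the text above.
PHASES = ("preparation", "execution", "verification", "cleanup")


@dataclass
class SubTaskSegment:
    phase: str      # one of PHASES
    start_s: float  # assumed unit: seconds from start of the video stream
    end_s: float

    def __post_init__(self) -> None:
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.end_s <= self.start_s:
            raise ValueError("segment must end after it starts")


@dataclass
class SessionRecord:
    # Scene-level tags (hypothetical field names).
    procedure_family: str
    environment_type: str
    subject_count: int
    segments: List[SubTaskSegment] = field(default_factory=list)

    def phase_at(self, t_s: float) -> Optional[str]:
        """Return the phase covering timestamp t_s, or None if none does."""
        for seg in self.segments:
            if seg.start_s <= t_s < seg.end_s:
                return seg.phase
        return None


# Made-up example session, for illustration only.
session = SessionRecord(
    procedure_family="vital-sign measurement",
    environment_type="outpatient exam room",
    subject_count=1,
    segments=[
        SubTaskSegment("preparation", 0.0, 12.5),
        SubTaskSegment("execution", 12.5, 48.0),
        SubTaskSegment("verification", 48.0, 55.0),
        SubTaskSegment("cleanup", 55.0, 70.0),
    ],
)
print(session.phase_at(30.0))  # execution
```

Half-open intervals (start inclusive, end exclusive) keep adjacent segments from both claiming the boundary timestamp; that convention is a design choice of this sketch.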

At the atomic level, every manipulation event carries an action verb, a tool identifier, grasp-type classification, contact-point coordinates, force-class estimate, and a procedural-intent label drawn from a controlled clinical vocabulary. Hand-off events between participants are annotated with initiator, receiver, verbal cue, and confirmation signal. Error and recovery sequences receive additional labels: error type, detection latency, recovery strategy, and outcome assessment. This annotation depth is what makes the data usable for policy architectures that need to reason about intent and procedure, not just reproduce trajectories.
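The atomic-level labels described above map naturally onto a few typed records. The following is a hedged sketch only: every class name, field name, and vocabulary entry is hypothetical, the contact point is assumed to be an (x, y, z) coordinate in some camera frame, and the three-item vocabulary is a placeholder rather than the actual controlled clinical vocabulary.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Placeholder entries for illustration; not the real controlled vocabulary.
INTENT_VOCAB: Set[str] = {"secure_cuff", "confirm_reading", "stabilize_patient"}


@dataclass
class ManipulationEvent:
    action_verb: str
    tool_id: str
    grasp_type: str
    contact_point: Tuple[float, float, float]  # assumed: x, y, z in a camera frame
    force_class: str
    procedural_intent: str  # expected to come from the controlled vocabulary


@dataclass
class HandOffEvent:
    initiator: str
    receiver: str
    verbal_cue: str
    confirmation_signal: str


@dataclass
class ErrorRecovery:
    error_type: str
    detection_latency_s: float  # assumed unit: seconds
    recovery_strategy: str
    outcome: str


def has_valid_intent(event: ManipulationEvent, vocab: Set[str] = INTENT_VOCAB) -> bool:
    """Check that the intent label is a member of the controlled vocabulary."""
    return event.procedural_intent in vocab


# Made-up example event.
event = ManipulationEvent(
    action_verb="wrap",
    tool_id="bp_cuff_01",
    grasp_type="pinch",
    contact_point=(0.12, -0.03, 0.45),
    force_class="light",
    procedural_intent="secure_cuff",
)
print(has_valid_intent(event))  # True
```

A membership check like this is the simplest way to enforce a controlled vocabulary at load time; free-text intent labels would pass silently without it.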

From demonstrations to deployable policies

Raw demonstrations are necessary but not sufficient. The pipeline from capture to policy training requires consistent formatting, validated annotation quality, and partitioning that surfaces the generalization gaps that matter for deployment. We deliver datasets with train, validation, and test splits structured around environment variation, subject variation, and task-complexity progression. Each split is designed so that evaluation performance predicts deployment performance rather than merely reflecting held-out accuracy on familiar settings. For teams training vision-language-action models, we provide aligned text descriptions at the sub-task level so that language-conditioned policies can ground instructions in real procedural context.
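One common way to build splits around environment or subject variation is to assign whole groups, rather than individual sessions, to each split, so that evaluation data comes from environments or subjects never seen in training. The sketch below shows that general technique under stated assumptions; it is not a description of our partitioning procedure. The function name, the `environment_id` key, the split fractions, and the session index are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def split_by_group(
    records: Sequence[dict],
    group_key: str,
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Dict[str, List[dict]]:
    """Assign whole groups (e.g. environments) to train/val/test.

    No group appears in more than one split, so validation and test
    measure performance on groups unseen during training.
    """
    groups = sorted({r[group_key] for r in records})
    random.Random(seed).shuffle(groups)  # seeded for reproducibility

    n_test = max(1, round(len(groups) * test_frac))
    n_val = max(1, round(len(groups) * val_frac))
    if n_test + n_val >= len(groups):
        raise ValueError("not enough groups to leave any for training")

    assignment = {}
    for i, g in enumerate(groups):
        if i < n_test:
            assignment[g] = "test"
        elif i < n_test + n_val:
            assignment[g] = "val"
        else:
            assignment[g] = "train"

    splits: Dict[str, List[dict]] = {"train": [], "val": [], "test": []}
    for r in records:
        splits[assignment[r[group_key]]].append(r)
    return splits


# Made-up session index: 6 environments with 3 sessions each.
records = [
    {"session_id": f"s{e}-{i}", "environment_id": f"env{e}"}
    for e in range(6)
    for i in range(3)
]
splits = split_by_group(records, "environment_id")
envs = {name: {r["environment_id"] for r in rows} for name, rows in splits.items()}
print({name: len(ids) for name, ids in envs.items()})
```

The same function can be run with a subject identifier as `group_key` to hold out subjects instead of environments. Holding out both at once requires a joint assignment that this sketch does not attempt.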

Coverage and environment diversity

Captures span acute-care hospital rooms, outpatient exam rooms, rehabilitation bays, long-term care facilities, and home-care settings. Task families include patient transfer and mobility assistance, vital-sign measurement, wound care and dressing changes, bed management and linen handling, medication preparation and delivery, and equipment setup and breakdown. Each environment introduces distinct physical constraints — room geometry, furniture placement, lighting conditions, floor surfaces — that affect how tasks are performed. This environmental diversity is critical for training policies that generalize beyond a single lab setup to the range of real-world conditions where care robots will eventually operate.
