Hands clasping: 24 frames from plate 533, Animal Locomotion (Eadweard Muybridge, 1887). Public domain.

Stera is open infrastructure for embodied AI

Capture, process, and export multimodal data on hardware you already own.

Stera-10M. Open today.

Ten million frames of in-the-wild egocentric activity: 200 hours across 354 sessions, with the longest continuous capture running 108 minutes. Every frame is annotated with depth, 6-DoF pose, MANO hands, and an instruction tree that spans atomic actions up to whole sessions. Available on Hugging Face.
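As a rough consistency check on these figures, the sketch below assumes every frame is sampled at the 15 Hz RGB-D rate listed in the pipeline overview further down the page. That single-rate assumption is ours, not a stated property of the dataset. Under it, 200 hours works out to about 10.8 million frames, in line with the "ten million" headline, and the mean session runs about 34 minutes, against a 108-minute maximum.

```python
# Rough arithmetic on the Stera-10M headline numbers.
# Assumption (not stated by the dataset): all frames are sampled at 15 Hz,
# the RGB-D rate shown in the pipeline overview.
HOURS = 200
SESSIONS = 354
RGBD_HZ = 15

total_seconds = HOURS * 3600                  # 720,000 s of capture
frames = total_seconds * RGBD_HZ              # frames at a uniform 15 Hz
mean_session_min = HOURS * 60 / SESSIONS      # average session length in minutes

print(f"{frames:,} frames, mean session {mean_session_min:.1f} min")
# prints: 10,800,000 frames, mean session 33.9 min
```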

A data lab in your pocket.

Start your own portable data lab with an iPhone Pro and the Stera Capture app. ARKit fuses RGB, depth, IMU, and 6-DoF tracking entirely on-device, letting you capture multimodal data anywhere. Just mount, record, and go.

Stera Capture: recording settings sheet
Stera Capture: home screen for collecting multimodal data
Stera Capture: library of uploaded sessions

Raw frames in. Research signals out.

$ pip install stera-sdk

One pipeline turns every session into multiple modalities with no human in the loop: RGB-D, 6-DoF poses, 21-joint MANO articulation for each hand, IMU, upper-body coordinates, a 3D mesh for real-to-sim, and hierarchical text instruction trees.

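Pipeline overview: recording.mcap goes into the Stera SDK (7 stages, 1 config, no glue code), which produces RGB-D (15 Hz), 6-DoF pose (per frame), MANO hands (21 joints), IMU (100 Hz), body skeleton, mesh and point cloud, and a text annotation tree, all written to annotation.hdf5 (5 groups, frame-aligned).

Evaluate: a 0–100 health score (94 in the example shown), per-stream metrics, and report.html.

Export episode: rgb.mp4 (H.264 video), mesh.ply (scene mesh), annotation.hdf5 (all signals), visualization.rrd (Rerun replay), calibrations/ (intrinsics and TF), thumbnail.jpg (preview).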
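The pipeline stores streams sampled at different rates (RGB-D at 15 Hz, IMU at 100 Hz) in a frame-aligned annotation file. How the Stera SDK performs that alignment isn't documented here, so the sketch below shows one common generic approach, nearest-timestamp matching, rather than the SDK's actual method. It assumes both streams carry timestamps in seconds on a shared clock, sorted ascending; the function name nearest_indices is hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from typing import Sequence


def nearest_indices(frame_ts: Sequence[float], imu_ts: Sequence[float]) -> list[int]:
    """For each frame timestamp, return the index of the closest IMU sample.

    Hypothetical helper. Assumes imu_ts is non-empty and sorted ascending,
    and that both sequences use the same clock and units (seconds).
    """
    out = []
    for t in frame_ts:
        i = bisect_left(imu_ts, t)  # first IMU sample at or after t
        if i == 0:
            out.append(0)
        elif i == len(imu_ts):
            out.append(len(imu_ts) - 1)
        else:
            # Pick whichever neighbouring sample has the smaller time gap.
            before, after = imu_ts[i - 1], imu_ts[i]
            out.append(i if after - t < t - before else i - 1)
    return out


if __name__ == "__main__":
    frames = [k / 15 for k in range(5)]  # five frames at 15 Hz
    imu = [k / 100 for k in range(50)]   # half a second of IMU at 100 Hz
    print(nearest_indices(frames, imu))
```

Nearest-sample matching is the simplest choice; interpolating IMU readings to each frame time is a common alternative when sub-sample accuracy matters.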

Stera FAQ

How do I access Stera?
Join here. If you're a university or organization seeking large-scale access, contact [email protected].
What is Stera, and who is it for?
Stera is an open data pipeline for embodied AI training: a capture stack, a processing pipeline, and an SDK for multimodal data. It is built for researchers and labs training VLAs, world models, and manipulation policies who need real-world data of people at work without buying gated hardware.
How is Stera different from Project Aria?
Aria produces very high-fidelity egocentric data but requires gated hardware access. Stera runs on a consumer iPhone Pro and ships the entire capture and processing stack as open code, so anyone can start recording tomorrow, including in environments Aria can't reach. You get hour-plus continuous sessions with depth, 6-DoF pose, and 21-joint MANO hands, comparable to Aria across the modalities researchers actually train on.
What hardware do I need to capture my own data?
An iPhone Pro (12 Pro or newer, with LiDAR) and the Stera Capture iOS app. No external sensors or rigs required.
What format does the dataset come in?
Each session is delivered as a directory containing one MP4 of RGB video, one HDF5 file with all per-frame annotations (depth, pose, MANO hands, IMU, hierarchical text labels), a PLY scene mesh, calibrations, and Rerun visualization recordings. The SDK exports to LeRobot (Hugging Face's robotics format), raw MCAP, and RRD for visualization in Rerun.
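Before loading a batch of sessions, it can help to confirm that each directory actually contains every expected output. The minimal sketch below uses only the Python standard library. The file names come from the export stage of the pipeline overview (rgb.mp4, annotation.hdf5, mesh.ply, visualization.rrd, calibrations/, thumbnail.jpg); treating them as a fixed layout is an assumption, and missing_outputs is a hypothetical helper, not part of the Stera SDK.

```python
from __future__ import annotations

from pathlib import Path

# Names taken from the export stage of the pipeline overview.
# Assumption: every exported session uses exactly these names.
EXPECTED_FILES = ["rgb.mp4", "annotation.hdf5", "mesh.ply", "visualization.rrd", "thumbnail.jpg"]
EXPECTED_DIRS = ["calibrations"]


def missing_outputs(session_dir: str | Path) -> list[str]:
    """Return the expected outputs absent from session_dir (hypothetical helper).

    Directories are reported with a trailing slash; an empty list means the
    session directory looks complete.
    """
    root = Path(session_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, missing_outputs("path/to/session") returns an empty list when all expected outputs are present, so it can be used as a simple filter when iterating over many session folders.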
Can I bring my own storage bucket?
Yes. Contact us at [email protected] and we can help you configure your own S3, GCS, or Azure storage bucket in the app settings or through the SDK config. All recordings and processed outputs will then route to that bucket instead of FPV's default storage.
Do I need to pay to use Stera?
The free plan includes up to 25 GB of storage. If you're a research lab or organization seeking large-scale access, contact us at [email protected].
How do you handle privacy, faces, and sensitive locations in recorded data?
We provide privacy and PII models in our SDK that you can use to blur faces and other personally identifiable information.
What's the recommended workflow for researchers wanting to contribute back to Stera-10M?
Start collecting data and reach out to us on Discord or at [email protected] to contribute. We can add your contributions in the next release.
How do I stay up to date with Stera news?
Follow @fpvlabs on X for short-form news, release notes, new datasets, and research updates.