Data lakehouse and query engine for robotics
Query by topic, time range, or tags across your fleet's sensor data. Stream results to your analysis and visualization tools.
Robot fleets generate terabytes of multimodal sensor data: cameras, LiDAR, IMU, joint states. Legged robots (humanoids and quadrupeds) are particularly data-intensive, with 28-40+ degrees of freedom streaming high-frequency data from joint encoders and force sensors. Each robot produces thousands of files of varying sizes, uploaded over intermittent connectivity.
Engineers need instant access to sensor data for efficient debugging and visualization. Load only the data you need, when you need it.
We're adapting data lakehouse principles for robotics workflows: open formats on object storage, metadata-driven architecture, columnar analytics. Arrow Flight API provides streaming access to time-series data. Your robot logs remain in standard files while gaining database-like queryability.
Built with Rust and Arrow for performance. Kubernetes operators handle orchestration. S3-compatible storage is the only dependency — runs anywhere from local development to cloud scale.
The system supports common robotics formats like MCAP files and system logs. Custom converters can be added to handle proprietary formats and vendor-specific file types.
Internally, all data is stored as RLD (RobotLogs Data), our columnar format derived from MCAP (the ROS 2 default). Similar to Parquet and ORC, RLD organizes data in columns for efficient analytics. The key difference is that RLD preserves opaque binary messages (Protobuf, ROS, and custom formats) exactly as recorded, without requiring structured schemas. RLD reorganizes MCAP's time-ordered chunks into Apache Arrow RecordBatches grouped by topic, enabling efficient columnar access through Arrow IPC format while maintaining message-level compatibility.
When querying for /joint_states, only that column is read from storage. Each column includes time-range indices to support precise queries for specific time windows. Teams already using MCAP gain columnar query performance without changing their tools or workflows.
RLD File Structure:
| Section | Format | Content |
|---|---|---|
| Header | Metadata | Format identification |
| Messages | Arrow Stream | All messages, grouped by topic |
| Attachments | Arrow Stream | Binary attachments |
| Schemas | Arrow File | Message schemas |
| Channels | Arrow File | Topic definitions |
| Message Index | Arrow File | Batch locations & time ranges |
| Metadata | Arrow File | Key-value metadata |
| Footer | Fixed 28 bytes | Index offsets & checksums |
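A fixed-size footer lets a reader locate the index sections with a single ranged read from the end of the file. As a sketch only: the field layout below (three u64 section offsets plus a u32 checksum, totaling 28 bytes) is an assumption for illustration, not the actual RLD footer layout:

```python
import struct

FOOTER_SIZE = 28
# Assumed layout for illustration: index_off, metadata_off, schemas_off, crc32
FOOTER_FMT = "<QQQI"


def read_footer(blob: bytes) -> dict:
    """Parse the trailing fixed-size footer of a file blob."""
    index_off, metadata_off, schemas_off, crc = struct.unpack(
        FOOTER_FMT, blob[-FOOTER_SIZE:]
    )
    return {"index": index_off, "metadata": metadata_off,
            "schemas": schemas_off, "crc32": crc}


# Simulate a file ending in a footer
footer = struct.pack(FOOTER_FMT, 1024, 2048, 4096, 0xDEADBEEF)
print(read_footer(b"...file body..." + footer))
```

On object storage this pattern matters: one ranged GET for the footer, one for the message index, then only the batches the query needs.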
Each robot has a single continuous timeline. Data from different sources maps to timestamps on this timeline. Sources upload at their own schedules and the system correlates data by time.
Robot-42 Timeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Sensor data:   │←─30min─→│          │←─30min─→│
System logs:   │←──────────── 24 hours ──────────→│
Diagnostics:             │←─15min─→│
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
10:00         12:00         14:00         16:00
Time-based queries return data from multiple sources together. The system maintains a continuous timeline without segmenting data into fixed sessions.
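The correlation model can be shown in a few lines: each source stays independently time-ordered, and a time-window query merges them on the shared timeline. The source names and records below are made up for illustration:

```python
import heapq

# Three sources, each already ordered by timestamp (ns), uploaded on
# their own schedules. Records are (time, source, payload) tuples.
sensor = [(1000, "sensor", b"imu"), (3000, "sensor", b"imu")]
logs = [(500, "syslog", b"boot"), (2500, "syslog", b"warn")]
diag = [(2000, "diag", b"battery")]

# Merge the sorted sources into one continuous timeline.
timeline = list(heapq.merge(sensor, logs, diag))


def window(events, start, end):
    """Time-based query: everything from every source within [start, end]."""
    return [e for e in events if start <= e[0] <= end]


print(window(timeline, 1000, 2500))
```

No session boundaries are involved: the query window alone decides which records, from which sources, come back together.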
Upload Service
REST API that generates pre-signed S3 URLs for direct robot uploads. Creates upload records with tags and metadata for tracking and organization.
Lakehouse Operator
Kubernetes controller that orchestrates the processing pipeline.
Flight Service
Arrow Flight RPC server for high-performance data queries. Streams specific topics and time ranges directly from columnar storage without loading entire files.
Query robot logs using Arrow Flight RPC for high-performance streaming:
# Connect and query specific topics (using the pyarrow Flight client;
# the JSON ticket encoding shown here is illustrative)
import json
from pyarrow import flight

client = flight.connect("grpc://flight.robotlogs.io:8815")
query = {
    "robot_id": "robot-42",
    "topics": ["/joint_states", "/imu/data"],
    "time_range": ["2024-01-01T10:00:00Z", "2024-01-01T10:03:00Z"],
}

# Stream data as Arrow RecordBatches
reader = client.do_get(flight.Ticket(json.dumps(query).encode()))
for chunk in reader:
    process(chunk.data)  # Your analysis code
All data and metadata live in S3 as standard files — no proprietary formats or lock-in. Build your own analytics pipelines, integrate with existing tools, or query directly with Arrow Flight. RobotLogs Lakehouse provides the foundation; you choose how to build on top.
We plan to open source the project under the MIT license. The community edition will include full platform functionality with community-driven support. Enterprise customers can access stable releases, professional support with SLAs, and compliance documentation.
Foundation for building debugging and analysis tools on top of robot logs
Retrieve sensor data from specific time windows around failures. Access only the topics relevant to the incident.
Query across your robot fleet to identify patterns and anomalies. Analyze historical behaviors from walking gaits of legged robots to manipulation sequences. Compare performance across software versions.
Foundation to build visualization tools or integrate with existing ones. Stream data on-demand for interactive debugging and analysis.
Store years of robot data cost-effectively with columnar compression. Query historical data as easily as recent logs.
Generate operational reports and maintain audit trails. Export data for regulatory compliance or safety investigations.
Extract specific sensor streams for machine learning pipelines. The columnar format will enable efficient filtering by conditions and events.