Data Engineer (m/f/d)

Posted: 2026/10/18
Other Business Support Services

Job description

Role Overview

This role focuses on designing and building scalable data infrastructure to support advanced autonomous systems. The position is responsible for transforming large-scale multimodal sensor data into high-quality, structured datasets that are ready for downstream processing and machine learning workflows.

The work involves building foundational systems and making architectural decisions that support long-term scalability, including how data is recorded, ingested, stored, versioned, labeled, and served. The environment handles hundreds of terabytes of data from LiDAR, cameras, IMU, GPS, and radar across multiple platforms.

Key Responsibilities
  • On-vehicle data recording pipeline
    Design and manage high-throughput recording systems, including topic selection, multi-GB/s write pipelines, and efficient data formats (MCAP/rosbag2). Oversee on-platform storage and ensure reliable data transfer to cloud environments with integrity checks. Ensure timestamp accuracy and synchronization across recorded data.

  • Data lake architecture
    Design and maintain scalable storage solutions across S3, FSx/Lustre, and GCS. Define data organization, regional placement, caching strategies, retention policies, data lineage, and cost optimization.

  • Dataset pipeline development
    Build pipelines that convert raw sensor data into structured, training-ready datasets. Ensure accurate time alignment across sensor modalities, and incorporate ego-pose, calibration metadata, and scenario tags.

  • Versioning and dataset management
    Implement robust dataset versioning and discovery processes. Evaluate and deploy tools such as DVC, LakeFS, Deep Lake, and FiftyOne, ensuring datasets are reproducible, traceable, and easily accessible.

  • Dataset format design
    Contribute to defining efficient on-disk dataset formats, focusing on write performance and optimized I/O for large-scale training workloads.

  • Annotation workflows
    Develop and manage annotation pipelines, including defining vendor handoff formats, ingesting labeled data, performing quality control, handling schema evolution, and supporting iterative dataset improvements.
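
The time-alignment work described above (matching samples from LiDAR, cameras, and other sensors onto a common timeline) can be illustrated with a minimal nearest-timestamp matcher. This is a sketch under stated assumptions, not the team's actual pipeline. It assumes every stream is already stamped on a shared clock in integer nanoseconds and sorted ascending. The function names and the `tolerance_ns` parameter are hypothetical. Nearest-neighbour matching with a tolerance is only one common approach; continuous signals such as ego-pose are often interpolated instead.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence

# Hypothetical helpers for illustration only. Assumes all timestamps are
# integer nanoseconds on a common clock, and each sequence is sorted ascending.


def nearest_index(timestamps: Sequence[int], t: int) -> int:
    """Return the index of the timestamp in a non-empty sorted sequence closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Compare the neighbours on either side; ties go to the earlier sample.
    before, after = timestamps[i - 1], timestamps[i]
    return i - 1 if t - before <= after - t else i


def align_to_reference(
    reference_ns: Sequence[int],
    other_ns: Sequence[int],
    tolerance_ns: int,
) -> List[Optional[int]]:
    """For each reference timestamp, return the index of the nearest sample in
    other_ns, or None if that sample is more than tolerance_ns away."""
    if not other_ns:
        return [None] * len(reference_ns)
    matches: List[Optional[int]] = []
    for t in reference_ns:
        j = nearest_index(other_ns, t)
        # Reject matches that are too far apart to count as simultaneous.
        matches.append(j if abs(other_ns[j] - t) <= tolerance_ns else None)
    return matches


if __name__ == "__main__":
    camera_ns = [0, 100_000_000, 200_000_000]        # 10 Hz camera frames
    lidar_ns = [2_000_000, 98_000_000, 260_000_000]  # LiDAR sweeps
    print(align_to_reference(camera_ns, lidar_ns, tolerance_ns=10_000_000))
```

Running the example prints `[0, 1, None]`. The third camera frame has no LiDAR sweep within 10 ms, so it is left unmatched rather than paired with a distant sample. `bisect_left` keeps each lookup at O(log n), which matters when streams contain millions of messages.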

Required Experience
  • 5+ years of experience building production-grade data infrastructure, ideally involving large-scale multimodal or sensor data (e.g., robotics, autonomous systems, geospatial, or scientific domains)
  • Strong proficiency in Python, with the ability to work with C++ for ROS2 and pipeline-related tooling
  • Hands-on experience with cloud storage and distributed systems (S3, GCS, FSx, Lustre), including performance and cost optimization
  • Experience with dataset versioning and ML data tools such as DVC, LakeFS, Deep Lake, FiftyOne, or similar platforms

Preferred Qualifications
  • Background in autonomous systems or mobile platforms, particularly in complex or unstructured environments
  • Experience working with large-scale annotation workflows and external labeling providers
  • Familiarity with distributed training approaches (e.g., DDP, FSDP) to support efficient collaboration with machine learning infrastructure

This job post has been translated by AI and may contain minor differences or errors.