Data Engineer
This role involves building and optimizing the data infrastructure that powers analytics, machine learning, and operational decision-making at AI-focused organizations. Data engineers in this position design scalable pipelines that ingest data from infrastructure, product systems, and business operations, then transform that raw data into reliable datasets for analysts, data scientists, and product teams. What sets the role apart is its foundation-level focus: rather than analyzing data or building models, these engineers architect the systems, data models, and warehouses that downstream work depends on. They typically report into data or platform leadership and work cross-functionally with product, engineering, finance, and operations teams to translate business requirements into production-grade data infrastructure that scales as the organization grows.
Skills
What companies are looking for in this role.
Designing and building scalable data pipelines for ingesting and processing structured and unstructured data from multiple sources
Implementing ETL/ELT processes and orchestration workflows to transform raw data into analytical formats
Architecting and managing data warehouses and data lakes with a focus on storage optimization, retrieval performance, and lifecycle management
Writing production-quality SQL and Python code for data processing and transformation
Establishing and monitoring data quality frameworks including testing, validation, and observability systems
Designing and optimizing data models including dimensional modeling, star schemas, and appropriate normalization strategies
Designing data governance, access controls, and security measures including PII handling and compliance standards
Building and maintaining dashboards, reporting systems, and semantic layers for business intelligence and analytics
Automating manual data processes and optimizing workflows for performance and cost efficiency
Building observability and monitoring infrastructure for data pipelines including alerting on anomalies and data drift
Implementing CI/CD practices and IT General Controls to ensure reliability and security of data flows
Building data products and metrics that power machine learning training pipelines and model improvement
Designing privacy-first data architectures and implementing controls for sensitive employee and user data
Designing data infrastructure for AI/ML workloads including training data preparation and quality evaluation
Implementing master data management patterns including golden records, deduplication, and identity resolution
Designing and operating medallion architecture patterns (bronze/silver/gold) for structured data transformation
Building streaming data pipelines and real-time data processing systems
Managing reverse ETL and data activation to push insights back into operational systems
Collaborating with cross-functional stakeholders including business teams, analysts, data scientists, and engineers to translate requirements into data solutions
Translating ambiguous business requirements into scalable datasets, metrics, and self-serve data products
Communicating technical concepts clearly through documentation, runbooks, and knowledge sharing with team members
Mentoring team members and establishing best practices, standards, and code review processes
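Several of the skills above (medallion layering, data quality validation, and deduplication into golden records) can be illustrated together. The following is a minimal sketch, not a production design: the record fields (`email`, `name`, `updated_at`), the function names `to_silver` and `to_gold`, and the "most recently updated row wins" survivorship rule are all assumptions made for illustration.

```python
# Hypothetical raw ("bronze") records as they might arrive from a source system.
BRONZE = [
    {"email": "Ada@Example.com ", "name": "Ada L.", "updated_at": "2024-01-05"},
    {"email": "ada@example.com", "name": "Ada Lovelace", "updated_at": "2024-03-01"},
    {"email": None, "name": "Unknown", "updated_at": "2024-02-10"},
    {"email": "grace@example.com", "name": "Grace Hopper", "updated_at": "2024-02-20"},
]


def to_silver(bronze):
    """Clean and validate: normalize emails and set aside rows that fail a basic check."""
    valid, rejected = [], []
    for row in bronze:
        email = (row.get("email") or "").strip().lower()
        if "@" not in email:
            # Keep failed rows for inspection instead of silently dropping them.
            rejected.append(row)
            continue
        valid.append({**row, "email": email})
    return valid, rejected


def to_gold(silver):
    """Deduplicate by email, keeping the most recently updated row per key."""
    latest = {}
    for row in silver:
        key = row["email"]
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["email"])


silver, rejected = to_silver(BRONZE)
gold = to_gold(silver)
```

In this sketch, `gold` ends up with one row per normalized email, while `rejected` retains the rows that failed validation so they can be reviewed. A real pipeline would typically express the same steps in SQL or a dataframe engine and use a richer identity-resolution rule than an exact email match.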
Technology
The tools and technologies that define this role.
Open Jobs
34 open Data Engineer jobs across 24 companies.
Other Data & Analytics roles
Applies statistical modeling, machine learning, and experimentation to extract insights from data.
Bridges data engineering and analytics by building data models, metrics layers, and self-serve analytics tools.
Analyzes data to generate actionable business insights and builds dashboards and reports.
Specializes in marketing and go-to-market (GTM) measurement, attribution modeling, and revenue intelligence. Builds analytical frameworks, runs experiments, and delivers data-driven insights to optimize GTM strategy, with an emphasis on analytics methodology and data infrastructure for marketing.
Manages data labeling, annotation, and curation operations for machine learning.