Senior Data Engineer

Location: Pune, India

Contract Type: Permanent, full-time

Build the systems that keep ships moving.

At CFARER, we create software that helps crews and offices work in rhythm.

Our promise is simple: making maritime life easier.


The role

We are seeking a highly skilled Senior Data Engineer with extensive experience designing, developing, and optimizing enterprise data solutions on the Azure cloud platform.

The ideal candidate will have deep expertise in Azure Synapse, Python, and ETL/data-pipeline engineering, along with a strong understanding of modern data architectures, governance frameworks, and distributed data processing.

CFARER is an international, equal-opportunity employer. We celebrate diversity and are committed to an inclusive workplace.

What you’ll do

Data Pipeline & ETL/ELT Engineering

  • Design, build, and optimize scalable ETL/ELT pipelines using Azure Synapse, Azure Data Factory, and Apache Spark.

  • Implement incremental, batch, micro-batch, and real-time data processing using ADLS and Delta Lake.

  • Work with the Medallion architecture (Bronze → Silver → Gold) for data lake optimization.

Data Governance, Quality & Security

  • Implement data governance using Microsoft Purview, Data Catalog, RBAC, and secure access controls.

  • Define and enforce data quality frameworks using Great Expectations or equivalent tools.

Azure Platform Engineering & Integration

  • Build and orchestrate workflows using Azure Logic Apps, Azure Functions, REST API integrations, and event-driven services (Event Hub/Service Bus).

Python & Spark Development

  • Develop Python modules and notebooks for automation, transformations, and ML integrations.

  • Write optimized PySpark jobs for Synapse Spark or Databricks.

Observability & Performance Optimization

  • Monitor and optimize pipelines using Azure Monitor, Log Analytics, and Application Insights.

  • Tune SQL queries and Spark jobs for improved performance.

Collaboration, Agile Delivery & Documentation

  • Collaborate with cross-functional teams including data architects, analysts, and business stakeholders.

  • Document data flows, governance policies, and architecture diagrams.

  • Implement CI/CD using Azure DevOps.

AI-Driven Data Migration Skills & Strategies

  • Use AI-assisted data profiling and discovery to assess legacy data platforms and migration complexity.

  • Apply ML-based data quality and anomaly detection to identify inconsistencies, duplicates, and loss risks during migration.

  • Leverage AI-assisted schema and SQL conversion techniques to modernize legacy databases into Azure SQL and Synapse SQL.

AI, ML & GenAI Skills

  • Design and build data pipelines to support Machine Learning (ML) model training, validation, and inference using Azure ML and Synapse.

  • Enable MLOps workflows including dataset versioning, feature engineering, experiment tracking, and model monitoring.

  • Support Generative AI (GenAI) and LLM-based solutions using Azure OpenAI and Retrieval-Augmented Generation (RAG) architectures.

  • Build and manage embedding pipelines, vectorized data, and metadata enrichment for AI-driven search and copilots.

What you bring

Bachelor’s degree in Computer Science, Information Technology, Data Engineering, Software Engineering, or a related technical field.

Master’s degree in Data Engineering, Computer Science, Information Systems, or related domain is a plus.

Relevant certifications are preferred but not mandatory (any of the below):

  • Microsoft Certified: Azure Data Engineer Associate

  • Microsoft Certified: Fabric Data Engineer Associate

  • Microsoft Certified: Azure Solutions Architect

  • Any recognized certification in Python, Big Data, or Cloud Engineering

Work Experience Requirements

  • 10+ years of professional experience in Data Engineering, Data Warehousing, or Big Data platform development.

  • 5+ years of hands-on experience with Azure Data Services (ADF, Synapse, ADLS, Databricks, Logic Apps).

  • 5+ years of experience in ETL/ELT pipeline development in cloud or hybrid environments.

  • Strong Python experience (automation, data transformation, reusable libraries).

  • Demonstrated experience working with large-scale distributed data processing (Spark / PySpark).

  • Proven experience building production-grade data pipelines, implementing data lake architectures, and managing end-to-end data lifecycle.

  • Experience working in Agile/Scrum teams, CI/CD pipelines, and DevOps practices.

What we offer

  • Generous paid leaves (Annual, Sick, Compassionate, Public holidays, Marriage, Maternity, Paternity, Medical leaves)

  • Medical benefits (Insurance and Annual health check-up)

  • Pension and insurance policies (Group Term Life Insurance, Group Personal Accident Insurance, Travel Insurance)

  • Additional benefits (Internet, Phone bill reimbursement)

Why CFARER

CFARER provides a work environment driven by strong leadership, professional growth, and a healthy work-life balance. We use modern tools and embrace digital opportunities that help us scale and innovate.

Apply now: career@cfarer.world
