Open to full-time Data Engineer roles across Berlin, Germany, and the EU

Meheli Sinha

Data & AI Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. I specialize in Python, SQL, and cloud-based ETL pipelines, and I currently turn complex enterprise data into trusted, AI-ready foundations at a Berlin energy technology company.


Data Pipeline

Sources

Python, SQL
PostgreSQL, Oracle
REST APIs, NoSQL

Pipelines

Azure Data Factory
Databricks, PySpark
Airflow

AI & Insight

LLMs, RAG
FastAPI
Power BI

3.5+ years

in production data engineering

500K+

records per pipeline run

100+

SQL objects migrated to Databricks

50+

Python automation components

Python, SQL, PostgreSQL, NoSQL, PySpark, Databricks, Azure Data Factory, ETL / ELT, Airflow, FastAPI, Azure, Delta Lake, Docker, CI/CD, Power BI, RAG, LLMs, NLP, GenAI, Agentic AI, Neo4j, Milvus, Data Vault 2.0

Profile

Data-driven engineering, end to end.

Data Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. At Tata Consultancy Services, I architected ELT pipelines processing 500K+ daily records for Nordea Bank Abp. Now at E.ON Digital Technology GmbH in Berlin, I migrate enterprise systems to Databricks and build automated governance pipelines, turning complex enterprise IT data into trusted, structured foundations for AI-powered systems.

Production Data Engineering

Over 3.5 years building robust ETL and ELT pipelines on Python, SQL, Azure, and Databricks, handling 500K+ records per run.

Cloud & Governance Platforms

Building automated pipelines integrating Power BI, Azure SQL, Blob Storage, and internal security and asset systems for enterprise governance.

MSc in Data Science & AI

Pursuing a Master's at GISMA University of Applied Sciences in Berlin, with a thesis on multi-agent LLM systems for Data Vault 2.0.

Stack

Key strengths and technologies

The tools I reach for to build scalable, AI-ready, data-intensive systems.

ETL / ELT Pipelines

Scalable ELT and ETL pipelines across SQL Server, Oracle, and Databricks, processing 500K+ daily records.

Databricks & Medallion

Migrating SQL views, tables, and stored procedures to Databricks using Medallion architecture.

FastAPI

High-performance APIs exposing data and AI capabilities as production-ready services.

Python & SQL

pandas, NumPy, scikit-learn, and async/await in Python, alongside advanced SQL and PL/SQL on PostgreSQL and Oracle.

Azure Cloud

Azure Data Factory, Azure SQL, Azure Functions, and Blob Storage, with Power BI for enterprise governance.

AI & Machine Learning

ML pipelines, feature engineering, RAG, LLMs, and agentic AI for AI-ready data platforms.

Experience

Where the pipeline runs

Working Student, Data Engineer

May 2025 to Present

E.ON Digital Technology GmbH, Berlin, Germany

Migrated 100+ SQL views, tables, and stored procedures to Databricks using Medallion architecture, building the reference foundation for the broader cloud migration program.
Diagnosed and fixed a Databricks orchestrator job that had failed for 3+ runs and blocked 419K+ daily records, resolving a credential race condition with an in-memory PEM key authentication fix.
Resolved a silent SCD2 data quality bug caused by inconsistent casing of Azure API IDs, eliminating 4,800+ duplicate records across 59 views by automating blast-radius analysis over 31 tables with PySpark.
Delivered fully automated Power BI KPI dashboards adopted by governance and business stakeholders, replacing manual reporting, and integrated REST APIs to synchronize IT asset and security data across enterprise platforms.
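
The SCD2 casing issue above can be illustrated with a minimal sketch in plain Python rather than PySpark. It assumes the duplicates arose because the same identifier arrived in different letter cases and was treated as a new business key; the record layout, the sample data, and the helper names `normalize_id` and `apply_scd2` are hypothetical and not taken from the production code.

```python
from datetime import date

def normalize_id(raw_id: str) -> str:
    """Canonical form of a business key: trimmed and lower-cased."""
    return raw_id.strip().lower()

def apply_scd2(history: list, incoming: dict, load_date: date, normalize: bool = True) -> list:
    """Apply one incoming record to an SCD2 history list (mutated in place).

    A new version is opened only when the key has no current row
    or the tracked attribute has changed.
    """
    key = normalize_id(incoming["id"]) if normalize else incoming["id"]
    current = next(
        (row for row in history if row["id"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None and current["name"] == incoming["name"]:
        return history  # unchanged record: no new version
    if current is not None:
        current["valid_to"] = load_date  # close the previous version
    history.append(
        {"id": key, "name": incoming["name"], "valid_from": load_date, "valid_to": None}
    )
    return history

# Made-up feed: the same asset arrives twice with different ID casing.
feed = [{"id": "VM-001a", "name": "web"}, {"id": "vm-001A", "name": "web"}]

naive, fixed = [], []
for day, record in enumerate(feed, start=1):
    apply_scd2(naive, record, date(2025, 1, day), normalize=False)
    apply_scd2(fixed, record, date(2025, 1, day), normalize=True)

print(len(naive), len(fixed))
```

Without normalization the second record opens a second "current" row for the same asset; with a canonical key it is recognized as unchanged.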

Data Engineer

August 2021 to December 2024

Tata Consultancy Services, Bengaluru, India

Architected end-to-end ELT and ETL pipelines processing 500K+ daily records across SQL Server, Oracle, and Databricks for Nordea Bank's Credit and Risk Transformation program.
Engineered 50+ Python automation components and advanced PL/SQL packages, procedures, and triggers, improving data pipeline efficiency by 30% and speed by 20%.
Re-engineered a legacy batch pipeline into an incremental load model, cutting processing time by 60%.
Conducted root cause analysis on 30+ data discrepancies, improving downstream data trust scores by 25%, and orchestrated workflows with Python, SQL Developer, and Apache Airflow.
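
As an illustration of the incremental load model mentioned above, here is a minimal sketch of a high-watermark load in Python. It assumes each source row carries an `updated_at` timestamp and a primary key `id`; those column names, the sample rows, and the function `incremental_load` are assumptions for illustration, not the pipeline built for the client. In a real pipeline the watermark filter would be pushed into the source query rather than applied in memory.

```python
from datetime import datetime

def incremental_load(source_rows, target, last_watermark):
    """Upsert only rows modified after last_watermark into target.

    Returns (new_watermark, rows_loaded); the watermark is persisted
    so the next run can skip everything already processed.
    """
    new_watermark = last_watermark
    rows_loaded = 0
    for row in source_rows:
        if row["updated_at"] > last_watermark:
            target[row["id"]] = row  # upsert by primary key
            new_watermark = max(new_watermark, row["updated_at"])
            rows_loaded += 1
    return new_watermark, rows_loaded

source = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 2)},
]
target = {}

# First run: nothing loaded yet, so every row qualifies.
watermark, first_count = incremental_load(source, target, datetime.min)

# Between runs: one row changes and one new row arrives.
source[0] = {"id": 1, "amount": 120, "updated_at": datetime(2024, 1, 4)}
source.append({"id": 3, "amount": 75, "updated_at": datetime(2024, 1, 3)})

# Second run: only the changed and the new row are processed.
watermark, second_count = incremental_load(source, target, watermark)
print(first_count, second_count, len(target))
```

The saving comes from the second run touching only the rows newer than the stored watermark instead of reprocessing the full batch.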

Machine Learning Intern

August 2020 to September 2020

Tequed Labs, Bengaluru, India

Built a used car price prediction model with supervised regression using pandas, NumPy, matplotlib, seaborn, and SciPy.

Projects

Featured work

Production-grade, agentic, and full-stack data and AI work, from raw ingestion to auditable, queryable insight.

GISMA University × E.ON Digital Technology · Master's Thesis

Multi-Agent Data Vault 2.0

A multi-agent LLM system that auto-generates Data Vault 2.0 warehouse models (hubs, links, and satellites) from raw source schemas. A six-stage pipeline combines LLM-based schema inference with deterministic risk checks and majority-vote consensus across model runs, so no single bad output can break a run. It includes a thread-safe rate limiter with adaptive batch splitting, a human-in-the-loop governance layer with severity-graded validation, and an insert-only audit trail for full data lineage. A full-stack review app lets engineers move a new source from connection to approved model without manual scripting.
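
The majority-vote step can be sketched as follows. This is a simplified illustration, assuming each model run returns one label per source table ("hub", "link", or "satellite") and that a strict majority is required; the function name `majority_vote`, the threshold, and the sample labels are hypothetical and do not reproduce the thesis implementation.

```python
from collections import Counter
from typing import Optional

def majority_vote(candidates: list, min_agreement: float = 0.5) -> Optional[str]:
    """Return the label that more than min_agreement of the runs agree on.

    Returns None when no label clears the threshold, so the caller can
    route the decision elsewhere (for example, to a human reviewer).
    """
    if not candidates:
        return None
    label, votes = Counter(candidates).most_common(1)[0]
    return label if votes / len(candidates) > min_agreement else None

# Three hypothetical runs classifying the same source table.
agreed = majority_vote(["hub", "hub", "satellite"])
split = majority_vote(["hub", "link", "satellite"])
print(agreed, split)
```

With two of three runs agreeing, a single outlier is outvoted; when every run disagrees, the function returns `None` instead of picking an arbitrary answer.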

Python, LLMs, Multi-Agent, Data Vault 2.0, FastAPI, React, PostgreSQL, Audit Trail


Energy & Lakehouse

In progress

EV Battery Passport & Energy Grid Platform

An end-to-end Databricks lakehouse (medallion architecture) ingesting real German grid data and streaming battery telemetry. A Structured Streaming pipeline with watermarking serves real-time battery health via Redis, a Neo4j chain-of-custody graph traces recalls for EU Battery Regulation compliance, and a React dashboard surfaces battery passports and live grid KPIs with OpenLineage lineage.

Databricks, PySpark, Azure, Delta Lake, Neo4j, Redis, FastAPI, React

Graph RAG

Multilingual Graph RAG Platform

A multilingual RAG platform retrieving across German, English, and French documents with Milvus vector search and a Neo4j graph layer linking entities and citations for multi-hop queries. An async ingestion pipeline (Celery and Redis) handles PDF parsing, chunking, and NER, with a React and TypeScript frontend offering a graph explorer and streaming, citation-grounded answers.

Python, FastAPI, React, TypeScript, Milvus, Neo4j, PostgreSQL, Docker

Education

MSc Data Science, AI & Digital Business

GISMA University of Applied Sciences

December 2024 to September 2026, Berlin

BE Information Science & Engineering

Visvesvaraya Technological University

2017 to 2021, Bengaluru

Core Skills

AI and ML

ML Pipelines
Feature Engineering
RAG
NLP
LLMs
Agentic AI
GenAI

Data Engineering

Python
SQL
ETL and ELT
Databricks
Azure Data Factory
Airflow
PostgreSQL
Docker

Languages

English

C1 — Fluent

German

A2 — Elementary

Key Impact

419K+ daily records unblocked

Fixed a Databricks orchestrator job that had failed for 3+ runs with an in-memory PEM key authentication fix.

4,800+ duplicate records eliminated

Resolved a silent SCD2 data quality bug across 59 views via PySpark blast-radius analysis.

60% faster batch processing

Re-engineered a legacy batch pipeline into an incremental load model at Nordea Bank.

30% pipeline efficiency gain

Engineered 50+ Python automation components, improving processing speed by 20%.

Certifications

Python Programming — Beginner to Advanced

Skill Development Program on Artificial Intelligence

McKinsey.org Forward Program

Programming in Java

Connect

Let's build something scalable.

Open to full-time Data Engineer roles across Berlin, Germany, and the EU. The fastest way to reach me is by email or LinkedIn.

Email: mehelisinha@gmail.com
LinkedIn: in/meheli-sinha
GitHub: mehelisinha
Web: mehelisinha.tech