Meheli Sinha
Open to full-time Data Engineer roles across Berlin, Germany and the EU

Data & AI Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. I specialize in Python, SQL, and cloud-based ETL pipelines that turn complex enterprise data into trusted, AI-ready foundations, currently at a Berlin energy technology company.

Data Pipeline

Sources

  • Python, SQL
  • PostgreSQL, Oracle
  • REST APIs, NoSQL

Pipelines

  • Azure Data Factory
  • Databricks, PySpark
  • Airflow

AI & Insight

  • LLMs, RAG
  • FastAPI
  • Power BI

3.5+ years in production data engineering
500K+ records per pipeline run
100+ SQL objects migrated to Databricks
50+ Python automation components

Python · SQL · PostgreSQL · NoSQL · PySpark · Databricks · Azure Data Factory · ETL / ELT · Airflow · FastAPI · Azure · Delta Lake · Docker · CI/CD · Power BI · RAG · LLMs · NLP · GenAI · Agentic AI · Neo4j · Milvus · Data Vault 2.0
Profile

Data-driven engineering, end to end.

Data Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. At Tata Consultancy Services, I architected ELT pipelines processing 500K+ daily records for Nordea Bank Abp. Now at E.ON Digital Technology GmbH in Berlin, I migrate enterprise systems to Databricks and build automated governance pipelines, turning complex enterprise IT data into trusted, structured foundations for AI-powered systems.

01

Production Data Engineering

Over 3.5 years building robust ETL and ELT pipelines on Python, SQL, Azure, and Databricks, handling 500K+ records per run.

02

Cloud & Governance Platforms

Building automated pipelines integrating Power BI, Azure SQL, Blob Storage, and internal security and asset systems for enterprise governance.

03

MSc in Data Science & AI

Pursuing a Master's at GISMA University of Applied Sciences in Berlin, with a thesis on multi-agent LLM systems for Data Vault 2.0.

Stack

Key strengths and technologies

The tools I reach for to build scalable, AI-ready, data-intensive systems.

ETL / ELT Pipelines

Scalable ELT and ETL pipelines across SQL Server, Oracle, and Databricks, processing 500K+ daily records.

Databricks & Medallion

Migrating SQL views, tables, and stored procedures to Databricks using Medallion architecture.

FastAPI

High-performance APIs exposing data and AI capabilities as production-ready services.

Python & SQL

pandas, NumPy, scikit-learn, and async/await with advanced SQL and PL/SQL on PostgreSQL and Oracle.

Azure Cloud

Azure Data Factory, SQL, Functions, and Blob Storage with Power BI for enterprise governance.

AI & Machine Learning

ML pipelines, feature engineering, RAG, LLMs, and agentic AI for AI-ready data platforms.

Experience

Where the pipeline runs

Working Student, Data Engineer

May 2025 to Present
E.ON Digital Technology GmbH, Berlin, Germany
  • Migrated 100+ SQL views, tables, and stored procedures to Databricks using Medallion architecture, building the reference foundation for the broader cloud migration program.
  • Diagnosed a Databricks orchestrator job that had failed for 3+ runs and blocked 419K+ daily records, then fixed it by resolving a credential race condition with in-memory PEM key authentication.
  • Resolved a silent SCD2 data quality bug caused by inconsistent Azure API ID casing, eliminating 4,800+ duplicate records across 59 views by automating blast radius analysis over 31 tables with PySpark.
  • Delivered fully automated Power BI KPI dashboards adopted by governance and business stakeholders, replacing manual reporting, and integrated REST APIs to synchronize IT asset and security data across enterprise platforms.
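
The ID casing issue in the SCD2 bullet above can be illustrated with a small plain-Python sketch. It is not the production PySpark job: the function name and the sample IDs are hypothetical, and it only shows the general idea of grouping identifiers by a case-normalized key to surface records that differ by casing alone.

```python
from collections import defaultdict


def find_casing_duplicates(resource_ids):
    """Group IDs that differ only by letter casing (hypothetical helper)."""
    groups = defaultdict(set)
    for rid in resource_ids:
        # Normalize the key so "ABC" and "abc" land in the same group.
        groups[rid.lower()].add(rid)
    # Keep only the groups that contain more than one casing variant.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Illustrative IDs only; not taken from any real system.
ids = [
    "/subscriptions/ABC/resourceGroups/rg-data",
    "/subscriptions/abc/resourcegroups/rg-data",
    "/subscriptions/abc/resourceGroups/rg-other",
]
duplicates = find_casing_duplicates(ids)
```

Normalizing the key before the SCD2 comparison is one common way to keep a casing change from being treated as a new entity; whether that matches the fix applied in the project is an assumption.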

Data Engineer

August 2021 to December 2024
Tata Consultancy Services, Bengaluru, India
  • Architected end-to-end ELT and ETL pipelines processing 500K+ daily records across SQL Server, Oracle, and Databricks for Nordea Bank's Credit and Risk Transformation program.
  • Engineered 50+ Python automation components and advanced PL/SQL packages, procedures, and triggers, improving data pipeline efficiency by 30% and processing speed by 20%.
  • Re-engineered a legacy batch pipeline into an incremental load model, cutting processing time by 60%.
  • Conducted root cause analysis on 30+ data discrepancies, improving downstream data trust scores by 25%, and orchestrated workflows with Python, SQL Developer, and Apache Airflow.
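
The incremental load model mentioned in the list above can be sketched as a watermark-based upsert. This is a minimal illustration under assumptions: the `updated_at` column, the dictionary target, and the function name are all hypothetical and stand in for whatever the real pipeline used.

```python
from datetime import datetime


def incremental_load(source_rows, target, watermark):
    """Upsert only rows modified after the watermark; return the new watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            # Insert or overwrite by primary key instead of reloading everything.
            target[row["id"]] = row
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Illustrative data only.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "amount": 10},
    {"id": 2, "updated_at": datetime(2024, 1, 3), "amount": 20},
]
target = {}
watermark = incremental_load(source, target, datetime(2024, 1, 2))
```

Only the row changed after the previous watermark is processed, which is where the saving over a full batch reload comes from.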

Machine Learning Intern

August 2020 to September 2020
Tequed Labs, Bengaluru, India
  • Built a used car price prediction model with supervised regression using pandas, NumPy, matplotlib, seaborn, and SciPy.
Projects

Featured work

Production-grade, agentic, and full-stack data and AI, from raw ingestion to auditable, queryable insight.

GISMA University × E.ON Digital Technology · Master's Thesis

Multi-Agent Data Vault 2.0

A multi-agent LLM system that auto-generates Data Vault 2.0 warehouse models (hubs, links, and satellites) from raw source schemas. A six-stage pipeline combines LLM-based schema inference with deterministic risk checks and majority-vote consensus across model runs, so no single bad output can break a run. It includes a thread-safe rate limiter with adaptive batch splitting, a human-in-the-loop governance layer with severity-graded validation, and an insert-only audit trail for full data lineage. A full-stack review app lets engineers move a new source from connection to approved model without manual scripting.
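
The majority vote step described above can be sketched in a few lines. This is a hedged illustration, not the thesis implementation: the function name, the `min_share` threshold, and the string labels are assumptions chosen for the example.

```python
from collections import Counter


def majority_vote(candidates, min_share=0.5):
    """Return the output produced by a strict majority of runs, or None."""
    if not candidates:
        return None
    winner, count = Counter(candidates).most_common(1)[0]
    # Require more than min_share of the runs to agree; otherwise abstain.
    return winner if count / len(candidates) > min_share else None


# Three hypothetical model runs classifying the same source table.
agreed = majority_vote(["hub", "hub", "link"])
split = majority_vote(["hub", "link"])
```

Here `agreed` is `"hub"` and `split` is `None`. Returning `None` on a split vote leaves room for a deterministic check or a human reviewer to decide, which fits the human-in-the-loop layer the description mentions.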

Python · LLMs · Multi-Agent · Data Vault 2.0 · FastAPI · React · PostgreSQL · Audit Trail
View on GitHub
Education
01

MSc Data Science, AI & Digital Business

GISMA University of Applied Sciences

December 2024 to September 2026, Berlin
02

BE Information Science & Engineering

Visvesvaraya Technological University

2017 to 2021, Bengaluru
Core Skills
AI and ML
  • ML Pipelines
  • Feature Engineering
  • RAG
  • NLP
  • LLMs
  • Agentic AI
  • GenAI
Data Engineering
  • Python
  • SQL
  • ETL and ELT
  • Databricks
  • Azure Data Factory
  • Airflow
  • PostgreSQL
  • Docker
Languages
English
C1 — Fluent
German
A2 — Elementary
Key Impact

419K+ daily records unblocked

Fixed a Databricks orchestrator job that had failed for 3+ runs, using an in-memory PEM key auth fix.

4,800+ duplicate records eliminated

Resolved a silent SCD2 data quality bug across 59 views via PySpark blast radius analysis.

60% faster batch processing

Re-engineered a legacy batch pipeline into an incremental load model at Nordea Bank.

30% pipeline efficiency gain

Engineered 50+ Python automation components, improving processing speed by 20%.

Certifications

Python Programming — Beginner to Advanced

Skill Development Program on Artificial Intelligence

McKinsey.org Forward Program

Programming in Java

Connect

Let's build something scalable.

Open to full-time Data Engineer roles across Berlin, Germany and the EU. The fastest way to reach me is via email or LinkedIn.