Gyan Bahadur Tamang

Data Engineer | ETL | PySpark | Fabric | Databricks

EUR 180/day
Madrid, ES
3-7 years

Average response time: 1h

About Gyan Bahadur

Results-oriented Data Engineer with 6+ years of experience building scalable ETL/ELT pipelines, real-time streaming systems, and cloud data platforms. Strong in Python, Spark (PySpark/Scala), Kafka, dbt, and Airflow, with expertise across AWS, Azure, Databricks, and Microsoft Fabric. Experienced in designing data lakehouse architectures, optimizing large-scale datasets, and implementing data modeling and performance tuning. Skilled in processing millions of records and in CI/CD and Infrastructure as Code (Terraform, CloudFormation), with strong experience in healthcare and enterprise data environments focused on reliable, high-quality analytics.


TECHNICAL SKILLS
____________________________________
Programming Languages: Python, SQL, Scala, Java, Bash, Shell Scripting.
Data Integration & ETL: dbt, Apache Airflow, Informatica, Azure Data Factory, AWS Glue, Fivetran, Great Expectations.
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, EMR, RDS, MSK, Lake Formation), Azure (Data Factory, Synapse Analytics, Data Lake, Key Vault, AKS, Purview), Databricks.
Big Data & Streaming: Apache Spark (Scala/PySpark), Spark Streaming, Hadoop, Kafka, Flink.
Databases & Storage: PostgreSQL, MySQL, MongoDB, Cassandra, Oracle, Snowflake, Redshift, Synapse Analytics.
File Formats: Parquet, JSON, Avro, ORC, CSV, XML.
DevOps & CI/CD: Jenkins, GitHub Actions, GitLab, AWS CloudFormation, Docker, Kubernetes (EKS/AKS), Prometheus, Datadog.
BI & Visualization: Power BI, Tableau, Looker, Mode Analytics.
Machine Learning Integration: Spark ML, Scikit-learn, TensorFlow (light), Feature Engineering Collaboration.
Metadata & APIs: RESTful APIs, Azure Purview, AtScale, Data Catalogs.
Methodologies & Tools: Agile (Scrum), Jira, Confluence.

Languages

  • English

    Native or bilingual

Remote only
Mainly works on projects remotely

Project and Professional Experience

  • DHIS2
    Data Engineer
    SOFTWARE VENDOR
    July 2020 - Present (5 years and 11 months)
    • Designed and implemented scalable ETL/ELT pipelines using Apache Spark (PySpark/Scala), Databricks, Microsoft Fabric, and Azure Data Factory, processing millions of public health records, including HIV/AIDS surveillance, immunization, maternal health, and disease reporting data used in WHO- and Save the Children-supported programs.
    • Architected a Medallion Data Lake (Bronze, Silver, Gold layers) with strong data modeling practices (dimensional and analytical modeling) to standardize healthcare data ingestion, transformation, and analytics for disease monitoring, outbreak tracking, and national health reporting.
    • Engineered multi-source data ingestion pipelines from relational databases (SQL Server, PostgreSQL), APIs (DHIS2 and external health systems), CSV, Excel, JSON, and flat files, enabling unified processing of heterogeneous public healthcare data.
    • Built and tuned high-performance Spark pipelines using caching, in-memory computation, and distributed processing optimizations, enabling efficient processing of large-scale public health datasets in Databricks and Microsoft Fabric environments.
    • Collaborated with public health stakeholders, including WHO-aligned initiatives and Save the Children programs, to define data models, ETL rules, validation frameworks, and standardized health indicators, ensuring accurate reporting for HIV/AIDS and other critical health programs.
    • Built and maintained cloud-based data lakes on AWS S3 and Azure Data Lake, implementing scalable transformations using AWS Glue, Microsoft Fabric, and Spark with optimized partitioning and caching strategies for large datasets.
    • Developed Power BI dashboards integrated with SQL, APIs, DHIS2 systems, and Microsoft Fabric datasets, enabling visualization of health program performance, treatment outcomes, and key public health indicators at scale.
    ETL processes (Extract, Transform, Load) · Database management (e.g., SQL, NoSQL) · Data cleaning & preprocessing · Databricks · Microsoft Fabric
  • OpenMRS
    Jr. Data Engineer
    SOFTWARE VENDOR
    July 2019 - July 2020 (1 year)
    Dallas, United States
    • Designed and developed scalable ETL pipelines using AWS Glue and PySpark to ingest and transform large-scale healthcare data (millions of patient and medicine records) from S3 and external systems into Amazon Redshift for analytics and reporting.
    • Automated data discovery and querying using AWS Glue Crawlers, Data Catalog, and Amazon Athena, enabling efficient access to high-volume patient and clinical datasets for analytics teams.
    • Built a reusable and scalable ETL framework using Spark (Python/Scala) to standardize ingestion, transformation, and loading of millions of healthcare records including patient history, prescriptions, and treatment data into Hive and HBase.
    • Designed and optimized data models for healthcare analytics, structuring raw, staging, and curated layers (Medallion-style modeling) to support efficient querying and reporting on patient and medicine datasets.
    • Optimized Hive table design with partitioning and bucketing strategies, significantly improving performance for millions of patient-level and pharmaceutical records.
    • Implemented event-driven data pipelines using AWS Lambda and S3 triggers, enabling automated and near real-time processing of incoming patient and medicine data at scale.
    • Orchestrated end-to-end workflows using Apache Airflow DAGs, ensuring reliable scheduling, dependency management, and monitoring of large-scale healthcare data pipelines.
    • Developed and optimized distributed processing jobs using PySpark and Spark SQL, efficiently handling millions of records across patient demographics, prescriptions, and clinical events.
    • Built real-time streaming pipelines using Apache Kafka and Apache Flink, and containerized workloads using Docker and Kubernetes, enabling scalable processing of high-volume healthcare data streams.
    Data Engineer · Databricks · ETL processes (Extract, Transform, Load) · SQL Server · Apache Kafka
  • Solulab Inc
    Software Developer Intern
    DIGITAL AGENCIES & IT CONSULTING
    January 2019 - July 2019 (6 months)
    Ahmedabad, India
    • Developed and integrated RESTful APIs using FastAPI and PostgreSQL into the Bevvi application, enabling seamless data exchange and functionality with third-party services.
    • Designed a responsive user interface (UI) using React and Material UI for the Bevvi application, increasing mobile traffic by 25% and improving user satisfaction.
    • Utilized Jenkins for Continuous Integration and Continuous Deployment (CI/CD), reducing deployment times by 40% and improving release consistency and reliability.
    • Designed and implemented scalable AWS cloud infrastructure using services such as EC2, S3, and DynamoDB, ensuring optimal performance and cost efficiency.
    • Automated serverless workflows using AWS Lambda and API Gateway, reducing operational overhead and enabling event-driven
    Apache Kafka · SQL · Python · REST API · SQL Server

Education and Degrees

  • Master of Science
    University of the Cumberlands
    2025
    Computer Science
  • Bachelor in Science & Technology
    Maharaja Ranjit Singh Punjab Technical University
    2019
    Computer Science & Engineering
