Description

Results-oriented Data Engineer with 6+ years of experience building scalable ETL/ELT pipelines, real-time streaming systems, and cloud data platforms. Strong in Python, Spark (PySpark/Scala), Kafka, dbt, and Airflow, with expertise across AWS, Azure, Databricks, and Microsoft Fabric. Experienced in designing data lakehouse architectures, optimizing large-scale datasets, and applying data modeling and performance tuning. Skilled in processing datasets with millions of records, CI/CD, and Infrastructure as Code (Terraform, CloudFormation), with strong experience in healthcare and enterprise data environments focused on reliable, high-quality analytics.

TECHNICAL SKILLS

____________________________________

Programming Languages: Python, SQL, Scala, Java, Bash, Shell Scripting.

Data Integration & ETL: dbt, Apache Airflow, Informatica, Azure Data Factory, AWS Glue, Fivetran, Great Expectations.

Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, EMR, RDS, MSK, Lake Formation), Azure (Data Factory, Synapse Analytics, Data Lake, Key Vault, AKS, Purview), Databricks.

Big Data & Streaming: Apache Spark (Scala/PySpark), Spark Streaming, Hadoop, Kafka, Flink.

Databases & Storage: PostgreSQL, MySQL, MongoDB, Cassandra, Oracle, Snowflake, Redshift, Synapse Analytics.

File Formats: Parquet, JSON, Avro, ORC, CSV, XML.

DevOps & CI/CD: Jenkins, GitHub Actions, GitLab, AWS CloudFormation, Docker, Kubernetes (EKS/AKS), Prometheus, Datadog.

BI & Visualization: Power BI, Tableau, Looker, Mode Analytics.

Machine Learning Integration: Spark ML, Scikit-learn, TensorFlow (light), Feature Engineering Collaboration.

Metadata & APIs: RESTful APIs, Azure Purview, AtScale, Data Catalogs.

Methodologies: Agile (SCRUM), Jira, Confluence.

Industry expertise

Languages

English
Native or bilingual

Work location preferences

Remote only

Carries out projects mainly remotely

DHIS2
Data Engineer
SOFTWARE VENDOR
July 2020 - Present (5 years and 11 months)
Designed and implemented scalable ETL/ELT pipelines using Apache Spark (PySpark/Scala), Databricks, Microsoft Fabric, and Azure Data Factory, processing millions of public health records including HIV/AIDS surveillance, immunization, maternal health, and disease reporting used in WHO and Save the Children-supported programs.
Architected a Medallion Data Lake (Bronze, Silver, Gold layers) with strong data modeling practices (dimensional and analytical modeling) to standardize healthcare data ingestion, transformation, and analytics for disease monitoring, outbreak tracking, and national health reporting.
Engineered multi-source data ingestion pipelines from relational databases (SQL Server, PostgreSQL), APIs (DHIS2 and external health systems), CSV, Excel, JSON, and flat files, enabling unified processing of heterogeneous public healthcare data.
Built and tuned high-performance Spark pipelines using caching, in-memory computation, and distributed processing optimizations, enabling efficient processing of large-scale public health datasets in Databricks and Microsoft Fabric environments.
Collaborated with public health stakeholders, including WHO-aligned initiatives and Save the Children programs, to define data models, ETL rules, validation frameworks, and standardized health indicators, ensuring accurate reporting for HIV/AIDS and other critical health programs.
Built and maintained cloud-based data lakes on AWS S3 and Azure Data Lake, implementing scalable transformations using AWS Glue, Microsoft Fabric, and Spark with optimized partitioning and caching strategies for large datasets.
Developed Power BI dashboards integrated with SQL, APIs, DHIS2 systems, and Microsoft Fabric datasets, enabling visualization of health program performance, treatment outcomes, and key public health indicators at scale.
ETL processes (extract, transform, load), database management (e.g., SQL, NoSQL), data cleaning & preprocessing, Databricks, Microsoft Fabric
OpenMRS
Jr. Data Engineer
SOFTWARE VENDOR
July 2019 - July 2020 (1 year)
Dallas, United States
Designed and developed scalable ETL pipelines using AWS Glue and PySpark to ingest and transform large-scale healthcare data (millions of patient and medicine records) from S3 and external systems into Amazon Redshift for analytics and reporting.
Automated data discovery and querying using AWS Glue Crawlers, Data Catalog, and Amazon Athena, enabling efficient access to high-volume patient and clinical datasets for analytics teams.
Built a reusable and scalable ETL framework using Spark (Python/Scala) to standardize ingestion, transformation, and loading of millions of healthcare records including patient history, prescriptions, and treatment data into Hive and HBase.
Designed and optimized data models for healthcare analytics, structuring raw, staging, and curated layers (Medallion-style modeling) to support efficient querying and reporting on patient and medicine datasets.
Optimized Hive table design with partitioning and bucketing strategies, significantly improving performance for millions of patient-level and pharmaceutical records.
Implemented event-driven data pipelines using AWS Lambda and S3 triggers, enabling automated and near real-time processing of incoming patient and medicine data at scale.
Orchestrated end-to-end workflows using Apache Airflow DAGs, ensuring reliable scheduling, dependency management, and monitoring of large-scale healthcare data pipelines.
Developed and optimized distributed processing jobs using PySpark and Spark SQL, efficiently handling millions of records across patient demographics, prescriptions, and clinical events.
Built real-time streaming pipelines using Apache Kafka and Apache Flink, and containerized workloads using Docker and Kubernetes, enabling scalable processing of high-volume healthcare data streams.
Data Engineer, Databricks, ETL processes (extract, transform, load), SQL Server, Apache Kafka
Solulab Inc
Software Developer Intern
DIGITAL AGENCIES & IT CONSULTING
January 2019 - July 2019 (6 months)
Ahmedabad, India
● Developed and integrated RESTful APIs using FastAPI and PostgreSQL into the Bevvi application, enabling seamless data exchange and functionality with third-party services.
● Designed a responsive user interface (UI) using React and Material UI for the Bevvi application, increasing mobile traffic by 25% and improving user satisfaction.
● Utilized Jenkins for Continuous Integration and Continuous Deployment (CI/CD), reducing deployment times by 40% and improving release consistency and reliability.
● Designed and implemented scalable AWS cloud infrastructure using services like EC2, S3, and DynamoDB, ensuring optimal performance and cost efficiency.
● Automated serverless workflows using AWS Lambda and API Gateway, reducing operational overhead and enabling event-driven
Apache Kafka, SQL, Python, REST API, SQL Server

Master of Science
University of the Cumberlands
2025
Computer Science
Bachelor in Science & Technology
Maharaja Ranjit Singh Punjab Technical University
2019
Computer Science & Engineering

Algorithmic Toolbox
UC San Diego
2024
https://www.coursera.org/account/accomplishments/verify/PRFAW9OY5LNN
Data science, algorithms, data structures, Python

Cloud Engineers & Architects

Gyan Bahadur Tamang

Data Engineer | ETL | PySpark | Fabric | Databricks
