Data Engineer Interview Questions

My interview experiences and interview questions.

Data Engineering concepts

  • ETL vs ELT
  • Data warehouse, data mart, data lake, data lakehouse, Delta Lake, data mesh
  • Data modeling and concepts
    • Star schema vs Snowflake schema
    • SCD types 1, 2, and 3, with examples (how to track history in a data warehouse)
    • Full load vs incremental load
    • Lambda vs Kappa architecture
  • Data governance and security – who can access the data, data security.
    • Data integrity
    • Data Quality
    • Data Privacy, security, and compliance – Role-based access control
    • Data Discovery
  • Data lineage
  • Data Profiling
  • Data Catalogue
  • Data granularity
    • How to handle data granularity throughout the life cycle
  • Data architecture
    • Batch processing
    • Real-time/stream processing
    • Event-driven architecture
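
For the SCD question above, interviewers often ask you to show how history is kept. Type 1 overwrites the old value, Type 3 keeps the previous value in an extra column, and Type 2 adds a new row for every change. Below is a minimal sketch of Type 2 in plain Python. The function name apply_scd2 and the columns valid_from, valid_to, and is_current are my own assumptions for illustration, not a standard API; in a real warehouse this is usually done with a MERGE statement or a Spark job.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension table (list of dicts) with SCD Type 2 applied.

    Hypothetical helper: column names valid_from / valid_to / is_current
    are assumptions chosen for this sketch.
    """
    # Copy the rows so the caller's table is not modified.
    result = [dict(row) for row in dimension]
    # Index the currently active version of each business key.
    current = {row[key]: row for row in result if row["is_current"]}

    for rec in incoming:
        existing = current.get(rec[key])
        if existing is not None and all(existing[c] == rec[c] for c in tracked):
            continue  # no tracked attribute changed, keep the current row
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        # Open a new version (also covers brand-new keys).
        new_row = {key: rec[key]}
        new_row.update({c: rec[c] for c in tracked})
        new_row.update({"valid_from": today, "valid_to": None, "is_current": True})
        result.append(new_row)
    return result


# Made-up sample data: customer 1 moves city, customer 2 is new.
dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
updates = [{"customer_id": 1, "city": "Mumbai"},
           {"customer_id": 2, "city": "Delhi"}]
history = apply_scd2(dim, updates, key="customer_id", tracked=["city"],
                     today=date(2024, 6, 1))
for row in history:
    print(row)
```

The old "Pune" row stays in the table with an end date, so a query can still answer "where did customer 1 live in 2023?". That is the main difference from Type 1, where that information is lost.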

Big Data

Hive

PySpark

The most common skills companies look for in a data engineer

I made a list of the most in-demand technologies

  • Programming language – Python, PySpark, SQL
    • PySpark – PySpark Streaming
  • Cloud
    • AWS
      • ETL/Analytics: EMR, Glue, Athena, Redshift
      • ECS, EC2, S3, Lambda, Step Functions, API Gateway, SNS, RDS, Aurora PostgreSQL
      • Data Migration: AWS DMS
    • GCP
      • BigQuery, Cloud Storage, Dataproc
  • Orchestration and containers: Airflow, Docker, Kubernetes
  • Big data: Hadoop/HDFS, Kafka, Hive
  • Non-Relational Database/Data Store (Good to have, company-specific)
    • Object storage
    • Document storage
    • Key-value store
    • Graph Database
    • Column-family Database
    • MongoDB, Redis, Elasticsearch

The following technologies are alternatives to each other

  • Data warehouse: Hive / Redshift / BigQuery / Snowflake
  • Distributed/Spark processing: Dataproc / EMR / Databricks