CAP theorem

Lambda vs Kappa architecture

Star vs snowflake schema

Data warehouse, data lake, Delta Lake

SCD types

SQL

– How would you write a query to calculate a cumulative sum or running total within a specific partition in SQL?

– How do window functions differ from aggregate functions, and when would you use them?

– How do you identify and remove duplicate records in SQL without using temporary tables?
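
One possible way to approach the three SQL questions above is sketched below. It runs the SQL through Python's standard-library sqlite3 module so the example is self-contained. The `sales` table, its columns, and the sample rows are hypothetical, and the sketch assumes the bundled SQLite is version 3.25 or later (the first release with window functions). Using `rowid` to tell duplicate rows apart is SQLite-specific; other engines would need a different row identifier.

```python
import sqlite3

# Hypothetical sample table, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "2024-01-01", 10),
        ("east", "2024-01-02", 20),
        ("east", "2024-01-02", 20),  # exact duplicate of the previous row
        ("west", "2024-01-01", 5),
        ("west", "2024-01-03", 7),
    ],
)

# Remove duplicates in place, without a temporary table: number the rows
# inside each group of identical values and delete every row after the first.
conn.execute(
    """
    DELETE FROM sales
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY region, sale_date, amount
                       ORDER BY rowid
                   ) AS rn
            FROM sales
        )
        WHERE rn > 1
    )
    """
)

# Aggregate function: GROUP BY collapses each region to a single row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: every row is kept and gains a running total per region.
running = conn.execute(
    """
    SELECT region, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY region
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY region, sale_date
    """
).fetchall()

print(grouped)
print(running)
```

The contrast between `grouped` and `running` is the core of the window-versus-aggregate question: the same `SUM` either reduces a partition to one row or annotates each row while preserving detail.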

Python

– How do you manage memory efficiently when processing large files in Python?

– What are Python decorators, and how would you use them to optimize reusable code in ETL processes?

– How do you use Python’s built-in logging module to capture detailed error and audit logs?
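
The three Python questions above can be illustrated together in a minimal sketch: a generator that processes a file one line at a time, a decorator that wraps any ETL step with timing and error logging, and the standard `logging` module as the sink. The names `log_step`, `read_amounts`, and `total_amount` are hypothetical, and `io.StringIO` stands in for a large file opened with `open()`.

```python
import functools
import io
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("etl")


def log_step(func):
    """Log the start, duration, and any exception of an ETL step."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("start %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the full traceback at ERROR level.
            logger.exception("failed %s", func.__name__)
            raise
        logger.info("done %s in %.3fs", func.__name__, time.perf_counter() - start)
        return result

    return wrapper


def read_amounts(lines):
    """Yield one parsed integer per non-empty line, without loading the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


@log_step
def total_amount(lines):
    """Sum the amounts lazily; only one line is held in memory at a time."""
    return sum(read_amounts(lines))


# A real file object iterates line by line in the same lazy way as StringIO.
sample = io.StringIO("10\n20\n\n30\n")
total = total_amount(sample)
print(total)
```

Because the decorator is independent of what each step does, the same `@log_step` line can be reused on extract, transform, and load functions alike.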

PySpark

– How would you handle skewed data in a Spark job to prevent performance issues?

– What is the difference between SparkSession and SparkContext? When should each be used?

– How do you handle backpressure in Spark Streaming applications to manage load effectively?
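
For the skew question above, one commonly cited technique is key salting with a two-stage aggregation. The sketch below simulates the idea in plain Python rather than Spark so that it runs without a cluster. The records, the `NUM_SALTS` value, and the round-robin salt are all assumptions made for illustration; a Spark job would typically add a random salt column and run two `groupBy` passes instead.

```python
from collections import defaultdict

NUM_SALTS = 4  # hypothetical fan-out for the hot key

# Hypothetical skewed input: one key dominates the data set.
records = [("hot", 1)] * 1000 + [("a", 1)] * 10 + [("b", 1)] * 10

# Stage 1: aggregate on (key, salt) so the hot key is split into
# NUM_SALTS smaller groups that separate workers could process.
partial = defaultdict(int)
for i, (key, value) in enumerate(records):
    salt = i % NUM_SALTS  # round-robin keeps this sketch deterministic
    partial[(key, salt)] += value

# Stage 2: drop the salt and combine the partial results per original key.
totals = defaultdict(int)
for (key, _salt), subtotal in partial.items():
    totals[key] += subtotal

# The largest single group is now a fraction of the hot key's full size.
largest_group = max(partial.values())
print(dict(totals), largest_group)
```

The final totals match an unsalted aggregation; only the size of the largest intermediate group changes, which is what relieves the overloaded task.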

Azure Databricks

– How do you configure cluster autoscaling in Databricks, and when should it be used?

– How do you implement data versioning in Delta Lake tables within Databricks?

– How would you monitor and optimize Databricks job performance metrics?

Azure Data Factory

– What are tumbling window triggers in Azure Data Factory, and how do you configure them?

– How would you enable managed identity-based authentication for linked services in ADF?

– How do you create custom activity logs in ADF for monitoring data pipeline execution?

CI/CD

– What are blue-green deployments, and how would you use them for ETL jobs?

– How do you implement rollback mechanisms in CI/CD pipelines for data integration processes?

– What strategies do you use to handle schema evolution in data pipelines as part of CI/CD?

Data Warehousing

  • How do you optimize join operations in a data warehouse to improve query performance?
  • What is a slowly changing dimension (SCD), and what are the different ways to implement it in a data warehouse?
  • How do surrogate keys benefit data warehouse design over natural keys?
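
As an illustration of the SCD and surrogate-key questions above, here is a minimal in-memory sketch of a Type 2 slowly changing dimension, where each change closes the current row and appends a new version under a fresh surrogate key. The `CustomerRow` fields, the `apply_scd2` helper, and the sample values are hypothetical; a warehouse would usually express the same logic as a MERGE or an update followed by an insert.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int         # warehouse-generated key, unique per version
    customer_id: str           # natural key from the source system
    city: str                  # tracked attribute
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


def apply_scd2(dim: List[CustomerRow], customer_id: str, city: str, as_of: date) -> None:
    """Apply one incoming source record to the dimension using Type 2 rules."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return  # nothing changed, keep the current version open
        current.valid_to = as_of  # close the old version
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(CustomerRow(next_key, customer_id, city, as_of, None))


dim: List[CustomerRow] = []
apply_scd2(dim, "C1", "Pune", date(2024, 1, 1))   # first load
apply_scd2(dim, "C1", "Pune", date(2024, 2, 1))   # unchanged, no new row
apply_scd2(dim, "C1", "Delhi", date(2024, 3, 1))  # change, new version

for row in dim:
    print(row)
```

The natural key `C1` appears on both rows, so only the surrogate key can identify a specific version; fact rows that reference it keep pointing at the attribute values that were valid when they were loaded.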

Data Modeling

  • How do you decide between a star schema and a snowflake schema for a data warehouse? Provide examples of scenarios where each is ideal.

  • What is dimensional modeling, and how does it differ from entity-relationship modeling in terms of use cases?
  • How do you handle one-to-many relationships in a dimensional model to ensure efficient querying?
