What happens when you enable cache() in PySpark and the dataset exceeds the available memory? How does Spark handle this situation, and what potential issues might arise?
Both cache() and persist() store data in memory to speed up retrieval of intermediate results reused in later computations. The difference is that persist() is more flexible: it lets you specify a storage level explicitly, whereas cache() always uses the default level (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames). When the cached dataset exceeds the available memory, Spark does not fail the job; partitions that do not fit are either dropped and recomputed from lineage on the next access (MEMORY_ONLY) or spilled to disk (MEMORY_AND_DISK). The potential issues are recomputation overhead, extra disk I/O, increased garbage-collection pressure, and eviction of other cached data, so an oversized cache can end up slowing the job down instead of speeding it up.
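Below is a minimal PySpark sketch of the difference, assuming a local SparkSession and a synthetic DataFrame built with spark.range() (any existing DataFrame would do). It shows cache() with its default storage level next to persist() with an explicitly chosen level.

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-vs-persist").getOrCreate()

# Synthetic input used only for illustration.
df = spark.range(0, 10_000_000)

# cache() uses the default storage level; partitions that do not fit in
# memory are spilled to disk (DataFrame default) or recomputed from lineage
# (RDD default with MEMORY_ONLY) the next time they are needed.
df.cache()
df.count()      # an action is required to actually materialize the cache

# persist() lets you pick the storage level yourself, e.g. explicitly allow
# spilling to disk so oversized partitions are not silently recomputed.
df.unpersist()  # drop the previous cache before re-persisting
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()

spark.stop()
```

In both cases the cache is lazy: nothing is stored until an action such as count() runs, and the Spark UI's Storage tab shows how much of the data actually fit in memory versus on disk.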