𝗣𝗿𝗲𝗽𝗮𝗿𝗲 𝗳𝗼𝗿 𝘁𝗵𝗲𝘀𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 𝗮𝗻𝗱 𝗰𝗿𝗮𝗰𝗸 𝗮𝗻𝘆 𝗣𝘆𝗦𝗽𝗮𝗿𝗸 𝗶𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄!
𝗣𝘆𝗦𝗽𝗮𝗿𝗸 𝗕𝗮𝘀𝗶𝗰 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀
- What is PySpark?
- How is PySpark different from Apache Spark?
- What are the key features of Apache Spark?
- What is a SparkSession?
- What is an RDD?
- What is a DataFrame in PySpark?
- What is a Dataset in Spark?
- What is lazy evaluation in Spark?
- What is the difference between RDD, DataFrame, and Dataset?
- What is the Spark driver program?
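To anchor the basics above, here is a minimal sketch (app name and sample data are illustrative): one SparkSession as the entry point, one small DataFrame, and lazy evaluation only firing at the action.
```python
from pyspark.sql import SparkSession

# SparkSession is the unified entry point for DataFrame and SQL work.
spark = SparkSession.builder.appName("interview-prep").getOrCreate()

# A tiny DataFrame from local data; schema is inferred from the tuples.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])

# Lazy evaluation: filter() only records the transformation...
adults = df.filter(df.age > 30)
# ...show() is an action, and only now does Spark run a job.
adults.show()
```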
𝗥𝗗𝗗 (𝗥𝗲𝘀𝗶𝗹𝗶𝗲𝗻𝘁 𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗲𝗱 𝗗𝗮𝘁𝗮𝘀𝗲𝘁)
- How do you create an RDD in PySpark?
- What are the types of RDD operations?
- What is the difference between 𝚖𝚊𝚙() and 𝚏𝚕𝚊𝚝𝙼𝚊𝚙()?
- What is the difference between 𝚛𝚎𝚍𝚞𝚌𝚎𝙱𝚢𝙺𝚎𝚢() and 𝚐𝚛𝚘𝚞𝚙𝙱𝚢𝙺𝚎𝚢()?
- What is the purpose of 𝚏𝚒𝚕𝚝𝚎𝚛() in RDD?
- What is the difference between 𝚌𝚘𝚕𝚕𝚎𝚌𝚝() and 𝚝𝚊𝚔𝚎()?
- What is the purpose of 𝚞𝚗𝚒𝚘𝚗() in RDD?
- What is the difference between 𝚍𝚒𝚜𝚝𝚒𝚗𝚌𝚝() and 𝚍𝚛𝚘𝚙𝙳𝚞𝚙𝚕𝚒𝚌𝚊𝚝𝚎𝚜()?
- What is the purpose of 𝚌𝚊𝚌𝚑𝚎() and 𝚙𝚎𝚛𝚜𝚒𝚜𝚝()?
- What are the storage levels in Spark?
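A compact RDD sketch covering several of the questions above: 𝚖𝚊𝚙() vs 𝚏𝚕𝚊𝚝𝙼𝚊𝚙(), 𝚛𝚎𝚍𝚞𝚌𝚎𝙱𝚢𝙺𝚎𝚢(), and explicit storage levels. The data is illustrative.
```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

sc = SparkSession.builder.appName("rdd-demo").getOrCreate().sparkContext

lines = sc.parallelize(["a b", "c d e"])
# map() returns one output element per input; flatMap() flattens the results.
print(lines.map(lambda s: s.split()).collect())      # [['a', 'b'], ['c', 'd', 'e']]
print(lines.flatMap(lambda s: s.split()).collect())  # ['a', 'b', 'c', 'd', 'e']

pairs = sc.parallelize([("x", 1), ("x", 2), ("y", 3)])
# reduceByKey() combines values map-side before the shuffle;
# groupByKey() would ship every value across the network first.
print(pairs.reduceByKey(lambda a, b: a + b).collect())  # [('x', 3), ('y', 3)], order may vary

# cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
# persist() lets you choose any storage level explicitly.
pairs.persist(StorageLevel.MEMORY_AND_DISK)
```
𝗗𝗮𝘁𝗮𝗙𝗿𝗮𝗺𝗲 𝗮𝗻𝗱 𝗦𝗤𝗟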
- How do you create a DataFrame in PySpark?
- What is the difference between 𝚜𝚎𝚕𝚎𝚌𝚝() and 𝚠𝚒𝚝𝚑𝙲𝚘𝚕𝚞𝚖𝚗()?
- How do you rename a column in a DataFrame?
- What is the purpose of 𝚍𝚛𝚘𝚙() in DataFrame?
- How do you filter rows in a DataFrame?
- What is the difference between 𝚘𝚛𝚍𝚎𝚛𝙱𝚢() and 𝚜𝚘𝚛𝚝()?
- How do you handle missing data in a DataFrame?
- What is the purpose of 𝚗𝚊.𝚏𝚒𝚕𝚕() and 𝚗𝚊.𝚍𝚛𝚘𝚙()?
- How do you join two DataFrames in PySpark?
- What are the different types of 𝗷𝗼𝗶𝗻𝘀 in Spark?
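A short DataFrame sketch for the questions above: 𝚠𝚒𝚝𝚑𝙲𝚘𝚕𝚞𝚖𝚗(), renaming, null handling with 𝚗𝚊.𝚏𝚒𝚕𝚕()/𝚗𝚊.𝚍𝚛𝚘𝚙(), and a join. Column names and values are illustrative.
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("df-demo").getOrCreate()

people = spark.createDataFrame(
    [("alice", 34, None), ("bob", None, "NY")], ["name", "age", "city"])
orders = spark.createDataFrame([("alice", 120.0)], ["name", "amount"])

# withColumn() adds or replaces a column; withColumnRenamed() renames one.
people = (people
          .withColumn("age_plus_one", F.col("age") + 1)
          .withColumnRenamed("city", "home_city"))

# na.fill() substitutes defaults for nulls; na.drop() would remove those rows.
filled = people.na.fill({"age": 0, "home_city": "unknown"})

# how= accepts "inner", "left", "right", "outer", "left_semi", "left_anti", "cross".
filled.join(orders, on="name", how="left").orderBy(F.col("age").desc()).show()
```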
𝗦𝗽𝗮𝗿𝗸 𝗦𝗤𝗟
- What is Spark SQL?
- How do you register a DataFrame as a temporary table?
- What is the purpose of 𝚌𝚛𝚎𝚊𝚝𝚎𝙾𝚛𝚁𝚎𝚙𝚕𝚊𝚌𝚎𝚃𝚎𝚖𝚙𝚅𝚒𝚎𝚠()?
- How do you run SQL queries on a DataFrame?
- What is the 𝗖𝗮𝘁𝗮𝗹𝘆𝘀𝘁 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗿?
- What is the 𝗧𝘂𝗻𝗴𝘀𝘁𝗲𝗻 𝗘𝗻𝗴𝗶𝗻𝗲 in Spark?
- How do you optimize Spark SQL queries?
- What is the purpose of 𝚎𝚡𝚙𝚕𝚊𝚒𝚗() in Spark SQL?
- How do you handle nested 𝗝𝗦𝗢𝗡 data in Spark SQL?
- What is the difference between 𝚙𝚊𝚛𝚚𝚞𝚎𝚝() and 𝚓𝚜𝚘𝚗() file formats?
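A minimal Spark SQL sketch for this section: registering a temp view, querying it, and reading the plans Catalyst produced via 𝚎𝚡𝚙𝚕𝚊𝚒𝚗(). The output path is illustrative.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])

# Register the DataFrame under a name that SQL queries can reference.
df.createOrReplaceTempView("people")
result = spark.sql("SELECT name FROM people WHERE age > 30")

# explain(True) prints the parsed, analyzed, optimized, and physical plans
# that the Catalyst optimizer produced for this query.
result.explain(True)

# Parquet is columnar, compressed, and stores its schema; JSON is plain text.
result.write.mode("overwrite").parquet("/tmp/people_parquet")
```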
𝗦𝗽𝗮𝗿𝗸 𝗦𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴
- What is Spark Streaming?
- What is the difference between batch processing and stream processing?
- What is Structured Streaming?
- How do you read streaming data in PySpark?
- What is a 𝗰𝗵𝗲𝗰𝗸𝗽𝗼𝗶𝗻𝘁 in Spark Streaming?
- How do you handle late data in Spark Streaming?
- What is the purpose of 𝚠𝚒𝚗𝚍𝚘𝚠() in Spark Streaming?
- What is the difference between 𝚞𝚙𝚍𝚊𝚝𝚎𝚂𝚝𝚊𝚝𝚎𝙱𝚢𝙺𝚎𝚢() and 𝚖𝚊𝚙𝚆𝚒𝚝𝚑𝚂𝚝𝚊𝚝𝚎()?
- How do you write streaming data to a sink?
- What is the difference between 𝚏𝚘𝚛𝚎𝚊𝚌𝚑𝙱𝚊𝚝𝚌𝚑() and 𝚏𝚘𝚛𝚎𝚊𝚌𝚑()?
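A Structured Streaming sketch touching the questions above: 𝚛𝚎𝚊𝚍𝚂𝚝𝚛𝚎𝚊𝚖, a watermark for late data, a windowed aggregate, and a checkpointed sink. The rate source, durations, and path are for illustration only.
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# readStream builds an unbounded DataFrame; "rate" is a built-in test source.
events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Watermark: accept events up to 10 minutes late, then finalize the
# 5-minute windows they fall into.
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"))
          .count())

# The checkpoint directory lets the query restore its state after a failure.
query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .option("checkpointLocation", "/tmp/stream-ckpt")
         .start())
query.awaitTermination(30)  # run for ~30 seconds, then fall through
query.stop()
```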
𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗧𝘂𝗻𝗶𝗻𝗴 𝗮𝗻𝗱 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻
- How do you optimize a slow Spark job?
- What is the purpose of 𝚋𝚛𝚘𝚊𝚍𝚌𝚊𝚜𝚝() in Spark?
- What is the difference between 𝚌𝚘𝚊𝚕𝚎𝚜𝚌𝚎() and 𝚛𝚎𝚙𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗()?
- How do you handle skewed data in Spark?
- What is the purpose of caching in Spark?
- How do you monitor Spark jobs?
- What is the Spark UI used for?
- How do you handle 𝚖𝚎𝚖𝚘𝚛𝚢 𝚒𝚜𝚜𝚞𝚎𝚜 in Spark?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚜𝚚𝚕.𝚜𝚑𝚞𝚏𝚏𝚕𝚎.𝚙𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗𝚜?
- How do you reduce shuffle operations in Spark?
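A tuning sketch for this section: a broadcast join, 𝚛𝚎𝚙𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗() vs 𝚌𝚘𝚊𝚕𝚎𝚜𝚌𝚎(), and lowering 𝚜𝚙𝚊𝚛𝚔.𝚜𝚚𝚕.𝚜𝚑𝚞𝚏𝚏𝚕𝚎.𝚙𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗𝚜. All sizes and partition counts are illustrative, not recommendations.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()

# Default is 200 shuffle partitions; small jobs often want far fewer.
spark.conf.set("spark.sql.shuffle.partitions", "64")

big = spark.range(1_000_000).withColumnRenamed("id", "key")
small = spark.createDataFrame([(0, "zero"), (1, "one")], ["key", "label"])

# broadcast() ships the small table to every executor, so the big side
# is joined in place with no shuffle.
joined = big.join(broadcast(small), "key")

# repartition(n) always triggers a full shuffle; coalesce(n) only merges
# existing partitions, so it is the cheaper way to reduce their count.
joined.repartition(32).coalesce(8).count()
```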
𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗖𝗼𝗻𝗰𝗲𝗽𝘁𝘀
- What is a DAG in Spark?
- What is the difference between 𝚗𝚊𝚛𝚛𝚘𝚠 and 𝚠𝚒𝚍𝚎 transformations?
- What is a shuffle in Spark?
- What is the purpose of 𝚊𝚌𝚌𝚞𝚖𝚞𝚕𝚊𝚝𝚘𝚛() in Spark?
- What is the difference between 𝚛𝚎𝚍𝚞𝚌𝚎() and 𝚏𝚘𝚕𝚍()?
- What is the purpose of 𝚊𝚐𝚐𝚛𝚎𝚐𝚊𝚝𝚎𝙱𝚢𝙺𝚎𝚢()?
- What is the difference between 𝚖𝚊𝚙𝙿𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗𝚜() and 𝚖𝚊𝚙()?
- What is the purpose of 𝚏𝚘𝚛𝚎𝚊𝚌𝚑𝙿𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗()?
- What is the difference between 𝚣𝚒𝚙() and 𝚣𝚒𝚙𝚆𝚒𝚝𝚑𝙸𝚗𝚍𝚎𝚡()?
- What is the purpose of 𝚐𝚕𝚘𝚖() in RDD?
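One sketch for the partition-level APIs above: 𝚐𝚕𝚘𝚖(), 𝚖𝚊𝚙𝙿𝚊𝚛𝚝𝚒𝚝𝚒𝚘𝚗𝚜(), and an accumulator. The data and the divisible-by-3 check are arbitrary.
```python
from pyspark.sql import SparkSession

sc = SparkSession.builder.appName("partition-demo").getOrCreate().sparkContext
rdd = sc.parallelize(range(10), 4)

# glom() exposes the partition layout as one list per partition,
# which makes data skew easy to eyeball.
print(rdd.glom().collect())

# mapPartitions() runs once per partition, so per-partition setup
# (a DB connection, a parser) is paid once, not once per record.
print(rdd.mapPartitions(lambda it: [sum(it)]).collect())

# Accumulators send side counts from executors back to the driver.
hits = sc.accumulator(0)
def tag(x):
    if x % 3 == 0:
        hits.add(1)
    return x
rdd.map(tag).count()   # the action triggers the accumulator updates
print(hits.value)      # 4, from 0, 3, 6, 9
```
𝗦𝗽𝗮𝗿𝗸 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲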
- What is the role of the Spark driver?
- What is the role of the Spark executor?
- What is the difference between a worker node and an executor?
- What is the purpose of the cluster manager in Spark?
- What are the different cluster modes in Spark?
- What is the difference between local and cluster mode?
- What is the purpose of the SparkContext?
- How does Spark handle fault tolerance?
- What is the 𝚕𝚒𝚗𝚎𝚊𝚐𝚎 𝚐𝚛𝚊𝚙𝚑 in Spark?
- What is the purpose of the Block Manager in Spark?
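The architecture questions are mostly conceptual, but two of them demo well in code: local vs cluster mode, and the lineage graph behind fault tolerance. A tiny sketch, assuming a laptop run:
```python
from pyspark.sql import SparkSession

# local[4]: driver and executor threads share one JVM with 4 cores,
# the simplest contrast to running under a cluster manager like YARN.
spark = (SparkSession.builder
         .master("local[4]")
         .appName("arch-demo")
         .getOrCreate())
sc = spark.sparkContext

rdd = sc.parallelize(range(100)).map(lambda x: x * 2).filter(lambda x: x > 50)

# The lineage graph: the recipe Spark replays to rebuild lost partitions.
print(rdd.toDebugString().decode())
```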
𝗙𝗶𝗹𝗲 𝗙𝗼𝗿𝗺𝗮𝘁𝘀 𝗮𝗻𝗱 𝗗𝗮𝘁𝗮 𝗦𝗼𝘂𝗿𝗰𝗲𝘀
- What file formats does Spark support?
- What is the difference between 𝙲𝚂𝚅 and 𝙿𝚊𝚛𝚚𝚞𝚎𝚝 formats?
- How do you read and write data in Parquet format?
- What is the advantage of using Avro format?
- How do you handle schema evolution in Spark?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚛𝚎𝚊𝚍.𝚘𝚙𝚝𝚒𝚘𝚗()?
- How do you read data from a JDBC source?
- How do you write data to a JDBC sink?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚛𝚎𝚊𝚍𝚂𝚝𝚛𝚎𝚊𝚖?
- How do you handle partitioned data in Spark?
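An I/O sketch for this section: partitioned Parquet, 𝚘𝚙𝚝𝚒𝚘𝚗() on a CSV read, and a JDBC read. Every path and connection detail below is a placeholder, not a real endpoint.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-demo").getOrCreate()
df = spark.createDataFrame([("alice", "US", 34)], ["name", "country", "age"])

# Parquet: columnar, compressed, schema embedded. partitionBy() writes one
# directory per country value, enabling partition pruning on read.
df.write.mode("overwrite").partitionBy("country").parquet("/tmp/users")
users = spark.read.parquet("/tmp/users")

# option() passes per-source settings; header/inferSchema are the CSV classics.
csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/tmp/users.csv"))

# JDBC source: url, table, and credentials here are all placeholders.
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:postgresql://host:5432/db")
           .option("dbtable", "public.users")
           .option("user", "user")
           .option("password", "secret")
           .load())
```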
𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱: 𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔-𝚜𝚞𝚋𝚖𝚒𝚝?
- How do you configure 𝚂𝚙𝚊𝚛𝚔 𝚙𝚛𝚘𝚙𝚎𝚛𝚝𝚒𝚎𝚜?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚍𝚎𝚏𝚊𝚞𝚕𝚝.𝚙𝚊𝚛𝚊𝚕𝚕𝚎𝚕𝚒𝚜𝚖?
- How do you handle logging in Spark?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚜𝚚𝚕.𝚊𝚞𝚝𝚘𝙱𝚛𝚘𝚊𝚍𝚌𝚊𝚜𝚝𝙹𝚘𝚒𝚗𝚃𝚑𝚛𝚎𝚜𝚑𝚘𝚕𝚍?
- How do you handle 𝚜𝚌𝚑𝚎𝚖𝚊 𝚒𝚗𝚏𝚎𝚛𝚎𝚗𝚌𝚎 in Spark?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚜𝚚𝚕.𝚌𝚛𝚘𝚜𝚜𝙹𝚘𝚒𝚗.𝚎𝚗𝚊𝚋𝚕𝚎𝚍?
- How do you handle 𝚝𝚒𝚖𝚎𝚣𝚘𝚗𝚎 issues in Spark?
- What is the purpose of 𝚜𝚙𝚊𝚛𝚔.𝚜𝚚𝚕.𝚊𝚍𝚊𝚙𝚝𝚒𝚟𝚎.𝚎𝚗𝚊𝚋𝚕𝚎𝚍?
- How do you debug a Spark application?
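A configuration sketch for this last section. The 𝚜𝚙𝚊𝚛𝚔-𝚜𝚞𝚋𝚖𝚒𝚝 line in the comment and every value below are illustrative settings to adapt, not recommendations.
```python
# Typical launch, flags illustrative:
#   spark-submit --master yarn --deploy-mode cluster \
#       --conf spark.sql.adaptive.enabled=true app.py
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("config-demo")
         # AQE re-optimizes at runtime: coalesces partitions, handles skew joins.
         .config("spark.sql.adaptive.enabled", "true")
         # Tables below this size (bytes) are broadcast instead of shuffled;
         # -1 disables automatic broadcast joins entirely.
         .config("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))
         # Pin session timestamps to UTC to dodge timezone surprises.
         .config("spark.sql.session.timeZone", "UTC")
         .getOrCreate())

# Quieter executor and driver logs while you debug your own output.
spark.sparkContext.setLogLevel("WARN")
```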