- Automatically identifies slow-running tasks in a stage.
- Launches a duplicate copy (speculative task) of the slow task on a different worker node.
- Accepts the result from the first task to finish (original or speculative) and kills the other.
Enable speculative execution by setting spark.speculation = true (it is disabled by default).
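As a rough sketch of how that setting might be passed to a job, the snippet below builds a spark-submit command line with one --conf flag per setting. The script name my_job.py and the helper to_submit_args are hypothetical. The two tuning keys (spark.speculation.multiplier, spark.speculation.quantile) are real Spark settings, but the values shown are only illustrative and may not match the defaults of your Spark version.

```python
# Speculation-related settings; the tuning values are illustrative assumptions.
speculation_conf = {
    "spark.speculation": "true",            # turn speculative execution on
    "spark.speculation.multiplier": "1.5",  # how many times slower than the median counts as slow
    "spark.speculation.quantile": "0.75",   # fraction of tasks that must finish before checking
}


def to_submit_args(conf, app="my_job.py"):
    """Build a spark-submit argument list with one --conf flag per setting.

    `app` is a placeholder script name, not something from the notes above.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args + [app]


submit_args = to_submit_args(speculation_conf)
print(" ".join(submit_args))
```

The same keys can also be set in spark-defaults.conf or on the session builder; the command-line form is used here only because it needs no Spark installation to illustrate.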
- Speculative execution means Spark works around slow tasks by running duplicate copies of them.
- If one task is really slow, Spark launches another copy of it.
- Whichever copy finishes first is the one that counts, and the other copy is stopped.
- It helps a large job finish sooner, because a few straggling tasks no longer hold up the whole stage.
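To make "really slow" a little more concrete, here is a simplified model of the check, not Spark's actual scheduler code. It assumes a task is treated as slow when its elapsed time exceeds a multiplier times the median duration of the tasks that have already finished, and that the check only starts once a given fraction of the stage has completed. The function name find_speculatable, the task ids, and the timings are all made up for illustration.

```python
from statistics import median


def find_speculatable(finished_durations, running_elapsed, total_tasks,
                      quantile=0.75, multiplier=1.5):
    """Return ids of running tasks that look slow enough to duplicate.

    Simplified model: the parameter values are illustrative, not guaranteed
    to match the defaults of any particular Spark version.
    """
    # Only start checking once enough of the stage's tasks have finished.
    if len(finished_durations) < quantile * total_tasks:
        return []
    # A running task is "slow" if it has run longer than multiplier x median.
    threshold = multiplier * median(finished_durations)
    return sorted(task_id for task_id, elapsed in running_elapsed.items()
                  if elapsed > threshold)


# Eight of ten tasks are done (durations in seconds); two are still running.
finished = [10.0, 11.0, 9.0, 12.0, 10.0, 11.0, 10.0, 9.0]
running = {"task-8": 14.0, "task-9": 40.0}

slow_tasks = find_speculatable(finished, running, total_tasks=10)
print(slow_tasks)  # ['task-9']
```

Here the median finished duration is 10 seconds, so the threshold is 15 seconds: task-8 (14 s) is left alone, while task-9 (40 s) would get a speculative copy.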