• Automatically identifies slow-running tasks in a stage.
  • Launches a duplicate copy (speculative task) of the slow task on a different worker node.
  • Accepts the result from the first task to finish (original or speculative) and kills the other.
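The detection step above can be sketched in plain Python. This is a simplified illustration, not Spark's scheduler code: the function name `find_speculatable` and its arguments are hypothetical, and the `quantile` and `multiplier` parameters are modeled on the `spark.speculation.quantile` and `spark.speculation.multiplier` settings, using 0.75 and 1.5 as assumed example values (check the defaults for your Spark version).

```python
from statistics import median


def find_speculatable(finished_durations, running_elapsed, total_tasks,
                      quantile=0.75, multiplier=1.5):
    """Return ids of running tasks slow enough to get a speculative copy.

    finished_durations: durations (seconds) of tasks that already succeeded.
    running_elapsed: dict of task id -> seconds the task has been running.
    total_tasks: number of tasks in the stage.
    """
    # Wait until enough of the stage has finished to know what "normal" is.
    if len(finished_durations) < quantile * total_tasks:
        return []
    # A task counts as slow once it has run longer than
    # multiplier x the median duration of the finished tasks.
    threshold = multiplier * median(finished_durations)
    return [task_id for task_id, elapsed in running_elapsed.items()
            if elapsed > threshold]


# 8 of 10 tasks finished in about 10 s; task 9 has been running for 40 s.
finished = [10, 11, 9, 12, 10, 11, 10, 9]
running = {8: 14.0, 9: 40.0}
print(find_speculatable(finished, running, total_tasks=10))
```

In this sketch only task 9 exceeds the 15-second threshold, so it is the one that would get a duplicate. Spark itself applies further conditions, such as a minimum runtime before a task is considered.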

To enable speculative execution, set spark.speculation = true (it is disabled by default).
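As a rough sketch of how the setting might be passed on the command line, the snippet below renders a configuration dictionary as `spark-submit --conf` flags. The helper `to_spark_submit_args` and the dictionary name are hypothetical, and the three tuning values alongside `spark.speculation` are assumed examples, not recommendations.

```python
# Assumed example values; tune them for your workload and Spark version.
SPECULATION_CONF = {
    "spark.speculation": "true",            # turn speculative execution on
    "spark.speculation.interval": "100ms",  # how often Spark checks for slow tasks
    "spark.speculation.multiplier": "1.5",  # how many times slower than the median counts as slow
    "spark.speculation.quantile": "0.75",   # fraction of tasks that must finish before checking
}


def to_spark_submit_args(conf):
    """Render a config dict as a list of spark-submit --conf flags."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


print(" ".join(to_spark_submit_args(SPECULATION_CONF)))
```

The same key/value pairs can also be set in code, for example with `SparkSession.builder.config(key, value)` in PySpark, or placed in `spark-defaults.conf`.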

  • Speculative execution is Spark's way of working around slow tasks (stragglers) by running duplicates of them.
  • If a task runs much slower than the other tasks in its stage, Spark launches another copy of it on a different node.
  • The first copy to finish is the one that counts, and the other copy is killed.
  • This helps a large job finish faster by reducing the impact of slow tasks.

