Last week, one of my development team members came to me with the error below, which he faced while executing a Hive query on Hue.

Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
Spark job failed due to: Job aborted due to stage failure: 
Aborting TaskSet 108.0 because task 4 (partition 4) cannot run anywhere due to node and executor blacklist. 
Most recent failure: Lost task 4.0 in stage 108.0 (TID 451, hostname.com, executor 5): 
UnknownReason Blacklisting behavior can be configured via spark.blacklist.*.

Spark 2.1.0 comes with a new feature called “blacklisting”. Blacklisting lets you set thresholds on the number of failed tasks on each executor and node; once a threshold is crossed, that executor or node is blacklisted for the task set, the stage, or even the entire application. To understand more about this, refer to this Cloudera link. To resolve this issue, we have to set the properties below in the Spark service. You can find details and descriptions of these properties in this link.

spark.blacklist.enabled=true
spark.blacklist.task.maxTaskAttemptsPerExecutor=1
spark.blacklist.task.maxTaskAttemptsPerNode=2
spark.blacklist.application.maxFailedTasksPerExecutor=2
spark.blacklist.stage.maxFailedTasksPerExecutor=2
spark.blacklist.application.maxFailedExecutorsPerNode=2
spark.blacklist.stage.maxFailedExecutorsPerNode=2
spark.blacklist.timeout=1h
spark.blacklist.killBlacklistedExecutors=true
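
For a standalone Spark application (outside Hive on Spark), the same properties can also be supplied programmatically when the session is built. The following is only a minimal sketch in Scala, assuming Spark 2.1 or later where the spark.blacklist.* properties are recognized; the application name and the chosen values are placeholders, not part of the original fix.

import org.apache.spark.sql.SparkSession

// Minimal sketch: enable blacklisting for a standalone Spark 2.1+ job.
// The property names mirror the spark-defaults.conf entries listed above;
// "blacklist-demo" is just a placeholder application name.
val spark = SparkSession.builder()
  .appName("blacklist-demo")
  .config("spark.blacklist.enabled", "true")
  .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .config("spark.blacklist.timeout", "1h")
  .config("spark.blacklist.killBlacklistedExecutors", "true")
  .getOrCreate()

// Run the job as usual; executors and nodes that repeatedly fail tasks
// are now skipped instead of aborting the whole stage.
spark.stop()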

Configuring the Properties in the Spark Service Using Cloudera Manager

Go to the Spark service.
Click the Configuration tab.
Select Scope > Gateway.
Select Category > Advanced.
Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property.
Specify the properties listed above and save the changes.
Deploy the client configuration and restart the Spark service.

After restarting the service, the issue was resolved.
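
To confirm that the settings were picked up after the restart, the Environment tab of the running Spark application's UI lists the spark.blacklist.* values. From a Spark shell you could also check one of them programmatically; a quick sketch, assuming the same Spark 2.1+ session as in the earlier example:

// Returns "true" once the safety-valve change has been deployed;
// throws NoSuchElementException if the property was never set.
spark.conf.get("spark.blacklist.enabled")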
