Azkaban multi-executor mode: job fails with "No such file or directory"

Incident report

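One day a colleague on the big-data team reported that a newly deployed Azkaban job (which runs a mysqldump backup) failed with "No such file or directory", yet the same script ran fine when executed manually after logging in to the server.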

Troubleshooting

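  • 1. Changed every path in the script to an absolute path (for example, sh became /bin/sh); the job still failed.
  • 2. The log showed "effective user is: azkaban", but no such user existed on the server. After creating the azkaban user and retrying several times, the job sometimes succeeded and sometimes failed, which ruled out the user as the cause. Log from a failed run: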
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Starting job rizhao_mysqldump at 1712237103312
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - job JVM args: '-Dazkaban.flowid=rizhao_source' '-Dazkaban.execid=3731409' '-Dazkaban.jobid=rizhao_mysqldump'
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - user.to.proxy property was not set, defaulting to submit user azkaban
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Building command job executor. 
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Failed with 5 inputs with exception e = null
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Memory granted for job rizhao_mysqldump
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - 1 commands to execute.
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - cwd=/home/fusion_data/package/azkaban-exec-server/executions/3731409/rizhao_import
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - effective user is: azkaban
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Command: sh /data/it_jobs/jobs.sh
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/home/fusion_data/package/azkaban-exec-server/executions/3731409/rizhao_import/rizhao_mysqldump_output_382962359329691643_tmp, JOB_PROP_FILE=/home/fusion_data/package/azkaban-exec-server/executions/3731409/rizhao_import/rizhao_mysqldump_job_props_8893110997836978845_tmp, KRB5CCNAME=/tmp/krb5cc__rizhao_import__rizhao_source__rizhao_mysqldump__3731409__azkaban, JOB_NAME=rizhao_mysqldump}
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Working directory: /home/fusion_data/package/azkaban-exec-server/executions/3731409/rizhao_import
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Spawned process with id 225312
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - sh: /data/it_jobs/jobs.sh: No such file or directory
04-04-2024 21:25:03 CST rizhao_mysqldump INFO - Process with id 225312 completed unsuccessfully in 0 seconds.
04-04-2024 21:25:03 CST rizhao_mysqldump ERROR - Job run failed!
java.lang.RuntimeException: azkaban.jobExecutor.utils.process.ProcessFailureException: Process exited with code 127
  • 3. Logged in to the Azkaban database and found two active executors, while the script had been deployed on only one of them, task-1.
mysql> use fusion_metabase_azkaban
mysql> select * from executors limit 20;
+----+----------------+-------+--------+
| id | host           | port  | active |
+----+----------------+-------+--------+
| 29 | task-4.bigdata | 12321 |      1 |
| 30 | task-1.bigdata | 12321 |      1 |   # the script is deployed on this host
+----+----------------+-------+--------+
mysql> select executor_id,count(1),FROM_UNIXTIME(max(update_time)/1000) from execution_flows where executor_id in (29,30) and update_time>=1645459200000 group by executor_id;
+-------------+----------+--------------------------------------+
| executor_id | count(1) | FROM_UNIXTIME(max(update_time)/1000) |
+-------------+----------+--------------------------------------+
|          29 |   118342 | 2024-04-04 21:44:32.9970             |
|          30 |   122038 | 2024-04-04 21:46:33.0120             |
+-------------+----------+--------------------------------------+
  • 4. Logged in to the other executor server (task-4) and set up the script used by the Azkaban job there.
[root@task-1 ~]# ping task-4.bigdata
PING task-4.bigdata (192.168.1.4) 56(84) bytes of data.
64 bytes from task-4.bigdata (192.168.1.4): icmp_seq=1 ttl=64 time=0.165 ms

[root@task-1 ~]# ssh -p1618 root@192.168.1.4

[root@task-4 ~]# mkdir -p /home/azkaban/it_jobs/dump
[root@task-4 ~]# vi jobs.sh
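  • 5. Ran the Azkaban job again, and it succeeded.

    Note: some of the retries in step 2 did succeed, most likely because those runs happened to be dispatched to the task-1.bigdata executor.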

04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Starting job rizhao_mysqldump at 1712238580810
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - job JVM args: '-Dazkaban.flowid=rizhao_source' '-Dazkaban.execid=3731477' '-Dazkaban.jobid=rizhao_mysqldump'
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - user.to.proxy property was not set, defaulting to submit user azkaban
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Building command job executor. 
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Failed with 5 inputs with exception e = null
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Memory granted for job rizhao_mysqldump
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - 1 commands to execute.
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - cwd=/home/fusion_data/package/azkaban-exec-server/executions/3731477/rizhao_import
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - effective user is: azkaban
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Command: /bin/sh /home/azkaban/it_jobs/jobs.sh
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/home/fusion_data/package/azkaban-exec-server/executions/3731477/rizhao_import/rizhao_mysqldump_output_2013999851185447972_tmp, JOB_PROP_FILE=/home/fusion_data/package/azkaban-exec-server/executions/3731477/rizhao_import/rizhao_mysqldump_job_props_3619555286561983768_tmp, KRB5CCNAME=/tmp/krb5cc__rizhao_import__rizhao_source__rizhao_mysqldump__3731477__azkaban, JOB_NAME=rizhao_mysqldump}
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Working directory: /home/fusion_data/package/azkaban-exec-server/executions/3731477/rizhao_import
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Spawned process with id 319255
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Current date and time: 2024-04-04 21:49:40
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - Dumping database: fusiondb2_zzhw
04-04-2024 21:49:40 CST rizhao_mysqldump INFO - mysqldump: [Warning] Using a password on the command line interface can be insecure.
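
The check behind steps 3 and 4 can be sketched in Python: for every active executor, test whether the job script exists on that host. This is only an illustrative sketch. The names Executor, exists_via_ssh and executors_missing_script are hypothetical, the executor rows are copied from the query result above, and the SSH check assumes key-based root login on port 1618, as in the session shown in step 4.

```python
import shlex
import subprocess
from typing import Callable, Iterable, List, NamedTuple


class Executor(NamedTuple):
    """One row of the Azkaban `executors` table (only the columns used here)."""
    id: int
    host: str
    active: bool


# Rows copied from the `select * from executors` output above.
EXECUTORS = [
    Executor(29, "task-4.bigdata", True),
    Executor(30, "task-1.bigdata", True),
]


def exists_via_ssh(host: str, path: str, port: int = 1618) -> bool:
    """Return True if `path` is a regular file on `host`.

    Assumption: key-based root login works on the given port.
    Any ssh failure (including connection errors) counts as "missing".
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-p", str(port), f"root@{host}",
         f"test -f {shlex.quote(path)}"],
        capture_output=True,
    )
    return result.returncode == 0


def executors_missing_script(
    executors: Iterable[Executor],
    path: str,
    exists: Callable[[str, str], bool] = exists_via_ssh,
) -> List[str]:
    """List the hosts of active executors on which `path` does not exist."""
    return [e.host for e in executors if e.active and not exists(e.host, path)]
```

Calling executors_missing_script(EXECUTORS, "/home/azkaban/it_jobs/jobs.sh") before step 4 would be expected to report task-4.bigdata. The `exists` argument can be replaced with a stub to try the logic without SSH access.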

Solution

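  • In multi-executor mode, besides deploying the script on every executor node, you can also add the useExecutor variable when running the Azkaban job to pin it to a specific executor (for example, the executor for task-1.bigdata has id 30).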
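
As a rough sketch of the useExecutor option, the Python snippet below builds the form body for an "execute flow" request to the Azkaban web server. Several things here are assumptions to verify against your Azkaban version: that useExecutor is passed as a flow override parameter (flowOverride[useExecutor]), that the submitting user needs admin permission for it to take effect, and that the project and flow names (rizhao_import, rizhao_source) are the ones implied by the log above. The function name and the session id placeholder are hypothetical, and the snippet only builds the request without sending it.

```python
from urllib.parse import urlencode


def build_execute_flow_params(session_id: str, project: str, flow: str,
                              executor_id: int) -> dict:
    """Build parameters for Azkaban's executeFlow AJAX call.

    Assumption: `useExecutor` is honoured when sent as a flow override.
    """
    return {
        "ajax": "executeFlow",
        "session.id": session_id,
        "project": project,
        "flow": flow,
        # Pin the run to one executor; 30 is task-1.bigdata in the table above.
        "flowOverride[useExecutor]": str(executor_id),
    }


# Placeholder session id; a real one comes from logging in to the web server.
params = build_execute_flow_params("<session-id>", "rizhao_import",
                                   "rizhao_source", 30)
body = urlencode(params)  # form-encoded body to send to the /executor endpoint
```

Pinning a job to one executor avoids copying scripts around, but it also removes failover for that job, so deploying the script on every executor is usually the more robust choice.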

Copyright © www.sqlfans.cn 2024 All Rights Reserved. Last updated: 2024-04-08 10:56:00
