# Installing Azkaban on CentOS 7

## Installing Gradle on CentOS 7

1. Download the Gradle distribution

wget https://downloads.gradle.org/distributions/gradle-4.10.3-all.zip
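Before unpacking, it can be worth checking that the download is intact. The following is a minimal sketch, assuming `sha256sum` is available (it is part of coreutils on CentOS 7). The helper name `verify_sha256` is hypothetical, and the `.sha256` URL in the comment is an assumed pattern based on how Gradle publishes checksums next to its distributions.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Hypothetical helper for illustration only.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
    [ "$actual" = "$expected" ]
}

# Example (not run here; the checksum URL is an assumption):
# expected="$(curl -sL https://services.gradle.org/distributions/gradle-4.10.3-all.zip.sha256)"
# verify_sha256 gradle-4.10.3-all.zip "$expected" && echo "checksum OK"
```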

2. Create the install directory and unpack the archive

mkdir -p /opt/gradle
unzip -d /opt/gradle gradle-4.10.3-all.zip

3. Configure environment variables

vi ~/.bash_profile
export PATH=$PATH:/opt/gradle/gradle-4.10.3/bin
source ~/.bash_profile
gradle -v
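Editing the profile by hand works, but repeating the steps can leave duplicate PATH entries. A minimal sketch of an idempotent alternative, assuming the same /opt/gradle/gradle-4.10.3 location as above; `add_path_once` is a hypothetical helper name, not part of Gradle.

```shell
#!/usr/bin/env bash
# add_path_once FILE DIR: append "export PATH=$PATH:DIR" to FILE unless that
# exact line is already present. Hypothetical helper for illustration only.
add_path_once() {
    local file="$1" dir="$2"
    local line="export PATH=\$PATH:${dir}"
    touch "$file"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (not run here):
# add_path_once ~/.bash_profile /opt/gradle/gradle-4.10.3/bin
```

Calling the function a second time with the same arguments leaves the file unchanged.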

4. Configure a global init script that replaces the default repositories with mirrors

vi ~/.gradle/init.gradle
allprojects{
    repositories {
        def ALIYUN_REPOSITORY_URL = 'https://maven.aliyun.com/repository/public/'
        def ALIYUN_JCENTER_URL = 'https://maven.aliyun.com/repository/jcenter/'
        def ALIYUN_GOOGLE_URL = 'https://maven.aliyun.com/repository/google/'
        def ALIYUN_GRADLE_PLUGIN_URL = 'https://maven.aliyun.com/repository/gradle-plugin/'
        all { ArtifactRepository repo ->
            if(repo instanceof MavenArtifactRepository){
                def url = repo.url.toString()
                if (url.startsWith('https://repo1.maven.org/maven2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_REPOSITORY_URL."
                    remove repo
                }
                if (url.startsWith('https://jcenter.bintray.com/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_JCENTER_URL."
                    remove repo
                }
                if (url.startsWith('https://dl.google.com/dl/android/maven2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_GOOGLE_URL."
                    remove repo
                }
                if (url.startsWith('https://plugins.gradle.org/m2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_GRADLE_PLUGIN_URL."
                    remove repo
                }
            }
        }
        maven { url ALIYUN_REPOSITORY_URL }
        maven { url ALIYUN_JCENTER_URL }
        maven { url ALIYUN_GOOGLE_URL }
        maven { url ALIYUN_GRADLE_PLUGIN_URL }
    }
}

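5. The difference between gradlew and gradle

gradlew is the wrapper script: it automatically downloads the Gradle version pinned in the wrapper configuration, so every build runs in the same environment. gradle uses whatever Gradle version is installed locally.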

[Wrapper (gradlew)](https://www.zybuluo.com/xtccc/note/275168)
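To see which version a project's wrapper will download, you can read the `distributionUrl` line of its `gradle/wrapper/gradle-wrapper.properties` file. A minimal sketch follows; `wrapper_gradle_version` is a hypothetical helper name, and the pattern only handles plain release versions such as 4.10.3, not release candidates.

```shell
#!/usr/bin/env bash
# wrapper_gradle_version FILE: print the Gradle version pinned in a
# gradle-wrapper.properties file, taken from its distributionUrl line.
# Hypothetical helper for illustration only.
wrapper_gradle_version() {
    sed -nE 's/^distributionUrl=.*gradle-([0-9][0-9.]*)-(bin|all)\.zip$/\1/p' "$1"
}

# Example (run from a project root that contains a wrapper):
# wrapper_gradle_version gradle/wrapper/gradle-wrapper.properties
```

Comparing this value with the output of `gradle -v` shows whether the local installation and the wrapper agree.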
## Installing Azkaban
### Azkaban background
1. Introduction

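Azkaban is a batch workflow job scheduler released by LinkedIn. It runs a set of jobs and processes in a specified order within a workflow. It is configured with simple key:value pairs, and the dependencies between jobs are declared with the dependencies key (the dependency graph must be a DAG). Azkaban uses job configuration files to define the dependencies between tasks and provides an easy-to-use web UI for maintaining and tracking your workflows. Azkaban website: https://azkaban.github.io/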
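As an illustration of the job file format, the sketch below writes a two-job flow into a scratch directory. The job names `extract` and `load` are made up for this example; `type`, `command`, and `dependencies` are the standard keys of an Azkaban command job.

```shell
#!/usr/bin/env bash
# Write a two-job flow into a scratch directory.
# The job names "extract" and "load" are hypothetical.
flow_dir="$(mktemp -d)"

# First job: a plain command job with no dependencies.
cat > "$flow_dir/extract.job" <<'EOF'
type=command
command=echo "extract step"
EOF

# Second job: runs only after extract has finished successfully.
cat > "$flow_dir/load.job" <<'EOF'
type=command
command=echo "load step"
dependencies=extract
EOF

ls "$flow_dir"
```

The directory would then typically be zipped and uploaded as a project through the web UI.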

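2. Azkaban features

  1. Powerful: it can schedule almost any software, offers a modular, pluggable plugin mechanism, and natively supports command, Java, Hive, Pig, and Hadoop jobs.
  2. Written in Java, with a clear code structure that is easy to extend.
  3. An easy-to-use web UI that lets you monitor every step.
  4. Job configuration files make it quick to define tasks and the dependencies between them.
  5. A RESTful interface that makes custom calls from our own platform straightforward.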

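3. Azkaban architecture

Azkaban consists of three key components:
  1. Relational database (MySQL): stores workflow-related information.
  2. AzkabanWebServer: the main manager of the whole Azkaban workflow system. It handles user login and authentication, project management, scheduled workflow execution, tracking of workflow progress, and related tasks.
  3. AzkabanExecutorServer: responsible for actually submitting and running the workflows.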

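4. Azkaban's three main modules

Azkaban has three main modules: azkaban-common, azkaban-exec-server, and azkaban-web-server.
  azkaban-common: the shared module, covering database access, trigger management, mail utilities, and job handling.
  azkaban-exec-server: the executor, which runs the jobs.
  azkaban-web-server: the scheduling center, used to display, edit, and schedule jobs.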

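5. Deployment modes

solo-server mode:
The DB is an embedded H2, and the Web Server and Executor Server run in the same process. This mode includes all Azkaban features but is generally used for learning and testing.

two-server mode:
The DB is MySQL (which supports a master-slave setup), and the Web Server and Executor Server run in separate processes.

Distributed multiple-executor mode:
The DB is MySQL (which supports a master-slave setup), the Web Server and Executor Servers run on different machines, and there are multiple Executor Servers.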

### Build and install

1. Build

git clone https://github.com/azkaban/azkaban.git  # clone the repo
cd azkaban; sh ./gradlew build  # build and install the package
./gradlew clean  # clean the build output
./gradlew installDist  # build and install the distributions
./gradlew test  # run the tests
./gradlew build -x test  # build without running the tests

cd azkaban-solo-server/build/install/azkaban-solo-server; bin/start-solo.sh  # start the solo server
bin/shutdown-solo.sh  # stop the solo server

2. Install from the built packages
Nodes:
- web-server: 172.16.3.109
- executor-server: 172.16.3.179
- DB-server: 172.16.3.123

Upload the built archives and unpack them:
azkaban-web-server

mkdir -p /opt/azkaban
cd /opt/azkaban
tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz
tar -zxvf azkaban-db-0.1.0-SNAPSHOT.tar.gz
mv azkaban-db-0.1.0-SNAPSHOT azkaban-db
mv azkaban-web-server-0.1.0-SNAPSHOT azkaban-web-server

executor-server

mkdir azkaban
cd azkaban
tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz
mv azkaban-exec-server-0.1.0-SNAPSHOT azkaban-exec-server

DB-server

mysql> CREATE DATABASE azkaban;
mysql> use azkaban;
mysql> source /opt/azkaban/azkaban-db/create-all-sql-0.1.0-SNAPSHOT.sql;
# If importing the file with a GUI tool fails, import it with the SQL statement above instead.
insert into executors(host,port,active) values('172.16.3.179',12331,1);
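Connecting as root works for a test setup, but a dedicated MySQL account with limited privileges is a common alternative. The sketch below only prints the SQL so it can be reviewed before it is run. The helper name `azkaban_grant_sql`, the user name, and the password are placeholders chosen for this example, and values containing single quotes are not escaped.

```shell
#!/usr/bin/env bash
# azkaban_grant_sql USER PASSWORD [DB]: print SQL that creates a dedicated
# account and grants it read/write access to the Azkaban schema.
# Hypothetical helper for illustration only; it does no quote escaping.
azkaban_grant_sql() {
    local user="$1" password="$2" db="${3:-azkaban}"
    cat <<EOF
CREATE USER '${user}'@'%' IDENTIFIED BY '${password}';
GRANT SELECT, INSERT, UPDATE, DELETE ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Example (not run here; needs a reachable MySQL server):
# azkaban_grant_sql azkaban 'change-me' | mysql -u root -p
```

If such an account is used, mysql.user and mysql.password in azkaban.properties have to match it.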

3. Install the Executor Server

[root@kdc conf]# cat azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=America/Los_Angeles
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Where the Azkaban web server is located
azkaban.webserver.url=http://172.16.3.109:8081
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=172.16.3.123
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
executor.port=12331
execution.logs.retention.ms=2419200000
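Long, hand-edited properties files easily pick up repeated keys, and with java.util.Properties a later assignment replaces an earlier one without any warning. A minimal sketch of a check that lists such keys; `dup_keys` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# dup_keys FILE: print every key that is assigned more than once in a
# properties file, ignoring comment lines and blank lines.
# Hypothetical helper for illustration only.
dup_keys() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" \
        | cut -d= -f1 \
        | sort \
        | uniq -d
}

# Example (run from the server's installation directory):
# dup_keys conf/azkaban.properties
```

An empty result means every key appears only once.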

4. Start the executor service
Note: always start azkaban-exec-server first, then azkaban-web-server.

./bin/start-exec.sh  # must be run from the parent directory of bin
tail -f executorServerLog_*.out  # follow the startup log
jps  # list the running Java processes
curl -G "localhost:12331/executor?action=activate" && echo  # manually activate the executor
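The executor may need a few seconds before it accepts requests, so an activation call sent right after startup can fail. One way to handle that is to retry the call. The sketch below is a generic retry helper; the name `retry` and the attempt and delay values in the example are assumptions, not Azkaban settings.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries.
# Hypothetical helper for illustration only.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (needs a running executor, so it is left commented out):
# retry 10 3 curl -sf -G "localhost:12331/executor?action=activate"
```

Here `-f` makes curl return a non-zero status on an HTTP error, which is what lets the helper detect a failed attempt.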

5. Install the Web Server

# configure the azkaban.properties file
cat azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=America/Los_Angeles
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=172.16.3.123
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100
# Multiple Executor
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1

Start the web service
Note: always start azkaban-exec-server first, then azkaban-web-server.

./bin/start-web.sh  # must be run from the parent directory of bin
tail -f webServerLog_*.out  # follow the startup log
jps  # list the running Java processes

Configure Jetty SSL

# keytool -keystore keystore -alias jetty -genkey -keyalg RSA

Enter keystore password:

Re-enter new password:

What is your first and last name?

[Unknown]: YY

What is the name of your organizational unit?

[Unknown]: YY

What is the name of your organization?

[Unknown]: YY

What is the name of your City or Locality?

[Unknown]: Beijing

What is the name of your State or Province?

[Unknown]: Beijing

What is the two-letter country code for this unit?

[Unknown]: CN

Is CN=YY, OU=YY, O=YY, L=Beijing, ST=Beijing, C=CN correct?

[no]: y
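The same keystore can also be generated without the interactive prompts, which is handy in provisioning scripts. A minimal sketch, assuming the distinguished-name values entered above; `make_keystore` is a hypothetical helper name, the password in the example is a placeholder, and the key size and validity period are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# make_keystore OUT PASSWORD DNAME: create a keystore at OUT holding an RSA key
# pair under the alias "jetty", using PASSWORD for both the store and the key.
# Hypothetical helper for illustration only.
make_keystore() {
    local out="$1" pass="$2" dname="$3"
    keytool -genkeypair -alias jetty -keyalg RSA -keysize 2048 \
        -validity 365 -keystore "$out" \
        -storepass "$pass" -keypass "$pass" -dname "$dname"
}

# Example (not run here; the password is a placeholder):
# make_keystore keystore 'change-me-123' 'CN=YY, OU=YY, O=YY, L=Beijing, ST=Beijing, C=CN'
```

The password passed here is the one that later goes into jetty.password and jetty.keypassword.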

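Copy the generated keystore file into the web-server installation directory, at the same level as conf and the other directories, then edit the conf/azkaban.properties configuration file.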

# cat azkaban.properties

# Azkaban Personalization Settings
# Server UI name, shown at the top of the page
azkaban.name=Test
# Description
azkaban.label=My Local Azkaban
# UI color
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
# Default web root directory
web.resource.dir=web/
# Default time zone, changed to Asia/Shanghai
default.timezone.id=Asia/Shanghai

# Azkaban UserManager class
# Default user and permission management class
user.manager.class=azkaban.user.XmlUserManager
# User configuration file
user.manager.xml.file=conf/azkaban-users.xml

# Loader for projects
# Location of the global properties file
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

# Velocity dev mode
velocity.dev.mode=false

# Azkaban Jetty server properties.
# Maximum number of threads
jetty.maxThreads=25
# Jetty SSL port
jetty.ssl.port=8443
# Jetty port
jetty.port=8081
# SSL keystore file name
jetty.keystore=keystore
# SSL keystore password
jetty.password=bigdata@123
# Jetty key password, the same as the keystore password
jetty.keypassword=bigdata@123
# SSL truststore file name
jetty.truststore=keystore
# SSL truststore password
jetty.trustpassword=bigdata@123

# Azkaban Executor settings
# Executor server port
executor.port=12321

# mail settings (not configured yet)
# Sender address
mail.sender=
# SMTP host of the sending mailbox
mail.host=
# Mailbox password
mail.password=
# Address notified when a job fails
job.failure.email=
# Address notified when a job succeeds
job.success.email=
lockdown.create.projects=false
# Cache directory
cache.directory=cache

# JMX stats
jetty.connector.stats=true
executor.connector.stats=true

# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes

# Database type
database.type=mysql
# Database port
mysql.port=3306
# Database host
mysql.host=172.31.217.173
# Database name
mysql.database=azkaban
# Database user
mysql.user=azkaban
# Database password
mysql.password=bigdata@123
# Maximum number of database connections
mysql.numconnections=100

azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
