Installing Azkaban on CentOS 7
Installing Gradle on CentOS 7
1. Download the Gradle distribution
wget https://downloads.gradle.org/distributions/gradle-4.10.3-all.zip
2. Create the install directory and unpack the archive
mkdir -p /opt/gradle
unzip -d /opt/gradle gradle-4.10.3-all.zip
3. Configure the environment variable
vi ~/.bash_profile
export PATH=$PATH:/opt/gradle/gradle-4.10.3/bin
source ~/.bash_profile
gradle -v
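The PATH entry has to name the same version as the archive that was unpacked, otherwise `gradle -v` is not found. A minimal sketch that derives the bin directory from the archive name; the helper name `gradle_bin_dir` is made up for illustration, and it assumes the archive unpacks into a `gradle-<version>` directory as in step 2.

```shell
# Hypothetical helper: derive the Gradle bin directory from the archive name.
# $1 = archive file name, $2 = install root (defaults to /opt/gradle).
gradle_bin_dir() {
  local archive="${1##*/}"        # drop any leading directory
  local root="${2:-/opt/gradle}"
  local name="${archive%.zip}"    # gradle-4.10.3-all
  name="${name%-all}"             # gradle-4.10.3
  name="${name%-bin}"             # also accept the -bin distribution
  printf '%s/%s/bin\n' "$root" "$name"
}

# Print the line that would go into ~/.bash_profile.
echo "export PATH=\$PATH:$(gradle_bin_dir gradle-4.10.3-all.zip)"
```

Generating the line this way keeps the download, unzip and PATH steps on one version.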
4. Configure a global init script that replaces the default repositories with mirrors
mkdir -p ~/.gradle
vi ~/.gradle/init.gradle
allprojects {
    repositories {
        def ALIYUN_REPOSITORY_URL = 'https://maven.aliyun.com/repository/public/'
        def ALIYUN_JCENTER_URL = 'https://maven.aliyun.com/repository/jcenter/'
        def ALIYUN_GOOGLE_URL = 'https://maven.aliyun.com/repository/google/'
        def ALIYUN_GRADLE_PLUGIN_URL = 'https://maven.aliyun.com/repository/gradle-plugin/'
        all { ArtifactRepository repo ->
            if (repo instanceof MavenArtifactRepository) {
                def url = repo.url.toString()
                if (url.startsWith('https://repo1.maven.org/maven2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_REPOSITORY_URL."
                    remove repo
                }
                if (url.startsWith('https://jcenter.bintray.com/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_JCENTER_URL."
                    remove repo
                }
                if (url.startsWith('https://dl.google.com/dl/android/maven2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_GOOGLE_URL."
                    remove repo
                }
                if (url.startsWith('https://plugins.gradle.org/m2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_GRADLE_PLUGIN_URL."
                    remove repo
                }
            }
        }
        maven { url ALIYUN_REPOSITORY_URL }
        maven { url ALIYUN_JCENTER_URL }
        maven { url ALIYUN_GOOGLE_URL }
        maven { url ALIYUN_GRADLE_PLUGIN_URL }
    }
}
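The init script matches each repository URL by prefix. The shell sketch below only illustrates which mirror corresponds to which upstream prefix; it is not part of Gradle, and the function name `mirror_for_url` is made up. The real script removes every matched repository and then appends all four mirrors.

```shell
# Illustration only: map an upstream repository URL to the mirror that the
# init script logs as its replacement. URLs are copied from the script above.
mirror_for_url() {
  case "$1" in
    https://repo1.maven.org/maven2/*)          echo 'https://maven.aliyun.com/repository/public/' ;;
    https://jcenter.bintray.com/*)             echo 'https://maven.aliyun.com/repository/jcenter/' ;;
    https://dl.google.com/dl/android/maven2/*) echo 'https://maven.aliyun.com/repository/google/' ;;
    https://plugins.gradle.org/m2/*)           echo 'https://maven.aliyun.com/repository/gradle-plugin/' ;;
    *)                                         echo "$1" ;;  # no mirror: left unchanged
  esac
}

mirror_for_url 'https://jcenter.bintray.com/'
```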
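5. Difference between the gradlew and gradle commands
gradlew is the wrapper: it automatically downloads the Gradle version defined in the wrapper configuration, which keeps the build environment consistent. gradle uses the locally installed Gradle version.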
[Wrapper (gradlew)](https://www.zybuluo.com/xtccc/note/275168)
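The version the wrapper will download is recorded in the `distributionUrl` entry of `gradle/wrapper/gradle-wrapper.properties`. A minimal sketch for reading it; the function name `wrapper_gradle_version` and the sample version 4.6 are assumptions for illustration, and release-candidate version strings are not handled.

```shell
# Hypothetical helper: print the Gradle version pinned by a wrapper
# properties file ($1). Only plain x.y or x.y.z versions are recognized.
wrapper_gradle_version() {
  sed -n -E 's/^distributionUrl=.*gradle-([0-9][0-9.]*)-(bin|all)\.zip$/\1/p' "$1"
}

# Demo on a throwaway file with a sample distributionUrl line.
props="$(mktemp)"
printf '%s\n' 'distributionUrl=https\://services.gradle.org/distributions/gradle-4.6-all.zip' > "$props"
echo "wrapper pins Gradle $(wrapper_gradle_version "$props")"
rm -f "$props"
```

Comparing this value with the output of `gradle -v` shows whether `./gradlew` and `gradle` would build with the same version.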
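## Installing Azkaban
### Azkaban background
1. Introduction
Azkaban is a batch workflow job scheduler released by LinkedIn. It runs a set of jobs and processes in a specified order within a workflow. Jobs are configured with simple key:value pairs, and the dependencies key declares how jobs depend on each other (the dependency graph must be a DAG). Azkaban builds the job graph from these job configuration files and provides an easy-to-use web UI for maintaining and tracking workflows. Official site: https://azkaban.github.io/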
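To make the key:value job format concrete, here is a minimal sketch that writes two job files and reads the dependency back. The job names `extract` and `load` and both helper functions are made up for illustration; `type`, `command` and `dependencies` are the keys used by Azkaban's command job type.

```shell
# Write a two-job flow into directory $1: load runs after extract.
make_demo_flow() {
  local dir="$1"
  mkdir -p "$dir"
  printf '%s\n' 'type=command' 'command=echo "extract"' > "$dir/extract.job"
  printf '%s\n' 'type=command' 'command=echo "load"' 'dependencies=extract' > "$dir/load.job"
}

# Print the dependencies of job file $1, one job name per line.
job_dependencies() {
  sed -n 's/^dependencies=//p' "$1" | tr ',' '\n'
}

flow_dir="$(mktemp -d)"
make_demo_flow "$flow_dir"
echo "load depends on: $(job_dependencies "$flow_dir/load.job")"
```

Because `extract` has no `dependencies` line and `load` names only `extract`, the graph has no cycle and is a valid DAG.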
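2. Azkaban features
1. Powerful: it can schedule almost any program, offers a modular, pluggable plugin mechanism, and natively supports command, Java, Hive, Pig and Hadoop jobs.
2. Written in Java, with a clear code structure that is easy to extend.
3. A simple web UI that lets you monitor every step.
4. Job configuration files make it quick to define jobs and the dependencies between them.
5. A RESTful API, which makes it easy for a platform to integrate custom calls.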
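3. Azkaban architecture
Azkaban consists of three key components:
1. Relational database (MySQL): stores workflow-related information.
2. AzkabanWebServer: the main manager of the whole Azkaban workflow system. It handles user login and authentication, project management, scheduled execution of workflows, tracking of workflow progress, and related tasks.
3. AzkabanExecutorServer: responsible for actually submitting and running the workflows.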
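4. Azkaban's three modules
Azkaban has three main modules: azkaban-common, azkaban-exec-server and azkaban-web-server.
azkaban-common: the shared module, covering database access, trigger management utilities, mail utilities and job definitions.
azkaban-exec-server: the executor, which runs the jobs.
azkaban-web-server: the scheduling center, used to display, edit and schedule jobs.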
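5. Deployment modes
solo-server mode:
The database is an embedded H2, and the Web Server and Executor Server run in the same process. This mode has all of Azkaban's features but is normally used only for learning and testing.
two-server mode:
The database is MySQL (which supports a master-slave setup), and the Web Server and Executor Server run in separate processes.
Distributed multiple-executor mode:
The database is MySQL (which supports a master-slave setup), the Web Server and Executor Server run on different machines, and there are several Executor Servers.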
Build and install
1. Build
git clone https://github.com/azkaban/azkaban.git   # clone the repo
cd azkaban; sh ./gradlew build   # build & install package
./gradlew clean   # clean the build
./gradlew installDist   # build and install the distributions
./gradlew test   # run the tests
./gradlew build -x test   # build without running the tests
cd azkaban-solo-server/build/install/azkaban-solo-server; bin/start-solo.sh   # start solo server
bin/shutdown-solo.sh   # stop
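After the build, the packages to upload have to be collected from the module directories. The sketch below assumes they are written to `<module>/build/distributions/` (the usual output location of Gradle's distribution tasks; check your own build tree), and the function name `list_dist_tarballs` is made up.

```shell
# Hypothetical helper: list every .tar.gz under a build/distributions
# directory below the checkout root given as $1.
list_dist_tarballs() {
  find "$1" -path '*/build/distributions/*.tar.gz' -type f | sort
}

# Demo on a throwaway tree that mimics the assumed layout.
root="$(mktemp -d)"
mkdir -p "$root/azkaban-web-server/build/distributions"
: > "$root/azkaban-web-server/build/distributions/azkaban-web-server-0.1.0-SNAPSHOT.tar.gz"
list_dist_tarballs "$root"
rm -rf "$root"
```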
2. Installing from the compiled packages
Node layout:
- web-server: 172.16.3.109
- executor-server: 172.16.3.179
- DB-server: 172.16.3.123
Upload the compiled packages and unpack them:
azkaban-web-server
mkdir -p /opt/azkaban
cd /opt/azkaban
tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz
tar -zxvf azkaban-db-0.1.0-SNAPSHOT.tar.gz
mv azkaban-db-0.1.0-SNAPSHOT azkaban-db
mv azkaban-web-server-0.1.0-SNAPSHOT azkaban-web-server
executor-server
mkdir -p /opt/azkaban
cd /opt/azkaban
tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz
mv azkaban-exec-server-0.1.0-SNAPSHOT azkaban-exec-server
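The unpack-and-rename steps are the same for every package, so they can be wrapped in one function. A minimal sketch, assuming each tarball contains a single top-level directory named after the archive; the function name `unpack_component` is made up.

```shell
# Hypothetical helper: unpack tarball $1 into directory $2 and strip the
# version suffix from the extracted directory name.
unpack_component() {
  local tarball="$1" dest="$2"
  local base="${tarball##*/}"
  base="${base%.tar.gz}"                 # azkaban-web-server-0.1.0-SNAPSHOT
  local component="${base%-[0-9]*}"      # azkaban-web-server
  mkdir -p "$dest" &&
    tar -xzf "$tarball" -C "$dest" &&
    mv "$dest/$base" "$dest/$component" &&
    printf '%s\n' "$dest/$component"     # report the final install path
}

# Usage on a node (not executed here):
#   unpack_component azkaban-web-server-0.1.0-SNAPSHOT.tar.gz /opt/azkaban
```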
DB-server
mysql> CREATE DATABASE azkaban;
mysql> use azkaban;
-- If importing through a client tool fails, import by running the SQL statements instead
mysql> source /opt/azkaban/azkaban-db/create-all-sql-0.1.0-SNAPSHOT.sql;
mysql> insert into executors(host,port,active) values('172.16.3.179',12331,1);
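In a multiple-executor setup, one row is needed per executor. A minimal sketch that generates the statement from a host and port; the function name `executor_insert_sql` is made up, and the host value is assumed to be trusted since no SQL escaping is done.

```shell
# Hypothetical helper: print the INSERT that registers executor $1:$2.
executor_insert_sql() {
  printf "INSERT INTO executors(host,port,active) VALUES('%s',%d,1);\n" "$1" "$2"
}

executor_insert_sql 172.16.3.179 12331

# To apply it on the DB server (not executed here):
#   executor_insert_sql 172.16.3.179 12331 | mysql -u root -p azkaban
```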
3. Installing the executor server
[root@kdc conf]# cat azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=America/Los_Angeles
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Where the Azkaban web server is located
azkaban.webserver.url=http://172.16.3.109:8081
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=172.16.3.123
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
executor.port=12331
execution.logs.retention.ms=2419200000
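When a key appears twice in a Java properties file, the later value silently replaces the earlier one, so repeated keys are worth catching before the server starts. A minimal sketch; the function name `duplicate_keys` is made up.

```shell
# Hypothetical helper: print every key that occurs more than once in the
# properties file $1. Comment lines and blank lines are ignored.
duplicate_keys() {
  grep -v -E '^[[:space:]]*(#|$)' "$1" | cut -d= -f1 | sort | uniq -d
}

# Demo on a throwaway file with one repeated key.
conf="$(mktemp)"
printf '%s\n' 'executor.maxThreads=50' 'executor.port=12331' 'executor.maxThreads=50' > "$conf"
duplicate_keys "$conf"
rm -f "$conf"
```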
4. Starting the executor service
Note: always start azkaban-exec-server first, then azkaban-web-server.
./bin/start-exec.sh   # must be run from the parent directory of bin
tail -f executorServerLog_*.out   # follow the startup log
jps   # check the Java processes
curl -G "localhost:12331/executor?action=activate" && echo   # manually activate the executor
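The activation request can only succeed once the executor is listening on its port. Assuming that takes a few seconds after startup (not measured here), the call can be wrapped in a small retry loop. The `retry` function is a generic helper written for this sketch, not something Azkaban ships.

```shell
# Generic helper: run a command up to $1 times, sleeping $2 seconds between
# attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Harmless demo; on the executor node the command would be, for example:
#   retry 10 3 curl -sf -G "localhost:12331/executor?action=activate"
retry 3 0 true && echo "command succeeded"
```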
5. Installing the web server
# Configure the azkaban.properties file
cat azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=America/Los_Angeles
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=172.16.3.123
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100
# Multiple Executor
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
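The web server and every executor must point at the same database, and a one-letter difference in `mysql.database` is easy to miss. A minimal sketch that compares the MySQL settings of two properties files; the function names `prop_get` and `db_settings_match` are made up.

```shell
# Print the value of key $2 in properties file $1 (last occurrence wins).
prop_get() {
  awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2); found = 1 }
                     END { if (found) print v }' "$1"
}

# Compare the MySQL connection settings of two properties files.
# Prints the first differing key and returns 1, or returns 0 if all match.
db_settings_match() {
  local key
  for key in mysql.host mysql.port mysql.database mysql.user; do
    if [ "$(prop_get "$1" "$key")" != "$(prop_get "$2" "$key")" ]; then
      echo "mismatch: $key"
      return 1
    fi
  done
}

# Usage (paths are examples, not executed here):
#   db_settings_match azkaban-exec-server/conf/azkaban.properties \
#                     azkaban-web-server/conf/azkaban.properties
```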
Starting the web service
Note: always start azkaban-exec-server first, then azkaban-web-server.
./bin/start-web.sh   # must be run from the parent directory of bin
tail -f webServerLog_*.out   # follow the startup log
jps   # check the Java processes
Configuring Jetty SSL
# keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: YY
What is the name of your organizational unit?
[Unknown]: YY
What is the name of your organization?
[Unknown]: YY
What is the name of your City or Locality?
[Unknown]: Beijing
What is the name of your State or Province?
[Unknown]: Beijing
What is the two-letter country code for this unit?
[Unknown]: CN
Is CN=YY, OU=YY, O=YY, L=Beijing, ST=Beijing, C=CN correct?
[no]: y
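The identity questions can also be answered on the command line with keytool's `-dname` option. The sketch below only builds the distinguished name and prints the command rather than running it; the function name `build_dname` is made up. Unless `-storepass` and `-keypass` are also supplied, keytool still prompts for the passwords.

```shell
# Hypothetical helper: build a distinguished name from
# CN, OU, O, L, ST, C (in that order).
build_dname() {
  printf 'CN=%s, OU=%s, O=%s, L=%s, ST=%s, C=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

dname="$(build_dname YY YY YY Beijing Beijing CN)"

# Print the command instead of running it, so the sketch has no side effects.
echo keytool -genkey -keystore keystore -alias jetty -keyalg RSA -dname "\"$dname\""
```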
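Copy the generated keystore file into the web-server installation directory, at the same level as conf and the other directories, then edit the conf/azkaban.properties configuration file.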
# cat azkaban.properties
# Azkaban Personalization Settings
# Server name shown at the top of the web UI
azkaban.name=Test
# Description
azkaban.label=My Local Azkaban
# UI color
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
# Default web root directory
web.resource.dir=web/
# Default time zone (changed to Asia/Shanghai)
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
# Default user and permission manager class
user.manager.class=azkaban.user.XmlUserManager
# User configuration file
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
# Location of the global properties file
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
# Maximum number of threads
jetty.maxThreads=25
# Jetty SSL port
jetty.ssl.port=8443
# Jetty port
jetty.port=8081
# Keystore file name
jetty.keystore=keystore
# Keystore password
jetty.password=bigdata@123
# Jetty key password (same as the keystore password)
jetty.keypassword=bigdata@123
# Truststore file name
jetty.truststore=keystore
# Truststore password
jetty.trustpassword=bigdata@123
# Azkaban Executor settings
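# Executor server port
executor.port=12321
# mail settings (not configured yet)
# Sender address
mail.sender=
# SMTP host of the sending mailbox
mail.host=
# Mailbox password
mail.password=
# Address notified when a job fails
job.failure.email=
# Address notified when a job succeeds
job.success.email=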
lockdown.create.projects=false
# Cache directory
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Database type
database.type=mysql
# Database port
mysql.port=3306
# Database host
mysql.host=172.31.217.173
# Database name
mysql.database=azkaban
# Database user
mysql.user=azkaban
# Database password
mysql.password=bigdata@123
# Maximum number of database connections
mysql.numconnections=100
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
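Before restarting the web server with SSL, it can help to confirm that every SSL-related key from the listing is present. A minimal sketch; the function names `has_key` and `missing_ssl_keys` are made up, and the key list is simply the set of `jetty.*` SSL keys used in this configuration.

```shell
# Return 0 if properties file $1 defines key $2, 1 otherwise.
has_key() {
  awk -F= -v k="$2" '$1 == k { found = 1 } END { exit !found }' "$1"
}

# Print each SSL-related key that is absent from properties file $1.
missing_ssl_keys() {
  local key
  for key in jetty.ssl.port jetty.keystore jetty.password \
             jetty.keypassword jetty.truststore jetty.trustpassword; do
    has_key "$1" "$key" || echo "$key"
  done
}

# Usage (path is an example, not executed here):
#   missing_ssl_keys conf/azkaban.properties
```

An empty result means all six keys are set; it does not check that the keystore file exists or that the passwords are correct.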