Integrating Atlas with CDH 6.3.1

Note:
Before installing, have the following ready:
JDK 1.8
ZooKeeper
Kafka
HBase
Solr

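Contents
  • I. Downloading Atlas
  • II. Building Atlas from source
    • 2.1 Importing the Maven project into IntelliJ IDEA
    • 2.2 Editing the pom file
    • 2.3 Patching the Atlas source for Hive 2.1.1
    • 2.4 Building
      • 2.4.1 JDK version too old
      • 2.4.2 Missing jar files
    • 2.5 Build succeeded
  • III. Installing Atlas
    • 3.1 Extracting
    • 3.2 Editing atlas-env.sh
    • 3.3 Editing atlas-application.properties
    • 3.4 Editing atlas-log4j.xml
    • 3.5 Integrating with CDH HBase
    • 3.6 Integrating with CDH Solr
    • 3.7 Integrating with CDH Kafka
  • IV. Starting Atlas
  • V. Integrating Atlas with Hive
    • 5.1 Configuration changes
    • 5.2 Importing Hive metadata into Atlas
    • 5.3 Testing the Hive integration
  • FAQ:
    • 1. The pom file shows errors before the build
    • 2. Running hive on other nodes fails
  • References: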

I. Downloading Atlas

First, download the source on Linux with git:

git clone -b release-2.1.0-rc3 https://github.com/apache/atlas.git

Then copy it back to the Windows machine, since the Atlas source will be compiled there later.

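II. Building Atlas from source

2.1 Importing the Maven project into IntelliJ IDEA
  1. Choose Open or Import.

  2. Select the directory holding the Atlas source downloaded in section I.

  3. When the import finishes, the editor window opens.

  4. Open File -> New Projects Settings -> Structure for New Projects… and set the global JDK.

  5. To configure Maven, open File -> New Projects Settings -> Settings for New Projects…

  6. Click OK to finish the setup.
    If Maven cannot download jars, check the network connection, or set a working remote repository in Maven's E:\java_study\apache-maven-3.8.1\conf\settings.xml. I use the Aliyun mirror here.

Wait a while for Maven to download all the dependencies.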

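2.2 Editing the pom file

To build against CDH 6.3.1, add the following to the repositories section:

<repository>
     <id>cloudera</id>
     <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
     <releases>
         <enabled>true</enabled>
     </releases>
     <snapshots>
         <enabled>false</enabled>
     </snapshots>
</repository>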

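Change the component versions to the CDH ones:

<lucene-solr.version>7.4.0-cdh6.3.1</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.1</hadoop.version>
<hbase.version>2.1.0-cdh6.3.1</hbase.version>
<solr.version>7.4.0-cdh6.3.1</solr.version>
<hive.version>2.1.1-cdh6.3.1</hive.version>
<kafka.version>2.2.1-cdh6.3.1</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<zookeeper.version>3.4.5-cdh6.3.1</zookeeper.version>
<sqoop.version>1.4.7-cdh6.3.1</sqoop.version>

2.3 Patching the Atlas source for Hive 2.1.1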

Project to modify: atlas-release-2.1.0-rc3/addons/hive-bridge

Atlas 2.1.0 is written against the Hive 3 API, and the two calls below are not available in Hive 2.1.1, so both are stubbed out.

① org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 577

String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;

Change it to:

String catalogName = null;

② org/apache/atlas/hive/hook/AtlasHiveHookContext.java, line 81

this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;

Change it to:

this.metastoreHandler = null;

2.4 Building

Command:

mvn clean -DskipTests package -Pdist -X

2.4.1 JDK version too old

The build requires JDK 1.8.151 or later. My JDK was 1.8.201, so I had to download and install a newer JDK.

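2.4.2 Missing jar files

Could not find artifact org.apache.sqoop:sqoop:pom:1.4.6.2.3.99.0-195 in nexus-aliyun

Analysis:
The requested jar is missing, probably because the Aliyun mirror does not carry it. It has to be downloaded manually and installed into the local Maven repository.

Download the required jar and pom files from the Cloudera repository.

Download URL:
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hive/

There are more than 200 jar and pom files in total. If any are missed, the build reports which ones; download those and replace the local copies as the message indicates.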
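
One way to install a manually downloaded jar is the Maven install plugin's install:install-file goal, which reads the coordinates from the pom passed with -DpomFile. The sketch below only prints the command; the helper name install_cmd and the example file names are assumptions made for illustration, not taken from the article.

```shell
#!/bin/sh
# Sketch: print the mvn command that installs a manually downloaded jar
# (with its pom) into the local Maven repository.
# install_cmd and the file names below are illustrative assumptions.
install_cmd() {
  jar="$1"
  pom="$2"
  printf 'mvn install:install-file -Dfile=%s -DpomFile=%s\n' "$jar" "$pom"
}

# Print the command for one artifact.
install_cmd sqoop-1.4.7-cdh6.3.1.jar sqoop-1.4.7-cdh6.3.1.pom
```

Check the printed command, then pipe the output to sh to run it. Repeat for each artifact the build reports as missing.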

2.5 Build succeeded

The build finally succeeded.

Locate the build output (the packages are under distro/target):

III. Installing Atlas

3.1 Extracting

Extract apache-atlas-2.1.0-bin.tar.gz into the installation directory:

tar -zxvf /root/atlas-release-2.1.0-rc3/distro/target/apache-atlas-2.1.0-bin.tar.gz -C /opt/

3.2 Editing atlas-env.sh

Clear the configuration file first, then write the following into it:

export HBASE_CONF_DIR=/etc/hbase/conf

export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"

export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"

export MANAGE_LOCAL_HBASE=false

export MANAGE_LOCAL_SOLR=false

export MANAGE_EMBEDDED_CASSANDRA=false

export MANAGE_LOCAL_ELASTICSEARCH=false

3.3 Editing atlas-application.properties

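Remember to change the ZooKeeper addresses to match your cluster.

Clear the configuration file first, then write the following into it: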

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=apache_atlas_janus

${graph.storage.properties}

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
${entity.repository.properties}

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index

${graph.index.properties}

atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=hp1:2181/solr,hp2:2181/solr,hp3:2181/solr,hp4:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000


# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=hp1:2181,hp2:2181,hp3:2181
atlas.kafka.bootstrap.servers=hp1:9092,hp2:9092,hp3:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=hp1:2181,hp2:2181,hp3:2181

#Hive
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=:
#atlas.server.ha.zookeeper.auth=:



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query..
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.=


#########  UI Configuration ########

atlas.ui.default.version=v1

# whether to run the hook synchronously. false recommended to avoid delays in Sqoop operation completion. Default: false
atlas.hook.sqoop.synchronous=false
# number of retries for notification failure. Default: 3
atlas.hook.sqoop.numRetries=3
# queue size for the threadpool. Default: 10000      
atlas.hook.sqoop.queueSize=10000

atlas.kafka.metric.reporters=org.apache.kafka.common.metrics.JmxReporter
atlas.kafka.client.id=sqoop-atlas
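
The host lists in this file (ZooKeeper, Solr's ZooKeeper chroot, Kafka brokers) all follow the same host:port pattern, so they can be generated for a different cluster rather than edited by hand. A minimal sketch, assuming a hypothetical helper named join_hosts that is not part of Atlas:

```shell
#!/bin/sh
# Sketch: build a comma-separated host:port list for the address properties.
# join_hosts is a hypothetical helper, not something Atlas ships.
# usage: join_hosts PORT SUFFIX HOST...
join_hosts() {
  port="$1"
  suffix="$2"
  shift 2
  out=""
  for h in "$@"; do
    # append a comma only when the list already has an entry
    out="${out:+$out,}$h:$port$suffix"
  done
  printf '%s\n' "$out"
}

join_hosts 2181 ""      hp1 hp2 hp3      # atlas.kafka.zookeeper.connect
join_hosts 2181 "/solr" hp1 hp2 hp3 hp4  # atlas.graph.index.search.solr.zookeeper-url
join_hosts 9092 ""      hp1 hp2 hp3      # atlas.kafka.bootstrap.servers
```

With the article's hosts, the three calls reproduce the values used in the configuration above.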
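
3.4 Editing atlas-log4j.xml

Uncomment the following block:

    <appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
        <param name="datePattern" value="'.'yyyy-MM-dd" />
        <param name="append" value="true" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d|%t|%m%n" />
        </layout>
    </appender>

    <logger name="org.apache.atlas.perf" additivity="false">
        <level value="debug" />
        <appender-ref ref="perf_appender" />
    </logger>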

3.5 Integrating with CDH HBase

Link the HBase cluster configuration into /opt/apache-atlas-2.1.0/conf/hbase:

ln -s /etc/hbase/conf/ /opt/apache-atlas-2.1.0/conf/hbase

3.6 Integrating with CDH Solr

① Copy the apache-atlas-2.1.0/conf/solr directory into the Solr installation directory on every node and rename it to atlas-solr (the rename has to be run on each node):

scp -r /opt/apache-atlas-2.1.0/conf/solr root@hp1:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/
scp -r /opt/apache-atlas-2.1.0/conf/solr root@hp2:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/
scp -r /opt/apache-atlas-2.1.0/conf/solr root@hp3:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/
scp -r /opt/apache-atlas-2.1.0/conf/solr root@hp4:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/
cd /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/
mv solr/ atlas-solr
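
The copy and the rename can be generated per host instead of typed out. The sketch below prints the commands for review rather than running them; the paths and hosts come from the steps above, while the helper name solr_conf_cmds and the use of ssh for the remote rename are assumptions.

```shell
#!/bin/sh
# Sketch: print the copy-and-rename commands for every Solr node.
# solr_conf_cmds is a hypothetical helper; doing the rename over ssh is an assumption.
SOLR_HOME=/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr
ATLAS_CONF=/opt/apache-atlas-2.1.0/conf/solr

solr_conf_cmds() {
  for host in "$@"; do
    # copy the config directory, then rename it on the remote side
    echo "scp -r $ATLAS_CONF root@$host:$SOLR_HOME/"
    echo "ssh root@$host mv $SOLR_HOME/solr $SOLR_HOME/atlas-solr"
  done
}

solr_conf_cmds hp1 hp2 hp3 hp4
```

Once the printed commands look right, pipe the output to sh to execute them.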

② Create the collections

/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/atlas-solr -shards 3 -replicationFactor 2 -force
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/atlas-solr -shards 3 -replicationFactor 2 -force
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr/atlas-solr -shards 3 -replicationFactor 2 -force
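
The three create commands differ only in the collection name, so a loop keeps the shard and replica settings in one place. A minimal sketch that prints the same commands; create_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print one "solr create" command per Atlas index collection.
# create_cmds is a hypothetical helper; the flags mirror the commands above.
SOLR_HOME=/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/solr

create_cmds() {
  for name in vertex_index edge_index fulltext_index; do
    echo "$SOLR_HOME/bin/solr create -c $name -d $SOLR_HOME/atlas-solr -shards 3 -replicationFactor 2 -force"
  done
}

create_cmds
```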

③ Verify that the collections were created
Log in to the Solr web console at http://hp1:8983 and confirm that they are there.

3.7 Integrating with CDH Kafka

① Create the Kafka topics

kafka-topics --zookeeper hp1:2181,hp2:2181,hp3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics --zookeeper hp1:2181,hp2:2181,hp3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK

② List the topics

kafka-topics --list --zookeeper hp1:2181,hp2:2181,hp3:2181
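
The listing can also be checked mechanically against the topics Atlas expects. The sketch below reads a topic listing on stdin and prints whichever expected names are absent; missing_topics is a hypothetical helper that only parses text and never contacts Kafka.

```shell
#!/bin/sh
# Sketch: report which expected topics are missing from a listing read on stdin.
# missing_topics is a hypothetical helper; it does not call Kafka itself.
missing_topics() {
  listing="$(cat)"
  for topic in "$@"; do
    # -x matches whole lines only, -F treats the topic name literally
    printf '%s\n' "$listing" | grep -qxF "$topic" || echo "$topic"
  done
}

# Example with a canned listing; in practice pipe in the output of the
# kafka-topics --list command shown above.
printf 'ATLAS_ENTITIES\n__consumer_offsets\n' | missing_topics ATLAS_HOOK ATLAS_ENTITIES
```

No output means every expected topic is present.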

IV. Starting Atlas

Start command:

cd /opt/apache-atlas-2.1.0
./bin/atlas_start.py

Log in to the Atlas web console at http://hp1:21000 to verify that it started.

The default username and password are both admin.
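
Because the server may not answer immediately after the start script returns, it can help to poll the web port instead of refreshing the browser. Below is a minimal retry sketch; retry is a hypothetical helper, and the curl probe in the usage note is an assumption about how one might check the server.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.
# retry is a hypothetical helper, not part of Atlas.
# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # stop as soon as the command succeeds
    "$@" && return 0
    i=$((i + 1))
    # sleep only if another attempt is coming
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, retry 30 10 curl -sf -u admin:admin -o /dev/null http://hp1:21000/api/atlas/admin/version would probe the server every 10 seconds for up to 30 attempts.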

If startup fails, check the logs:

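V. Integrating Atlas with Hive

5.1 Configuration changes

Edit the Hive configuration:
open the CM web console and go to the Hive configuration page.
① Search for hive-site.xml

Edit [Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml]
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
Edit [Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml]
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook

② Search for hive-env

Edit [Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh]
HIVE_AUX_JARS_PATH=/opt/apache-atlas-2.1.0/hook/hive

③ Search for hive_aux
This setting adds two snippets in total.

Edit [HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml]

Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook

Name: hive.reloadable.aux.jars.path
Value: /opt/apache-atlas-2.1.0/hook/hive

④ Edit [HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)]

HIVE_AUX_JARS_PATH=/opt/apache-atlas-2.1.0/hook/hive

The complete configuration is shown in the figure below: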

5.2 Importing Hive metadata into Atlas

Command:

cd /opt/apache-atlas-2.1.0
./bin/import-hive.sh
		Enter username for atlas :- admin
		Enter password for atlas :-
		Hive Meta Data import was successful!!


Two problems came up before this worked. First, the script reported that the HIVE_HOME environment variable was not set for the root user:

export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive

Second, it could not read atlas-application.properties. The source code shows that this file is loaded from the classpath, so I added it to the jar:

cd /opt/apache-atlas-2.1.0/conf
zip -u /opt/apache-atlas-2.1.0/hook/hive/atlas-hive-plugin-impl/atlas-intg-2.1.0.jar atlas-application.properties

After that the import succeeded:

5.3 Testing the Hive integration

Statements:

CREATE TABLE t_ppp  (
  id int ,
  pice decimal(2, 1)
) ;

insert into t_ppp values (1,2.2);

CREATE TABLE t_ppp_bak  (
  id int ,
  pice decimal(2, 1)
) ;

insert overwrite table t_ppp_bak select id,pice from t_ppp;

CREATE VIEW  IF NOT EXISTS t_ppp_view AS SELECT id,pice FROM t_ppp_bak;
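
After running these statements, the new tables should become searchable in Atlas once the hook messages have been processed. Besides the web UI, the Atlas v2 basic-search REST endpoint can be queried. The sketch below only builds the URL; search_url is a hypothetical helper, and the host, port and admin credentials follow the article's setup.

```shell
#!/bin/sh
# Sketch: build the Atlas basic-search URL for a Hive table name.
# search_url is a hypothetical helper; host and port follow the article.
search_url() {
  printf 'http://hp1:21000/api/atlas/v2/search/basic?typeName=hive_table&query=%s\n' "$1"
}

search_url t_ppp
# To run the query: curl -s -u admin:admin "$(search_url t_ppp)"
```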

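FAQ:

1. The pom file shows errors before the build

In my own testing this error does not affect the build and can be ignored.

2. Running hive on other nodes fails

Atlas was installed only on hp1, so the extracted Atlas directory existed only there, but Hive is installed on all four nodes: hp1, hp2, hp3 and hp4. Running the hive command on hp2 failed immediately:

The fix is to copy the extracted Atlas directory to the other three nodes:

scp -r /opt/apache-atlas-2.1.0 root@hp2:/opt/
scp -r /opt/apache-atlas-2.1.0 root@hp3:/opt/
scp -r /opt/apache-atlas-2.1.0 root@hp4:/opt/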

References:
  1. https://www.freesion.com/article/51361348863/
  2. https://www.cnblogs.com/kaola8023/p/14069519.html
  3. https://blog.csdn.net/qq_26502245/article/details/108008070
