DDL Operations

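I. Accessing Hive through a standalone metastore service
-- Overview: when several Hive clients are open at once, they all go through the same metastore service to interact with the metadata.

-- Steps:
-- Edit the configuration file hive-site.xml

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
</property>

-- Start the metastore service manually, on its own
hive --service metastore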
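
Before pointing clients at the metastore, it can help to confirm that the Thrift port named in hive.metastore.uris is accepting connections. The following is a minimal Python sketch under that assumption; `is_port_open` is a hypothetical helper written for this note, not part of Hive.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unresolvable host names
        return False
```

With the settings above, `is_port_open("hadoop102", 9083)` would be expected to return True once `hive --service metastore` has finished starting. The check only shows that something is listening on the port; it does not prove that the listener is the metastore.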

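II. Accessing Hive over JDBC (using beeline as the Hive client)
-- Overview: for a better client experience, we can operate Hive through the beeline client,
-- which also requires starting hiveserver2, the service that beeline connects to.

-- Steps:
-- Edit the configuration file hive-site.xml

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop102</value>
</property>

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>

-- Start the hiveserver2 service
hive --service hiveserver2

-- Start the beeline client
beeline -u jdbc:hive2://hadoop102:10000 -n atguigu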

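-- Note: when using the beeline client, be sure to add the following configuration to Hadoop's core-site.xml

<property>
    <name>hadoop.proxyuser.atguigu.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.atguigu.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.atguigu.users</name>
    <value>*</value>
</property>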

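III. Keeping the metastore and hiveserver2 services from occupying the terminal and blocking the shell

1. metastore
   -- Console log output: redirect stdout and stderr to a log file
   hive --service metastore 1>/opt/module/hive-3.1.2/logs/metastore.log 2>&1

   -- Blocking: append & so the process runs in the background
   hive --service metastore 1>/opt/module/hive-3.1.2/logs/metastore.log 2>&1 &

   -- Terminal dependence: add nohup so the service keeps running after the terminal is closed
   nohup hive --service metastore 1>/opt/module/hive-3.1.2/logs/metastore.log 2>&1 &


2. hiveserver2
   -- Console log output: redirect stdout and stderr to a log file
   hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>&1

   -- Blocking: append & so the process runs in the background
   hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>&1 &

   -- Terminal dependence: add nohup so the service keeps running after the terminal is closed
   nohup hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>&1 &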
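
The three steps above (redirect the output, run in the background, detach from the terminal) can also be done from a small launcher script. Below is a sketch in Python, assuming a POSIX system. `start_detached` is an illustrative name, and `start_new_session=True` is used as a stand-in for nohup rather than an exact equivalent.

```python
import subprocess


def start_detached(cmd: list, log_path: str) -> int:
    """Start cmd in the background with stdout and stderr written to log_path.

    Returns the PID of the started process.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # do not read from the terminal
            stdout=log,                 # like 1>logfile
            stderr=subprocess.STDOUT,   # like 2>&1
            start_new_session=True,     # own session, not tied to the launching terminal
        )
    # Popen returns immediately, so the caller is not blocked (like the trailing &)
    return proc.pid
```

A call such as `start_detached(["hive", "--service", "metastore"], "/opt/module/hive-3.1.2/logs/metastore.log")` would then play the role of the nohup line for the metastore. Returning the PID makes it easy to stop the service later.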



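IV. Hive configuration
1. Hive interactive commands
-- Overview: when starting the Hive client, command-line options can be added for simple interactive tasks

-- Options:
   -e <quoted SQL>            run the SQL given on the command line, then exit
   -f <file>                  run the SQL statements stored in a file
   -hiveconf <key>=<value>    set a configuration property for this session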

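2. Common Hive property settings

2.1 Showing the current database and column headers in the Hive CLI

1) Print the current database and the column headers
Add the following two properties to hive-site.xml:

<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>

<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

2.2 Ways to set parameters
-- Overview: the ways to change Hive configuration parameters

-- Order in which Hive loads configuration: hive-default.xml.template --> hive-site.xml --> temporary settings; a value loaded later overrides an earlier one

-- Example: temporarily change the Hive property hive.cli.print.current.db
-- Method 1: pass -hiveconf when starting the Hive client
hive -hiveconf hive.cli.print.current.db=false

-- Method 2: after entering the Hive client, change it temporarily with the set command
set hive.cli.print.current.db=false;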
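
The loading order can be pictured as a stack of layers in which the most recently loaded value wins. The Python sketch below models that lookup with `ChainMap`. The dictionaries and their values are illustrative only, and it assumes a `set` in the session takes precedence over `-hiveconf`, which in turn takes precedence over the files.

```python
from collections import ChainMap

# Illustrative layers, listed in the order Hive loads them.
defaults = {                                   # hive-default.xml
    "hive.cli.print.current.db": "false",
    "hive.cli.print.header": "false",
}
site = {                                       # hive-site.xml
    "hive.cli.print.current.db": "true",
    "hive.cli.print.header": "true",
}
command_line = {"hive.cli.print.current.db": "false"}   # -hiveconf key=value
session = {}                                              # set key=value;

# ChainMap searches left to right, so the latest layer is placed first.
effective = ChainMap(session, command_line, site, defaults)

print(effective["hive.cli.print.current.db"])  # false: overridden by -hiveconf
print(effective["hive.cli.print.header"])      # true: taken from hive-site.xml

# A set command inside the client overrides everything loaded before it.
session["hive.cli.print.header"] = "false"
print(effective["hive.cli.print.header"])      # false
```

Both temporary methods affect only the current client session; the files on disk are not changed.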

V. Data types in Hive

-- 1. Create a table

create table stu(
 id    int,
 name  string,
 sal   double,
 idcard  bigint
);

-- 2. Insert data

insert into stu values(1001, 'hello', 25000.00, 12345678965544333);

-- 3. Switch Hive to local mode (so the job runs faster)

set hive.exec.mode.local.auto=true;

-- 4. Simple query

select * from stu;

+---------+-----------+----------+--------------------+
| stu.id  | stu.name  | stu.sal  |     stu.idcard     |
+---------+-----------+----------+--------------------+
| 1001    | hello     | 25000.0  | 12345678965544333  |
+---------+-----------+----------+--------------------+
-- 5. Optimization: make the job run faster (switch to local mode)
set hive.exec.mode.local.auto=true; -- enable local MR
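
The idcard column is declared bigint because the inserted value does not fit in a 4-byte int. Here is a small Python sketch of that range check, using the signed sizes of Hive's integer types (1, 2, 4 and 8 bytes). `smallest_int_type` is a hypothetical helper written for this note.

```python
# Signed integer types in Hive and their sizes in bytes.
HIVE_INT_TYPES = [("tinyint", 1), ("smallint", 2), ("int", 4), ("bigint", 8)]


def smallest_int_type(value: int) -> str:
    """Return the narrowest Hive integer type whose range contains value."""
    for name, size in HIVE_INT_TYPES:
        bits = size * 8
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in bigint")


print(smallest_int_type(1001))               # smallint
print(smallest_int_type(12345678965544333))  # bigint
```

The id value 1001 would fit in a smallint, so int leaves plenty of room for it, while the 17-digit idcard value needs the 8-byte type.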

Collection data types

array  map  struct

-- 1. Analyze the structure of the collection data
{
    "name": "songsong",
    "friends": ["bingbing" , "lili"] ,       // list: Array
    "children": {                            // key-value pairs: Map
        "xiao song": 18 ,
        "xiaoxiao song": 19
    },
    "address": {                             // structure: Struct
        "street": "hui long guan" ,
        "city": "beijing",
        "psotid": 123456
    }
}

-- 2. Prepare a data file
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing_123456
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing_123_456

-- 3. Create the person table

create table person(
name string,
friends array<string>,
children map<string,int>,
address struct<street:string,city:string,psotid:bigint>
)
row format delimited fields terminated by ","
collection items terminated by "_"
map keys terminated by ":"
lines terminated by "\n"
;

-- 4. Upload the data file person.txt to the HDFS location of the person table

-- 5. Simple query
select * from person;
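
To see how the three delimiters nest, the sketch below splits the first data line by hand in Python. It is only an illustration of the delimiter roles in the table definition, not how Hive is implemented. `parse_person_line` is a hypothetical helper, and the handling of extra struct items is a simplification.

```python
FIELD_SEP = ","   # fields terminated by
ITEM_SEP = "_"    # collection items terminated by
KEY_SEP = ":"     # map keys terminated by


def parse_person_line(line: str) -> dict:
    """Split one text line into name, friends (array), children (map), address (struct)."""
    name, friends, children, address = line.rstrip("\n").split(FIELD_SEP)
    # Keep only the three declared struct members; how Hive itself treats
    # extra items is not modelled here.
    street, city, psotid = address.split(ITEM_SEP)[:3]
    return {
        "name": name,
        "friends": friends.split(ITEM_SEP),
        "children": {
            key: int(value)
            for key, value in (pair.split(KEY_SEP) for pair in children.split(ITEM_SEP))
        },
        "address": {"street": street, "city": city, "psotid": int(psotid)},
    }


row = parse_person_line(
    "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing_123456"
)
print(row["friends"][0])             # bingbing
print(row["children"]["xiao song"])  # 18
print(row["address"]["city"])        # beijing
```

In HiveQL the same elements are addressed as friends[0], children['xiao song'] and address.city.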

-- 2. Explicit conversion
select cast('1' as int) + 3;
+------+
| _c0  |
+------+
| 4    |
+------+

VI. DDL: database definition operations

CREATE DATABASE [IF NOT EXISTS] database_name   -- [IF NOT EXISTS] checks whether the database already exists (not recommended…)
[COMMENT database_comment]   -- description of the database
[LOCATION hdfs_path]   -- HDFS location of the database (usually omitted, so the default path is used)
[WITH DBPROPERTIES (property_name=property_value, …)];   -- structured descriptive properties of the database

-- 1. Create databases
create database mydb1
comment 'my db'
with dbproperties('author'='zcq', 'create_time'='20220119');

create database mydb2
comment 'my db2'
location '/mydb2'
with dbproperties('author'='zcq', 'create_time'='20220119');

-- 2. Inspect databases
show databases;   -- list the names of all databases in Hive
desc database mydb1;   -- show the details of the given database
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name  | comment  |                      location                      | owner_name  | owner_type  | parameters  |
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| mydb1    | my db    | hdfs://hadoop102:9820/user/hive/warehouse/mydb1.db | atguigu     | USER        |             |
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
desc database extended mydb1;   -- show the details of the given database, including its dbproperties

-- 3. Alter a database
alter database mydb2 set dbproperties('author'='bigdata1118');

-- 4. Drop a database
drop database mydb1;

drop database mydb cascade;   -- cascading drop (needed when the database is not empty)

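CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], …)]   -- columns and their data types; can be omitted when the table is copied from a query result

[COMMENT table_comment]   -- description of the table

[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)]   -- keyword for creating a partitioned table

[CLUSTERED BY (col_name, col_name, …) INTO num_buckets BUCKETS]   -- keyword for creating a bucketed table

[SORTED BY (col_name [ASC|DESC], …)]   -- default sort columns for the table (usually not recommended)

[row format delimited fields terminated by "xxxx"]   -- separator between the field values of a row
[collection items terminated by "xxxx"]   -- separator between the elements of a collection
[map keys terminated by "xxxx"]   -- separator between key and value in a map
[lines terminated by "\n"]   -- separator between rows

[STORED AS file_format]   -- storage file format of the table; the default is textfile (plain text)

[LOCATION hdfs_path]   -- HDFS location of the table (usually omitted, so the default path is used)

[TBLPROPERTIES (property_name=property_value, …)]   -- structured descriptive properties of the table

[AS select_statement]   -- add this when copying a table from a query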

-- SORTED BY belongs to the CLUSTERED BY clause of a bucketed table and cannot follow AS SELECT, so it is left out here
create table haha1
as select name, friends, children, address from person
;

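2.1 Using managed (internal) and external tables
-- 1. Create a table (internal/managed table)
-- Hands-on example
-- Prepare the data
-- Create the student table
create table student(
id int, name string
)
row format delimited fields terminated by '\t';

-- Basic query
select * from student;

-- Show the detailed information of the student table
desc formatted student;
| Table Type: | MANAGED_TABLE

-- Drop the table
drop table student;
-- Conclusion: dropping a managed table also deletes its data and directory in HDFS.

-- 2. Create a table (external table)
create external table student(
id int, name string
)
row format delimited fields terminated by '\t';

-- Basic query
select * from student;

-- Show the detailed information of the student table
desc formatted student;
| Table Type: | EXTERNAL_TABLE

-- Drop the table
drop table student;

-- Conclusion: dropping an external table removes only its metadata; the data and directory in HDFS are kept.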
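
The two conclusions can be summed up in a toy model: a drop always removes the metadata entry, and removes the data directory only when the table is managed. The Python sketch below uses a dictionary in place of the metastore and a temporary local directory in place of HDFS. All names in it are made up for illustration and none of it is Hive's API.

```python
import os
import shutil
import tempfile

metastore = {}  # table name -> {"external": bool, "location": str}; stands in for the metadata


def create_table(name: str, location: str, external: bool = False) -> None:
    """Register the table and make sure its data directory exists."""
    os.makedirs(location, exist_ok=True)
    metastore[name] = {"external": external, "location": location}


def drop_table(name: str) -> None:
    """Remove the metadata; delete the data directory only for a managed table."""
    table = metastore.pop(name)
    if not table["external"]:
        shutil.rmtree(table["location"])


warehouse = tempfile.mkdtemp()  # stands in for the HDFS warehouse directory

managed_dir = os.path.join(warehouse, "student_managed")
create_table("student_managed", managed_dir)
drop_table("student_managed")
print(os.path.exists(managed_dir))   # False: the data went with the table

external_dir = os.path.join(warehouse, "student_external")
create_table("student_external", external_dir, external=True)
drop_table("student_external")
print(os.path.exists(external_dir))  # True: only the metadata is gone
```

This is why re-creating an external table over the same location makes the old data queryable again, while a dropped managed table has to be reloaded.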

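-- 3. When to use managed and external tables
-- Managed tables: better suited to intermediate tables created inside Hive
-- External tables: normally used in Hive to keep the data in HDFS safe

-- 4. Converting between managed (internal) and external tables
-- Convert an external table to a managed table
alter table student set tblproperties('EXTERNAL'='FALSE');
-- Convert a managed table to an external table
alter table student set tblproperties('EXTERNAL'='TRUE');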

2.2 Altering a table
-- 1. Rename a table
alter table test1 rename to test2;

-- 2. Changing column information
1). Change a column (a column's data type can only be widened, not narrowed)
alter table test3 change column id my_id bigint;

2). Add columns
alter table test3 add columns (age int, sal double);

3). Replace columns (a column's data type can only be widened, not narrowed)
alter table test3 replace columns (id bigint, i_name string, age bigint, i_sal double);

2.3 Dropping a table
drop table dept;

VII. DML: data operations
-- Concept: importing and exporting data

1. Importing data

1.1 Importing with load
-- Syntax:
load data [local] inpath 'path to the data' [overwrite]   -- [local] means the data comes from the server's local file system; without local, the source is in HDFS
into table student
[partition (partcol1=val1,…)];   -- optional, for loading into a partitioned table

-- Hands-on example:
-- 1. Create a table
create table stu1(id string, name string)
row format delimited fields terminated by '\t';
-- 2. Load from the local file system (overwrite)
load data local inpath '/opt/module/hive-3.1.2/datas/stu.txt' overwrite
into table stu1;

-- 3. Load from the local file system (append, no overwrite)
load data local inpath '/opt/module/hive-3.1.2/datas/stu1.txt'
into table stu1;

-- 4. Load from HDFS
create table stu2(id string, name string)
row format delimited fields terminated by '\t';

load data inpath '/stu2/student.txt' overwrite
into table stu2;

1.2 Inserting data into a table from a query (insert)
-- Create a table
create table stu3(id int, name string)
row format delimited fields terminated by '\t';

-- 1. Basic insert (from the query result of a single table)
insert overwrite table stu3
select id, name from student;

insert into table stu3
select id, name from student;

-- 2. Specify the data location with location when creating a table
create table stu4(id int, name string)
row format delimited fields terminated by '\t'
location '/stu2';

-- Export a query result to HDFS
insert overwrite directory '/export_data/student2'
row format delimited fields terminated by '\t'
select * from student;

-- export to HDFS & import into a given Hive table
-- Export the data to HDFS
export table student to '/export_data/student3';
-- Import the exported data into Hive
import table bucunzai from '/export_data/student3';

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: https://www.outofmemory.cn/zaji/5718284.html
