Apache Sqoop: Importing MySQL into Hadoop HDFS
Date: 2022-05-03
Chapter 21. Apache Sqoop
Table of Contents
- 21.1. Installing Sqoop
- 21.2. sqoop2-tool
  - 21.2.1. verify
  - 21.2.2. upgrade
- 21.3. sqoop2-shell
  - 21.3.1. show version
  - 21.3.2. set
    - 21.3.2.1. server
    - 21.3.2.2. Enable verbose error output
  - 21.3.3. show connector
  - 21.3.4. link
    - 21.3.4.1. hdfs-connector
    - 21.3.4.2. generic-jdbc-connector
  - 21.3.5. job
    - 21.3.5.1. create job
    - 21.3.5.2. show job
    - 21.3.5.3. start job
    - 21.3.5.4. status job
  - 21.3.6. update
    - 21.3.6.1. link
- 21.4. FAQ
  - 21.4.1. Unable to load native-hadoop library for your platform
21.1. Installing Sqoop
One-step installation with OSCM:
curl -s https://raw.githubusercontent.com/oscm/shell/master/database/apache-sqoop/sqoop-1.99.7-bin-hadoop200.sh | bash
Start the Sqoop server:
/srv/apache-sqoop/bin/sqoop.sh server start
Check that the Sqoop process is running:
[hadoop@netkiller ~]$ jps
2512 SecondaryNameNode
23729 SqoopJettyServer
2290 DataNode
871 ResourceManager
23885 Jps
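This check is easy to script: grep the `jps` output for the SqoopJettyServer process. A minimal sketch, shown here against a captured sample of the output above (in practice, replace the variable with `jps_out="$(jps)"`):

```shell
# Look for the Sqoop Jetty server in jps-style output.
jps_out="2512 SecondaryNameNode
23729 SqoopJettyServer
2290 DataNode"

if printf '%s\n' "$jps_out" | grep -q 'SqoopJettyServer'; then
  status="running"
else
  status="stopped"
fi
echo "Sqoop server: $status"
```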
21.2. sqoop2-tool
21.2.1. verify (configuration verification tool)
[hadoop@iZj6ciilv2rcpgauqg2uuwZ ~]$ sqoop2-tool verify
Setting conf dir: /srv/apache-sqoop/bin/../conf
Sqoop home directory: /srv/apache-sqoop
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
6 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
21.3. sqoop2-shell
Enter the sqoop2-shell:
[hadoop@netkiller ~]$ sqoop2-shell
Setting conf dir: /srv/apache-sqoop/bin/../conf
Sqoop home directory: /srv/apache-sqoop
Sqoop Shell: Type 'help' or 'h' for help.
sqoop:000>
To run commands in batch mode, pass a script file to the client:
sqoop2-shell /path/to/your/script.sqoop
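A batch script is just a text file of shell commands, executed top to bottom. A hypothetical example, assembled from the commands used in the sections below (the file name is illustrative):

```shell
# /path/to/your/script.sqoop -- executed line by line by sqoop2-shell
set server --host master --port 12000 --webapp sqoop
set option --name verbose --value true
show version --all
start job -n from-mysql-to-hdfs
```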
21.3.1. show version
sqoop:000> show version
client version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
sqoop:000> show version --all
client version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
[v1]
21.3.2. set
21.3.2.1. server
sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully
21.3.2.2. Enable verbose error output
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
21.3.3. show connector
sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
Note: the one-line command `sqoop list-databases --connect jdbc:mysql://192.168.1.1:3306/ --username root --password 123456` belongs to the Sqoop 1 client and does not work inside sqoop2-shell.
sqoop:000> show connector --all
21.3.4. link
21.3.4.1. hdfs-connector
sqoop:000> create link -connector hdfs-connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs
HDFS cluster
URI: hdfs://127.0.0.1:9000
Conf directory:
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs
sqoop:000>
sqoop:000> show link
+------+----------------+---------+
| Name | Connector Name | Enabled |
+------+----------------+---------+
| hdfs | hdfs-connector | true |
+------+----------------+---------+
21.3.4.2. generic-jdbc-connector
sqoop:000> create link -connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/test
Username: test
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name mysql
sqoop:000> show link
+-------+------------------------+---------+
| Name | Connector Name | Enabled |
+-------+------------------------+---------+
| mysql | generic-jdbc-connector | true |
| hdfs | hdfs-connector | true |
+-------+------------------------+---------+
21.3.5. job
21.3.5.1. create job
sqoop:000> create job -f "mysql" -t "hdfs"
Creating job for links with from name mysql and to name hdfs
Please fill following values to create new job object
Name: from-mysql-to-hdfs
Database source
Schema name: test
Table name: member
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop/member
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name from-mysql-to-hdfs
21.3.5.2. show job
sqoop:000> show job
+----+--------------------+--------------------------------+-----------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+--------------------+--------------------------------+-----------------------+---------+
| 1 | from-mysql-to-hdfs | mysql (generic-jdbc-connector) | hdfs (hdfs-connector) | true |
+----+--------------------+--------------------------------+-----------------------+---------+
21.3.5.3. start job
sqoop:000> start job -n from-mysql-to-hdfs
Submission details
Job Name: from-mysql-to-hdfs
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-07-22 23:18:02 CST
Lastly updated by: hadoop
External ID: job_1499236611045_0001
http://iZj6ciilv2rcpgauqg2uuwZ:8088/proxy/application_1499236611045_0001/
2017-07-22 23:18:02 CST: BOOTING - Progress is not available
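The External ID above corresponds directly to a YARN application ID, which is useful for retrieving job logs with `yarn logs`. A small sketch of the conversion, using the ID from the output above:

```shell
# Derive the YARN application ID from Sqoop's external job ID.
external_id="job_1499236611045_0001"
app_id="application_${external_id#job_}"   # strip the job_ prefix, add application_
echo "$app_id"
# Then, on the cluster: yarn logs -applicationId "$app_id"
```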
Once the job has started, check the imported data in HDFS:
[hadoop@netkiller ~]$ hdfs dfs -ls /sqoop
[hadoop@netkiller ~]$ hdfs dfs -ls /member
Found 10 items
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/310af608-5533-4bc2-bfb8-eaa45470b04d.txt
-rw-r--r-- 3 hadoop supergroup 48 2017-07-22 23:18 /member/36bc39a5-bc73-4065-a361-ff2d61c4922c.txt
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/3e855400-84a9-422d-b50c-1baa9666a719.txt
-rw-r--r-- 3 hadoop supergroup 140 2017-07-22 23:18 /member/3e8dad92-e0f1-4a74-a337-642cf4e6d634.txt
-rw-r--r-- 3 hadoop supergroup 55 2017-07-22 23:18 /member/4a9f47f1-0413-4149-a93a-ed8b51efbc87.txt
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/4dc5bfe7-1cd9-4d9b-96a8-07e82ed79a71.txt
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/60dbcc60-61f2-4433-af39-1dfdfc048940.txt
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/6d02ed89-94d9-4d4b-87ed-d5da9d2bf9fe.txt
-rw-r--r-- 3 hadoop supergroup 209 2017-07-22 23:18 /member/cf7b7185-3ab6-4077-943a-26228b769c57.txt
-rw-r--r-- 3 hadoop supergroup 0 2017-07-22 23:18 /member/f2e0780d-ad33-4b35-a1c7-b3fbc23e303d.txt
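A quick sanity check that the import actually wrote data is to total the size column of the listing. A sketch over two sample lines from the output above (against a live cluster you would pipe `hdfs dfs -ls /member` into the same awk command):

```shell
# Sum the byte sizes (field 5) of an `hdfs dfs -ls`-style listing.
listing='-rw-r--r-- 3 hadoop supergroup 48 2017-07-22 23:18 /member/36bc39a5-bc73-4065-a361-ff2d61c4922c.txt
-rw-r--r-- 3 hadoop supergroup 140 2017-07-22 23:18 /member/3e8dad92-e0f1-4a74-a337-642cf4e6d634.txt'
total=$(printf '%s\n' "$listing" | awk '{sum += $5} END {print sum}')
echo "total bytes: $total"
```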
21.3.5.4. status job
sqoop:000> status job -n from-mysql-to-hdfs
21.3.6. update
21.3.6.1. link
sqoop:000> update link -n mysql
Updating link with name mysql
Please update link:
Name: mysql
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/test
Username: test
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
link was successfully updated with status OK