Sqoop Data Import
1. Query MySQL table records with sqoop eval
[root@hadoop-master bin]# ./sqoop eval --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --query "SELECT id,name,director from movie LIMIT 20"
...
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
------------------------------------------------------------
| id | name | director |
------------------------------------------------------------
| 4 | 傲慢与偏见 | 乔·赖特 |
| 8 | 怦然心动4 | 罗伯·莱纳 |
| 9 | 这个杀手不太冷 | 吕克·贝松 |
| 10 | 肖申克的救赎 | 弗兰克·德拉邦特 |
| 11 | 疯狂的赛车 | 宁浩 |
| 12 | 无间道 | 刘伟强 |
| 13 | 倩女幽魂 | 程小东 |
| 14 | 功夫 | 周星驰 |
| 15 | 七宗罪 | 大卫·芬奇 |
| 16 | 泰坦尼克号 | 詹姆斯·卡梅隆 |
| 17 | 看不见的客人 | 奥里奥尔·保罗 |
| 18 | 九品芝麻官 | 王晶 |
| 19 | 人在囧途 | 叶伟民 |
| 20 | 雨中曲 | 斯坦利·多南 |
| 21 | 冰雪奇缘 | 妮弗·李 |
| 22 | 夏洛特烦恼 | 闫非 / 彭大魔 |
| 23 | 花木兰 | 林恩·萨瑟兰 |
| 24 | 白蛇:缘起 | 黄家康 |
| 25 | 白蛇2:青蛇劫起 | 黄家康 |
| 26 | 千与千寻 | 宫崎骏 |
------------------------------------------------------------
2. Import data into HDFS with Sqoop
[root@hadoop-master bin]# ./sqoop import --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --table movie --target-dir /sqoop-movie --num-mappers 1
...
23/05/04 09:24:29 INFO mapreduce.Job: map 100% reduce 0%
23/05/04 09:24:29 INFO mapreduce.Job: Job job_1683162046028_0001 completed successfully
23/05/04 09:24:30 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136781
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=23480
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=2823
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2823
Total vcore-milliseconds taken by all map tasks=2823
Total megabyte-milliseconds taken by all map tasks=2890752
Map-Reduce Framework
Map input records=43
Map output records=43
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=53
CPU time spent (ms)=1590
Physical memory (bytes) snapshot=216559616
Virtual memory (bytes) snapshot=2169782272
Total committed heap usage (bytes)=110100480
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=23480
23/05/04 09:24:30 INFO mapreduce.ImportJobBase: Transferred 22.9297 KB in 23.2531 seconds (1,009.7558 bytes/sec)
23/05/04 09:24:30 INFO mapreduce.ImportJobBase: Retrieved 43 records.
# Verify that the movie table has been imported into HDFS
[root@hadoop-master bin]# hdfs dfs -ls /
Found 7 items
drwxr-xr-x - root supergroup 0 2023-04-24 14:50 /bak
drwxr-xr-x - root supergroup 0 2023-04-23 15:27 /coprocessor
drwxr-xr-x - root supergroup 0 2023-05-04 09:01 /hbase
drwxr-xr-x - root supergroup 0 2023-05-04 09:24 /sqoop-movie
drwxr-xr-x - root supergroup 0 2023-04-19 15:58 /test
drwxr-xr-x - root supergroup 0 2023-04-24 10:28 /tmp
drwxr-xr-x - root supergroup 0 2023-04-23 17:01 /user
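By default, a Sqoop HDFS import writes the rows as comma-delimited text files (`part-m-00000`, `part-m-00001`, …) under the target directory, which you could inspect with `hdfs dfs -cat /sqoop-movie/part-m-00000`. As a rough sketch of that file format, the hypothetical Python helper below parses such lines back into records (the sample line and column order are illustrative, taken from the eval output above):

```python
import csv
import io

def parse_part_file_lines(lines, columns):
    """Parse Sqoop's default comma-delimited text output into dicts,
    pairing each value with its column name positionally."""
    reader = csv.reader(io.StringIO("\n".join(lines)))
    return [dict(zip(columns, row)) for row in reader]

# Sample line in the shape Sqoop emits by default (values are illustrative).
sample = ["4,傲慢与偏见,乔·赖特"]
records = parse_part_file_lines(sample, ["id", "name", "director"])
print(records[0]["name"])  # 傲慢与偏见
```

Note that everything comes back as a string; the text file carries no type information, which is one reason formats like Avro or Parquet (`--as-avrodatafile`, `--as-parquetfile`) are often preferred for downstream processing.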
3. Import data into HBase with Sqoop
[root@hadoop-master bin]# ./sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --query "SELECT id,name,director,create_time from movie where 1=1 AND \$CONDITIONS LIMIT 20" --hbase-table movie --hbase-create-table --hbase-row-key id --split-by create_time --column-family cf
...
23/05/04 09:48:12 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1002426
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=925
HDFS: Number of bytes written=0
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Killed map tasks=1
Launched map tasks=7
Other local map tasks=7
Total time spent by all maps in occupied slots (ms)=60677
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=60677
Total vcore-milliseconds taken by all map tasks=60677
Total megabyte-milliseconds taken by all map tasks=62133248
Map-Reduce Framework
Map input records=20
Map output records=20
Input split bytes=925
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1979
CPU time spent (ms)=10090
Physical memory (bytes) snapshot=1201238016
Virtual memory (bytes) snapshot=12901285888
Total committed heap usage (bytes)=682098688
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
23/05/04 09:48:12 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 33.6968 seconds (0 bytes/sec)
23/05/04 09:48:12 INFO mapreduce.ImportJobBase: Retrieved 20 records.
# Verify that the data has been imported into HBase
[root@hadoop-master bin]# hbase shell
hbase(main):001:0> list
TABLE
CoprocessorTest
CoprocessorTestBak
FileTable
STUDENTS
SYSTEM:CATALOG
SYSTEM:FUNCTION
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
TEST
movie
person
12 row(s) in 0.2210 seconds
=> ["CoprocessorTest", "CoprocessorTestBak", "FileTable", "STUDENTS", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "TEST", "movie", "person"]
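With `--hbase-row-key id` and `--column-family cf`, each MySQL row becomes one HBase row: the `id` value is used as the row key and every remaining column becomes a qualifier under the `cf` family. A minimal Python sketch of that mapping (the function name and sample row are hypothetical; Sqoop's actual implementation is in Java):

```python
def row_to_hbase_cells(row, row_key_col="id", column_family="cf"):
    """Map a relational row (dict) to (row_key, {family:qualifier: value}),
    mirroring how a Sqoop HBase import lays out cells: the row-key column
    becomes the HBase row key, every other column becomes one cell."""
    row_key = str(row[row_key_col])
    cells = {
        f"{column_family}:{col}": str(val)
        for col, val in row.items()
        if col != row_key_col
    }
    return row_key, cells

key, cells = row_to_hbase_cells(
    {"id": 4, "name": "傲慢与偏见", "director": "乔·赖特", "create_time": "2023-01-01"}
)
print(key, cells["cf:name"])  # 4 傲慢与偏见
```

This also explains why the HDFS byte counters above are all zero for this job: the mappers write directly into HBase via Put operations rather than to HDFS output files.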
Note:
In MySQL, the movie table's id column is an INT primary key with auto-increment, but the command above splits on create_time via --split-by. Without the extra property, the import fails with:
Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
The cause: by default Sqoop only generates splits for numeric (and date) columns, and create_time is treated as a textual column here. To allow splitting on a text column, you must pass "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" as a parameter, placed immediately after `import` and before the other options, as shown in the command above.
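To see why split columns matter at all: Sqoop takes MIN and MAX of the --split-by column and divides that range evenly among the mappers, which is straightforward for integers but ambiguous for text (hence the explicit opt-in flag). A rough Python sketch of the numeric case, assuming integer boundaries (the function name is hypothetical; Sqoop's real splitter lives in its Java codebase):

```python
def integer_splits(lo, hi, num_splits):
    """Divide the inclusive range [lo, hi] into up to num_splits contiguous
    sub-ranges, roughly what Sqoop does with MIN/MAX of a numeric
    --split-by column: each sub-range becomes one mapper's WHERE clause."""
    if num_splits < 1:
        raise ValueError("need at least one split")
    step = max((hi - lo) // num_splits, 1)
    splits, cur = [], lo
    while cur <= hi:
        upper = min(cur + step, hi)
        splits.append((cur, upper))
        if upper == hi:
            break
        cur = upper + 1
    return splits

# e.g. ids 4..26 (as in the eval output) across 4 mappers
print(integer_splits(4, 26, 4))  # → [(4, 9), (10, 15), (16, 21), (22, 26)]
```

Each tuple would translate into a per-mapper predicate like `WHERE id >= 4 AND id <= 9`, substituted in place of the `$CONDITIONS` token in the --query string. For text columns there is no such natural ordering arithmetic, so the splits can be badly skewed; this is why choosing a numeric split column is generally preferable to enabling allow_text_splitter.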