Sqoop Data Import
1. Query MySQL table records with sqoop eval
[root@hadoop-master bin]# ./sqoop eval --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --query "SELECT id,name,director from movie LIMIT 20"
...
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
------------------------------------------------------------
| id | name | director |
------------------------------------------------------------
| 4 | 傲慢与偏见 | 乔·赖特 |
| 8 | 怦然心动4 | 罗伯·莱纳 |
| 9 | 这个杀手不太冷 | 吕克·贝松 |
| 10 | 肖申克的救赎 | 弗兰克·德拉邦特 |
| 11 | 疯狂的赛车 | 宁浩 |
| 12 | 无间道 | 刘伟强 |
| 13 | 倩女幽魂 | 程小东 |
| 14 | 功夫 | 周星驰 |
| 15 | 七宗罪 | 大卫·芬奇 |
| 16 | 泰坦尼克号 | 詹姆斯·卡梅隆 |
| 17 | 看不见的客人 | 奥里奥尔·保罗 |
| 18 | 九品芝麻官 | 王晶 |
| 19 | 人在囧途 | 叶伟民 |
| 20 | 雨中曲 | 斯坦利·多南 |
| 21 | 冰雪奇缘 | 妮弗·李 |
| 22 | 夏洛特烦恼 | 闫非 / 彭大魔 |
| 23 | 花木兰 | 林恩·萨瑟兰 |
| 24 | 白蛇:缘起 | 黄家康 |
| 25 | 白蛇2:青蛇劫起 | 黄家康 |
| 26 | 千与千寻 | 宫崎骏 |
------------------------------------------------------------
2. Import data into HDFS with Sqoop
[root@hadoop-master bin]# ./sqoop import --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --table movie --target-dir /sqoop-movie --num-mappers 1
...
23/05/04 09:24:29 INFO mapreduce.Job: map 100% reduce 0%
23/05/04 09:24:29 INFO mapreduce.Job: Job job_1683162046028_0001 completed successfully
23/05/04 09:24:30 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136781
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=23480
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=2823
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2823
Total vcore-milliseconds taken by all map tasks=2823
Total megabyte-milliseconds taken by all map tasks=2890752
Map-Reduce Framework
Map input records=43
Map output records=43
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=53
CPU time spent (ms)=1590
Physical memory (bytes) snapshot=216559616
Virtual memory (bytes) snapshot=2169782272
Total committed heap usage (bytes)=110100480
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=23480
23/05/04 09:24:30 INFO mapreduce.ImportJobBase: Transferred 22.9297 KB in 23.2531 seconds (1,009.7558 bytes/sec)
23/05/04 09:24:30 INFO mapreduce.ImportJobBase: Retrieved 43 records.
# Verify that the movie table has been imported into HDFS
[root@hadoop-master bin]# hdfs dfs -ls /
Found 7 items
drwxr-xr-x - root supergroup 0 2023-04-24 14:50 /bak
drwxr-xr-x - root supergroup 0 2023-04-23 15:27 /coprocessor
drwxr-xr-x - root supergroup 0 2023-05-04 09:01 /hbase
drwxr-xr-x - root supergroup 0 2023-05-04 09:24 /sqoop-movie
drwxr-xr-x - root supergroup 0 2023-04-19 15:58 /test
drwxr-xr-x - root supergroup 0 2023-04-24 10:28 /tmp
drwxr-xr-x - root supergroup 0 2023-04-23 17:01 /user
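By default, a Sqoop HDFS import writes the rows as comma-delimited text files (`part-m-00000`, `part-m-00001`, …) under the target directory, which you could inspect with `hdfs dfs -cat /sqoop-movie/part-m-00000`. As a rough sketch of that file format, the hypothetical Python helper below parses such lines back into records (the sample line and column order are illustrative, taken from the eval output above):

```python
import csv
import io

def parse_part_file_lines(lines, columns):
    """Parse Sqoop's default comma-delimited text output into dicts,
    pairing each value with its column name positionally."""
    reader = csv.reader(io.StringIO("\n".join(lines)))
    return [dict(zip(columns, row)) for row in reader]

# Sample line in the shape Sqoop emits by default (values are illustrative).
sample = ["4,傲慢与偏见,乔·赖特"]
records = parse_part_file_lines(sample, ["id", "name", "director"])
print(records[0]["name"])  # 傲慢与偏见
```

Note that everything comes back as a string; the text file carries no type information, which is one reason formats like Avro or Parquet (`--as-avrodatafile`, `--as-parquetfile`) are often preferred for downstream processing.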
3. Import data into HBase with Sqoop
[root@hadoop-master bin]# ./sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:mysql://192.168.2.177:3306/freemovie --username root -P --query "SELECT id,name,director,create_time from movie where 1=1 AND \$CONDITIONS LIMIT 20" --hbase-table movie --hbase-create-table --hbase-row-key id --split-by create_time --column-family cf
...
23/05/04 09:48:12 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1002426
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=925
HDFS: Number of bytes written=0
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Killed map tasks=1
Launched map tasks=7
Other local map tasks=7
Total time spent by all maps in occupied slots (ms)=60677
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=60677
Total vcore-milliseconds taken by all map tasks=60677
Total megabyte-milliseconds taken by all map tasks=62133248
Map-Reduce Framework
Map input records=20
Map output records=20
Input split bytes=925
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1979
CPU time spent (ms)=10090
Physical memory (bytes) snapshot=1201238016
Virtual memory (bytes) snapshot=12901285888
Total committed heap usage (bytes)=682098688
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
23/05/04 09:48:12 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 33.6968 seconds (0 bytes/sec)
23/05/04 09:48:12 INFO mapreduce.ImportJobBase: Retrieved 20 records.
# Verify that the data has been imported into HBase
[root@hadoop-master bin]# hbase shell
hbase(main):001:0> list
TABLE
CoprocessorTest
CoprocessorTestBak
FileTable
STUDENTS
SYSTEM:CATALOG
SYSTEM:FUNCTION
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
TEST
movie
person
12 row(s) in 0.2210 seconds
=> ["CoprocessorTest", "CoprocessorTestBak", "FileTable", "STUDENTS", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "TEST", "movie", "person"]
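With `--hbase-row-key id` and `--column-family cf`, each MySQL row becomes one HBase row: the `id` value is used as the row key and every remaining column becomes a qualifier under the `cf` family. A minimal Python sketch of that mapping (the function name and sample row are hypothetical; Sqoop's actual implementation is in Java):

```python
def row_to_hbase_cells(row, row_key_col="id", column_family="cf"):
    """Map a relational row (dict) to (row_key, {family:qualifier: value}),
    mirroring how a Sqoop HBase import lays out cells: the row-key column
    becomes the HBase row key, every other column becomes one cell."""
    row_key = str(row[row_key_col])
    cells = {
        f"{column_family}:{col}": str(val)
        for col, val in row.items()
        if col != row_key_col
    }
    return row_key, cells

key, cells = row_to_hbase_cells(
    {"id": 4, "name": "傲慢与偏见", "director": "乔·赖特", "create_time": "2023-01-01"}
)
print(key, cells["cf:name"])  # 4 傲慢与偏见
```

This also explains why the HDFS byte counters above are all zero for this job: the mappers write directly into HBase via Put operations rather than to HDFS output files.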
Note:
In MySQL, the movie table's id column is an INT primary key with auto-increment, but the command above splits on create_time via --split-by. Without the extra property, the import fails with:
Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
The cause: by default Sqoop only generates splits for numeric (and date) columns, and create_time is treated as a textual column here. To allow splitting on a text column, you must pass "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" as a parameter, placed immediately after `import` and before the other options, as shown in the command above.
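To see why split columns matter at all: Sqoop takes MIN and MAX of the --split-by column and divides that range evenly among the mappers, which is straightforward for integers but ambiguous for text (hence the explicit opt-in flag). A rough Python sketch of the numeric case, assuming integer boundaries (the function name is hypothetical; Sqoop's real splitter lives in its Java codebase):

```python
def integer_splits(lo, hi, num_splits):
    """Divide the inclusive range [lo, hi] into up to num_splits contiguous
    sub-ranges, roughly what Sqoop does with MIN/MAX of a numeric
    --split-by column: each sub-range becomes one mapper's WHERE clause."""
    if num_splits < 1:
        raise ValueError("need at least one split")
    step = max((hi - lo) // num_splits, 1)
    splits, cur = [], lo
    while cur <= hi:
        upper = min(cur + step, hi)
        splits.append((cur, upper))
        if upper == hi:
            break
        cur = upper + 1
    return splits

# e.g. ids 4..26 (as in the eval output) across 4 mappers
print(integer_splits(4, 26, 4))  # → [(4, 9), (10, 15), (16, 21), (22, 26)]
```

Each tuple would translate into a per-mapper predicate like `WHERE id >= 4 AND id <= 9`, substituted in place of the `$CONDITIONS` token in the --query string. For text columns there is no such natural ordering arithmetic, so the splits can be badly skewed; this is why choosing a numeric split column is generally preferable to enabling allow_text_splitter.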