I. importtsv
Load data from HDFS into an HBase table.
1. Prepare the data
## student.tsv (tab-separated)
[root@hadoop-senior datas]# cat student.tsv
10001	zhangsan	35	male	beijing	0109876543
10002	lisi	32	male	shanghia	0109876563
10003	zhaoliu	35	female	hangzhou	01098346543
10004	qianqi	35	male	shenzhen	01098732543

## Upload the file to HDFS
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -mkdir -p /user/root/hbase/importtsv
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -put /opt/datas/student.tsv /user/root/hbase/importtsv

## Create the HBase table
hbase(main):005:0> create 'student', 'info'
0 row(s) in 0.1530 seconds
=> Hbase::Table - student
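ImportTsv splits each input line on the tab character by default, so the separators in student.tsv matter more than the visual alignment. A minimal sketch that recreates the same file with explicit `\t` separators (paths are illustrative; the file is written to the current directory):

```shell
# Sketch: recreate student.tsv with guaranteed tab separators.
out=student.tsv
printf '%s\t%s\t%s\t%s\t%s\t%s\n' 10001 zhangsan 35 male   beijing  0109876543  > "$out"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' 10002 lisi     32 male   shanghia 0109876563 >> "$out"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' 10003 zhaoliu  35 female hangzhou 01098346543 >> "$out"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' 10004 qianqi   35 male   shenzhen 01098732543 >> "$out"
# Sanity check: every row must have exactly 6 tab-separated fields,
# matching the 6 columns mapped in -Dimporttsv.columns later.
awk -F'\t' 'NF != 6 { bad = 1 } END { exit bad }' "$out" && echo "OK"
```

A row with the wrong number of fields would otherwise surface only as a "bad line" counter in the MapReduce job, so checking locally first is cheap insurance.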
2. Run the import
## Run all of the following commands together
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar \
${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,\
info:name,info:age,info:sex,info:address,info:phone \
student \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv

## Check the result
hbase(main):006:0> scan 'student'
ROW                COLUMN+CELL
 10001             column=info:address, timestamp=1558594471571, value=beijing
 10001             column=info:age, timestamp=1558594471571, value=35
 10001             column=info:name, timestamp=1558594471571, value=zhangsan
 10001             column=info:phone, timestamp=1558594471571, value=0109876543
 10001             column=info:sex, timestamp=1558594471571, value=male
 10002             column=info:address, timestamp=1558594471571, value=shanghia
 10002             column=info:age, timestamp=1558594471571, value=32
 10002             column=info:name, timestamp=1558594471571, value=lisi
 10002             column=info:phone, timestamp=1558594471571, value=0109876563
 10002             column=info:sex, timestamp=1558594471571, value=male
 10003             column=info:address, timestamp=1558594471571, value=hangzhou
 10003             column=info:age, timestamp=1558594471571, value=35
 10003             column=info:name, timestamp=1558594471571, value=zhaoliu
 10003             column=info:phone, timestamp=1558594471571, value=01098346543
 10003             column=info:sex, timestamp=1558594471571, value=female
 10004             column=info:address, timestamp=1558594471571, value=shenzhen
 10004             column=info:age, timestamp=1558594471571, value=35
 10004             column=info:name, timestamp=1558594471571, value=qianqi
 10004             column=info:phone, timestamp=1558594471571, value=01098732543
 10004             column=info:sex, timestamp=1558594471571, value=male
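ImportTsv assumes tab-separated input; for other single-character delimiters this HBase line also accepts a `-Dimporttsv.separator` option, or the file can simply be converted before uploading. A hedged sketch converting a hypothetical comma-separated student.csv locally (file names are illustrative):

```shell
# Sketch: hypothetical CSV version of the sample data.
printf '10001,zhangsan,35,male,beijing,0109876543\n'  > student.csv
printf '10002,lisi,32,male,shanghia,0109876563\n'    >> student.csv
# Simple fields only: tr is safe here because no field contains a
# comma; real CSV with quoting would need a proper parser.
tr ',' '\t' < student.csv > student_from_csv.tsv
cut -f2 student_from_csv.tsv   # prints zhangsan, then lisi
```

Converting up front keeps the importtsv invocation identical to the one above, with only the HDFS upload path changing.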
II. Bulk load
1. What bulk load is
HBase supports bulk load as an ingestion path. It exploits the fact that HBase stores table data on HDFS in a specific file format: the data is written directly as persisted HFiles on HDFS, and the files are then moved into place, which makes it a fast way to load very large volumes of data. The HFiles are produced by a MapReduce job, so the process is efficient and convenient; it does not consume region resources or add write load, so for large data volumes it greatly improves write throughput and reduces the write pressure on the HBase nodes. Compared with writing through HTableOutputFormat directly, generating HFiles first and then bulk loading them into HBase has two advantages: (1) it removes the insert pressure on the HBase cluster; (2) the job runs faster, shortening its execution time.
2. Generate the HFiles
## Create the table
hbase(main):007:0> create 'student2', 'info'
0 row(s) in 0.1320 seconds
=> Hbase::Table - student2

## Generate the HFiles
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar \
${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,\
info:name,info:age,info:sex,info:address,info:phone \
-Dimporttsv.bulk.output=hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
student2 \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv

## Inspect the output
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -ls /user/root/hbase/hfileoutput/info
Found 1 items
-rw-r--r--   1 root supergroup   1888 2019-05-24 13:31 /user/root/hbase/hfileoutput/info/8c28c6c654bc4fe2aa2c32ef54480771
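With `-Dimporttsv.bulk.output`, importtsv writes one subdirectory per column family under the output path, which is why the HFile in the listing sits under `.../hfileoutput/info`. A small sketch (pure path manipulation, using the path from the listing) that recovers the family name from an HFile path:

```shell
# Sketch: the HFile's parent directory name is the column family.
# Path taken from the hdfs dfs -ls output above.
hfile=/user/root/hbase/hfileoutput/info/8c28c6c654bc4fe2aa2c32ef54480771
family=$(basename "$(dirname "$hfile")")
echo "$family"   # → info
```

On a live cluster the HFile contents can typically be dumped for inspection with `${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f <hdfs-path>` (not run here).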
3. Load the data into table student2
## Load the HFiles
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar \
${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar \
completebulkload \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
student2

## Scan student2
hbase(main):008:0> scan 'student2'
ROW                COLUMN+CELL
 10001             column=info:address, timestamp=1558675878109, value=beijing
 10001             column=info:age, timestamp=1558675878109, value=35
 10001             column=info:name, timestamp=1558675878109, value=zhangsan
 10001             column=info:phone, timestamp=1558675878109, value=0109876543
 10001             column=info:sex, timestamp=1558675878109, value=male
 10002             column=info:address, timestamp=1558675878109, value=shanghia
 10002             column=info:age, timestamp=1558675878109, value=32
 10002             column=info:name, timestamp=1558675878109, value=lisi
 10002             column=info:phone, timestamp=1558675878109, value=0109876563
 10002             column=info:sex, timestamp=1558675878109, value=male
 10003             column=info:address, timestamp=1558675878109, value=hangzhou
 10003             column=info:age, timestamp=1558675878109, value=35
 10003             column=info:name, timestamp=1558675878109, value=zhaoliu
 10003             column=info:phone, timestamp=1558675878109, value=01098346543
 10003             column=info:sex, timestamp=1558675878109, value=female
 10004             column=info:address, timestamp=1558675878109, value=shenzhen
 10004             column=info:age, timestamp=1558675878109, value=35
 10004             column=info:name, timestamp=1558675878109, value=qianqi
 10004             column=info:phone, timestamp=1558675878109, value=01098732543
 10004             column=info:sex, timestamp=1558675878109, value=male
4 row(s) in 0.0420 seconds
4. Generating HFiles in MapReduce