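HBase Basics

1. Fundamentals

HBase is a distributed, column-oriented open-source database.

It is modeled on Google's Bigtable.

It stores unstructured and semi-structured data.

Data is stored on HDFS, which provides replication.

It scales linearly.

Master/slave cluster: HMaster and RegionServer.

HBase architecture diagram:

2. Component Functions

HBase data model: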

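The master maintains table schemas, and the RegionServers store the data. Clients locate RegionServers through ZooKeeper (ZK) and connect to them directly, so data can still be queried when the master is down, but tables cannot be created.

Each RegionServer hosts multiple regions.

Each table has one or more regions, and different regions of the same table may be hosted on different RegionServers. This is one of the reasons a table with many column families is discouraged.

Within each region, every column family of a table has its own store: one MemStore plus a set of HFiles. A write goes through WAL -> MemStore -> HFile. A read usually looks in Block Cache -> MemStore -> HFile, in that order, for efficiency.

When a MemStore reaches a certain size it is flushed to an HFile, and as the number of HFiles grows, compaction runs:

Minor Compaction: merges smaller HFiles into larger ones (small files -> large files). This reduces the number of HFiles and improves HBase read performance.

Major Compaction: merges all HFiles of one column family into a single large HFile and removes cells that have been deleted or have expired. This greatly improves HBase efficiency. However, compaction involves heavy disk I/O and network traffic (some HFile data may sit on other nodes, and this operation pulls all of it to the local node), which can make the region slow or temporarily unresponsive.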
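The flush and compaction steps described above can be sketched with a small in-memory toy model. This is not HBase code: the class `ToyStore`, the `flush_size` threshold (counted in cells) and the rule "minor compaction merges the oldest files" are all assumptions made for illustration; real HBase selects files through a compaction policy.

```python
# Toy model of one store (one column family in one region). Not HBase code.
TOMBSTONE = object()  # marker written by a delete


class ToyStore:
    def __init__(self, flush_size=2):
        self.flush_size = flush_size  # hypothetical flush threshold, in cells
        self.memstore = {}            # newest data, held in memory
        self.hfiles = []              # immutable "files", oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def delete(self, key):
        # A delete only writes a marker; the old value stays on disk for now.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore = {}

    def get(self, key):
        # Newest source wins: MemStore first, then files from newest to oldest.
        for source in [self.memstore] + self.hfiles[::-1]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def minor_compact(self, count=2):
        # Merge the `count` oldest files into one; delete markers are kept.
        if len(self.hfiles) < 2:
            return
        merged = {}
        for hfile in self.hfiles[:count]:
            merged.update(hfile)
        self.hfiles = [merged] + self.hfiles[count:]

    def major_compact(self):
        # Merge every file into one and drop deleted cells for good.
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.hfiles = [live] if live else []
```

In this model a deleted key disappears from reads immediately, but its marker is only removed from the files by `major_compact`, which mirrors why major compaction is the step that reclaims space.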

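NameSpace: each namespace can contain multiple tables; it is comparable to a database in Oracle or a schema in MySQL.

Table:

Rowkey:

family_columns: every column in an HBase table belongs to a column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family; for example, courses:history and courses:math both belong to the courses family. Access control and disk and memory usage accounting are done at the column-family level. The more column families there are, the more files take part in I/O and seeking when a row is read, so do not define more column families than necessary.

(Each column family is stored in separate files; when creating a table, the fewer column families the better.)

TimeStamp

A cell has no data type; its content is stored as raw bytes. A cell can have multiple versions: the rowkey and column uniquely identify a cell, and its versions are indexed by timestamp.

To avoid the management burden (storage and indexing) caused by too many versions, HBase provides two ways to reclaim old versions:

Keep the last n versions of the data.

Keep the versions from a recent period of time (set a time to live, TTL, on the data).

Both can be configured per column family.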
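As a rough illustration of the two reclaim rules, the sketch below applies a version limit and a TTL to the versions of a single cell. The function name `retain_versions` and its parameters are hypothetical, and the exact boundary handling (a version expires once its age reaches the TTL) is an assumption, not taken from HBase.

```python
def retain_versions(versions, max_versions=None, ttl_ms=None, now_ms=0):
    """Return the versions of one cell that survive cleanup, newest first.

    versions: list of (timestamp_ms, value) pairs for a single cell.
    max_versions: keep at most this many newest versions (None = no limit).
    ttl_ms: drop versions whose age is ttl_ms or more (None = never expire).
    """
    kept = sorted(versions, key=lambda v: v[0], reverse=True)  # newest first
    if ttl_ms is not None:
        kept = [v for v in kept if now_ms - v[0] < ttl_ms]
    if max_versions is not None:
        kept = kept[:max_versions]
    return kept


cell = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]
newest_two = retain_versions(cell, max_versions=2)
recent_only = retain_versions(cell, ttl_ms=200, now_ms=450)
```

Both rules can be combined in one call, in which case the TTL is applied first and the version limit second.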

3. Basic HBase Shell Operations

A normal HBase shell startup looks like this:

[root@tnode1 ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.4.2.0-258, rUnknown, Mon Apr 25 06:36:21 UTC 2016

hbase(main):001:0>

Namespace operations

Similar to a schema in MySQL.

Create a namespace

hbase(main):004:0> create_namespace 'testNS'
0 row(s) in 0.0980 seconds

Drop a namespace

hbase(main):012:0> drop_namespace 'testNS'
0 row(s) in 0.0270 seconds

The namespace must be empty before it is dropped; otherwise an error is raised:

ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace testNS has 1 tables
	at org.apache.hadoop.hbase.master.TableNamespaceManager.remove(TableNamespaceManager.java:198)
	at org.apache.hadoop.hbase.master.HMaster.deleteNamespace(HMaster.java:2507)
	at org.apache.hadoop.hbase.master.MasterRpcServices.deleteNamespace(MasterRpcServices.java:481)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55453)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:748)

Here is some help for this command:
Drop the named namespace. The namespace must be empty.

Table operations

Create a table

A simple create, and a create that also sets column-family attributes:

hbase(main):005:0> create 'userInfo','baseInfo'
0 row(s) in 1.4830 seconds

=> Hbase::Table - userInfo

hbase(main):030:0> create 'test:userInfo', { NAME=>'baseInfo',VERSIONS=>6 }, { NAME=>'extrInfo',CONFIGURATION=> {'hbase.hstore.blockingStoreFiles'=>'15'} }
0 row(s) in 1.2300 seconds

=> Hbase::Table - test:userInfo

List tables

Two forms: list all tables, or list the tables that match a regular expression.

hbase(main):021:0> list
TABLE
test:customerTable
1 row(s) in 0.0040 seconds

=> ["test:customerTable"]

hbase(main):025:0> list '.*Table'
TABLE
test:customerTable
1 row(s) in 0.0030 seconds

=> ["test:customerTable"]

Describe a table

desc is the same as describe.

hbase(main):024:0> desc 'test:customerTable'
Table test:customerTable is ENABLED
test:customerTable
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'other', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0180 seconds

Drop a table

Two forms: drop a single table, or drop a batch of tables by regular expression.

# Drop a single table
hbase(main):009:0> drop 'testNS:userInfo'

ERROR: Table testNS:userInfo is enabled. Disable it first.

Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'

hbase(main):010:0> disable 'testNS:userInfo'
0 row(s) in 2.2520 seconds

hbase(main):011:0> drop 'testNS:userInfo'
0 row(s) in 1.2390 seconds

# Drop a batch
hbase(main):020:0> drop_all '.*Info'
custInfo
userInfo                                                                             

Drop the above 2 tables (y/n)?
y
2 tables successfully dropped

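A table must be disabled before it can be dropped.

Enable and disable tables

Six commands: enable and disable a single table, enable and disable in batch, and check whether a table is enabled or disabled.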

hbase(main):032:0> disable 'test:userInfo'
0 row(s) in 2.2620 seconds

hbase(main):033:0> enable 'test:userInfo'
0 row(s) in 1.2350 seconds

hbase(main):034:0> disable_all 'test:.*s.*'
test:customerTable
test:userInfo

Disable the above 2 tables (y/n)?
y
2 tables successfully disabled

hbase(main):035:0> is_disabled 'test:userInfo'
true                                                                                           
0 row(s) in 0.0050 seconds

hbase(main):036:0> is_enabled 'test:userInfo'
false
0 row(s) in 0.0050 seconds

hbase(main):037:0> enable_all 'test:.*s.*'
test:customerTable
test:userInfo               

Enable the above 2 tables (y/n)?
y
2 tables successfully enabled

hbase(main):038:0> is_disabled 'test:userInfo'
false

0 row(s) in 0.0110 seconds

hbase(main):039:0> is_enabled 'test:userInfo'
true

0 row(s) in 0.0090 seconds

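Alter a table

If the hbase.online.schema.update.enable parameter is false, the table must be disabled before it is altered. If it is true, the table can be altered directly.

Add or modify a column family (in HBase, adding and modifying are the same operation). Passing a new column-family name creates the family; passing an existing name modifies its attributes. Column-family attributes include: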

BLOOMFILTER
REPLICATION_SCOPE
MIN_VERSIONS
COMPRESSION
TTL
BLOCKSIZE
IN_MEMORY
IN_MEMORY_COMPACTION
BLOCKCACHE
KEEP_DELETED_CELLS
DATA_BLOCK_ENCODING
CACHE_DATA_ON_WRITE
CACHE_DATA_IN_L1
CACHE_INDEX_ON_WRITE
CACHE_BLOOMS_ON_WRITE
EVICT_BLOCKS_ON_CLOSE
PREFETCH_BLOCKS_ON_OPEN
ENCRYPTION
ENCRYPTION_KEY
IS_MOB
MOB_THRESHOLD

# Add a column family directly
hbase(main):013:0> alter 'test:userInfo',{NAME=>'cccfff'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9110 seconds

hbase(main):015:0> alter 'test:userInfo',NAME=>'cccfff'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9050 seconds

# Add two column families
hbase(main):020:0> alter 'test:userInfo','cccfff',{NAME=>'cf123',VERSIONS=>4}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 3.7370 seconds

# Delete a column family
hbase(main):005:0> alter 'test:userInfo','delete'=>'cccfff'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9010 seconds

hbase(main):009:0> alter 'test:userInfo',METHOD=>'delete',NAME=>'cf111'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.8940 seconds

# Add one column family and delete another in the same command
# Using METHOD
hbase(main):022:0> alter 'test:userInfo','cccfff',{NAME=>'cf123',METHOD=>'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 3.7420 seconds
# Using 'delete' directly to name the column family to remove
hbase(main):025:0> alter 'test:userInfo','cccfff','delete'=>'cf123'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 3.7350 seconds

Modify table-level attributes. Table-level attributes include:

MAX_FILESIZE
READONLY
MEMSTORE_FLUSHSIZE
DEFERRED_LOG_FLUSH
DURABILITY
REGION_REPLICATION
NORMALIZATION_ENABLED
PRIORITY
IS_ROOT
IS_META

hbase(main):027:0> alter 'test:userInfo',MAX_FILESIZE => '123217728'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9200 seconds

Remove a table-level attribute.

hbase(main):001:0> alter 'test:userInfo',METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.1860 seconds

Set table configuration, that is, override hbase-site.xml settings for a single table or column family without affecting other tables.

hbase(main):011:0> alter 'test:userInfo',CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10' }
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.8880 seconds

hbase(main):014:0> alter 'test:userInfo',{ NAME => 'extrInfo', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10' }}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.8850 seconds

Queries

scan

Scan all data

> scan 'test:userInfo'

Specify columns

> scan 'test:userInfo',{COLUMNS=>'cf1:name'}

Limit the number of rows returned

> scan 'test:userInfo',{ LIMIT=>10 }
> scan 'test:userInfo',{ COLUMNS=>'cf1:name', LIMIT=>10}

Specify a rowkey range. STARTROW is inclusive; ENDROW is exclusive.

> scan 'test:userInfo',{STARTROW => '00001', ENDROW => '00002'}
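Because rowkeys are compared byte by byte, the half-open range is easy to get wrong when keys have different lengths. The sketch below is a plain-Python model of the range rule, not an HBase call; `scan_range` and the sample keys are made up for illustration. It shows, for instance, that a start key of '0001' sorts after '00002', so that range is empty.

```python
def scan_range(rowkeys, startrow, endrow):
    """Return the rowkeys in [startrow, endrow), compared as raw bytes."""
    return [k for k in sorted(rowkeys) if startrow <= k < endrow]


rows = [b"00001", b"00002", b"00003"]

only_first = scan_range(rows, b"00001", b"00002")  # ENDROW itself is excluded
empty = scan_range(rows, b"0001", b"00002")        # b"0001" > b"00002" at the 4th byte
```

Padding rowkeys to a fixed width avoids this kind of surprise.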

Specify a timestamp range. The range is half-open: the lower bound is inclusive and the upper bound is exclusive. Historical versions can be returned.

> scan 'test:userInfo',{TIMERANGE=>[1569306706585, 1569316490296]}

Query multiple versions

> scan 'test:userInfo',{VERSIONS=>3}

Show raw records, that is, all data including cells already marked as deleted. It must be used together with VERSIONS and cannot be combined with COLUMNS.

> scan 'test:userInfo',{VERSIONS=>3,RAW=>true}

Specify filters. Multiple filters are joined with AND and OR.

> scan 'test:userInfo', {FILTER => "FamilyFilter(=,'substring:f')"}
> scan 'test:userInfo', {FILTER => "FamilyFilter(!=,'substring:info') AND ValueFilter(>=,'binary:X')"}

get

Get all data for a rowkey

> get 'test:userInfo','00002'

Get some of the columns for a rowkey; multiple columns can be grouped with [].

> get 'test:userInfo','00002', {COLUMN=> 'cf1'}
> get 'test:userInfo','00002', {COLUMN=> ['extrInfo','baseInfo','cf1'] }

Get data for a rowkey within a timestamp range.

> get 'test:userInfo','00002', {TIMERANGE=>[1569317161496,1569317463581]}

Get data for a rowkey with a given number of versions. VERSIONS cannot be used on its own; it must be combined with COLUMN or TIMERANGE.

hbase(main):075:0> get 'test:userInfo','00002', { VERSIONS => 3}
COLUMN                                CELL

ERROR: Failed parse of {"VERSIONS"=>3}, Hash

> get 'test:userInfo','00002', {TIMERANGE=>[1569317161495,1569317463582],VERSIONS=>3}
> get 'test:userInfo','00002', {COLUMN=>'cf1', VERSIONS => 3}

Get data for a rowkey and apply a filter.

> get 'test:userInfo','00002', {FILTER=>"ValueFilter(=,'substring:XG')"}

> get 'test:userInfo','00002', {TIMERANGE=>[1569317161495,1569317463582],VERSIONS=>3,FILTER=>"ValueFilter(=,'substring:X')"}

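count

Count the rows of a table. Three usages:

1. Print the result directly

> count 'test:userInfo'

2. Specify an interval; the running count is printed at each interval

> count 'test:userInfo', INTERVAL => 1

3. Specify the scanner cache size

> count 'test:userInfo', INTERVAL => 1, CACHE => 2

Deletes

delete

Delete the data of a column in a given row, or one specific version of a column.

delete 'test:userInfo','00002','cf1:name'
delete 'test:userInfo','00002','cf1:name', 1569399092124

deleteall

An enhanced delete: in addition to the above, it can delete an entire row.

deleteall 'test:userInfo','00006'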

hbase(main):004:0> scan 'test:userInfo',{VERSIONS=>5,RAW=>true}
ROW                                                                  COLUMN+CELL
 00001                                                               column=cf1:name, timestamp=1569306706585, value=Wade111
 00002                                                               column=cf1:name, timestamp=1569400533054, type=DeleteColumn
 00002                                                               column=cf1:name, timestamp=1569399423111, value=lxmkkk
 00003                                                               column=cf1:name, timestamp=1569396888055, value=lalalla
 00004                                                               column=cf1:name, timestamp=1569396894851, value=lalalla2222
 00005                                                               column=cf1:name, timestamp=1569396907028, value=lalalla222333333
 00006                                                               column=MAX_FILESIZE:, timestamp=1569400663358, type=DeleteFamily
 00006                                                               column=baseInfo:, timestamp=1569400663358, type=DeleteFamily
 00006                                                               column=cf1:, timestamp=1569400663358, type=DeleteFamily
 00006                                                               column=cf1:name, timestamp=1569396916208, value=lalalla2223333335555555
 00006                                                               column=extrInfo:, timestamp=1569400663358, type=DeleteFamily
 00006                                                               column=table_att_unset:, timestamp=1569400663358, type=DeleteFamily
6 row(s) in 0.0180 seconds

Filters

Filter basics

Comparison operators

API	shell
LESS	<
LESS_OR_EQUAL	<=
EQUAL	=
NOT_EQUAL	<> or !=
GREATER_OR_EQUAL	>=
GREATER	>
NO_OP

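Comparators

API	Description	shell
BinaryComparator	Compares against the given byte array in lexicographic byte order	binary
BinaryPrefixComparator	Same as BinaryComparator, but compares only the leading bytes (the prefix)	binaryprefix
NullComparator	Checks whether the value is null	null
BitComparator	Bitwise comparison	bit
RegexStringComparator	Regular-expression comparator; supports only EQUAL and NOT_EQUAL	regexstring
SubstringComparator	Checks whether the given substring occurs in the value; supports only EQUAL and NOT_EQUAL	substring

The show_filters command lists the available filters.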

hbase(main):052:0> show_filters
DependentColumnFilter
KeyOnlyFilter
ColumnCountGetFilter
SingleColumnValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
FirstKeyOnlyFilter
ColumnRangeFilter
TimestampsFilter
FamilyFilter
QualifierFilter
ColumnPrefixFilter
RowFilter
MultipleColumnPrefixFilter
InclusiveStopFilter
PageFilter
ValueFilter
ColumnPaginationFilter

RowFilter

Filters data by rowkey. If the start and end rowkeys are known, prefer scan with STARTROW and ENDROW, which is much faster, because RowFilter walks through every rowkey.

scan 'test:userInfo',{FILTER=>"RowFilter(=,'substring:0000')"}
scan 'test:userInfo',{FILTER=>"RowFilter(<=,'binary:00009')"}
scan 'test:userInfo', FILTER=>"RowFilter(=,'substring:0000')"
scan 'test:userInfo', FILTER=>"RowFilter(<=,'binary:00009')"

FamilyFilter

Filters data by column family.

scan 'test:userInfo',FILTER=>"FamilyFilter(=,'substring:info')"
scan 'test:userInfo',{FILTER=>"FamilyFilter(=,'substring:info')"}

QualifierFilter

Filters data by column qualifier.

scan 'test:userInfo',{FILTER=>"QualifierFilter(=,'substring:name')"}
scan 'test:userInfo', FILTER=>"QualifierFilter(=,'substring:name')"

ValueFilter

Filters data by value.

scan 'test:userInfo', FILTER=>"ValueFilter(=,'substring:ll')"
scan 'test:userInfo',{FILTER=>"ValueFilter(=,'substring:ll')"}

DependentColumnFilter

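Returns the cells that have the same timestamp as a reference column. The first two arguments give the column family and qualifier of the reference column. The third argument is a boolean that controls whether the reference column itself is returned: true drops it, false returns it, and the two-argument form defaults to false. The fourth and fifth arguments are a comparison operator and a comparator, used to filter on the value.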

scan 'test:userInfo',{FILTER=>"DependentColumnFilter('cf1','name')"}
scan 'test:userInfo',{FILTER=>"DependentColumnFilter('cf1','name',true)"}
scan 'test:userInfo',{FILTER=>"DependentColumnFilter('cf1','name',true,=,'substring:d')"}

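SingleColumnValueFilter and SingleColumnValueExcludeFilter

These find and return rows based on a condition on one column. They take four or six arguments: column family, qualifier, comparison operator, comparator, whether to skip rows that lack the column, and whether to check only the latest version. The last two default to false and true respectively, that is, rows without the column are not skipped and historical versions are not checked.

If a row does not have the family:qualifier, all of its data is returned; if skipping is enabled, the row is not returned.
If a row has the family:qualifier but the value does not satisfy the condition, none of the row's data is returned.
If a row has the family:qualifier and the value satisfies the condition, the former filter returns all of the row's data and the latter returns everything except that column.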

hbase(main):011:0> scan 'test:userInfo'
ROW                                                                  COLUMN+CELL
 00001                                                               column=baseInfo:name, timestamp=1569403548008, value=Michil
 00001                                                               column=cf1:name, timestamp=1569306706585, value=Wade111
 00002                                                               column=baseInfo:name, timestamp=1569396888055, value=dddd
 00003                                                               column=cf1:name, timestamp=1569396888055, value=lalalla
 00004                                                               column=cf1:name, timestamp=1569396894851, value=lalalla2222
 00005                                                               column=cf1:name, timestamp=1569396907028, value=lalalla222333333
 00009                                                               column=cf1:age, timestamp=1569396888055, value=dddd
 00009                                                               column=cf1:name, timestamp=1569396888055, value=dddd
 00010                                                               column=cf1:age, timestamp=1569396888055, value=1111
 00011                                                               column=baseInfo:age, timestamp=1569396888055, value=1111
 00012                                                               column=baseInfo:name, timestamp=1569461934129, value=ddd
 00012                                                               column=cf1:name, timestamp=1569462029439, value=ddd
9 row(s) in 0.0270 seconds

hbase(main):012:0>  scan 'test:userInfo',{FILTER=>"SingleColumnValueFilter('cf1','name',=,'substring:d')"}
ROW                                                                  COLUMN+CELL
 00001                                                               column=baseInfo:name, timestamp=1569403548008, value=Michil
 00001                                                               column=cf1:name, timestamp=1569306706585, value=Wade111
 00002                                                               column=baseInfo:name, timestamp=1569396888055, value=dddd
 00009                                                               column=cf1:age, timestamp=1569396888055, value=dddd
 00009                                                               column=cf1:name, timestamp=1569396888055, value=dddd
 00010                                                               column=cf1:age, timestamp=1569396888055, value=1111
 00011                                                               column=baseInfo:age, timestamp=1569396888055, value=1111
 00012                                                               column=baseInfo:name, timestamp=1569461934129, value=ddd
 00012                                                               column=cf1:name, timestamp=1569462029439, value=ddd
6 row(s) in 0.0300 seconds

hbase(main):013:0>  scan 'test:userInfo',{FILTER=>"SingleColumnValueFilter('cf1','name',=,'substring:d',true,false)"}
ROW                                                                  COLUMN+CELL
 00001                                                               column=baseInfo:name, timestamp=1569403548008, value=Michil
 00001                                                               column=cf1:name, timestamp=1569306706585, value=Wade111
 00009                                                               column=cf1:age, timestamp=1569396888055, value=dddd
 00009                                                               column=cf1:name, timestamp=1569396888055, value=dddd
 00012                                                               column=baseInfo:name, timestamp=1569461934129, value=ddd
 00012                                                               column=cf1:name, timestamp=1569462029439, value=ddd
3 row(s) in 0.0100 seconds

hbase(main):014:0>  scan 'test:userInfo',{FILTER=>"SingleColumnValueExcludeFilter('cf1','name',=,'substring:d')"}
ROW                                                                  COLUMN+CELL
 00001                                                               column=baseInfo:name, timestamp=1569403548008, value=Michil
 00002                                                               column=baseInfo:name, timestamp=1569396888055, value=dddd
 00009                                                               column=cf1:age, timestamp=1569396888055, value=dddd
 00010                                                               column=cf1:age, timestamp=1569396888055, value=1111
 00011                                                               column=baseInfo:age, timestamp=1569396888055, value=1111
 00012                                                               column=baseInfo:name, timestamp=1569461934129, value=ddd
6 row(s) in 0.0080 seconds
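The three rules can be checked against the scan output above with a small pure-Python model. This is only a sketch of the described behavior, not the HBase implementation: the function `single_column_value` and its parameters are hypothetical, only the substring comparator with `=` is modeled, and the match is done case-insensitively on the assumption that SubstringComparator ignores case.

```python
# Rows from the scan output above, as {rowkey: {"family:qualifier": value}}.
ROWS = {
    "00001": {"baseInfo:name": "Michil", "cf1:name": "Wade111"},
    "00002": {"baseInfo:name": "dddd"},
    "00003": {"cf1:name": "lalalla"},
    "00004": {"cf1:name": "lalalla2222"},
    "00005": {"cf1:name": "lalalla222333333"},
    "00009": {"cf1:age": "dddd", "cf1:name": "dddd"},
    "00010": {"cf1:age": "1111"},
    "00011": {"baseInfo:age": "1111"},
    "00012": {"baseInfo:name": "ddd", "cf1:name": "ddd"},
}


def single_column_value(rows, column, substring,
                        filter_if_missing=False, exclude=False):
    """Model of SingleColumnValue(Exclude)Filter with (=, 'substring:...')."""
    result = {}
    for rowkey, cells in rows.items():
        if column not in cells:
            # Rule 1: rows without the column pass unless skipping is enabled.
            if not filter_if_missing:
                result[rowkey] = dict(cells)
        elif substring.lower() in cells[column].lower():
            # Rule 3: matching rows pass; the Exclude variant drops the column.
            kept = {c: v for c, v in cells.items()
                    if not (exclude and c == column)}
            if kept:
                result[rowkey] = kept
        # Rule 2: the column exists but does not match, so the row is dropped.
    return result


default_rows = sorted(single_column_value(ROWS, "cf1:name", "d"))
strict_rows = sorted(single_column_value(ROWS, "cf1:name", "d", filter_if_missing=True))
excluded = single_column_value(ROWS, "cf1:name", "d", exclude=True)
```

The row counts produced by the model (6, 3 and 6) line up with the three scans shown above.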

PrefixFilter

Matches rows by rowkey prefix.

scan 'test:userInfo',{FILTER=>"PrefixFilter('0001')"} # all rows whose rowkey starts with 0001
scan 'test:userInfo',FILTER=>"PrefixFilter('0000')" # all rows whose rowkey starts with 0000

PageFilter

Returns N rows. This is how it is used in the shell; I am not sure whether paging can be implemented with it through the API.

scan 'test:userInfo',FILTER=>"PageFilter(8)" # return 8 rows

KeyOnlyFilter

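Takes one argument, lenAsVal, which defaults to false. When false, the value is returned empty; when true, the value returned is the length of the original value.

scan 'test:userInfo',FILTER=>"KeyOnlyFilter(true)"

FirstKeyOnlyFilter

Returns only the first KV of each row, that is, the first rowkey:family:qualifier:value entry of the row. This can improve performance for count and sum scenarios.

scan 'test:userInfo',FILTER=>"FirstKeyOnlyFilter()"

InclusiveStopFilter

Sets a stop row and returns the data up to and including that row. It can be combined with STARTROW, which makes it the closed-interval counterpart of ENDROW.

scan 'test:userInfo',{FILTER=>"InclusiveStopFilter('00009')"} # from the first row through row 00009
scan 'test:userInfo',{STARTROW=>'00002',FILTER=>"InclusiveStopFilter('00009')"} # rows 00002 through 00009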

TimestampsFilter

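Filters by timestamp: only cells whose timestamp is one of the listed values are returned.

scan 'test:userInfo',{FILTER=>"TimestampsFilter(1569403548008,1569396894851)"}

ColumnCountGetFilter

Limits the maximum number of columns returned per row. It is better suited to get than to scan, although it works with both.

With get, it caps the number of columns returned for the row. Its behavior with scan is confusing, and I have not worked it out.

get 'test:userInfo' , '00003' ,{FILTER=>"ColumnCountGetFilter(1)"}

ColumnPaginationFilter

Limits the number of columns returned. The first argument is the column limit; the second is the offset, that is, the column to start from (0-based).

scan 'test:userInfo',{FILTER=>"ColumnPaginationFilter(1,1)"} # start at the second column and return one column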
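The limit and offset arguments behave like a slice over the sorted columns of each row. The following is a minimal pure-Python model under that assumption; `column_pagination` is a made-up helper, not an HBase call, and a row with too few columns simply contributes nothing.

```python
def column_pagination(cells, limit, offset):
    """Return `limit` columns of one row, starting at the 0-based `offset`.

    cells: {"family:qualifier": value} for a single row.
    Columns are ordered by (family, qualifier), as in scan output.
    """
    ordered = sorted(cells.items(), key=lambda kv: tuple(kv[0].split(":", 1)))
    return dict(ordered[offset:offset + limit])


row_00009 = {"cf1:name": "dddd", "cf1:age": "dddd"}
second_column = column_pagination(row_00009, limit=1, offset=1)
```

For the sample row, the sorted columns are cf1:age then cf1:name, so an offset of 1 selects cf1:name.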

ColumnPrefixFilter

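Column prefix filter. Returns all data whose column qualifier starts with the given prefix.

scan 'test:userInfo',{FILTER=>"ColumnPrefixFilter('ag')"}

MultipleColumnPrefixFilter

An enhanced ColumnPrefixFilter that accepts one or more prefixes.

scan 'test:userInfo',{FILTER=>"MultipleColumnPrefixFilter('ag','na')"}

ColumnRangeFilter

Column range filter. Returns all column data within the given range. It takes four arguments: minimum column, whether to include the minimum column, maximum column, and whether to include the maximum column. Columns are ordered lexicographically.

scan 'test:userInfo',{FILTER=>"ColumnRangeFilter('ag',true,'name',true)"}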

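RandomRowFilter

Not supported in the shell.

If the argument is less than 0, no data is returned. If it is greater than 1, all data is returned. If it is between 0 and 1, rows are returned at random.

SkipFilter

Not supported in the shell.

If any column in a row fails the condition, the entire row is filtered out.

WhileMatchFilter

Works like a while loop: data is returned until a row fails to match, at which point the scan breaks out and returns.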
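The difference between the last two wrappers can be sketched in plain Python. Both functions below are illustrative stand-ins with made-up names and sample data, not HBase APIs: one drops any row that contains a failing cell, the other stops at the first row that fails.

```python
from itertools import takewhile


def skip_filter(rows, cell_ok):
    """Drop a whole row if any of its cells fails cell_ok (SkipFilter idea)."""
    return [(key, cells) for key, cells in rows
            if all(cell_ok(value) for value in cells.values())]


def while_match_filter(rows, row_ok):
    """Return rows until the first one that fails row_ok, then stop."""
    return list(takewhile(lambda item: row_ok(item[0]), rows))


# Rows in rowkey order, as (rowkey, {column: value}) pairs.
rows = [
    ("00001", {"a": "1", "b": "0"}),
    ("00002", {"a": "2"}),
    ("00003", {"a": "0"}),
    ("00004", {"a": "5"}),
]

no_zero_rows = [key for key, _ in skip_filter(rows, lambda v: v != "0")]
until_00003 = [key for key, _ in while_match_filter(rows, lambda k: k != "00003")]
```

Note that the second function never reaches row 00004 even though it would pass the test, which is the "break" behavior described above.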