Escanear con filtro usando HBase shell


¿Alguien sabe cómo escanear registros basados en algún filtro de escaneo, por ejemplo:

column:something = "somevalue"

Algo como esto , pero de HBase shell?

 43
Author: Community, 2011-08-31

5 answers

Prueba esto. Es un poco feo, pero funciona para mí.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new
        (Bytes.toBytes('family'),
         Bytes.toBytes('qualifier'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('somevalue'))
}

El shell de HBase incluirá lo que tengas en ~/.irbrc, así que puedes poner algo como esto (no soy un experto en Ruby, las mejoras son bienvenidas):

# imports like above
def scan_substr(table,family,qualifier,substr,*cols)
    scan table, { COLUMNS => cols, FILTER =>
        SingleColumnValueFilter.new
            (Bytes.toBytes(family), Bytes.toBytes(qualifier),
             CompareFilter::CompareOp.valueOf('EQUAL'),
             SubstringComparator.new(substr)) }
end

Y luego puedes decir en el shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
 47
Author: havanki4j,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2011-10-25 21:19:11
scan 'test', {COLUMNS => ['F'],FILTER => \ 
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

Se puede encontrar más información aquí. Tenga en cuenta que varios ejemplos residen en el archivo Filter Language.docx adjunto.

 29
Author: dape,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2015-09-28 13:04:17

Utilice el parámetro de FILTRO de scan, como se muestra en la ayuda de uso:

hbase(main):002:0> scan

ERROR: wrong number of arguments (0 for 1)

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
 8
Author: Tony,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2011-12-22 23:09:33
Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

//in case you have multiple SingleColumnValueFilters, 
you would want the row to pass MUST_PASS_ALL conditions
or MUST_PASS_ONE condition.

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter( 
                   Bytes.toBytes("SOME COLUMN FAMILY" ),
                   Bytes.toBytes("SOME COLUMN NAME"),
                   CompareOp.EQUAL,
                   Bytes.toBytes("SOME VALUE"));

filter_by_name.setFilterIfMissing(true);  
//if you don't want the rows that have the column missing.
Remember that adding the column filter doesn't mean that the 
rows that don't have the column will not be put into the 
result set. They will be, if you don't include this statement. 

list.addFilter(filter_by_name);


scan.setFilter(list);
 6
Author: KannarKK,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2014-02-18 07:03:53

Uno de los filtros es Valuefilter que se puede utilizar para filtrar todos los valores de columna.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

Binary es uno de los comparadores utilizados dentro del filtro. Puede utilizar diferentes comparadores dentro del filtro en función de lo que desee hacer.

Puede referirse a la siguiente url: http: / / www.hadooptpoint.com/filters-in-hbase-shell/. Proporciona buenos ejemplos sobre cómo usar diferentes filtros en HBase Shell.

 4
Author: Chetan Somani,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/ajaxhispano.com/template/agent.layouts/content.php on line 61
2016-02-22 17:04:32