- HBase is a distributed column-oriented database built on top of HDFS.
- It is one of the most popular NoSQL databases.
- It is the Hadoop application to use when you require real-time, random read/write access to very large datasets.
- It is built from the ground up to scale linearly just by adding nodes.
- It has its own data model operations, such as Get, Put, Scan and Delete, and it does not offer SQL-like capabilities.
- The architecture is based on three key components: the HBase Master server, the HBase RegionServers and ZooKeeper.
- The client needs to find the RegionServers in order to work with the data stored in HBase.
- Regions are the basic elements for distributing tables across the cluster.
- To find the RegionServers, the client first has to talk to ZooKeeper.
- The HBase data model is a sorted multidimensional map.
- The key elements in the HBase data model are tables, column families, columns and rowkeys.
- The tables are made of columns and rows.
- The individual elements at the column and row intersections (cells, in HBase terms) are versioned by timestamp.
- The rows are identified by rowkeys, which are sorted; these rowkeys can be considered primary keys, and all the data in the table can be accessed via them.
- The columns are grouped into column families; at table creation time you do not have to specify all the columns, only the column families.
- A column name consists of a prefix derived from the column family and the column's own qualifier, for example 'contents:html'.
- Phoenix is an open source SQL skin for HBase.
- Phoenix provides a command-line tool called sqlline; sqlline itself is a Java utility, and Phoenix launches it through the Python wrapper script sqlline.py.
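The data model bullets above (sorted rowkeys, column families, 'family:qualifier' column names, timestamped cell versions) can be sketched as a toy in-memory structure. This is a minimal illustration in plain Python, not the HBase client API; the class name TinyTable, its method names and the max_versions default are assumptions made for the example, and rowkeys are plain strings here where HBase uses byte arrays.

```python
import bisect
import time


class TinyTable:
    """Toy model of an HBase table: a sorted map of
    rowkey -> "family:qualifier" -> [(timestamp, value), ...]."""

    def __init__(self, column_families, max_versions=3):
        # Only column families are declared up front, not individual columns.
        self.column_families = set(column_families)
        self.max_versions = max_versions
        self._rowkeys = []  # rowkeys kept in sorted order
        self._rows = {}     # rowkey -> {column: [(ts, value), ...]}

    def put(self, rowkey, column, value, ts=None):
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if rowkey not in self._rows:
            bisect.insort(self._rowkeys, rowkey)
            self._rows[rowkey] = {}
        versions = self._rows[rowkey].setdefault(column, [])
        versions.append((ts if ts is not None else time.time_ns(), value))
        versions.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, rowkey, column, versions=1):
        # Returns up to `versions` cells, newest first.
        return self._rows.get(rowkey, {}).get(column, [])[:versions]

    def scan(self, start=None, stop=None):
        # Yields rows in rowkey order for the half-open range [start, stop).
        lo = 0 if start is None else bisect.bisect_left(self._rowkeys, start)
        hi = (len(self._rowkeys) if stop is None
              else bisect.bisect_left(self._rowkeys, stop))
        for rowkey in self._rowkeys[lo:hi]:
            latest = {col: cells[0][1]
                      for col, cells in self._rows[rowkey].items()}
            yield rowkey, latest

    def delete(self, rowkey):
        if rowkey in self._rows:
            del self._rows[rowkey]
            self._rowkeys.remove(rowkey)


if __name__ == "__main__":
    table = TinyTable(["contents", "anchor"])
    table.put("com.example/b", "anchor:text", "home", ts=1)
    table.put("com.example/a", "contents:html", "<html>v1</html>", ts=1)
    table.put("com.example/a", "contents:html", "<html>v2</html>", ts=2)
    print(table.get("com.example/a", "contents:html", versions=2))
    for rowkey, columns in table.scan():
        print(rowkey, columns)
```

Keeping the rowkeys sorted is what makes range scans cheap in this sketch: a scan is a slice of the sorted key list rather than a full pass over the table.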
Access Phoenix via sqlline
In a Hortonworks sandbox, navigate to /usr/hdp/current/phoenix-client/bin
..$ ./sqlline.py <host>:2181:/hbase-unsecure
Map Phoenix table to an existing HBase table
- All Phoenix tables can be viewed via HBase shell.
- To access tables in Phoenix that were created using the plain HBase shell, create a Phoenix table or a Phoenix view mapped onto the existing HBase table.
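As a rough sketch of the mapping step, the helper below builds a Phoenix CREATE VIEW statement for an existing HBase table. The helper name phoenix_view_ddl, the table 't1', the column family 'f1' and the qualifier 'val' are all hypothetical names chosen for illustration. Identifiers are double-quoted because Phoenix upper-cases unquoted names, while names created in the HBase shell are often lower case; the declared column types are assumed to match how the bytes were actually written.

```python
def phoenix_view_ddl(table, columns, pk_name="pk", pk_type="VARCHAR"):
    """Build a CREATE VIEW statement mapping an existing HBase table.

    `columns` is a list of (family, qualifier, sql_type) tuples.
    The rowkey is exposed as the primary key column `pk_name`.
    """
    parts = [f"{pk_name} {pk_type} PRIMARY KEY"]
    for family, qualifier, sql_type in columns:
        # Quote family and qualifier so their case is preserved.
        parts.append(f'"{family}"."{qualifier}" {sql_type}')
    return f'CREATE VIEW "{table}" ({", ".join(parts)})'


if __name__ == "__main__":
    # Hypothetical HBase table 't1' with column family 'f1' and qualifier 'val'.
    print(phoenix_view_ddl("t1", [("f1", "val", "VARCHAR")]))
```

The printed statement could then be pasted into a sqlline session; the same column list with CREATE TABLE in place of CREATE VIEW is the other option mentioned above.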