hadoop - Hbase Sorting efficiency -


I have an employee named "Simon" in row 100, and in row 4000 I have a second employee with the same name, "Simon". Now I want to get all employees named "Simon" from my employees table. The row key is the SSN of each employee.

My question is: if I look up all the employees named "Simon", how does HBase make the search efficient? The first "Simon" is in row 100 and the second "Simon" is in row 4000, so to get employees by name HBase would have to go through the whole table. Does it do a full table scan in this scenario?

If you have to scan the full table (which you do), it will not perform well. In fact, if you have a very large number of rows, this will be a terrible solution.
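To make the cost concrete, here is a minimal Python sketch that models the table as a plain dictionary keyed by SSN. It is not the HBase client API, and the SSNs and names are invented for illustration. Because the name is not part of the row key, every row has to be examined.

```python
# Toy model of an HBase table: row key (SSN) -> {column: value}.
# This is a plain dict used for illustration, not the HBase client API.
# All SSNs and names below are hypothetical.
employees = {
    "111-11-1111": {"name": "Simon"},
    "222-22-2222": {"name": "Alice"},
    "333-33-3333": {"name": "Simon"},
}


def find_by_name_full_scan(table, name):
    """Return the matching row keys and the number of rows examined."""
    examined = 0
    matches = []
    for row_key in sorted(table):  # rows are visited in row-key order
        examined += 1
        if table[row_key]["name"] == name:
            matches.append(row_key)
    return matches, examined


matches, examined = find_by_name_full_scan(employees, "Simon")
```

The number of rows examined equals the size of the table, no matter how few rows match.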

To solve this problem, what most relational database management systems (or "SQL databases") do is use an index. Since you are using a "NoSQL database", it will not automatically create an index for you.

Let's see how to create an index manually so that specific types of queries can be answered efficiently. Suppose you have a set of entities S, where each entity e in S has a unique key K(e) and an attribute value V(e). In addition, assume that your entities are stored in an HBase table with one row per entity e, using K(e) as the row key.

An index for S with respect to V is another table, which usually comes in one of three forms.

Index Form 1

Suppose that V(e) is also unique to each entity e. Then an index for S with respect to V is a table with one entity per row, where the row key is V(e) and the row stores K(e). To look up the one entity e with a given V(e), just go to that row.

Use this approach if your attribute value V(e) is unique. Think of a table of employees where each employee has a unique employee ID inside the company, K(e). The main employee table can use EmployeeID as its unique row key, and an Employee_SSN_Index table can use the employee's SSN, V(e) (which is also unique), as its row key. This provides a fast lookup of employees by SSN.
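As a rough illustration of Form 1, the following Python sketch models both tables as plain dictionaries. It does not use the HBase client API; the function names, employee IDs, names, and SSNs are hypothetical.

```python
# Main table: row key K(e) = employee ID. All values are made up.
employee_table = {
    "E001": {"name": "Simon", "ssn": "111-11-1111"},
    "E002": {"name": "Alice", "ssn": "222-22-2222"},
    "E003": {"name": "Simon", "ssn": "333-33-3333"},
}


def build_unique_index(table, attribute):
    """Build an index table whose row key is V(e) and whose value is K(e)."""
    index = {}
    for entity_key, row in table.items():
        value = row[attribute]
        if value in index:
            # Form 1 only works when V(e) is unique per entity.
            raise ValueError(f"duplicate attribute value: {value!r}")
        index[value] = entity_key
    return index


def lookup(table, index, value):
    """Two point reads: index row V(e) -> K(e), then main row K(e)."""
    entity_key = index.get(value)
    return None if entity_key is None else table[entity_key]


employee_ssn_index = build_unique_index(employee_table, "ssn")
found = lookup(employee_table, employee_ssn_index, "333-33-3333")
```

Each lookup costs two direct reads by row key instead of a scan, at the price of keeping the index table in sync with the main table on every write.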

Index Form 2

Suppose that V(e) is potentially not unique to each entity e; that is, there can be duplicates. Then an index for S with respect to V is a table with one entity per row, where the row key is V(e) ++ K(e).

To look up all the entities e with a given V(e), just do a prefix scan over the rows whose keys start with V(e).

When V(e) is not fixed-width, it may be impossible to tell where V(e) ends and K(e) starts. A separator can be placed between V(e) and K(e) in the row key, for example V(e) ++ "|" ++ K(e). In this case, scan with the prefix V(e) ++ "|".

For example, an employee Department_index table can use the DepartmentID of an employee as the attribute value V(e).
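A minimal Python sketch of Form 2 follows. The index table is modeled as a sorted list of row keys, which mimics the way HBase keeps rows ordered by key; it is not the HBase client API. The department and employee IDs are hypothetical, and the sketch assumes the separator "|" never occurs inside V(e).

```python
import bisect

SEPARATOR = "|"  # assumed never to appear inside an attribute value

# Main table: row key K(e) = employee ID. All values are made up.
employee_table = {
    "E001": {"department_id": "D1"},
    "E002": {"department_id": "D10"},
    "E003": {"department_id": "D1"},
}


def build_index(table, attribute):
    """One index row per entity, keyed V(e) ++ "|" ++ K(e), kept sorted."""
    return sorted(
        f"{row[attribute]}{SEPARATOR}{entity_key}"
        for entity_key, row in table.items()
    )


def prefix_scan(sorted_keys, prefix):
    """Seek to the first key >= prefix, then read while the prefix matches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    rows = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        rows.append(key)
    return rows


def find_by_department(index_rows, department_id):
    """Return K(e) for every entity whose V(e) equals department_id."""
    prefix = department_id + SEPARATOR
    return [key[len(prefix):] for key in prefix_scan(index_rows, prefix)]


department_index = build_index(employee_table, "department_id")
in_d1 = find_by_department(department_index, "D1")
```

Without the separator, a scan for the prefix "D1" would also pick up the rows of department "D10"; including "|" in the prefix avoids that.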

Index Form 3

Suppose that V(e) is potentially not unique to each entity e; that is, there can be duplicates. Then an index for S with respect to V is a table with a group of entities per row, where the row key is V(e) and each K(e) is stored as a column qualifier in a column family F. That is, the entities are grouped by attribute value in the rows.

To look up all the entities e with a given V(e), go to row V(e) and request all the columns in column family F.

This approach should really only be used when the number of entities in each row of the index is small.
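Form 3 can be sketched in Python with nested dictionaries standing in for row, column family, and column qualifier. This is a model only, not the HBase client API; the family name "F" follows the description above, and the IDs and the empty cell value are assumptions made for illustration.

```python
# Main table: row key K(e) = employee ID. All values are made up.
employee_table = {
    "E001": {"department_id": "D1"},
    "E002": {"department_id": "D10"},
    "E003": {"department_id": "D1"},
}


def build_grouped_index(table, attribute, family="F"):
    """Index layout: row key V(e) -> column family -> qualifier K(e)."""
    index = {}
    for entity_key, row in table.items():
        columns = index.setdefault(row[attribute], {}).setdefault(family, {})
        columns[entity_key] = b""  # the qualifier carries K(e); value unused
    return index


def lookup_group(index, value, family="F"):
    """One row read: return every qualifier K(e) stored in row V(e)."""
    return sorted(index.get(value, {}).get(family, {}))


grouped_index = build_grouped_index(employee_table, "department_id")
in_d1 = lookup_group(grouped_index, "D1")
```

All matches come back from a single row, which is why the row grows with the number of entities that share the same attribute value.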
