Pro Apache Phoenix - An SQL Driver for HBase
by: Shakil Akhtar, Ravi Magham
Apress, 2016
ISBN: 9781484223703, 148 pages
Format: PDF, read online
Copy protection: Watermark
Price: 28.88 EUR
Contents at a Glance
Contents
About the Authors
About the Technical Reviewers
Chapter 1: Introduction
1.1 Big Data Lake and Its Representation
1.2 Modern Applications and Big Data
1.2.1 Fraud Detection in Banking
1.2.2 Log Data Analysis
1.2.3 Recommendation Engines
1.2.3.1 Social Media Analysis
1.3 Analyzing Big Data
1.4 An Overview of Hadoop and MapReduce
1.5 Hadoop Ecosystem
1.5.1 HDFS
1.5.2 MapReduce
1.5.3 HBase
1.5.4 Hive
1.5.5 YARN
1.5.6 Spark
1.5.7 PIG
1.5.8 ZooKeeper
1.6 Phoenix in the Hadoop Ecosystem
1.7 Phoenix’s Place in Big Data Systems
1.8 Importance of Traditional SQL-Based Tools and the Role of Phoenix
1.8.1 Traditional DBA Problems for Big Data Systems
1.8.2 Which Tool Should I Use for Big Data?
1.8.3 Massive Data Storage and Challenges
1.8.4 A Traditional Data Warehouse and Querying
1.9 Apache Phoenix in Big Data Analytics
1.10 Summary
Chapter 2: Using Phoenix
2.1 What is Apache Phoenix?
2.2 Architecture
2.2.1 Installing Apache Phoenix
2.2.2 Installing Java
2.2.2.1 Installing Java on Linux
2.2.2.2 Installing Java on Mac OS X
2.3 Installing HBase
2.4 Installing Apache Phoenix
2.5 Installing Phoenix on Hortonworks HDP
2.5.1 Downloading Hortonworks Sandbox
2.5.2 Start HBase
2.5.3 Testing Your Phoenix Installation
2.6 Installing Phoenix on Cloudera Hadoop
2.7 Capabilities
2.8 Hadoop Ecosystem and the Role of Phoenix
2.9 Brief Description of Phoenix’s Key Features
2.9.1 Transactions
2.9.2 User-Defined Functions
2.9.3 Secondary Indexes
2.9.4 Skip Scan
2.9.5 Views
2.9.6 Multi-Tenancy
2.9.7 Query Server
2.10 Summary
Chapter 3: CRUD with Phoenix
3.1 Data Types in Phoenix
3.1.1 Primitive Data Types
3.1.2 Complex Data Types
3.2 Data Model
3.2.1 Steps in data modeling
3.3 Phoenix Write Path
3.4 Phoenix Read Path
3.5 Basic Commands
3.5.1 HELP
3.5.2 CREATE
3.5.3 UPSERT
3.5.4 SELECT
3.5.5 ALTER
3.5.6 DELETE
3.5.7 DESCRIBE
3.5.8 LIST
3.6 Working with Phoenix API
3.6.1 Environment setup
3.7 Summary
Chapter 4: Querying Data
4.1 Constraints
4.1.1 NOT NULL
4.2 Creating Tables
4.3 Salted Tables
4.4 Dropping Tables
4.5 ALTER Tables
4.5.1 Adding Columns
4.5.2 Deleting or Replacing Columns
4.5.3 Renaming a Column
4.6 Clauses
4.6.1 LIMIT
4.6.2 WHERE
4.6.3 GROUP BY
4.6.4 HAVING
4.6.5 ORDER BY
4.7 Logical Operators
4.7.1 AND
4.7.2 OR
4.7.3 IN
4.7.4 LIKE
4.7.5 BETWEEN
4.8 Summary
Chapter 5: Advanced Querying
5.1 Joins
5.2 Inner Join
5.3 Outer Join
5.3.1 Left Outer Join
5.3.2 Right Outer Join
5.3.3 Full Outer Join
5.4 Grouped Joins
5.5 Hash Join
5.6 Sort Merge Join
5.7 Join Query Optimizations
5.7.1 Optimizing Through Configuration Properties
5.7.2 Optimizing Query
5.8 Subqueries
5.8.1 IN and NOT IN in Subqueries
5.8.2 EXISTS and NOT EXISTS Clauses
5.8.3 ANY, SOME, and ALL Operators with Subqueries
5.8.4 UPSERT Using Subqueries
5.9 Views
5.9.1 Creating Views
5.9.2 Dropping Views
5.10 Paged Queries
5.10.1 LIMIT and OFFSET
5.10.2 Row Value Constructor
5.11 Summary
Chapter 6: Transactions
6.1 SQL Transactions
6.2 Transaction Properties
6.2.1 Atomicity
6.2.2 Consistency
6.2.3 Isolation
6.2.4 Durability
6.3 Transaction Control
6.3.1 COMMIT
6.3.2 ROLLBACK
6.3.3 SAVEPOINT
6.3.4 SET TRANSACTION
6.4 Transactions in HBase
6.4.1 Integrating HBase with Transaction Manager
6.4.2 Components of Transaction Manager
6.4.2.1 TransactionAware Client
6.4.2.2 Transaction Manager
6.4.2.3 Transaction Processor Coprocessor
6.4.3 Transaction Lifecycle
6.4.4 Concurrency Control
6.4.5 Multiversion Concurrency Control
6.4.6 Optimistic Concurrency Control
6.5 Apache Tephra As a Transaction Manager
6.6 Phoenix Transactions
6.6.1 Enabling Transactions for Tables
6.6.2 Committing Transactions
6.7 Transaction Limitations in Phoenix
6.8 Summary
Chapter 7: Advanced Phoenix Concepts
7.1 Secondary Indexes
7.1.1 Global Index
7.1.1.1 Immutable Tables
7.1.1.1.1 Consistency
7.1.1.2 Mutable Tables
7.1.1.2.1 Configuration
7.1.1.2.2 Consistency
7.1.2 Local Index
7.1.3 Covered Index
7.1.4 Functional Indexes
7.1.5 Index Consistency
7.2 User Defined Functions
7.2.1 Writing Custom User Defined Functions
7.2.1.1 Configuration
7.2.1.2 Runtime Environment
7.3 Phoenix Query Server
7.3.1 Download
7.3.2 Installation
7.3.3 Setup
7.3.4 Starting PQS
7.3.5 Client
7.3.6 Usage
7.3.7 Additional PQS Features
7.3.7.1 Gotchas
7.4 Summary
Chapter 8: Integrating Phoenix with Other Frameworks
8.1 Hadoop Ecosystem
8.2 MapReduce Integration
8.2.1 Setup
8.3 Apache Spark Integration
8.3.1 Setup
8.3.2 Reading and Writing Using Dataframe
8.4 Apache Hive Integration
8.4.1 Setup
8.4.2 Table Creation
8.5 Apache Pig Integration
8.5.1 Setup
8.5.2 Accessing Data from Phoenix
8.5.3 Storing Data to Phoenix
8.6 Apache Flume Integration
8.6.1 Setup
8.6.2 Configuration
8.6.3 Running the Above Setup
8.7 Summary
Chapter 9: Tools & Tuning
9.1 Phoenix Tracing Server
9.1.1 Trace
9.1.2 Span
9.1.3 Span Receivers
9.1.4 Setup
9.1.4.1 Client Configuration
9.1.4.2 Server Configuration
9.2 Phoenix Bulk Loading
9.2.1 Setup
9.2.2 Gotchas
9.3 Index Load Async
9.4 Pherf
9.4.1 Setup to Run the Test
9.4.2 Gotchas
9.5 Summary
Index