Pro Apache Phoenix - An SQL Driver for HBase

by: Shakil Akhtar, Ravi Magham

Apress, 2016

ISBN: 9781484223703, 148 pages

Format: PDF, read online

Copy protection: Watermark

Supported devices: Mac OS X, Windows PC; all DRM-capable e-readers; Apple iPad, Android tablets. Read online on: Mac OS X, Linux, Windows PC

Price: EUR 28.88

Contents at a Glance

Contents

About the Authors

About the Technical Reviewers

Chapter 1: Introduction

1.1 Big Data Lake and Its Representation

1.2 Modern Applications and Big Data

1.2.1 Fraud Detection in Banking

1.2.2 Log Data Analysis

1.2.3 Recommendation Engines

1.2.3.1 Social Media Analysis

1.3 Analyzing Big Data

1.4 An Overview of Hadoop and MapReduce

1.5 Hadoop Ecosystem

1.5.1 HDFS

1.5.2 MapReduce

1.5.3 HBase

1.5.4 Hive

1.5.5 YARN

1.5.6 Spark

1.5.7 PIG

1.5.8 ZooKeeper

1.6 Phoenix in the Hadoop Ecosystem

1.7 Phoenix’s Place in Big Data Systems

1.8 Importance of Traditional SQL-Based Tools and the Role of Phoenix

1.8.1 Traditional DBA Problems for Big Data Systems

1.8.2 Which Tool Should I Use for Big Data?

1.8.3 Massive Data Storage and Challenges

1.8.4 A Traditional Data Warehouse and Querying

1.9 Apache Phoenix in Big Data Analytics

1.10 Summary

Chapter 2: Using Phoenix

2.1 What is Apache Phoenix?

2.2 Architecture

2.2.1 Installing Apache Phoenix

2.2.2 Installing Java

2.2.2.1 Installing Java on Linux

2.2.2.2 Installing Java on Mac OS X

2.3 Installing HBase

2.4 Installing Apache Phoenix

2.5 Installing Phoenix on Hortonworks HDP

2.5.1 Downloading Hortonworks Sandbox

2.5.2 Start HBase

2.5.3 Testing Your Phoenix Installation

2.6 Installing Phoenix on Cloudera Hadoop

2.7 Capabilities

2.8 Hadoop Ecosystem and the Role of Phoenix

2.9 Brief Description of Phoenix’s Key Features

2.9.1 Transactions

2.9.2 User-Defined Functions

2.9.3 Secondary Indexes

2.9.4 Skip Scan

2.9.5 Views

2.9.6 Multi-Tenancy

2.9.7 Query Server

2.10 Summary

Chapter 3: CRUD with Phoenix

3.1 Data Types in Phoenix

3.1.1 Primitive Data Types

3.1.2 Complex Data Types

3.2 Data Model

3.2.1 Steps in data modeling

3.3 Phoenix Write Path

3.4 Phoenix Read Path

3.5 Basic Commands

3.5.1 HELP

3.5.2 CREATE

3.5.3 UPSERT

3.5.4 SELECT

3.5.5 ALTER

3.5.6 DELETE

3.5.7 DESCRIBE

3.5.8 LIST

3.6 Working with Phoenix API

3.6.1 Environment setup

3.7 Summary

Chapter 4: Querying Data

4.1 Constraints

4.1.1 NOT NULL

4.2 Creating Tables

4.3 Salted Tables

4.4 Dropping Tables

4.5 ALTER Tables

4.5.1 Adding Columns

4.5.2 Deleting or Replacing Columns

4.5.3 Renaming a Column

4.6 Clauses

4.6.1 LIMIT

4.6.2 WHERE

4.6.3 GROUP BY

4.6.4 HAVING

4.6.5 ORDER BY

4.7 Logical Operators

4.7.1 AND

4.7.2 OR

4.7.3 IN

4.7.4 LIKE

4.7.5 BETWEEN

4.8 Summary

Chapter 5: Advanced Querying

5.1 Joins

5.2 Inner Join

5.3 Outer Join

5.3.1 Left Outer Join

5.3.2 Right Outer Join

5.3.3 Full Outer Join

5.4 Grouped Joins

5.5 Hash Join

5.6 Sort Merge Join

5.7 Join Query Optimizations

5.7.1 Optimizing Through Configuration Properties

5.7.2 Optimizing Query

5.8 Subqueries

5.8.1 IN and NOT IN in Subqueries

5.8.2 EXISTS and NOT EXISTS Clauses

5.8.3 ANY, SOME, and ALL Operators with Subqueries

5.8.4 UPSERT Using Subqueries

5.9 Views

5.9.1 Creating Views

5.9.2 Dropping Views

5.10 Paged Queries

5.10.1 LIMIT and OFFSET

5.10.2 Row Value Constructor

5.11 Summary

Chapter 6: Transactions

6.1 SQL Transactions

6.2 Transaction Properties

6.2.1 Atomicity

6.2.2 Consistency

6.2.3 Isolation

6.2.4 Durability

6.3 Transaction Control

6.3.1 COMMIT

6.3.2 ROLLBACK

6.3.3 SAVEPOINT

6.3.4 SET TRANSACTION

6.4 Transactions in HBase

6.4.1 Integrating HBase with Transaction Manager

6.4.2 Components of Transaction Manager

6.4.2.1 TransactionAware Client

6.4.2.2 Transaction Manager

6.4.2.3 Transaction Processor Coprocessor

6.4.3 Transaction Lifecycle

6.4.4 Concurrency Control

6.4.5 Multiversion Concurrency Control

6.4.6 Optimistic Concurrency Control

6.5 Apache Tephra As a Transaction Manager

6.6 Phoenix Transactions

6.6.1 Enabling Transactions for Tables

6.6.2 Committing Transactions

6.7 Transaction Limitations in Phoenix

6.8 Summary

Chapter 7: Advanced Phoenix Concepts

7.1 Secondary Indexes

7.1.1 Global Index

7.1.1.1 Immutable Tables

7.1.1.1.1 Consistency

7.1.1.2 Mutable Tables

7.1.1.2.1 Configuration

7.1.1.2.2 Consistency

7.1.2 Local Index

7.1.3 Covered Index

7.1.4 Functional Indexes

7.1.5 Index Consistency

7.2 User Defined Functions

7.2.1 Writing Custom User Defined Functions

7.2.1.1 Configuration

7.2.1.2 Runtime Environment

7.3 Phoenix Query Server

7.3.1 Download

7.3.2 Installation

7.3.3 Setup

7.3.4 Starting PQS

7.3.5 Client

7.3.6 Usage

7.3.7 Additional PQS Features

7.3.7.1 Gotchas

7.4 Summary

Chapter 8: Integrating Phoenix with Other Frameworks

8.1 Hadoop Ecosystem

8.2 MapReduce Integration

8.2.1 Setup

8.3 Apache Spark Integration

8.3.1 Setup

8.3.2 Reading and Writing Using Dataframe

8.4 Apache Hive Integration

8.4.1 Setup

8.4.2 Table Creation

8.5 Apache Pig Integration

8.5.1 Setup

8.5.2 Accessing Data from Phoenix

8.5.3 Storing Data to Phoenix

8.6 Apache Flume Integration

8.6.1 Setup

8.6.2 Configuration

8.6.3 Running the Above Setup

8.7 Summary

Chapter 9: Tools & Tuning

9.1 Phoenix Tracing Server

9.1.1 Trace

9.1.2 Span

9.1.3 Span Receivers

9.1.4 Setup

9.1.4.1 Client Configuration

9.1.4.2 Server Configuration

9.2 Phoenix Bulk Loading

9.2.1 Setup

9.2.2 Gotchas

9.3 Index Load Async

9.4 Pherf

9.4.1 Setup to Run the Test

9.4.2 Gotchas

9.5 Summary

Index