Call us on: +91 860-099-8107, Email: contactnull@technogeekscsnull.co.in, Demo Sessions :- Coming Saturday : We are Conducting Demo Sessions on : | Hadoop, BigData, Spark And Scala, Data Science And Machine Learning, : Cloud Computing AWS (Amazon Web Services), Data warehouse, ETL Testing and Informatica | , Please Call Us To Confirm Timings And Book Your Demo Seat On: : 8600998107 |
apache hive tutorial

apache hive tutorial

APACHE HIVE

To start hive  just write hive on terminal  type:

sudo hive 

or 

hive

 

To check the database in hive type

show databases;

 

To create database in hive

create database Technogeeks

 

To use the database

use Technogeeks;

 

Suppose Technogeeks  is already exist in databases then it will throw an error like database already exists.

 

After creating the database create table in selected database;

 

Create table in Hive:

We can create a table.

Syntax:

CREATE TABLE [IF NOT EXISTS] table_name

(col_name1 data_type, col_name2 data_type)

ROW FORMAT row_format

FIELDS TERMINATED BY ',' // TO define the delimiter  of input file

STORED AS file_format;

 

EXAMPLE:

create table emp (e_id int,e_name string,e_sal int,e_dept int)

row format delimited

fields terminated by ','

stored as textfile;

 

 

LOAD YOUR  INPUT FILE IN TABLE

Syntax:

LOAD DATA INPATH '/PATH' into table table name

hive> LOAD DATA    INPATH '/home/cloudera/emp.txt/' into table emp;

 

To  see the content of file

select * from emp

 

 

 

Queries:

Select….From:

Select Column1,Column2 From Table_Name;

Example:

select e_id, e_name from emp;

 

 

Select….Where:

Syntax:

SELECT * FROM Table_Name WHERE Condition;

Example:

select * from emp where sal >= 10000;

 

 

Select….ORDER BY:

Syntax:

SELECT Column1,Column2 FROM Table_Name ORDER BY Column_name;

Example:

select e_id,e_name,e_sal from emp ORDER BY e_id DESC;

 

 

 

 

Select….GROUP BY:

Syntax:

SELECT Column1,Column2 FROM Table_Name GROUP BY Column_name;

Example: select count(*) from emp GROUP by eid

 

JOIN

 

JOIN is a clause that is used for combining specific fields from two tables by using values common to each one.

We have two different table to performed join operation

CUSTOMERS

+----+----------+-----+-----------+----------+ 

| ID | NAME     | AGE | ADDRESS   | SALARY   | 

+----+----------+-----+-----------+----------+ 

| 1  | Aksash  | 32  | Ahmedabad | 2000.00  |  

| 2  | vikash   | 25  | Delhi     | 1500.00  |  

| 3  | Ashish  | 23  | Kota      | 2000.00  | 

+----+----------+-----+-----------+----------+

 

ORDERS

+-----+---------------------+-------------+--------+ 

|OID  | DATE                | CUSTOMER_ID | AMOUNT | 

+-----+---------------------+-------------+--------+ 

| 102 | 2009-10-08 00:00:00 |           3 | 3000   | 

| 100 | 2009-10-08 00:00:00 |           3 | 1500   | 

| 101 | 2009-11-20 00:00:00 |           2 | 1560   | 

| 103 | 2008-05-20 00:00:00 |           4 | 2060   | 

+-----+---------------------+-------------+--------+

 

 

First create the customer and order tables and load the data.

Create customer table and load the data into it

create table customer (id int,name string,age int,address string,sal int)

row format delimited

FIELDS TERMINATED BY '\t'

stored as textfile;

 

 

Load the data into table:

load data local inpath '/home/cloudera/Desktop/customers/' overwrite  into table customer;

 

 

Create order table and load the data 

create table c_order (oid int, o_date string,customer_id, int,amount int)

row format delimited

FIELDS TERMINATED BY '\t'

stored as textfile;

 

 

 

 

Load the data into table:

load data local inpath '/home/cloudera/Desktop/order/' overwrite  into table c_order;

 

JOIN QUERY

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT 

   FROM CUSTOMERS c JOIN ORDERS o 

    ON (c.ID = o.CUSTOMER_ID);

 

 

LEFT OUTER JOIN

 

The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table.

EXAMPLE

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT 

FROM customer c LEFT OUTER JOIN c_order o 

ON (c.ID = o.customer_id);

RIGHT OUTER JOIN:

The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table.

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT ,o.o_date

FROM customer c RIGHT OUTER JOIN c_order o 

ON (c.ID = o.customer_id)

 

 

 

FULL  OUTER JOIN:

The HiveQL FULL OUTER JOIN combines the records of both the left and the right outer tables that fulfil the JOIN condition. The joined table contains either all the records from both the tables, or fills in NULL values for missing matches on either side.

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT ,o.o_date

FROM customer c FULL OUTER JOIN c_order o 

ON (c.ID = o.customer_id)

 

 

Hive - Partitioning

Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.

There are two type of partition:

 

 

 

Static partititon:

static partitions are preferred. That saves your time in loading data compared to dynamic partition. You "statically" add a partition in table and move the file into the partition of the table. Since the files are big they are usually generated in HDFS.

EXAMPLE

1.CREATE A TABLE

 

CREATE TABLE India_my (

OFFICE_NAME STRING,

OFFICE_STATUS     STRING,

TELEPHONE   BIGINT,

TALUK       STRING,

DISTRICT    STRING,

POSTAL_DIVISION   STRING,

POSTAL_REGION     STRING,

POSTAL_CIRCLE     STRING

)

PARTITIONED BY (STATE   STRING)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS TEXTFILE; 

 

LOAD THE INPUT FILE

hive> LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/state' INTO TABLE India_my

    partition(STATE='madhyapradash');

 

NOTE: In static partition you have to declare the partition

Dynamic Partition

It designed to solve this problem by dynamically determining which partitions should be created and p0opulated while scanning the input table.

 

Create temp_india table

CREATE TABLE temp_India (

    OFFICE_NAME STRING,

     OFFICE_STATUS     STRING,

     PINCODE           INT,

     TELEPHONE   BIGINT,

     TALUK       STRING,

     DISTRICT    STRING,

     STATE       STRING,

     POSTAL_DIVISION   STRING,

     POSTAL_REGION     STRING,

     POSTAL_CIRCLE     STRING

     )

     ROW FORMAT DELIMITED

     FIELDS TERMINATED BY ','

     STORED AS TEXTFILE;

 

 

Load your input file

load data local inpath '/home/cloudera/Desktop/state' into table temp_India;

 

. 

Create partitioned table

CREATE TABLE India (

    OFFICE_NAME STRING,

     OFFICE_STATUS     STRING,

     PINCODE           INT,

     TELEPHONE   BIGINT,

     TALUK       STRING,

     DISTRICT    STRING,

     POSTAL_DIVISION   STRING,

     POSTAL_REGION     STRING,

     POSTAL_CIRCLE     STRING

     )

     PARTITIONED BY (STATE   STRING)

     ROW FORMAT DELIMITED

     FIELDS TERMINATED BY ','

     STORED AS TEXTFILE;

 

 

Instruct hive to dynamically load partitions

 

SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

 

LOAD PARTITION

INSERT OVERWRITE TABLE India PARTITION (STATE) select * from temp_India

 

 

Output

 

 

Enquiry Now !