Hive Tutorial for Beginners: Learn Hive Basics in 10 Minutes
Hive is a data warehousing solution for Hadoop and a key component of the Hadoop ecosystem. It processes structured and semi-structured data stored in Hadoop, and it lets reporting and data analytics tools work on top of HDFS. Hive uses the Hadoop Distributed File System (HDFS) as its storage layer and lets you write queries in Hive Query Language (HQL). Queries can be customized with user-defined functions (UDFs), and data can be stored in several file formats, such as text file, Parquet, Avro, and ORC. The details below should help beginners get started.
Three main functions of Hive:
- Ad-hoc analysis
- Data summarization
- Data query
Hive's language is HiveQL (HQL). It resembles SQL, which makes Hive queries easy for SQL developers to understand.
Hive is scalable, extensible, and fast. With Hive, business analysts can also work with Big Data and generate insights easily.
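To make the SQL resemblance concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in so the example runs without a Hadoop cluster; it does not connect to Hive. The `sales` table and its rows are invented for illustration. The two SELECT statements use only plain SQL constructs (WHERE, GROUP BY, SUM, ORDER BY) that HiveQL also supports, while the table setup is SQLite-specific (Hive would use STRING rather than TEXT, for example).

```python
import sqlite3

# Assumption: sqlite3 stands in for Hive so the queries can run locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 200), ("east", 50)],
)

# Data query: filter rows, as an analyst might in an ad-hoc session.
large_sales = conn.execute(
    "SELECT region, amount FROM sales WHERE amount > 100 ORDER BY amount"
).fetchall()

# Data summarization: aggregate one total per region.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

print(large_sales)
print(totals)
```

The first query corresponds to the "data query" function listed above and the second to "data summarization"; running either one interactively is what ad-hoc analysis looks like in practice.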
Major parts of the Hive architecture are:
- Command Line Interface, UI, and Thrift Server
These provide the interface through which users submit HQL queries, which spares them the complexity of writing MapReduce code. Hive converts HQL queries into MapReduce, Spark, or Apache Tez jobs, so there is no need to write complicated jobs by hand. All three execution engines can run on YARN (Yet Another Resource Negotiator, Hadoop's resource manager). Spark jobs can be submitted in several modes, which are covered in a separate Spark blog post on the Technogeeks site.
Hive lets multiple users run many jobs on the Hadoop cluster at the same time. It supports analysis of large datasets stored in HDFS as well as in the Amazon S3 file system.
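To give a feel for what converting a query into a MapReduce job means, the sketch below walks a grouped sum (the kind of work a `SELECT region, SUM(amount) ... GROUP BY region` query asks for) through map, shuffle, and reduce phases in plain Python. This is a conceptual illustration only, not Hive's actual query planner or the Hadoop API; the function names and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows: (region, amount)
rows = [("north", 120), ("south", 80), ("north", 200), ("east", 50)]


def map_phase(rows):
    # Emit one (key, value) pair per input row; the GROUP BY column is the key.
    for region, amount in rows:
        yield region, amount


def shuffle_phase(pairs):
    # Collect all values that share a key, as happens between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Collapse each key's values into a single aggregate (here, SUM).
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle_phase(map_phase(rows)))
print(sorted(totals.items()))
```

On a real cluster the map and reduce steps run in parallel across many machines; the point of Hive is that the one-line query replaces hand-written code of this shape.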