Saturday, February 16, 2013

SQL Processing using Hive - Chennai HUG, Feb '13

Today's Chennai Hadoop User Group meetup was a basic introduction to Apache Hive. It was a full-fledged tutorial session by Senthil Kumar and Prasad S, with demos on basic queries and on selecting the top 500 songs by popularity from the Million Song Dataset, respectively.

The version used for the demos was Hive 0.9.0 (http://www.apache.org/dyn/closer.cgi/hive/hive-0.9.0/).

What is Hive, and Why?

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. At the same time, HiveQL allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL itself.
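To make the "project structure, then query" idea concrete, here is a minimal sketch in the spirit of the top-500-songs demo. It is not the query shown in the session: the table name songs, its columns, and the HDFS location /data/msd/songs are all hypothetical. The script writes the HiveQL to a file and runs it with hive -f only if a hive binary is found on the PATH.

```shell
#!/bin/sh
# Write an illustrative HiveQL script. The table name, columns, and HDFS
# location are assumptions made for this sketch, not the demo's schema.
cat > top_songs.hql <<'EOF'
-- Project a schema onto tab-delimited files already sitting in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS songs (
  track_id    STRING,
  title       STRING,
  artist_name STRING,
  hotness     DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/msd/songs';

-- Top 500 songs by popularity.
SELECT title, artist_name, hotness
FROM songs
ORDER BY hotness DESC
LIMIT 500;
EOF

# Run the script non-interactively if Hive is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f top_songs.hql
else
  echo "hive not found on PATH; wrote top_songs.hql only"
fi
```

An external table is used here so that dropping the table leaves the underlying files in place, which seemed the safer choice for a shared dataset.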

Hive Interfaces

Hive ships with three interfaces by default:

  1. CLI - Command Line Interface, the most widely used interface for working with Hive.
    $ bin/hive
  2. HWI - Hive Web Interface, a browser-based alternative to the CLI.
    $ bin/hive --service hwi
  3. Server - Hive Server, used as the backend for JDBC clients.
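As a quick reference, the three interfaces can be mapped to their launcher commands with a small helper. This is only a sketch: the function name hive_cmd and the HIVE_HOME default are my own assumptions, and the hiveserver service name is what the Hive 0.9-era launcher uses as far as I know, so check bin/hive --service help on your install. The helper only prints the commands and starts nothing.

```shell
#!/bin/sh
# Print the launcher command for a Hive interface by name.
# HIVE_HOME is an assumption; it defaults to the current directory,
# matching the relative bin/hive paths used in this post.
hive_cmd() {
  launcher="${HIVE_HOME:-.}/bin/hive"
  case "$1" in
    cli)    echo "$launcher" ;;
    hwi)    echo "$launcher --service hwi" ;;
    server) echo "$launcher --service hiveserver" ;;
    *)      echo "unknown interface: $1" >&2; return 1 ;;
  esac
}

# List the command for each of the three interfaces.
for name in cli hwi server; do
  printf '%-6s -> %s\n' "$name" "$(hive_cmd "$name")"
done
```

To actually start one of them, run the printed command in a terminal; the web interface and the server keep running in the foreground until stopped.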

More Resources on Hive