My Standalone Complexities: February 2013

Today's Chennai Hadoop User Group meetup was a basic introduction to Apache Hive. It was a full-fledged tutorial session from +Senthil Kumar and +Prasad S, with demos on basic queries and on selecting the top 500 songs by popularity from the Million Song Dataset, respectively.

The version used for the demos was Hive 0.9.0 (http://www.apache.org/dyn/closer.cgi/hive/hive-0.9.0/).

Hive and Why?

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
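To make "projecting structure onto data" concrete, here is a minimal sketch that writes a small HiveQL script and runs it with the CLI's -f option. The table name songs, its columns, the tab-separated layout, and the /data/songs location are all assumptions made for illustration; they are not the schema used in the demo.

```shell
#!/bin/sh
# Hypothetical table, columns, and HDFS path; the demo's real schema is not shown here.
cat > top_songs.hql <<'EOF'
-- Project a tab-separated layout onto files that already sit in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS songs (
  track_id   STRING,
  title      STRING,
  artist     STRING,
  popularity DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/songs';

-- Top 500 songs by popularity.
SELECT title, artist, popularity
FROM songs
ORDER BY popularity DESC
LIMIT 500;
EOF

# Run the script only if a hive binary is on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f top_songs.hql
else
  echo "hive not found; wrote top_songs.hql"
fi
```

Because the table is declared EXTERNAL, dropping it later removes only the metadata and leaves the underlying files in place.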

Hive Interfaces

Hive comes with three default interfaces:

CLI - Command Line Interface, the most widely used interface for working with Hive.
$ bin/hive
HWI - Hive Web Interface.
$ bin/hive --service hwi
Server - Hive Server, used as a JDBC backend.
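Besides the interactive prompt, the CLI can also run a single statement non-interactively with -e (and -S for silent mode). The sketch below wraps that in a small helper; run_query and the HIVE_BIN variable are hypothetical names introduced here, not part of Hive, and the helper only prints the command when no Hive binary is found.

```shell
#!/bin/sh
# HIVE_BIN and run_query are illustrative names, not part of Hive itself.
HIVE_BIN="${HIVE_BIN:-bin/hive}"

# Run one HiveQL statement through the CLI, or show what would be run
# when the hive launcher is not available at $HIVE_BIN.
run_query() {
  if [ -x "$HIVE_BIN" ]; then
    "$HIVE_BIN" -S -e "$1"
  else
    echo "would run: $HIVE_BIN -S -e \"$1\""
  fi
}

run_query "SHOW TABLES;"
```

This form is handy for cron jobs or shell pipelines, where the query output can be redirected to a file.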

More Resources on Hive

https://cwiki.apache.org/confluence/display/Hive/Presentations -- List of presentations on Apache Hive. Contains almost everything you would want to know about Hive.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual -- Hive Language Manual

Saturday, February 16, 2013

SQL Processing using Hive - Chennai HUG, Feb '13
