Hive

Partitioning in Hive

  • Partition improves query performance
  • The way Hive structures data storage changes with Partitioning
  • Partitions are stored as sub-directories in the table directory
  • Over Partitioning to be avoided
    – Each partition creates an HDFS directory with many files in it
    – It increases large number of small sized files in HDFS
    – It eventually consume the capacity of namenode as the metadata is kept in main memory by Hadoop
  • Use a partition scheme that creates partitions with size in multiples of the HDFS block size
  • Hive supports dynamic partitions also where partitions are created from query parameters
  • Static partitions are created by the ‘PARTITIONED BY’ clause
  • Dynamic partitions are not enabled by default and if enabled , it works in ‘strict’ mode.
  • The maximum number of dynamic partitions are limited by default
  • Dynamic and static partitions can be mixed
  • – INSERT OVERWRITE TABLE employees PARTITION (country = ‘IN’, state)
  • SELECT …, se.cnty, se.st FROM staged_employees se WHERE se.cnty = ‘IN’’;

Static Partitions

create table emplocpartitions ( empno int, empname string , empsal float )
PARTITIONED BY ( emploc STRING) ;

INSERT OVERWRITE TABLE EMPLOCPARTITIONS
PARTITION (emploc=’TVM’)
select empno,empname,empsal from emplocations
where emploc LIKE ‘TVM’;

Dynamic Partitions

set hive.exec.dynamic.partition.mode=nonstrict
set hive.exec.dynamic.partition=true

insert overwrite table emplocpartitions
PARTITION ( emploc )
Select e.empno,e.empname,e.empsal,e.emploc from emplocations e;

– Setting the property hive.mapred.mode=strict; prohibits executing
queries with out WHERE clause on partitioned tables

Share This Post

An Ambivert, music lover, enthusiast, artist, designer, coder, gamer, content writer. He is Professional Software Developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, php, html,css. Know more about him at www.24tutorials.com/sai

Lost Password

Register

24 Tutorials