- Partition improves query performance
- The way Hive structures data storage changes with Partitioning
- Partitions are stored as sub-directories in the table directory
- Over Partitioning to be avoided
– Each partition creates an HDFS directory with many files in it
– It increases large number of small sized files in HDFS
– It eventually consume the capacity of namenode as the metadata is kept in main memory by Hadoop - Use a partition scheme that creates partitions with size in multiples of the HDFS block size
- Hive supports dynamic partitions also where partitions are created from query parameters
- Static partitions are created by the ‘PARTITIONED BY’ clause
- Dynamic partitions are not enabled by default and if enabled , it works in ‘strict’ mode.
- The maximum number of dynamic partitions are limited by default
- Dynamic and static partitions can be mixed
- – INSERT OVERWRITE TABLE employees PARTITION (country = ‘IN’, state)
- SELECT …, se.cnty, se.st FROM staged_employees se WHERE se.cnty = ‘IN’’;
Static Partitions
create table emplocpartitions ( empno int, empname string , empsal float )
PARTITIONED BY ( emploc STRING) ;
INSERT OVERWRITE TABLE EMPLOCPARTITIONS
PARTITION (emploc=’TVM’)
select empno,empname,empsal from emplocations
where emploc LIKE ‘TVM’;
Dynamic Partitions
set hive.exec.dynamic.partition.mode=nonstrict
set hive.exec.dynamic.partition=true
insert overwrite table emplocpartitions
PARTITION ( emploc )
Select e.empno,e.empname,e.empsal,e.emploc from emplocations e;
– Setting the property hive.mapred.mode=strict; prohibits executing
queries with out WHERE clause on partitioned tables