Hive

Bucketing in Hive

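• Bucketing decomposes a table or partition into a fixed number of more manageable parts (buckets), based on the hash of a chosen column
• Users specify the number of buckets for their data set when they create the table
• Declaring buckets in the table definition does not by itself guarantee that the table is properly populated; the data must also be loaded in a way that respects the bucketing
• The number of buckets is fixed and does not vary with the data
• Bucketing is well suited to sampling, because a query can read a single bucket instead of the whole table
• Bucketing enables efficient map-side joins when both tables are bucketed on the join column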
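The last two points can be sketched in HiveQL. This is only an illustration under stated assumptions: it uses the `empdata` table defined in this post's sample code (bucketed on `emplid` into 64 buckets), the partition value is made up, and `emp_salary` is a hypothetical second table that is also bucketed on `emplid`. Whether the join is actually executed as a bucket map join depends on the Hive version and its settings.

```sql
-- Sampling: read only the first of the 64 buckets instead of the whole partition.
-- The partition value '2015-01-01' is illustrative.
SELECT *
FROM empdata TABLESAMPLE(BUCKET 1 OUT OF 64 ON emplid) e
WHERE e.join_dt = '2015-01-01';

-- Map-side join: both tables are bucketed on the join column.
-- emp_salary(emplid INT, salary DOUBLE) is a hypothetical table, assumed to be
-- CLUSTERED BY (emplid) into a bucket count that is a multiple or factor of 64.
SET hive.optimize.bucketmapjoin = true;

SELECT /*+ MAPJOIN(s) */ e.emplid, e.fname, e.lname, s.salary
FROM empdata e
JOIN emp_salary s ON e.emplid = s.emplid;
```

Because the sample is taken on the same column and bucket count that the table is clustered by, Hive can answer the first query by reading one bucket rather than scanning every row.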

In the sample code below, a hash function is applied to the ‘emplid’ column, and rows whose hash values map to the same bucket number are placed in the same bucket.

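-- Option 1: let Hive enforce bucketing when the table is populated
SET hive.enforce.bucketing = true;
-- Option 2: set the number of reducers to the number of buckets yourself and add CLUSTER BY emplid to the INSERT ... SELECT
SET mapred.reduce.tasks = <number of buckets>;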

CREATE TABLE empdata(emplid INT, fname STRING, lname STRING)
PARTITIONED BY (join_dt STRING)
CLUSTERED BY (emplid) INTO 64 BUCKETS;
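Since declaring the buckets does not populate them, here is a minimal sketch of loading one partition of the table above. The staging table `emp_staging` and the partition value are assumptions made for illustration; they are not part of the original example.

```sql
-- Assumption: a hypothetical, unbucketed staging table
--   emp_staging(emplid INT, fname STRING, lname STRING, join_dt STRING)
-- already holds the raw rows.

-- Needed on older Hive releases; Hive 2.x and later enforce bucketing by default.
SET hive.enforce.bucketing = true;

-- Write one partition; Hive hashes emplid and routes each row to its bucket.
-- The partition value '2015-01-01' is illustrative.
INSERT OVERWRITE TABLE empdata PARTITION (join_dt = '2015-01-01')
SELECT emplid, fname, lname
FROM emp_staging
WHERE join_dt = '2015-01-01';
```

After the insert, the partition directory should hold one file per populated bucket, so up to 64 files here. Loading files directly into the table location (for example with LOAD DATA) skips this hashing step, which is how a bucketed table can end up improperly populated.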


An ambivert, music lover, enthusiast, artist, designer, coder, gamer, and content writer. He is a professional software developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, PHP, HTML, and CSS. Know more about him at www.24tutorials.com/sai
