Difference between Partitioning & Bucketing

similarity

Both divide the data into many parts so that a query can scan only the relevant part instead of the whole table.

differences

A partition is a folder, and a bucket is a file.
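To make the folder/file distinction concrete, here is a minimal sketch of what the two layouts might look like on disk. The helper names and the exact path patterns (`column=value` folders, zero-padded bucket file names) are assumptions for illustration, modeled on the Hive-style convention; the real names depend on the engine and its version.

```python
# Toy model of the storage layout; names and path patterns are illustrative assumptions.

def partition_path(table, column, value):
    # Each distinct value of the partition column gets its own folder.
    return f"{table}/{column}={value}"

def bucket_path(table, bucket_id):
    # Each bucket is a single file inside the table (or partition) folder.
    return f"{table}/{bucket_id:06d}_0"

print(partition_path("orders", "state", "CA"))  # orders/state=CA
print(bucket_path("orders", 3))                 # orders/000003_0
```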

Consider a query that filters on the state column:

select * from orders where state = ?

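Go with partitioning when the column has a small number of distinct values, that is, when the cardinality (the number of unique/distinct values in the column) is low.

The number of partitions will be equal to the number of distinct values in the partition column, so a high-cardinality column would produce a very large number of small folders.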
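The idea of one partition per distinct value, and of a filter on that column reading only one partition, can be sketched in a few lines. This is a toy in-memory model, not how any engine implements it; the sample rows and the function names `partition_by` and `scan_state` are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the orders table.
orders = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": "CA"},
    {"id": 4, "state": "TX"},
]

def partition_by(rows, column):
    # Group rows by the partition column: one group per distinct value.
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)

def scan_state(partitions, state):
    # Mimics: select * from orders where state = ?
    # Only the matching partition is read; the others are never touched.
    return partitions.get(state, [])

partitions = partition_by(orders, "state")
print(len(partitions))               # 3 partitions for 3 distinct states
print(scan_state(partitions, "CA"))  # rows with id 1 and 3
```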

when to use partitioning or bucketing?

When the column has few distinct values, use partitioning. For example, a state column contains only a limited set of states.

When the column has many distinct values, use bucketing. For example, an id column has a different value in almost every row, so the rows are spread across a fixed number of buckets instead of one folder per value.
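A rough sketch of how bucketing keeps the number of parts fixed even when the column has many distinct values: each row goes to bucket `hash(value) mod number_of_buckets`. The sketch below assumes integer ids and uses the id itself as the hash; the bucket count of 4 and the function names are hypothetical, and a real engine's hash function depends on the column type.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, chosen when the table is created

def bucket_for(order_id, num_buckets=NUM_BUCKETS):
    # Assumption: for integer ids the hash is the id itself, so this is a plain modulo.
    return order_id % num_buckets

def bucketize(ids, num_buckets=NUM_BUCKETS):
    # Distribute ids over a fixed number of buckets, however many distinct ids exist.
    buckets = [[] for _ in range(num_buckets)]
    for order_id in ids:
        buckets[bucket_for(order_id, num_buckets)].append(order_id)
    return buckets

buckets = bucketize(range(1, 1001))  # 1000 distinct ids
print(len(buckets))                  # 4
print([len(b) for b in buckets])     # [250, 250, 250, 250]
print(bucket_for(42))                # 2
```

A lookup for a single id then only needs to read the one bucket that `bucket_for` points to, which is the same "scan only one part" benefit that partitioning gives for low-cardinality columns.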
