Difference between partitioning & bucketing
similarity
both divide the data into many parts so that a query can scan only one part instead of the whole dataset.
differences
a partition is a folder (directory) and a bucket is a file.
select * from orders where state = ?
go with partitioning when the column has a small number of distinct values,
that is, when the cardinality (the number of unique/distinct values) of the column is low.
The number of partitions will be equal to the number of distinct values.
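A small sketch of the partitioning idea in plain Python, not an actual engine. The sample rows, the `state=XX` folder naming, and the helper names `partition_by_state` and `scan_state` are made up for illustration. It shows one "folder" per distinct state, and a `where state = ?` lookup that reads only that folder.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, state)
orders = [
    (1, "CA"), (2, "NY"), (3, "CA"),
    (4, "TX"), (5, "NY"), (6, "CA"),
]

def partition_by_state(rows):
    """Group rows into one 'folder' per distinct state value."""
    partitions = defaultdict(list)
    for order_id, state in rows:
        # The key stands in for a directory name (assumed naming scheme)
        partitions[f"state={state}"].append((order_id, state))
    return dict(partitions)

def scan_state(partitions, state):
    """Answer 'where state = ?' by reading a single partition only."""
    return partitions.get(f"state={state}", [])

partitions = partition_by_state(orders)
distinct_states = {state for _, state in orders}

# One partition per distinct value of the partition column
print(len(partitions), len(distinct_states))

# Only the rows under the matching folder are touched
ca_rows = scan_state(partitions, "CA")
print(ca_rows)
```

With a low-cardinality column like state the number of folders stays small. Partitioning on a column like id would create one folder per row, which is why high-cardinality columns are a poor fit.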
when to use partitioning or bucketing?
when the column has few distinct/unique values, use partitioning. For example, the state column has only a few states.
when the column has many distinct/unique values, use bucketing. For example, the id column has many distinct values.
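A small sketch of the bucketing idea in plain Python, assuming the usual scheme where a row's bucket is the hash of the column value modulo a fixed number of buckets. For simplicity the integer id itself is used as the hash here, which is an assumption. `NUM_BUCKETS`, `bucket_for` and `bucket_rows` are hypothetical names.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, fixed up front

def bucket_for(order_id, num_buckets=NUM_BUCKETS):
    """Pick the bucket (file) for an id: hash(id) mod number of buckets.
    The integer id is used directly as its own hash (assumption)."""
    return order_id % num_buckets

def bucket_rows(order_ids, num_buckets=NUM_BUCKETS):
    """Spread ids across a fixed number of bucket 'files'."""
    buckets = [[] for _ in range(num_buckets)]
    for order_id in order_ids:
        buckets[bucket_for(order_id, num_buckets)].append(order_id)
    return buckets

# 1000 distinct ids still land in only NUM_BUCKETS files
order_ids = list(range(1, 1001))
buckets = bucket_rows(order_ids)
print([len(b) for b in buckets])

# A lookup by id needs to scan only the one bucket the id hashes to
target = 42
candidate_bucket = buckets[bucket_for(target)]
print(target in candidate_bucket, len(candidate_bucket))
```

The number of buckets is chosen in advance and does not grow with the number of distinct values, so a high-cardinality column like id still produces a manageable number of files.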
[…] partition vs bucketing […]