Partitioning in Apache Hive

Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often, users need to filter the data on specific column values, and that’s where partitioning comes into play. A partition is simply a directory that holds one chunk of the table’s data. When we partition a table on a column, Hive creates a partition for each unique value of that column, so a query that filters on that column can read only the matching directories instead of scanning the whole table.

Let’s run a simple example to see how this works. The syntax to create a partitioned table is:
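As a minimal sketch of the general form, assume a hypothetical `page_views` table partitioned by a `country` column. The table name, column names, and sample row are illustrative assumptions, not taken from the original example. Note that the partition column is declared only in the `PARTITIONED BY` clause and not in the regular column list.

```sql
-- Hypothetical table: names and types are assumptions for illustration.
-- The partition column (country) is declared only in PARTITIONED BY.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (country STRING)
STORED AS TEXTFILE;

-- Static partition insert: this row lands in the country=US directory
-- under the table's location (INSERT ... VALUES needs Hive 0.14 or later).
INSERT INTO TABLE page_views PARTITION (country = 'US')
VALUES (1, '/home');

-- List the partitions Hive knows about for this table.
SHOW PARTITIONS page_views;

-- Filtering on the partition column lets Hive read only the matching
-- partition directory rather than the whole table.
SELECT url FROM page_views WHERE country = 'US';
```

With this layout, each distinct `country` value gets its own subdirectory named in the form `country=<value>` under the table directory, which is what makes the filter in the last query cheap.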


May 16, 2017 at 10:30AM


