When you create a table in Hive, by default Hive will manage the data, which means that Hive moves the data into its warehouse directory. Alternatively, you may create an external table, which tells Hive to refer to the data that is at an existing location outside the warehouse directory.
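To make the distinction concrete, here is a small Python sketch (not Hive code) of where each kind of table keeps its data. It assumes the default warehouse root /user/hive/warehouse; a real installation takes this from its configuration (the hive.metastore.warehouse.dir property). The function name `table_data_path` is hypothetical and exists only for illustration.

```python
import posixpath
from typing import Optional

# Assumed default warehouse root (illustrative only; a real Hive
# installation reads it from the hive.metastore.warehouse.dir property).
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_data_path(table: str, external_location: Optional[str] = None,
                    warehouse: str = WAREHOUSE_DIR) -> str:
    """Return the directory holding a table's data in this toy model.

    Managed table: Hive owns a directory named after the table inside
    the warehouse. External table: Hive simply refers to the existing
    location supplied when the table was created.
    """
    if external_location is not None:
        return external_location
    return posixpath.join(warehouse, table)


print(table_data_path("managed_table"))
# /user/hive/warehouse/managed_table
print(table_data_path("external_table", "/user/tom/external_table"))
# /user/tom/external_table
```

The model captures only the ownership question: a managed table's location is derived from the warehouse, while an external table's location is whatever the user points it at.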
The difference between the two types of table is seen in the LOAD and DROP semantics. Let’s consider a managed table first.
When you load data into a managed table, it is moved into Hive’s warehouse directory. For example:
CREATE TABLE managed_table (dummy STRING);
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;
will move the file hdfs://user/tom/data.txt into Hive’s warehouse directory for the managed_table table, which is hdfs://user/hive/warehouse/managed_table.[98]
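Because the load is a move rather than a copy, the source file no longer exists at its original path afterward. The following Python sketch imitates that behaviour on the local filesystem. It is a toy model only: a temporary directory stands in for HDFS, and the hypothetical function `load_data_inpath` stands in for the LOAD statement.

```python
import os
import tempfile


def load_data_inpath(src: str, warehouse: str, table: str) -> str:
    """Toy model of LOAD DATA INPATH into a managed table.

    Moves (does not copy) src into <warehouse>/<table>/ and returns the
    new path. os.replace performs a rename, so this sketch only works
    when source and target are on the same filesystem.
    """
    table_dir = os.path.join(warehouse, table)
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src))
    os.replace(src, dest)
    return dest


with tempfile.TemporaryDirectory() as root:
    # Local stand-ins for /user/tom/data.txt and /user/hive/warehouse.
    user_dir = os.path.join(root, "user", "tom")
    warehouse = os.path.join(root, "user", "hive", "warehouse")
    os.makedirs(user_dir)
    src = os.path.join(user_dir, "data.txt")
    with open(src, "w") as f:
        f.write("dummy\n")

    dest = load_data_inpath(src, warehouse, "managed_table")
    print(os.path.exists(src))   # False: the file was moved, not copied
    print(os.path.exists(dest))  # True: it now lives under the table directory
```

Using a rename rather than a copy mirrors the point made above: loading a managed table relocates the existing file instead of duplicating it.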