-
Notifications
You must be signed in to change notification settings - Fork 250
Description
Hi,
I'm using impyla for my project. Specifically I'm using it to read data using the Impala engine and writing data using HIve. The only difference (aside from minor syntax) is the port I have to provide when connecting to the cluster.
I need to write new rows into an existing partitioned table using Hive. My first thought was to use the executemany command and pass to it a string containing the query template and a tuple of tuples with all the data I want to add. Since I was struggling with implementing such method, I did a research and bumped into #96 . There people say that using the executemany is not the best practice for adding new rows into an existing table. They suggest instead to use ibis or writing the data into HDFS and then register the tables with a CREATE statements. I'm not sure the second suggestion would apply for adding data into a tables that already exist, anyway I was wondering if using impyla plus Hive engine for the INSERT INTO table is still discouraged.
What is the suggested way for inserting data into an existing Hive table?
Also what's wrong with my code below?
# prepare query
yesterday = datetime.date.today() - datetime.timedelta(days=1)
query = """
SET hive.exec.dynamic.partition.mode=nonstrict
INSERT INTO db_name.table_name
PARTITION (year, month, day)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
"""
log_datetime = datetime.datetime.now()
rows = tuple([(str(yesterday), "NULL",
str(i.campaign), str(i.adgroup),
str(i.id), '1', '1', str(yesterday.year),
str(yesterday.month), str(yesterday.day))
for i in target.values()])
conn = connect(host=hostname, port=port, auth_mechanism="PLAIN")
cursor = conn.cursor()
cursor.executemany(query, rows)
The program runs without returning any error message but I cannot see the new rows added to the table in Hive.