Skip to content

Best practice when inserting data to existing table via impyla #213

@IamGianluca

Description

@IamGianluca

Hi,

I'm using impyla for my project. Specifically I'm using it to read data using the Impala engine and writing data using HIve. The only difference (aside from minor syntax) is the port I have to provide when connecting to the cluster.

I need to write new rows into an existing partitioned table using Hive. My first thought was to use the executemany command and pass to it a string containing the query template and a tuple of tuples with all the data I want to add. Since I was struggling with implementing such method, I did a research and bumped into #96 . There people say that using the executemany is not the best practice for adding new rows into an existing table. They suggest instead to use ibis or writing the data into HDFS and then register the tables with a CREATE statements. I'm not sure the second suggestion would apply for adding data into a tables that already exist, anyway I was wondering if using impyla plus Hive engine for the INSERT INTO table is still discouraged.

What is the suggested way for inserting data into an existing Hive table?

Also what's wrong with my code below?

# prepare query 
yesterday = datetime.date.today() - datetime.timedelta(days=1)
query = """
             SET hive.exec.dynamic.partition.mode=nonstrict
             INSERT INTO db_name.table_name
             PARTITION (year, month, day)
             VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
             """
log_datetime = datetime.datetime.now()
rows = tuple([(str(yesterday), "NULL",
                      str(i.campaign), str(i.adgroup),
                      str(i.id), '1', '1', str(yesterday.year),
                      str(yesterday.month), str(yesterday.day))
                     for i in target.values()])

conn = connect(host=hostname, port=port, auth_mechanism="PLAIN")
cursor = conn.cursor()
cursor.executemany(query, rows)

The program runs without returning any error message but I cannot see the new rows added to the table in Hive.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions