Skip to content

Read metadata from JDBC ResultSet.metaData instead of DatabaseMetaData (or allow both) in readSqlQuery() #1351

@ThomasD7x

Description

@ThomasD7x

Read metadata from JDBC ResultSet.metaData instead of DatabaseMetaData (or allow both) in readSqlQuery()

  • to solve Teradata-JDBC performance problem
  • to collect metaData for simple sql joins for Teradata-JDBC (and other JDBC drivers?)

(I am new here and this is my first issue.
I hope it helps and you can use my solution.)

My problem:

fun getTableColumnsMetadata() in readJdbc.kt reads the metadata from DatabaseMetaData for each column.
This causes Teradata-JDBC to execute an extremely slow query for each column.
Ultimately, this leads to a timeout when there are too many columns.
Teradata's DatabaseMetaData.getColumns() is useful only
for collecting metadata for a set of columns in one call.

Here are the details of my setup:

I tested DataFrame-Beta2 with Teradata JDBC: 20.00.00.46.
As usual, I created a custom DbType object.

    public object MyTeradb : DbType("teradata") {
        override val driverClassName: String 
            get() = "com.teradata.jdbc.TeraDriver"
    
    // ...  dummies from the example for HSQLDB
    
    }

I tried reading data from an SQL table with about 90 columns.

    val teraSql = "select top 100 * from my_tab_with90cols"
    val df = DataFrame.readSqlQuery( conn, teraSql, MyTeradb, strictValidation = false )

My solution:

I created a modified version of fun getTableColumnsMetadata(),
so that it reads metadata from the ResultSet object,
This works perfectly for Teradata JDBC 20.00.00.46.

I commented out the old code so you can compare the two versions.

As you can see, only the isNullable flag is collected
from DatabaseMetaData in the old version.

The isNullable information is available in ResultSetMetaData

//java.sql.ResultSetMetaData
public abstract int isNullable(int column);

I used that function to retrieve the isNullable information:

/**
 * Retrieves the metadata of the columns in the result set.
 * 
 * metadata are collected from resultSet.metaData 
 * to avoid DataBaseMetaData-queries per column
 * (queries per column are a problem for teradata-jdbc e.g.)
 *
 * @param rs the result set
 * @return a mutable list of [TableColumnMetadata] objects,
 *         where each TableColumnMetadata object contains information such as the column type,
 *         JDBC type, size, and name.
 */
private fun getTableColumnsMetadata(rs: ResultSet): MutableList<TableColumnMetadata> {
    val metaData: ResultSetMetaData = rs.metaData
    val numberOfColumns: Int = metaData.columnCount
    val tableColumns = mutableListOf<TableColumnMetadata>()
    val columnNameCounter = mutableMapOf<String, Int>()
    //val databaseMetaData: DatabaseMetaData = rs.statement.connection.metaData
    //val catalog: String? = rs.statement.connection.catalog.takeUnless { it.isNullOrBlank() }
    //val schema: String? = rs.statement.connection.schema.takeUnless { it.isNullOrBlank() }

    for (i in 1 until numberOfColumns + 1) {
        //val tableName = metaData.getTableName(i)
        val columnName = metaData.getColumnName(i)

        // orig versions queries databaseMetaData to get isNullable.
        // this does not work on teradata for tables with many columns.

        //  // this algorithm works correctly only for SQL Table and ResultSet opened on one SQL table
        //val columnResultSet: ResultSet =
        //    databaseMetaData.getColumns(catalog, schema, tableName, columnName)
        //val isNullable = if (columnResultSet.next()) {
        //    columnResultSet.getString("IS_NULLABLE") == "YES"
        //} else {
        //    true // we assume that it's nullable by default
        //}
        
        // for teradata-jdbc: use isNullable() from metaData
        val isNullable = when (metaData.isNullable(i)) {
            ResultSetMetaData.columnNullable -> true
            ResultSetMetaData.columnNullableUnknown -> true
            ResultSetMetaData.columnNoNulls -> false
            else -> true // default to nullable
        }


        val name = manageColumnNameDuplication(columnNameCounter, columnName)
        val size = metaData.getColumnDisplaySize(i)
        val type = metaData.getColumnTypeName(i)
        val jdbcType = metaData.getColumnType(i)
        val javaClassName = metaData.getColumnClassName(i)

        tableColumns += TableColumnMetadata(name, type, jdbcType, size, javaClassName, isNullable)
    }
    return tableColumns
}

This modification works perfectly with my Teradata JDBC setup.
Now, I can execute simple queries or joins
and collect the data in a data frame.

(Teradata-JDBC reports the nullable information correctly in joins if there are no subqueries.)

However, there may be side effects with other databases
if they don't support ResultSetMetaData.isNullable(int column)
as needed.

One option is to introduce a boolean flag like
useMetaDataFromResultSet = false in DataFrame.readSqlQuery(...)
to maintain the previous version.

Metadata

Metadata

Assignees

Labels

databasesJDBC related issues

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions