v0.1.6
- Added serverless validation using lsql library (#176). A `WorkspaceClient` object is used with the `product` name and `product_version`, along with the corresponding `cluster_id` or `warehouse_id` as the `sdk_config`, in the `MorphConfig` object (a wiring sketch follows this list).
- Enhanced install script to enforce usage of a warehouse or cluster when `skip-validation` is set to `False` (#213). In this release, the installation process has been enhanced to mandate the use of a warehouse or cluster when the `skip-validation` parameter is set to `False`; a minimal sketch of the check follows this list. This change has been implemented across various components, including the install script, the `transpile` function, and the `get_sql_backend` function. Additionally, new pytest fixtures and methods have been added to improve test configuration and resource management during testing. Unit tests have been updated to enforce usage of a warehouse or cluster when the `skip-validation` flag is set to `False`, ensuring proper resource allocation and improving the validation process. This development focuses on promoting a proper setup and usage of the system, guiding new users towards a correct configuration and improving the overall reliability of the tool.
- Patch subquery with json column access (#190). The open-source library has been updated with new functionality to modify how subqueries with JSON column access are handled in the `snowflake.py` file. This change adds a check for an opening parenthesis after the `FROM` keyword to detect and break loops when a subquery is found, as opposed to a table name. This improvement enhances the handling of complex subqueries and JSON column access, making the code more robust and adaptable to different query structures. Additionally, a new test method, `test_nested_query_with_json`, has been introduced in the `tests/unit/snow/test_databricks.py` file to test the behavior of nested queries involving JSON column access when using the Snowflake dialect. This new method validates the expected output of a specific nested query when it is transpiled from the Snowflake dialect, allowing for more comprehensive testing of JSON column access and type casting in Snowflake dialects. The existing `test_delete_from_keyword` method remains unchanged.
- Snowflake `UPDATE FROM` to Databricks `MERGE INTO` implementation (#198). An illustrative before/after pair follows this list.
- Use Runtime SQL backend in Notebooks (#211). In this update, the `db_sql.py` file in the `databricks/labs/remorph/helpers` directory has been modified to support the use of the Runtime SQL backend in Notebooks. This change includes the addition of a new `RuntimeBackend` class in the `backends` module and an import statement for `os`. The `get_sql_backend` function now returns a `RuntimeBackend` instance when the `DATABRICKS_RUNTIME_VERSION` environment variable is present, allowing for more efficient and secure SQL statement execution in Databricks notebooks (a sketch of this dispatch follows this list). Additionally, a new test case for the `get_sql_backend` function has been added to ensure the correct behavior of the function in various runtime environments. These enhancements improve SQL execution performance and security in Databricks notebooks and increase the project's versatility for different use cases.
- Added Issue Templates for bugs, feature and config (#194). Two new issue templates have been added to the project's GitHub repository to improve issue creation and management. The first template, located in `.github/ISSUE_TEMPLATE/bug.yml`, is for reporting bugs and prompts users to provide detailed information about the issue, including the current and expected behavior, steps to reproduce, relevant log output, and a sample query. The second template, added under the path `.github/ISSUE_TEMPLATE/config.yml`, is for configuration-related issues and includes support contact links for general Databricks questions and Remorph documentation, as well as fields for specifying the operating system and software version. A new issue template for feature requests, named "Feature Request", has also been added, providing a structured format for users to submit requests for new functionality for the Remorph project. These templates will help streamline the issue creation process, improve the quality of information provided, and make it easier for the development team to quickly identify and address bugs and feature requests.
- Added Databricks Source Adapter (#185). In this release, the project has been enhanced with several new features for the Databricks Source Adapter. A new `engine` parameter has been added to the `DataSource` class, replacing the original `source` parameter. The `_get_secrets` and `_get_table_or_query` methods have been updated to use the `engine` parameter for key naming and for handling queries with a `select` statement differently, respectively. A Databricks Source Adapter for Oracle databases has been introduced, which includes a new `OracleDataSource` class that provides functionality to connect to an Oracle database using JDBC. A Databricks Source Adapter for Snowflake has also been added, featuring the `SnowflakeDataSource` class that handles data reading and schema retrieval from Snowflake. The `DatabricksDataSource` class has been updated to handle data reading and schema retrieval from Databricks, including a new `get_schema_query` method that generates the query to fetch the schema based on the provided catalog and table name (a sketch follows this list). Exception handling for reading data and fetching schemas has been implemented for all new classes. These changes provide increased flexibility for working with various data sources, improved code maintainability, and better support for different use cases.
- Added Threshold Query Builder (#188). In this release, the open-source library has added a Threshold Query Builder feature, which includes several changes to the existing functionality in the data source connector. A new import statement adds the `re` module for regular expressions, and new parameters have been added to the `read_data` and `get_schema` abstract methods. The `_get_jdbc_reader_options` method has been updated to accept an `options` parameter of type `JdbcReaderOptions`, and a new static method, `_get_table_or_query`, has been added to construct the table or query string based on the provided parameters. Additionally, a new class, `QueryConfig`, has been introduced in the `databricks.labs.remorph.reconcile` package to configure queries for data reconciliation tasks. A new abstract base class, `QueryBuilder`, has been added to the `query_builder.py` file, along with `HashQueryBuilder` and `ThresholdQueryBuilder` classes to construct SQL queries for generating hash values and selecting columns based on threshold values, transformation rules, and filtering conditions (a compressed sketch follows this list). These changes aim to enhance the functionality of the data source connector, add modularity, customizability, and reusability to the query builder, and improve data reconciliation tasks.
- Added snowflake connector code (#177). In this release, the open-source library has been updated to add a Snowflake connector for data extraction and schema manipulation. The changes include the addition of the `SnowflakeDataSource` class, which is used to read data from Snowflake using PySpark and has methods for getting the JDBC URL, reading data with and without JDBC reader options, getting the schema, and handling exceptions (an illustrative read sketch follows this list). These changes were completed by Ravikumar Thangaraj and SundarShankar89.
- `remorph reconcile` baseline for Query Builder and Source Adapter for oracle as source (#150).
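
For the serverless validation entry (#176): a minimal sketch of the described wiring, assuming a simplified `MorphConfig` stand-in whose `sdk_config` field carries either a `cluster_id` or a `warehouse_id`; field names and values here are illustrative, not the exact library API.

```python
from __future__ import annotations

from dataclasses import dataclass

from databricks.sdk import WorkspaceClient


@dataclass
class MorphConfig:
    """Illustrative stand-in for the library's MorphConfig object."""
    source: str
    sdk_config: dict[str, str] | None = None  # e.g. {"warehouse_id": "..."} or {"cluster_id": "..."}
    skip_validation: bool = False


# Validation needs a compute target: either a SQL warehouse or a cluster.
config = MorphConfig(source="snowflake", sdk_config={"warehouse_id": "<warehouse-id>"})

# The client is tagged with the product name and version; the cluster or warehouse id
# carried in config.sdk_config is used when validation statements are executed.
ws = WorkspaceClient(product="remorph", product_version="0.1.6")
```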
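
For the install enforcement entry (#213): the described behaviour amounts to rejecting configurations that enable validation without a compute target. A minimal sketch with illustrative names, not the installer's actual function:

```python
from __future__ import annotations


def ensure_compute_target(skip_validation: bool, warehouse_id: str | None, cluster_id: str | None) -> None:
    """Reject configurations that enable validation but name no warehouse or cluster."""
    if not skip_validation and not (warehouse_id or cluster_id):
        raise ValueError(
            "Validation is enabled (skip-validation=False); provide a warehouse_id or a cluster_id."
        )


# Accepted: validation enabled and a warehouse is configured.
ensure_compute_target(skip_validation=False, warehouse_id="<warehouse-id>", cluster_id=None)
```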
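
For the `UPDATE FROM` to `MERGE INTO` entry (#198): an illustrative before/after pair only; the exact SQL emitted by the transpiler may differ.

```python
# Snowflake source: update a target table using rows from another table.
snowflake_sql = """
UPDATE orders t
   SET amount = s.amount
  FROM order_updates s
 WHERE t.order_id = s.order_id
"""

# Roughly equivalent Databricks SQL expressed as a MERGE.
databricks_sql = """
MERGE INTO orders AS t
USING order_updates AS s
   ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
"""
```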
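
For the Runtime SQL backend entry (#211): the described dispatch can be sketched as below, assuming the `lsql` backends; the real `get_sql_backend` in `databricks/labs/remorph/helpers/db_sql.py` may differ in signature and fallback behaviour.

```python
import os

from databricks.labs.lsql.backends import RuntimeBackend, SqlBackend, StatementExecutionBackend
from databricks.sdk import WorkspaceClient


def get_sql_backend(ws: WorkspaceClient, warehouse_id: str) -> SqlBackend:
    """Pick a SQL backend based on where the code runs (illustrative sketch)."""
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        # Inside a Databricks notebook or job: execute SQL through the runtime session.
        return RuntimeBackend()
    # Outside the workspace: execute SQL through a SQL warehouse.
    return StatementExecutionBackend(ws, warehouse_id)
```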
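
For the Databricks Source Adapter entry (#185): one way a `get_schema_query` helper could build the schema lookup from a catalog and table name, shown here as an information-schema query; the actual query text in the library may differ, and the `schema` parameter is added purely for illustration.

```python
def get_schema_query(catalog: str, schema: str, table: str) -> str:
    """Build a query returning column names and types for one table (illustrative)."""
    return (
        "SELECT lower(column_name) AS column_name, lower(data_type) AS data_type "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE lower(table_schema) = '{schema.lower()}' "
        f"AND lower(table_name) = '{table.lower()}' "
        "ORDER BY ordinal_position"
    )


print(get_schema_query("main", "sales", "orders"))
```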
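
For the Threshold Query Builder entry (#188): a compressed sketch of the `QueryBuilder` / `HashQueryBuilder` / `ThresholdQueryBuilder` shape; method names, columns, and the generated SQL are illustrative, not the library's exact classes.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class QueryBuilder(ABC):
    """Shared scaffolding for reconciliation queries (illustrative sketch)."""

    def __init__(self, table: str, columns: list[str]):
        self.table = table
        self.columns = columns

    @abstractmethod
    def build_query(self) -> str: ...


class HashQueryBuilder(QueryBuilder):
    """Selects a hash over the compared columns to detect row-level differences."""

    def build_query(self) -> str:
        concatenated = ", ".join(self.columns)
        return f"SELECT sha2(concat_ws('|', {concatenated}), 256) AS hash_value FROM {self.table}"


class ThresholdQueryBuilder(QueryBuilder):
    """Selects only the columns that take part in threshold comparisons."""

    def __init__(self, table: str, columns: list[str], threshold_columns: list[str]):
        super().__init__(table, columns)
        self.threshold_columns = threshold_columns

    def build_query(self) -> str:
        return f"SELECT {', '.join(self.threshold_columns)} FROM {self.table}"
```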
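
For the snowflake connector entry (#177): a hedged sketch of a PySpark JDBC read from Snowflake; the option names follow the generic Spark JDBC data source, and the library's `SnowflakeDataSource` may build its URL and reader options differently.

```python
from pyspark.sql import DataFrame, SparkSession


def read_snowflake_query(spark: SparkSession, account: str, query: str, options: dict) -> DataFrame:
    """Read the result of a query from Snowflake over JDBC (illustrative sketch)."""
    jdbc_url = f"jdbc:snowflake://{account}.snowflakecomputing.com"
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("driver", "net.snowflake.client.jdbc.SnowflakeDriver")
        .option("query", query)
        .options(**options)  # e.g. user, password, db, schema, warehouse
        .load()
    )
```
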
Dependency updates:
- Bump sqlglot from 22.4.0 to 22.5.0 (#175).
- Updated databricks-sdk requirement from <0.22,>=0.18 to >=0.18,<0.23 (#178).
- Updated databricks-sdk requirement from <0.23,>=0.18 to >=0.18,<0.24 (#189).
- Bump actions/checkout from 3 to 4 (#203).
- Bump actions/setup-python from 4 to 5 (#201).
- Bump codecov/codecov-action from 1 to 4 (#202).
- Bump softprops/action-gh-release from 1 to 2 (#204).
Contributors: @dependabot[bot], @sundarshankar89, @ganeshdogiparthi-db, @vijaypavann-db, @bishwajit-db, @ravit-db, @nfx