Releases: ibis-project/ibis
9.1.0 (2024-06-13)
Features
- all: enable passing in-memory data to create_table (#9251) (fa15c7d), closes #6593 #8863
- api: add `Table.value_counts` for easy group by count on multiple fields (aba913d)
- api: isoyear method (#9034) (4707c44)
- api: support `type` arg to ibis.null() (8db686e)
- api: support wider range of types in `where` arg to column reductions (582165f)
- api: support wider range of types in `where` arg to table reductions (7aba385)
- bigquery: implement a few URL ops (#9210) (3d0f9bc)
- bigquery: support filtering by `_TABLE_SUFFIX` when using a wildcard table name (#9375) (62a25c4), closes #9371
- datafusion: use pyarrow for type conversion (#9299) (5bef96a)
- drop Python 3.9 and test on Python 3.10/3.12 (#9213) (c06285e)
- duckdb: add catalog support to create_table (#9147) (07331b5)
- duckdb: allow to use named in-memory db (#9241) (67460aa), closes #9240
- duckdb: support and test 1.0 (#9297) (395c8b5)
- pandas,dask: implement ops.StructColumn (#9302) (ea81d85)
- polars: accept list of CSVs to read_csv (#9232) (7a272e3), closes #9230
- polars: implement `create_view`/`drop_view`/`drop_table` (#9263) (c4324f5)
- postgres: provide translation for `hash` ops (#9348) (57e2348)
- pyarrow: support Arrow PyCapsule interface on `ibis.Table` objects (1a262b9)
- pyspark: builtin udf support (#9191) (142c105)
- pyspark: provide a mode option to manage both batch and streaming connections (e425ad5)
- pyspark: support reading from and writing to Kafka (#9266) (1c7c6e3)
- selectors: parse Python types in `s.of_type` (#9356) (c0ebdc8)
- snowflake: implement array map and array filter (#9178) (9b42751)
- snowflake: implement support for `asof_join` API (#9180) (49c6ce3)
- snowflake: implement Table.sample (#9071) (307334b)
- ux: improve error message on unequal schemas during set ops (#9115) (5488896)
Bug Fixes
- api: treat `col == None` or `col == ibis.NA` as `col.isnull()` (#9114) (711bf9f)
- bigquery: only register memtable if obj is not None (#9268) (f175d0a)
- bigquery: quote all parts of table names (#9141) (e1338d5)
- bigquery: quote qualified memtable names (#9149) (878d0d5)
- bigquery: strip whitespace from bigquery field names (#9160) (8e5cc3b), closes #9112
- clickhouse: more explicitly disallow null structs (#9305) (fc1d00f)
- convert the uint64s from some backends' hash() to the desired int64 (900ecca)
- datatypes: manually cast the type of `pos` to `int16` for `table.info()` (#9139) (9eb1ed1)
- datatypes: manually cast the type of `pos` to `int16` for `table.describe()` (#9314) (c7fcddf)
- ddl: use column names, not position, for insertion order (#9264) (3506f40)
- deps: remove pydruid sqlalchemy dependency (#9092) (a0df103)
- deps: update dependency datafusion to v37 (#9189) (49ecf8d)
- deps: update dependency datafusion to v38 (#9278) (77aaecd)
- deps: update dependency fsspec to <2024.5.1 (#9201) (15a5257)
- deps: update dependency fsspec to <2024.6.1 (#9304) (d600a0d)
- deps: update dependency sqlglot to >=23.4,<23.14 (#9118) (d8119fb)
- deps: update dependency sqlglot to >=23.4,<23.15 (#9151) (ac2201d)
- deps: update dependency sqlglot to >=23.4,<23.17 (#9209) (82a5f93)
- deps: update dependency sqlglot to >=23.4,<23.18 (#9212) (b92dd7b)
- deps: update dependency sqlglot to >=23.4,<24.2 (#9277) (98cb460)
- deps: update dependency sqlglot to >=23.4,<25.2 ([#9368](htt...
9.0.0 (2024-04-30)
⚠ BREAKING CHANGES
- udf: The `schema` parameter for UDF definition has been removed. A new `catalog` parameter has been added. Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of `catalog` and `database` to specify a hierarchical location for the UDF.
- pyspark: Arguments to `create_database`, `drop_database`, and `get_schema` are now keyword-only except for the `name` args. Calls to these functions that have relied on positional argument ordering need to be updated.
- dask: the dask backend no longer supports `cov`/`corr` with `how="pop"`.
- duckdb: Calling the `get` or `contains` method on `NULL` `map` values now returns `NULL`. Use `coalesce(map.get(...), default)` or `coalesce(map.contains(), False)` to get the previous behavior.
- api: Integer inputs to `select` and `mutate` are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
- api: strings passed to table.mutate() are now interpreted as column references instead of literals; use `ibis.literal(string)` to pass the string as a literal
- ir: `Schema.apply_to()` is removed; use `ibis.formats.pandas.PandasConverter.convert_frame()` instead
- ddl: We are removing the word `schema` in its hierarchical sense. We use `database` to mean a collection of tables. The behavior of all `*_database` methods now applies only to collections of tables and never to collections of `database` (formerly `schema`).
  - `CanListDatabases` abstract methods now all refer to collections of tables.
  - `CanCreateDatabases` abstract methods now all refer to collections of tables.
  - `list_databases` now takes a kwarg `catalog`.
  - `create_database` now takes a kwarg `catalog`.
  - `drop_database` now takes a kwarg `catalog`.
  - `current_database` now refers to the current collection of tables.
  - `CanCreateSchema` is deprecated; `create_schema`, `drop_schema`, `list_schemas`, and `current_schema` are deprecated and redirect to the corresponding method/property ending in `database`.
  - We add `CanListCatalog` and `CanCreateCatalog` that can list and create collections of `database`, respectively. The new methods are `list_catalogs`, `create_catalog`, and `drop_catalog`.
  - There is a new `current_catalog` property.
- api: timecontext feature is removed
- api: The `by` argument from `asof_join` is removed. Calls to `asof_join` that previously used `by` should pass those arguments to `predicates` instead.
- cleanup: Deprecated methods and properties `op`, `output_dtype`, and `output_shape` are removed. `op` is no longer needed; use `.dtype` and `.shape` respectively for the other two.
- api: expr.topk(...) now includes null counts. The row count of the topk call will not differ, but the number of nulls counted will no longer be zero. To drop the null row use the dropna method.
- api: `ibis.rows_with_max_lookback()` function and `ibis.window(max_lookback)` argument are removed
- strings: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
- impala: Impala UDFs no longer require explicit registration. Remove any calls to `Function.register`. If you were passing `database` to `Function.register`, pass that to `scalar_function` or `aggregate_function` as appropriate.
- pandas: the `timecontext` feature is not supported anymore
- api: the `on` parameter of `table.asof_join()` now accepts only a single predicate; use `predicates` to supply additional join predicates.
Features
- add to_date function to StringValue (#9030) (0701978), closes #8908
- api: add `.as_scalar()` method for turning expressions into scalar subqueries (#8350) (8130169)
- api: add `catalog` and `database` kwargs to `ibis.table` (#8801) (7d593c4)
- api: add `describe` method to compute summary stats of table expressions (#8739) (c8d98a1)
- api: add `ibis.today()` for retrieving the current date (#8664) (5e10d17)
- api: add a `to_polars()` method for returning query results as `polars` objects (53454c1)
- api: add a `uuid` function for returning a new uuid (#8438) (965b6d9)
- api: add API for unwrapping JSON values into backend-native values (#8958) (aebb5cf)
- api: add disconnect method (#8341) (32665af), closes #5940
- api: allow *arg syntax with GroupedTable methods (#8923) (489bb89)
- api: count nulls with topk (#8531) (54c2c70)
- api: expose common types in the top-level `ibis` namespace (#9008) (3f3ed27), closes #8717
- api: include bad type in NotImplementedError (#8291) (36da06b)
- api: natively support polars dataframes in `ibis.memtable` (464bebc)
- api: support `Table.order_by(*keys)` (6ade4e9)
- api: support all dtypes in MapGet and MapContains (#8648) (401e0a4)
- api: support converting ibis types & schemas to/from polars types & schemas (73add93)
- api: support Deferreds in Array.map and .filter (#8267) (8289d2c)
- api: support the inner join convenience to not repeat fields known to be equal (#8127) (798088d)
- api: support variadic arguments on `Table.group_by()` (#8546) (665bc4f)
- backends: introducing ibish, the infinite scale backend you always wanted (#8785) (1d51243)
- bigquery: support polars memtables (26d103d)
- common: add `Dispatched` base class for convenient visitor pattern implementation (f80c5b3)
- common: add `Node.find_below()` methods to exclude the root node from filtering (#8861) (80d12a2)
- common: add a memory efficient `Node.map()` implementation (e3f2217)
- common: also traverse nodes used as dictionary keys (#9041) (02c6607)
- common: introduce `FrozenOrderedDict` (#9081) (f926995), closes #9063
- datafusion, flink, mssql: add uuid operation (#8545) (2f85a42)
- datafusion: add array and strings functions ([#...
8.0.0 (2024-02-05)
⚠ BREAKING CHANGES
- backends: Columns with Ibis `date` types are now returned as object dtype containing `datetime.date` objects when executing with the pandas backend.
- impala: Direct HDFS integration is removed and support for ingesting pandas DataFrames directly is as well. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
- api: replace `ibis.show_sql(expr)` calls with `print(ibis.to_sql(expr))` or, if using Jupyter or IPython, `ibis.to_sql(expr)`
- bigquery: `nullifzero` is removed; use `nullif(0)` instead
- bigquery: `zeroifnull` is removed; use `fillna(0)` instead
- bigquery: `list_databases` is removed; use `list_schemas` instead
- bigquery: the bigquery `current_database` method returns the `data_project` instead of the `dataset_id`. Use `current_schema` to retrieve `dataset_id`. To explicitly list tables in a given project and dataset, you can use `f"{con.current_database}.{con.current_schema}"`
Features
- api: define `RegexSplit` operation and `re_split` API (07beaed)
- api: support median and quantile on more types (#7810) (49c75a8)
- clickhouse: implement `RegexSplit` (e3c507e)
- datafusion: implement `ops.RegexSplit` using pyarrow UDF (37b6b7f)
- datafusion: set ops (37abea9)
- datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
- datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
- duckdb-geospatial: add support for flipping coordinates (d47088b)
- duckdb-geospatial: enable use of literals (23ad256)
- duckdb: implement `RegexSplit` (229a1f4)
- examples: add `zones` geojson example (#8040) (2d562b7), closes #7958
- flink: add new temporal operators (dfef418)
- flink: add primary key support (da04679)
- flink: export result to pyarrow (9566263)
- flink: implement array operators (#7951) (80e13b4)
- flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
- impala: rudimentary date support (d4bcf7b)
- mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082 #8082
- mssql: use odbc (f03ad0c)
- polars: implement `ops.RegexSplit` using pyarrow UDF (a3bed10)
- postgres: implement `RegexSplit` (c955b6a)
- pyspark: implement `RegexSplit` (cfe0329)
- risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
- snowflake: implement `RegexSplit` (2c1a726)
- snowflake: implement insert method (2162e3f)
- trino: implement `RegexSplit` (9d1295f)
Bug Fixes
- api: deferred values are not truthy (00b3ece)
- backends: ensure that returned date results are actually proper date values (0626fb2)
- backends: preserve `order_by` position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
- common: do not convert callables to resolvable objects (9963705)
- datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
- datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
- datatypes: fix bad references in `to_numpy()` (6fd4550)
- deps: remove `filelock` from required dependencies (76dded5)
- deps: update dependency black to v24 (425f7b1)
- deps: update dependency datafusion to v34 (601f889)
- deps: update dependency datafusion to v35 (#8224) (a34af25)
- deps: update dependency oracledb to v2 (e7419ca)
- deps: update dependency pyarrow to v15 (ef6a9bd)
- deps: update dependency pyodbc to v5 (32044ea)
- docs: surround executable code blocks with interactive mode on/off (4c660e0)
- duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
- duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
- examples: use anonymous access when reading example data from GCS (8e5c0af)
- impala: generate memtables using `UNION ALL` to work around sqlglot bug (399a5ef)
- mutate/select: ensure that unsplatted dictionaries work in `mutate` and `select` APIs (#8014) (8ed19ea), closes #8013
- mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
- pandas: support non-string categorical columns (5de08c7)
- polars: avoid using unnecessary subquery for schema inference (0f43667)
- **p...
7.2.0 (2023-12-18)
Features
- api: add `ArrayValue.flatten` method and operation (e6e995c)
- api: add `ibis.range` function for generating sequences (f5a0a5a)
- api: add timestamp range (c567fe0)
- base: add `to_pandas` method to BaseBackend (3d1cf66)
- clickhouse: implement array flatten support (d15c6e6)
- common: `node.replace()` now supports mappings for quick lookup-like substitutions (bbc93c7)
- common: add `node.find_topmost()` method to locate matching nodes without descending further to their children (15acf7d)
- common: allow matching on dictionaries in possibly nested patterns (1d314f7)
- common: expose `node.__children__` property to access the flattened list of children of a node (2e91476)
- duckdb: add initial support for geospatial functions (65f496c)
- duckdb: add read_geo function (b19a8ce)
- duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
- duckdb: implement array flatten support (0a0eecc)
- exasol: add exasol backend (295903d)
- export: allow passing keyword arguments to PyArrow `ParquetWriter` and `CSVWriter` (40558fd)
- flink: implement nested schema support (057fabc)
- flink: implement windowed computations (256767f)
- geospatial: add support for GeoTransform on duckdb (ec533c1)
- geospatial: update read_geo to support url (3baf509)
- pandas/dask: implement flatten (c2e8d9d)
- polars: add `streaming` kwarg to `to_pandas` (703507f)
- polars: implement array flatten support (19b2aa0)
- pyspark: enable multiple values in `.substitute` (291a290)
- pyspark: implement array flatten support (5d1fadf)
- snowflake: implement array flatten support (d3c754f)
- snowflake: read_csv with https (72752eb)
- snowflake: support udf arguments for reading from staged files (529a3a2)
- snowflake: use upstream `array_sort` (9624341)
- sqlalchemy: support expressions in window bounds (5dbb3b1)
- trino: implement array flatten support (0d1faaa)
Bug Fixes
- api: avoid casting to bool for `table.info()` `nullable` column (3b3bd7b)
- bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
- bigquery: fully qualified memtable names in compile (a81e432)
- clickhouse: use backwards compatible methods of getting query metadata (975556f)
- datafusion: bring back UDF registration (43084fa)
- datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
- datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
- decompile: handle isin (6857751)
- deferred: don't pass expression in fstringified error message (724859d)
- deps: update dependency datafusion to v33 (57047a2)
- deps: update dependency sqlglot to v20 (13bc6e2)
- duckdb: ensure that already quoted identifiers are not erased (45ee391)
- duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
- duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
- duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
- duckdb: use functions for temporal literals (b1407f8)
- duckdb: use the UDF's signature instead of arguments' output type for generating a duckdb signature (233dce1)
- flink: add more tests (33e1a31)
- flink: add os to the cache key (1b92b33)
- flink: add test cases for recreate table (1413de9)
- flink: customize the list of base identifiers (0b5d343)
- flink: fix recreating table/view issue on flink backend (0c9791f)
- flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
- flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
- geospatial: pretty print data in interactive mode (afb04ed)
- ir: ensure that join projection columns are all always nullable (f5f35c6)
- ir: handle renaming for scalar operations (6f77f17)
- ir: handle the case of non-overlapping data and add a test (1c9ae1b)
- ir: implicitly convert `None` literals with `dt.Null` type to the requested type during value coercion (d51ec4e)
- ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
- ir: raise if `Concrete.copy()` receives unexpected arguments (442199a)
- memtable: ensure column names match provided data (faf99df)
- memtables: disallow duplicat...
7.1.0 (2023-11-16)
Features
- api: add `bucket` method for timestamps (ca0f7bc)
- api: add `Table.sample` method for sampling rows from a table (3ce2617)
- api: allow selectors in `order_by` (359fd5e)
- api: move analytic window functions to top-level (8f2ced1)
- api: support deferred in reduction filters (349f475)
- api: support specifying `signature` in udf definitions (764977e)
- bigquery: add `location` parameter (d652dbb)
- bigquery: add `read_csv`, `read_json`, `read_parquet` support (ff83110)
- bigquery: support temporary tables using sessions (eab48a9)
- clickhouse: add support for timestamp `bucket` (10a5916)
- clickhouse: support `Table.fillna` (5633660)
- common: better inheritance support for Slotted and FrozenSlotted (9165d41)
- common: make Slotted and FrozenSlotted pickleable (13cbce0)
- common: support `Self` annotations for `Annotable` (0c60146)
- common: use patterns to filter out nodes during graph traversal (3edd8f7)
- dask: add read_csv and read_parquet (e9260af)
- dask: enable pyarrow conversion (2d36722)
- dask: support `Table.sample` (09a7626)
- datafusion: add case and if-else statements (851d560)
- datafusion: add corr and covar (edc42be)
- datafusion: add isnull and isnan operations (0076c25)
- datafusion: add some array functions (0b96b68)
- datafusion: add StringLength, FindInSet, ArrayStringJoin (fd03831)
- datafusion: add TimestampFromUNIX and subtract/add operations (2bffa5a)
- datafusion: add TimestampTruncate / fix broken extract time part functions (940ed21)
- datafusion: support dropping schemas (cc6870c)
- duckdb: add `attach` and `detach` methods for adding and removing databases to the current duckdb session (162b058)
- duckdb: add `ntile` support (bf08a2a)
- duckdb: add dict-like for DuckDB settings (ea2d317)
- duckdb: add support for specific timestamp scales (3518b78)
- duckdb: allow users to register fsspec filesystem with DuckDB (6172f07)
- duckdb: expose option to force reinstall extension (98080d0)
- duckdb: implement `Table.sample` as a `TABLESAMPLE` query (3a80f3a)
- duckdb: implement partial json collection casting (aae28e9)
- flink: add remaining operators for Flink to pass/skip the common tests (b27adc6)
- flink: add several temporal operators (f758228)
- flink: implement the `ops.TryCast` operation (752e587)
- formats: map ibis JSON type to pyarrow strings (79b6eac)
- impala/pyspark: implement `to_pyarrow` (6b33454)
- impala: implement `Table.sample` (8e78dfc)
- implement window table valued functions (a35a756)
- improve generated column names for methods receiving intervals (c319ed3)
- mssql: add support for timestamp `bucket` (1ffac11)
- mssql: support cross-db/cross-schema table list (3e0f0fa)
- mysql: support `ntile` (9a14ba3)
- oracle: add fixes after running pre-commit (6538b70)
- oracle: add fixes after running pre-commit (e3d14b3)
- oracle: add support for loading Oracle RAW and BLOB types (c77eeb2)
- oracle: change parsing of Oracle NUMBER data type (649ab86)
- oracle: remove redundant brackets (2905484)
- pandas: add read_csv and read_parquet (34eeca6)
- pandas: support `Table.sample` (77215be)
- polars: add support for timestamp `bucket` (c59518c)
- postgres: add support for timestamp `bucket` (4d34afc)
- pyspark: support `Table.sample` (6aa897e)
- snowflake: support `ntile` (39eed1a)
- snowflake: support cross-db/cross-schema table list (2071897)
- snowflake: support timestamp bucketing (a95ffa9)
- sql: implement `Table.sample` as a `random()` filter across several SQL backends (e1870ea)
- trino: implement `Table.sample` as a `TABLESAMPLE` query (f3d044c)
- trino: support `ntile` (2978d1a)
- trino: support temporal operations (8b8e885)
- udf: improve mypy compatibility for udf functions (65b5bb7)
- use `to_pyarrow` instead of `to_pandas` in the interactive repr (72aa573)
- ux: fix long links, add repr links in vscode (734bd91)
- ux: implement recursive element conversion for nested types and json ([8ddfa94](https://gi...
7.0.0 (2023-10-02)
⚠ BREAKING CHANGES
- api: the `interpolation` argument was only supported in the dask and pandas backends; for interpolated quantiles use dask or pandas directly
- ir: Dask and Pandas only; cumulative operations that relied on implicit ordering from prior operations, such as calls to `table.order_by`, may no longer work; pass `order_by=...` into the appropriate cumulative method to achieve the same behavior.
- api: UUID, MACADDR and INET are no longer subclasses of strings. Cast those values to `string` to enable use of the string APIs.
- impala: `ImpalaTable.rename` is removed, use `Backend.rename_table` instead.
- pyspark: `PySparkTable.rename` is removed, use `Backend.rename_table` instead.
- clickhouse: `ClickhouseTable` is removed. This class only provided a single `insert` method. Use the Clickhouse backend's `insert` method instead.
- datatypes: The minimum version of `sqlglot` is now 17.2.0, to support much faster and more robust backend type parsing.
- ir: ibis.expr.selectors module is removed, use ibis.selectors instead
- api: passing a tuple or a sequence of tuples to table.order_by() calls is not allowed anymore; use ibis.asc(key) or ibis.desc(key) instead
- ir: the `ibis.common.validators` module and all validation rules from `ibis.expr.rules` have been removed; use type hints or patterns from `ibis.common.patterns` instead
Features
- api: add `.delta` method for computing difference in units between two temporal values (18617bf)
- api: add `ArrayIntersect` operation and corresponding `ArrayValue.intersect` API (76c95b2)
- api: add `Backend.rename_table` (0047143)
- api: add `levenshtein` edit distance API (ab211a8)
- api: add `relocate` table expression API for moving columns around based on selectors (ee8a86f)
- api: add `Table.rename`, with support for renaming via keyword arguments (917d7ec)
- api: add `to_pandas_batches` (740778f)
- api: add support for referencing backend-builtin functions (76f5f4b)
- api: implement negative slice indexing (caee5c1)
- api: improve repr for deferred expressions containing Column/Scalar values (6b1218a)
- api: improve repr of deferred functions (f2b3744)
- api: support deferred and literal values in `ibis.ifelse` (685dbc1)
- api: support deferred arguments in `ibis.case()` (6f9f7c5)
- api: support deferred arguments to `ibis.array` (b1b83f9)
- api: support deferred arguments to `ibis.map` (86c8669)
- api: support deferred arguments to `ibis.struct` (7ef870d)
- api: support deferred arguments to udfs (a49d259)
- api: support deferred expressions in `ibis.date` (f454a71)
- api: support deferred expressions in `ibis.time` (be1fd65)
- api: support deferred expressions in `ibis.timestamp` (0e71505)
- api: support deferred values in `ibis.coalesce`/`ibis.greatest`/`ibis.least` (e423480)
- bigquery: implement array functions (04f5a11)
- bigquery: use sqlglot to implement functional unnest to relational unnest (167c3bd)
- clickhouse: add `read_parquet` and `read_csv` (dc2ea25)
- clickhouse: add support for `.sql` methods (f1d004b)
- clickhouse: implement builtin agg functions (eea679a)
- clickhouse: support caching tables with the `.cache()` method (621bdac)
- clickhouse: support reading parquet and csv globs (4ea1834)
- common: match and replace graph nodes (78865c0)
- datafusion: add coalesce, nullif, ifnull, zeroifnull (1cc67c9)
- datafusion: add ExtractWeekOfYear, ExtractMicrosecond, ExtractEpochSeconds (5612d48)
- datafusion: add join support (e2c143a)
- datafusion: add temporal functions (6be6c2b)
- datafusion: implement builtin agg functions (0367069)
- duckdb: expose loading extensions (2feecf7)
- examples: name examples tables according to example name (169d889)
- flink: add batch and streaming mode test fixtures for Flink backend (49485f6)
- flink: allow translation of decimal literals (52f7032)
- flink: fine-tune numeric literal translation (2f2d0d9)
- flink: implement `ops.FloorDivide` operation (95474e6)
- flink: implement a minimal PyFlink `Backend` (46d0e33)
- flink: implement insert dml (6bdec79)
- flink: implement table-related ddl in Flink backend to support streaming connectors (8dabefd)
- flink: implement translation of `NULLIFZERO` (6ad1e96)
- flink: implement translation of `ZEROIFNULL` (31560eb)
- flink: support translating typed null values (83beb7e)
- impala: implement `Backend.rename_table` (309c999)
- introduce watermarks in ibis api (eaaebb8)
- just chat to open Zulip in terminal (95e164e)
- patterns: support building sequences in replacement patterns (f320c2e)
- patterns: support building sequences in replacement patterns (beab068)
- patterns: support calling methods on builders like a variable (58b2d0e)
- polars: implement new UDF API (becbf41)
- polars: implement support for builtin aggregate udfs (c383f62)
- polars: support reading ndjson ([1bda3bd](https://g...
6.2.0 (2023-08-31)
Features
- trino: add source application to trino backend (cf5fdb9)
Bug Fixes
- bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
- bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
- release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
- trino: ensure that list_databases look at all catalogs not just the current one (cfbdbf1)
- trino: override incorrect base sqlalchemy `list_schemas` implementation (84d38a1)
Documentation
- trino: add connection docstring (507a00e)
6.1.0 (2023-08-03)
Features
- api: add `ibis.dtype` top-level API (867e5f1)
- api: add `table.nunique()` for counting unique table rows (adcd762)
- api: allow mixing literals and columns in `ibis.array` (3355dd8)
- api: improve efficiency of `__dataframe__` protocol (15e27da)
- api: support boolean literals in join API (c56376f)
- arrays: add `concat` method equivalent to `__add__`/`__radd__` (0ed0ab1)
- arrays: add `repeat` method equivalent to `__mul__`/`__rmul__` (b457c7b)
- backends: add `current_schema` API (955a9d0)
- bigquery: fill out `CREATE TABLE` DDL options including support for `overwrite` (5dac7ec)
- datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
- datafusion: add extract url fields functions (4f5ea98)
- datafusion: add functions sign, power, nullifzero, log (ef72e40)
- datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
- datafusion: implement in-memory table (d4ec5c2)
- flink: add tests and translation rules for additional operators (fc2aa5d)
- flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
- flink: implement translation rules for literal expressions in flink compiler (a8f4880)
- improved error messages when missing backend dependencies (2fe851b)
- make output of `to_sql` a proper `str` subclass (084bdb9)
- pandas: add ExtractURLField functions (e369333)
- polars: implement `ops.SelfReference` (983e393)
- pyspark: read/write delta tables (d403187)
- refactor ddl for create_database and add create_schema where relevant (d7a857c)
- sqlite: add scalar python udf support to sqlite (92f29e6)
- sqlite: implement extract url field functions (cb1956f)
- trino: implement support for `.sql` table expression method (479bc60)
- trino: support table properties when creating a table (b9d65ef)
Bug Fixes
- api: allow scalar window order keys (3d3f4f3)
- backends: make `current_database` implementation and API consistent across all backends (eeeeee0)
- bigquery: respect the fully qualified table name at the init (a25f460)
- clickhouse: check dispatching instead of membership in the registry for `has_operation` (acb7f3f)
- datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
- deps: update dependency datafusion to v27 (3a311cd)
- druid: handle conversion issues from string, binary, and timestamp (b632063)
- duckdb: avoid double escaping backslashes for bind parameters (8436f57)
- duckdb: cast read_only to string for connection (27e17d6)
- duckdb: deduplicate results from `list_schemas()` (172520e)
- duckdb: ensure that current_database returns the correct value (2039b1e)
- duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
- duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
- duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
- duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
- examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
- exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
- forward arguments through `__dataframe__` protocol (50f3be9)
- ir: change "it not a" to "is not a" in errors (d0d463f)
- memtable: implement support for translation of empty memtable (05b02da)
- mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
- mysql: pass-through kwargs to connect_args (e3f3e2d)
- ops: ensure that name attribute is always valid for `ops.SelfReference` (9068aca)
- polars: ensure that `pivot_longer` works with more than one column (822c912)
- polars: fix collect implementation (c1182be)
- postgres: by default use domain socket (e44fdfb)
- pyspark: make `has_operation` method a `@classmethod` (c1b7dbc)
- release: use @google/[email protected] to avoid module loading bug (673aab3)
- snowflake: fix broken unnest functionality (207587c)
- snowflake: reset the schema and database to the original schema after creating them (54ce26a)
- snowflake: reset to original schema when resetting the database (32ff832)
- snowflake: use `regexp_instr != 0` instead of `REGEXP` keyword (06e2be4)
- sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
- sql: handle parsing aliases ([3645cf4](3645cf4119620e8b01...
6.0.0 (2023-07-05)
⚠ BREAKING CHANGES
- imports: Use of `ibis.udf` as a module is removed. Use `ibis.legacy.udf` instead.
- The minimum supported Python version is now Python 3.9
- api: `group_by().count()` no longer automatically names the count aggregation `count`. Use `relabel` to rename columns.
- backends: `Backend.ast_schema` is removed. Use `expr.as_table().schema()` instead.
- snowflake/postgres: Postgres UDFs now use the new `@udf.scalar.python` API. This should be a low-effort replacement for the existing API.
- ir: `ops.NullLiteral` is removed
- datatypes: `dt.Interval` no longer has a default unit; `dt.interval` is removed
- deps: `snowflake-connector-python`'s lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade `snowflake-connector-python` to at least version 3.0.2.
- api: `Table.difference()`, `Table.intersection()`, and `Table.union()` now require at least one argument.
- postgres: Ibis no longer automatically defines `first`/`last` reductions on connection to the postgres backend. Use the DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the `pgxn` implementations instead.
- api: `ibis.examples.<example-name>.fetch` no longer forwards arbitrary keyword arguments to `read_csv`/`read_parquet`.
- datatypes: the `dt.Interval.value_type` attribute is removed
- api: `Table.count()` is no longer automatically named `"count"`. Use `Table.count().name("count")` to achieve the previous behavior.
- trino: The trino backend now requires at least version 0.321 of the `trino` Python package.
- backends: removed `AlchemyTable`, `AlchemyDatabase`, `DaskTable`, `DaskDatabase`, `PandasTable`, `PandasDatabase`, `PySparkDatabaseTable`; use `ops.DatabaseTable` instead
- dtypes: temporal unit enums are now available under `ibis.common.temporal` instead of `ibis.common.enums`.
- clickhouse: `external_tables` can no longer be passed in `ibis.clickhouse.connect`. Pass `external_tables` directly in `raw_sql`/`execute`/`to_pyarrow`/`to_pyarrow_batches()`.
- datatypes: `dt.Set` is now an alias for `dt.Array`
- bigquery: Previously, the ibis timestamp type mapped to the BigQuery TIMESTAMP type with no timezone support. This was incorrect: the BigQuery TIMESTAMP type carries a UTC timezone, while DATETIME is the timezone-free variant. The mapping now reflects this: an ibis timestamp with a UTC timezone maps to the BigQuery TIMESTAMP type, and an ibis timestamp without a timezone maps to the BigQuery DATETIME type.
- impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use `raw_sql` if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.
- api: `Column.first()`/`Column.last()` are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using these functions in `select`-based APIs should function unchanged.
- bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer
- ops.Hash: The `hash` method on table columns no longer accepts the `how` argument. The hashing functions available are highly backend-dependent and the intention of the hash operation is to provide a fast, consistent (on the same backend, only) integer value. If you have been passing in a value for `how`, you can remove it and you will get the same results as before, as there were no backends with multiple hash functions working.
- duckdb: Some CSV files may now have headers that did not have them previously. Set `header=False` to get the previous behavior.
- deps: New environments will have a different default setting for `compression` in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install `clickhouse-cityhash` and `lz4` to preserve the previous behavior.
- api: `Table.set_column()` is removed; use `Table.mutate(name=expr)` instead
- api: the `suffixes` argument in all join methods has been removed in favor of the `lname`/`rname` args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in `lname="{name}_x", rname="{name}_y"`.
- ir: `IntervalType.unit` is now an enum instead of a string
- type-system: Inferred types of Python objects may be slightly different. Ibis now uses `pyarrow` to infer the column types of pandas DataFrames and other types.
- backends: the `path` argument of `Backend.connect()` is removed; use the `database` argument instead
- api: removed `Table.sort_by()` and `Table.groupby()`; use `.order_by()` and `.group_by()` respectively
- datatypes: the `DataType.scalar` and `column` class attributes are now strings.
- backends: `Backend.load_data()`, `Backend.exists_database()` and `Backend.exists_table()` are removed
- ir: `Value.summary()` and `NumericValue.summary()` are removed
- schema: `Schema.merge()` is removed; use the union operator `schema1 | schema2` instead
- api: `ibis.sequence()` is removed
- drop support for Python 3.8 (747f4ca)
Features
- add dask windowing (9cb920a)
- add easy type hints to GroupBy (da330b1)
- add microsecond method to TimestampValue and TimeValue (e9df2da)
- api: add `__dataframe__` implementation (b3d9619)
- api: add ALL_CAPS option to Table.relabel (c0b30e2)
- api: add first/last reduction APIs (8c01980)
- api: add zip operation and api (fecf695)
- api: allow passing multiple keyword arguments to `ibis.interval` (22ee854)
- api: better repr and pickle support for deferred expressions (2b1ec9c)
- api: exact median (c53031c)
- api: raise better error on column name collision in joins (e04c38c)
- api: replace `suffixes` in `join` with `lname`/`rname` (3caf3a1)
- api: support abstract type names in `selectors.of_type` (f6d2d56)
- api: support list of strings and single strings in the `across` selector (a6b60e7)
- api: use `create_table` to load example data (42e09a4)
- bigquery: add client and storage_client params to connect (4cf1354)
- bigquery: enable group_concat over windows (d6a1117)
- cast: add table-level try_cast (5e4d16b)
- clickhouse: add array zip impl (efba835)
- clickhouse: move to clickhouse supported Python client (012557a)
- clickhouse: set default engine to native file (29815fa)
- clickhouse: support pyarrow decimal types (7472dd5)
- common: add a pure python egraph implementation (aed2ed0)
- common: add pattern matchers (b515d5c)
- common: add support for start parameter in StringFind (31ce741)
- common: add Topmost and Innermost pattern matchers (90b48fc)
- common: implement copy protocol for Immutable base class (e61c66b)
- create_table: support pyarrow Table in table creation (9dbb25c)
- datafusion: add string functions (66c0afb)
- datafusion: add support for scalar pyarrow UDFs (45935b7)
5.1.0
5.1.0 (2023-04-11)
Features
- api: expand `distinct` API for dropping duplicates based on column subsets (3720ea5)
- api: implement pyarrow memtables (9d4fbbd)
- api: support passing a format string to `Table.relabel` (0583959)
- api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
- backends: add more array functions (5208801)
- bigquery: make `to_pyarrow_batches()` smarter (42f5987)
- bigquery: support bignumeric type (d7c0f49)
- default repr to showing all columns in Jupyter notebooks (91a0811)
- druid: add re_search support (946202b)
- duckdb: add map operations (a4c4e77)
- duckdb: support sqlalchemy 2 (679bb52)
- mssql: implement ops.StandardDev, ops.Variance (e322f1d)
- pandas: support memtable in pandas backend (6e4d621), closes #5467
- polars: implement count distinct (aea4ccd)
- postgres: implement `ops.Arbitrary` (ee8dbab)
- pyspark: `pivot_longer` (f600c90)
- pyspark: add ArrayFilter operation (2b1301e)
- pyspark: add ArrayMap operation (e2c159c)
- pyspark: add DateDiff operation (bfd6109)
- pyspark: add partial support for interval types (067120d)
- pyspark: add read_csv, read_parquet, and register (7bd22af)
- pyspark: implement count distinct (db29e10)
- pyspark: support basic caching (ab0df7a)
- snowflake: add optional 'connect_args' param (8bf2043)
- snowflake: native pyarrow support (ce3d6a4)
- sqlalchemy: support unknown types (fde79fa)
- sqlite: implement `ops.Arbitrary` (9bcdf77)
- sql: use temp views where possible (5b9d8c0)
- table: implement `pivot_wider` API (60e7731)
- ux: move `ibis.expr.selectors` to `ibis.selectors` and deprecate for removal in 6.0 (0ae639d)
Bug Fixes
- api: disambiguate attribute errors from a missing `resolve` method (e12c4df)
- api: support filter on literal followed by aggregate (68d65c8)
- clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
- clickhouse: ensure that clickhouse depends on sqlalchemy for `make_url` usage (ea10a27)
- clickhouse: ensure that truncate works (1639914)
- clickhouse: fix `create_table` implementation (5a54489)
- clickhouse: workaround sqlglot issue with calling `match` (762f4d6)
- deps: support pandas 2.0 (4f1d9fe)
- duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
- duckdb: disable the progress bar by default (1a1892c)
- duckdb: drop use of experimental parallel csv reader (47d8b92)
- duckdb: generate `SIMILAR TO` instead of tilde to workaround sqlglot issue (434da27)
- improve typing signature of .dropna() (e11de3f)
- mssql: improve aggregation on expressions (58aa78d)
- mssql: remove invalid aggregations (1ce3ef9)
- polars: backwards compatibility for the `time_zone` and `time_unit` properties (3a2c4df)
- postgres: allow inference of unknown types (343fb37)
- pyspark: fail when aggregation contains a `having` filter (bd81a9f)
- pyspark: raise proper error when trying to generate sql (51afc13)
- snowflake: fix new array operations; remove `ArrayRemove` operation (772668b)
- snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
- snowflake: make sure pyarrow is used when possible (01f5154)
- sql: ensure that set operations resolve to a single relation (3a02965)
- sql: generate consistent `pivot_longer` semantics in the presence of multiple `unnest`s (6bc301a)
- sqlglot: work with newer versions (6f7302d)
- trino,duckdb,postgres: make cumulative `notany`/`notall` aggregations work (c2e985f)
- trino: only support `how='first'` with `arbitrary` reduction (315b5e7)
- ux: use guaranteed length-1 characters for `NULL` values (8618789)
Refactors
- api: remove explicit use of `.projection` in favor of the shorter `.select` (73df8df)
- cache: factor out ref counted cache (c816f00)
- duckdb: simplify `to_pyarrow_batches` implementation (d6235ee)
- duckdb: source loaded and installed extensions from duckdb (fb06262)
- duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
- generate uuid-based names for temp tables (a1164df)