Releases: aws/aws-sdk-pandas
AWS Data Wrangler 1.8.1
Bug Fix
- Fix NaN values handling for wr.athena.read_sql_*(). #351
Docs
- Instructions for installation in AWS Glue PySpark Jobs. #46
Thanks
We thank the following contributors/users for their work on this release:
@czagoni, @josecw, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload it and run!
AWS Data Wrangler 1.8.0
New Functionalities
- wr.s3.to_parquet() now has a max_rows_by_file argument. #283
- Support for Unix path pattern matching (*, ?, [seq], [!seq]) for any list/read/delete/copy function on S3. #322
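The four wildcards listed above follow the same conventions as Python's standard fnmatch module. The sketch below is not the library's implementation; it only illustrates how each wildcard behaves, using made-up object keys and a hypothetical helper named match_keys.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical object keys, for illustration only.
keys = [
    "data/file_1.parquet",
    "data/file_2.parquet",
    "data/file_10.parquet",
    "data/file_a.parquet",
    "data/notes.txt",
]


def match_keys(pattern: str) -> List[str]:
    """Return the keys that match a Unix shell-style pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# * matches any run of characters
star = match_keys("data/*.parquet")
# ? matches exactly one character
single = match_keys("data/file_?.parquet")
# [seq] matches one character that is in seq
in_seq = match_keys("data/file_[12].parquet")
# [!seq] matches one character that is not in seq
not_in_seq = match_keys("data/file_[!12].parquet")
```

Note that file_10 is excluded by the ? pattern because two characters follow the underscore.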
Enhancements
- Mypy applied with strict mode.
Bug Fix
- Fix unnecessary table version creation (Glue Catalog) for wr.s3.to_parquet() during appends. #342
- Fix lack of sanitization of index names for wr.s3.to_parquet/csv(). #343
Docs
- New "Who uses AWS Data Wrangler?" section!
Thanks
We thank the following contributors/users for their work on this release:
@Thiago-Dantas, @andre-marcos-perez, @ericct, @marcelo-vilela, @edvorkin, @nicholas-miles, @chrispruitt, @rparthas, @igorborgest.
AWS Data Wrangler 1.7.0
Breaking changes
- Partitioned Parquet reading now uses a different approach for push-down filters. For details, check the tutorial.
New Functionalities
- Global configuration module - TUTORIAL
- Concurrent partition writes - TUTORIAL
- Flexible Partitions Filter (PUSH-DOWN) - TUTORIAL
- Add Athena query metadata to Pandas DataFrames returned by wr.athena.read_sql_*() - TUTORIAL #331
- wr.athena.describe_table() #329
- wr.athena.show_create_table() #334
- Add path_ignore_suffix argument to all read functions #326
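As a rough illustration of the idea behind a push-down partition filter: partition values are encoded in Hive-style paths (key=value), and a user-supplied callable decides which partitions are worth reading before any file is opened. The helpers parse_partitions and filter_paths and the paths below are hypothetical and written only for this sketch; they are not the library's code.

```python
from typing import Callable, Dict, List


def parse_partitions(path: str) -> Dict[str, str]:
    """Extract Hive-style key=value partition values from an object path."""
    partitions = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value  # values stay strings, as they appear in the path
    return partitions


def filter_paths(
    paths: List[str],
    partition_filter: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Keep only the paths whose partition values satisfy the filter."""
    return [p for p in paths if partition_filter(parse_partitions(p))]


# Made-up dataset layout.
paths = [
    "s3://bucket/table/year=2019/month=12/part-0.parquet",
    "s3://bucket/table/year=2020/month=01/part-0.parquet",
    "s3://bucket/table/year=2020/month=02/part-0.parquet",
]

# The filter sees partition values as strings, so it compares against strings.
selected = filter_paths(paths, lambda x: x["year"] == "2020" and x["month"] == "02")
```

Because the decision is made from the path alone, partitions that fail the filter never need to be downloaded.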
Enhancements
- Support for PyArrow 1.0.0 #337
- Support for Pandas 1.1.0
- Support writing encrypted Redshift copy manifest to S3 #327
- wr.athena.read_sql_*() now accepts empty results #299
- Allow connect_args to be passed when creating an SQL engine from a Glue connection #309
- Add skip_header_line_count argument to wr.catalog.create_csv_table() #338
Bug Fix
- Add missing type annotations and fix types in docstrings. #321
- KeyError: 'StatementType' with Athena using max_cache_seconds #323
- wr.s3.read_csv() slow with chunksize #324
- wr.s3.read_csv() with "chunksize" does not forward pandas_kwargs "encoding" #330
- Ensure DataFrame mutability for wr.athena.read_sql_*() w/ ctas_approach=True #335
Docs
- Several small updates.
Thanks
We thank the following contributors/users for their work on this release:
@kylepierce, @davidszotten, @meganburger, @erikcw, @JPFrancoia, @zacharycarter, @DavideBossoli88, @c-line, @anand086, @jasadams, @mrtns, @schot, @koiker, @flaviomax, @bryanyang0528, @igorborgest.
AWS Data Wrangler 1.6.3
New Functionalities
- Add wr.catalog.get_partitions(). #305
Enhancements
- Improving Decimal casting.
Bug Fix
- Fix support for boto3 >= 1.14.18. 🐞 #315
Docs
- Add Spark Table Interoperability tutorial.
- General small updates.
Thanks
We thank the following contributors/users for their work on this release:
@jasadams, @bryanyang0528, @qemtek, @igorborgest.
AWS Data Wrangler 1.6.2
Enhancements
- Now casting columns before append on an existing table only if necessary (wr.s3.to_parquet()).
- Add retry mechanism for InternalError on S3 object deletion.
- Add handling of immutable NumPy arrays (flags.writeable == False).
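A retry mechanism of the kind mentioned above can be sketched as a loop with exponential backoff. This is a minimal illustration, not the library's code: TransientError, delete_with_retry, the attempt limit, and the delays are all assumptions, and the failing deletion is simulated.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for a retryable service error such as InternalError."""


def delete_with_retry(
    delete: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.01,
) -> int:
    """Call delete() until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            delete()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("max_attempts must be at least 1")


# Simulated deletion that fails twice before succeeding.
calls = {"count": 0}


def flaky_delete() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("InternalError")


attempts = delete_with_retry(flaky_delete)
```

Only the error type considered transient is retried; any other exception propagates immediately.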
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload it and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so Glue PySpark is not supported for now (only Glue Python Shell).
AWS Data Wrangler 1.6.1
Enhancements
- Casting support for any column type to string using the dtype argument on wr.s3.to_parquet()
Bug Fix
- General bugs related to Athena Cache. 🐞
Docs
- General small updates.
AWS Data Wrangler 1.6.0
New Functionalities
- Amazon Athena CACHE 🚀 #285
- Initial AWS STS module
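The general idea of a query cache can be sketched as follows: if the same SQL text was executed recently enough, return the earlier result instead of running the query again. Everything here is an assumption made for illustration: QueryCache is a hypothetical in-memory class, the clock is injectable so the example is deterministic, and the parameter name max_cache_seconds is borrowed from the 1.7.0 notes above. The real feature works against Athena, not a Python dict.

```python
import time
from typing import Callable, Dict, List, Tuple


class QueryCache:
    """Toy in-memory cache keyed by the SQL text (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def run(self, sql: str, execute: Callable[[str], str], max_cache_seconds: int = 0) -> str:
        """Return a recent cached result if allowed, otherwise execute and store."""
        now = self._clock()
        cached = self._entries.get(sql)
        if cached is not None and max_cache_seconds > 0 and now - cached[0] <= max_cache_seconds:
            return cached[1]
        result = execute(sql)
        self._entries[sql] = (now, result)
        return result


# Deterministic demonstration with a fake clock and a fake query runner.
fake_now = [0.0]
executions: List[str] = []


def fake_execute(sql: str) -> str:
    executions.append(sql)
    return f"result-{len(executions)}"


cache = QueryCache(clock=lambda: fake_now[0])
first = cache.run("SELECT 1", fake_execute, max_cache_seconds=60)   # executes
fake_now[0] = 30.0
second = cache.run("SELECT 1", fake_execute, max_cache_seconds=60)  # served from cache
fake_now[0] = 120.0
third = cache.run("SELECT 1", fake_execute, max_cache_seconds=60)   # too old, executes again
```

With max_cache_seconds left at 0, this sketch never reuses a result, so caching is opt-in.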
Enhancements
- Support for NumPy 1.19.0
- Add auto_create and db_groups arguments to get_redshift_temp_engine #288
- Add validate_schema argument to wr.s3.read_parquet_table
- Add safe argument to read_parquet #296
- Refactor naming of pandas kwargs #291
- Allow providing suffix to s3.store_parquet_metadata #295
- Add last_modified_begin and last_modified_end to list_objects, read_csv, read_json, read_fwf and read_parquet
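To illustrate what filtering by modification time means, here is a minimal sketch that selects object keys whose timestamp falls inside an optional window. The helper filter_by_last_modified and the sample objects are hypothetical, the upper bound is named last_modified_end here, and treating both bounds as inclusive is an assumption of this sketch rather than a statement about the library.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def filter_by_last_modified(
    objects: Dict[str, datetime],
    last_modified_begin: Optional[datetime] = None,
    last_modified_end: Optional[datetime] = None,
) -> List[str]:
    """Return keys whose timestamp lies within the optional [begin, end] window."""
    selected = []
    for key, modified in objects.items():
        if last_modified_begin is not None and modified < last_modified_begin:
            continue
        if last_modified_end is not None and modified > last_modified_end:
            continue
        selected.append(key)
    return selected


# Made-up objects with timezone-aware (UTC) timestamps.
objects = {
    "a.csv": datetime(2020, 6, 1, tzinfo=timezone.utc),
    "b.csv": datetime(2020, 6, 15, tzinfo=timezone.utc),
    "c.csv": datetime(2020, 7, 1, tzinfo=timezone.utc),
}

begin = datetime(2020, 6, 10, tzinfo=timezone.utc)
end = datetime(2020, 6, 30, tzinfo=timezone.utc)

recent = filter_by_last_modified(objects, last_modified_begin=begin)
window = filter_by_last_modified(objects, last_modified_begin=begin, last_modified_end=end)
```

Using timezone-aware datetimes on both sides avoids comparison errors between naive and aware values.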
Bug Fix
- Fix bug on get_table_description on tables w/o description #294
Docs
- Add Athena cache tutorial.
Thanks
We thank the following contributors/users for their work on this release:
@koiker, @patrick-muller, @flaviomax, @acere, @jarretg, @bryanyang0528, @schrobot, @kinghuang, @igorborgest.
AWS Data Wrangler 1.5.0
New Functionalities
- Amazon QuickSight support! 🎉
- Add create/delete database on wr.glue
Enhancements
- General improvements in the tutorials
- New Amazon S3 path check
- Add sanitize_columns arg for s3.to_parquet and s3.to_csv #278 #279
- Remove memory copy of DataFrame for to_parquet and to_csv
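Column-name sanitization generally means rewriting names so they are safe to use in a catalog or SQL engine. The sketch below shows one plausible rule set (lowercase snake_case, with unsupported characters replaced by underscores). It is an approximation written for illustration; sanitize_name is a hypothetical helper and the exact rules the library applies may differ.

```python
import re


def sanitize_name(name: str) -> str:
    """Convert a column name to lowercase snake_case using only [a-z0-9_]."""
    # Insert an underscore at camelCase boundaries, e.g. "userID" -> "user_ID".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace every character outside letters, digits, and underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    return name.lower()


examples = {name: sanitize_name(name) for name in ["CamelCase", "unit price", "order-id", "userID"]}
```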
Bug Fix
- Force index=False for wr.db.to_sql() with redshift
Thanks
We thank the following contributors/users for their work on this release:
@ywang103, @patrick-muller, @tuliocasagrande, @sarojdongol, @sdknij, @ilyanoskov, @igorborgest.
AWS Data Wrangler 1.4.0
New Functionalities
- Add support for reading CSV, JSON and FWF partitions. #265
Enhancements
- General improvement of moto tests
Bug Fix
- Fix encoding arg support for reading CSV, JSON and FWF. #271
Thanks
We thank the following contributors/users for their work on this release:
@bryanyang0528, @dwbelliston, @patrick-muller, @sdknij, @igorborgest.
AWS Data Wrangler 1.3.0
New Functionalities
- Support for Athena Partition Projection [TUTORIAL]
Enhancements
Bug Fix
- Fix dtype (cast) on wr.s3.to_parquet with nested types #263
- Fix EMR utilities for regions other than us-east-1 #252
- Fix wr.s3.to_parquet for partitions in reverse order #264
Thanks
We thank the following contributors/users for their work on this release:
@bryanyang0528, @zachmoshe, @buseynehannes, @jiajie999, @igorborgest.