- Improve S3 read performance by not copying buffer (PR #284, @aperiodic)
- accept bytearray and memoryview as input to write in s3 submodule (PR #293, @bmizhen-exos)
- Fix two S3 bugs (PR #307, @mpenkov)
- Minor fixes: bz2file dependency, paramiko warning handling (PR #309, @mpenkov)
- improve unit tests (PR #310, @mpenkov)
- Removed dependency on lzma (PR #262, @tdhopper)
- backward compatibility fixes (PR #294, @mpenkov)
- Minor fixes (PR #291, @mpenkov)
- Fix #289: the smart_open package now correctly exposes a `__version__` attribute
- Fix #285: handle edge case with question marks in an S3 URL
This release rolls back support for transparently decompressing .xz files, previously introduced in 1.8.1. This is a useful feature, but it requires a tricky dependency. It's still possible to handle .xz files with relatively little effort. Please see the README.rst file for details.
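As a rough sketch of the approach the README describes, .xz handling can be restored by plugging Python's built-in `lzma` module into the compressor registry mentioned further down (PR #266); this assumes `register_compressor` is exposed at the package's top level, and the bucket/key names are hypothetical:

```python
import lzma

import smart_open


def _handle_xz(file_obj, mode):
    # Wrap the underlying binary stream in an LZMAFile for transparent (de)compression
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


# Register the handler for the .xz extension via the compressor registry (PR #266)
smart_open.register_compressor('.xz', _handle_xz)

with smart_open.open('s3://hypothetical-bucket/blob.txt.xz', 'rb') as fin:
    data = fin.read()
```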
- Added support for .xz / lzma (PR #262, @vmarkovtsev)
- Added streaming HTTP support (PR #236, @handsomezebra)
- Fix handling of "+" mode, refactor tests (PR #263, @vmarkovtsev)
- Added support for SSH/SCP/SFTP (PR #58, @val314159 & @mpenkov)
- Added new feature: compressor registry (PR #266, @mpenkov)
- Implemented new `smart_open.open` function (PR #268, @mpenkov)

This new function replaces `smart_open.smart_open`, which is now deprecated.
Main differences:
- `ignore_extension` → `ignore_ext`
- new `transport_params` dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc.)
Main advantages of the new function:
- Simpler interface for the user, fewer parameters
- Greater API flexibility: adding additional keyword arguments will no longer require updating the top-level interface
- Better documentation for keyword parameters (previously, they were documented via examples only)
The old `smart_open.smart_open` function is deprecated, but continues to work as before.
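A minimal sketch of the new call style next to the old one. The S3 `session` transport parameter and all bucket, key, and profile names below are illustrative assumptions, not a definitive reference:

```python
import boto3

import smart_open

# Old, deprecated style: transport options passed as top-level keyword arguments
# fin = smart_open.smart_open('s3://hypothetical-bucket/key.txt.gz', ignore_extension=True)

# New style: transport-layer options live in a single transport_params dict
transport_params = {'session': boto3.Session(profile_name='hypothetical-profile')}
with smart_open.open('s3://hypothetical-bucket/key.txt.gz', 'rb',
                     ignore_ext=True, transport_params=transport_params) as fin:
    data = fin.read()
```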
- Add `python3.7` support (PR #240, @menshikh-iv)
- Add `http/https` schema correctly (PR #242, @gliv)
- Fix url parsing for `S3` (PR #235, @rileypeterson)
- Clean up `_parse_uri_s3x`, resolve edge cases (PR #237, @mpenkov)
- Handle leading slash in local path edge case (PR #238, @mpenkov)
- Roll back README changes (PR #239, @mpenkov)
- Add an example showing how to work with Digital Ocean Spaces and a boto profile (PR #248, @navado & @mpenkov)
- Fix boto failing to load the GCE plugin (PR #255, @menshikh-iv)
- Drop deprecated `sudo` from travis config (PR #256, @cclauss)
- Raise `ValueError` if an S3 key does not exist (PR #245, @adrpar)
- Ensure `_list_bucket` uses continuation token for subsequent pages (PR #246, @tcsavage)
- Unpin boto/botocore for regular installation. Fix #227 (PR #232, @menshikh-iv)
- Drop support for `python3.3` and `python3.4` & workaround for broken `moto` (PR #225, @menshikh-iv)
- Add `s3a://` support for `S3`. Fix #210 (PR #229, @mpenkov)
- Allow use of `@` in object (key) names for `S3`. Fix #94 (PRs #204 & #224, @dkasyanov & @mpenkov)
- Make `close` idempotent & add dummy `flush` for `S3` (PR #212, @mpenkov)
- Use built-in `open` whenever possible. Fix #207 (PR #208, @mpenkov)
- Fix undefined name `uri` in `smart_open_lib.py`. Fix #213 (PR #214, @cclauss)
- Fix new unittests from #212 (PR #219, @mpenkov)
- Reorganize README & make examples py2/py3 compatible (PR #211, @piskvorky)
- Migrate to `boto3`. Fix #43 (PR #164, @mpenkov)
- Refactor smart_open to share compression and encoding functionality (PR #185, @mpenkov)
- Drop `python2.6` compatibility. Fix #156 (PR #192, @mpenkov)
- Accept a custom `boto3.Session` instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
- Accept `multipart_upload` parameters (supports ServerSideEncryption) for `S3`. Fix (PR #202, @eschwartz)
- Add support for `pathlib.Path` (see the sketch after this list). Fix #170 (PR #175, @clintval)
- Fix performance regression using local file-system. Fix #184 (PR #190, @mpenkov)
- Replace `ParsedUri` class with functions, clean up internal argument parsing (PR #191, @mpenkov)
- Handle edge case (read 0 bytes) in read function. Fix #171 (PR #193, @mpenkov)
- Fix bug with changing `f._current_pos` when calling `f.readline()` (PR #182, @inksink)
- Close the old body explicitly after `seek` for `S3`. Fix #187 (PR #188, @inksink)
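A minimal sketch of the `pathlib.Path` support noted above (PR #175), assuming the now-deprecated `smart_open.smart_open` entry point of that era; the file path is a hypothetical placeholder:

```python
import pathlib

import smart_open

# pathlib.Path objects are accepted directly in place of a URI string
corpus = pathlib.Path('data/corpus.txt.gz')  # hypothetical local file
with smart_open.smart_open(corpus, 'rb') as fin:
    first_line = fin.readline()
```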
- Fix author/maintainer fields in `setup.py`, avoid bug from `setuptools==39.0.0` and add workaround for `botocore` and `python==3.3`. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)
- Improve S3 read performance. Fix #152 (PR #157, @mpenkov)
- Add integration testing + benchmark with real S3. Partial fix #151, #156 (PR #158, @menshikh-iv & @mpenkov)
- Disable integration testing if secure vars aren't defined (PR #157, @menshikh-iv)
- Add native .gz support for HDFS (PR #128, @yupbank)
- Drop python2.6 support + fix style (PR #137, @menshikh-iv)
- Create separate compression-specific layer. Fix #91 (PR #131, @mpenkov)
- Fix ResourceWarnings + replace deprecated assertEquals (PR #140, @horpto)
- Add encoding parameter to smart_open (see the sketch after this list). Fix #142 (PR #143, @mpenkov)
- Add encoding tests for readers. Fix #145, partial fix #146 (PR #147, @mpenkov)
- Fix file mode for updating case (PR #150, @menshikh-iv)
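A minimal sketch of the encoding parameter referenced above (PR #143), assuming the `smart_open.smart_open` entry point of that era; the URI and codec are hypothetical placeholders:

```python
import smart_open

# Open in text mode and let smart_open decode the bytes with the given codec
with smart_open.smart_open('s3://hypothetical-bucket/notes.txt', 'r', encoding='koi8-r') as fin:
    text = fin.read()
```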
- Remove GET parameters from url. Fix #120 (PR #121, @mcrowson)
- Enable compressed formats over http. Avoid filehandle leak. Fix #109 and #110. (PR #112, @robottwo)
- Make it possible to change the number of retries (PR #102, @shaform)
- Bugfix for compressed formats (PR #110, @tmylk)
- HTTP/HTTPS read support w/ Kerberos (PR #107, @robottwo)
- HdfsOpenWrite implementation similar to read (PR #106, @skibaa)
- Support custom S3 server host, port, ssl. (PR #101, @robottwo)
- Add retry around `s3_iter_bucket_process_key` to address S3 Read Timeout errors. (PR #96, @bbbco)
- Include test data in sdist + install them. (PR #105, @cournape)
- Fix #92. Allow hash in filename (PR #93, @tmylk)
- Relative path support (PR #73, @yupbank)
- Move gzipstream module to smart_open package (PR #81, @mpenkov)
- Ensure reader objects never return None (PR #81, @mpenkov)
- Ensure read functions never return more bytes than asked for (PR #84, @mpenkov)
- Add support for reading gzipped objects until EOF, e.g. read() (PR #81, @mpenkov)
- Add missing parameter to read_from_buffer call (PR #84, @mpenkov)
- Add unit tests for gzipstream (PR #84, @mpenkov)
- Bundle gzipstream to enable streaming of gzipped content from S3 (PR #73, @mpenkov)
- Update gzipstream to avoid deep recursion (PR #73, @mpenkov)
- Implemented readline for S3 (PR #73, @mpenkov)
- Added pip requirements.txt (PR #73, @mpenkov)
- Invert NO_MULTIPROCESSING flag (PR #79, @Janrain-Colin)
- Add ability to add query to webhdfs uri. (PR #78, @ellimilial)
- Accept an instance of boto.s3.key.Key to smart_open (PR #38, @asieira); see the sketch after this list
- Allow passing `encrypt_key` and other parameters to `initiate_multipart_upload` (PR #63, @asieira)
- Allow passing boto `host` and `profile_name` to smart_open (PR #71 #68, @robcowie)
- Write an empty key to S3 even if nothing is written to S3OpenWrite (PR #61, @petedmarsh)
- Support `LC_ALL=C` environment variable setup (PR #40, @nikicc)
- Python 3.5 support
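A minimal sketch of passing a `boto.s3.key.Key` instance directly, as the entry above (PR #38) describes; the bucket and key names are hypothetical, and the boto (v2) lookup is shown only for context:

```python
import boto

import smart_open

# Look up an existing key with boto (v2), then hand it straight to smart_open
key = boto.connect_s3().get_bucket('hypothetical-bucket').get_key('hypothetical-key.txt')
with smart_open.smart_open(key, 'rb') as fin:
    data = fin.read()
```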
- Bug fix release to enable 'wb+' file mode (PR #50)
- Disable multiprocessing if unavailable. Allows running on Google Compute Engine. (PR #41, @nikicc)
- Httpretty updated to allow LC_ALL=C locale config. (PR #39, @jsphpl)
- Accept an instance of boto.s3.key.Key (PR #38, @asieira)
- WebHDFS read/write (PR #29, @ziky90)
- re-upload last S3 chunk in failed upload (PR #20, @andreycizov)
- return the entire key in s3_iter_bucket instead of only the key name (PR #22, @salilb)
- pass optional keywords on S3 write (PR #30, @val314159)
- smart_open a no-op if passed a file-like object with a read attribute (PR #32, @gojomo)
- various improvements to testing (PR #30, @val314159)
- support for multistream bzip files (PR #9, @pombredanne)
- introduce this CHANGELOG