Commit ade1bd3

Document surrogateescape support and enable it for bytes decoding (issue #116)
1 parent 00d5149 commit ade1bd3

9 files changed, with 76 additions and 56 deletions

README.rst

Lines changed: 4 additions & 1 deletion
@@ -53,6 +53,9 @@ Features
 ``past.utils`` selected from Py2/3 compatibility interfaces from projects
 like ``six``, ``IPython``, ``Jinja2``, ``Django``, and ``Pandas``.

+- partial support for the ``surrogateescape`` error handler when encoding and
+decoding the backported ``str`` and ``bytes`` objects. (This is currently
+in alpha.)

 .. _code-examples:

@@ -152,7 +155,7 @@ interface works like this:
     # Then, for example:
     from itertools import filterfalse, zip_longest
     from urllib.request import urlopen
-    from collections import Counter, OrderedDict # backported to Py2.6
+    from collections import Counter, OrderedDict, ChainMap # backported to Py2.6
     from collections import UserDict, UserList, UserString
     from subprocess import getoutput, getstatusoutput

docs/bytes_object.rst

Lines changed: 15 additions & 22 deletions
@@ -66,26 +66,19 @@ code incompatibilities caused by the many differences between Py3 bytes
 and Py2 strings.


-..
-.. _bytes-test-results:
-
-bytes test results
-~~~~~~~~~~~~~~~~~~
-
-For reference, when using Py2's default :class:`bytes` (i.e.
-:class:`str`), running the ``bytes`` unit tests from Python 3.3's
-``test_bytes.py`` on Py2 (after fixing imports) gives this::
-
-    --------------------------------------------------------------
-    Ran 203 tests in 0.209s
-
-    FAILED (failures=31, errors=55, skipped=1)
-    --------------------------------------------------------------
-
-Using :mod:`future`'s backported :class:`bytes` object passes most of
-the same Python 3.3 tests on Py2, except those requiring specific
-wording in exception messages.
-
-See ``future/tests/test_bytes.py`` in the source for the actual set
-of unit tests that are actually run.
+surrogateescape
+~~~~~~~~~~~~~~~
+
+The :class:`bytes` type from :mod:`builtins` also provides support for the
+``surrogateescape`` error handler on Python 2.x. Here is an example that works
+identically on Python 2.x and 3.x::

+    >>> from builtins import bytes
+    >>> b = bytes(b'\xff')
+    >>> b.decode('utf-8', 'surrogateescape')
+    '\udcff'
+
+This feature is in alpha. Please leave feedback `here
+<https://github.com/PythonCharmers/python-future/issues>`_ about whether this
+works for you.
+
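
The round trip described in the new ``surrogateescape`` section can be checked with Python 3's built-in types alone. The sketch below is not part of this commit: it uses the standard ``bytes`` and ``str`` rather than the ``future`` backports, and the sample value ``raw`` is purely illustrative.

```python
# Illustrative only: Python 3 built-ins, not the future backports.
raw = b'caf\xff'  # 0xff can never appear in valid UTF-8

# Each undecodable byte is mapped to a lone surrogate in U+DC80..U+DCFF.
text = raw.decode('utf-8', 'surrogateescape')
assert text == 'caf\udcff'

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode('utf-8', 'surrogateescape')
assert round_tripped == raw
```

The escaped code point is ``0xDC00`` plus the byte value, which is why ``b'\xff'`` maps to ``'\udcff'``.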

docs/faq.rst

Lines changed: 0 additions & 7 deletions
@@ -265,13 +265,6 @@ definitions) that greatly reduce the maintenance burden for single-source
 Py2/3 compatible code. ``future`` leverages these features and aims to
 close the remaining gap between Python 3 and 2.6 / 2.7.

-Python 2.6 does not offer the following features which help with Py3
-compatibility:
-- ``surrogateescape`` error handler for string encoding or decoding;
-- ``memoryview`` objects.
-
-Otherwise Python 2.6 is mostly supported.
-
 Python 3.2 could perhaps be supported too, although the illegal unicode
 literal ``u'...'`` syntax may be inconvenient to work around. The Py3.2
 userbase is very small, however. Please let us know via GitHub `issue #29

docs/str_object.rst

Lines changed: 15 additions & 17 deletions
@@ -84,21 +84,19 @@ same behaviours as Python 3's :class:`str`::
     >>> assert list(s) == ['A', 'B', 'C', 'D']
     >>> assert s.split('B') == ['A', 'CD']

-.. If you must ensure identical use of (unicode) strings across Py3 and Py2 in a
-.. single-source codebase, you can wrap string literals in a :func:`~str` call,
-.. as follows::
-..
-.. from __future__ import unicode_literals
-.. from future.builtins import *
-..
-.. # ...
-..
-.. s = str('This absolutely must behave like a Py3 string')
-..
-.. # ...
-..
-.. Most of the time this is unnecessary, but the stricter type-checking of the
-.. ``future.builtins.str`` object is useful for ensuring the same consistent
-.. separation between unicode and byte strings on Py2 as on Py3. This is
-.. important when writing protocol handlers, for example.
+surrogateescape
+~~~~~~~~~~~~~~~
+
+The :class:`str` type from :mod:`builtins` also provides support for the
+``surrogateescape`` error handler on Python 2.x. Here is an example that works
+identically on Python 2.x and 3.x::
+
+    >>> from builtins import str
+    >>> s = str(u'\udcff')
+    >>> s.encode('utf-8', 'surrogateescape')
+    b'\xff'
+
+This feature is in alpha. Please leave feedback `here
+<https://github.com/PythonCharmers/python-future/issues>`_ about whether this
+works for you.
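
To see why the error handler matters when encoding, it may help to compare it with the default ``strict`` handler. The sketch below is not part of this commit and assumes Python 3's built-in ``str``; the sample string ``s`` and the variable names are illustrative.

```python
# Illustrative only: Python 3 built-ins, not the future backports.
s = 'name\udcff'  # contains a lone surrogate produced by surrogateescape

# The default handler ('strict') refuses to encode lone surrogates.
try:
    s.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogateescape turns U+DCFF back into the single byte 0xff.
escaped = s.encode('utf-8', 'surrogateescape')
```

Here ``strict_ok`` ends up ``False`` and ``escaped`` is ``b'name\xff'``.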

docs/whatsnew.rst

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ New features:
 - Backport of ``itertools.count`` for Py2.6 (issue #152)
 - Add constants to ``http.client`` such as ``HTTP_PORT`` and ``BAD_REQUEST`` (issue #137)
 - Backport of ``reprlib.recursive_repr`` to Py2
+- Enable support for the ``surrogateescape`` error handler for ``newstr`` and ``newbytes`` objects on Py2.x (issue #116). This feature is currently in alpha.

 Bug fixes:

src/future/types/newbytes.py

Lines changed: 5 additions & 0 deletions
@@ -201,6 +201,11 @@ def decode(self, encoding='utf-8', errors='strict'):
         # not keyword arguments as in Python 3 str.

         from future.types.newstr import newstr
+
+        if errors == 'surrogateescape':
+            from future.utils.surrogateescape import register_surrogateescape
+            register_surrogateescape()
+
         return newstr(super(newbytes, self).decode(encoding, errors))

     # This is currently broken:
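
The hunk above registers the error handler only when a caller actually asks for it. A minimal sketch of that on-demand pattern, using only the standard ``codecs`` module, might look like the following. It is not the library's implementation: the handler name ``'demo_escape'`` and the functions ``demo_handler``, ``ensure_registered`` and ``decode_lenient`` are hypothetical, and the handler renders bad bytes as ``<xx>`` text instead of surrogates to keep the example short. It assumes Python 3.

```python
import codecs

HANDLER_NAME = 'demo_escape'  # hypothetical name, not used by future


def demo_handler(exc):
    """Replace each undecodable byte with a visible '<xx>' marker."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = ''.join('<%02x>' % byte for byte in bad)
        # Return the replacement text and the position to resume decoding at.
        return (replacement, exc.end)
    raise exc


def ensure_registered():
    """Register the handler only if it is not already known to codecs."""
    try:
        codecs.lookup_error(HANDLER_NAME)
    except LookupError:
        codecs.register_error(HANDLER_NAME, demo_handler)


def decode_lenient(data, encoding='utf-8', errors='strict'):
    # Pay the registration cost only when the special handler is requested.
    if errors == HANDLER_NAME:
        ensure_registered()
    return data.decode(encoding, errors)
```

Calling ``decode_lenient(b'a\xffb', 'utf-8', 'demo_escape')`` returns ``'a<ff>b'``, while other handlers pass straight through to ``bytes.decode``.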

src/future/utils/surrogateescape.py

Lines changed: 10 additions & 9 deletions
@@ -186,14 +186,15 @@ def register_surrogateescape():
     codecs.register_error(FS_ERRORS, surrogateescape_handler)


-if True:
-    # Tests:
-    register_surrogateescape()
-
-    b = decodefilename(fn)
-    assert b == encoded, "%r != %r" % (b, encoded)
-    c = encodefilename(b)
-    assert c == fn, '%r != %r' % (c, fn)
-    # print("ok")
+if __name__ == '__main__':
+    pass
+    # # Tests:
+    # register_surrogateescape()
+
+    # b = decodefilename(fn)
+    # assert b == encoded, "%r != %r" % (b, encoded)
+    # c = encodefilename(b)
+    # assert c == fn, '%r != %r' % (c, fn)
+    # # print("ok")


tests/test_future/test_bytes.py

Lines changed: 13 additions & 0 deletions
@@ -627,6 +627,19 @@ class MetaClass(type):
         class TestClass(with_metaclass(MetaClass, bytes)):
             pass

+    def test_surrogateescape_decoding(self):
+        """
+        Tests whether surrogateescape decoding works correctly.
+        """
+        pairs = [(u'\udcc3', b'\xc3'),
+                 (u'\udcff', b'\xff')]
+
+        for (s, b) in pairs:
+            decoded = bytes(b).decode('utf-8', 'surrogateescape')
+            self.assertEqual(s, decoded)
+            self.assertTrue(isinstance(decoded, str))
+            self.assertEqual(b, decoded.encode('utf-8', 'surrogateescape'))
+

 if __name__ == '__main__':
     unittest.main()

tests/test_future/test_str.py

Lines changed: 13 additions & 0 deletions
@@ -551,6 +551,19 @@ class MetaClass(type):
         class TestClass(with_metaclass(MetaClass, str)):
             pass

+    def test_surrogateescape_encoding(self):
+        """
+        Tests whether surrogateescape encoding works correctly.
+        """
+        pairs = [(u'\udcc3', b'\xc3'),
+                 (u'\udcff', b'\xff')]
+
+        for (s, b) in pairs:
+            encoded = str(s).encode('utf-8', 'surrogateescape')
+            self.assertEqual(b, encoded)
+            self.assertTrue(isinstance(encoded, bytes))
+            self.assertEqual(s, encoded.decode('utf-8', 'surrogateescape'))
+

 if __name__ == '__main__':
     unittest.main()
