Commit 01011c2

Merge branch 'master' into master
2 parents 7fd79e3 + 590d9a5 commit 01011c2

95 files changed, +10416 -2905 lines changed

.coveragerc

+8
@@ -0,0 +1,8 @@
+[run]
+branch = True
+source = html5lib
+
+[paths]
+source =
+    html5lib
+    .tox/*/lib/python*/site-packages/html5lib
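
In a coverage.py [paths] section the first entry is the canonical source directory and the later entries are patterns treated as aliases for it when data files are combined, so results recorded against a tox-installed copy are reported against the repository's html5lib/ directory. The following is only a rough sketch of that remapping idea: remap_path is a hypothetical helper written for illustration, not coverage.py's API, and confining each * to a single path segment is a simplifying assumption.

```python
import re

CANONICAL = "html5lib"
ALIAS = ".tox/*/lib/python*/site-packages/html5lib"


def remap_path(path, canonical=CANONICAL, alias=ALIAS):
    """Rewrite a path under the alias pattern so it points at the canonical dir."""
    # Build a regex from the glob; here '*' matches within one path segment only
    # (an assumption made for this sketch).
    regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in alias)
    match = re.match(regex + "/", path)
    if match is None:
        return path  # not under the alias, leave it untouched
    return canonical + "/" + path[match.end():]


print(remap_path(".tox/py27/lib/python2.7/site-packages/html5lib/serializer.py"))
```

Paths that do not start with the alias pattern, such as html5lib/constants.py, pass through unchanged.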

.gitignore

+75-13
@@ -1,20 +1,82 @@
-# Because we never want compiled Python
+# Copyright (c) 2014 GitHub, Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# Byte-compiled / optimized / DLL files
 __pycache__/
-*.pyc
+*.py[cod]
+*$py.class

-# Ignore stuff produced by distutils
-/build/
-/dist/
-/MANIFEST
+# C extensions
+*.so

-# Generated by parse.py -p
-stats.prof
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt

-# From cover (esp. in combination with nose)
+# Unit test / coverage reports
+htmlcov/
+.tox/
 .coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover

-# Because tox's data is inherently local
-/.tox/
+# Translations
+*.mo
+*.pot

-# We have no interest in built Sphinx files
-/doc/_build
+# Django stuff:
+*.log
+
+# Sphinx documentation
+doc/_build/
+
+# PyBuilder
+target/
+
+# Generated by parse.py -p
+stats.prof
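
The switch from *.pyc to *.py[cod] widens the ignore rule from one suffix to three (.pyc, .pyo, .pyd), and *$py.class adds Jython's compiled classes. As a rough illustration, Python's fnmatch module treats these two patterns the same way. gitignore matching is not identical to fnmatch in general, so this sketch applies to these patterns only, and the file names are made up.

```python
from fnmatch import fnmatchcase

# The two byte-code patterns added in the new .gitignore.
PATTERNS = ["*.py[cod]", "*$py.class"]

# Hypothetical file names, chosen only to exercise the patterns.
names = ["parser.pyc", "parser.pyo", "_speedups.pyd", "parser.py", "Foo$py.class"]


def is_ignored(name):
    """True if the name matches any of the byte-code patterns."""
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


ignored = [name for name in names if is_ignored(name)]
print(ignored)
```

Only parser.py survives the filter; the old *.pyc rule would have let the .pyo, .pyd and $py.class files through.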

.prospector.yaml

+21
@@ -0,0 +1,21 @@
+strictness: veryhigh
+doc-warnings: false
+test-warnings: false
+
+max-line-length: 139
+
+requirements:
+  - requirements.txt
+  - requirements-test.txt
+  - requirements-optional.txt
+
+ignore-paths:
+  - parse.py
+  - utils/
+
+python-targets:
+  - 2
+  - 3
+
+mccabe:
+  run: false

.pylintrc

+10
@@ -0,0 +1,10 @@
+[MASTER]
+ignore=tests
+
+[MESSAGES CONTROL]
+# messages up to fixme should probably be fixed somehow
+disable = redefined-builtin,attribute-defined-outside-init,anomalous-backslash-in-string,no-self-use,redefined-outer-name,bad-continuation,wrong-import-order,superfluous-parens,no-member,duplicate-code,super-init-not-called,abstract-method,property-on-old-class,wrong-import-position,no-name-in-module,no-init,bad-mcs-classmethod-argument,bad-classmethod-argument,fixme,invalid-name,import-error,too-few-public-methods,too-many-ancestors,too-many-arguments,too-many-boolean-expressions,too-many-branches,too-many-instance-attributes,too-many-locals,too-many-lines,too-many-public-methods,too-many-return-statements,too-many-statements,missing-docstring,line-too-long,locally-disabled,locally-enabled,bad-builtin,deprecated-lambda
+
+[FORMAT]
+max-line-length=139
+single-line-if-stmt=no

.pytest.expect

+1,322
Large diffs are not rendered by default.

.travis.yml

+12-18
@@ -2,36 +2,30 @@ language: python
 python:
   - "2.6"
   - "2.7"
-  - "3.2"
   - "3.3"
   - "3.4"
+  - "3.5"
   - "pypy"

+sudo: false
+
+cache: pip
+
 env:
   - USE_OPTIONAL=true
   - USE_OPTIONAL=false
-
-matrix:
-  exclude:
-    - python: "2.7"
-      env: USE_OPTIONAL=false
-    - python: "3.4"
-      env: USE_OPTIONAL=false
-  include:
-    - python: "2.7"
-      env: USE_OPTIONAL=false FLAKE=true
-    - python: "3.4"
-      env: USE_OPTIONAL=false FLAKE=true
-
-before_install:
-  - git submodule update --init --recursive
+  - SIX_VERSION=1.9 USE_OPTIONAL=true

 install:
-  - bash requirements-install.sh
+  - ./requirements-install.sh

 script:
-  - nosetests
+  - if [[ $TRAVIS_PYTHON_VERSION == pypy* ]]; then py.test; fi
+  - if [[ $TRAVIS_PYTHON_VERSION != pypy* ]]; then coverage run -m pytest; fi
   - bash flake8-run.sh

 after_script:
   - python debug-info.py
+
+after_success:
+  - if [[ $TRAVIS_PYTHON_VERSION != pypy* ]]; then coverage combine && codecov; fi
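
The new script and after_success steps branch on the interpreter: PyPy builds run py.test directly, while every other build runs the tests under coverage and then combines and uploads the data. The same decision is written out in Python below purely as an illustration; commands_for is a hypothetical helper, and Travis itself evaluates the bash conditionals in the config.

```python
from fnmatch import fnmatchcase


def commands_for(python_version):
    """Return (script, after_success) command lists for one build."""
    # Mirrors the bash test [[ $TRAVIS_PYTHON_VERSION == pypy* ]].
    if fnmatchcase(python_version, "pypy*"):
        return ["py.test", "bash flake8-run.sh"], []
    return (
        ["coverage run -m pytest", "bash flake8-run.sh"],
        ["coverage combine && codecov"],
    )


for version in ["2.7", "3.5", "pypy"]:
    script, after_success = commands_for(version)
    print(version, script, after_success)
```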

AUTHORS.rst

+15-5
@@ -16,20 +16,30 @@ Patches and suggestions
 - Lachlan Hunt
 - lantis63
 - Sam Ruby
-- Tim Fletcher
 - Thomas Broyer
+- Tim Fletcher
 - Mark Pilgrim
-- Philip Taylor
 - Ryan King
+- Philip Taylor
 - Edward Z. Yang
 - fantasai
+- Mike West
 - Philip Jägenstedt
 - Ms2ger
+- Mohammad Taha Jahangir
 - Andy Wingo
+- Juan Carlos Garcia Segovia
 - Andreas Madsack
 - Karim Valiev
-- Mohammad Taha Jahangir
-- Juan Carlos Garcia Segovia
-- Mike West
 - Marc DM
 - Ritwik Gupta
+- Tony Lopes
+- lilbludevil
+- Simon Sapin
+- Jon Dufresne
+- Drew Hubl
+- Austin Kumbera
+- Jim Baker
+- Michael[tm] Smith
+- Marc Abramowitz
+- Jon Dufresne

CHANGES.rst

+127-5
@@ -1,12 +1,134 @@
 Change Log
 ----------

-0.9999
-~~~~~~
+* Added the seamless attribute for iframes.
+
+0.999999999/1.0b10
+~~~~~~~~~~~~~~~~~~
+
+Released on July 15, 2016
+
+* Fix attribute order going to the tree builder to be document order
+  instead of reverse document order(!).
+
+
+0.99999999/1.0b9
+~~~~~~~~~~~~~~~~
+
+Released on July 14, 2016
+
+* **Added ordereddict as a mandatory dependency on Python 2.6.**
+
+* Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all``
+  extras that will do the right thing based on the specific
+  interpreter implementation.
+
+* Now requires the ``mock`` package for the testsuite.
+
+* Cease supporting DATrie under PyPy.
+
+* **Remove ``PullDOM`` support, as this hasn't ever been properly
+  tested, doesn't entirely work, and as far as I can tell is
+  completely unused by anyone.**
+
+* Move testsuite to ``py.test``.
+
+* **Fix #124: move to webencodings for decoding the input byte stream;
+  this makes html5lib compliant with the Encoding Standard, and
+  introduces a required dependency on webencodings.**
+
+* **Cease supporting Python 3.2 (in both CPython and PyPy forms).**
+
+* **Fix comments containing double-dash with lxml 3.5 and above.**
+
+* **Use scripting disabled by default (as we don't implement
+  scripting).**
+
+* **Fix #11, avoiding the XSS bug potentially caused by serializer
+  allowing attribute values to be escaped out of in old browser versions,
+  changing the quote_attr_values option on serializer to take one of
+  three values, "always" (the old True value), "legacy" (the new option,
+  and the new default), and "spec" (the old False value, and the old
+  default).**
+
+* **Fix #72 by rewriting the sanitizer to apply only to treewalkers
+  (instead of the tokenizer); as such, this will require amending all
+  callers of it to use it via the treewalker API.**
+
+* **Drop support of charade, now that chardet is supported once more.**
+
+* **Replace the charset keyword argument on parse and related methods
+  with a set of keyword arguments: override_encoding, transport_encoding,
+  same_origin_parent_encoding, likely_encoding, and default_encoding.**
+
+* **Move filters._base, treebuilder._base, and treewalkers._base to .base
+  to clarify their status as public.**
+
+* **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
+  sanitizer.htmlsanitizer module and move that to sanitizer. This means
+  anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
+  code changes.**
+
+* **Rename treewalkers.lxmletree to .etree_lxml and
+  treewalkers.genshistream to .genshi to have a consistent API.**
+
+* Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer,
+  utils) to be underscore prefixed to clarify their status as private.
+
+
+0.9999999/1.0b8
+~~~~~~~~~~~~~~~
+
+Released on September 10, 2015
+
+* Fix #195: fix the sanitizer to drop broken URLs (it threw an
+  exception between 0.9999 and 0.999999).
+
+
+0.999999/1.0b7
+~~~~~~~~~~~~~~
+
+Released on July 7, 2015
+
+* Fix #189: fix the sanitizer to allow relative URLs again (as it did
+  prior to 0.9999/1.0b5).
+
+
+0.99999/1.0b6
+~~~~~~~~~~~~~
+
+Released on April 30, 2015
+
+* Fix #188: fix the sanitizer to not throw an exception when sanitizing
+  bogus data URLs.
+
+
+0.9999/1.0b5
+~~~~~~~~~~~~
+
+Released on April 29, 2015
+
+* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
+  this sounds, this has no known security implications. No known version
+  of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
+  Chrome (1 to current), or Opera (12 to current) will run any script
+  provided in these attributes.
+
+* Pass error message to the ParseError exception in strict parsing mode.
+
+* Allow data URIs in the sanitizer, with a whitelist of content-types.
+
+* Add support for Python implementations that don't support lone
+  surrogates (read: Jython). Fixes #2.
+
+* Remove localization of error messages. This functionality was totally
+  unused (and untested that everything was localizable), so we may as
+  well follow numerous browsers in not supporting translating technical
+  strings.

-Released on XXX, 2014
+* Expose treewalkers.pprint as a public API.

-* Fix #XXX: added the seamless attribute for iframes.
+* Add a documentEncoding property to HTML5Parser, fix #121.


 0.999
@@ -126,7 +248,7 @@ Released on May 17, 2013

 * Test harness has been improved and now depends on ``nose``.

-* Documentation updated and moved to http://html5lib.readthedocs.org/.
+* Documentation updated and moved to https://html5lib.readthedocs.io/.


 0.95

MANIFEST.in

+4
@@ -1,6 +1,10 @@
 include LICENSE
+include AUTHORS.rst
 include CHANGES.rst
 include README.rst
 include requirements*.txt
+include .pytest.expect
+include tox.ini
+include pytest.ini
 graft html5lib/tests/testdata
 recursive-include html5lib/tests *.py
