Commit de6f520

tests: Add htmldocck.py script for the use of Rustdoc tests.

The script is intended as a tool for running every sort of verification applicable to Rustdoc's HTML output. For example, link checkers would go into this script. It already parses HTML into a document tree (with a slight caveat), so future tests can make use of it.

As an example, the relevant `rustdoc-*` run-make tests have been updated to use `htmldocck.py`, and their `verify.sh` scripts have been removed. In the future they may move to a dedicated directory with htmldocck running by default. A detailed explanation of test scripts is provided in the htmldocck docstring.

cc #19723
1 parent ee2bfae commit de6f520

File tree

16 files changed

+350
-99
lines changed

mk/tests.mk

Lines changed: 2 additions & 1 deletion
@@ -1011,7 +1011,8 @@ $(3)/test/run-make/%-$(1)-T-$(2)-H-$(3).ok: \
 	    $$(LD_LIBRARY_PATH_ENV_NAME$(1)_T_$(2)_H_$(3)) \
 	    "$$(LD_LIBRARY_PATH_ENV_HOSTDIR$(1)_T_$(2)_H_$(3))" \
 	    "$$(LD_LIBRARY_PATH_ENV_TARGETDIR$(1)_T_$(2)_H_$(3))" \
-	    $(1)
+	    $(1) \
+	    $$(S)
 	@touch $$@
 else
 # FIXME #11094 - The above rule doesn't work right for multiple targets

src/etc/htmldocck.py

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
+# Copyright 2015 The Rust Project Developers. See the COPYRIGHT
+# file at the top-level directory of this distribution and at
+# http://rust-lang.org/COPYRIGHT.
+#
+# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+# http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+# <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+# option. This file may not be copied, modified, or distributed
+# except according to those terms.
+
+r"""
+htmldocck.py is a custom checker script for Rustdoc HTML outputs.
+
+# How and why?
+
+The principle is simple: This script receives a path to generated HTML
+documentation and a "template" script, which has a series of check
+commands like `@has` or `@matches`. Each command can be used to check if
+some pattern is present or not present in a particular file or in
+a particular node of the HTML tree. In many cases, the template script
+happens to be the source code given to rustdoc.
+
+While it indeed is possible to test in smaller portions, it has been
+hard to construct tests in this fashion and major rendering errors were
+discovered much later. This script is designed for making the black-box
+and regression testing of Rustdoc easy. This does not preclude the need
+for unit testing, but can be used to complement related tests by quickly
+showing the expected renderings.
+
+In order to avoid one-off dependencies for this task, this script uses
+a reasonably working HTML parser and the existing XPath implementation
+from Python 2's standard library. Hopefully we won't render
+non-well-formed HTML.
+
+# Commands
+
+Commands start with an `@` followed by a command name (letters and
+hyphens), and zero or more arguments separated by whitespace characters
+and optionally delimited with single or double quotes. The `@` mark
+cannot be preceded by a non-whitespace character. Other lines (including
+all text up to the first `@`) are ignored, but it is recommended to
+avoid the use of `@` in the template file.
+
+There are a number of supported commands:
+
+* `@has PATH` checks for the existence of the given file.
+
+  `PATH` is relative to the output directory. It can be given as `-`,
+  which repeats the most recently used `PATH`.
+
+* `@has PATH PATTERN` and `@matches PATH PATTERN` check for
+  the occurrence of the given `PATTERN` in the given file. Only one
+  occurrence of the given pattern is enough.
+
+  For `@has`, `PATTERN` is a whitespace-normalized (every run of
+  consecutive whitespace is replaced by a single space character) string.
+  The entire file is also whitespace-normalized including newlines.
+
+  For `@matches`, `PATTERN` is a Python-supported regular expression.
+  The file remains intact but the regexp is matched without the `MULTILINE`
+  and `IGNORECASE` options. You can still use a prefix `(?m)` or `(?i)`
+  to enable them, and `\A` and `\Z` to match strictly
+  the beginning and end of the file.
+
+  (The same distinction applies to other variants of these commands.)
+
+* `@has PATH XPATH PATTERN` and `@matches PATH XPATH PATTERN` check for
+  the presence of the given `XPATH` in the given HTML file, and also
+  the occurrence of the given `PATTERN` in the matching node or attribute.
+  Only one occurrence of the given pattern in the match is enough.
+
+  `PATH` should be a valid and well-formed HTML file. It does *not*
+  accept arbitrary HTML5; it should have matching open and close tags
+  and correct entity references at least.
+
+  `XPATH` is an XPath expression to match. This is fairly limited:
+  `tag`, `*`, `.`, `//`, `..`, `[@attr]`, `[@attr='value']`, `[tag]`,
+  `[POS]` (element located in given `POS`), `[last()-POS]`, `text()`
+  and `@attr` (both as the last segment) are supported. Some examples:
+
+  - `//pre` or `.//pre` matches any element with a name `pre`.
+  - `//a[@href]` matches any `a` element with an `href` attribute.
+  - `//*[@class="impl"]//code` matches any element with a name `code`,
+    which is a descendant of some element whose `class` attr is `impl`.
+  - `//h1[@class="fqn"]/span[1]/a[last()]/@class` matches a value of
+    `class` attribute in the last `a` element (can be followed by more
+    elements that are not `a`) inside the first `span` in the `h1` with
+    a class of `fqn`. Note that there cannot be any additional elements
+    between them due to the use of `/` instead of `//`.
+
+  Do not try to use non-absolute paths; they won't work due to the flawed
+  ElementTree implementation. The script rejects them.
+
+  For the text matches (i.e. paths not ending with `@attr`), any
+  subelements are flattened into one string; this is handy for ignoring
+  highlights for example. If you want to simply check the presence of
+  a given node or attribute, use an empty string (`""`) as a `PATTERN`.
+
+All conditions can be negated with `!`. `@!has foo/type.NoSuch.html`
+checks if the given file does not exist, for example.
+
+"""
+
+import sys
+import os.path
+import re
+import shlex
+from collections import namedtuple
+from HTMLParser import HTMLParser
+from xml.etree import cElementTree as ET
+
+# &larrb;/&rarrb; are not in HTML 4 but are in HTML 5
+from htmlentitydefs import entitydefs
+entitydefs['larrb'] = u'\u21e4'
+entitydefs['rarrb'] = u'\u21e5'
+
+# "void elements" (no closing tag) from the HTML Standard section 12.1.2
+VOID_ELEMENTS = set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'keygen',
+                     'link', 'menuitem', 'meta', 'param', 'source', 'track', 'wbr'])
+
+# simplified HTML parser.
+# this is possible because we are dealing with very regular HTML from rustdoc;
+# we only have to deal with i) void elements and ii) empty attributes.
+class CustomHTMLParser(HTMLParser):
+    def __init__(self, target=None):
+        HTMLParser.__init__(self)
+        self.__builder = target or ET.TreeBuilder()
+    def handle_starttag(self, tag, attrs):
+        attrs = dict((k, v or '') for k, v in attrs)
+        self.__builder.start(tag, attrs)
+        if tag in VOID_ELEMENTS: self.__builder.end(tag)
+    def handle_endtag(self, tag):
+        self.__builder.end(tag)
+    def handle_startendtag(self, tag, attrs):
+        attrs = dict((k, v or '') for k, v in attrs)
+        self.__builder.start(tag, attrs)
+        self.__builder.end(tag)
+    def handle_data(self, data):
+        self.__builder.data(data)
+    def handle_entityref(self, name):
+        self.__builder.data(entitydefs[name])
+    def handle_charref(self, name):
+        code = int(name[1:], 16) if name.startswith(('x', 'X')) else int(name, 10)
+        self.__builder.data(unichr(code).encode('utf-8'))
+    def close(self):
+        HTMLParser.close(self)
+        return self.__builder.close()
+
+Command = namedtuple('Command', 'negated cmd args lineno')
+
+LINE_PATTERN = re.compile(r'(?<=(?<!\S)@)(?P<negated>!?)(?P<cmd>[A-Za-z]+(?:-[A-Za-z]+)*)(?P<args>.*)$')
+def get_commands(template):
+    with open(template, 'rUb') as f:
+        for lineno, line in enumerate(f):
+            m = LINE_PATTERN.search(line.rstrip('\r\n'))
+            if not m: continue
+
+            negated = (m.group('negated') == '!')
+            cmd = m.group('cmd')
+            args = m.group('args')
+            if args and not args[:1].isspace():
+                raise RuntimeError('Invalid template syntax at line {}'.format(lineno+1))
+            args = shlex.split(args)
+            yield Command(negated=negated, cmd=cmd, args=args, lineno=lineno+1)
+
+def _flatten(node, acc):
+    if node.text: acc.append(node.text)
+    for e in node:
+        _flatten(e, acc)
+        if e.tail: acc.append(e.tail)
+
+def flatten(node):
+    acc = []
+    _flatten(node, acc)
+    return ''.join(acc)
+
+def normalize_xpath(path):
+    if path.startswith('//'):
+        return '.' + path # avoid warnings
+    elif path.startswith('.//'):
+        return path
+    else:
+        raise RuntimeError('Non-absolute XPath is not supported due to \
+                            the implementation issue.')
+
+class CachedFiles(object):
+    def __init__(self, root):
+        self.root = root
+        self.files = {}
+        self.trees = {}
+        self.last_path = None
+
+    def resolve_path(self, path):
+        if path != '-':
+            path = os.path.normpath(path)
+            self.last_path = path
+            return path
+        elif self.last_path is None:
+            raise RuntimeError('Tried to use the previous path in the first command')
+        else:
+            return self.last_path
+
+    def get_file(self, path):
+        path = self.resolve_path(path)
+        try:
+            return self.files[path]
+        except KeyError:
+            try:
+                with open(os.path.join(self.root, path)) as f:
+                    data = f.read()
+            except Exception as e:
+                raise RuntimeError('Cannot open file {!r}: {}'.format(path, e))
+            else:
+                self.files[path] = data
+                return data
+
+    def get_tree(self, path):
+        path = self.resolve_path(path)
+        try:
+            return self.trees[path]
+        except KeyError:
+            try:
+                f = open(os.path.join(self.root, path))
+            except Exception as e:
+                raise RuntimeError('Cannot open file {!r}: {}'.format(path, e))
+            try:
+                with f:
+                    tree = ET.parse(f, CustomHTMLParser())
+            except Exception as e:
+                raise RuntimeError('Cannot parse an HTML file {!r}: {}'.format(path, e))
+            else:
+                self.trees[path] = tree
+                return self.trees[path]
+
+def check_string(data, pat, regexp):
+    if not pat:
+        return True # special case a presence testing
+    elif regexp:
+        return re.search(pat, data) is not None
+    else:
+        data = ' '.join(data.split())
+        pat = ' '.join(pat.split())
+        return pat in data
+
+def check_tree_attr(tree, path, attr, pat, regexp):
+    path = normalize_xpath(path)
+    ret = False
+    for e in tree.findall(path):
+        try:
+            value = e.attrib[attr]
+        except KeyError:
+            continue
+        else:
+            ret = check_string(value, pat, regexp)
+            if ret: break
+    return ret
+
+def check_tree_text(tree, path, pat, regexp):
+    path = normalize_xpath(path)
+    ret = False
+    for e in tree.findall(path):
+        try:
+            value = flatten(e)
+        except KeyError:
+            continue
+        else:
+            ret = check_string(value, pat, regexp)
+            if ret: break
+    return ret
+
+def check(target, commands):
+    cache = CachedFiles(target)
+    for c in commands:
+        if c.cmd == 'has' or c.cmd == 'matches': # string test
+            regexp = (c.cmd == 'matches')
+            if len(c.args) == 1 and not regexp: # @has <path> = file existence
+                try:
+                    cache.get_file(c.args[0])
+                    ret = True
+                except RuntimeError:
+                    ret = False
+            elif len(c.args) == 2: # @has/matches <path> <pat> = string test
+                ret = check_string(cache.get_file(c.args[0]), c.args[1], regexp)
+            elif len(c.args) == 3: # @has/matches <path> <pat> <match> = XML tree test
+                tree = cache.get_tree(c.args[0])
+                pat, sep, attr = c.args[1].partition('/@')
+                if sep: # attribute
+                    ret = check_tree_attr(cache.get_tree(c.args[0]), pat, attr, c.args[2], regexp)
+                else: # normalized text
+                    pat = c.args[1]
+                    if pat.endswith('/text()'): pat = pat[:-7]
+                    ret = check_tree_text(cache.get_tree(c.args[0]), pat, c.args[2], regexp)
+            else:
+                raise RuntimeError('Invalid number of @{} arguments \
+                                    at line {}'.format(c.cmd, c.lineno))
+
+        elif c.cmd == 'valid-html':
+            raise RuntimeError('Unimplemented @valid-html at line {}'.format(c.lineno))
+
+        elif c.cmd == 'valid-links':
+            raise RuntimeError('Unimplemented @valid-links at line {}'.format(c.lineno))
+
+        else:
+            raise RuntimeError('Unrecognized @{} at line {}'.format(c.cmd, c.lineno))
+
+        if ret == c.negated:
+            raise RuntimeError('@{}{} check failed at line {}'.format('!' if c.negated else '',
+                                                                      c.cmd, c.lineno))
+
+if __name__ == '__main__':
+    if len(sys.argv) < 3:
+        print >>sys.stderr, 'Usage: {} <doc dir> <template>'.format(sys.argv[0])
+        raise SystemExit(1)
+    else:
+        check(sys.argv[1], get_commands(sys.argv[2]))
+
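
The command syntax described in the docstring can be tried in isolation. The following is a minimal Python 3 sketch, not part of the commit: it reuses the `LINE_PATTERN` expression and the `shlex.split` call from `get_commands`, while `parse_line` is a hypothetical helper introduced only for illustration.

```python
import re
import shlex
from collections import namedtuple

Command = namedtuple('Command', 'negated cmd args lineno')

# Same expression as LINE_PATTERN in htmldocck.py: the `@` must not be
# preceded by a non-whitespace character.
LINE_PATTERN = re.compile(
    r'(?<=(?<!\S)@)(?P<negated>!?)'
    r'(?P<cmd>[A-Za-z]+(?:-[A-Za-z]+)*)(?P<args>.*)$')


def parse_line(line, lineno=1):
    """Hypothetical helper: parse one template line into a Command, or None."""
    m = LINE_PATTERN.search(line.rstrip('\r\n'))
    if not m:
        return None  # lines without a command are ignored
    args = m.group('args')
    if args and not args[:1].isspace():
        raise RuntimeError('Invalid template syntax at line {}'.format(lineno))
    return Command(negated=(m.group('negated') == '!'),
                   cmd=m.group('cmd'),
                   args=shlex.split(args),  # honors single and double quotes
                   lineno=lineno)


if __name__ == '__main__':
    print(parse_line("// @!has foo/fn.foo.html invisible"))
    print(parse_line("// @matches - //pre 'a quoted pattern'"))
    print(parse_line("mail user@example.com"))
```

Because `shlex.split` handles the quoting, a pattern containing spaces stays a single argument, and an `@` embedded in a word (as in an e-mail address) is not treated as a command.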

src/etc/maketest.py

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,8 @@ def convert_path_spec(name, value):
 putenv('HOST_RPATH_DIR', os.path.abspath(sys.argv[9]));
 putenv('TARGET_RPATH_DIR', os.path.abspath(sys.argv[10]));
 putenv('RUST_BUILD_STAGE', sys.argv[11])
+putenv('S', os.path.abspath(sys.argv[12]))
+putenv('PYTHON', sys.executable)
 
 if not filt in sys.argv[1]:
     sys.exit(0)

src/test/run-make/rustdoc-hidden-line/Makefile

Lines changed: 1 addition & 2 deletions
@@ -7,8 +7,7 @@ all:
 	@echo $(RUSTDOC)
 	$(HOST_RPATH_ENV) $(RUSTDOC) --test foo.rs
 	$(HOST_RPATH_ENV) $(RUSTDOC) -w html -o $(TMPDIR)/doc foo.rs
-	cp verify.sh $(TMPDIR)
-	$(call RUN,verify.sh) $(TMPDIR)
+	$(HTMLDOCCK) $(TMPDIR)/doc foo.rs
 
 else
 all:

src/test/run-make/rustdoc-hidden-line/foo.rs

Lines changed: 4 additions & 0 deletions
@@ -30,3 +30,7 @@
 /// }
 /// ```
 pub fn foo() {}
+
+// @!has foo/fn.foo.html invisible
+// @matches - //pre '#.*\[.*derive.*\(.*Eq.*\).*\].*//.*Bar'
+
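
To see how a command such as `@matches - //pre '...'` is evaluated, here is a small Python 3 sketch under stated assumptions: the `SAMPLE` markup is a made-up, well-formed stand-in rather than real rustdoc output, `matches_text` is a hypothetical helper, and `itertext()` is used as a shorthand for the script's recursive `flatten`.

```python
import re
import xml.etree.ElementTree as ET


def flatten(node):
    """Concatenate all text inside `node`, ignoring nested markup."""
    return ''.join(node.itertext())


def matches_text(root, xpath, pattern):
    """Hypothetical helper: True if any node selected by `xpath` has
    flattened text that the regular expression `pattern` matches."""
    if xpath.startswith('//'):
        xpath = '.' + xpath  # same normalization as normalize_xpath
    return any(re.search(pattern, flatten(e)) for e in root.iterfind(xpath))


# Made-up stand-in for a rendered code block with syntax highlighting.
SAMPLE = ('<html><body><pre>'
          '<span class="attribute">#[derive(PartialEq)]</span> '
          '<span class="comment">// Bar</span>\n'
          'struct Bar(Foo);</pre></body></html>')

# The pattern used by the @matches command in foo.rs.
PATTERN = r'#.*\[.*derive.*\(.*Eq.*\).*\].*//.*Bar'

if __name__ == '__main__':
    root = ET.fromstring(SAMPLE)
    print(matches_text(root, '//pre', PATTERN))
    print(matches_text(root, '//pre', r'invisible'))
```

Flattening is what lets the pattern span several highlighted `<span>` elements: the regular expression sees only the concatenated text, not the markup between the pieces.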

src/test/run-make/rustdoc-hidden-line/verify.sh

Lines changed: 0 additions & 8 deletions
This file was deleted.

src/test/run-make/rustdoc-search-index/Makefile

Lines changed: 1 addition & 3 deletions
@@ -7,9 +7,7 @@ source=index.rs
 
 all:
 	$(HOST_RPATH_ENV) $(RUSTDOC) -w html -o $(TMPDIR)/doc $(source)
-	cp $(source) $(TMPDIR)
-	cp verify.sh $(TMPDIR)
-	$(call RUN,verify.sh) $(TMPDIR)
+	$(HTMLDOCCK) $(TMPDIR)/doc $(source)
 
 else
 all:

src/test/run-make/rustdoc-search-index/index.rs

Lines changed: 4 additions & 7 deletions
@@ -10,20 +10,17 @@
 
 #![crate_name = "rustdoc_test"]
 
-// In: Foo
+// @has search-index.js Foo
 pub use private::Foo;
 
 mod private {
     pub struct Foo;
     impl Foo {
-        // In: test_method
-        pub fn test_method() {}
-        // Out: priv_method
-        fn priv_method() {}
+        pub fn test_method() {} // @has - test_method
+        fn priv_method() {} // @!has - priv_method
     }
 
     pub trait PrivateTrait {
-        // Out: priv_method
-        fn trait_method() {}
+        fn trait_method() {} // @!has - priv_method
     }
 }
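
The checks embedded in `index.rs` combine three features: whitespace-normalized `@has`, the `-` shorthand for the previous path, and negation with `!`. The Python 3 sketch below walks through that combination; it is an illustration under assumptions, with `PathMemo`, `run_checks`, and the `FILES` contents all hypothetical (the real `search-index.js` looks different).

```python
def has(data, pattern):
    """Whitespace-normalized substring test, as described for `@has`."""
    if not pattern:
        return True  # empty pattern: presence test only
    return ' '.join(pattern.split()) in ' '.join(data.split())


class PathMemo(object):
    """Hypothetical stand-in for the `-` handling in resolve_path."""
    def __init__(self):
        self.last = None

    def resolve(self, path):
        if path != '-':
            self.last = path
            return path
        if self.last is None:
            raise RuntimeError('Tried to use the previous path in the first command')
        return self.last


def run_checks(files, checks):
    """Return one pass/fail flag per (negated, path, pattern) check."""
    paths = PathMemo()
    results = []
    for negated, path, pattern in checks:
        found = has(files[paths.resolve(path)], pattern)
        # The script fails a check when the result equals the negation flag.
        results.append(found != negated)
    return results


# Made-up file contents standing in for generated output.
FILES = {'search-index.js': 'searchIndex = ["Foo",\n    "test_method"];'}
CHECKS = [
    (False, 'search-index.js', 'Foo'),  # @has search-index.js Foo
    (False, '-', 'test_method'),        # @has - test_method
    (True, '-', 'priv_method'),         # @!has - priv_method
]

if __name__ == '__main__':
    print(run_checks(FILES, CHECKS))
```

Keeping the path memo separate from the matching function mirrors the split between `CachedFiles` and `check_string` in the script, so each piece can be tested on its own.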
