
Commit d12edee

Redo fuzzer exercises to be based around tables
The previous bug using list items was not a good example. This also requires exploring a larger search space, so we allow the browser to be reloaded using SIGHUP.
1 parent 9adbcab commit d12edee

13 files changed (+197, -405 lines)

docs/exercise-4b-hints/hint1.md

+1-1
```diff
@@ -1 +1 @@
-The bug is in the handling of a standard, common, HTML tag - specifically, an HTML tag which was *not* included in `browser.py`.
+The bug is in the handling of some standard, fairly common, HTML tags.
```

docs/exercise-4b-hints/hint2.md

+2-1
```diff
@@ -1 +1,2 @@
-Perhaps `browser.py` was feeling a bit listless.
+Although you're not allowed to look in `html_table.py`, perhaps the name of that file
+gives you some clues about what sorts of HTML tags might be involved?
```

docs/exercise-4b-hints/hint3.md

+1-1
```diff
@@ -1 +1 @@
-There are several standard HTML list tags - `ul`, `ol`, `li` (and various others).
+There are several standard HTML table tags - `tr`, `td`, `table` (and various others).
```

docs/exercise-4b-hints/hint4.md

+1-1
```diff
@@ -1 +1 @@
-There is a combination of these list-related tags which will cause the browser to crash. You should write code to generate random combinations of these tags and intervening data.
+There is a combination of these table-related tags which will cause the browser to crash. You should write code to generate random combinations of these tags and intervening data.
```

docs/exercise4a.md

+2
```diff
@@ -18,6 +18,8 @@ Any of the following counts as a success:
 Rules:
 * You can only do this by *altering the HTML content of the web page*. Remember,
 you're a website operator. You *cannot* change the browser code.
+* You *must not* look in the `html_table.py` file, because that's for
+a subsequent exercise.
 
 > [!TIP]
 > If you find a bug which makes one website look like another one,
```

docs/exercise4b.md

+4-2
```diff
@@ -6,9 +6,11 @@ the hidden security bugs in the previous exercise.
 Now you're going to write a program to find more security bugs. A program which
 finds security bugs by testing another program is called a "fuzzer".
 
+**Note**: this exercise probably only works on Linux, Mac or Chromebooks.
+
 Do this:
 
-* Do *NOT* look at the code for `src/fuzzer/browser-v2.py`. That is cheating!
+* Do *NOT* look at the code for `src/browser/html_table.py`. That is cheating!
 * Open `src/fuzzer/fuzzer.py` in VSCode and read it.
 * Run `python3 src/fuzzer/fuzzer.py`. Watch what it does.
 * Control-C to cancel it.
@@ -18,7 +20,7 @@ Now:
 1. Modify *one single number* in the `generate_testcase` function so that it
 finds one of the security bugs. Run the fuzzer again.
 2. Now, modify `generate_testcase` to find another bug which is hidden in
-`src/fuzzer/browser-v2.py`. Do *not* look at its code - that's cheating!
+`src/browser/html_table.py`. Do *not* look at its code - that's cheating!
 To be clear, this is an _extra_ security bug which wasn't in `browser.py`.
 
 ## General hints (no spoilers! Fine to read)
```

docs/for-teachers.md

+2
```diff
@@ -22,3 +22,5 @@ be run on an online Python REPL. The kids will need a real computer capable
 of running Python locally, and they'll need to be able to install a few Python
 libraries using `pip`. You should carefully run through the [setup requirements](setup.md)
 before deciding if this project is right for you.
+
+You may find [solutions at this page](solutions.md).
```

docs/solutions.md

+13
```diff
@@ -0,0 +1,13 @@
+# Solutions
+
+Example solutions (there may be others!)
+
+# Exercise 4b
+
+```
+text = ""
+num = random.randrange(0, 12)
+for x in range(0, num):
+    text += random.choice(["<tr>", "</tr>", "</td>", "<td>", "<table>", "</table>", "hello"])
+return text
+```
```
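The example solution is the body of `generate_testcase` and relies on the module-level `random`. For experimenting outside `fuzzer.py`, here is a self-contained variant of the same idea. The function name matches the exercise, but the `rng` and `max_tokens` parameters are assumptions added for this sketch: passing in a seeded `random.Random` makes it possible to regenerate a crashing testcase exactly.

```python
import random

# Same tokens as the example solution: table tags plus some plain text.
TABLE_TOKENS = ["<tr>", "</tr>", "</td>", "<td>", "<table>", "</table>", "hello"]


def generate_testcase(rng: random.Random, max_tokens: int = 12) -> str:
    """Return a random mix of table tags and text (0 to max_tokens - 1 tokens)."""
    num = rng.randrange(0, max_tokens)
    return "".join(rng.choice(TABLE_TOKENS) for _ in range(num))
```

For example, `generate_testcase(random.Random(1234))` returns the same string every time it is given a freshly seeded generator, so logging the seed is enough to reproduce a crash.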

src/browser/browser.py

+66-27
```diff
@@ -17,11 +17,13 @@
 # Simple demo python web browser. Lacks all sorts of important features.
 
 from PyQt6.QtWidgets import QApplication, QWidget, QMainWindow, QVBoxLayout, QHBoxLayout, QPushButton, QLabel, QLineEdit, QSizePolicy
-from PyQt6.QtCore import QSettings, Qt, QPoint, QSize, QEvent
+from PyQt6.QtCore import QSettings, Qt, QPoint, QSize, QSocketNotifier
 from PyQt6.QtGui import QFont, QMouseEvent, QPainter, QFontMetrics
 import requests
 import os
+import signal
 import sys
+import html_table # do not look inside this file, that would be cheating on a later exercise
 from html.parser import HTMLParser
 from urllib.parse import urlparse
 
```
```diff
@@ -37,7 +39,7 @@ class Renderer(HTMLParser, QWidget):
     the right sort of text in the right places.
     """
 
-    def __init__(self, parent=None):
+    def __init__(self, browser, parent=None):
         """
         Code which is run when we create a new Renderer.
         """
```
```diff
@@ -54,21 +56,14 @@ def __init__(self, parent=None):
         # e.g.
         # (10, 20, 50, 30, "http://foo.com")
         self.html = ""
-        self.browser = None
+        self.browser = browser
 
     def minimumSizeHint(self):
         """
         Returns the smallest possible size on the screen for our renderer.
         """
         return QSize(800, 400)
 
-    def set_browser(self, browser):
-        """
-        Remembers a reference to the browser object, so we can tell
-        the browser later when a link is clicked.
-        """
-        self.browser = browser
-
     def mouseReleaseEvent(self, event: QMouseEvent | None) -> None:
         """
         Handle a click somewhere in the renderer area. See if it
```
```diff
@@ -118,6 +113,7 @@ def paintEvent(self, event):
         self.space_needed_before_next_data = False
         self.current_link = None # if we're in a <a href=...> hyperlink
         self.known_links = list() # Links anywhere on the page
+        self.table = None # whether we're in an HTML table
         # The following call interprets all the HTML in page_html.
         # You can't see most of the code which does this because it's
         # in the library which provides the HTMLParser class. But it will
```
```diff
@@ -127,6 +123,9 @@ def paintEvent(self, event):
         # handle_data and handle_endtag depending on what's inside self.html.
         self.feed(self.html)
         self.painter = None
+        # Ignore the following two lines, they're used for exercise 4b only
+        if os.environ.get("OUTPUT_STATUS") is not None:
+            print("Rendering completed\n", flush=True)
 
     def handle_starttag(self, tag, attrs):
         """
```
```diff
@@ -139,6 +138,13 @@ def handle_starttag(self, tag, attrs):
             # Stuff inside these tags isn't actually HTML
             # to display on the screen.
             self.ignore_current_text = True
+        if self.table is not None:
+            # If we're inside a table, handle table-related tags but no others
+            if tag == 'tr':
+                self.table.handle_tr_start()
+            if tag == 'td':
+                self.table.handle_td_start()
+            return
         if tag == 'b' or tag == 'strong':
             self.is_bold = True
         if tag == 's':
```
```diff
@@ -174,12 +180,20 @@ def handle_starttag(self, tag, attrs):
             heading_number = int(tag[1])
             font_size_difference = FONT_SIZE_INCREASES_FOR_HEADERS_1_TO_6[heading_number - 1]
             self.font_size += font_size_difference
+        if tag == 'table':
+            self.table = html_table.HTMLTable()
         self.space_needed_before_next_data = True
 
     def handle_endtag(self, tag):
         """
         Handle an HTML end tag, for example </a> or </b>
         """
+        if self.table is not None:
+            # If we're inside a table, handle table end but no other tags
+            if tag == 'table':
+                self.y_pos = self.table.handle_table_end(self.y_pos, lambda x, y, content: self.draw_text(x, y, content))
+                self.table = None
+            return
         if tag == 'br' or tag == 'p': # move to a new line
             self.newline()
         if tag == 'script' or tag == 'style' or tag == 'title':
```
```diff
@@ -221,6 +235,21 @@ def handle_data(self, data):
         if self.space_needed_before_next_data:
             self.space_needed_before_next_data = False
             data = ' ' + data
+        if self.table is not None:
+            # If we're inside a table, ask our table layout code to
+            # figure out where to draw it later
+            self.table.handle_data(data)
+        else:
+            (text_width, text_height) = self.draw_text(self.x_pos, self.y_pos, data)
+            self.x_pos = self.x_pos + text_width
+            if text_height > self.tallest_text_in_previous_line:
+                self.tallest_text_in_previous_line = text_height
+
+    def draw_text(self, x_pos, y_pos, text):
+        """
+        Draw some text on the screen.
+        Returns a tuple of (x, y) space occupied
+        """
         # Work out what font we'll draw this in.
         weight = QFont.Weight.Normal
         if self.is_bold:
```
```diff
@@ -233,26 +262,24 @@ def handle_data(self, data):
         self.painter.setPen(fill)
         # Work out the size of the text we're about to draw.
         text_measurer = QFontMetrics(font)
-        text_width = int(text_measurer.horizontalAdvance(data))
+        text_width = int(text_measurer.horizontalAdvance(text))
         text_height = int(text_measurer.height())
         # Tell our GUI canvas to draw some text! The important bit!
-        self.painter.drawText(QPoint(self.x_pos, self.y_pos + text_height), data)
+        self.painter.drawText(QPoint(x_pos, y_pos + text_height), text)
         # If we're in a hyperlink, underline it and record its coordinates
         # in case it gets clicked later.
         if self.current_link is not None:
-            self.painter.drawLine(self.x_pos, self.y_pos + text_height, self.x_pos + text_width, self.y_pos + text_height)
-            self.known_links.append((self.x_pos, self.y_pos, self.x_pos + text_width, self.y_pos + text_height, self.current_link))
+            self.painter.drawLine(x_pos, y_pos + text_height, x_pos + text_width, y_pos + text_height)
+            self.known_links.append((x_pos, y_pos, x_pos + text_width, y_pos + text_height, self.current_link))
         # Strikethrough - draw a line over the text but only
         # if we don't cover more than 50% of it, we don't want it illegible
         if self.is_strikethrough:
             fraction_of_text_covered = 6 / self.font_size
             if fraction_of_text_covered <= 0.5:
-                strikethrough_line_y_pos = self.y_pos + (self.font_size / 2) - 80
-                self.canvas.create_line(self.x_pos, strikethrough_line_y_pos,
-                                        self.x_pos + text_width, strikethrough_line_y_pos)
-        self.x_pos = self.x_pos + text_width
-        if text_height > self.tallest_text_in_previous_line:
-            self.tallest_text_in_previous_line = text_height
+                strikethrough_line_y_pos = y_pos + (self.font_size / 2) - 80
+                self.canvas.create_line(x_pos, strikethrough_line_y_pos,
+                                        x_pos + text_width, strikethrough_line_y_pos)
+        return (text_width, text_height)
 
 
 class Browser(QMainWindow):
```
```diff
@@ -283,20 +310,21 @@ def __init__(self, initial_url):
         toolbar.setLayout(toolbar_layout)
         overall_layout = QVBoxLayout()
         overall_layout.addWidget(toolbar)
-        self.renderer = Renderer()
-        self.renderer.set_browser(self)
+        self.renderer = Renderer(self)
         overall_layout.addWidget(self.renderer)
         self.status_bar = QLabel("Status:")
         overall_layout.addWidget(self.status_bar)
         widget = QWidget()
         widget.setLayout(overall_layout)
         self.setCentralWidget(widget)
+        # Set up somewhere to remember the last URL the user used
         self.settings = QSettings("browser-learning", "browser")
         if initial_url is None:
             initial_url = self.settings.value("url", "https://en.wikipedia.org", type=str)
-            self.set_window_url(initial_url)
         else:
             self.navigate(initial_url)
+        self.set_window_url(initial_url)
+        self.setup_fuzzer_handling() # ignore
 
     def go_button_clicked(self):
         """
```
```diff
@@ -326,9 +354,6 @@ def set_status(self, message):
         Update the status line at the bottom of the screen
         """
         self.status_bar.setText(message)
-        # Ignore the following two lines, they're used for exercise 4b only
-        if os.environ.get("OUTPUT_STATUS") is not None:
-            print(message + "\n", flush=True)
 
     def set_window_url(self, url):
         """
```
```diff
@@ -377,6 +402,20 @@ def setup_encryption(self, url):
         elif "REQUESTS_CA_BUNDLE" in os.environ:
             del os.environ["REQUESTS_CA_BUNDLE"]
 
+    def setup_fuzzer_handling(self):
+        """
+        Ignore this function - it's used to set up
+        fuzzing for some of the later exercises.
+        """
+        self.reader, self.writer = os.pipe()
+        signal.signal(signal.SIGHUP, lambda _s, _h: os.write(self.writer, b'a'))
+        notifier = QSocketNotifier(self.reader, QSocketNotifier.Type.Read, self)
+        notifier.setEnabled(True)
+        def signal_received():
+            os.read(self.reader, 1)
+            window.go_button_clicked()
+        notifier.activated.connect(signal_received)
+
 
 #########################################
 # Main program here
```
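`setup_fuzzer_handling` uses the self-pipe technique: the signal handler only writes a byte into a pipe, and the event loop wakes up because the pipe's read end has become readable (`QSocketNotifier` watches it), so the real work happens in ordinary event-loop code. The sketch below shows the same idea using only the standard library, with a small `selectors` loop standing in for Qt's `app.exec()` so it runs without PyQt6; the function names are hypothetical. One deliberate difference: it uses `signal.set_wakeup_fd`, so the byte is written by CPython's low-level C signal handler. A Python-level handler such as the lambda above only runs once the interpreter next executes Python bytecode, which may be delayed while the event loop is blocked inside C code.

```python
import os
import selectors
import signal


def make_sighup_wakeup_pipe():
    """Return the read end of a pipe that receives one byte per SIGHUP."""
    reader, writer = os.pipe()
    # set_wakeup_fd requires a non-blocking fd; the reader is non-blocking
    # too, so draining it never stalls the loop.
    os.set_blocking(writer, False)
    os.set_blocking(reader, False)
    # Without a Python-level handler, SIGHUP's default action ends the process.
    signal.signal(signal.SIGHUP, lambda _signum, _frame: None)
    # The interpreter's C-level handler writes the signal number into `writer`.
    signal.set_wakeup_fd(writer)
    return reader


def run_event_loop(reader, on_sighup, max_events):
    """A tiny select()-based loop standing in for Qt's event loop."""
    handled = 0
    with selectors.DefaultSelector() as sel:
        sel.register(reader, selectors.EVENT_READ)
        while handled < max_events:
            sel.select()  # sleeps until the pipe has data
            try:
                data = os.read(reader, 64)
            except BlockingIOError:
                continue
            for signum in data:  # one byte per delivered signal
                if signum == signal.SIGHUP:
                    on_sighup()  # e.g. re-load the current page
                    handled += 1
```

In the browser, `on_sighup` would play the role of `signal_received`, which calls `go_button_clicked` to re-load the page.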
```diff
@@ -402,4 +441,4 @@ def setup_encryption(self, url):
 # we need to display something on the screen, along with
 # methods above like "go_button_clicked" or "mouseReleaseEvent"
 # when the user interacts with the app.
-app.exec()
+app.exec()
```

src/browser/html_table.py

+80
```diff
@@ -0,0 +1,80 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#####################################
+#####################################
+#####################################
+### DO NOT LOOK INSIDE THIS FILE! ###
+#####################################
+#####################################
+#####################################
+#####################################
+# This contains spoilers for exercise
+# 4b. Reading this code is cheating!
+#####################################
+#####################################
+#####################################
+#####################################
+
+class HTMLTable:
+    def __init__(self):
+        self.rows = list()
+
+    def handle_tr_start(self):
+        self.rows.append(list())
+
+    def handle_td_start(self):
+        if len(self.rows) == 0: # no tr was found
+            return
+        self.rows[-1].append("")
+
+    def handle_data(self, data):
+        if len(self.rows) == 0: # no tr was found
+            return
+        if len(self.rows[-1]) == 0: # no td was found
+            return
+        self.rows[-1][-1] += data
+
+    def handle_table_end(self, initial_y_pos, draw_at):
+        """
+        Draws the table, using the passed function which takes
+        x and y positions and content, draws the content,
+        and returns a tuple of (x, y) space
+        occupied.
+        Returns the y position after the table is drawn.
+        """
+        if len(self.rows) == 0:
+            return initial_y_pos
+        y_pos = initial_y_pos
+        column_widths = list()
+        first_row = True
+        # Column widths are based on the first row space
+        # occupied. A real algorithm would consider other rows.
+        for row in self.rows:
+            max_height = 0
+            if first_row:
+                first_row = False
+                for cell in row:
+                    current_x_pos = sum(column_widths)
+                    (width, height) = draw_at(current_x_pos, y_pos, cell)
+                    column_widths.append(width + 10) # padding
+                    max_height = max(max_height, height)
+            else:
+                current_x_pos = 0
+                for n, cell in enumerate(row):
+                    (_, height) = draw_at(current_x_pos, y_pos, cell)
+                    max_height = max(max_height, height)
+                    current_x_pos += column_widths[n]
+            y_pos += max_height + 10 # padding
+        return y_pos
```
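Because `handle_table_end` sizes the columns from the first row only, any later row with more `<td>` cells than the first row indexes past the end of `column_widths` and raises `IndexError`. A fuzzer that wants to predict whether a testcase should crash, without launching the browser, could model that condition. The sketch below is a hypothetical helper (`overflows_first_row` is not part of the repository). It replays the tag handling from the `browser.py` diff with the standard library's `HTMLParser`, assuming, as in that diff, that a `<table>` inside a table is ignored and that a table is only laid out when its `</table>` arrives.

```python
from html.parser import HTMLParser


class TableShapeRecorder(HTMLParser):
    """Counts cells per row the way HTMLTable does: <tr> opens a row, <td> a cell."""

    def __init__(self):
        super().__init__()
        self.tables = []  # finished tables, each a list of per-row cell counts
        self.rows = None  # the open table's rows, or None outside a table

    def handle_starttag(self, tag, attrs):
        if self.rows is None:
            if tag == "table":
                self.rows = []
            return
        # Inside a table only <tr> and <td> matter; nested <table> is ignored.
        if tag == "tr":
            self.rows.append(0)
        elif tag == "td" and self.rows:  # a <td> before any <tr> is dropped
            self.rows[-1] += 1

    def handle_endtag(self, tag):
        if self.rows is not None and tag == "table":
            self.tables.append(self.rows)  # the browser lays the table out here
            self.rows = None


def overflows_first_row(html: str) -> bool:
    """True if some laid-out table has a later row wider than its first row."""
    recorder = TableShapeRecorder()
    recorder.feed(html)
    recorder.close()
    return any(
        rows and any(cells > rows[0] for cells in rows[1:])
        for rows in recorder.tables
    )
```

For example, `overflows_first_row("<table><tr><td>a<tr><td>b<td>c</table>")` is true, while swapping the two rows makes it false.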
