Skip to content

fix: Sort rsync include/exclude options according to specificity (#561, #1420) #1895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 84 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
84 commits
Select commit Hold shift + click to select a range
ffda525
Add test.logging with log function.
Aug 2, 2024
550b786
Add bit_config pytest fixture.
Aug 2, 2024
6d1594e
Add bit_snapshot pytest fixture.
Aug 2, 2024
3d1453e
Add test.filetree.
Aug 2, 2024
1109687
Add test_rsync_selections.
Aug 5, 2024
0e90172
Make-adjusted python paths.
Aug 14, 2024
71dedd0
Snapshots.rsyncSuffix: docstring note about extra parameters.
Aug 14, 2024
2908bcd
conftest.py: add a blank line.
Aug 16, 2024
1af7596
filetree.py: add docstrings.
Aug 16, 2024
76a0b63
filetree.py: wrap 2 long lines.
Aug 16, 2024
fc560c0
test.logging: add docstring.
Sep 16, 2024
139f6f7
test.logging: add horizontal lines.
Sep 16, 2024
d8bed28
test_rsync_selections: load the cases in the params functions.
Sep 16, 2024
d786ad6
test_rsync_selections: params_for_cases: add docstring and comments.
Sep 16, 2024
36a5c50
test_rsync_selections: combine params_for_cases and params_for_raise_…
Sep 16, 2024
fbd89a1
conftest: update for the refactoring of Config.setSnapshotsPath (67e9…
Sep 30, 2024
25d4151
test_rsync_selections: specify selection mode(s) when calling params_…
Sep 30, 2024
5217805
selection_raise_cases: remove "both-implied-dir" case.
Sep 30, 2024
a1552a3
selection_cases: only use absolute path in excludes.
Sep 30, 2024
c15ecae
selection_cases: update cases with original failing.
Sep 30, 2024
b1e03f7
selection_cases: add "various" case.
Sep 30, 2024
0140b22
snapshots.py: add Snapshots.pathSelections method.
Sep 30, 2024
61285c2
snapshots.py: use pathSelections if SELECTIONS_MODE == "sorted".
Sep 30, 2024
f635129
snapshots.py: Snapshots.takeSnapshot: save cmd.
Sep 30, 2024
c45370f
test_rsync_selections: params_for_cases: allow some conditional case …
Sep 16, 2024
625cedd
test_rsync_selections: add update_config.
Sep 16, 2024
50dacc9
test_rsync_selections: use update_config.
Sep 16, 2024
d5c5671
test_rsync_selections: remove add_includes.
Sep 16, 2024
ae4dc36
test_rsync_selections: use a subdirectory for the sample files.
Sep 16, 2024
b04df4c
test_rsync_selections: test_rsyncSuffix__raises: some log message for…
Sep 16, 2024
2b67f44
test_rsync_selections: rewrite prepend_paths, handling root directory.
Sep 16, 2024
1097253
test_rsync_selections: add root cases.
Sep 16, 2024
378edda
filetree.py: improve docstrings.
Sep 16, 2024
1b955ec
filetree.py: parse_tree: add comments.
Sep 16, 2024
4372a7d
filetree.py: parse_tree: add two descriptive variables.
Sep 16, 2024
64288cf
filetree.py: parse_tree: add more comments.
Sep 16, 2024
235b465
filetree.py: rewrite tree_from_files with sort_paths.
Sep 26, 2024
5300e67
filetree.py: import Path itself.
Sep 26, 2024
564b3cf
filetree.py: parse_tree: add parent_path variable.
Sep 26, 2024
409ada8
filetree.py: add type hints.
Sep 26, 2024
ad17012
filetree.py: parse_tree: put trailing comments on their own lines.
Sep 26, 2024
b4a9b1b
filetree.py: parse_tree: add a blank line.
Sep 26, 2024
8e7b181
filetree.py: tree_from_files: ensure consistent whitespace.
Sep 28, 2024
b85d951
test_tree_from_files: parametrize and add special cases.
Sep 28, 2024
9362ebc
test_filetree.py: add type hints.
Sep 28, 2024
549c79f
filetree.py: improve the module docstring a bit.
Sep 28, 2024
866e56e
test_logging: allow for the extra newlines and horizontal lines.
Sep 28, 2024
43fcbf9
logging, test_logging: add type hints.
Sep 29, 2024
62c698d
conftest.py: import Path itself.
Sep 29, 2024
e002054
conftest, test_fixtures: add type hints.
Sep 29, 2024
c26bbae
conftest: tidy formatting (black).
Sep 29, 2024
c081844
conftest, test_fixtures: add docstrings.
Sep 29, 2024
acbf980
logging, test_logging: add docstrings.
Sep 29, 2024
e2135e7
test_filetree.py: increase some indentation.
Sep 29, 2024
4196c00
test_rsync_selections: params_for_cases: raise for incomplete case spec.
Sep 29, 2024
6d4b965
filetree.py: improve the module docstring a bit more.
Sep 30, 2024
e7e71fe
test_rsync_selections: add the "# act" line in each test function.
Oct 1, 2024
78cdd1d
test_rsync_selections: add type hints.
Oct 2, 2024
746647b
Snapshots.pathSelections: fix docstring typo (thanks to codespell).
Oct 4, 2024
81b364e
Add a CHANGELOG entry.
Oct 4, 2024
ba5101d
filetree.py: use Union to keep Python 3.9 compatibility.
Oct 4, 2024
c9c8124
test_rsync_selections..py: use Union to keep Python 3.9 compatibility.
Oct 4, 2024
e249b99
Revert "Make-adjusted python paths."
Oct 4, 2024
52c7478
Snapshots.rsyncSuffix: specify a getattr default (as was the intent).
Oct 4, 2024
212f1b2
test_rsync_selections.py: import Path itself.
Oct 7, 2024
35fe3a5
Add YAML version of test/selection_cases.
Oct 7, 2024
34e3b52
Add ddt and PyYAML.
Oct 7, 2024
8c71db1
test_rsync_selections.py: add unittest-based test_rsyncSuffix.
Oct 7, 2024
edeb50d
Add YAML version of test/selection_raise_cases.
Oct 7, 2024
4c40070
test_rsync_selections.py: add unittest-based test_rsyncSuffix__raises.
Oct 7, 2024
f2877c8
test_rsync_selections.py: allow for cases having other flags.
Oct 7, 2024
d7b4397
test_rsync_selections.py: assert_backup: normalize the trees sooner.
Oct 7, 2024
4e24949
test_rsync_selections.py: add unittest-based cases for root dir.
Oct 7, 2024
8e92d3e
RsyncSuffixTests.test_rsyncSuffix__raises: remove extra blank line.
Oct 7, 2024
b37b6dc
test_rsync_selections.py: remove the pytest-based tests.
Oct 7, 2024
c45f482
Remove the pytest fixtures.
Oct 7, 2024
9fc504b
Remove the non-YAML selection case files.
Oct 7, 2024
169d547
test_rsync_selections.py: unify the regular and raise cases.
Oct 9, 2024
509b4d7
selection_cases: rename "simple" case.
Oct 10, 2024
9d4dd9b
Merge branch 'dev' into path_selections
buhtz Dec 29, 2024
3723a02
Merge branch 'dev' into path_selections
buhtz Feb 9, 2025
7f6e91d
Merge branch 'dev' into path_selections
buhtz Apr 26, 2025
8d51323
Merge branch 'dev' into path_selections
buhtz Apr 27, 2025
42680ad
Merge branch 'dev' into path_selections
buhtz Jun 3, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ install:
- pip install -U pip
- pip install pylint ruff flake8 pyfakefs keyring
- pip install pyqt6 dbus-python
- pip install ddt PyYAML
# add ssh public / private key pair to ensure user can start ssh session to localhost for tests
- ssh-keygen -b 2048 -t rsa -f /home/travis/.ssh/id_rsa -N ""
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Expand Down
1 change: 1 addition & 0 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ Version 1.6.0-dev (development of upcoming release)
* Removed: LICENSE file in favor of LICENSES directory and LICENSES.md file
* Fixed: Optimize subparsers and usage output (#2132)
* Fixed: Re-design about dialog (#1936)
* Fixed: Sort rsync include/exclude options according to specificity (#561, #1420)
* Fixed: Stop waking up monitor when inhibit suspend on backup starts (#714, #1090)
* Fixed: Avoid shutdown confirmation dialog on Budgie and Cinnamon desktop environments (#788)
* Fixed: Crash in "Manage profiles" dialog when using "qt6ct" (#2128)
Expand Down
155 changes: 142 additions & 13 deletions common/snapshots.py
Original file line number Diff line number Diff line change
Expand Up @@ -1428,6 +1428,8 @@ def takeSnapshot(self, sid, now, include_folders):

self.setTakeSnapshotMessage(0, _('Creating backup'))

self._rsync_cmd_args = cmd

# run rsync
proc = tools.Execute(cmd,
# TODO
Expand Down Expand Up @@ -2336,13 +2338,15 @@ def rsyncSuffix(self, includeFolders=None, excludeFolders=None):

Returns:
(list): Rsync include and exclude options.
"""
# Create exclude patterns string
rsync_exclude = self.rsyncExclude(excludeFolders)

# Create include patterns list
rsync_include, rsync_include2 = self.rsyncInclude(includeFolders)

DEREKVEIT 2024-07-31: This method is only called in one place,
self.takeSnapshot, which itself is only called in one place in
self.backup. The excludeFolders parameter is unused and could be
removed. The includeFolders parameter is always set simply from
self.config.include() in self.backup, so the includeFolders parameter
could be eliminated too by getting it from self.config here just as the
excludes are.
"""
encode = self.config.ENCODE

ret = ['--chmod=Du+wx']
Expand All @@ -2353,18 +2357,143 @@ def rsyncSuffix(self, includeFolders=None, excludeFolders=None):
encode.exclude(self.config._MOUNT_ROOT)
)
])
# TODO: fix bug #561:
# after rsync_exclude we need to explicitly include files inside
# excluded folders, recursive exclude folder-content again and finally
# add the rest from rsync_include2
ret.extend(rsync_include)
ret.extend(rsync_exclude)
ret.extend(rsync_include2)
if getattr(self.config, "SELECTIONS_MODE", "") == "sorted":
# This is ignoring the includeFolders and excludeFolders arguments.
ret.extend([f"--{option}={path}" for option, path in self.pathSelections()])
else:
rsync_exclude = self.rsyncExclude(excludeFolders)
rsync_include, rsync_include2 = self.rsyncInclude(includeFolders)

# TODO: fix bug #561:
# after rsync_exclude we need to explicitly include files inside excluded
# folders, recursive exclude folder-content again and finally add the
# rest from rsync_include2
ret.extend(rsync_include)
ret.extend(rsync_exclude)
ret.extend(rsync_include2)

# Make exclusion rather than inclusion the default.
ret.append('--exclude=*')
ret.append(encode.chroot)

return ret

def pathSelections(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just based on a gut feeling, I would say this method can be split into several methods.
Can the sorting part be separated from "self.config" and any other BIT-specific code? And than make that a module function instead of a method.

"""Return data for the --include= and --exclude= options of rsync.

Returns a list of (<option>, <path>) where <option> is "include" or
"exclude".

When rsync considers a path, the first matching include or exclude rule
is applied. The order of the options matters when there is both an
include rule and an exclude rule with a pattern that would match.

The items are reverse sorted so that rules for more-specific paths
come first, e.g.:
/var/log/My App
/var/log/My*
/var/log
/var/*/something
/var/*.log
/var/*
/var/**
/var
"""
selections = set()

# Some exclude patterns represent relative paths. For these to have
# the correct priority in the sorted paths for rsync options, they will
# be converted into absolute paths.
relative_excludes = set()
for path in self.config.exclude():
if path == "/":
selections.add(("exclude", "/**"))
elif path.startswith("/"):
selections.add(("exclude", path))
else:
relative_excludes.add(path)

include_paths = set([path for path, _ in self.config.include()])

for path, path_type in self.config.include():
is_dir = path_type == 0

selections.add(("include", path))

if is_dir:
# For concatenations with a slash below.
if path == "/":
path = ""

# For the user, including a directory implies also including
# anything in it, but rsync does not work that way, so here we
# also specify to include "anything in it".
selections.add(("include", f"{path}/**"))

# Add the relative paths as absolute paths.
for relative_path in relative_excludes:
if f"{path}/{relative_path}" not in include_paths:
selections.add(("exclude", f"{path}/{relative_path}"))
selections.add(("exclude", f"{path}/**/{relative_path}"))

# Since we exclude by default (--exclude=*), we must ensure that
# each parent directory is also explicitly included. Otherwise
# rsync will skip the parent and not find this path. These are not
# being given the <path>/** pattern as above, so this does not
# implicitly include other contents of the parent directories.
while True:
path, tail = os.path.split(path)
if not tail:
break
selections.add(("include", path))

# If the parent directory was directly excluded, let the
# directory itself be an exception to the "directory and its
# contents" implied by that, and only exclude the contents.
if ("exclude", path) in selections:
selections.remove(("exclude", path))
selections.add(("exclude", f"{path}/**"))

# Add the relative paths as absolute paths.
if path == "/":
path = ""
for relative_path in relative_excludes:
if f"{path}/{relative_path}" not in include_paths:
selections.add(("exclude", f"{path}/{relative_path}"))
selections.add(("exclude", f"{path}/**/{relative_path}"))

# Identify contradictory specifications.
include_paths = set(p for o, p in selections if o == "include")
exclude_paths = set(p for o, p in selections if o == "exclude")
conflicts = include_paths & exclude_paths
for conflict_path in list(conflicts):
if conflict_path.endswith("/**"):
# This would be the automatically-added include for a folder,
# which should give way to a more-directly specified exclude.
selections.remove(("include", conflict_path))
conflicts.remove(conflict_path)
if conflicts:
# XXX: Preferably, the config GUI would also alert the user about this.
raise ValueError(f"a path is both included and excluded: {conflicts}")

# Sort in reverse so that the more specific path rules are given first.
# For the sort key, replace wildcard characters to give them low sort
# values, corresponding to them being least specific. For example,
# "/home/user" is more specific than "/home/*".
selections = sorted(
selections,
key=lambda item: item[1].replace("**", "\x00").translate(str.maketrans({
"*": "\x01",
"?": "\x02",
"[": "\x03",
})),
reverse=True,
)

print("\n" + "\n : ".join(f"{o} {p}" for o, p in selections))

return selections

def rsyncExclude(self, excludeFolders=None):
"""Format exclude list for rsync.

Expand Down
173 changes: 173 additions & 0 deletions common/test/filetree.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
from pathlib import Path
import re
import textwrap
from typing import Iterable, Union


"""
A "tree" in this module means a multi-line text representation of files and
directories like this:

a_tree = '''
foo/
bar/
file-a
baz/
file-b
file-c
'''

This is to help make backup tests that are easy to read and write and are clear
in their expectations and results.

Features of a tree:
- Directories are distinguished by a trailing slash
- Within a directory, names must be sorted with directories before files
- Indentation increment must be 4 spaces
- Initial indentation and leading and trailing whitespace are arbitrary

Useful functions:
- files_from_tree: Create a file structure from a tree representation
- tree_from_files: Generate a tree representation from an existing file structure
- normal: Normalize the indentation and whitespace of a tree string

An example of how this might be used to test a backup procedure:
1) empty directories A and B are created.
2) tree_a is defined with a string to specify some files.
3) files_from_tree(A, tree_a)
4) a backup is made from directory A to directory B.
5) tree_b = tree_from_files(B)
6) assert tree_b == normal(tree_a)
"""


def files_from_tree(parent_dir: Union[str, Path], tree: str) -> None:
"""Create in `parent_dir` the structure described by `tree`."""
dir_paths, file_paths = parse_tree(parent_dir, tree)

for path in dir_paths:
path.mkdir()

for path in file_paths:
path.touch()


def parse_tree(parent_dir: Union[str, Path], tree: str) -> tuple[list[Path], list[Path]]:
"""Return the paths described by `tree` in `parent_dir`."""
parent_path = Path(parent_dir)

# a stack of ancestral directories at the current line
parent_dirs: list[Path] = []
# a stack of corresponding indentation strings
indents: list[str] = []
# most recent directory name in each ancestral directory
prec_dirname: dict[Path, str] = {}
# most recent file name in each ancestral directory
prec_filename: dict[Path, str] = {}

# full paths of the directories
dir_paths = []
# full paths of the files
file_paths = []

prev_filename: str = ""

for line in tree.splitlines():
if not line.strip():
continue

indent, filename = split_indent(line)

if not re.match(r"^(?: )*$", indent):
raise ValueError(f"indentation must be of 4-space increments: {line = }")

if not indents:
# first iteration
indents.append(indent)
parent_dirs.append(parent_path)
elif indent == indents[-1]:
# the same indentation level and same parent directory
pass
elif indent in indents[:-1]:
# a previous indentation level and corresponding parent directory
index = indents.index(indent)
indents = indents[: index + 1]
parent_dirs = parent_dirs[: index + 1]
elif len(indent) > len(indents[-1]):
# should be reading the contents of a directory just seen
if not prev_filename.endswith("/"):
raise ValueError(f"indentation without directory: {line = }")
indents.append(indent)
parent_dirs.append(parent_dirs[-1] / prev_filename[:-1])
else:
raise ValueError(f"inconsistent tree indentation: {line = }")

preceding_dirname_here = prec_dirname.get(parent_dirs[-1], "")
preceding_filename_here = prec_filename.get(parent_dirs[-1], "")
if filename.endswith("/"):
if not preceding_dirname_here < filename:
raise ValueError(
f"listed out of order after {preceding_dirname_here!r}: {line = }"
)
if preceding_filename_here:
raise ValueError(f"directory cannot be listed after file(s): {line = }")

# add the path for this directory
dir_paths.append(parent_dirs[-1] / filename)

prec_dirname[parent_dirs[-1]] = filename
else:
if not preceding_filename_here < filename:
raise ValueError(
f"listed out of order after {preceding_filename_here!r}: {line = }"
)

# add the path for this file
file_paths.append(parent_dirs[-1] / filename)

prec_filename[parent_dirs[-1]] = filename

prev_filename = filename

return dir_paths, file_paths


def split_indent(text: str) -> tuple[str, str]:
"""Return the indentation and the remainder of the string as 2 strings."""
mo = re.match(r"( *)(.*)", text)
indent, remainder = mo.groups() # type: ignore [union-attr]
return indent, remainder


def tree_from_files(parent_dir: Union[str, Path]) -> str:
"""Return a tree describing the contents of `parent_dir`."""
parent_path = Path(parent_dir)
initial_depth = len(parent_path.parts)
tree_lines = []
for path in sort_paths(parent_path.rglob("*")):
if path == parent_path:
continue
indent = " " * (len(path.parents) - initial_depth)
name = path.name + ("/" if path.is_dir() else "")
tree_lines.append(indent + name)

return "\n" + "\n".join(tree_lines) + "\n"


def normal(tree_string: str) -> str:
"""Normalize indentation depth and surrounding whitespace of the tree."""
return f"\n{textwrap.dedent(tree_string).strip()}\n"


def sort_paths(paths: Iterable[Path]) -> list[Path]:
"""Sort a list of paths.

Within each directory, subdirectories are listed before files.
"""
return sorted(
paths,
key=lambda path: (
[(False, part) for part in path.parts[:-1]]
+ [(not path.is_dir(), path.name)]
),
)
Loading