Skip to content

Complete implementation of all commands, in-place editing, and optimizations #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

dspinellis
Copy link
Collaborator

@dspinellis dspinellis commented May 19, 2025

At this state sed implements all POSIX commands, and can correctly run the two complex scripts: hanoi.sed (solves the Towers of Hanoi puzzle) and math.sed (implements an arbitrary precision integer math calculator).

The performance of this Rust implementation is now better than the GNU and FreeBSD implementations for most benchmark cases.

GNU sed under Linux

               no-op-short rust is  3.04 times faster than gnu
          access-log-no-op rust is  1.91 times faster than gnu
       access-log-no-subst rust is  1.59 times faster than gnu
          access-log-subst rust is  1.32 times faster than gnu
         access-log-no-del rust is  1.54 times faster than gnu
        access-log-all-del rust is  1.99 times faster than gnu
       access-log-translit rust is 12.67 times faster than gnu
    access-log-complex-sub gnu is  4.24 times faster than rust
                 remove-cr rust is  1.68 times faster than gnu
              genome-subst rust is  1.15 times faster than gnu
                number-fix rust is  1.06 times faster than gnu
               long-script gnu is  2.19 times faster than rust
                     hanoi rust is  1.96 times faster than gnu
                 factorial rust is  1.25 times faster than gnu

FreeBSD sed

               no-op-short rust is  3.50 times faster than fbsd
          access-log-no-op rust is  2.45 times faster than fbsd
       access-log-no-subst rust is  1.38 times faster than fbsd
          access-log-subst rust is  4.02 times faster than fbsd
         access-log-no-del rust is  1.35 times faster than fbsd
        access-log-all-del rust is  4.74 times faster than fbsd
       access-log-translit fbsd is  1.14 times faster than rust
    access-log-complex-sub rust is  2.00 times faster than fbsd
                 remove-cr rust is  2.48 times faster than fbsd
              genome-subst rust is  2.21 times faster than fbsd
                number-fix fbsd is  1.04 times faster than rust
               long-script fbsd is 15.46 times faster than rust
                     hanoi rust is  9.33 times faster than fbsd
                 factorial rust is 115.72 times faster than fbsd

Remaining tasks

  • Improve runtime error reporting by including script coordinates in each command.
  • Fix buffering on terminal output to match current implementations
  • Profile lagging benchmarks to see whether there's room for further optimizations.
  • Implement more GNU extensions.

dspinellis added 15 commits May 19, 2025 19:28
According to POSIX, labels consist of portable filename characters.
Previously empty RE were saved and reused based on their occurrence
in the compiled script.

Per POSIX if an RE is empty (that is, no pattern is specified) sed shall
behave as if the last RE used in the last command applied (either as an
address or as part of a substitute command) was specified.

Now REs are saved and reused at runtime.
The previous order resulted in a duplicate label error.
regex::Regex EREs don't support back-references, so switch to
fancy_regex.

While at it remove test_compile_re_reuse_saved test case, which is no
longer valid, as REs are reused at runtime.
@dspinellis dspinellis changed the title Implement a, c, i commands and fix faults to run example scripts Implement a, c, i commands and fix faults to run sophisticated scripts May 19, 2025
fn re_or_saved_re(regex: &Option<Regex>, context: &mut ProcessingContext) -> UResult<Regex> {
match regex {
Some(re) => {
*context.saved_regex.borrow_mut() = Some(re.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to ignore the clone ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed; fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change further reduced the 50! calculation from 4.34" to 3.46" (against 10.28" of GNU sed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impressive!

Comment on lines 202 to 210
let caps = match caps {
Ok(c) => c,
Err(e) => {
return Err(USimpleError::new(
2,
format!("regular expression capture retrieval error: {}", e),
));
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you wish but you could write:

Suggested change
let caps = match caps {
Ok(c) => c,
Err(e) => {
return Err(USimpleError::new(
2,
format!("regular expression capture retrieval error: {}", e),
));
}
};
let caps = caps
.map_err(|e| USimpleError::new(
2,
format!("regular expression capture retrieval error: {}", e),
))?;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!


/// Copy the specified file to the output.
pub fn copy_file(&mut self, path: &PathBuf) -> io::Result<()> {
#[cfg(unix)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know what will happen here on !unix ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are no memory mapped data, so there's nothing to flush. (The flush is needed when we switch from coalescing and writing out memory mapped data (with zero copies) to BufWriter.

This is code for the r command, which is still WIP.

This improves performance up to 1.5 times, with no downside.

           no-op-short     rust-sed     is similarly fast as   rust-sed-f0
      access-log-no-op     rust-sed is   1.5 times faster than rust-sed-f0
   access-log-no-subst     rust-sed     is similarly fast as   rust-sed-f0
      access-log-subst     rust-sed     is similarly fast as   rust-sed-f0
     access-log-no-del     rust-sed is   1.1 times faster than rust-sed-f0
    access-log-all-del     rust-sed     is similarly fast as   rust-sed-f0
   access-log-translit     rust-sed is   1.2 times faster than rust-sed-f0
access-log-complex-sub     rust-sed     is similarly fast as   rust-sed-f0
             remove-cr     rust-sed     is similarly fast as   rust-sed-f0
          genome-subst     rust-sed     is similarly fast as   rust-sed-f0
            number-fix     rust-sed is   1.1 times faster than rust-sed-f0
           long-script     rust-sed     is similarly fast as   rust-sed-f0
                 hanoi     rust-sed     is similarly fast as   rust-sed-f0
             factorial     rust-sed     is similarly fast as   rust-sed-f0
dspinellis added 19 commits May 24, 2025 20:26
This improves performance by 10% in the following benchmark case.
Performance in all others remains the same.

access-log-subst     rust-sed is   1.1 times faster than rust-sed-eb64104
In theory, find, find_iter, captures, and captures_iter according to the
values of sub.occurrence, sub.replacement.max_group_number could allow
for better performance.
In practice, specializing for the easiest case to use find did not show
any improvement.

           no-op-short     rust-sed     is similarly fast as   rust-sed-7868f4f
      access-log-no-op     rust-sed     is similarly fast as   rust-sed-7868f4f
   access-log-no-subst     rust-sed     is similarly fast as   rust-sed-7868f4f
      access-log-subst     rust-sed     is similarly fast as   rust-sed-7868f4f
     access-log-no-del     rust-sed     is similarly fast as   rust-sed-7868f4f
    access-log-all-del     rust-sed     is similarly fast as   rust-sed-7868f4f
   access-log-translit     rust-sed     is similarly fast as   rust-sed-7868f4f
access-log-complex-sub     rust-sed     is similarly fast as   rust-sed-7868f4f
             remove-cr     rust-sed     is similarly fast as   rust-sed-7868f4f
          genome-subst     rust-sed     is similarly fast as   rust-sed-7868f4f
            number-fix     rust-sed     is similarly fast as   rust-sed-7868f4f
           long-script     rust-sed     is similarly fast as   rust-sed-7868f4f
                 hanoi     rust-sed     is similarly fast as   rust-sed-7868f4f
             factorial     rust-sed     is similarly fast as   rust-sed-7868f4f

Consequently, this commit just documents the change and its results, and
will be reverted.
This reverts commit f18f772.
No performance improvement was seen.
This improves one benchmark case by 20%:
access-log-all-del rust-sed is 1.2 times faster than rust-sed-f18f772

Unfortunatelly, all other cases remain unaffected.
The hope is that the now available regex::bytes will run find() faster
than captures_iter().  Indeed, using find() increases the performance
of several benchmark cases by 2-12%.

           no-op-short rust-sed is  1.02 times faster than rust-sed-5fad204
      access-log-no-op rust-sed is  1.01 times faster than rust-sed-5fad204
   access-log-no-subst rust-sed is  1.08 times faster than rust-sed-5fad204
      access-log-subst rust-sed is  1.12 times faster than rust-sed-5fad204
   access-log-translit rust-sed is  1.02 times faster than rust-sed-5fad204
access-log-complex-sub rust-sed is  1.04 times faster than rust-sed-5fad204
             remove-cr rust-sed is  1.12 times faster than rust-sed-5fad204
          genome-subst rust-sed is  1.02 times faster than rust-sed-5fad204
                 hanoi rust-sed is  1.01 times faster than rust-sed-5fad204

All other cases remain the same.
Performance improves significantly in the following case:

number-fix rust-sed is 1.07 times faster than rust-sed-90d60b8

Other changes <5% are likely to be noise.
- Add RE to detect specifications that require an RE engine, rather than
  a literal string match.
- Change RE unit tests to test directly the RE rather than the constructor.
- Use arrays for similar tests to avoid repetition.
Regex uses an automaton even for matching literal strings. Moving
through its state transitions is suboptimal.  This commit introduces
literal string matching with the memchr::memmem matcher, which may use
SIMD and other specialized features to speed up the search as well as
other algorithms for special sizes.

This boosts substantially the performance of several benchmark cases.

access-log-no-subst rust-86 is 2.84 times faster than rust-sed-a6
   access-log-subst rust-86 is 1.87 times faster than rust-sed-a6
  access-log-no-del rust-86 is 3.00 times faster than rust-sed-a6
 access-log-all-del rust-86 is 2.80 times faster than rust-sed-a6
          remove-cr rust-86 is 4.35 times faster than rust-sed-a6
       genome-subst rust-86 is 3.65 times faster than rust-sed-a6

It makes makes the performance of Rust better than the GNU and FreeBSD
implementations for most benchmark cases.

           no-op-short rust is  3.04 times faster than gnu
      access-log-no-op rust is  1.91 times faster than gnu
   access-log-no-subst rust is  1.59 times faster than gnu
      access-log-subst rust is  1.32 times faster than gnu
     access-log-no-del rust is  1.54 times faster than gnu
    access-log-all-del rust is  1.99 times faster than gnu
   access-log-translit rust is 12.67 times faster than gnu
access-log-complex-sub gnu is  4.24 times faster than rust
             remove-cr rust is  1.68 times faster than gnu
          genome-subst rust is  1.15 times faster than gnu
            number-fix rust is  1.06 times faster than gnu
           long-script gnu is  2.19 times faster than rust
                 hanoi rust is  1.96 times faster than gnu
             factorial rust is  1.25 times faster than gnu

           no-op-short rust is  3.50 times faster than fbsd
      access-log-no-op rust is  2.45 times faster than fbsd
   access-log-no-subst rust is  1.38 times faster than fbsd
      access-log-subst rust is  4.02 times faster than fbsd
     access-log-no-del rust is  1.35 times faster than fbsd
    access-log-all-del rust is  4.74 times faster than fbsd
   access-log-translit fbsd is  1.14 times faster than rust
access-log-complex-sub rust is  2.00 times faster than fbsd
             remove-cr rust is  2.48 times faster than fbsd
          genome-subst rust is  2.21 times faster than fbsd
            number-fix fbsd is  1.04 times faster than rust
           long-script fbsd is 15.46 times faster than rust
                 hanoi rust is  9.33 times faster than fbsd
             factorial rust is 115.72 times faster than fbsd
By using uucore's .map_err_context.
@dspinellis dspinellis changed the title Implement a, c, i commands and fix faults to run sophisticated scripts Complete implementation of all commands, in-place editing, and optimizations May 30, 2025
@dspinellis
Copy link
Collaborator Author

@sylvestre I integrated the commits for the remaining commands and in-place editing. The implementation is mostly done now.

@sylvestre
Copy link
Contributor

Excellent, I will have a look :)

@sylvestre
Copy link
Contributor

I will reply to your email too :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants