Generate strings matching a regular expression #64

Closed
wviterson opened this issue Nov 6, 2015 · 13 comments

@wviterson

Somewhere in the long-forgotten corners of the internet, I found Xeger (https://code.google.com/p/xeger/). It's a small wrapper around the Automaton library (http://www.brics.dk/automaton/), which luckily is available in the Maven repository.

If you give Xeger a regular expression, it will generate a random String matching that expression.
Would it be an idea to add such a feature to junit-quickcheck?

Something like the imaginary @Matches annotation in
@Theory public void hold(@ForAll @Matches(pattern="[0-9A-F]{4}") String hexValue) {...}

@pholser
Owner

pholser commented Nov 12, 2015

@wviterson This sounds like a really cool idea!

@wviterson
Author

Ok, I'm having a bit of trouble understanding the quickcheck framework... feel free to fix it, that's probably a lot easier than explaining to me...

So far, I added the automaton dependency to the pom.xml:

        <dependency>
            <groupId>dk.brics.automaton</groupId>
            <artifactId>automaton</artifactId>
            <version>1.11-8</version>
        </dependency>

expanded @interface InRange:

public @interface InRange {
    ...
    /**
     * @return a regular expression pattern to match
     */
    String matchesRegex() default "";

and copy-paste-changed the generator class (TODO: contact the original author, see if he can remember writing that class):

package com.pholser.junit.quickcheck.generator.java.lang;

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.generator.InRange;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.State;
import dk.brics.automaton.Transition;

import java.util.List;

import static com.pholser.junit.quickcheck.internal.Reflection.defaultValueOf;

/**
 * <p>Produces values for theory parameters of type {@link String} that adhere to a given regular expression.
 *
 * This class is heavily inspired by https://code.google.com/p/xeger/
 * </p>
 */
public class RegexGenerator extends Generator<String> {
    private String pattern = (String)defaultValueOf(InRange.class, "matchesRegex");
    private Automaton automaton;

    public RegexGenerator() {
        super(String.class);
    }

    /**
     * Tells this generator to produce values that match the given regular expression
     * @param range annotation that gives the range's constraints
     */
    public void configure(InRange range) {
        pattern = range.matchesRegex().isEmpty()?".*":range.matchesRegex();
        automaton = new RegExp(pattern).toAutomaton();
    }

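    @Override public String generate(SourceOfRandomness random, GenerationStatus status) {
        if (automaton == null) {
            // configure(InRange) may never be called (e.g. no @InRange on the parameter);
            // fall back to the default pattern instead of failing with a NullPointerException.
            automaton = new RegExp(pattern.isEmpty() ? ".*" : pattern).toAutomaton();
        }
        StringBuilder builder = new StringBuilder();
        generate(builder, automaton.getInitialState(), random);
        return builder.toString();
    }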

    private void generate(StringBuilder builder, State state, SourceOfRandomness random) {
        List<Transition> transitions = state.getSortedTransitions(true);
        if (transitions.size() == 0) {
            assert state.isAccept();
            return;
        }
        int nroptions = state.isAccept() ? transitions.size() : transitions.size() - 1;
        int option = random.nextInt(0, nroptions);
        if (state.isAccept() && option == 0) {          // 0 is considered stop
            return;
        }
        // Moving on to next transition
        Transition transition = transitions.get(option - (state.isAccept() ? 1 : 0));
        char c = (char) random.nextInt(transition.getMin(), transition.getMax());
        builder.append(c);
        generate(builder, transition.getDest(), random);
    }
}
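
A minimal sketch of the same random walk, for anyone who wants to try the idea without the Automaton dependency. The classes WalkState, WalkState.Transition and RegexWalkSketch are hypothetical stand-ins, the automaton for ab*c+d? is built by hand rather than compiled from the pattern, and the walk is written as a loop instead of recursion. It only illustrates the technique and is not Xeger's or junit-quickcheck's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for dk.brics.automaton's State/Transition,
// so that the sketch runs on the JDK alone.
class WalkState {
    static class Transition {
        final char min;
        final char max;
        final WalkState dest;

        Transition(char min, char max, WalkState dest) {
            this.min = min;
            this.max = max;
            this.dest = dest;
        }
    }

    final boolean accept;
    final List<Transition> transitions = new ArrayList<>();

    WalkState(boolean accept) {
        this.accept = accept;
    }

    void on(char min, char max, WalkState dest) {
        transitions.add(new Transition(min, max, dest));
    }
}

class RegexWalkSketch {
    // Hand-built automaton for ab*c+d? (assumption: written out manually for this sketch).
    static WalkState abStarCPlusDOpt() {
        WalkState s0 = new WalkState(false);
        WalkState s1 = new WalkState(false);
        WalkState s2 = new WalkState(true);
        WalkState s3 = new WalkState(true);
        s0.on('a', 'a', s1);
        s1.on('b', 'b', s1);
        s1.on('c', 'c', s2);
        s2.on('c', 'c', s2);
        s2.on('d', 'd', s3);
        return s0;
    }

    static String generate(WalkState start, Random random) {
        StringBuilder builder = new StringBuilder();
        WalkState state = start;
        while (true) {
            int n = state.transitions.size();
            if (n == 0) {
                return builder.toString(); // dead end, which is an accept state here
            }
            // In an accept state, option 0 means "stop"; the rest pick a transition.
            int options = state.accept ? n + 1 : n;
            int option = random.nextInt(options);
            if (state.accept && option == 0) {
                return builder.toString();
            }
            WalkState.Transition t = state.transitions.get(state.accept ? option - 1 : option);
            // Pick a character uniformly from the transition's inclusive range.
            builder.append((char) (t.min + random.nextInt(t.max - t.min + 1)));
            state = t.dest;
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(generate(abStarCPlusDOpt(), random));
        }
    }
}
```

Every string produced this way ends in an accept state, so it matches the pattern by construction; the chance of stopping at each accept state determines how long the strings tend to be.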

And then the part where it should all come together: a test class

@RunWith(Theories.class)
public class RegexGeneratorTest {
    @Theory
    public void hold(@ForAll @InRange(matchesRegex = "ab*c+d?") String pattern) {
        System.out.println(pattern);
        assertTrue(pattern.matches("ab*c+d?"));
    }
}

fails with java.lang.IllegalArgumentException: Cannot find generator for com.pholser.junit.quickcheck.generator.java.lang.RegexGeneratorTest.hold:arg0 of type java.lang.String 😢

@pholser
Owner

pholser commented Nov 13, 2015

I think your best bet would be to write and use a custom generator. Maybe something like this?

public class Structured extends Generator<String> {
    private Matching matching;

    public Structured() {
        super(String.class);
    }

    @Override public String generate(
        SourceOfRandomness random,
        GenerationStatus status) {

        Xeger x = new Xeger(matching != null ? matching.value() : ".*");
        return x.generate();
    }

    public void configure(Matching matching) {
        this.matching = matching;
    }

    @Target({ PARAMETER, FIELD, ANNOTATION_TYPE, TYPE_USE })
    @Retention(RUNTIME)
    @GeneratorConfiguration
    public @interface Matching {
        String value();
    }
}

@RunWith(Theories.class)
public class XegerTest {
    @Theory public void holds(
        @ForAll @From(Structured.class) @Matching("ab*c+d?") String pattern) {

        System.out.println(pattern);
        assertTrue(pattern.matches("ab*c+d?"));
    }
}

@pholser
Owner

pholser commented Nov 13, 2015

@wviterson Your custom generator looks ok; to get junit-quickcheck to use it, you can add the marker @From(RegexGenerator.class) to the theory parameter:

@Theory public void hold(
    @ForAll @From(RegexGenerator.class) @InRange(matchesRegex = "ab*c+d?") String pattern) {

    System.out.println(pattern);
    assertTrue(pattern.matches("ab*c+d?"));
}

You can also package your generator in a ServiceLoader-style JAR file and make that JAR available on the class path; junit-quickcheck will pick it up and can then use your generator on theory parameters of type String. One potential downside to this approach is that custom generators for types already covered by junit-quickcheck-generators-*.jar complement the existing generators rather than overriding them. This means that the "built-in" String generator would still be eligible on any given generation, even though it may not support @InRange.

At some point, I may revisit how generators are chosen for a parameter, so that only generators that support the configuration annotations on the parameter are considered. Right now, if junit-quickcheck tries to configure a generator that can produce instances of a given type with annotations it doesn't understand, it silently ignores the configuration but still allows the generator to generate values for the parameter.

Hope this helps! Thanks for your interest!

@wviterson
Author

Thank you, the @From works. I can also use InRange.format(), so this new generator does not require any change to the InRange class:

    @Theory
    public void hold(@ForAll @From(RegexGenerator.class) @InRange(format = "ab*c+d?") String pattern) {
        System.out.println(pattern);
        assertTrue(pattern.matches("ab*c+d?"));
    }

Do you want the code somewhere in the codebase?

@wviterson
Author

The TODO: I found the original author:

@wspringer: Thank you for writing xeger back in 2009. You inspired us!
As you can see, we had to copy-paste-modify the code (use a different random generator, avoid manual downloading in a local Maven repository).

Strictly speaking, that's against the Xeger license. Do you mind if we use the code that way?

Thanks,
Walter

@pholser
Owner

pholser commented Nov 17, 2015

@wviterson I'd like to hold off on including RegexGenerator as part of junit-quickcheck-generators, and re-evaluate after I have more feedback on what to do re: multiple generators that can satisfy a theory parameter type.

For example, consider this theory:

@Theory public void x(@ForAll @InRange(format = "ab*c+d?") String s) {
    // ...
}

With RegexGenerator as part of junit-quickcheck-generators, there would be two generators that could satisfy parameter s based on its type. On every generation, one of them would be chosen at random to generate a value for s; but RegexGenerator would honor the @InRange marker and StringGenerator would not. This is probably a mistake in how junit-quickcheck decides what generators are applicable -- it should choose from only those generators that can satisfy the theory parameter's type and will honor all of the configuration annotations on the parameter.
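
A rough sketch of that selection rule, using plain reflection. Everything named here (SketchFormat, RegexLikeGenerator, PlainStringGenerator, GeneratorFilterSketch) is hypothetical, and the check assumes a generator signals support for an annotation by declaring a public configure method that takes it, as RegexGenerator does above. junit-quickcheck's actual lookup may well differ.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a configuration annotation such as @InRange.
@Retention(RetentionPolicy.RUNTIME)
@interface SketchFormat {
    String value();
}

// Hypothetical generator that understands @SketchFormat.
class RegexLikeGenerator {
    public void configure(SketchFormat format) {
        // would compile format.value() into an automaton
    }
}

// Hypothetical generator with no configure method at all.
class PlainStringGenerator {
}

class GeneratorFilterSketch {
    // True if the generator type declares a public configure(configType) method.
    static boolean supports(Class<?> generatorType, Class<? extends Annotation> configType) {
        try {
            generatorType.getMethod("configure", configType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Keeps only the candidates that can honor every configuration annotation.
    static List<Class<?>> applicable(
        List<Class<?>> candidates,
        List<Class<? extends Annotation>> configTypes) {

        List<Class<?>> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            boolean honorsAll = true;
            for (Class<? extends Annotation> configType : configTypes) {
                if (!supports(candidate, configType)) {
                    honorsAll = false;
                    break;
                }
            }
            if (honorsAll) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

With both hypothetical generators as candidates and SketchFormat as the only configuration annotation, applicable keeps just RegexLikeGenerator; with no configuration annotations, both remain eligible, which matches today's behavior for unannotated parameters.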

@wviterson
Author

Ok, then I'll just use my custom generator for now... on the few 😦 projects that have Java 8 'already'

@pholser pholser closed this as completed Nov 19, 2015
@DeepSpawn

As of 1.0.7 the serviceloader plugin supports includes/excludes. This means we could add a RegexGenerator to junit-quickcheck-generators but not have it get picked up as a string generator automatically.

Would you be interested in a PR adding a regex generator that requires an explicit @From to use?

@pholser
Owner

pholser commented Aug 22, 2016

@DeepSpawn Absolutely I would! Thanks.

Another option for forcing @From is to have RegexGenerator override method canRegisterAsType(Class<?>) to answer false always.
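
To illustrate the effect of that override, here is a toy model. SketchGenerator, SketchStringGenerator, SketchRegexGenerator and SketchRegistry are all hypothetical stand-ins; only the method name canRegisterAsType comes from the comment above, and the registry logic is an assumption about how such a hook could be consulted, not junit-quickcheck's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generator base class; only the one hook is modeled.
abstract class SketchGenerator<T> {
    final Class<T> type;

    SketchGenerator(Class<T> type) {
        this.type = type;
    }

    // By default a generator is willing to be picked automatically for its type.
    boolean canRegisterAsType(Class<?> candidate) {
        return true;
    }
}

class SketchStringGenerator extends SketchGenerator<String> {
    SketchStringGenerator() {
        super(String.class);
    }
}

class SketchRegexGenerator extends SketchGenerator<String> {
    SketchRegexGenerator() {
        super(String.class);
    }

    // Opt out of automatic selection; an explicit marker would be needed to use it.
    @Override boolean canRegisterAsType(Class<?> candidate) {
        return false;
    }
}

// Hypothetical registry that asks each generator before indexing it by type.
class SketchRegistry {
    private final Map<Class<?>, List<SketchGenerator<?>>> byType = new HashMap<>();

    void register(SketchGenerator<?> generator) {
        if (generator.canRegisterAsType(generator.type)) {
            byType.computeIfAbsent(generator.type, k -> new ArrayList<>()).add(generator);
        }
    }

    List<SketchGenerator<?>> candidatesFor(Class<?> type) {
        return byType.getOrDefault(type, new ArrayList<>());
    }
}
```

In this model, registering both generators leaves only SketchStringGenerator as an automatic candidate for String, so the regex generator never competes with it by accident.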

@DeepSpawn

DeepSpawn commented Oct 31, 2016

Hey, sorry for the delay in replying. I had a look at using Generex for this purpose, but I ran into an issue where some inputs cause a stack overflow in the underlying dk.brics.automaton library, i.e. mifmif/Generex#26.
I couldn't think of a particularly clean way of dealing with this problem which is why I have not raised a PR yet.

@vlsi
Contributor

vlsi commented Sep 18, 2019

@pholser, could you please check the RegexStringGenerator I've created for gradle/gradle#10724?

It looks like Generex is abandoned, and I guess it would be nice if quickcheck provided a "regexp-based" generator with shrink support.

@SimY4

SimY4 commented Jul 16, 2022

I just created a small project that supports generation of regex-constrained strings for junit-quickcheck. Have a look and leave feedback if you're keen:

https://github.com/SimY4/coregex

junit-quickcheck usage example in unit tests:
https://github.com/SimY4/coregex/blob/main/junit-quickcheck/src/test/java/com/github/simy4/coregex/junit/quickcheck/CoregexGeneratorTest.java
