| 1 | +--- |
| 2 | +layout: single |
| 3 | +title: "automata: Simulation and manipulation" |
| 4 | +excerpt: automata is a package implementing structures and algorithms for manipulating finite automata, pushdown automata, and Turing machines, that was recently accepted into the pyOpenSci ecosystem. |
| 5 | +author: Eliot W. Robson |
| 6 | +permalink: /blog/automata |
| 7 | +header: |
| 8 | + overlay_image: /images/automata/finite_language_dfa.png |
| 9 | + overlay_filter: rgba(20, 13, 36, 0.8) |
| 10 | +categories: |
| 11 | + - blog-post |
| 12 | + - automata |
| 13 | + - formal-languages |
| 14 | + - models-of-computation |
| 15 | +comments: true |
| 16 | +--- |
| 17 | + |
| 18 | + |
| 19 | +Automata are abstract machines used to represent models of computation, and are a central object of study in theoretical computer science. Given an input string of characters over a fixed alphabet, these machines either accept or reject the string. The language of an automaton is
| 20 | +the set of all strings it accepts. Three important families of automata, in increasing order of generality, are the following:
| 21 | + |
| 22 | +1. Finite-state automata |
| 23 | +2. Pushdown automata |
| 24 | +3. Turing machines |
| 25 | + |
| 26 | +The [`automata`](https://caleb531.github.io/automata/) package facilitates working with these families by allowing simulation of reading input and higher-level manipulation |
| 27 | +of the corresponding languages using specialized algorithms. For an overview of automata theory, see [this Wikipedia article](https://en.wikipedia.org/wiki/Automata_theory), and
| 28 | +for a more comprehensive introduction to each of these topics, see [these lecture notes](https://jeffe.cs.illinois.edu/teaching/algorithms/#models). |
| 29 | + |
| 30 | +## Statement of need |
| 31 | + |
| 32 | +Automata are a core component of both computer science education and research, seeing further theoretical work |
| 33 | +and applications in a wide variety of areas such as computational biology and networking. |
| 34 | +Consequently, the manipulation of automata with software packages has seen significant attention from |
| 35 | +researchers in the past. The similarly named Mathematica package [`Automata`](https://www.cs.cmu.edu/~sutner/automata.html) implements a number of |
| 36 | +algorithms for use with finite-state automata, including regular expression conversion and binary set operations. |
| 37 | +In Java, the [Brics package](https://www.brics.dk/automaton/) implements similar algorithms, while the [JFLAP package](https://www.jflap.org/) places an emphasis |
| 38 | +on interactivity and simulation of more general families of automata. |
| 39 | + |
| 40 | +[`automata`](https://caleb531.github.io/automata/) serves the demand for such a package in the Python software ecosystem, implementing algorithms and allowing for |
| 41 | +simulation of automata in a manner comparable to the packages described previously. As a popular high-level language, Python enables |
| 42 | +significant flexibility and ease of use that directly benefits many users. The package includes a comprehensive test suite, |
| 43 | +supports modern language features (including type annotations), and implements a large number of different automata,
| 44 | +meeting the demands of users across a wide variety of use cases. In particular, the target audience
| 45 | +includes both researchers who wish to manipulate automata and those in educational contexts who want to reinforce their understanding of how these
| 46 | +models of computation function.
| 47 | + |
| 48 | +## Package features |
| 49 | + |
| 50 | +The API of the package is designed to mimic the formal mathematical description of each automaton using built-in Python data structures |
| 51 | +(such as sets and dicts). This makes the package easy to use for those who are unfamiliar with these models of computation, while also providing performance
| 52 | +suitable for tasks arising in research. In particular, algorithms in the package have been written to perform well
| 53 | +on large inputs, incorporating optimizations such as only exploring the reachable set of states
| 54 | +in the construction of a new finite-state automaton. The package also has native display integration with Jupyter |
| 55 | +notebooks, enabling easy visualization that allows students to interact with [`automata`](https://caleb531.github.io/automata/) in an exploratory manner. |
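To make the "sets and dicts" idea and the reachable-states optimization concrete, here is a minimal standard-library sketch. It does not use the package's classes: the dict layout and the helper names `accepts` and `intersection` are assumptions made for illustration only, and the package's own implementation may differ.

```python
from collections import deque

# Hypothetical plain-Python encoding of a DFA (not the package's API):
# a nested transition dict, an initial state, and a set of final states.
EVEN_AS = {
    "transitions": {
        "even": {"a": "odd", "b": "even"},
        "odd": {"a": "even", "b": "odd"},
    },
    "initial": "even",
    "final": {"even"},
}
ENDS_IN_B = {
    "transitions": {
        "no": {"a": "no", "b": "yes"},
        "yes": {"a": "no", "b": "yes"},
    },
    "initial": "no",
    "final": {"yes"},
}


def accepts(dfa, word):
    """Simulate the DFA on `word` and report whether it ends in a final state."""
    state = dfa["initial"]
    for symbol in word:
        state = dfa["transitions"][state][symbol]
    return state in dfa["final"]


def intersection(left, right, alphabet):
    """Product construction that only builds state pairs reachable from the start."""
    start = (left["initial"], right["initial"])
    transitions = {}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if pair in transitions:
            continue  # this pair has already been expanded
        p, q = pair
        transitions[pair] = {}
        for symbol in alphabet:
            successor = (
                left["transitions"][p][symbol],
                right["transitions"][q][symbol],
            )
            transitions[pair][symbol] = successor
            if successor not in transitions:
                queue.append(successor)
    final = {
        (p, q)
        for (p, q) in transitions
        if p in left["final"] and q in right["final"]
    }
    return {"transitions": transitions, "initial": start, "final": final}


both = intersection(EVEN_AS, ENDS_IN_B, "ab")
same = intersection(EVEN_AS, EVEN_AS, "ab")
```

Here `both` accepts strings with an even number of `a`s that end in `b`. Intersecting `EVEN_AS` with itself builds only two of the four possible state pairs, which is the saving that exploring reachable states alone provides.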
| 56 | + |
| 57 | +Of note are some commonly used and technical algorithms implemented in the package for finite-state automata: |
| 58 | + |
| 59 | +- An optimized version of the Hopcroft-Karp algorithm to determine whether two deterministic finite automata (DFAs) are equivalent.
| 60 | + |
| 61 | +- The product construction algorithm for binary set operations (union, intersection, etc.) on the languages corresponding to two input DFAs. |
| 62 | + |
| 63 | +- Thompson's algorithm for converting regular expressions to equivalent nondeterministic finite automata (NFAs).
| 64 | + |
| 65 | +- Hopcroft's algorithm for DFA minimization. |
| 66 | + |
| 67 | +- A specialized algorithm for directly constructing a state-minimal DFA accepting a given finite language. |
| 68 | + |
| 69 | +- A specialized algorithm for directly constructing a minimal DFA recognizing strings containing a given substring. |
| 70 | + |
| 71 | +To the authors' knowledge, this is the only Python package implementing all of the automata manipulation algorithms stated above. |
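As an illustration of the first item in the list above, the following is a textbook-style sketch of the union-find idea behind Hopcroft-Karp equivalence checking, written with the standard library only. It is not the package's optimized implementation; the dict-based DFA layout and the names `make_mod_counter` and `equivalent` are hypothetical.

```python
def make_mod_counter(modulus, final):
    """Hypothetical helper: DFA over {a, b} counting 'a's modulo `modulus`."""
    transitions = {
        i: {"a": (i + 1) % modulus, "b": i} for i in range(modulus)
    }
    return {"transitions": transitions, "initial": 0, "final": set(final)}


def equivalent(dfa_a, dfa_b, alphabet):
    """Return True if both complete DFAs accept the same language."""
    dfas = (dfa_a, dfa_b)
    parent = {}  # union-find forest over (side, state) pairs

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def is_final(node):
        side, state = node
        return state in dfas[side]["final"]

    def step(node, symbol):
        side, state = node
        return (side, dfas[side]["transitions"][state][symbol])

    # Every pair on the stack holds states reached by the same input word.
    stack = [((0, dfa_a["initial"]), (1, dfa_b["initial"]))]
    while stack:
        x, y = stack.pop()
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            continue  # already known to be in the same class
        if is_final(x) != is_final(y):
            return False  # some word is accepted by exactly one DFA
        parent[root_x] = root_y  # merge the two classes
        for symbol in alphabet:
            stack.append((step(x, symbol), step(y, symbol)))
    return True


even_as = make_mod_counter(2, {0})
even_as_redundant = make_mod_counter(4, {0, 2})
multiple_of_four = make_mod_counter(4, {0})
```

With these definitions, `equivalent(even_as, even_as_redundant, "ab")` holds, since both recognize an even number of `a`s, while `equivalent(even_as, multiple_of_four, "ab")` does not.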
| 72 | + |
| 73 | +## Example usage |
| 74 | + |
| 78 | + |
| 79 | +The following example is inspired by the use case described in @Johnson_2010. |
| 80 | +We wish to determine which strings in a given set are within a target edit distance
| 81 | +of a reference string. We will first initialize a DFA corresponding to a fixed set of target words
| 82 | +over the alphabet of all lowercase ASCII characters.
| 83 | + |
| 84 | +```python |
| 85 | +from automata.fa.dfa import DFA |
| 86 | +from automata.fa.nfa import NFA |
| 87 | +import string |
| 88 | + |
| 89 | +target_words_dfa = DFA.from_finite_language( |
| 90 | + input_symbols=set(string.ascii_lowercase), |
| 91 | + language={'these', 'are', 'target', 'words', 'them', 'those'}, |
| 92 | +) |
| 93 | +``` |
| 94 | + |
| 95 | +Next, we construct an NFA recognizing all strings within a target edit distance of a fixed |
| 96 | +reference string, and then immediately convert this to an equivalent DFA. The package provides |
| 97 | +built-in functions to make this construction easy, and we use the same alphabet as the DFA that was just created.
| 98 | + |
| 99 | +```python |
| 100 | +words_within_edit_distance_dfa = DFA.from_nfa( |
| 101 | + NFA.edit_distance( |
| 102 | + input_symbols=set(string.ascii_lowercase), |
| 103 | + reference_str='they', |
| 104 | + max_edit_distance=2, |
| 105 | + ) |
| 106 | +) |
| 107 | +``` |
| 108 | + |
| 109 | +Finally, we take the intersection of the two DFAs we have constructed and read all of |
| 110 | +the words in the output DFA into a list. The library makes this straightforward and idiomatic. |
| 111 | + |
| 112 | +```python |
| 113 | +found_words_dfa = target_words_dfa & words_within_edit_distance_dfa |
| 114 | +found_words = list(found_words_dfa) |
| 115 | +``` |
| 116 | + |
| 117 | +The DFA `found_words_dfa` accepts exactly the strings in the intersection of the languages of the
| 118 | +two input DFAs, and `found_words` is a list containing this language. The power of this
| 119 | +technique is that the DFA `words_within_edit_distance_dfa` compactly represents a very large language
| 120 | +(every string within two edits of the reference string) that would be costly to enumerate as a built-in
| 121 | +Python set. More generally, DFAs can represent infinite languages, which sets cannot, although the
| 122 | +syntax used by [`automata`](https://caleb531.github.io/automata/) is very similar to promote intuition.
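As a library-independent cross-check of the example above, the same filtering can be done by brute force with a standard dynamic-programming edit distance. This only works because the set of target words is small and finite. The helper `levenshtein` is a hypothetical name that is not part of the package, and the sketch assumes the edit distance counts insertions, deletions, and substitutions.

```python
def levenshtein(source, target):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete source_char
                    current[j - 1] + 1,  # insert target_char
                    previous[j - 1] + (source_char != target_char),  # substitute
                )
            )
        previous = current
    return previous[-1]


target_words = {'these', 'are', 'target', 'words', 'them', 'those'}
close_words = sorted(
    word for word in target_words if levenshtein(word, 'they') <= 2
)
print(close_words)  # ['them', 'these']
```

Under the assumption stated above, the words in `found_words` should match `close_words` up to ordering.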
| 123 | + |
| 124 | +## Citing |
| 125 | + |
| 126 | +This post is adapted from [our JOSS paper](https://joss.theoj.org/papers/10.21105/joss.05759), which should be used for citations. |