Commit 66838fd

eliotwrobson authored and lwasser committed
Automata package blog post
1 parent 57ca766 commit 66838fd

3 files changed

+140
-0
lines changed

_data/authors.yml

Lines changed: 14 additions & 0 deletions
@@ -88,3 +88,17 @@ Patrick J. Roddy:
     - label: Website
       icon: fas fa-fw fa-link
       url: https://paddyroddy.github.io
+Eliot W. Robson:
+  name : Eliot W. Robson
+  bio : "PhD Student, University of Illinois Urbana-Champaign"
+  avatar : /images/people/eliot-w-robson.jpg
+  links:
+    - label: "Email"
+      icon: "fas fa-fw fa-envelope-square"
+      url: "mailto:[email protected]"
+    - label: GitHub
+      icon: fab fa-fw fa-github
+      url: https://github.com/eliotwrobson
+    - label: LinkedIn
+      icon: fab fa-fw fa-linkedin
+      url: https://www.linkedin.com/in/eliot-robson/

_posts/2024-07-10-automata.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
---
layout: single
title: "automata: Simulation and manipulation"
excerpt: automata is a Python package implementing structures and algorithms for manipulating finite automata, pushdown automata, and Turing machines. It was recently accepted into the pyOpenSci ecosystem.
author: Eliot W. Robson
permalink: /blog/automata
header:
  overlay_image: /images/automata/finite_language_dfa.png
  overlay_filter: rgba(20, 13, 36, 0.8)
categories:
  - blog-post
  - automata
  - formal-languages
  - models-of-computation
comments: true
---

Automata are abstract machines used to represent models of computation, and are a central object of study in theoretical computer science. Given an input string of characters over a fixed alphabet, an automaton either accepts or rejects the string. The language of an automaton is the set of all strings it accepts. Three important families of automata, in increasing order of generality, are:

1. Finite-state automata
2. Pushdown automata
3. Turing machines

The [`automata`](https://caleb531.github.io/automata/) package facilitates working with these families by supporting both simulation of reading input and higher-level manipulation of the corresponding languages using specialized algorithms. For an overview of automata theory, see [this Wikipedia article](https://en.wikipedia.org/wiki/Automata_theory), and for a more comprehensive introduction to each of these topics, see [these lecture notes](https://jeffe.cs.illinois.edu/teaching/algorithms/#models).

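To make the accept/reject behavior concrete, here is a minimal plain-Python sketch of a finite-state automaton over the alphabet {0, 1} that accepts strings containing an even number of 1s. It does not use the `automata` package; the names `TRANSITIONS`, `accepts`, and `language_up_to` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Hypothetical DFA: accepts binary strings with an even number of '1' characters.
ALPHABET = ("0", "1")
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
INITIAL_STATE = "even"
FINAL_STATES = {"even"}


def accepts(word: str) -> bool:
    """Run the DFA on `word` and report whether it ends in a final state."""
    state = INITIAL_STATE
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL_STATES


def language_up_to(max_length: int) -> list[str]:
    """List every accepted string of length at most `max_length`."""
    return [
        "".join(chars)
        for length in range(max_length + 1)
        for chars in product(ALPHABET, repeat=length)
        if accepts("".join(chars))
    ]


print(accepts("1001"))    # True: the string contains two 1s
print(language_up_to(2))  # ['', '0', '00', '11']
```

The full language of this automaton is infinite, which is why the sketch only enumerates it up to a length bound.
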
## Statement of need

Automata are a core component of both computer science education and research, with ongoing theoretical work and applications in a wide variety of areas such as computational biology and networking. Consequently, the manipulation of automata with software packages has received significant attention from researchers. The similarly named Mathematica package [`Automata`](https://www.cs.cmu.edu/~sutner/automata.html) implements a number of algorithms for finite-state automata, including regular expression conversion and binary set operations. In Java, the [Brics package](https://www.brics.dk/automaton/) implements similar algorithms, while the [JFLAP package](https://www.jflap.org/) emphasizes interactivity and simulation of more general families of automata.

[`automata`](https://caleb531.github.io/automata/) serves the demand for such a package in the Python software ecosystem, implementing algorithms and allowing for simulation of automata in a manner comparable to the packages described above. As a popular high-level language, Python offers flexibility and ease of use that directly benefit many users. The package includes a comprehensive test suite, supports modern language features (including type annotations), and implements a large number of different automata, meeting the demands of users across a wide variety of use cases. In particular, the target audience includes both researchers who wish to manipulate automata and those in educational contexts who wish to reinforce their understanding of how these models of computation function.

## Package features

The API of the package is designed to mimic the formal mathematical description of each automaton using built-in Python data structures (such as sets and dicts). This makes the package easy to use for those who are unfamiliar with these models of computation, while still providing performance suitable for tasks arising in research. In particular, algorithms in the package have been written to perform well on large inputs, incorporating optimizations such as only exploring the reachable set of states when constructing a new finite-state automaton. The package also has native display integration with Jupyter notebooks, enabling easy visualization that lets students interact with [`automata`](https://caleb531.github.io/automata/) in an exploratory manner.

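The reachable-states optimization can be sketched in plain Python. The example below is an illustration under stated assumptions, not the package's implementation: DFAs are represented as plain dicts with nested transition tables, and `count_a_mod`, `intersection`, and `accepts` are hypothetical helpers. It builds the intersection of two DFAs while only creating state pairs that can be reached from the start pair.

```python
from collections import deque


def count_a_mod(modulus):
    """Hypothetical DFA over {'a', 'b'}: accepts when the number of 'a's is a multiple of `modulus`."""
    return {
        "transitions": {i: {"a": (i + 1) % modulus, "b": i} for i in range(modulus)},
        "initial_state": 0,
        "final_states": {0},
    }


def intersection(dfa1, dfa2, alphabet):
    """Product construction that only creates state pairs reachable from the start pair."""
    start = (dfa1["initial_state"], dfa2["initial_state"])
    transitions = {}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if pair in transitions:
            continue  # this pair has already been expanded
        q1, q2 = pair
        transitions[pair] = {}
        for symbol in alphabet:
            target = (dfa1["transitions"][q1][symbol], dfa2["transitions"][q2][symbol])
            transitions[pair][symbol] = target
            if target not in transitions:
                queue.append(target)
    final_states = {
        (q1, q2)
        for (q1, q2) in transitions
        if q1 in dfa1["final_states"] and q2 in dfa2["final_states"]
    }
    return {"transitions": transitions, "initial_state": start, "final_states": final_states}


def accepts(dfa, word):
    """Simulate `dfa` on `word`."""
    state = dfa["initial_state"]
    for symbol in word:
        state = dfa["transitions"][state][symbol]
    return state in dfa["final_states"]


both = intersection(count_a_mod(2), count_a_mod(4), "ab")
print(len(both["transitions"]))  # 4 reachable pairs, out of 2 * 4 = 8 possible pairs
print(accepts(both, "abaaba"), accepts(both, "aa"))  # True False
```

In this example only half of the possible state pairs are ever constructed; on larger inputs the savings from skipping unreachable pairs can be much greater.
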
Of note are some commonly used and technical algorithms implemented in the package for finite-state automata:

- An optimized version of the Hopcroft-Karp algorithm to determine whether two deterministic finite automata (DFA) are equivalent.

- The product construction algorithm for binary set operations (union, intersection, etc.) on the languages corresponding to two input DFAs.

- Thompson's algorithm for converting regular expressions to equivalent nondeterministic finite automata (NFA).

- Hopcroft's algorithm for DFA minimization.

- A specialized algorithm for directly constructing a state-minimal DFA accepting a given finite language.

- A specialized algorithm for directly constructing a minimal DFA recognizing strings containing a given substring.

To the authors' knowledge, this is the only Python package implementing all of the automata manipulation algorithms stated above.

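As an illustration of the first item in the list above, the following plain-Python sketch shows the union-find idea behind a Hopcroft-Karp style equivalence check. It is a simplified sketch, not the package's optimized implementation; the dict-based DFA representation and the names `are_equivalent` and `count_a_mod` are assumptions made for this example.

```python
def are_equivalent(dfa_a, dfa_b, alphabet):
    """Decide whether two complete DFAs accept the same language."""
    dfas = (dfa_a, dfa_b)
    parent = {}  # union-find forest over tagged states (dfa index, state name)

    def find(state):
        parent.setdefault(state, state)
        while parent[state] != state:
            parent[state] = parent[parent[state]]  # path halving
            state = parent[state]
        return state

    def is_final(state):
        index, name = state
        return name in dfas[index]["final_states"]

    def step(state, symbol):
        index, name = state
        return (index, dfas[index]["transitions"][name][symbol])

    start_a = (0, dfa_a["initial_state"])
    start_b = (1, dfa_b["initial_state"])
    parent[find(start_a)] = find(start_b)
    stack = [(start_a, start_b)]
    while stack:
        p, q = stack.pop()
        # States assumed equivalent must agree on acceptance.
        if is_final(p) != is_final(q):
            return False
        for symbol in alphabet:
            root_p, root_q = find(step(p, symbol)), find(step(q, symbol))
            if root_p != root_q:
                parent[root_p] = root_q  # merge the two classes
                stack.append((root_p, root_q))
    return True


def count_a_mod(modulus, accepted_remainders):
    """Hypothetical DFA over {'a', 'b'} tracking the number of 'a's modulo `modulus`."""
    return {
        "transitions": {i: {"a": (i + 1) % modulus, "b": i} for i in range(modulus)},
        "initial_state": 0,
        "final_states": set(accepted_remainders),
    }


even_a = count_a_mod(2, {0})
even_a_redundant = count_a_mod(4, {0, 2})  # same language, more states
multiple_of_four = count_a_mod(4, {0})     # differs on the input "aa"

print(are_equivalent(even_a, even_a_redundant, "ab"))  # True
print(are_equivalent(even_a, multiple_of_four, "ab"))  # False
```

Because each successful merge reduces the number of classes by one, the loop performs at most as many merges as there are states in total, which is what makes this approach fast in practice.
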
## Example usage

![A visualization of `target_words_dfa`. Transitions on characters leading to immediate rejections are omitted.]({{ site.url }}/images/automata/finite_language_dfa.png)

The following example is inspired by the use case described in @Johnson_2010. We wish to determine which strings in a given set are within a target edit distance of a reference string. We first initialize a DFA corresponding to a fixed set of target words over the alphabet of all lowercase ASCII characters.

```python
from automata.fa.dfa import DFA
from automata.fa.nfa import NFA
import string

# DFA accepting exactly the given finite set of words.
target_words_dfa = DFA.from_finite_language(
    input_symbols=set(string.ascii_lowercase),
    language={'these', 'are', 'target', 'words', 'them', 'those'},
)
```
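For intuition about what a DFA for a finite language looks like, the plain-Python sketch below builds a trie over the same words and treats it as a DFA. This is only an illustration: a trie is generally not state-minimal, so it is not the construction the package uses, and `build_trie` and `trie_accepts` are hypothetical names.

```python
def build_trie(words):
    """Build a trie whose states are prefixes; returns (transitions, final_states)."""
    transitions = {"": {}}  # the empty prefix is the initial state
    final_states = set()
    for word in words:
        for end in range(1, len(word) + 1):
            prefix = word[:end]
            transitions.setdefault(prefix, {})
            transitions[prefix[:-1]][prefix[-1]] = prefix
        final_states.add(word)
    return transitions, final_states


def trie_accepts(transitions, final_states, word):
    """Simulate the trie as a DFA; a missing transition means immediate rejection."""
    state = ""
    for symbol in word:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in final_states


target_words = {"these", "are", "target", "words", "them", "those"}
transitions, final_states = build_trie(target_words)

print(len(transitions))                                # 23 states, one per distinct prefix
print(trie_accepts(transitions, final_states, "them"))  # True
print(trie_accepts(transitions, final_states, "the"))   # False: a prefix, not a word
```

As in the figure, transitions that lead to immediate rejection are left implicit rather than stored.
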

Next, we construct an NFA recognizing all strings within a target edit distance of a fixed reference string, and then immediately convert it to an equivalent DFA. The package provides built-in functions that make this construction easy, and we use the same alphabet as the DFA that was just created.

```python
# Build an NFA accepting every string within 2 edits of 'they',
# then convert it to an equivalent DFA.
words_within_edit_distance_dfa = DFA.from_nfa(
    NFA.edit_distance(
        input_symbols=set(string.ascii_lowercase),
        reference_str='they',
        max_edit_distance=2,
    )
)
```
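To see what such an edit-distance NFA does, the sketch below simulates one by hand in plain Python. It does not use the package: the function name `within_edit_distance` and the state encoding `(reference characters consumed, edits used)` are assumptions made for this illustration, and the package's own construction may differ in its details.

```python
def within_edit_distance(reference: str, word: str, max_distance: int) -> bool:
    """Simulate an NFA whose states are (reference characters consumed, edits used)."""

    def epsilon_closure(states):
        # Deleting a reference character consumes no input but costs one edit.
        closure = set(states)
        stack = list(states)
        while stack:
            i, edits = stack.pop()
            skipped = (i + 1, edits + 1)
            if i < len(reference) and edits < max_distance and skipped not in closure:
                closure.add(skipped)
                stack.append(skipped)
        return closure

    current = epsilon_closure({(0, 0)})
    for symbol in word:
        following = set()
        for i, edits in current:
            if i < len(reference) and reference[i] == symbol:
                following.add((i + 1, edits))          # match
            if edits < max_distance:
                following.add((i, edits + 1))          # insertion of an extra character
                if i < len(reference):
                    following.add((i + 1, edits + 1))  # substitution
        current = epsilon_closure(following)
    # Accept if the whole reference string has been consumed within the edit budget.
    return any(i == len(reference) for i, _ in current)


target_words = {"these", "are", "target", "words", "them", "those"}
print(sorted(w for w in target_words if within_edit_distance("they", w, 2)))
# ['them', 'these']
```

Tracking a set of states while reading the input is the same subset idea that underlies converting an NFA to a DFA.
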

Finally, we take the intersection of the two DFAs we have constructed and read all of the words accepted by the resulting DFA into a list. The library makes this straightforward and idiomatic.

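```python
# `&` computes the intersection of the two languages as a new DFA.
found_words_dfa = target_words_dfa & words_within_edit_distance_dfa
# Iterating over the DFA yields the words it accepts.
found_words = list(found_words_dfa)
```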

The DFA `found_words_dfa` accepts exactly the strings in the intersection of the languages of the two input DFAs, and `found_words` is a list containing this language. The power of this technique is that the DFA `words_within_edit_distance_dfa` compactly represents a very large language (every string over the alphabet within two edits of `they`), and DFAs can in general represent infinite languages. Performing the same computation with Python's built-in sets would require materializing every such string, and is impossible for infinite languages, since sets always represent a finite collection. Even so, the syntax used by [`automata`](https://caleb531.github.io/automata/) is very similar to that of sets, which promotes intuition.

## Citing

This post is adapted from [our JOSS paper](https://joss.theoj.org/papers/10.21105/joss.05759), which should be used for citations.