Commit a0c2f71

Merge pull request #51 from jefferyjohn/chapter/regex
Add chapter on regular expressions
2 parents f0fd921 + 162d365 commit a0c2f71

2 files changed

+87
-0
lines changed

book.adoc

+1
@@ -21,5 +21,6 @@ include::chapters/binary.adoc[]
 include::chapters/assembly.adoc[]
 include::chapters/careers.adoc[]
 include::chapters/environment.adoc[]
+include::chapters/regex.adoc[]
 include::chapters/git.adoc[]
 include::chapters/tools.adoc[]

chapters/regex.adoc

+86
@@ -0,0 +1,86 @@
1+
[appendix]
2+
== Regular Expressions (Regex)
3+
[discrete]
4+
===== Jeffery John
5+
6+
{empty}
7+
8+
'''
9+
10+
[[regex]]
11+
12+
Regular expressions, or regex, are a way to search for patterns in text. For example, you can use regular expressions to look for email addresses in a document, or even a flag for a capture-the-flag challenge. Several programming languages, including Python, have built-in support for regular expressions.
13+
14+
=== Common Use Cases
15+
16+
You've likely used regex before. For example, `grep` is a Unix command that uses regular expressions to search text, and `find` can match file paths against a regular expression through its `-regex` option. For more about them, xref:book.adoc#_how_to_search_for_strings_and_filenames[see our forensics section here].
17+
18+
Some other common use cases for regular expressions include searching for:
19+
20+
* URLs
21+
22+
* Phone numbers
23+
24+
* Dates
25+
26+
* IP addresses
27+
28+
* Passwords
29+
30+
Regular expressions can also be used to validate, or check, a user's input. For example, you may want to check that a user's credit card number is in the correct format before allowing them to submit a form.
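
As a minimal sketch of this kind of format check, the example below uses Python's `re.fullmatch`. It assumes a simplified rule of four groups of four digits, optionally separated by a space or hyphen; real card numbers vary by issuer and also carry a checksum, which this does not verify. The helper name `looks_like_card_number` is hypothetical.

```python
import re

# Assumption: a simplified format of four groups of four digits,
# each group optionally followed by a single space or hyphen.
CARD_PATTERN = re.compile(r'[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}')

def looks_like_card_number(text):
    """Return True if the whole input fits the simplified format."""
    return CARD_PATTERN.fullmatch(text) is not None

print(looks_like_card_number('1234 5678 9012 3456'))
print(looks_like_card_number('1234-5678'))
```

Using `fullmatch` rather than `search` means the entire input has to fit the pattern, so extra characters before or after the number cause the check to fail.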
31+
32+
Regular expressions can also be useful for replacing or removing a string from a document. For example, you may want to remove all instances of a certain word, or perhaps prevent an attacker from submitting a form with malicious code.
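
Removing or replacing a word can be sketched with Python's `re.sub`. The sample sentence and the `[removed]` placeholder are made up for illustration.

```python
import re

text = 'The secret code is secret, so keep the secret safe.'

# Replace every occurrence of the word 'secret' with a placeholder.
redacted = re.sub('secret', '[removed]', text)
print(redacted)
```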
33+
34+
=== Basic Syntax
35+
36+
Regex can be difficult to understand at a glance, as it is meant for describing patterns, not just simple strings.
37+
38+
A regex pattern is a sequence of characters that defines a search. The regex `xyz` would match the string 'xyz', but not 'xy' or 'xzy'.
39+
40+
This can be expanded to include more complex patterns. For example, `x..` and `x.*y.*z` both match 'xyz', but `x..` also matches 'xab', and `x.*y.*z` also matches 'x123y456z'.
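
A short sketch that checks these patterns against a few strings, using Python's `re.fullmatch` so that the whole string has to fit the pattern. The dictionary name `results` is only for illustration.

```python
import re

candidates = ['xyz', 'xab', 'xy', 'x123y456z']
results = {}

for candidate in candidates:
    # 'x..' needs an x followed by exactly two more characters.
    dot_match = re.fullmatch('x..', candidate) is not None
    # 'x.*y.*z' needs x, then y, then z, with anything in between.
    star_match = re.fullmatch('x.*y.*z', candidate) is not None
    results[candidate] = (dot_match, star_match)
    print(candidate, dot_match, star_match)
```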
41+
42+
Much of our data is structured in a way that can be described by regular expressions. Email addresses often include the '@' symbol and a domain address, and credit card numbers often follow rules based on their issuer. Even our picoCTF flags are often in the format picoCTF{}, which could be described by regex as `picoCTF\{.{1,15}\}`.
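
As a sketch, the flag pattern above can be used with Python's `re.search` to pull a flag out of surrounding text. The sample text and flag value here are invented for illustration.

```python
import re

# Hypothetical sample text; the flag value is made up.
text = 'Some output... picoCTF{example_flag} ...more output'

# Braces are escaped so they are treated as literal characters.
flag_pattern = r'picoCTF\{.{1,15}\}'

found = re.search(flag_pattern, text)
if found:
    print(found.group())
```

The `r` prefix makes this a raw string, so Python passes the backslashes to the regex engine unchanged.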
43+
44+
==== Literal & Meta characters
45+
46+
Literal characters are the simplest pattern. They are characters that must be present. Like in our earlier example, the regex `xyz` could only match the string 'xyz'.
47+
48+
Metacharacters have special rules. For example, the period `.` matches any single character (by default, any character except a newline). The asterisk `*` matches zero or more of the character before it. Additionally, the plus `+` matches one or more of the character before it, and the question mark `?` matches zero or one of the character before it.
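
The differences between these metacharacters are easier to see side by side. This sketch tries four patterns against the same made-up word list, again using `re.fullmatch` so the whole word must fit.

```python
import re

words = ['ct', 'cat', 'caat', 'cot']
patterns = ['c.t', 'ca*t', 'ca+t', 'ca?t']
results = {}

for pattern in patterns:
    # Keep only the words that fit the pattern from start to end.
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(pattern, results[pattern])
```

Only `ca*t` and `ca?t` accept 'ct', because both allow zero copies of 'a', and only `c.t` accepts 'cot', because the period stands in for any one character.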
49+
50+
These can be combined to create even more complex patterns. While they sound very similar, a single character can make a big difference in the information you can find!
51+
52+
==== Escaping Special Characters
53+
54+
Just like in many programming languages, you can use a backslash `\` to escape a special character. For example, if you want to match a period, you would use `\.`. This prevents the period from being treated as a metacharacter, which would lead to your regex matching any character, not just a period.
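
A small sketch of the difference, using made-up file names. The pattern is written as a raw string (`r'...'`) so that Python leaves the backslash for the regex engine.

```python
import re

names = ['flag.txt', 'flag_txt', 'flagAtxt']

# Unescaped: the period matches any single character.
unescaped = [n for n in names if re.fullmatch('flag.txt', n)]

# Escaped: the period must be a literal '.'.
escaped = [n for n in names if re.fullmatch(r'flag\.txt', n)]

print(unescaped)
print(escaped)
```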
55+
56+
==== Character Classes
57+
58+
Character classes are a way to match one character from a set. The regex `[xyz]` matches a single 'x', 'y', or 'z'; it does not need all three to be present. This can be expanded to include ranges, like `[a-z]` or `[0-9]`.
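
As an illustration, `re.findall` returns every non-overlapping match in a string, which makes it easy to see what a character class picks out. The sample text is made up.

```python
import re

text = 'room 42, zone x9'

letters_xyz = re.findall('[xyz]', text)   # single characters from the set
digits = re.findall('[0-9]', text)        # single digits
words = re.findall('[a-z]+', text)        # runs of lowercase letters

print(letters_xyz)
print(digits)
print(words)
```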
59+
60+
=== Anchors
61+
62+
Anchors can match the start (`^`) or end (`$`) of a string. This can be helpful if you aren't sure what the rest of the string looks like, but you know part of the pattern.
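
A brief sketch of both anchors, using made-up strings and Python's `re.search`:

```python
import re

lines = ['picoCTF is fun', 'I play picoCTF', 'picoCTF']

# '^' ties the pattern to the start of the string.
starts = [s for s in lines if re.search('^picoCTF', s)]

# '$' ties the pattern to the end of the string.
ends = [s for s in lines if re.search('picoCTF$', s)]

# Both together require the whole string to be exactly 'picoCTF'.
exact = [s for s in lines if re.search('^picoCTF$', s)]

print(starts)
print(ends)
print(exact)
```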
63+
64+
=== Regex in Python
65+
66+
We covered xref:book.adoc#_programming_in_python[Python] in our earlier chapters. Python includes built-in support for regular expressions: by importing the `re` module, you can create and test regex in your code.
67+
68+
As an example:
69+
70+
[source,python]
----
import re

pattern = 'hello, *'
string = 'hello, world!'
match = re.search(pattern, string)

if match:
    print('Match found!')
else:
    print('No match found.')
----
83+
84+
This would print 'Match found!': `re.search` finds 'hello,' followed by zero or more spaces (the `*` applies to the space before it) at the start of 'hello, world!'. It would also return a match if the string included your name, like 'hello, reader!'.
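
To see which part of the string a pattern actually matched, you can call `group()` on the match object. The sketch below compares the pattern from the example with a variant, `hello, .*`, which is an assumption added here for contrast and not part of the original example.

```python
import re

string = 'hello, world!'

# The '*' repeats the space before it, so only the greeting is matched.
greeting_only = re.search('hello, *', string).group()

# '.*' matches any characters, so the rest of the string is included.
whole_string = re.search('hello, .*', string).group()

print(repr(greeting_only))
print(repr(whole_string))
```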
85+
86+
Throughout this Primer, we'll share examples from xref:book.adoc#_levels_of_code[other coding languages] as well. Regex is a very helpful tool, and so it is nice to be able to use it in many different environments, depending on what is available and your comfort level. You might see regex for helping with a database query, website, or even a CTF challenge!
