|
11 | 11 | "cell_type": "markdown",
|
12 | 12 | "metadata": {},
|
13 | 13 | "source": [
|
14 |
| - "## Regular Expressions" |
| 14 | + "# Regular Expressions\n", |
| 15 | + "\n", |
| 16 | + "**What are Regular Expressions?** \n", |
| 17 | + "A regular expression, often abbreviated as \"regex\" or \"regexp,\" is a powerful tool used in computer science and text processing to describe and match patterns in strings. It's a sequence of characters that defines a search pattern. These patterns can be used for tasks like searching, extracting, replacing, and validating text.\n", |
| 18 | + "\n", |
| 19 | + "Regular expressions are widely used in tasks like text searching and manipulation, data validation, parsing, and more. They are supported by many programming languages and text processing tools, making them a versatile and essential tool for working with text data.\n", |
| 20 | + "\n", |
| 21 | + "**Why Regular Expressions?** \n", |
| 22 | + "Regular expressions are used for a variety of tasks in computer science, data processing, and text analysis due to their powerful pattern matching capabilities. Here are some key reasons why regular expressions are widely used:\n", |
| 23 | + "\n", |
| 24 | + "1. **Pattern Matching:** Regular expressions allow you to search for specific patterns or sequences of characters within a larger body of text. This is useful for tasks like finding specific words, dates, email addresses, or other structured information.\n", |
| 25 | + "\n", |
| 26 | + "2. **Text Extraction:** They can be used to extract specific pieces of information from a text document, such as names, phone numbers, URLs, or any other structured data.\n", |
| 27 | + "\n", |
| 28 | + "3. **Data Validation:** Regular expressions are used to validate input data. For example, you can use them to check if a user-provided email address or phone number is in the correct format.\n", |
| 29 | + "\n", |
| 30 | + "4. **Search and Replace:** They enable you to search for specific patterns in a text and replace them with something else. This is useful for tasks like cleaning up data or making text substitutions.\n", |
| 31 | + "\n", |
| 32 | + "5. **Parsing and Tokenization:** Regular expressions are essential for breaking down text into smaller units or tokens. This is used in tasks like natural language processing and compiler design.\n", |
| 33 | + "\n", |
| 34 | + "6. **Web Scraping:** When extracting data from websites, regular expressions can be used to locate and extract specific elements or information from HTML pages.\n", |
| 35 | + "\n", |
| 36 | + "7. **Log File Analysis:** Regular expressions are invaluable for searching and parsing log files, allowing you to extract important information or identify patterns of interest.\n", |
| 37 | + "\n", |
| 38 | + "8. **Pattern Validation:** They are used to validate whether a string adheres to a specific pattern or format, such as checking if a password meets certain criteria.\n", |
| 39 | + "\n", |
| 40 | + "9. **Data Transformation and Cleaning:** Regular expressions can be used to clean and transform text data, for example by removing unnecessary characters or normalizing formatting.\n", |
| 41 | + "\n", |
| 42 | + "10. **Language Agnostic:** Regular expressions are supported by most programming languages, although syntax details vary between engines, making them a versatile tool that can be applied in a wide range of contexts.\n", |
| 43 | + "\n", |
| 44 | + "11. **Efficiency:** A well-written pattern can search and match text efficiently, although patterns with nested quantifiers can backtrack heavily and become slow on some inputs.\n", |
| 45 | + "\n", |
| 46 | + "Overall, regular expressions are a fundamental tool for working with text data and are an essential skill for tasks ranging from data preprocessing to text analysis and beyond. They provide a flexible and powerful means to perform complex pattern matching operations.\n", |
| 47 | + "\n", |
| 48 | + "\n", |
| 49 | + "**Applications of Regular Expressions** \n", |
| 50 | + "Regular expressions (regex) find applications in a wide range of real-world scenarios across various domains. Here are some of the most important applications of regex:\n", |
| 51 | + "\n", |
| 52 | + "1. **Data Validation and Form Input:**\n", |
| 53 | + " - Ensuring that user-provided data (like email addresses, phone numbers, passwords, etc.) adheres to the specified format before processing.\n", |
| 54 | + "\n", |
| 55 | + "2. **Search and Replace in Text Editors:**\n", |
| 56 | + " - Find and replace operations in text editors or IDEs, allowing for quick and precise changes in code or documents.\n", |
| 57 | + "\n", |
| 58 | + "3. **Log File Parsing and Analysis:**\n", |
| 59 | + " - Extracting relevant information from log files, helping to identify patterns, errors, or anomalies in system logs.\n", |
| 60 | + "\n", |
| 61 | + "4. **Web Scraping and Data Extraction:**\n", |
| 62 | + " - Extracting specific information from web pages, like email addresses, phone numbers, product names, etc., for further analysis.\n", |
| 63 | + "\n", |
| 64 | + "5. **Data Cleaning and Transformation:**\n", |
| 65 | + " - Preprocessing text data by removing unnecessary characters, fixing formatting issues, and standardizing data for analysis.\n", |
| 66 | + "\n", |
| 67 | + "6. **Search Engines and Information Retrieval:**\n", |
| 68 | + " - Powering search engines for matching user queries to relevant content on websites or databases.\n", |
| 69 | + "\n", |
| 70 | + "7. **URL Routing and Validation:**\n", |
| 71 | + " - Validating and parsing URLs to ensure they follow the correct format and extracting specific parameters from them.\n", |
| 72 | + "\n", |
| 73 | + "8. **Lexical Analysis in Compiler Design:**\n", |
| 74 | + " - Tokenizing source code into meaningful units for further processing by a compiler.\n", |
| 75 | + "\n", |
| 76 | + "9. **Natural Language Processing (NLP):**\n", |
| 77 | + " - Tokenizing sentences or words, extracting entities (like names, dates, locations), and performing advanced text processing tasks.\n", |
| 78 | + "\n", |
| 79 | + "10. **Network Security and Firewall Rules:**\n", |
| 80 | + " - Defining and enforcing rules for allowing or blocking specific types of traffic based on patterns in network traffic logs.\n", |
| 81 | + "\n", |
| 82 | + "11. **Database Querying and Validation:**\n", |
| 83 | + " - Validating and querying databases for specific patterns or formats, such as social security numbers, credit card numbers, etc.\n", |
| 84 | + "\n", |
| 85 | + "12. **Formal Language Theory and Automata:**\n", |
| 86 | + " - In computer science theory, regular expressions define the regular languages, which are exactly the languages recognized by finite automata.\n", |
| 87 | + "\n", |
| 88 | + "13. **Validation of Configuration Files:**\n", |
| 89 | + " - Ensuring that configuration files for software or systems follow the correct syntax and structure.\n", |
| 90 | + "\n", |
| 91 | + "14. **Extracting Metadata from Documents:**\n", |
| 92 | + " - Parsing documents (like PDFs, Word documents) to extract metadata such as titles, authors, dates, etc.\n", |
| 93 | + "\n", |
| 94 | + "15. **URL Rewriting in Web Servers:**\n", |
| 95 | + " - Modifying URLs on the fly to improve SEO or to direct traffic to specific pages.\n", |
| 96 | + "\n", |
| 97 | + "16. **Pattern Matching in DNA Sequences:**\n", |
| 98 | + " - Identifying specific genetic sequences or motifs in DNA for biological research.\n", |
| 99 | + "\n", |
| 100 | + "These are just some of the many real-world applications of regular expressions. Their versatility and powerful pattern-matching capabilities make them an invaluable tool in various fields of computer science and data processing." |
15 | 101 | ]
|
16 | 102 | },
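A few of the use cases listed above (text extraction, data validation, search and replace) can be sketched with Python's `re` module. This is only a minimal illustration: the sample log line and the simplified email pattern are assumptions made up for this example and are not part of the notebook.

```python
import re

# Hypothetical sample text, made up for illustration.
log_line = "2023-09-14 ERROR disk /dev/sda1 at 91% (contact: ops@example.com)"

# Text extraction: pull out the first ISO-style date (YYYY-MM-DD).
date = re.search(r"\d{4}-\d{2}-\d{2}", log_line).group()

# Data validation: a deliberately simplified email check. Real-world
# address validation is more involved than this pattern.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
is_valid = EMAIL.fullmatch("ops@example.com") is not None
is_invalid = EMAIL.fullmatch("ops@example") is not None

# Search and replace: mask every run of digits with '#'.
masked = re.sub(r"\d+", "#", log_line)

print(date)
print(is_valid, is_invalid)
print(masked)
```

`fullmatch` is used for validation so that the whole string has to fit the pattern, whereas `search` and `sub` work on matches anywhere in the text.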
|
17 | 103 | {
|
18 | 104 | "cell_type": "markdown",
|
19 | 105 | "metadata": {},
|
20 | 106 | "source": [
|
| 107 | + "**References for Practice** \n", |
21 | 108 | "Work through the interactive tutorials on the websites below:\n",
|
22 | 109 | "\n",
|
23 | 110 | "https://regexone.com/lesson/introduction_abcs\n",
|
24 | 111 | "\n",
|
25 | 112 | "https://regex101.com"
|
26 | 113 | ]
|
27 | 114 | },
|
| 115 | + { |
| 116 | + "cell_type": "markdown", |
| 117 | + "metadata": {}, |
| 118 | + "source": [ |
| 119 | + "### Meta Characters\n", |
| 120 | + "`.`, `^`, `$`, `*`, `+`, `?`, `{`, `}`, `[`, `]`, `(`, `)`, `|`, `\\`\n", |
| 121 | + "\n", |
| 122 | + "### User-defined Character Classes\n", |
| 123 | + "- `[abc]` - Match either a, b, or c\n", |
| 124 | + "- `[^abc]` - Match any character except a, b, or c\n", |
| 125 | + "- `[a-z]` - Match a lowercase English letter\n", |
| 126 | + "- `[A-Z]` - Match an uppercase English letter\n", |
| 127 | + "- `[a-zA-Z]` - Match any English letter\n", |
| 128 | + "- `[0-9]` - Match any digit\n", |
| 129 | + "- `[a-zA-Z0-9_]` - Match any alphanumeric character or underscore\n", |
| 130 | + "- `[^a-zA-Z0-9_]` - Match any character except an alphanumeric character or underscore\n", |
| 131 | + "\n", |
| 132 | + "### Pre-defined Character Classes\n", |
| 133 | + "- `\\d` - Match a digit character, i.e. `[0-9]` for ASCII text\n", |
| 134 | + "- `\\D` - Match any character except a digit, i.e. `[^0-9]` for ASCII text\n", |
| 135 | + "- `\\w` - Match a word character (alphanumeric or underscore), i.e. `[a-zA-Z0-9_]` for ASCII text\n", |
| 136 | + "- `\\W` - Match any character except a word character, i.e. `[^a-zA-Z0-9_]` for ASCII text\n", |
| 137 | + "- `\\s` - Match a whitespace character (space, tab, newline, etc.).\n", |
| 138 | + "- `\\S` - Match any character except whitespace.\n", |
| 139 | + "- `\\t` - Match a tab character.\n", |
| 140 | + "- `\\n` - Match a newline character.\n", |
| 141 | + "\n", |
| 142 | + "### Quantifiers\n", |
| 143 | + "- `a*` - Match zero or more occurrences of `a`\n", |
| 144 | + "- `a+` - Match one or more occurrences of `a`\n", |
| 145 | + "- `a?` - Match at most one occurrence of `a`, i.e. 0 or 1\n", |
| 146 | + "- `a{n}` - Match exactly n occurrences of `a`\n", |
| 147 | + "- `a{m,n}` - Match at least m and at most n occurrences of `a` (no space after the comma)" |
| 148 | + ] |
| 149 | + }, |
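The character classes and quantifiers summarized in this cell can be tried out with `re.findall`. The following is a small sketch rather than part of the notebook; the sample string and variable names are hypothetical and were chosen only to exercise each pattern.

```python
import re

# Hypothetical sample string, chosen only to exercise each pattern.
text = "a1 bb22 ccc333 d_4\tend"

abc_chars = re.findall(r"[abc]", text)       # user-defined class: a, b or c
digit_runs = re.findall(r"\d+", text)        # one or more digits
words = re.findall(r"\w+", text)             # runs of word characters (incl. '_')
short_runs = re.findall(r"[a-z]{2,3}", text) # 2 to 3 lowercase letters
whitespace = re.findall(r"\s", text)         # spaces and the tab

print(abc_chars)   # ['a', 'b', 'b', 'c', 'c', 'c']
print(digit_runs)  # ['1', '22', '333', '4']
print(words)       # ['a1', 'bb22', 'ccc333', 'd_4', 'end']
print(short_runs)  # ['bb', 'ccc', 'end']
print(whitespace)  # [' ', ' ', ' ', '\t']
```

Raw strings (`r"..."`) are used so that backslashes such as `\d` reach the regex engine unchanged.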
| 150 | + { |
| 151 | + "cell_type": "markdown", |
| 152 | + "metadata": {}, |
| 153 | + "source": [ |
| 154 | + "## Importing the Required Module" |
| 155 | + ] |
| 156 | + }, |
| 157 | + { |
| 158 | + "cell_type": "code", |
| 159 | + "execution_count": 1, |
| 160 | + "metadata": {}, |
| 161 | + "outputs": [], |
| 162 | + "source": [ |
| 163 | + "import re" |
| 164 | + ] |
| 165 | + }, |
28 | 166 | {
|
29 | 167 | "cell_type": "markdown",
|
30 | 168 | "metadata": {},
|
|
53 | 191 | "source": [
|
54 | 192 | "# Find all matches and return them as a list\n",
|
55 | 193 | "\n",
|
56 |
| - "import re\n", |
57 |
| - "\n", |
58 | 194 | "lst = re.findall('[0-9]', '0@kM29-1')\n",
|
59 | 195 | "\n",
|
60 | 196 | "print(lst)"
|
61 | 197 | ]
|
62 | 198 | },
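For comparison with the `findall` cell above, here is a hedged sketch of how the result changes once a `+` quantifier is added, and how `re.finditer` can be used when match positions are also needed. The variable names are illustrative and not taken from the notebook.

```python
import re

text = "0@kM29-1"

# One match per digit, as in the cell above.
single_digits = re.findall(r"[0-9]", text)

# With '+', adjacent digits are grouped into a single match.
digit_runs = re.findall(r"[0-9]+", text)

# finditer yields match objects, which also carry start/end positions.
spans = [(m.group(), m.span()) for m in re.finditer(r"[0-9]+", text)]

print(single_digits)  # ['0', '2', '9', '1']
print(digit_runs)     # ['0', '29', '1']
print(spans)          # [('0', (0, 1)), ('29', (4, 6)), ('1', (7, 8))]
```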
|
63 |
| - { |
64 |
| - "cell_type": "code", |
65 |
| - "execution_count": 62, |
66 |
| - "metadata": {}, |
67 |
| - "outputs": [ |
68 |
| - { |
69 |
| - "name": "stdout", |
70 |
| - "output_type": "stream", |
71 |
| - "text": [ |
72 |
| - "<callable_iterator object at 0x0000023C69FFF9A0>\n", |
73 |
| - "0\n", |
74 |
| - "2\n", |
75 |
| - "9\n", |
76 |
| - "1\n" |
77 |
| - ] |
78 |
| - } |
79 |
| - ], |
80 |
| - "source": [ |
81 |
| - "import re\n", |
82 |
| - "\n", |
83 |
| - "matcher = re.finditer('[0-9]', '0@kM29-1')\n", |
84 |
| - "\n", |
85 |
| - "print(matcher)\n", |
86 |
| - "\n", |
87 |
| - "for m in matcher:\n", |
88 |
| - " print(m.group())" |
89 |
| - ] |
90 |
| - }, |
91 | 199 | {
|
92 | 200 | "cell_type": "code",
|
93 | 201 | "execution_count": 67,
|
|
486 | 594 | "ExecuteTime": {
|
487 | 595 | "end_time": "2018-06-07T05:24:58.259291Z",
|
488 | 596 | "start_time": "2018-06-07T05:24:58.254301Z"
|
489 |
| - }, |
490 |
| - "scrolled": true |
| 597 | + } |
491 | 598 | },
|
492 | 599 | "outputs": [
|
493 | 600 | {
|
|
1262 | 1369 | ],
|
1263 | 1370 | "metadata": {
|
1264 | 1371 | "kernelspec": {
|
1265 |
| - "display_name": "Python 3", |
| 1372 | + "display_name": "Python 3 (ipykernel)", |
1266 | 1373 | "language": "python",
|
1267 | 1374 | "name": "python3"
|
1268 | 1375 | },
|
|
1276 | 1383 | "name": "python",
|
1277 | 1384 | "nbconvert_exporter": "python",
|
1278 | 1385 | "pygments_lexer": "ipython3",
|
1279 |
| - "version": "3.8.5" |
| 1386 | + "version": "3.9.13" |
1280 | 1387 | },
|
1281 | 1388 | "toc": {
|
1282 | 1389 | "nav_menu": {},
|
|
1321 | 1428 | }
|
1322 | 1429 | },
|
1323 | 1430 | "nbformat": 4,
|
1324 |
| - "nbformat_minor": 2 |
| 1431 | + "nbformat_minor": 4 |
1325 | 1432 | }
|