Commit 33ed1a7

committed
updated
1 parent 93e6582 commit 33ed1a7

25 files changed

+3509
-36
lines changed

Module 1 - Python Programming/08. Regular Expressions/.ipynb_checkpoints/reg_exp-checkpoint.ipynb

Lines changed: 1447 additions & 0 deletions
Large diffs are not rendered by default.

Module 6 - Case Studies/8. Regex and Webscrapping/Regular Expressions/reg_exp.ipynb renamed to Module 1 - Python Programming/08. Regular Expressions/reg_exp.ipynb

Lines changed: 143 additions & 36 deletions
@@ -11,20 +11,158 @@
1111
"cell_type": "markdown",
1212
"metadata": {},
1313
"source": [
14-
"## Regular Expressions"
14+
"# Regular Expressions\n",
15+
"\n",
16+
"**What are Regular Expressions?** \n",
17+
"A regular expression, often abbreviated as \"regex\" or \"regexp,\" is a powerful tool used in computer science and text processing to describe and match patterns in strings. It's a sequence of characters that defines a search pattern. These patterns can be used for tasks like searching, extracting, replacing, and validating text.\n",
18+
"\n",
19+
"Regular expressions are widely used in tasks like text searching and manipulation, data validation, parsing, and more. They are supported by many programming languages and text processing tools, making them a versatile and essential tool for working with text data.\n",
20+
"\n",
21+
"**Why Regular Expressions?** \n",
22+
"Regular expressions are used for a variety of tasks in computer science, data processing, and text analysis due to their powerful pattern matching capabilities. Here are some key reasons why regular expressions are widely used:\n",
23+
"\n",
24+
"1. **Pattern Matching:** Regular expressions allow you to search for specific patterns or sequences of characters within a larger body of text. This is useful for tasks like finding specific words, dates, email addresses, or other structured information.\n",
25+
"\n",
26+
"2. **Text Extraction:** They can be used to extract specific pieces of information from a text document, such as names, phone numbers, URLs, or any other structured data.\n",
27+
"\n",
28+
"3. **Data Validation:** Regular expressions are used to validate input data. For example, you can use them to check if a user-provided email address or phone number is in the correct format.\n",
29+
"\n",
30+
"4. **Search and Replace:** They enable you to search for specific patterns in a text and replace them with something else. This is useful for tasks like cleaning up data or making text substitutions.\n",
31+
"\n",
32+
"5. **Parsing and Tokenization:** Regular expressions are essential for breaking down text into smaller units or tokens. This is used in tasks like natural language processing and compiler design.\n",
33+
"\n",
34+
"6. **Web Scraping:** When extracting data from websites, regular expressions can be used to locate and extract specific elements or information from HTML pages.\n",
35+
"\n",
36+
"7. **Log File Analysis:** Regular expressions are invaluable for searching and parsing log files, allowing you to extract important information or identify patterns of interest.\n",
37+
"\n",
38+
"8. **Pattern Validation:** They are used to validate whether a string adheres to a specific pattern or format, such as checking if a password meets certain criteria.\n",
39+
"\n",
40+
"9. **Data Transformation and Cleaning:** Regular expressions can be used to clean and transform text data. For example, removing unnecessary characters or formatting.\n",
41+
"\n",
42+
"10. **Language Agnostic:** Regular expressions are supported by most programming languages, making them a versatile tool that can be applied in a wide range of contexts.\n",
43+
"\n",
44+
"11. **Efficiency:** When used correctly, regular expressions can provide very efficient search and match operations, especially for complex patterns.\n",
45+
"\n",
46+
"Overall, regular expressions are a fundamental tool for working with text data and are an essential skill for tasks ranging from data preprocessing to text analysis and beyond. They provide a flexible and powerful means to perform complex pattern matching operations.\n",
47+
"\n",
48+
"\n",
49+
"**Applications of Regular Expressions** \n",
50+
"Regular expressions (regex) are applied in a wide range of real-world scenarios across many domains. Here are some of the most important applications:\n",
51+
"\n",
52+
"1. **Data Validation and Form Input:**\n",
53+
" - Ensuring that user-provided data (like email addresses, phone numbers, passwords, etc.) adheres to specified formats before processing.\n",
54+
"\n",
55+
"2. **Search and Replace in Text Editors:**\n",
56+
" - Find and replace operations in text editors or IDEs, allowing for quick and precise changes in code or documents.\n",
57+
"\n",
58+
"3. **Log File Parsing and Analysis:**\n",
59+
" - Extracting relevant information from log files, helping to identify patterns, errors, or anomalies in system logs.\n",
60+
"\n",
61+
"4. **Web Scraping and Data Extraction:**\n",
62+
" - Extracting specific information from web pages, like email addresses, phone numbers, product names, etc., for further analysis.\n",
63+
"\n",
64+
"5. **Data Cleaning and Transformation:**\n",
65+
" - Preprocessing text data by removing unnecessary characters, fixing formatting issues, and standardizing data for analysis.\n",
66+
"\n",
67+
"6. **Search Engines and Information Retrieval:**\n",
68+
" - Powering search engines for matching user queries to relevant content on websites or databases.\n",
69+
"\n",
70+
"7. **URL Routing and Validation:**\n",
71+
" - Validating and parsing URLs to ensure they follow the correct format and extracting specific parameters from them.\n",
72+
"\n",
73+
"8. **Lexical Analysis in Compiler Design:**\n",
74+
" - Tokenizing source code into meaningful units for further processing by a compiler.\n",
75+
"\n",
76+
"9. **Natural Language Processing (NLP):**\n",
77+
" - Tokenizing sentences or words, extracting entities (like names, dates, locations), and performing advanced text processing tasks.\n",
78+
"\n",
79+
"10. **Network Security and Firewall Rules:**\n",
80+
" - Defining and enforcing rules for allowing or blocking specific types of traffic based on patterns in network traffic logs.\n",
81+
"\n",
82+
"11. **Database Querying and Validation:**\n",
83+
" - Validating and querying databases for specific patterns or formats, such as social security numbers, credit card numbers, etc.\n",
84+
"\n",
85+
"12. **Formal Language Theory and Automata:**\n",
86+
" - In computer science theory, regex is used in the definition of regular languages and finite automata.\n",
87+
"\n",
88+
"13. **Validation of Configuration Files:**\n",
89+
" - Ensuring that configuration files for software or systems follow the correct syntax and structure.\n",
90+
"\n",
91+
"14. **Extracting Metadata from Documents:**\n",
92+
" - Parsing documents (like PDFs, Word documents) to extract metadata such as titles, authors, dates, etc.\n",
93+
"\n",
94+
"15. **URL Rewriting in Web Servers:**\n",
95+
" - Modifying URLs on the fly to improve SEO or to direct traffic to specific pages.\n",
96+
"\n",
97+
"16. **Pattern Matching in DNA Sequences:**\n",
98+
" - Identifying specific genetic sequences or motifs in DNA for biological research.\n",
99+
"\n",
100+
"These are just some of the many real-world applications of regular expressions. Their versatility and powerful pattern-matching capabilities make them an invaluable tool in various fields of computer science and data processing."
15101
]
16102
},
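The four core tasks named in the introduction (searching, extracting, replacing, and validating) can be sketched with Python's standard `re` module. This is a minimal illustration, not part of the notebook: the sample sentence, the date-shaped pattern, and the helper name `looks_like_date` are all assumptions made for the example, and the pattern only checks the `YYYY-MM-DD` shape, not whether the date is real.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order 66 shipped on 2023-09-14; order 67 ships on 2023-10-02."
date_pattern = r"\d{4}-\d{2}-\d{2}"  # shape check only, not a calendar check

# Searching: locate the first date-shaped token.
first = re.search(date_pattern, text)
first_date = first.group() if first else None

# Extracting: collect every date-shaped token.
all_dates = re.findall(date_pattern, text)

# Replacing: mask every run of digits with '#'.
masked = re.sub(r"\d+", "#", text)

# Validating: fullmatch succeeds only if the whole string fits the pattern.
def looks_like_date(s):
    return re.fullmatch(date_pattern, s) is not None

print(first_date)                     # 2023-09-14
print(all_dates)                      # ['2023-09-14', '2023-10-02']
print(masked)                         # Order # shipped on #-#-#; order # ships on #-#-#.
print(looks_like_date("2023-09-14"))  # True
print(looks_like_date("2023-09-14x")) # False
```

`re.fullmatch` is used for validation because `re.search` would accept any string that merely contains a matching fragment.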
17103
{
18104
"cell_type": "markdown",
19105
"metadata": {},
20106
"source": [
107+
"**References for Practice** \n",
21108
"Work through the interactive tutorials on the websites listed below:\n",
22109
"\n",
23110
"https://regexone.com/lesson/introduction_abcs\n",
24111
"\n",
25112
"https://regex101.com"
26113
]
27114
},
115+
{
116+
"cell_type": "markdown",
117+
"metadata": {},
118+
"source": [
119+
"### Meta Characters\n",
120+
"`.`, `^`, `$`, `*`, `+`, `?`, `{`, `}`, `[`, `]`, `(`, `)`, `|`, `\\`\n",
121+
"\n",
122+
"### User-defined Character Classes\n",
123+
"- `[abc]` - Match either a or b or c\n",
124+
"- `[^abc]` - Match any character except a or b or c\n",
125+
"- `[a-z]` - Match a lowercase English letter\n",
126+
"- `[A-Z]` - Match an uppercase English letter\n",
127+
"- `[a-zA-Z]` - Match any English letter\n",
128+
"- `[0-9]` - Match any digit character\n",
129+
"- `[a-zA-Z0-9_]` - Match any alphanumeric character or underscore\n",
130+
"- `[^a-zA-Z0-9_]` - Match any character except an alphanumeric character or underscore\n",
131+
"\n",
132+
"### Pre-defined Character Classes\n",
133+
"- `\\d` - Match a digit character, i.e. `[0-9]` for ASCII text\n",
134+
"- `\\D` - Match any character except a digit, i.e. `[^0-9]`\n",
135+
"- `\\w` - Match a word character (alphanumeric or underscore), i.e. `[a-zA-Z0-9_]` for ASCII text\n",
136+
"- `\\W` - Match any character except a word character, i.e. `[^a-zA-Z0-9_]`\n",
137+
"- `\\s` - Match a whitespace character (space, tab, newline, etc.).\n",
138+
"- `\\S` - Match any character except whitespace.\n",
139+
"- `\\t` - Match a tab character.\n",
140+
"- `\\n` - Match a newline character.\n",
141+
"\n",
142+
"### Quantifiers\n",
143+
"- `a*` - Match zero or more occurrences of `a`\n",
144+
"- `a+` - Match one or more occurrences of `a`\n",
145+
"- `a?` - Match at most one occurrence of `a`, i.e. 0 or 1\n",
146+
"- `a{n}` - Match exactly n occurrences of `a`\n",
147+
"- `a{m,n}` - Match at least m and at most n occurrences of `a`"
148+
]
149+
},
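The character classes and quantifiers listed in the cell above can be checked directly with `re.findall`. The sketch below is an illustration added for this purpose; the two sample strings are invented, and the expected results in the comments assume Python 3 `str` patterns on plain ASCII input.

```python
import re

# Hypothetical sample: letters, digits, an underscore, a space, a tab, a hyphen.
sample = "a1 B_2\tc-3"

# User-defined character classes
lower_abc = re.findall(r"[abc]", sample)   # ['a', 'c']
upper = re.findall(r"[A-Z]", sample)       # ['B']
digits = re.findall(r"[0-9]", sample)      # ['1', '2', '3']

# Pre-defined character classes
word_chars = re.findall(r"\w", sample)     # ['a', '1', 'B', '_', '2', 'c', '3']
non_word = re.findall(r"\W", sample)       # [' ', '\t', '-']
whitespace = re.findall(r"\s", sample)     # [' ', '\t']

# Quantifiers: each one applies to the single character `a` before it.
runs = "b ab aab aaab"
star = re.findall(r"a*b", runs)            # ['b', 'ab', 'aab', 'aaab']
plus = re.findall(r"a+b", runs)            # ['ab', 'aab', 'aaab']
optional = re.findall(r"a?b", runs)        # ['b', 'ab', 'ab', 'ab']
exactly_two = re.findall(r"a{2}b", runs)   # ['aab', 'aab']
two_to_three = re.findall(r"a{2,3}b", runs)  # ['aab', 'aaab']

for name, result in [("a*b", star), ("a+b", plus), ("a?b", optional),
                     ("a{2}b", exactly_two), ("a{2,3}b", two_to_three)]:
    print(name, result)
```

Note that `\w` includes the underscore, and that `a?b` and `a{2}b` still find matches inside longer runs of `a` because `findall` scans for the pattern anywhere in the string.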
150+
{
151+
"cell_type": "markdown",
152+
"metadata": {},
153+
"source": [
154+
"## Importing the Required Module"
155+
]
156+
},
157+
{
158+
"cell_type": "code",
159+
"execution_count": 1,
160+
"metadata": {},
161+
"outputs": [],
162+
"source": [
163+
"import re"
164+
]
165+
},
28166
{
29167
"cell_type": "markdown",
30168
"metadata": {},
@@ -53,41 +191,11 @@
53191
"source": [
54192
"# find all matches and return them as a list\n",
55193
"\n",
56-
"import re\n",
57-
"\n",
58194
"lst = re.findall('[0-9]', '0@kM29-1')\n",
59195
"\n",
60196
"print(lst)"
61197
]
62198
},
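As a companion to the `re.findall` cell above, the following sketch contrasts it with `re.finditer` on the same pattern and input string. It is an illustrative addition, and the variable names are assumptions. `findall` returns a list of matched strings, while `finditer` returns a lazy iterator of match objects that also carry the position of each match.

```python
import re

pattern = "[0-9]"
text = "0@kM29-1"

# findall: a plain list of the matched substrings.
digits = re.findall(pattern, text)

# finditer: an iterator of match objects; each exposes group() and start().
positions = [(m.group(), m.start()) for m in re.finditer(pattern, text)]

print(digits)     # ['0', '2', '9', '1']
print(positions)  # [('0', 0), ('2', 4), ('9', 5), ('1', 7)]
```

`finditer` tends to be the better fit when match positions are needed or when the input is large enough that building the full list up front is wasteful.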
63-
{
64-
"cell_type": "code",
65-
"execution_count": 62,
66-
"metadata": {},
67-
"outputs": [
68-
{
69-
"name": "stdout",
70-
"output_type": "stream",
71-
"text": [
72-
"<callable_iterator object at 0x0000023C69FFF9A0>\n",
73-
"0\n",
74-
"2\n",
75-
"9\n",
76-
"1\n"
77-
]
78-
}
79-
],
80-
"source": [
81-
"import re\n",
82-
"\n",
83-
"matcher = re.finditer('[0-9]', '0@kM29-1')\n",
84-
"\n",
85-
"print(matcher)\n",
86-
"\n",
87-
"for m in matcher:\n",
88-
" print(m.group())"
89-
]
90-
},
91199
{
92200
"cell_type": "code",
93201
"execution_count": 67,
@@ -486,8 +594,7 @@
486594
"ExecuteTime": {
487595
"end_time": "2018-06-07T05:24:58.259291Z",
488596
"start_time": "2018-06-07T05:24:58.254301Z"
489-
},
490-
"scrolled": true
597+
}
491598
},
492599
"outputs": [
493600
{
@@ -1262,7 +1369,7 @@
12621369
],
12631370
"metadata": {
12641371
"kernelspec": {
1265-
"display_name": "Python 3",
1372+
"display_name": "Python 3 (ipykernel)",
12661373
"language": "python",
12671374
"name": "python3"
12681375
},
@@ -1276,7 +1383,7 @@
12761383
"name": "python",
12771384
"nbconvert_exporter": "python",
12781385
"pygments_lexer": "ipython3",
1279-
"version": "3.8.5"
1386+
"version": "3.9.13"
12801387
},
12811388
"toc": {
12821389
"nav_menu": {},
@@ -1321,5 +1428,5 @@
13211428
}
13221429
},
13231430
"nbformat": 4,
1324-
"nbformat_minor": 2
1431+
"nbformat_minor": 4
13251432
}
