README.md: 94 additions & 1 deletion
@@ -9,6 +9,7 @@ Repository of the project minishell from 42 Porto.
9
9
- [Introduction](#introduction)
10
10
- [Usage](#usage)
11
11
- [Example of usage](#example-of-usage)
12
+
- [Overview](#overview)
12
13
13
14
## Sources and Acknowledgments
14
15
I would like to share the key resources that helped me to construct this project. My sincere thanks go out to everyone who has shared their knowledge freely with the community.
@@ -69,4 +70,96 @@ This will compile an executable program called minishell.
69
70
## Example of Usage
70
71
**Click on the image below to watch an example of this project in use on YouTube**
71
72
72
-
[](https://www.youtube.com/watch?v=urz76d7-Gq4)
73
+
[](https://www.youtube.com/watch?v=urz76d7-Gq4)
74
+
75
+
---
76
+
77
+
## Overview
78
+
79
+
This project recreates a simplified version of Bash with some basic functionalities. The implementation uses a top-down parsing algorithm with a tree structure. The program is divided into several key parts:
80
+
81
+
1. **Syntax Analysis**
82
+
2. **Tokenization**
83
+
3. **Parsing**
84
+
4. **Execution**
85
+
86
+
## 1. Syntax Analysis
87
+
88
+
**Purpose:** Syntax analysis checks if the user's input follows the correct syntax rules before any further processing. It is done at the beginning to ensure that the input is valid and free from structural errors.
89
+
90
+
- **When it's performed:** Syntax analysis happens first, before tokenization and parsing, to quickly catch errors.
91
+
- **Why at the start?** By performing syntax analysis early, the program avoids unnecessary computations and memory allocations. If the syntax is invalid, there's no point in proceeding with tokenization, parsing, or execution.
92
+
- **How it works:** It scans the raw input to detect issues such as mismatched parentheses, missing operators, or unbalanced quotes.
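
The kind of scan described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `quotes_balanced`, `pipes_well_placed`, and `syntax_ok` are hypothetical, and only two of the checks (unbalanced quotes and pipes with a missing side) are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when every quote opened in `line` is closed. */
bool quotes_balanced(const char *line)
{
    char open = '\0'; /* quote character currently open, or '\0' if none */

    assert(line != NULL);
    for (size_t i = 0; line[i] != '\0'; i++)
    {
        if (open == '\0' && (line[i] == '\'' || line[i] == '"'))
            open = line[i];     /* entering a quoted region */
        else if (line[i] == open)
            open = '\0';        /* the matching quote closes it */
    }
    return open == '\0';
}

/* Hypothetical helper: true when every unquoted `|` has a word on both sides.
 * Simplification: `||` is rejected, since this sketch knows only plain pipes. */
bool pipes_well_placed(const char *line)
{
    bool seen_word = false; /* non-blank text seen since the last pipe */
    bool seen_pipe = false;
    char open = '\0';

    assert(line != NULL);
    for (size_t i = 0; line[i] != '\0'; i++)
    {
        char c = line[i];

        if (open != '\0')
        {
            if (c == open)      /* pipes inside quotes are plain text */
                open = '\0';
        }
        else if (c == '\'' || c == '"')
        {
            open = c;
            seen_word = true;
        }
        else if (c == '|')
        {
            if (!seen_word)     /* nothing on the left side of the pipe */
                return false;
            seen_word = false;
            seen_pipe = true;
        }
        else if (c != ' ' && c != '\t')
            seen_word = true;
    }
    return seen_word || !seen_pipe; /* a trailing pipe leaves seen_word false */
}

/* Runs both checks on the raw line, before any token is allocated. */
bool syntax_ok(const char *line)
{
    return quotes_balanced(line) && pipes_well_placed(line);
}
```

Because both checks only walk the string once and allocate nothing, a rejected line costs almost nothing, which is the point of running this step first.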
93
+
94
+
## 2. Tokenization
95
+
96
+
**Purpose:** Tokenization breaks down the input string into meaningful chunks or "tokens" that can be processed individually. These tokens represent different parts of the command, such as commands, arguments, options, and operators.
97
+
98
+
- **What gets tokenized?**
99
+
- **Commands:** e.g., `ls`, `echo`, `cat`
100
+
- **Arguments:** e.g., `-l`, `/home/user`
101
+
- **Operators:** e.g., `|`, `>`, `>>`, `<<`
102
+
- **Separators:** e.g., spaces, semicolons
103
+
104
+
105
+
106
+
- **How it works:** The tokenizer in this project is a state machine: the character it encounters in the input decides which kind of token it builds next. Here's a breakdown of the main logic:
107
+
108
+
- The function `do_lexing` iterates through the input string, character by character.
109
+
- If a special character is encountered (e.g., quotes, redirection symbols), the function `do_lexing_aux` is called to process it.
110
+
- If the character is not a special character, the program treats it as part of a word (e.g., command or argument). It continues scanning until another special character or delimiter is found.
111
+
- The function `create_token` is used to create a token based on the current state, which is determined by the type of character being processed. Special handling is applied for quotes (`'`, `"`) and redirection operators (`>`, `<`, `>>`, `<<`).
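
The loop described above can be sketched roughly like this. It is a simplified stand-in, not the project's `do_lexing`: the `token` struct, the `tok_type` enum, and `lex_line` are hypothetical names, tokens point into the input instead of being copied, and the output array has a fixed size chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_WORD, TOK_PIPE, TOK_REDIR } tok_type; /* hypothetical */

typedef struct {
    tok_type    type;
    const char *start; /* points into the input line, nothing is copied */
    size_t      len;
} token;

static int is_blank(char c)   { return c == ' ' || c == '\t'; }
static int is_special(char c) { return c == '|' || c == '<' || c == '>'; }

/* Splits `line` into at most `max` tokens.
 * Returns the token count, or -1 on an unclosed quote or a full array. */
int lex_line(const char *line, token *out, int max)
{
    int    n = 0;
    size_t i = 0;

    assert(line != NULL && out != NULL);
    while (line[i] != '\0')
    {
        if (is_blank(line[i]))          /* blanks only separate tokens */
        {
            i++;
            continue;
        }
        if (n == max)
            return -1;

        size_t   start = i;
        tok_type type = TOK_WORD;

        if (line[i] == '|')
        {
            type = TOK_PIPE;
            i++;
        }
        else if (line[i] == '<' || line[i] == '>')
        {
            type = TOK_REDIR;
            i++;
            if (line[i] == line[start]) /* doubled operator: >> or << */
                i++;
        }
        else
        {
            /* A word runs until an unquoted blank or special character. */
            while (line[i] != '\0' && !is_blank(line[i]) && !is_special(line[i]))
            {
                if (line[i] == '\'' || line[i] == '"')
                {
                    char quote = line[i++];

                    while (line[i] != '\0' && line[i] != quote)
                        i++;            /* everything inside quotes is literal */
                    if (line[i] == '\0')
                        return -1;      /* quote never closed */
                }
                i++;
            }
        }
        out[n].type = type;
        out[n].start = line + start;
        out[n].len = i - start;
        n++;
    }
    return n;
}
```

For the input `cat < in | grep 'a b' >> out` this sketch yields eight tokens, with `'a b'` kept together as one word because the blank sits inside quotes.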
112
+
113
+
- **Detailed Breakdown:**
114
+
- **Handling Special Characters:** When a special character is detected (such as a single quote, double quote, or redirection operator), the program enters a specific state for processing that character. For instance:
115
+
- If a quote (`'` or `"`) is encountered, the function `in_quote` is called, which handles the tokenization of the text inside the quotes.
116
+
- Redirection operators like `>` or `<` are processed using the `redir_env` function, which checks whether the same character follows (forming `>>` or `<<`) and creates the appropriate token.
117
+
- **Handling Words:** If a character is not a special character, it is considered part of a word. The function scans through the input to capture the entire word until another special character or space is encountered.
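
The two special cases above both come down to a small lookahead. The sketch below shows that idea in isolation; `classify_redir`, `quoted_span`, and the `redir_kind` enum are hypothetical names for illustration and are not the project's `redir_env` or `in_quote`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { R_NONE, R_IN, R_OUT, R_APPEND, R_HEREDOC } redir_kind; /* hypothetical */

/* Looks at s[0] and, only for an operator, at s[1].
 * Stores the operator length (0, 1 or 2) in *len. */
redir_kind classify_redir(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    *len = 0;
    if (s[0] != '<' && s[0] != '>')
        return R_NONE;
    *len = (s[1] == s[0]) ? 2 : 1;  /* one character of lookahead */
    if (s[0] == '>')
        return (*len == 2) ? R_APPEND : R_OUT;
    return (*len == 2) ? R_HEREDOC : R_IN;
}

/* `s` must point at an opening quote. Returns the length of the quoted
 * span including both quotes, or 0 if the quote is never closed. */
size_t quoted_span(const char *s)
{
    size_t i = 1;

    assert(s != NULL);
    if (s[0] != '\'' && s[0] != '"')
        return 0;
    while (s[i] != '\0' && s[i] != s[0])
        i++;                        /* the other quote type is plain text here */
    return (s[i] == s[0]) ? i + 1 : 0;
}
```

Returning the length alongside the kind lets the caller advance its index by the right amount without re-inspecting the input.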
118
+
119
+
- **Why it's important:** Tokenization is crucial because it breaks the input into discrete units that can be further analyzed and executed. Without this step, the shell would not be able to distinguish between commands, arguments, operators, or special symbols. By categorizing each part of the input, the shell can process complex commands that involve options, redirection, piping, and more.
120
+
121
+
### Key Functions:
122
+
123
+
- `do_lexing`: This function manages the main tokenization loop, identifying special characters and delegating processing to the appropriate functions.
124
+
- `do_lexing_aux`: Handles the tokenization of special characters (quotes, spaces, redirection).
125
+
- `in_quote`: Processes tokens inside single (`'`) or double (`"`) quotes.
- `create_token`: Creates a token based on the current state and the identified substring.
128
+
129
+
- **Why it's important:** Tokenization allows the shell to interpret and process different parts of a command separately, making it possible to handle complex commands with multiple arguments and operators.
130
+
131
+
## 3. Parsing
132
+
133
+
**Purpose:** Parsing takes the tokens produced during tokenization and organizes them into a structured format, which can then be used for further execution. The parsing process builds a tree-like structure that represents the logical flow of the command, including handling commands, redirections, pipes, and other special operators.
134
+
135
+
- **How it works:** Parsing proceeds in steps that handle commands, redirections, and pipes, organizing the tokens into a tree of nodes.
136
+
137
+
The parsing process is a hierarchical flow where commands are processed first, followed by handling redirection and pipes. Here’s the general flow of parsing:
138
+
139
+
1. **Command Parsing (`parse_exec`)**: Each token is checked to see if it represents a command. If it does, an execution node is created to hold the command and its arguments.
140
+
2. **Redirection Parsing (`parse_redir`)**: After processing the command, redirection operators (like `>`, `<`, `>>`) are handled by creating redirection nodes and linking them to the execution node.
141
+
3. **Pipe Parsing (`parse_pipe`)**: If a pipe (`|`) is detected, the `parse_pipe` function creates a new pipe node that connects the left and right sides of the pipe operation.
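
The three steps above can be sketched as a small recursive-descent (top-down) parser. This is an illustration under assumptions, not the project's code: the `node` layout, `parse_command`, `parse_pipeline`, and `free_tree` are hypothetical, tokens are plain strings in a NULL-terminated array, and the input is assumed to have already passed the syntax check (so every redirection has a target).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 16

typedef enum { N_EXEC, N_REDIR, N_PIPE } node_type;

typedef struct node {
    node_type    type;
    struct node *left;           /* PIPE: left command, REDIR: wrapped command */
    struct node *right;          /* PIPE: rest of the pipeline */
    const char  *argv[MAX_ARGS]; /* EXEC: NULL-terminated arguments */
    const char  *op;             /* REDIR: operator text, e.g. ">>" */
    const char  *file;           /* REDIR: target file or heredoc delimiter */
} node;

void free_tree(node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

static node *new_node(node_type type)
{
    node *n = calloc(1, sizeof *n); /* zeroed, so argv starts out all NULL */

    if (n != NULL)
        n->type = type;
    return n;
}

static int is_tok(const char *tok, const char *text)
{
    return tok != NULL && strcmp(tok, text) == 0;
}

static int is_redir(const char *tok)
{
    return is_tok(tok, "<") || is_tok(tok, ">")
        || is_tok(tok, ">>") || is_tok(tok, "<<");
}

/* command := (WORD | REDIR WORD)*  -- each redirection wraps the tree so far */
static node *parse_command(const char **toks, int *pos)
{
    node *exec = new_node(N_EXEC);
    node *root = exec;
    int   argc = 0;

    while (root != NULL && toks[*pos] != NULL && !is_tok(toks[*pos], "|"))
    {
        if (is_redir(toks[*pos]))
        {
            node *redir = new_node(N_REDIR);

            if (redir == NULL || toks[*pos + 1] == NULL)
            {
                free(redir);
                free_tree(root);
                return NULL;
            }
            redir->op = toks[*pos];
            redir->file = toks[*pos + 1];
            redir->left = root;
            root = redir;
            *pos += 2;
        }
        else if (argc < MAX_ARGS - 1)
            exec->argv[argc++] = toks[(*pos)++];
        else
        {
            free_tree(root);        /* too many arguments for this sketch */
            return NULL;
        }
    }
    return root;
}

/* pipeline := command ('|' pipeline)? */
node *parse_pipeline(const char **toks, int *pos)
{
    node *left;
    node *pipe_node;

    assert(toks != NULL && pos != NULL);
    left = parse_command(toks, pos);
    if (left == NULL || !is_tok(toks[*pos], "|"))
        return left;
    pipe_node = new_node(N_PIPE);
    if (pipe_node == NULL)
    {
        free_tree(left);
        return NULL;
    }
    (*pos)++;                       /* consume the '|' */
    pipe_node->left = left;
    pipe_node->right = parse_pipeline(toks, pos);
    if (pipe_node->right == NULL)
    {
        free_tree(pipe_node);
        return NULL;
    }
    return pipe_node;
}
```

For the tokens `cat < in.txt | wc -l`, the root is a pipe node whose left child is a redirection node wrapping the `cat` execution node, and whose right child is the `wc -l` execution node.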
142
+
143
+
**IMPORTANT**
144
+
145
+
Almost all of the project's functions are documented in comments alongside the code.
146
+
To understand exactly what happens, you need to understand the concept of a binary tree.
147
+
Again, we construct our binary tree using a top-down parsing algorithm. Click [HERE](https://www.youtube.com/watch?v=ubt-UjcQUYg) and watch this video 100 times (like us) to be able to understand and recreate a binary tree using this algorithm.
148
+
149
+
## 4. Execution
150
+
151
+
The execution phase takes the parsed structure and carries it out by running the commands it describes.
152
+
153
+
- **What happens during execution?**
154
+
- The program checks if the command exists (e.g., `ls`, `echo`).
155
+
- It prepares the arguments and environment, managing operations like pipes, redirection, and background processes.
156
+
- The shell runs the command by forking new processes using system calls like `fork()`, the `exec` family (e.g., `execve()`), and `wait()`.
157
+
- **Why Execution is Important:** Execution is the final step where the actual work is done: running the command and producing output. Without execution, the shell would only parse and analyze the input but wouldn't carry out any action.
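
The fork, exec, and wait pattern mentioned above can be sketched for a single command as follows. `run_command` is a hypothetical name, and the sketch makes two assumptions that may differ from the project: it uses `execvp()` so that the `PATH` lookup is done by libc, and it maps a failed exec to status 127, mirroring the shell convention for "command not found".

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with its arguments in a child process and returns a
 * shell-style status: the exit code, 128 + signal number if the child was
 * killed by a signal, or -1 if fork/waitpid themselves failed. */
int run_command(char *const argv[])
{
    pid_t pid;
    int   status;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        execvp(argv[0], argv);  /* on success this never returns */
        _exit(127);             /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

The parent never runs the program itself; it only waits and translates the raw wait status into the number a user would see in `$?`.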
158
+
159
+
## HereDoc
160
+
To better recreate the functionality and behaviour of bash, we needed to execute the heredocs separately.
161
+
This is done after parsing and before execution.
162
+
Doing it this way, we are able to recreate the heredoc along with some edge cases, such as pressing Ctrl+C during multiple heredocs and getting the correct exit code.
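
As a rough sketch of the collection step, the function below reads lines until the delimiter and stores them for later use. `collect_heredoc` is a hypothetical name, the input and output are plain `FILE *` streams so the idea is easy to try out, and signal handling (the Ctrl+C case) is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies lines from `in` to `out` until a line equal to `delim` is read.
 * Returns the number of lines copied, or -1 if input ends before the
 * delimiter. Simplification: lines longer than the buffer are split. */
int collect_heredoc(FILE *in, const char *delim, FILE *out)
{
    char line[1024];
    int  count = 0;

    assert(in != NULL && delim != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL)
    {
        size_t len = strcspn(line, "\n"); /* length without the newline */

        if (len == strlen(delim) && strncmp(line, delim, len) == 0)
            return count;                 /* delimiter line is not copied */
        fputs(line, out);
        count++;
    }
    return -1;
}
```

Running this for every heredoc in the tree before any command starts means the command later just reads its stored input.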