# Installing a Python dev environment

## git and GitHub

- [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) (choose [Git Bash](https://git-scm.com/download/win) on Windows): git is a version control system used to manage the source code of software projects such as scikit-learn and NumPy.

- [Create an account on github.com](https://github.com): GitHub is a platform for collaborating on the source code of hosted open source projects such as scikit-learn and NumPy.

- [Create an SSH key for GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) and add it to your GitHub account.

Once you have a GitHub account and have installed the `git` command on your system, open a new terminal session (use Git Bash under Windows) and type the following commands.

- Fork scikit-learn in the GitHub web interface: go to https://github.com/scikit-learn/scikit-learn and click the "Fork" button. You should be automatically redirected to your personal fork at https://github.com/myusername/scikit-learn in your web browser. Replace `myusername` with your GitHub username here and below.

- Then, in the terminal, clone your fork with git:

```
$ git clone git@github.com:myusername/scikit-learn.git
```

Note that the `$` sign is a generic prompt indicator for terminal commands. Do not copy it when you copy-paste commands from this document.

- Many open source projects from the Python ecosystem share similar development practices. For instance, you can also (optionally) clone NumPy with git if you want to use the development version of NumPy instead of a released package:

```
$ git clone git@github.com:numpy/numpy.git
```

- After cloning those repositories, you should see new local folders containing your clones in the output of the `ls` command:

```
$ ls
```

- To locate those folders, use the `pwd` (print working directory) command:

```
$ pwd
```
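
You can also script the same checks in Python, for instance at the top of a setup script. The sketch below is illustrative. `find_clones` is a hypothetical helper, and it assumes two things: both clones live in the same parent folder, and a folder counts as a clone when it contains the `.git` entry that `git clone` creates.

```python
from pathlib import Path


def find_clones(base_dir, names=("scikit-learn", "numpy")):
    """Return {folder name: absolute path or None} for the expected git clones.

    A folder counts as a clone if it contains a `.git` entry, which is what
    `git clone` creates at the root of each new repository.
    """
    base = Path(base_dir).resolve()  # absolute path, like the output of `pwd`
    found = {}
    for name in names:
        candidate = base / name
        found[name] = candidate if (candidate / ".git").exists() else None
    return found


if __name__ == "__main__":
    print("Working directory:", Path.cwd())  # same information as `pwd`
    for name, path in find_clones(Path.cwd()).items():
        print(f"{name}: {path or 'not found'}")
```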

- Configure some aliases for the remote repositories.

List the existing aliases in your scikit-learn clone. At this point there should only be `origin`, which `git clone` set to point to your fork:

```
$ cd scikit-learn
$ git remote -v
```

Add a new alias named `upstream` that points to the main scikit-learn repository (as opposed to your fork):

```
$ git remote add upstream https://github.com/scikit-learn/scikit-learn.git
```

Check that your new alias has been properly configured:

```
$ git remote -v
```
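
To verify this configuration from a script rather than by eye, you can parse the output of `git remote -v`. This is a minimal sketch. It assumes the usual one-entry-per-line format, `<name><TAB><url> (fetch)` or `(push)`. `parse_remotes` and `get_remotes` are hypothetical helper names, not part of git or scikit-learn.

```python
from __future__ import annotations

import subprocess


def parse_remotes(output: str) -> dict[str, dict[str, str]]:
    """Parse `git remote -v` output into {name: {"fetch": url, "push": url}}."""
    remotes: dict[str, dict[str, str]] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<name>\t<url> (fetch)" or "<name>\t<url> (push)".
        name, rest = line.split(maxsplit=1)
        url, kind = rest.rsplit(" ", 1)
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def get_remotes(repo_dir: str = ".") -> dict[str, dict[str, str]]:
    """Run `git remote -v` inside repo_dir and parse its output."""
    output = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if repo_dir is not a git repository
    ).stdout
    return parse_remotes(output)
```

For instance, once the alias is configured, `get_remotes("scikit-learn")["upstream"]["fetch"]` run from the parent folder should return `https://github.com/scikit-learn/scikit-learn.git`.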

## conda

Conda is a command line tool to download software packages and work in isolated environments for different projects.

The fastest way to install the conda tool is to use a Miniforge installer.

### Install Miniforge

- Download Miniforge from [the official installation page](https://github.com/conda-forge/miniforge#miniforge). Choose the latest Miniforge installer for your operating system.

- Run the installer you downloaded:

- Windows: open "Git Bash" and type
```
$ cd Downloads/
$ ./Miniforge3-Windows-x86_64.exe
```
- Linux & macOS:
```
$ cd Downloads/
$ bash Miniforge3-*.sh
```

Then follow the instructions.

- Make sure you initialized your shell environment:

- Windows (with Git Bash) and Linux:
```
$ conda init bash
```

- macOS uses zsh instead of bash by default:
```
$ conda init zsh
```

- Then close your shell and start a new one to type:

```
$ conda info
```

and look for the location of the "base environment",

or

```
$ which conda
```

to check that the conda command is in your PATH and usable from your shell. Under zsh (the macOS default), `where conda` works as well.
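
A Python process started from that shell sees the same `PATH`. You can therefore sketch the check with the standard library's `shutil.which`, which searches `PATH` much like `which` does. The helper name `locate_commands` is hypothetical. It reports the `conda` executable found on `PATH`, not the shell function that `conda init` defines in your interactive shell.

```python
import shutil


def locate_commands(names=("git", "conda")):
    """Return {command: full path or None}, searching PATH like `which`."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```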

### conda environments

conda environments make it possible to use specific versions of your packages for a given project, independently of the dependencies used by other projects. Once you are done with a project, you can easily delete its conda environment to avoid accumulating packages you no longer need on your system.

conda environments also make it easy to ensure that the package versions in your development environment match those used by your team members, or those required by the production environment.

- Create an environment named `sklworkshop`:
```
$ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas
```

If you are on macOS, you should add the `compilers` package to that list:

```
$ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas compilers
```

Note: the `-c conda-forge` flag is not necessary if you installed conda with the Miniforge installer. It is necessary if your conda command comes from the Miniconda or Anaconda installers.

- Activate and deactivate environments:
```
$ conda activate sklworkshop
(sklworkshop)$ conda deactivate
```
- Display the conda version, list the environments, and list the packages installed in the active environment:
```
$ conda --version
$ conda env list
$ conda list
```
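
Once the environment is activated, you can check from Python that the packages listed above are installed. The sketch below uses `importlib.metadata` from the standard library (Python 3.8+). It assumes the packages ship standard Python distribution metadata, which is usually the case for conda-forge Python packages. `package_versions` is a hypothetical helper name.

```python
from importlib import metadata


def package_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Same package list as the `conda create` command above.
    wanted = ["numpy", "scipy", "cython", "joblib", "threadpoolctl",
              "pytest", "matplotlib", "pandas"]
    for name, version in package_versions(wanted).items():
        print(f"{name:15} {version or 'MISSING'}")
```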

## VS Code

VS Code is a very popular open source code editor with a rich set of extensions. These turn it into a full-fledged Integrated Development Environment with fast code navigation, auto-completion, pytest execution, a debugger, and Jupyter notebook editing and execution.

Here we show the main tips and tricks to get productive when using VS Code to work on Python projects such as scikit-learn, including the most useful keyboard shortcuts.

Install VS Code following the [instructions](https://code.visualstudio.com/docs/setup/setup-overview#_cross-platform) for your operating system.

Launch VS Code. On first start, you might get a popup asking you to configure [Telemetry settings](https://code.visualstudio.com/docs/getstarted/telemetry). Feel free to disable telemetry if you don't want VS Code to report any data to its developers.

Install the Python extension:
- `Ctrl+Shift+X` to open the extension manager
- search for the "Python" extension and click "Install"

Note to macOS users: replace `Ctrl` with `Cmd` in most of the keyboard shortcuts presented in this document.

Open the scikit-learn project folder: `Ctrl+Shift+P`, then type "File: Open Folder..." and open the `scikit-learn` folder you created when running the `git clone` command above.

Open a Python file from the project by clicking on `setup.py` in the left panel named "EXPLORER".

To make VS Code use your conda environment:
- `Ctrl+Shift+P`, then type "Python: Select Interpreter" and choose "sklworkshop"
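
To double-check which interpreter VS Code (or a terminal) is actually using, you can run a small script like the sketch below. `describe_interpreter` is a hypothetical helper. `sys.prefix` should point to the `sklworkshop` environment folder. `CONDA_DEFAULT_ENV` is the environment variable set by `conda activate`, so it may be unset when VS Code launches the interpreter without activating the environment.

```python
import os
import sys


def describe_interpreter():
    """Summarize which Python interpreter and conda environment are in use."""
    return {
        "executable": sys.executable,  # path of the running Python binary
        "prefix": sys.prefix,  # root folder of the active environment
        # Set by `conda activate`; None when no conda environment was activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:10} {value}")
```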

Optionally, you can open a new project folder for NumPy in the same way: `Ctrl+Shift+P`, then type "File: Open Folder..." and select the `numpy` folder.

Select the "sklworkshop" Python interpreter for the NumPy project as well.

Switch between projects: `Ctrl+R`

Browse the code:
- by files: `Ctrl+P`
- by symbols: `Ctrl+T`

At some point, VS Code will complain that it cannot find a linter. scikit-learn uses `flake8`:
- Install `flake8` in your conda environment:
```
$ conda activate sklworkshop
(sklworkshop)$ conda install flake8
```
- Select `flake8` as the linter in VS Code: `Ctrl+Shift+P`, then type "select linter"

## Practical code navigation

Find example files that mention the word "importance", in two different ways:
- VS Code: `Ctrl+P`, type "example importance" and open `examples/inspection/plot_permutation_importance.py`
- GitHub: go to https://github.com/scikit-learn/scikit-learn in your browser, press `t` and type "example/importance"

Navigate to the `RandomForestClassifier` class from the `plot_permutation_importance.py` example:
- VS Code: `Ctrl`+click on the class name
- GitHub: click on the class name

Find the class `KMeans` in scikit-learn in two different ways:
- From the command line, in a bash or zsh terminal, use `git grep "class KMeans"`. The "class" prefix makes the search more specific, so it only finds the line of the class definition. Searching for "KMeans" alone also finds every mention of the class in the documentation, tests, examples and elsewhere. When you are not sure about the casing, use `git grep -i "keyword"` for a case-insensitive search instead.
- VS Code: `Ctrl+T` and type "KMeans". If nothing happens, press `Enter`, and select the `KMeans` class from the list of suggestions.
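
For illustration, you can mimic the `git grep "class KMeans"` trick in pure Python with a regular expression. This sketch is hypothetical: `find_class_definitions` is not a scikit-learn or git tool. Unlike `git grep`, it scans every `.py` file under the folder, tracked or not, and it skips Cython sources such as `.pyx` files. Its pattern also requires the class name to be followed by `(` or `:`, so a hypothetical `class KMeansPlus` would not be reported.

```python
import re
from pathlib import Path


def find_class_definitions(root, class_name):
    """Yield (path, line number, line) for each `class <class_name>` definition."""
    # Match "class KMeans(" or "class KMeans:" at the start of a line (after indentation).
    pattern = re.compile(rf"^\s*class\s+{re.escape(class_name)}\s*[(:]")
    for path in sorted(Path(root).rglob("*.py")):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern.match(line):
                    yield path, lineno, line.rstrip()


if __name__ == "__main__":
    # Assumes the current directory is the root of your scikit-learn clone.
    for path, lineno, line in find_class_definitions(".", "KMeans"):
        print(f"{path}:{lineno}: {line}")
```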

## Installing C/C++ compilers to be able to build native extensions

Building scikit-learn from source requires a C/C++ compiler to build its native extensions, which are typically written in Cython.

If you have never installed a C/C++ compiler on your system, you need to do it now.

**macOS users:** feel free to install the `compilers` package from conda-forge in your environment if you did not do it already. After installation, you need to deactivate and reactivate your environment for the installation to take effect.

```
$ conda install -n sklworkshop compilers
$ conda deactivate
$ conda activate sklworkshop
```

The scikit-learn [installation guide](https://scikit-learn.org/stable/developers/advanced_installation.html#building-from-source) gives more details. See the instructions for your OS:
- [Windows](https://scikit-learn.org/dev/developers/advanced_installation.html#windows)
- [Linux](https://scikit-learn.org/dev/developers/advanced_installation.html#linux)
- [macOS](https://scikit-learn.org/dev/developers/advanced_installation.html#macos)
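
After installing a compiler, you can check what Python sees with the sketch below. It reports three things:

- the `CC` value recorded when the Python interpreter itself was built (`sysconfig.get_config_var("CC")`, often `None` on Windows);
- the `CC` environment variable, which, as far as we know, the conda-forge `compilers` package sets when the environment is activated;
- which common compiler commands are on `PATH`.

The candidate list and the helper name `compiler_report` are assumptions. `cl` is the MSVC compiler, which is usually only on `PATH` inside a Visual Studio developer prompt.

```python
import os
import shutil
import sysconfig


def compiler_report(candidates=("cc", "gcc", "clang", "cl")):
    """Report configured compilers and which candidate commands are on PATH."""
    return {
        # CC as recorded when this Python was built (typically None on Windows).
        "configured_cc": sysconfig.get_config_var("CC"),
        # CC from the environment, e.g. set by conda activation scripts.
        "env_cc": os.environ.get("CC"),
        # Compiler commands found on PATH; missing ones map to None.
        "on_path": {name: shutil.which(name) for name in candidates},
    }


if __name__ == "__main__":
    report = compiler_report()
    print("Configured CC: ", report["configured_cc"])
    print("CC environment:", report["env_cc"])
    for name, path in report["on_path"].items():
        print(f"  {name:6} {path or '-'}")
```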

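# Building the main branch of scikit-learn

To build scikit-learn from source, first make sure that its build dependencies are installed in your environment. Then install your clone in editable mode with pip:

```
$ conda activate sklworkshop
$ conda install numpy scipy cython  # should already be installed
$ cd scikit-learn/
$ pip install --verbose --no-build-isolation -e .
$ cd ..
$ pip show scikit-learn
```

The `-e` (editable) flag makes Python import scikit-learn directly from your clone, so changes to Python files take effect without reinstalling. The `--no-build-isolation` flag makes pip build against the NumPy, SciPy and Cython packages already installed in the environment.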

We can check that scikit-learn can be imported in an interactive IPython session.

Install IPython:
```
$ conda install ipython
```

Then launch it and import scikit-learn. The exact development version number depends on the state of the main branch you cloned:

```
$ ipython
>>> import sklearn
>>> sklearn.__version__
'1.1.dev0'
>>> sklearn.show_versions()
[...] more details
CTRL-D
```
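
With an editable install, `import sklearn` should load the code from your clone rather than from a copy in the environment's `site-packages` folder. The sketch below checks this. `module_location` and `is_loaded_from` are hypothetical helpers. The clone path `~/scikit-learn` is an assumption; adapt it to the folder where you ran `git clone`.

```python
import importlib
from pathlib import Path


def module_location(module_name):
    """Import a module and return the folder its code is loaded from."""
    module = importlib.import_module(module_name)
    return Path(module.__file__).resolve().parent


def is_loaded_from(module_name, checkout_dir):
    """True if `module_name` is imported from inside `checkout_dir` (e.g. a git clone)."""
    location = module_location(module_name)
    checkout = Path(checkout_dir).resolve()
    return location == checkout or checkout in location.parents


if __name__ == "__main__":
    clone = Path.home() / "scikit-learn"  # hypothetical clone location
    try:
        print("sklearn loaded from:", module_location("sklearn"))
        print("loaded from the clone:", is_loaded_from("sklearn", clone))
    except ImportError:
        print("scikit-learn is not importable from this interpreter")
```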

Many Python projects follow similar coding and packaging conventions. For instance, if you also want to build NumPy from source (optional), you can proceed as follows:

```
$ cd numpy/
$ pip install --verbose --no-build-isolation -e .
$ cd ..
$ pip show numpy
$ ipython
>>> import numpy as np
>>> print(np.__version__)
1.xx.dev0+xxxxx
CTRL-D
```

# Run a scikit-learn example

The scikit-learn documentation relies heavily on code examples to
demonstrate how to use the package with actual data, typically on
standard public datasets.

All scikit-learn examples are gathered in the [examples/](
https://github.com/scikit-learn/scikit-learn/tree/main/examples)
folder and its subfolders.

They are used to automatically generate the pages of the example
gallery on the project website:

https://scikit-learn.org/stable/auto_examples/index.html

The goal of this exercise is to get familiar with navigating those
examples and running them either from the command line or from
within VS Code, leveraging the built-in matplotlib integration.

In particular, we will consider the following example file:

- [examples/inspection/plot_permutation_importance.py](
https://github.com/scikit-learn/scikit-learn/tree/main/examples/inspection/plot_permutation_importance.py)

which renders as:

- https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html

Note: how to generate the HTML documentation will be presented later.

## From the command line

```
$ cd scikit-learn
$ ls examples
$ ls examples/inspection
$ python examples/inspection/plot_permutation_importance.py
```

The text output should be displayed directly in the terminal,
while the graphical output will pop up in a new window managed
by matplotlib. If you have not already installed matplotlib and pandas, you
can do it with conda:

```
$ conda install matplotlib pandas
```
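
To run an example non-interactively, for instance to check that it completes without errors, you can select matplotlib's non-interactive `Agg` backend through the `MPLBACKEND` environment variable so that no window pops up. In a bash or zsh terminal, the equivalent is `MPLBACKEND=Agg python examples/inspection/plot_permutation_importance.py`. This is a sketch: the helper name `run_example` and the timeout value are assumptions, and matplotlib may warn that the figures cannot be shown.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_example(path, backend="Agg", timeout=600):
    """Run a script in a fresh Python process with the given matplotlib backend.

    matplotlib reads MPLBACKEND to choose its default backend; with the
    non-interactive "Agg" backend, figures are rendered in memory only.
    """
    env = dict(os.environ, MPLBACKEND=backend)
    return subprocess.run(
        [sys.executable, str(path)],  # same interpreter as the current session
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,  # seconds; raises subprocess.TimeoutExpired if exceeded
    )


if __name__ == "__main__":
    # Assumes you run this from the root of your scikit-learn clone.
    example = Path("examples/inspection/plot_permutation_importance.py")
    if example.exists():
        result = run_example(example)
        print(result.stdout)
        print("exit code:", result.returncode)
```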

Hint: at any moment, use the `pwd` command to find where you are. `pwd` stands
for "print working directory". For instance, here is a typical output one
would get on a Linux machine:

```
$ pwd
/home/YOUR-NAME/code/scikit-learn
```

## From VS Code

- `Ctrl+P` and type "plot permutation importance" to open the example file, then
- `Ctrl+Shift+P` and type "Run Current File in Python Interactive Window"
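
Many scikit-learn examples are split into sections by `# %%` comment lines. Both sphinx-gallery (which builds the example gallery) and the VS Code interactive window understand this cell format, so you can run an example one cell at a time. The sketch below shows how such a file splits into cells. `split_cells` is a hypothetical helper for illustration; it does not reflect how VS Code is implemented.

```python
import re
from pathlib import Path

# A cell starts at any line beginning with "# %%" (optionally followed by a title).
CELL_MARKER = re.compile(r"^# %%")


def split_cells(source):
    """Split Python source into cells delimited by `# %%` lines.

    Code before the first marker forms its own cell; blank-only cells are dropped.
    """
    cells, current = [], []
    for line in source.splitlines():
        if CELL_MARKER.match(line) and current:
            cells.append("\n".join(current))  # close the previous cell
            current = []
        current.append(line)
    if current:
        cells.append("\n".join(current))
    return [cell for cell in cells if cell.strip()]


if __name__ == "__main__":
    # Assumes you run this from the root of your scikit-learn clone.
    example = Path("examples/inspection/plot_permutation_importance.py")
    if example.exists():
        for i, cell in enumerate(split_cells(example.read_text(encoding="utf-8")), 1):
            print(f"cell {i}: {cell.splitlines()[0][:60]}")
```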
