Commit 6c86fe9

change all notebooks to md format
1 parent 64f0707 commit 6c86fe9

File tree

292 files changed: +66,776 additions, −308,189 deletions


Chapter1/class.ipynb

Lines changed: 0 additions & 2265 deletions
This file was deleted.

Chapter1/class.md

Lines changed: 965 additions & 0 deletions

Chapter1/code_speed.ipynb

Lines changed: 0 additions & 263 deletions
This file was deleted.

Chapter1/code_speed.md

Lines changed: 103 additions & 0 deletions
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.7
kernelspec:
  display_name: venv
  language: python
  name: python3
---

## Code Speed

+++

This section shows you some ways to speed up your Python code and track its performance.

+++

### Concurrently Execute Tasks on Separate CPUs

+++

If you want to speed up your code by running tasks concurrently on separate CPU cores, consider `joblib.Parallel`. It lets you easily execute several tasks at once, with each task using its own processor.

```{code-cell} ipython3
from joblib import Parallel, delayed
import multiprocessing

def add_three(num: int):
    return num + 3

# Distribute the 10 calls across all available CPU cores
num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(delayed(add_three)(i) for i in range(10))
results
```

### Compare the Execution Time Between Two Functions

+++

To compare the execution time of two functions, use `timeit.timeit`. You can also specify the number of times to rerun each function to get a more reliable estimate.

```{code-cell} ipython3
import timeit

def func():
    """List comprehension"""
    l = [i for i in range(10_000)]

def func2():
    """list(range())"""
    l = list(range(10_000))

exp_size = 1000
time1 = timeit.timeit(func, number=exp_size)
time2 = timeit.timeit(func2, number=exp_size)

print(time1 / time2)
```

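For a less noisy estimate, `timeit.repeat` runs the whole experiment several times; the minimum of the runs is usually the best indicator of a snippet's intrinsic cost. A minimal sketch:

```python
import timeit

# Repeat each 1000-iteration experiment five times
comp_times = timeit.repeat("[i for i in range(10_000)]", number=1000, repeat=5)
range_times = timeit.repeat("list(range(10_000))", number=1000, repeat=5)

# The minimum run is the one least affected by other processes on the machine
print(min(comp_times) / min(range_times))
```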
From the result, we can see that on average `list(range())` is faster than the equivalent list comprehension.

+++

### Save Disk Space on Large Datasets with Parquet

```{code-cell} ipython3
:tags: [hide-cell]

!pip install pyarrow
```

To save disk space on large datasets, use Parquet files instead of CSV. Because Parquet files are compressed, they take up less space on disk than uncompressed CSV files.

For a dataset with 1 million rows and 10 columns, storing it as CSV takes about 189.59 MB, while storing it as Parquet takes around 78.96 MB, saving approximately 110.63 MB of storage.

```{code-cell} ipython3
import numpy as np
import pandas as pd

# Create a dataset with 1 million rows and 10 columns
np.random.seed(123)
data = np.random.randint(0, 2**63, size=(1_000_000, 10))
df = pd.DataFrame(data, columns=[f"col{i}" for i in range(10)])
```

```{code-cell} ipython3
# Write data to a Parquet file
df.to_parquet('example.parquet')

# Write data to a CSV file
df.to_csv('example.csv', index=False)
```

```{code-cell} ipython3
import os

print("Parquet file size:", os.path.getsize('example.parquet'))
print("CSV file size:", os.path.getsize('example.csv'))
```
