---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.7
kernelspec:
  display_name: venv
  language: python
  name: python3
---

## Code Speed

+++

This section shows some ways to speed up your Python code and track its performance.

+++

### Concurrently Execute Tasks on Separate CPUs

+++

If you want to run tasks concurrently on separate CPU cores for a speedup, consider `joblib.Parallel`. It lets you easily execute several tasks at once, with each task using its own processor.

```{code-cell} ipython3
from joblib import Parallel, delayed
import multiprocessing

def add_three(num: int) -> int:
    return num + 3

# Use one worker process per available CPU core
num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(delayed(add_three)(i) for i in range(10))
results
```

### Compare the Execution Time Between Two Functions

+++

To compare the execution time of two functions, try `timeit.timeit`. You can also specify the number of times to rerun each function to get a more reliable estimate.

```{code-cell} ipython3
import timeit

def comprehension():
    """Build a list with a comprehension."""
    return [i for i in range(10_000)]

def list_range():
    """Build a list with list(range())."""
    return list(range(10_000))

num_runs = 1000
time1 = timeit.timeit(comprehension, number=num_runs)
time2 = timeit.timeit(list_range, number=num_runs)

print(time1 / time2)
```

Since the ratio `time1/time2` is greater than 1, `list(range())` is faster on average than the equivalent list comprehension.

+++

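A single `timeit.timeit` call can be skewed by other activity on the machine. `timeit.repeat` runs several independent trials, and taking the minimum gives a more stable estimate:

```{code-cell} ipython3
import timeit

def list_range():
    return list(range(10_000))

# Five independent trials of 1,000 runs each;
# the minimum is the least noisy estimate
times = timeit.repeat(list_range, number=1_000, repeat=5)
print(min(times))
```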
| 68 | + |
### Save Disk Space on Large Datasets with Parquet

```{code-cell} ipython3
:tags: [hide-cell]

!pip install pyarrow
```

To save disk space on large datasets, use Parquet files instead of CSV. Parquet's columnar layout and built-in compression make files substantially smaller on disk than the equivalent uncompressed CSV.

For a dataset with 1 million rows and 10 columns, the CSV file takes about 189.59 MB, while the Parquet file takes around 78.96 MB, saving approximately 110.63 MB of storage.

| 80 | + |
```{code-cell} ipython3
import numpy as np
import pandas as pd

# Create a dataset with 1 million rows and 10 columns;
# dtype=np.int64 keeps the upper bound valid on all platforms
np.random.seed(123)
data = np.random.randint(0, 2**63, size=(1000000, 10), dtype=np.int64)
df = pd.DataFrame(data, columns=[f'col{i}' for i in range(10)])
```
| 90 | + |
```{code-cell} ipython3
# Write the data to a Parquet file
df.to_parquet('example.parquet')

# Write the data to a CSV file
df.to_csv('example.csv', index=False)
```
| 98 | + |
```{code-cell} ipython3
import os

print("Parquet file size:", os.path.getsize('example.parquet'))
print("CSV file size:", os.path.getsize('example.csv'))
```