
Commit c75493e

sed -i 's;\([a-zA-Z:>]\) *$;\1;g' *
Signed-off-by: lucasew <[email protected]>
1 parent bf7b267 commit c75493e
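
The commit message is the cleanup command itself: for every file in the directory, it strips trailing spaces that follow a letter, a colon, or a `>`. Below is a minimal sketch of the effect on a hypothetical sample file (assuming GNU sed and coreutils; `sample.md` is not part of this repository):

```sh
# Create a sample file whose lines end in trailing spaces
printf 'categories:  \n>  \ntags:  \n' > sample.md

# Same substitution as the commit: match a letter, ':' or '>' followed by
# trailing spaces at end of line, and keep only the captured character (\1)
sed -i 's;\([a-zA-Z:>]\) *$;\1;g' sample.md

# Inspect line endings; '$' now sits directly after the visible text
cat -A sample.md
```

Note that the character class means the command only touches lines whose last non-space character is a letter, `:`, or `>` (which covers the YAML front-matter keys and the bare `>` blockquote lines changed below), and that ` *` matches spaces only, not tabs.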

File tree: 126 files changed, +502 -502 lines changed


3-variants-of-classification-problems-in-machine-learning.md
+3 -3

@@ -1,9 +1,9 @@
 ---
 title: "3 Variants of Classification Problems in Machine Learning"
 date: "2020-10-19"
-categories:
+categories:
 - "deep-learning"
-tags:
+tags:
 - "classification"
 - "classifier"
 - "deep-learning"
@@ -57,7 +57,7 @@ This process - distinguishing between object types or _classes_ by automatically
 The first variant of classification problems is called **binary classification**. If you know the binary system of numbers, you'll know that it's related to the number _two_:
 
 > In mathematics and digital electronics, a binary number is a number expressed in the base-2 numeral system or binary numeral system, which uses only two symbols: typically "0" (zero) and "1" (one).
->
+>
 > Wikipedia (2003)
 
 Binary classification, here, equals the assembly line scenario that we already covered and will repeat now:

a-gentle-introduction-to-long-short-term-memory-networks-lstm.md
+4 -4

@@ -1,9 +1,9 @@
 ---
 title: "A gentle introduction to Long Short-Term Memory Networks (LSTM)"
 date: "2020-12-29"
-categories:
+categories:
 - "deep-learning"
-tags:
+tags:
 - "deep-learning"
 - "long-short-term-memory"
 - "lstm"
@@ -53,13 +53,13 @@ After tokenizing a sequence such as a phrase, we can feed individual tokens (e.g
 Especially when you unfold this structure showing the parsing of subsequent tokens \[latex\]x\_{t-1}\[/latex\] etc., we see that hidden state passes across tokens in a left-to-right fashion. Each token can use information from the previous steps and hence benefit from additional context when transducing (e.g. translating) a token.
 
 > The structure of the network is similar to that of a standard multilayer perceptron, with the distinction that we allow connections among hidden units associated with a time delay. Through these connections the model can retain information about the past, enabling it to discover temporal correlations between events that are far away from each other in the data.
->
+>
 > Pascanu et al. (2013)
 
 While being a relatively great step forward, especially with larger sequences, classic RNNs did not show great improvements over classic neural networks where the inputs were sets of time steps (i.e. multiple tokens just at once), according to Hochreiter & Schmidhuber (1997). Diving into Hochreiter's thesis work from 6 years earlier, the researchers have identified the [vanishing gradients problem](https://www.machinecurve.com/index.php/2019/08/30/random-initialization-vanishing-and-exploding-gradients/) and the relatively large distances error flow has to go when sequences are big as one of the leading causes why such models don't perform well.
 
 > The vanishing gradients problem refers to the opposite behaviour, when long term components go exponentially fast to norm 0, making it impossible for the model to learn correlation between temporally distant events.
->
+>
 > Pascanu et al. (2013)
 
 ### Why vanishing gradients?

a-simple-conv3d-example-with-keras.md
+4 -4

@@ -1,11 +1,11 @@
 ---
 title: "A simple Conv3D example with TensorFlow 2 and Keras"
 date: "2019-10-18"
-categories:
+categories:
 - "buffer"
 - "deep-learning"
 - "frameworks"
-tags:
+tags:
 - "conv3d"
 - "convolutional-neural-networks"
 - "deep-learning"
@@ -208,7 +208,7 @@ We can next import and prepare the data:
 ```python
 # -- Process code --
 # Load the HDF5 data file
-with h5py.File("./full_dataset_vectors.h5", "r") as hf:
+with h5py.File("./full_dataset_vectors.h5", "r") as hf:
 
 # Split the data into training/test features/targets
 X_train = hf["X_train"][:]
@@ -352,7 +352,7 @@ def rgb_data_transform(data):
 
 # -- Process code --
 # Load the HDF5 data file
-with h5py.File("./full_dataset_vectors.h5", "r") as hf:
+with h5py.File("./full_dataset_vectors.h5", "r") as hf:
 
 # Split the data into training/test features/targets
 X_train = hf["X_train"][:]

about-loss-and-loss-functions.md
+9 -9

@@ -1,10 +1,10 @@
 ---
 title: "About loss and loss functions"
 date: "2019-10-04"
-categories:
+categories:
 - "deep-learning"
 - "svms"
-tags:
+tags:
 - "classifier"
 - "deep-learning"
 - "loss-function"
@@ -156,7 +156,7 @@ Remember the MSE?
 
 ![](images/image-14-1024x296.png)
 
-There's also something called the RMSE, or the **Root Mean Squared Error** or Root Mean Squared Deviation (RMSD). It goes like this:
+There's also something called the RMSE, or the **Root Mean Squared Error** or Root Mean Squared Deviation (RMSD). It goes like this:
 
 ![](images/image.png)
 
@@ -254,18 +254,18 @@ Because the benefit of the \[latex\]\\delta\[/latex\] is also becoming your bott
 Loss functions are also applied in classifiers. I already discussed in another post what classification is all about, so I'm going to repeat it here:
 
 > Suppose that you work in the field of separating non-ripe tomatoes from the ripe ones. It’s an important job, one can argue, because we don’t want to sell customers tomatoes they can’t process into dinner. It’s the perfect job to illustrate what a human classifier would do.
->
-> Humans have a perfect eye to spot tomatoes that are not ripe or that have any other defect, such as being rotten. They derive certain characteristics for those tomatoes, e.g. based on color, smell and shape:
->
+>
+> Humans have a perfect eye to spot tomatoes that are not ripe or that have any other defect, such as being rotten. They derive certain characteristics for those tomatoes, e.g. based on color, smell and shape:
+>
 > \- If it’s green, it’s likely to be unripe (or: not sellable);
 > \- If it smells, it is likely to be unsellable;
 > \- The same goes for when it’s white or when fungus is visible on top of it.
->
+>
 > If none of those occur, it’s likely that the tomato can be sold. We now have _two classes_: sellable tomatoes and non-sellable tomatoes. Human classifiers _decide about which class an object (a tomato) belongs to._
->
+>
 > The same principle occurs again in machine learning and deep learning.
 > Only then, we replace the human with a machine learning model. We’re then using machine learning for _classification_, or for deciding about some “model input” to “which class” it belongs.
->
+>
 > Source: [How to create a CNN classifier with Keras?](https://www.machinecurve.com/index.php/2019/09/17/how-to-create-a-cnn-classifier-with-keras/)
 
 We'll now cover loss functions that are used for classification.

albert-explained-a-lite-bert.md
+6 -6

@@ -1,9 +1,9 @@
 ---
 title: "ALBERT explained: A Lite BERT"
 date: "2021-01-06"
-categories:
+categories:
 - "deep-learning"
-tags:
+tags:
 - "albert"
 - "bert"
 - "deep-learning"
@@ -51,13 +51,13 @@ However, let's take a quick look at BERT here as well before we move on. Below,
 Previous studies (such as the [study creating BERT](https://www.machinecurve.com/index.php/2021/01/04/intuitive-introduction-to-bert/) or the [one creating GPT](https://www.machinecurve.com/index.php/2021/01/05/dall-e-openai-gpt-3-model-can-draw-pictures-based-on-text/)) have demonstrated that the size of language models is related to performance. The bigger the language model, the better the model performs, is the general finding.
 
 > Evidence from these improvements reveals that a large network is of crucial importance for achieving state-of-the-art performance
->
+>
 > Lam et al. (2019)
 
 While this allows us to build models that really work well, this also comes at a cost: models are really huge and therefore cannot be used widely in practice.
 
 > An obstacle to answering this question is the memory limitations of available hardware. Given that current state-of-the-art models often have hundreds of millions or even billions of parameters, it is easy to hit these limitations as we try to scale our models. Training speed can also be significantly hampered in distributed training, as the communication overhead is directly proportional to the number of parameters in the model.
->
+>
 > Lam et al. (2019)
 
 Recall that BERT comes in two flavors: a \[latex\]\\text{BERT}\_\\text{BASE}\[/latex\] model that has 110 million trainable parameters, and a \[latex\]\\text{BERT}\_\\text{LARGE}\[/latex\] model that has 340 million ones (Devlin et al., 2018).
@@ -89,7 +89,7 @@ If things are not clear by now, don't worry - that was expected :D We're going t
 The first key difference between the BERT and ALBERT models is that **parameters of the word embeddings are factorized**.
 
 > In mathematics, **factorization** (...) or **factoring** consists of writing a number or another mathematical object as a product of several _factors_, usually smaller or simpler objects of the same kind. For example, 3 × 5 is a factorization of the integer 15
->
+>
 > Wikipedia (2002)
 
 Factorization of these parameters is achieved by taking the matrix representing the weights of the word embeddings \[latex\]E\[/latex\] and decomposing it into two different matrices. Instead of projecting the one-hot encoded vectors directly onto the hidden space, they are first projected on some-kind of lower-dimensional embedding space, which is then projected to the hidden space (Lan et al, 2019). Normally, this should not produce a different result, but let's wait.
@@ -168,7 +168,7 @@ The following results can be reported:
 Beyond the general results, the authors have also performed ablation experiments to see whether the changes actually cause the performance improvement, or not.
 
 > An ablation study studies the performance of an AI system by removing certain components, to understand the contribution of the component to the overall system.
->
+>
 > Wikipedia (n.d.)
 
 These are the results:

an-introduction-to-dcgans.md
+2 -2

@@ -1,10 +1,10 @@
 ---
 title: "An introduction to DCGANs"
 date: "2021-03-24"
-categories:
+categories:
 - "buffer"
 - "deep-learning"
-tags:
+tags:
 - "convolutional-neural-networks"
 - "dcgan"
 - "deep-learning"

an-introduction-to-tensorflow-keras-callbacks.md
+14 -14

@@ -1,9 +1,9 @@
 ---
 title: "An introduction to TensorFlow.Keras callbacks"
 date: "2020-11-10"
-categories:
+categories:
 - "frameworks"
-tags:
+tags:
 - "callbacks"
 - "keras"
 - "tensorflow"
@@ -43,7 +43,7 @@ In Machine Learning terms, each iteration is also called an **epoch**. Hence, tr
 Now, it can be the case that you want to get insights from the training process while it is running. Or you want to provide automated steering in order to avoid wasting resources. In those cases, you might want to add a **callback** to your Keras model.
 
 > A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc).
->
+>
 > Keras Team (n.d.)
 
 As we shall see later in this article, among others, there are [callbacks for monitoring](https://www.machinecurve.com/index.php/2019/11/13/how-to-use-tensorboard-with-keras/) and for stopping the training process [when it no longer makes the model better](https://www.machinecurve.com/index.php/2019/05/30/avoid-wasting-resources-with-earlystopping-and-modelcheckpoint-in-keras/). This is possible because with callbacks, we can 'capture' the training process while it is happening. They essentially 'hook' into the training process by allowing the training process to invoke certain callback definitions. In Keras, each callback implements at least one, but possibly multiple of the following definitions (Keras Team, n.d.).
@@ -116,7 +116,7 @@ model.fit(train_generator,
 If you want to periodically save your Keras model - or the model weights - to some file, the `ModelCheckpoint` callback is what you need.
 
 > Callback to save the Keras model or model weights at some frequency.
->
+>
 > TensorFlow (n.d.)
 
 It is available as follows:
@@ -162,7 +162,7 @@ Did you know that you can visualize the training process realtime [with TensorBo
 With the `TensorBoard` callback, you can link TensorBoard with your Keras model.
 
 > Enable visualizations for TensorBoard.
->
+>
 > TensorFlow (n.d.)
 
 The callback logs a range of items from the training process into your TensorBoard log location:
@@ -211,7 +211,7 @@ Optimizing your neural network involves applying [gradient descent](https://www.
 During this process, you want to find a model that performs well in terms of predictions (i.e., it is not underfit) but that is not too rigid with respect to the dataset it is trained on (i.e., it is neither overfit). That's why the `EarlyStopping` callback can be useful if you are dealing with a situation like this.
 
 > Stop training when a monitored metric has stopped improving.
->
+>
 > TensorBoard (n.d.)
 
 It is implemented as follows:
@@ -252,7 +252,7 @@ During the optimization process, a so called _weight update_ is computed. Howeve
 Preferably being relatively large during the early iterations and lower in the later stages, we must adapt the learning rate during the training process. This is called [learning rate decay](https://www.machinecurve.com/index.php/2019/11/11/problems-with-fixed-and-decaying-learning-rates/) and shows what a _learning rate scheduler_ can be useful for. The `LearningRateScheduler` callback implements this functionality.
 
 > At the beginning of every epoch, this callback gets the updated learning rate value from `schedule` function provided at `__init__`, with the current epoch and current learning rate, and applies the updated learning rate on the optimizer.
->
+>
 > TensorFlow (n.d.)
 
 Its implementation is really simple:
@@ -294,7 +294,7 @@ Keeping your learning rate equal when close to a plateau means that your model w
 With the `ReduceLROnPlateau` callback, the optimization process can be instructed to _reduce_ the learning rate (and hence the step) when a plateau is encountered.
 
 > Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This callback monitors a quantity and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
->
+>
 > TensorFlow (n.d.)
 
 The callback is implemented as follows:
@@ -333,7 +333,7 @@ Above, we saw that training logs can be distributed to [TensorBoard](https://www
 In those cases, you might wish to send the training logs there instead. The `RemoteMonitor` callback can help you do this.
 
 > Callback used to stream events to a server.
->
+>
 > TensorFlow (n.d.)
 
 It is implemented as follows:
@@ -369,7 +369,7 @@ model.fit(train_generator,
 Say that you want a certain function to fire after every batch or every epoch - a simple function, nothing special. However, it's not provided in the collection of callbacks presented with the `tensorflow.keras.callbacks` API. In this case, you might want to use the `LambdaCallback`.
 
 > Callback for creating simple, custom callbacks on-the-fly. This callback is constructed with anonymous functions that will be called at the appropriate time. Te
->
+>
 > TensorFlow (n.d.)
 
 It can thus be used to provide anonymous (i.e. `lambda` functions without a name) functions to the training process. The callback looks as follows:
@@ -401,7 +401,7 @@ model.fit(train_generator,
 In some cases (e.g. when you did not apply min-max normalization to your input data), the loss value can be very strange - outputting values close to Infinity or values that are Not a Number (`NaN`). In those cases, you don't want to pursue further training. The `TerminateOnNaN` callback can help here.
 
 > Callback that terminates training when a NaN loss is encountered.
->
+>
 > TensorFlow (n.d.)
 
 It is implemented as follows:
@@ -428,7 +428,7 @@ model.fit(train_generator,
 CSV files can be very useful when you need to exchange data. If you want to flush your training logs into a CSV file, the `CSVLogger` callback can be useful to you.
 
 > Callback that streams epoch results to a CSV file.
->
+>
 > TensorFlow (n.d.)
 
 It is implemented as follows:
@@ -461,7 +461,7 @@ model.fit(train_generator,
 When you are training a Keras model with verbosity set to `True`, you will see a progress bar in your terminal. With the `ProgbarLogger` callback, you can change what is displayed there.
 
 > Callback that prints metrics to stdout.
->
+>
 > TensorFlow (n.d.)
 
 It is implemented as follows:
@@ -493,7 +493,7 @@ model.fit(train_generator,
 When you are training a neural network, especially in a [distributed setting](https://www.machinecurve.com/index.php/2020/10/16/tensorflow-cloud-easy-cloud-based-training-of-your-keras-model/), it would be problematic if your training process suddenly stops - e.g. due to machine failure. Every iteration passed so far will be gone. With the experimental `BackupAndRestore` callback, you can instruct Keras to create temporary checkpoint files after each epoch, to which you can restore later.
 
 > `BackupAndRestore` callback is intended to recover from interruptions that happened in the middle of a model.fit execution by backing up the training states in a temporary checkpoint file (based on TF CheckpointManager) at the end of each epoch.
->
+>
 > TensorFlow (n.d.)
 
 It is implemented as follows:

automating-neural-network-configuration-with-keras-tuner.md
+5 -5

@@ -1,10 +1,10 @@
 ---
 title: "Automating neural network configuration with Keras Tuner"
 date: "2020-06-09"
-categories:
+categories:
 - "deep-learning"
 - "frameworks"
-tags:
+tags:
 - "deep-neural-network"
 - "hyperparameter-tuning"
 - "hyperparameters"
@@ -74,7 +74,7 @@ model.add(Dense(no_classes, activation='softmax'))
 Here, the architectural choices you make (such as the number of filters for a `Conv2D` layer, kernel size, or the number of output nodes for your `Dense` layer) determine what are known as the _parameters_ of your neural network - the weights (and by consequence biases) of your neural network:[](https://datascience.stackexchange.com/posts/17643/timeline)
 
 > The parameters of a neural network are typically the weights of the connections. In this case, these parameters are learned during the training stage. So, the algorithm itself (and the input data) tunes these parameters.
->
+>
 > [Robin, at StackExchange](https://datascience.stackexchange.com/questions/17635/model-parameters-hyper-parameters-of-neural-network-their-tuning-in-training#:~:text=The%20parameters%20of%20a%20neural,or%20the%20number%20of%20epochs.)
 
 ### Tuning hyperparameters in your neural network
@@ -89,7 +89,7 @@ However, things don't end there. Rather, in step (2), you'll _configure_ the mod
 Here's why they are called _hyper_parameters:
 
 > The hyper parameters are typically the learning rate, the batch size or the number of epochs. The are so called "hyper" because they influence how your parameters will be learned. You optimize these hyper parameters as you want (depends on your possibilities): grid search, random search, by hand, using visualisations… The validation stage help you to both know if your parameters have been learned enough and know if your hyper parameters are good.
->
+>
 > [Robin, at StackExchange](https://datascience.stackexchange.com/questions/17635/model-parameters-hyper-parameters-of-neural-network-their-tuning-in-training#:~:text=The%20parameters%20of%20a%20neural,or%20the%20number%20of%20epochs.)
 
 As Robin suggests, hyperparameters can be selected (and optimized) in multiple ways. The easiest way of doing so is by hand: you, as a deep learning engineer, select a set of hyperparameters that you will subsequently alter in an attempt to make the model better.
@@ -103,7 +103,7 @@ However, can't we do this in a better way when training a Keras model?
 As you would have expected: yes, we can! :) Let's introduce Keras Tuner to the scene. As you would expect from engineers, the description as to what it does is really short but provides all the details:
 
 > A hyperparameter tuner for Keras, specifically for tf.keras with TensorFlow 2.0.
->
+>
 > [Keras-tuner on GitHub](https://github.com/keras-team/keras-tuner)
 
 If you already want to look around, you could visit their website, and if not, let's take a look at what it does.

avoid-wasting-resources-with-earlystopping-and-modelcheckpoint-in-keras.md
+2 -2

@@ -1,11 +1,11 @@
 ---
 title: "Using EarlyStopping and ModelCheckpoint with TensorFlow 2 and Keras"
 date: "2019-05-30"
-categories:
+categories:
 - "buffer"
 - "deep-learning"
 - "frameworks"
-tags:
+tags:
 - "ai"
 - "callbacks"
 - "deep-learning"
