-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathbook11.qmd
20 lines (11 loc) · 1.57 KB
/
book11.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# *Data Munging with fastai's Mid-Level API* {#sec-midlevel-data}
We have seen what `Tokenizer` and `Numericalize` do to a collection of texts, and how they're used inside the data block API, which handles those transforms for us directly using the `TextBlock`. But what if we want to only apply one of those transforms, either to see intermediate results or because we have already tokenized texts? More generally, what can we do when the data block API is not flexible enough to accommodate our particular use case? For this, we need to use fastai's *mid-level API* for processing data. The data block API is built on top of that layer, so it will allow you to do everything the data block API does, and much much more.
## Going Deeper into fastai's Layered API
The fastai library is built on a *layered API*. In the very top layer there are *applications* that allow us to train a model in five lines of codes, as we saw in @sec-intro. In the case of creating `DataLoaders` for a text classifier, for instance, we used the line:
```python
from fastai.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
```
The factory method `TextDataLoaders.from_folder` is very convenient when your data is arranged the exact same way as the IMDb dataset, but in practice, that often won't be the case. The data block API offers more flexibility.
---
This is just a preview of this chapter. The rest of this chapter is not available here, but you read the [source notebook](https://github.com/fastai/fastbook/) which has the same content (but with less nice formatting).