provider/baked/src/export.rs (1 addition, 1 deletion)
@@ -6,7 +6,7 @@
 //!
 //! This module can be used as a target for the `icu_provider_export` crate.
 //!
-//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.

provider/blob/src/export/mod.rs (1 addition, 1 deletion)
@@ -6,7 +6,7 @@
 //!
 //! This module can be used as a target for the `icu_provider_export` crate.
 //!
-//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.

provider/fs/src/export/mod.rs (1 addition, 1 deletion)
@@ -6,7 +6,7 @@
 //!
 //! This module can be used as a target for the `icu_provider_export` crate.
 //!
-//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.

tutorials/data-packs.md (38 additions, 33 deletions)
@@ -1,64 +1,69 @@
-# Interactive Date Picker - Custom Data
+# Introduction to ICU4X - Data packs
+
+If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to include additional locales, do runtime data loading, or build your own complex data pipelines, this tutorial is for you.
 
 In this tutorial, we will add additional locale data to your app. ICU4X compiled data contains data for hundreds of languages, but there are languages that have data in CLDR that are not included (generally because they don't have comprehensive coverage). For example, if you try using the locale `ccp` (Chakma) in your app, you will get output like `2023 M11 7`. Believe it or not, but this is not actually correct output for Chakma. Instead ICU4X fell back to the "root locale", which tries to be as neutral as possible. Note how it avoided calling the month by name by using `M11`, even though we requested a format with a non-numeric month name.
 
 So, let's add some data for Chakma.
 
-## 1. Installing `icu4x-datagen`
+## 1. Prerequisites
+
+This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code.
 
 Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases.
 
-Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
+Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
 
-```console
+```shell
 cargo --version
 # cargo 1.86.0 (adf9b6ad1 2025-02-28)
 ```
 
 Now you can run
 
-```console
+```shell
 cargo install icu4x-datagen
 ```
 
 ## 2. Generating the data pack
 
 We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed.
 
-```console
+```shell
 icu4x-datagen --markers all --locales ccp --format blob --out ccp.blob
 ```
 
 This will generate a `ccp.blob` file containing data for Chakma.
 
+`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data.
+
 💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
 
 
 ## 3. Using the data pack
 
-### Rust Part 3
+<details>
+<summary>Rust</summary>
 
 To use blob data, we will need to add the `icu_provider_blob` crate to our project:
 
-```console
+```shell
 cargo add icu_provider_blob --features alloc
 ```
 
 We also need to enable the `serde` feature on the `icu` crate to enable deserialization support:
 
-```console
+```shell
 cargo add icu --features serde
 ```
 
 Now, update the instantiation of the datetime formatter to load data from the blob if the
 locale is Chakma:
 
-```rust
-// At the top of the file:
+```rust, ignore
 use icu::locale::locale;
 use icu_provider_blob::BlobDataProvider;
 
-// replace the date_formatter creation
 let date_formatter = if locale == locale!("ccp") {
 println!("Using buffer provider");
 
@@ -78,9 +83,10 @@ let date_formatter = if locale == locale!("ccp") {
 };
 ```
 
-Try using `ccp` now!
+</details>
 
-### JavaScript Part 3
+<details>
+<summary>JavaScript</summary>
 
 Update the formatting logic to load data from the blob if the locale is Chakma. Note that this code uses a callback, as it does an HTTP request:
 
@@ -101,7 +107,7 @@ function load_blob(url, callback) {
 let dateTimeFormatter =DateTimeFormatter.createYmdtWithProvider(
-DataProvider.createFromBlob(blob),
+DataProvider.fromByteSlice(blob),
 locale,
 DateTimeLength.Long,
 );
@@ -116,6 +122,8 @@ if (localeStr == "ccp") {
 }
 ```
 
+</details>
+
 Try using `ccp` now!
 
 ## 4. Slimming the data pack
@@ -124,7 +132,7 @@ Note: the following steps are currently only possible in Rust. 🤷
 
 When we ran `icu4x-datagen`, we passed `--markers all`, which make it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which markers are needed:
@@ -135,7 +143,7 @@ This should generate a lot fewer markers!
 
 Let's look at the sizes:
 
-```console
+```shell
 wc -c *.blob
 # 5448603 ccp.blob
 # 13711 ccp_smaller.blob
@@ -149,32 +157,29 @@ The last datagen invocation still produced a lot of markers, as you saw in its o
 
 Replace the `DateTimeFormatter::try_new` calls with `FixedCalendarDateTimeFormatter::try_new`, and change the `format` invocation to convert the input to the Gregorian calendar:
-The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which now has type `&Date<Gregorian>` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter<Gregorian, ...>`.
+The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which has type `&Date<Gregorian>` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter<Gregorian, ...>`.
 
-Now we can run datagen with `--markers-for-bin` again:
+Now we can run datagen with `--markers-for-bin` again and the output should be much shorter:
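
The Rust step of the tutorial loads `ccp.blob` only when the requested locale is Chakma and otherwise keeps using compiled data. That control flow can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the `DataSource` enum and `select_data_source` function are hypothetical stand-ins, not ICU4X APIs, and in the real app the returned bytes would be handed to `icu_provider_blob::BlobDataProvider` rather than kept in an enum.

```rust
use std::io;

/// Hypothetical stand-in for "which data the formatter should use".
/// In the tutorial, the `Blob` bytes would become a `BlobDataProvider`.
#[derive(Debug, PartialEq)]
enum DataSource {
    /// Use the data compiled into the binary (the default path).
    Compiled,
    /// Use the bytes of a runtime-loaded data pack such as `ccp.blob`.
    Blob(Vec<u8>),
}

/// Chooses a data source for `locale`, calling `load_blob` only for
/// Chakma (`ccp`), so other locales never touch the file system.
fn select_data_source<F>(locale: &str, load_blob: F) -> io::Result<DataSource>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    if locale == "ccp" {
        println!("Using buffer provider");
        // A read error (e.g. a missing file) is propagated to the caller.
        Ok(DataSource::Blob(load_blob()?))
    } else {
        Ok(DataSource::Compiled)
    }
}

fn main() -> io::Result<()> {
    // In a real app the loader would be `|| std::fs::read("ccp.blob")`;
    // here it returns two placeholder bytes so the sketch runs anywhere.
    let fake_blob = || -> io::Result<Vec<u8>> { Ok(vec![0xCA, 0xFE]) };
    match select_data_source("ccp", fake_blob)? {
        DataSource::Blob(bytes) => println!("loaded {} bytes", bytes.len()),
        DataSource::Compiled => println!("using compiled data"),
    }

    // Other locales stay on compiled data, so the loader is never invoked.
    let other = select_data_source("en", || panic!("loader must not run"))?;
    assert_eq!(other, DataSource::Compiled);
    Ok(())
}
```

Passing the loader as a closure keeps the file read lazy: only the Chakma branch pays for the I/O, which mirrors how the tutorial keeps compiled data as the fallback for every other locale.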