Commit 1485cf0

committed
aSD
1 parent ebdead3 commit 1485cf0

12 files changed, +353 -440 lines changed

examples/cargo/README.md

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered by default.

provider/baked/src/export.rs

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 //!
 //! This module can be used as a target for the `icu_provider_export` crate.
 //!
-//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.
 //!
 //! # Examples
 //!

provider/blob/src/export/mod.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
//!
77
//! This module can be used as a target for the `icu_provider_export` crate.
88
//!
9-
//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
9+
//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.
1010
//!
1111
//! # Examples
1212
//!

provider/export/README.md

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered by default.

provider/export/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 //!
 //! For command-line usage, see the [`icu4x-datagen` binary](https://crates.io/crate/icu4x-datagen).
 //!
-//! Also see our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md).
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.
 //!
 //! # Examples
 //!

provider/fs/src/export/mod.rs

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 //!
 //! This module can be used as a target for the `icu_provider_export` crate.
 //!
-//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers.
+//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers.
 //!
 //! # Examples
 //!

tools/md-tests/src/lib.rs

Lines changed: 4 additions & 4 deletions
@@ -10,12 +10,12 @@ mod readme {}
 mod tutorials {
     #[doc = include_str!("../../../tutorials/quickstart.md")]
     mod quickstart_md {}
-    #[doc = include_str!("../../../tutorials/date-picker.md")]
-    mod date_picker_md {}
+    #[doc = include_str!("../../../tutorials/data-packs.md")]
+    mod data_packs_md {}
     #[doc = include_str!("../../../tutorials/data-provider-runtime.md")]
     mod data_provider_runtime_md {}
-    #[doc = include_str!("../../../tutorials/data-management.md")]
-    mod data_management_md {}
+    #[doc = include_str!("../../../tutorials/data-slimming.md")]
+    mod data_slimming_md {}
 }

 mod documents {

tutorials/README.md

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered by default.

tutorials/date-picker-data.md renamed to tutorials/data-packs.md

Lines changed: 38 additions & 33 deletions
@@ -1,64 +1,69 @@
-# Interactive Date Picker - Custom Data
+# Introduction to ICU4X - Data packs
+
+If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to include additional locales, do runtime data loading, or build your own complex data pipelines, this tutorial is for you.

 In this tutorial, we will add additional locale data to your app. ICU4X compiled data contains data for hundreds of languages, but there are languages that have data in CLDR that are not included (generally because they don't have comprehensive coverage). For example, if you try using the locale `ccp` (Chakma) in your app, you will get output like `2023 M11 7`. Believe it or not, but this is not actually correct output for Chakma. Instead ICU4X fell back to the "root locale", which tries to be as neutral as possible. Note how it avoided calling the month by name by using `M11`, even though we requested a format with a non-numeric month name.

 So, let's add some data for Chakma.

-## 1. Installing `icu4x-datagen`
+## 1. Prerequisites
+
+This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code.

 Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases.

-Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
+Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).

-```console
+```shell
 cargo --version
 # cargo 1.86.0 (adf9b6ad1 2025-02-28)
 ```

 Now you can run

-```console
+```shell
 cargo install icu4x-datagen
 ```

 ## 2. Generating the data pack

 We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed.

-```console
+```shell
 icu4x-datagen --markers all --locales ccp --format blob --out ccp.blob
 ```

 This will generate a `ccp.blob` file containing data for Chakma.

+`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data.
+
 💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).


 ## 3. Using the data pack

-### Rust Part 3
+<details>
+<summary>Rust</summary>

 To use blob data, we will need to add the `icu_provider_blob` crate to our project:

-```console
+```shell
 cargo add icu_provider_blob --features alloc
 ```

 We also need to enable the `serde` feature on the `icu` crate to enable deserialization support:

-```console
+```shell
 cargo add icu --features serde
 ```

 Now, update the instantiation of the datetime formatter to load data from the blob if the
 locale is Chakma:

-```rust
-// At the top of the file:
+```rust, ignore
 use icu::locale::locale;
 use icu_provider_blob::BlobDataProvider;

-// replace the date_formatter creation
 let date_formatter = if locale == locale!("ccp") {
     println!("Using buffer provider");

@@ -78,9 +83,10 @@ let date_formatter = if locale == locale!("ccp") {
 };
 ```

-Try using `ccp` now!
+</details>

-### JavaScript Part 3
+<details>
+<summary>JavaScript</summary>

 Update the formatting logic to load data from the blob if the locale is Chakma. Note that this code uses a callback, as it does an HTTP request:

@@ -101,7 +107,7 @@ function load_blob(url, callback) {
 if (localeStr == "ccp") {
     load_blob("https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob", (blob) => {
         let dateTimeFormatter = DateTimeFormatter.createYmdtWithProvider(
-            DataProvider.createFromBlob(blob),
+            DataProvider.fromByteSlice(blob),
             locale,
             DateTimeLength.Long,
         );
@@ -116,6 +122,8 @@ if (localeStr == "ccp") {
 }
 ```

+</details>
+
 Try using `ccp` now!

 ## 4. Slimming the data pack
@@ -124,7 +132,7 @@ Note: the following steps are currently only possible in Rust. 🤷

 When we ran `icu4x-datagen`, we passed `--markers all`, which make it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which markers are needed:

-```console
+```shell
 cargo build --release
 icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smaller.blob
 ```
@@ -135,7 +143,7 @@ This should generate a lot fewer markers!

 Let's look at the sizes:

-```console
+```shell
 wc -c *.blob
 # 5448603 ccp.blob
 # 13711 ccp_smaller.blob
@@ -149,32 +157,29 @@ The last datagen invocation still produced a lot of markers, as you saw in its o

 Replace the `DateTimeFormatter::try_new` calls with `FixedCalendarDateTimeFormatter::try_new`, and change the `format` invocation to convert the input to the Gregorian calendar:

-```rust
+```rust,ignore
 println!("Date: {}", date_formatter.format(&iso_date.to_calendar(Gregorian)));
 ```

-The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which now has type `&Date<Gregorian>` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter<Gregorian, ...>`.
+The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which has type `&Date<Gregorian>` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter<Gregorian, ...>`.

-Now we can run datagen with `--markers-for-bin` again:
+Now we can run datagen with `--markers-for-bin` again and the output should be much shorter:

-```console
+```shell
 cargo build --release
 icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smallest.blob
+# ...
+# 2025-05-14T14:26:52.306Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesMonthGregorianV1
+# 2025-05-14T14:26:52.308Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesYearGregorianV1
+# 2025-05-14T14:26:52.312Z INFO [icu_provider_export::export_impl] Generated marker DatetimePatternsDateGregorianV1
+# 2025-05-14T14:26:52.324Z INFO [icu_provider_export::export_impl] Generated marker DecimalDigitsV1
+# 2025-05-14T14:26:52.325Z INFO [icu_provider_export::export_impl] Generated marker DecimalSymbolsV1
+# ...
 ```

-The output will be much shorter:
-
-```console
-2025-05-14T14:26:52.306Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesMonthGregorianV1
-2025-05-14T14:26:52.308Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesYearGregorianV1
-2025-05-14T14:26:52.312Z INFO [icu_provider_export::export_impl] Generated marker DatetimePatternsDateGregorianV1
-2025-05-14T14:26:52.324Z INFO [icu_provider_export::export_impl] Generated marker DecimalDigitsV1
-2025-05-14T14:26:52.325Z INFO [icu_provider_export::export_impl] Generated marker DecimalSymbolsV1
-```
-
-And the blob will also be much smaller at the sizes:
+The blob should also be even smaller:

-```console
+```shell
 wc -c *.blob
 # 5448603 ccp.blob
 # 13711 ccp_smaller.blob
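
Reviewer aside (not part of the commit): the two byte counts that the tutorial quotes from `wc -c` can be put in perspective with a quick calculation. The following is a minimal sketch using only the Rust standard library; `shrink_factor` and `percent_saved` are hypothetical helper names introduced here for illustration and are not ICU4X APIs.

```rust
/// Returns how many times smaller the slimmed blob is than the full one.
/// Hypothetical helper for illustration only; not part of ICU4X.
fn shrink_factor(full_bytes: u64, slimmed_bytes: u64) -> Option<f64> {
    if slimmed_bytes == 0 {
        return None; // avoid dividing by zero
    }
    Some(full_bytes as f64 / slimmed_bytes as f64)
}

/// Returns the percentage of the full blob's size that slimming removed.
/// Hypothetical helper for illustration only; not part of ICU4X.
fn percent_saved(full_bytes: u64, slimmed_bytes: u64) -> Option<f64> {
    if full_bytes == 0 {
        return None; // an empty original has nothing to save
    }
    Some(100.0 * (1.0 - slimmed_bytes as f64 / full_bytes as f64))
}

fn main() {
    // Byte counts copied from the `wc -c *.blob` output quoted in the tutorial.
    let full: u64 = 5_448_603; // ccp.blob
    let smaller: u64 = 13_711; // ccp_smaller.blob

    if let (Some(factor), Some(saved)) =
        (shrink_factor(full, smaller), percent_saved(full, smaller))
    {
        println!("ccp_smaller.blob is about {factor:.0}x smaller ({saved:.2}% saved)");
    }
}
```

With the numbers above, `--markers-for-bin` shrinks the blob by a factor of roughly 400. The size of `ccp_smallest.blob` is not shown in this part of the diff, so it is not included here.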
