Commit a2de079

committed
aSD
1 parent 3c84e45 commit a2de079

4 files changed

+306
-390
lines changed

tutorials/date-picker-data.md renamed to tutorials/data-packs.md

Lines changed: 16 additions & 8 deletions
@@ -1,14 +1,18 @@
1-
# Interactive Date Picker - Custom Data
1+
# Introduction to ICU4X - Data packs
2+
3+
If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to include additional locales, do runtime data loading, or build your own complex data pipelines, this tutorial is for you.
24

35
In this tutorial, we will add more locale data to your app. ICU4X compiled data contains data for hundreds of languages, but there are languages that have data in CLDR that are not included (generally because they don't have comprehensive coverage). For example, if you try using the locale `ccp` (Chakma) in your app, you will get output like `2023 M11 7`. Believe it or not, this is not actually correct output for Chakma. Instead, ICU4X fell back to the "root locale", which tries to be as neutral as possible. Note how it avoided calling the month by name by using `M11`, even though we requested a format with a non-numeric month name.
46

57
So, let's add some data for Chakma.
68

7-
## 1. Installing `icu4x-datagen`
9+
## 1. Prerequisites
10+
11+
This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code.
812

913
Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases.
1014

11-
Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
15+
Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
1216

1317
```console
1418
cargo --version
@@ -31,12 +35,15 @@ icu4x-datagen --markers all --locales ccp --format blob --out ccp.blob
3135

3236
This will generate a `ccp.blob` file containing data for Chakma.
3337

38+
`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data.
39+
3440
💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
3541

3642

3743
## 3. Using the data pack
3844

39-
### Rust Part 3
45+
<details>
46+
<summary>Rust</summary>
4047

4148
To use blob data, we will need to add the `icu_provider_blob` crate to our project:
4249

@@ -54,11 +61,9 @@ Now, update the instantiation of the datetime formatter to load data from the bl
5461
locale is Chakma:
5562

5663
```rust
57-
// At the top of the file:
5864
use icu::locale::locale;
5965
use icu_provider_blob::BlobDataProvider;
6066

61-
// replace the date_formatter creation
6267
let date_formatter = if locale == locale!("ccp") {
6368
println!("Using buffer provider");
6469

@@ -78,9 +83,10 @@ let date_formatter = if locale == locale!("ccp") {
7883
};
7984
```
8085

81-
Try using `ccp` now!
86+
</details>
8287

83-
### JavaScript Part 3
88+
<details>
89+
<summary>JavaScript</summary>
8490

8591
Update the formatting logic to load data from the blob if the locale is Chakma. Note that this code uses a callback, as it does an HTTP request:
8692

@@ -116,6 +122,8 @@ if (localeStr == "ccp") {
116122
}
117123
```
118124

125+
</details>
126+
119127
Try using `ccp` now!
120128

121129
## 4. Slimming the data pack
Lines changed: 64 additions & 93 deletions
Original file line numberDiff line numberDiff line change
@@ -1,36 +1,42 @@
1-
# Data management
2-
3-
This tutorial introduces data providers as well as the `icu4x-datagen` tool.
1+
# Introduction to ICU4X - Data slimming
42

53
If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to reduce code size, do runtime data loading, or build your own complex data pipelines, this tutorial is for you.
64

5+
In this tutorial, we will remove unneeded locale data from our app. ICU4X compiled data contains data for hundreds of languages, but not all locales might be required at runtime. Usually there is a fixed set that a user can choose from, which in our example is going to be Japanese and English (`ja` and `en`).
6+
77
## 1. Prerequisites
88

9-
This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of code for `myapp`.
9+
This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code.
10+
11+
Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases.
1012

11-
## 2. Generating data
13+
Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
1214

13-
Data generation is done using the `icu4x-datagen` tool, which pulls in data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases to generate `ICU4X` data.
15+
```console
16+
cargo --version
17+
# cargo 1.86.0 (adf9b6ad1 2025-02-28)
18+
```
1419

15-
First we will need to install the binary:
20+
Now you can run:
1621

1722
```console
1823
cargo install icu4x-datagen
1924
```
2025

21-
Get a coffee, this might take a while ☕.
26+
## 2. Generating custom data
2227

2328
Once installed, run:
2429

2530
```console
26-
icu4x-datagen --markers all --locales ja --format baked --pretty --out my_data
31+
icu4x-datagen --markers all --locales ja en --format baked --pretty --out my_data
2732
```
2833

29-
This will generate a `my_data` directory containing the data for all components in the `ja` locale.
34+
This will generate a `my_data` directory containing the data for all components in the `ja` and `en` locales.
3035

3136
`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data.
3237

33-
### Should you check in data to your repository?
38+
<details>
39+
<summary>Aside: Should you check in data to your repository?</summary>
3440

3541
You can check in the generated data to your version control system, or you can add it to a build script. There are pros and cons of both approaches.
3642

@@ -46,8 +52,12 @@ You should generate it automatically at build time if:
4652

4753
If you check in the generated data, it is recommended that you configure a job in continuous integration that verifies that the data in your repository reflects the latest CLDR/Unicode releases; otherwise, your app may drift out of date.
4854

55+
</details>
56+
4957
## 3. Using the generated data
5058

59+
Note: this section is currently only possible in Rust. 🤷
60+
5161
Once we have generated the data, we need to instruct `ICU4X` to use it. To do this, set the `ICU4X_DATA_DIR` environment variable during the compilation of your app:
5262

5363
```console
@@ -79,41 +89,26 @@ Because of these two data provider types, every `ICU4X` API has three constructo
7989

8090
## 5. Using the generated data explicitly
8191

92+
Note: this section is currently only possible in Rust. 🤷
93+
8294
The data we generated in section 2 is actually just Rust code defining `DataProvider` implementations for all markers using hardcoded data (go take a look!).
8395

8496
So far we've used it through the default `try_new` constructor by using the environment variable to replace the built-in data. However, we can also directly access the `DataProvider` implementations if we want, for example to combine it with other providers.
8597

8698
We include the generated code with the `include!` macro. The `impl_data_provider!` macro adds the generated implementations to any type.
8799

88-
```rust,compile_fail
89-
extern crate alloc; // required as my-data is written for #[no_std]
90-
use icu::locale::{locale, Locale};
91-
use icu::calendar::Date;
92-
use icu::datetime::{DateTimeFormatter, fieldsets::YMD};
93-
94-
const LOCALE: Locale = locale!("ja");
100+
Replace your `date_formatter` construction with the following code:
95101

96-
struct MyDataProvider;
102+
```rust,compile_fail
103+
extern crate alloc; // required as my_data is written for #[no_std]
97104
include!("../my_data/mod.rs");
105+
struct MyDataProvider;
98106
impl_data_provider!(MyDataProvider);
99107
100-
fn main() {
101-
let baked_provider = MyDataProvider;
102-
103-
let dtf = DateTimeFormatter::try_new_unstable(
104-
&baked_provider,
105-
LOCALE.into(),
106-
YMD::long()
107-
)
108-
.expect("ja data should be available");
109-
110-
let date = Date::try_new_iso(2020, 10, 14)
111-
.expect("date should be valid");
112-
113-
let formatted_date = dtf.format(&date);
114-
115-
println!("📅: {}", formatted_date);
116-
}
108+
// Create and use an ICU4X date formatter:
109+
let date_formatter = DateTimeFormatter::try_new_unstable(&MyDataProvider, locale.into(), YMDT::medium())
110+
.expect("should have data for specified locale");
111+
println!("📅: {}", date_formatter.format(&iso_date_time));
117112
```
118113

119114
The `impl_data_provider!` code will require additional crates; see its documentation for a list.
@@ -152,52 +147,53 @@ This will generate a `my_data_blob.postcard` file containing the serialized data
152147

153148
### Locale Fallbacking
154149

150+
<details>
151+
<summary>Rust</summary>
152+
155153
Unlike `BakedDataProvider`, `BlobDataProvider` (and `FsDataProvider`) does not perform locale fallbacking. For example, if `en-US` is requested but only `en` data is available, then the data request will fail. To enable fallback, we can wrap the provider in a `LocaleFallbackProvider`.
156154

157155
Note that fallback comes at a cost, as fallback code and data have to be included and executed on every request. If you don't need fallback (disclaimer: you probably do), you can use the `BlobDataProvider` directly (for baked data, see [`Options::skip_internal_fallback`](https://docs.rs/icu_provider_baked/latest/icu_provider_baked/export/struct.Options.html)).
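
To make the idea of fallback more concrete, here is a toy sketch using only the standard library. It is not the algorithm ICU4X uses (real fallback is driven by CLDR data and is more involved); the function names `fallback_chain` and `load_with_fallback` are hypothetical and exist only for this illustration. The sketch drops trailing subtags one at a time and ends at the root locale `und`:

```rust
use std::collections::HashMap;

/// Builds a toy fallback chain by dropping trailing subtags one at a time
/// and finishing at the root locale "und".
/// Illustration only: this is NOT the algorithm ICU4X uses.
fn fallback_chain(locale: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = locale.to_string();
    loop {
        chain.push(current.clone());
        match current.rfind('-') {
            // Drop the last subtag, e.g. "en-US" becomes "en".
            Some(idx) => current.truncate(idx),
            None => break,
        }
    }
    // Always end at the root locale.
    if chain.last().map(String::as_str) != Some("und") {
        chain.push("und".to_string());
    }
    chain
}

/// Returns the first entry found while walking the fallback chain.
/// `data` is a hypothetical stand-in for a provider's locale-keyed payloads.
fn load_with_fallback<'a>(data: &'a HashMap<String, String>, locale: &str) -> Option<&'a str> {
    fallback_chain(locale)
        .iter()
        .find_map(|candidate| data.get(candidate).map(String::as_str))
}
```

With a map that only contains an `en` entry, a request for `en-US` resolves to the `en` entry, while a request for `ja` finds nothing. Without the chain, the `en-US` request would fail outright, which is the behaviour described above for a bare `BlobDataProvider`.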
158156

159157
We can then use the provider in our code:
160158

161159
```rust,no_run
162-
use icu::locale::{locale, Locale, fallback::LocaleFallbacker};
163-
use icu::calendar::Date;
164-
use icu::datetime::{DateTimeFormatter, fieldsets::YMD};
160+
use icu::locale::fallback::LocaleFallbacker;
165161
use icu_provider_adapters::fallback::LocaleFallbackProvider;
166162
use icu_provider_blob::BlobDataProvider;
167163
168-
const LOCALE: Locale = locale!("ja");
164+
let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file");
165+
let buffer_provider =
166+
BlobDataProvider::try_new_from_blob(blob.into_boxed_slice())
167+
.expect("blob should be valid");
169168
170-
fn main() {
171-
let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file");
172-
let buffer_provider =
173-
BlobDataProvider::try_new_from_blob(blob.into_boxed_slice())
174-
.expect("blob should be valid");
169+
let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider)
170+
.expect("Provider should contain fallback rules");
175171
176-
let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider)
177-
.expect("Provider should contain fallback rules");
172+
let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker);
178173
179-
let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker);
174+
// Create and use an ICU4X date formatter:
175+
let date_formatter = DateTimeFormatter::try_new_with_buffer_provider(&buffer_provider, locale.into(), YMDT::medium())
176+
.expect("should have data for specified locale");
177+
println!("📅: {}", date_formatter.format(&iso_date_time));
179+
```
180180

181-
let dtf = DateTimeFormatter::try_new_with_buffer_provider(
182-
&buffer_provider,
183-
LOCALE.into(),
184-
YMD::long()
185-
)
186-
.expect("blob should contain required markers and `ja` data");
181+
As you can see in the second `expect` message, it's not possible to statically tell whether the correct data markers are included. While `BakedDataProvider` would result in a compile error for missing `DataProvider<M>` implementations, `BlobDataProvider` returns runtime errors if markers are missing.
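
The difference between compile-time and runtime failures can be sketched with plain Rust, independent of ICU4X. Everything here is hypothetical and simplified: `TypedProvider`, `DateSymbols`, `Baked`, and `Blob` are stand-ins invented for this illustration, not ICU4X types. A provider that implements a trait per marker type cannot even be asked for a marker it lacks, whereas a provider that looks markers up by name can only report the problem when the request is made:

```rust
use std::collections::HashMap;

/// Hypothetical marker type standing in for an ICU4X data marker.
struct DateSymbols;

/// Hypothetical stand-in for a typed, per-marker provider trait.
trait TypedProvider<M> {
    fn load(&self) -> &'static str;
}

/// "Baked"-style provider: it supports exactly the markers it has impls for.
struct Baked;

impl TypedProvider<DateSymbols> for Baked {
    fn load(&self) -> &'static str {
        "date symbols"
    }
}
// There is no impl for any other marker, so requesting one through
// `TypedProvider` is rejected by the compiler.

/// "Blob"-style provider: markers are looked up by name at runtime.
struct Blob {
    entries: HashMap<&'static str, &'static str>,
}

impl Blob {
    /// Returns the payload for `marker`, or a runtime error if it is absent.
    fn load(&self, marker: &str) -> Result<&'static str, String> {
        self.entries
            .get(marker)
            .copied()
            .ok_or_else(|| format!("missing marker: {}", marker))
    }
}
```

Calling `<Baked as TypedProvider<DateSymbols>>::load(&Baked)` compiles and returns the payload. A `Blob` built without a given entry still compiles, and `load` returns an `Err` for that entry only when it is called.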
187182

188-
let date = Date::try_new_iso(2020, 10, 14)
189-
.expect("date should be valid");
183+
</details>
190184

191-
let formatted_date = dtf.format(&date);
185+
<details>
186+
<summary>JavaScript</summary>
192187

193-
println!("📅: {}", formatted_date);
194-
}
195-
```
188+
TODO
189+
190+
</details>
196191

197-
As you can see in the second `expect` message, it's not possible to statically tell whether the correct data markers are included. While `BakedDataProvider` would result in a compile error for missing `DataProvider<M>` implementations, `BlobDataProvider` returns runtime errors if markers are missing.
198192

199193
## 7. Data slicing
200194

195+
Note: this section is currently only possible in Rust. 🤷
196+
201197
You might have noticed that the blob we generated is a hefty 5MB. This is no surprise, as we used `--markers all`. However, our binary only uses date formatting data in Japanese. There's room for optimization:
202198

203199
```console
@@ -211,38 +207,13 @@ But there is more to optimize. You might have noticed this in the output of the
211207
We can instead use `FixedCalendarDateTimeFormatter<Gregorian>`, which only supports formatting `Date<Gregorian>`s:
212208

213209
```rust,no_run
214-
use icu::locale::{locale, Locale, fallback::LocaleFallbacker};
215-
use icu::calendar::{Date, Gregorian};
216-
use icu::datetime::{FixedCalendarDateTimeFormatter, fieldsets::YMD};
217-
use icu_provider_adapters::fallback::LocaleFallbackProvider;
218-
use icu_provider_blob::BlobDataProvider;
219-
220-
const LOCALE: Locale = locale!("ja");
221-
222-
fn main() {
223-
let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file");
224-
let buffer_provider =
225-
BlobDataProvider::try_new_from_blob(blob.into_boxed_slice())
226-
.expect("blob should be valid");
227-
228-
let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider)
229-
.expect("Provider should contain fallback rules");
230-
231-
let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker);
232-
233-
let dtf = FixedCalendarDateTimeFormatter::<Gregorian, _>::try_new_with_buffer_provider(
234-
&buffer_provider,
235-
LOCALE.into(),
236-
YMD::long(),
237-
)
238-
.expect("blob should contain required data");
239-
240-
let date = Date::try_new_gregorian(2020, 10, 14)
241-
.expect("date should be valid");
242-
243-
let formatted_date = dtf.format(&date);
210+
use icu::datetime::FixedCalendarDateTimeFormatter;
211+
use icu::calendar::cal::Gregorian;
244212
245-
println!("📅: {}", formatted_date);
213+
// Create and use an ICU4X date formatter:
214+
let date_formatter = FixedCalendarDateTimeFormatter::try_new(locale.into(), YMDT::medium())
215+
.expect("should have data for specified locale");
216+
println!("📅: {}", date_formatter.format(&iso_date_time.to_calendar(Gregorian)));
247218
```
248219

@@ -260,6 +231,6 @@ These API-level optimizations also apply to compiled data (there's no need to us
260231

261232
We have learned how to generate data, load it into our programs, and optimize its size, and we have gotten to know the different data providers that are part of `ICU4X`.
262233

263-
For a deeper dive into configuring your data providers in code, see [data-provider-runtime.md].
234+
For a deeper dive into configuring your data providers in code, see [the runtime data provider tutorial](data-provider-runtime.md).
264235

265236
You can learn more about datagen, including the Rust API which we have not used in this tutorial, by reading [the docs](https://docs.rs/icu_provider_export/latest/).
