Commit 428dc82

rename 10k -> 4k
1 parent 5df0b92 commit 428dc82

2 files changed: +5 -4 lines changed

recipes/A5000_24GB_x8/i18n-ja-wikipedia-step-10k.yaml → recipes/A5000_24GB_x8/i18n-ja-wikipedia-step-4k.yaml

+2 -2
@@ -1,6 +1,6 @@
 target_task: tasks/i18n/ja.md
 base_model_id: TinyLlama/TinyLlama-1.1B-intermediate-step-715k-1.5T
-model_name: TinyLlama-1.5T-ja-wikipedia-step-10k
+model_name: tinyllama-ja-wikipedia-1.5T-step-4k
 output_base_dir: /data/output
 dataset_id: wikimedia/wikipedia
 dataset_load_config: 20231101.ja
@@ -14,7 +14,7 @@ train_claim_gpu_num: 4
 train_per_device_train_batch_size: 8
 train_gradient_accumulation_steps: 4
 train_num_train_epochs: 4
-train_max_steps: 10000
+train_max_steps: 4000
 train_fp16: True
 inference_max_new_tokens: 32
 evaluations:
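
For context on the step-count change, the sketch below works out how many training samples each setting consumes. It assumes the trainer multiplies GPU count, per-device batch size, and gradient-accumulation steps in the usual data-parallel way; the code that reads this recipe is not part of this commit, so that is an assumption, and `samples_seen` is a hypothetical helper.

```python
def samples_seen(gpu_num: int, per_device_batch: int, grad_accum: int, max_steps: int):
    """Return (effective batch size, total samples consumed after max_steps).

    Assumes plain data parallelism: one optimizer step consumes
    gpu_num * per_device_batch * grad_accum samples.
    """
    effective_batch = gpu_num * per_device_batch * grad_accum
    return effective_batch, effective_batch * max_steps


# Values taken from the recipe in this diff.
old_batch, old_total = samples_seen(4, 8, 4, 10000)
new_batch, new_total = samples_seen(4, 8, 4, 4000)

print(f"effective batch: {new_batch}")
print(f"samples at 10k steps: {old_total}")
print(f"samples at 4k steps:  {new_total}")
```

Under that assumption the effective batch is 128 samples per optimizer step, so the run drops from 1,280,000 to 512,000 samples.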

src/dataset/load.py

+3 -2
@@ -3,8 +3,9 @@
 import yaml
 from datasets.load import load_dataset
 
-load_dataset("oscar")
-load_dataset("cc100")
+# load_dataset("oscar")
+load_dataset("cc100", "en")
+load_dataset("cc100", "ja")
 load_dataset("cerebras/SlimPajama-627B")
 load_dataset("bigcode/starcoderdata")
 load_dataset("Open-Orca/OpenOrca")
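
The change above passes a config name as the second argument to `load_dataset` for `cc100`, presumably because that dataset is published per language. One way to keep such calls in a single place is a list of (dataset, config) pairs, sketched below. `PREFETCH` and `prefetch` are hypothetical names, not part of this repository, and the loader is injected so the loop can be exercised without network access.

```python
from typing import Callable, List, Optional, Tuple

# (dataset_id, config_name) pairs mirroring the calls in the diff.
# None means the dataset is loaded without an explicit config.
PREFETCH: List[Tuple[str, Optional[str]]] = [
    ("cc100", "en"),
    ("cc100", "ja"),
    ("cerebras/SlimPajama-627B", None),
    ("bigcode/starcoderdata", None),
    ("Open-Orca/OpenOrca", None),
]


def prefetch(loader: Callable[..., object],
             specs: List[Tuple[str, Optional[str]]] = PREFETCH) -> List[str]:
    """Call loader once per spec and return a label for each dataset loaded."""
    loaded = []
    for dataset_id, config in specs:
        if config is None:
            loader(dataset_id)
            loaded.append(dataset_id)
        else:
            # The config name goes in as the second positional argument.
            loader(dataset_id, config)
            loaded.append(f"{dataset_id}:{config}")
    return loaded
```

With the real library this would be called as `prefetch(load_dataset)` after `from datasets import load_dataset`.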
