AMAP, Alibaba Group
This is the progressive implicit CoT distillation training framework of GPlan. It uses curriculum learning to compress structured chain-of-thought (CoT) text into special think tokens epoch by epoch, distilling implicit reasoning capability into the model.
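
The epoch-by-epoch compression can be pictured with a minimal sketch. It is not the repository's collator: the think-token format (`<think_i>`), the front-to-back replacement order, and the `segments_per_epoch` parameter are all assumptions made for illustration. The actual token strings are defined in `add_tokens/extended_cot_vocabs.json`.

```python
import re

# Matches the tagged segments inside a <THOUGHT> block (see the label format).
SEGMENT_RE = re.compile(r"<(CONTEXT|STRATEGY|STEP_\d+)>(.*?)</\1>", re.DOTALL)


def think_token(i: int) -> str:
    # Hypothetical token format; the real vocabulary may use different strings.
    return f"<think_{i}>"


def compress_cot(cot_text: str, epoch: int, segments_per_epoch: int = 1) -> str:
    """Replace the first `epoch * segments_per_epoch` CoT segments with think tokens.

    Assumption: segments are hidden front to back, a fixed number per epoch.
    """
    segments = SEGMENT_RE.findall(cot_text)
    n_hidden = min(len(segments), epoch * segments_per_epoch)
    parts = []
    for i, (tag, body) in enumerate(segments):
        if i < n_hidden:
            parts.append(think_token(i))          # explicit text becomes a token
        else:
            parts.append(f"<{tag}>{body}</{tag}>")  # still spelled out
    return "<THOUGHT>" + "".join(parts) + "</THOUGHT>"
```

Under these assumptions, epoch 0 leaves the CoT fully explicit, and once `epoch * segments_per_epoch` reaches the number of segments the target contains only think tokens.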
🚧 Coming Soon
The GSISR dataset is collected from Amap. All data have been anonymized to protect user privacy — original feature names, POI identifiers, and user identifiers have been replaced with generic placeholders.
Each user is described by 14 anonymized profile features. All categorical values have been mapped to numerical IDs.
| Field | Description |
|---|---|
| User ID | A unique numerical identifier for each user. |
| Profile Feature 1–14 | Anonymized user profile attributes. |
| Short-term Behavior Seq | Anonymized short-term behavior sequence. POI names and behavior types (e.g., click) are replaced with numerical IDs. |
| Long-term Behavior | Anonymized long-term behavior feature. Original values are replaced with numerical IDs. |
Each planning request includes real-time contextual information.
| Field | Description |
|---|---|
| Current Time | Timestamp of the planning request. |
| Weekend Flag | Whether the current day is a weekend (0/1). |
| Holiday Flag | Whether the current day is a holiday (0/1). |
| Current City & District | The city and district where the user is located (mapped to IDs). |
| Current POI Name | The name of the user's current Point of Interest (mapped to ID). |
| Current POI Category | The category/tag of the current POI (mapped to ID). |
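
As a rough illustration of the context fields above, the following sketch builds one record from a timestamp and already-mapped IDs. The `RequestContext` class, the `build_context` helper, and the rule that Saturday and Sunday count as the weekend are assumptions; the dataset ships the flags precomputed, and how they were derived is not stated.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestContext:
    """Hypothetical container mirroring the context fields in the table."""
    current_time: datetime
    weekend_flag: int       # 0/1
    holiday_flag: int       # 0/1
    city_id: int
    district_id: int
    poi_name_id: int
    poi_category_id: int


def build_context(ts, city_id, district_id, poi_name_id, poi_category_id,
                  holidays=frozenset()):
    """Derive the two flags from the timestamp (assumed rule, see lead-in)."""
    weekend = 1 if ts.weekday() >= 5 else 0        # 5 = Saturday, 6 = Sunday
    holiday = 1 if ts.date() in holidays else 0    # caller supplies holiday dates
    return RequestContext(ts, weekend, holiday, city_id, district_id,
                          poi_name_id, poi_category_id)
```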
Each request may include up to 5 trigger event features that capture the user's immediate intent signals.
| Field | Description |
|---|---|
| Trigger 1–5 | Anonymized event trigger features. Original event types and descriptions are replaced with numerical IDs. |
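
Because a request carries at most 5 triggers, a loader would typically need a fixed-width representation. The sketch below pads a shorter list to length 5. The padding ID `0` and the helper name `pad_triggers` are assumptions, not part of the dataset specification.

```python
MAX_TRIGGERS = 5
PAD_ID = 0  # assumed padding ID; the dataset may reserve a different value


def pad_triggers(trigger_ids, max_len=MAX_TRIGGERS, pad_id=PAD_ID):
    """Return a list of exactly `max_len` trigger IDs, right-padded with `pad_id`."""
    trigger_ids = list(trigger_ids)
    if len(trigger_ids) > max_len:
        raise ValueError(f"expected at most {max_len} triggers, got {len(trigger_ids)}")
    return trigger_ids + [pad_id] * (max_len - len(trigger_ids))
```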
Each label consists of two parts:
1. Chain-of-Thought (CoT) text — A structured reasoning process wrapped in XML tags:
<THOUGHT>
<CONTEXT>Briefly analyze the current context and the user's potential needs</CONTEXT>
<STRATEGY>Based on the context analysis, devise the core strategy for the plan</STRATEGY>
<STEP_1>Focus on analyzing the primary and most crucial intent</STEP_1>
...
<STEP_n>Explain why the n-th intent is recommended</STEP_n>
</THOUGHT>
2. Intent sequence — A JSON array of tool-calling intents representing the recommendation. Each intent includes a tool name and associated parameters selected from a predefined intent library:
[
{"工具名称": "tool_5", "起始位置": "当前位置", "空间范围": "附近", "tag": "美食"},
{"工具名称": "tool_2", "起始位置": "当前位置", "终点位置": "家"},
...
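]

In this example, the keys 工具名称, 起始位置, 空间范围, and 终点位置 mean "tool name", "start location", "spatial range", and "destination"; the values 当前位置, 附近, 美食, and 家 mean "current location", "nearby", "food", and "home".

The intent library includes 10 tool types covering scenarios such as ride-hailing, navigation, transit, POI recommendation, order reminders, and weather queries.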
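
The two-part label format described above can be read back with a short parser. This is a sketch using only the standard library, assuming the tags and the `工具名称` key appear exactly as in the examples. The function names `parse_thought` and `parse_intents` are hypothetical and do not come from the repository.

```python
import json
import re

TOOL_NAME_KEY = "工具名称"  # "tool name", as it appears in the example intents


def parse_thought(cot_text):
    """Split a <THOUGHT> block into context, strategy, and ordered steps."""
    match = re.search(r"<THOUGHT>(.*?)</THOUGHT>", cot_text, re.DOTALL)
    if match is None:
        raise ValueError("no <THOUGHT> block found")
    inner = match.group(1)
    fields = dict(re.findall(r"<(CONTEXT|STRATEGY)>(.*?)</\1>", inner, re.DOTALL))
    steps = re.findall(r"<STEP_(\d+)>(.*?)</STEP_\1>", inner, re.DOTALL)
    steps.sort(key=lambda pair: int(pair[0]))  # order by step number
    return {
        "context": fields.get("CONTEXT", "").strip(),
        "strategy": fields.get("STRATEGY", "").strip(),
        "steps": [body.strip() for _, body in steps],
    }


def parse_intents(raw_json):
    """Parse the intent array into (tool_name, params) pairs."""
    intents = json.loads(raw_json)
    if not isinstance(intents, list):
        raise ValueError("intent sequence must be a JSON array")
    parsed = []
    for intent in intents:
        if not isinstance(intent, dict) or TOOL_NAME_KEY not in intent:
            raise ValueError(f"intent is missing the {TOOL_NAME_KEY!r} key")
        params = {k: v for k, v in intent.items() if k != TOOL_NAME_KEY}
        parsed.append((intent[TOOL_NAME_KEY], params))
    return parsed
```

Note that the `...` lines in the examples are elisions, so the snippets shown there are not valid input as written; a real label contains the full steps and the full array.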
pip install -r requirements.txt
bash finetune.sh

├── finetune.py # Finetuning script (WeightedLossTrainer + SyncEpochCallback)
├── finetune.sh # Launch script example
├── data_process/
│ ├── collate_fns.py # Data collator (progressive distillation)
│ └── data_loader.py # Data loading
├── utils.py # Utility functions and argument definitions
├── add_tokens/extended_cot_vocabs.json # CoT special token vocabulary
├── config/ds_z3_bf16.json # DeepSpeed ZeRO-3 configuration
└── requirements.txt