Commit cd0dfcc

feat: prompt for Translation Compilation (#2191)
* feat: prompt for Translation Compilation
* fix: typo
1 parent 370c3bc commit cd0dfcc

6 files changed

+99
-4
lines changed

.github/workflows/docs-commit.translate.yaml

Lines changed: 8 additions & 1 deletion
@@ -32,6 +32,13 @@ jobs:
         run: |
           echo "files=$(git diff --diff-filter=d --name-only HEAD^ HEAD | grep '\.md$' | grep -v 'cn' | sed -e 's/^/.\//' | tr '\n' ' ')" >> $GITHUB_OUTPUT
 
+      - name: Read prompt from file
+        id: read_prompt
+        run: |
+          echo "prompt<<EOF" >> $GITHUB_OUTPUT
+          cat .github/workflows/prompt.txt >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT
+
       - name: Run GPT Translate
 
         with:
@@ -40,5 +47,5 @@ jobs:
           inputFiles: "${{ steps.changed_files.outputs.files }}"
           outputFiles: "docs/cn/**/*.md"
           languages: "Simplified-Chinese"
-          prompt: "You are a translation engine that has knowledge of databases and is familiar with SQL, HTML, Markdown and JSON syntax. There must be no omissions in translation. \ Databend is a cloud-native data warehouse and an alternative to Snowflake. \ I am translating the Databend documentation for helping users.\ Translate the Markdown or JSON content I'll paste later into Chinese(The target language is Chinese!).\ You must strictly follow the rules below.\ - Never change the Markdown markup structure. Don't add or remove links. Do not change any URL.\ - Never change the contents of code blocks even if they appear to have a bug.\ -Content inside``` (code fences), regardless of the programming language (e.g.,sql, js, python, java, or plain ), must never be translated, modified, or altered in any way, even if it appears incorrect, contains bugs, or seems incomplete. This rule is absolute and applies to all cases without exception.\ - Always preserve the original line breaks. Do not add or remove blank lines.\ - Never touch the permalink such as `{/*examples*/}` at the end of each heading.\ - Never touch HTML-like tags such as `<Notes>`.\ - Correctly format the document for best rendering. \ - Please do not translate database or computing-specific terms.\ -Keep the structure consistent with the source document and do not delete anything.\ - if you discover to describe the plan 'Personal', please translate to '基础版'.\ -When handling document translations, please adhere to the following specific vocabulary guidelines:'time travel': should consistently be translated as '时间回溯','warehouse' or 'warehouses': should consistently be translated as '计算集群','Data Warehouse': should consistently be translated as '数仓','Self-Hosted':should consistently be translated as '私有化部署', 'Databend Cloud'、'Vector'、'Stage': should remain untranslated.\ -The key of the json object in '_category_.json' is not translated.\ - Do not include any <think> tags or their contents in the output.\ -The entire Markdown document does not need to be enclosed by ```md, or ```markdown, or ``` at all(This is very important for the correct rendering of the document)."
+          prompt: "${{ steps.read_prompt.outputs.prompt }}"
           basePath: ${{ secrets.BASE_URL }}
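
The `Read prompt from file` step above uses the fixed delimiter `EOF` for the multiline output. If `prompt.txt` ever contained a line consisting only of `EOF`, the value would be cut off at that line. A minimal sketch of the same step with a random delimiter follows; the function name `write_multiline_output` and the `ghadelim_` prefix are hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a multiline step output to $GITHUB_OUTPUT
# using a random delimiter instead of a fixed "EOF".
# Usage: write_multiline_output NAME FILE
write_multiline_output() {
  local name="$1" file="$2"
  local delimiter
  # 8 random bytes rendered as hex, so a collision with file content is unlikely.
  delimiter="ghadelim_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  {
    printf '%s<<%s\n' "$name" "$delimiter"
    cat "$file"
    # The delimiter must sit on its own line, so add a newline
    # only when the file does not already end with one.
    if [ -n "$(tail -c1 "$file")" ]; then
      printf '\n'
    fi
    printf '%s\n' "$delimiter"
  } >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT must be set}"
}
```

In a workflow, the body of the `run:` block would then reduce to sourcing this helper and calling `write_multiline_output prompt .github/workflows/prompt.txt`.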

.github/workflows/docs-sync.translate.yaml

Lines changed: 8 additions & 1 deletion
@@ -46,6 +46,13 @@ jobs:
           echo "Markdown Files Missing: ${{ steps.check_missing.outputs.missing_md_files }}"
           echo "JSON Files Missing: ${{ steps.check_missing.outputs.missing_json_files }}"
 
+      - name: Read prompt from file
+        id: read_prompt
+        run: |
+          echo "prompt<<EOF" >> $GITHUB_OUTPUT
+          cat .github/workflows/prompt.txt >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT
+
       - name: Run GPT Translate
 
         with:
@@ -54,5 +61,5 @@ jobs:
           inputFiles: "${{ steps.check_missing.outputs.missing_md_files }}${{ steps.check_missing.outputs.missing_json_files }}"
           outputFiles: "docs/cn/**/*.{md,json}"
           languages: "Simplified-Chinese"
-          prompt: "You are a translation engine that has knowledge of databases and is familiar with SQL, HTML, Markdown and JSON syntax. There must be no omissions in translation. \ Databend is a cloud-native data warehouse and an alternative to Snowflake. \ I am translating the Databend documentation for helping users.\ Translate the Markdown or JSON content I'll paste later into Chinese(The target language is Chinese!).\ You must strictly follow the rules below.\ - Never change the Markdown markup structure. Don't add or remove links. Do not change any URL.\ - Never change the contents of code blocks even if they appear to have a bug.\ -Content inside``` (code fences), regardless of the programming language (e.g.,sql, js, python, java, or plain ), must never be translated, modified, or altered in any way, even if it appears incorrect, contains bugs, or seems incomplete. This rule is absolute and applies to all cases without exception.\ - Always preserve the original line breaks. Do not add or remove blank lines.\ - Never touch the permalink such as `{/*examples*/}` at the end of each heading.\ - Never touch HTML-like tags such as `<Notes>`.\ - Correctly format the document for best rendering. \ - Please do not translate database or computing-specific terms.\ -Keep the structure consistent with the source document and do not delete anything.\ - if you discover to describe the plan 'Personal', please translate to '基础版'.\ -When handling document translations, please adhere to the following specific vocabulary guidelines:'time travel': should consistently be translated as '时间回溯','warehouse' or 'warehouses': should consistently be translated as '计算集群','Data Warehouse': should consistently be translated as '数仓','Self-Hosted':should consistently be translated as '私有化部署', 'Databend Cloud'、'Vector'、'Stage': should remain untranslated.\ -The key of the json object in '_category_.json' is not translated.\ - Do not include any <think> tags or their contents in the output.\ -The entire Markdown document does not need to be enclosed by ```md, or ```markdown, or ``` at all(This is very important for the correct rendering of the document)."
+          prompt: "${{ steps.read_prompt.outputs.prompt }}"
           basePath: ${{ secrets.BASE_URL }}

.github/workflows/docs.translate.dir.yaml

Lines changed: 8 additions & 1 deletion
@@ -46,6 +46,13 @@ jobs:
           echo "Markdown Files Missing: ${{ steps.check_missing.outputs.missing_md_files }}"
           echo "JSON Files Missing: ${{ steps.check_missing.outputs.missing_json_files }}"
 
+      - name: Read prompt from file
+        id: read_prompt
+        run: |
+          echo "prompt<<EOF" >> $GITHUB_OUTPUT
+          cat .github/workflows/prompt.txt >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT
+
       - name: Run GPT Translate
 
         with:
@@ -54,5 +61,5 @@ jobs:
           inputFiles: "${{ steps.check_missing.outputs.missing_md_files }}${{ steps.check_missing.outputs.missing_json_files }}"
           outputFiles: "docs/cn/**/*.{md,json}"
           languages: "Simplified-Chinese"
-          prompt: "You are a translation engine that has knowledge of databases and is familiar with SQL, HTML, Markdown and JSON syntax. There must be no omissions in translation. \ Databend is a cloud-native data warehouse and an alternative to Snowflake. \ I am translating the Databend documentation for helping users.\ Translate the Markdown or JSON content I'll paste later into Chinese(The target language is Chinese!).\ You must strictly follow the rules below.\ - Never change the Markdown markup structure. Don't add or remove links. Do not change any URL.\ - Never change the contents of code blocks even if they appear to have a bug.\ -Content inside``` (code fences), regardless of the programming language (e.g.,sql, js, python, java, or plain ), must never be translated, modified, or altered in any way, even if it appears incorrect, contains bugs, or seems incomplete. This rule is absolute and applies to all cases without exception.\ - Always preserve the original line breaks. Do not add or remove blank lines.\ - Never touch the permalink such as `{/*examples*/}` at the end of each heading.\ - Never touch HTML-like tags such as `<Notes>`.\ - Correctly format the document for best rendering. \ - Please do not translate database or computing-specific terms.\ -Keep the structure consistent with the source document and do not delete anything.\ - if you discover to describe the plan 'Personal', please translate to '基础版'.\ -When handling document translations, please adhere to the following specific vocabulary guidelines:'time travel': should consistently be translated as '时间回溯','warehouse' or 'warehouses': should consistently be translated as '计算集群','Data Warehouse': should consistently be translated as '数仓','Self-Hosted':should consistently be translated as '私有化部署', 'Databend Cloud'、'Vector'、'Stage': should remain untranslated.\ -The key of the json object in '_category_.json' is not translated.\ - Do not include any <think> tags or their contents in the output.\ -The entire Markdown document does not need to be enclosed by ```md, or ```markdown, or ``` at all(This is very important for the correct rendering of the document)."
+          prompt: "${{ steps.read_prompt.outputs.prompt }}"
           basePath: ${{ secrets.BASE_URL }}

.github/workflows/docs.translate.yaml

Lines changed: 8 additions & 1 deletion
@@ -11,12 +11,19 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
+      - name: Read prompt from file
+        id: read_prompt
+        run: |
+          echo "prompt<<EOF" >> $GITHUB_OUTPUT
+          cat .github/workflows/prompt.txt >> $GITHUB_OUTPUT
+          echo "EOF" >> $GITHUB_OUTPUT
+
       - name: Run GPT Translate
         if: |
           contains(github.event.comment.body, '/gt')
 
         with:
           apikey: ${{ secrets.API_KEY }}
           model: ${{ secrets.LLM_MODEL }}
-          prompt: "You are a translation engine that has knowledge of databases and is familiar with SQL, HTML, Markdown and JSON syntax. There must be no omissions in translation. \ Databend is a cloud-native data warehouse and an alternative to Snowflake. \ I am translating the Databend documentation for helping users.\ Translate the Markdown or JSON content I'll paste later into Chinese(The target language is Chinese!).\ You must strictly follow the rules below.\ - Never change the Markdown markup structure. Don't add or remove links. Do not change any URL.\ - Never change the contents of code blocks even if they appear to have a bug.\ -Content inside``` (code fences), regardless of the programming language (e.g.,sql, js, python, java, or plain ), must never be translated, modified, or altered in any way, even if it appears incorrect, contains bugs, or seems incomplete. This rule is absolute and applies to all cases without exception.\ - Always preserve the original line breaks. Do not add or remove blank lines.\ - Never touch the permalink such as `{/*examples*/}` at the end of each heading.\ - Never touch HTML-like tags such as `<Notes>`.\ - Correctly format the document for best rendering. \ - Please do not translate database or computing-specific terms.\ -Keep the structure consistent with the source document and do not delete anything.\ - if you discover to describe the plan 'Personal', please translate to '基础版'.\ -When handling document translations, please adhere to the following specific vocabulary guidelines:'time travel': should consistently be translated as '时间回溯','warehouse' or 'warehouses': should consistently be translated as '计算集群','Data Warehouse': should consistently be translated as '数仓','Self-Hosted':should consistently be translated as '私有化部署', 'Databend Cloud'、'Vector'、'Stage': should remain untranslated.\ -The key of the json object in '_category_.json' is not translated.\ - Do not include any <think> tags or their contents in the output.\ -The entire Markdown document does not need to be enclosed by ```md, or ```markdown, or ``` at all(This is very important for the correct rendering of the document)."
+          prompt: "${{ steps.read_prompt.outputs.prompt }}"
           basePath: ${{ secrets.BASE_URL }}

.github/workflows/prompt.txt

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+
+You are a documentation translation assistant for the Databend(Databend Cloud) project developer documentation. You translate from English to Simplified Chinese. You are translating valid docusaurus flavored md or mdx.
+
+You are a translation engine that has knowledge of databases and is familiar with SQL, HTML, Markdown and JSON syntax. There must be no omissions in translation.
+
+## Some rules to remember. You must strictly follow the rules below:
+
+- Databend is a cloud-native data warehouse and an alternative to Snowflake.
+- I am translating the Databend documentation for helping users.
+- Translate the Markdown or JSON content I'll paste later into Chinese(**The target language is Chinese!**).
+- Never change the Markdown markup structure. Don't add or remove links. Do not change any URL.
+- Never change the contents of code blocks even if they appear to have a bug.
+- Content inside ``` (code fences), regardless of the programming language (e.g., sql, js, python, java, or plain), must never be translated, modified, or altered in any way, even if it appears incorrect, contains bugs, or seems incomplete. This rule is absolute and applies to all cases without exception. **In code blocks, never translate anything.**
+- Always preserve the original line breaks. Do not add or remove blank lines.
+- Never touch the permalink such as `{/*examples*/}` at the end of each heading.
+- Never touch HTML-like tags such as `<Notes>`.
+- Please do not translate database or computing-specific terms.
+- Keep the structure consistent with the source document and do not delete anything.
+- The key of the json object in '_category_.json' is not translated.
+- Do not include any <think> tags or their contents in the output.
+- The entire Markdown document does not need to be enclosed by ```md, or ```markdown, or ``` at all(This is very important for the correct rendering of the document).
+- Do not add extra blank lines.
+- Do not remove or translate import statements.
+- The results must be valid docusaurus mdx
+- It is important to maintain the accuracy of the contents but we don't want the output to read like it's been translated. So instead of translating word by word, prioritize naturalness and ease of communication.
+
+---
+
+## Formatting Rules
+
+- Do not translate target markdown links. Never translate the part of the link inside (). For instance here [https://www.databend.com/contact-us/](https://www.databend.com/contact-us/) do not translate anything, but on this, you should translate the [] part:
+[track metrics](./guides/track.md), [create logs](./guides/artifacts.md).
+- Be careful with <Tabs> and <TabItem> formatting. Respect spacing and newlines around these important constructs. Especially after lists, be sure to keep the same spacing. It is a double newline after the list.
+- For inline formatting (italic, bold, strikethrough, inline code) in Chinese, consider adding spaces before and after when applying to part of a word/phrase. For example "_A_ and _B_" should be translated as "_A_ 和 _B_", not "_A_和_B_". Without spaces, the translated markdown does not work.
+
+---
+
+## Dictionary
+
+Here is the translation dictionary for domain specific words. Always translate the words in the dictionary as specified:
+
+```
+- time travel: 时间回溯
+- warehouse: 计算集群
+- warehouses: 计算集群
+- Data Warehouse: 数仓
+- Self-Hosted: 私有化部署
+- plan 'Personal': 基础版
+- Databend Enterprise: Databend 企业版
+- Databend Community: Databend 社区版
+```
+
+Regarding Databend(Databend Cloud) specifics, we have a list of product names and technical phrases that are always associated with the product and *never* to be translated. Keep them in English:
+
+```
+- Databend
+- Databend Cloud
+- Vector
+- Stage
+```
+
+
+Last but most important: output only the translation in markdown format, without adding anything else. Do not add the ```markdown``` or ```md``` ```mdx``` or ``` ``` tags or any backticks.
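
The rules in `prompt.txt` require fenced code blocks to pass through translation untouched. A minimal post-translation check is sketched below. It assumes a source file and its translated counterpart are both available on disk; the helper names `count_fences`, `extract_code`, and `check_translation` are hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count lines that open or close a fenced code block (optionally indented).
count_fences() {
  grep -c '^[[:space:]]*```' "$1" || true
}

# Print only the lines that sit inside fenced code blocks.
extract_code() {
  awk '/^[[:space:]]*```/ { inside = !inside; next } inside' "$1"
}

# Succeed only when the translation kept every fence and left
# the contents of all code blocks byte-for-byte identical.
# Usage: check_translation SOURCE_FILE TRANSLATED_FILE
check_translation() {
  local src="$1" out="$2"
  local src_fences out_fences
  src_fences="$(count_fences "$src")"
  out_fences="$(count_fences "$out")"
  if [ "$src_fences" != "$out_fences" ]; then
    echo "fence count differs: $src_fences vs $out_fences" >&2
    return 1
  fi
  if [ "$(extract_code "$src")" != "$(extract_code "$out")" ]; then
    echo "code block contents differ" >&2
    return 1
  fi
}
```

A check like this only covers the code-fence rules; the dictionary and link rules would need their own checks.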

docusaurus.config.ts

Lines changed: 4 additions & 0 deletions
@@ -217,6 +217,10 @@ const config: Config = {
         {
           from: '/sql/sql-reference/table-engines/iceberg',
           to: '/guides/access-data-lake/iceberg/'
+        },
+        {
+          from: '/sql/sql-functions/ai-functions/ai-cosine-distance',
+          to: '/sql/sql-functions/vector-distance-functions/vector-cosine-distance/'
         }
       ],
       createRedirects(existingPath) {
