
Commit 30465b7

Merge pull request #600 from RafsanNeloy/secondary
Added Wiki Dump Docs for Download and Usage
2 parents 535a486 + 0462c93 commit 30465b7

1 file changed: +135 additions, -5 deletions


docs/source/scribe_data/cli.rst

Lines changed: 135 additions & 5 deletions
@@ -30,6 +30,7 @@ The Scribe-Data CLI supports the following commands:
2. ``get`` (alias: ``g``)
3. ``total`` (alias: ``t``)
4. ``convert`` (alias: ``c``)
5. ``download`` (alias: ``d``)

Note: For all language arguments, if the language name is more than one word, the argument value must be wrapped in double quotes.
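
Why the quotes matter: the shell splits unquoted arguments on whitespace. Without quotes, a two-word language name reaches Scribe-Data as two separate arguments. The sketch below uses Python's standard ``shlex`` module, which applies POSIX shell splitting rules, to show the difference. ``Norwegian Nynorsk`` is only an illustrative multi-word value, not a claim about which language names Scribe-Data accepts.

.. code-block:: python

    import shlex

    # "Norwegian Nynorsk" is an illustrative multi-word language name.
    quoted = 'scribe-data get -lang "Norwegian Nynorsk" -dt nouns'
    unquoted = "scribe-data get -lang Norwegian Nynorsk -dt nouns"

    # shlex.split follows POSIX shell quoting rules.
    quoted_args = shlex.split(quoted)
    unquoted_args = shlex.split(unquoted)

    print(quoted_args)
    # ['scribe-data', 'get', '-lang', 'Norwegian Nynorsk', '-dt', 'nouns']
    print(unquoted_args)
    # ['scribe-data', 'get', '-lang', 'Norwegian', 'Nynorsk', '-dt', 'nouns']

With quotes, ``-lang`` receives a single value. Without them, the second word becomes an extra argument.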

@@ -159,6 +160,55 @@ Examples:
    Getting and formatting English verbs
    Data updated: 100%|████████████████████████| 1/1 [00:XY<00:00, XY.Zs/process]

To retrieve data from a Wikidata lexeme dump, add the ``-wdp`` flag:

.. code-block:: bash

    $ scribe-data get -lang german -dt nouns -wdp

**Example Output:**

.. code-block:: text

    Languages to process: German
    Data types to process: ['nouns']
    Existing dump files found:
    - scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2
    ? Do you want to: (Use arrow keys)
    » Delete existing dumps
      Skip download
      Use existing latest dump
      Download new version

**Instructions:**

1. Use the arrow keys to navigate through the options.
2. Press **Enter** to confirm your selection.

**Options Explained:**

- **Delete existing dumps**: Removes the existing dump files before downloading new ones.
- **Skip download**: Skips the download process.
- **Use existing latest dump**: Processes the existing dump file without downloading a new version.
- **Download new version**: Downloads the latest version of the lexeme dump.
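
The existing-dump check and the four menu options can be sketched in Python as follows. This is a hypothetical illustration, not Scribe-Data's implementation. ``find_existing_dumps`` and ``plan_dump_action`` are made-up names. The sketch also assumes dumps are ``*.json.bz2`` files in ``scribe_data_wikidata_dumps_export``, as in the example output.

.. code-block:: python

    from pathlib import Path

    # Assumption: dumps live in this directory, as in the example output.
    EXPORT_DIR = Path("scribe_data_wikidata_dumps_export")


    def find_existing_dumps(export_dir: Path = EXPORT_DIR) -> list[Path]:
        """Return the .json.bz2 dump files already present in export_dir."""
        if not export_dir.is_dir():
            return []
        return sorted(export_dir.glob("*.json.bz2"))


    def plan_dump_action(choice: str, existing: list[Path]) -> str:
        """Map a menu choice to an action name (hypothetical helper)."""
        if not existing:
            # No dumps found: the documented behavior is to download.
            return "download"
        actions = {
            "Delete existing dumps": "delete_then_download",
            "Skip download": "skip",
            "Use existing latest dump": "use_existing",
            "Download new version": "download",
        }
        return actions[choice]

Returning an action name instead of acting directly keeps the sketch free of side effects and easy to test.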

**Note:** If you download a new version, make sure you have sufficient disk space and a stable internet connection.
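
A pre-download free-space check can be sketched with the standard library's ``shutil.disk_usage``. The helper name ``has_free_space`` is hypothetical. The roughly 370 MB threshold comes from the example download output, and the real dump grows over time, so treat it as an assumption.

.. code-block:: python

    import shutil
    from pathlib import Path


    def has_free_space(directory: Path, required_bytes: int) -> bool:
        """Return True if the filesystem holding `directory` has enough free space."""
        # disk_usage needs an existing path; the relative export directory
        # would be created under the current directory, so check that instead.
        target = directory if directory.exists() else Path(".")
        return shutil.disk_usage(target).free >= required_bytes


    # Assumed size taken from the example output (370M); real dumps grow over time.
    REQUIRED = 370 * 1024 * 1024
    print(has_free_space(Path("scribe_data_wikidata_dumps_export"), REQUIRED))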

**If No Existing Dump Files Are Found:**

1. The command displays the following message:

   .. code-block:: text

       No existing dump files found. Downloading new version...

2. The command then downloads the latest dump file:

   .. code-block:: text

       Downloading dump to scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2...
       scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2: 100%|███████████████████| 370M/370M [04:20<00:00, 1.42MiB/s]
       Wikidata lexeme dump download completed successfully!

Behavior and Output:
^^^^^^^^^^^^^^^^^^^^

@@ -304,11 +354,36 @@ If user selects ``Configure total lexemes request``:

    Language        Data Type        Total Lexemes
    ======================================================================
-   english         nouns            30,841
-                   adjectives       12,840
-
-   basque          nouns            14,498
-                   adjectives       278
    english         nouns            123,456
                    adjectives       234,567

    basque          nouns            34,567
                    adjectives       250
363+
The command ``scribe-data total -lang english -wdp`` retrieves total lexeme and translation counts for English, checks dumps, and provides detailed statistics.
364+
365+
.. code-block::
366+
367+
$ scribe-data total -lang english -wdp
368+
Languages to process: English
369+
Data types to process: None
370+
Existing dump files found:
371+
- scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2
372+
? Do you want to: Use existing latest dump
373+
We'll use the following lexeme dump scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2
374+
Processing entries: 100%|████████████████████████████████████████████████████| 1406276/1406276 [15:25<00:14, 1495.97it/s]
375+
Language Data Type Total Lexemes Total Translations
376+
==========================================================================================
377+
english nouns 123,456 12,345
378+
adjectives 345,678 2,345
379+
adverbs 45,678 345
380+
verbs 5,678 4,567
381+
proper_nouns 6,789 5,678
382+
prepositions 789 100
383+
conjunctions 75 25
384+
pronouns 50 25
385+
personal_pronouns 25 50
386+
postpositions 1
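
Conceptually, a dump-based total is one streaming pass over the compressed file. The sketch below counts lexemes per language and data type. It is not Scribe-Data's code, and it does not count translations. It assumes the standard Wikidata JSON dump layout: a top-level array with one entity per line. It also uses a small, hand-picked mapping of Wikidata item IDs (for example ``Q1860`` for English and ``Q1084`` for noun) to the names shown in the table.

.. code-block:: python

    import bz2
    import json
    from collections import Counter
    from pathlib import Path

    # Assumed, hand-picked subset of Wikidata item IDs mapped to table names.
    LANGUAGES = {"Q1860": "english", "Q8752": "basque"}
    CATEGORIES = {
        "Q1084": "nouns",
        "Q34698": "adjectives",
        "Q24905": "verbs",
        "Q380057": "adverbs",
    }


    def count_lexemes(dump_path: Path) -> Counter:
        """Count lexemes per (language, data type) in one pass over a bz2 dump."""
        counts: Counter = Counter()
        with bz2.open(dump_path, mode="rt", encoding="utf-8") as dump:
            for line in dump:
                # Each entity sits on its own line inside a top-level JSON array,
                # followed by a comma on every line except the last.
                line = line.strip().rstrip(",")
                if line in ("[", "]", ""):
                    continue
                lexeme = json.loads(line)
                language = LANGUAGES.get(lexeme.get("language"))
                category = CATEGORIES.get(lexeme.get("lexicalCategory"))
                if language and category:
                    counts[(language, category)] += 1
        return counts


    if __name__ == "__main__":
        dump = Path("scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2")
        if dump.exists():
            for (language, data_type), total in sorted(count_lexemes(dump).items()):
                print(f"{language:<12}{data_type:<16}{total:,}")

Reading line by line through ``bz2.open`` keeps memory use flat, which matters because the decompressed dump is far larger than the download.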

Features:
^^^^^^^^^
@@ -327,6 +402,22 @@ The interactive mode is particularly useful for:
- Complex queries with multiple parameters.
- Viewing available options without memorizing commands.

Root Interactive Command
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    $ scribe-data interactive
    Welcome to Scribe-Data v4.1.0 interactive mode!
    ? What would you like to do? (Use arrow keys)
    » Download a Wikidata lexemes dump
      Check for totals
      Get data
      Get translations
      Convert JSON
      Exit

The ``scribe-data interactive`` command starts interactive mode, from which you can select and run the main Scribe-Data operations.

Total Command
~~~~~~~~~~~~~

@@ -426,3 +517,42 @@ Options:
- ``-f, --file FILE``: The file to convert to a new type.
- ``-ko, --keep-original``: Whether to keep the file to be converted (default: True).
- ``-ot, --output-type {json,csv,tsv,sqlite}``: The output file type.

Download Command
~~~~~~~~~~~~~~~~

Usage:

.. code-block:: bash

    scribe-data download

Behavior and Output:
^^^^^^^^^^^^^^^^^^^^

- **If Existing Dump Files Are Found:**

  1. The command lists the dump files it found:

     .. code-block:: text

         Existing dump files found:
         - scribe_data_wikidata_dumps_export/latest-lexemes.json.bz2

  2. The command then prompts the user to choose one of the following options:

     .. code-block:: text

         ? Do you want to: (Use arrow keys)
         » Delete existing dumps
           Skip download
           Use existing latest dump
           Download new version

- **If Downloading New Version:**

  1. If the user chooses to download, the dump is saved to the specified directory (``scribe_data_wikidata_dumps_export`` in this example):

     .. code-block:: text

         Downloading dump to scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2...
         scribe_data_wikidata_dumps_export\latest-lexemes.json.bz2: 100%|███████████████████| 370M/370M [04:20<00:00, 1.42MiB/s]
         Wikidata lexeme dump download completed successfully!
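
The download step can be sketched as a chunked, streaming download using only the standard library. This is a simplified illustration, not Scribe-Data's implementation. ``download_dump`` is a hypothetical helper. The URL is the public Wikimedia location of the latest lexeme dump, assumed here to match what Scribe-Data fetches. Progress is printed as a bare percentage rather than a progress bar.

.. code-block:: python

    import urllib.request
    from pathlib import Path

    # Assumption: the public Wikimedia URL of the latest lexeme dump.
    DUMP_URL = "https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.json.bz2"


    def download_dump(url: str, dest_dir: Path, chunk_size: int = 1 << 20) -> Path:
        """Stream `url` into `dest_dir` chunk by chunk and return the saved path."""
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / url.rsplit("/", 1)[-1]
        partial = dest_dir / (target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            total = int(response.headers.get("Content-Length") or 0)
            done = 0
            while chunk := response.read(chunk_size):
                out.write(chunk)
                done += len(chunk)
                if total:
                    print(f"\r{target.name}: {done / total:.0%}", end="")
        print()
        # Rename only after the stream completed, so a failed run leaves no
        # truncated file under the final name.
        partial.replace(target)
        return target


    # Usage (downloads several hundred MB):
    # download_dump(DUMP_URL, Path("scribe_data_wikidata_dumps_export"))

Writing to a ``.part`` file first means an interrupted download is never mistaken for a complete dump by a later existing-dump check.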
