This repository contains TypeScript functions that help when working with ATF, a semi-standardized text markup format used by the Cuneiform Digital Library Initiative (CDLI) to transcribe the contents of cuneiform tablets.
More specifically it contains a tokenizer to split ATF contents into separate characters. See here and here for similar projects. https://github.com/cdli-gh
https://cdli.mpiwg-berlin.mpg.de/search?f[provenience][]=Sippar-Amnanum%20(mod.%20Tell%20ed-Der)
The Cuneiform Digital Library Initiative is a great resource and has a collection of ATF files. To export the ATF data into a single file, run the following commands using the CDLI API client:
npm install -g https://github.com/cdli-gh/framework-api-client
npx cdli export \
--host https://cdli.mpiwg-berlin.mpg.de/ \
--entities inscriptions \
--format atf \
--output-file artifacts.atf
VSCode + Volar (and disable Vetur).
TypeScript cannot handle type information for .vue imports by default, so we replace the tsc CLI with vue-tsc for type checking. In editors, we need Volar to make the TypeScript language service aware of .vue types.
See Vite Configuration Reference.
npm install
npm run dev
npm run build
Run Unit Tests with Vitest
npm run test:unit
If a line is given but not all signs on the line are annotated, what is the index of each sign? Are gaps possible, e.g. signs 1, 2, 4 and 5 are annotated but sign 3 is not?
/ = we don't know which of the following signs to read, but it should be one of these. How are these indexed? Is this a single index or more than one index? Can three signs share the same index?
- example of character divider: na-bi-{d}EN.ZU, the combination of -{ is only one character divider.
- example of compound verbs: PA3(|IGI.RU|) or E3(|UD.DU|), note the pipes within parentheses. We have to annotate both because the annotations need a goal and we still need to refer to the correct reading of a sign.
- example of missing or too many signs: we use <x> to indicate sign(s) we think are missing, or <<x>> to indicate sign(s) we think were wrongly put in. The missing signs will not be annotated but they still typically have an index, whereas the ones that are wrongly there will be annotated and also still have an index.
- example of word mixing: the rule is that hyphens split syllables, or words within proper nouns, whereas dots split the different signs that make up a word in logographic writing. Syllables are written with lower case letters, logograms with upper case letters. One typical confusion is that in proper nouns two words in logographic writing can be split by a hyphen if they are two individual words, e.g. {d}EN.ZU for the god Sîn, but {d}EN.ZU-ZI for the personal name Sîn-napišti (ZI = napišti).
- example of ambiguity for upper case: most signs written in upper case are logograms, but some are uncertain readings of a sign, i.e. we can see which sign it is, but we don't know how to understand it.
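The divider rules above can be sketched in a few lines of TypeScript. This is only an illustration, not the tokenizer shipped in this repository: the function name splitSigns is hypothetical, and it assumes that hyphens, dots and the braces around determinatives are the only dividers, and that anything between pipes belongs to one compound sign.

```typescript
// Hypothetical helper for illustration; not part of this repository's API.
// Splits one ATF word into its signs.
// Assumption: "-", ".", "{" and "}" are dividers, except inside |...|.
function splitSigns(word: string): string[] {
  const dividers = "-.{}";
  const signs: string[] = [];
  let current = "";
  let inPipes = false;

  for (let i = 0; i < word.length; i++) {
    const ch = word.charAt(i);
    if (ch === "|") {
      // Pipes delimit a compound sign; dots inside them do not split.
      inPipes = !inPipes;
      current += ch;
    } else if (!inPipes && dividers.indexOf(ch) !== -1) {
      // Adjacent dividers such as "-{" produce no empty sign,
      // so they behave like a single divider.
      if (current !== "") {
        signs.push(current);
      }
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") {
    signs.push(current);
  }
  return signs;
}

console.log(splitSigns("na-bi-{d}EN.ZU")); // [ 'na', 'bi', 'd', 'EN', 'ZU' ]
console.log(splitSigns("PA3(|IGI.RU|)")); // [ 'PA3(|IGI.RU|)' ]
```

Flags, breaks and the <x> / <<x>> markers are deliberately left untouched by this sketch; a real tokenizer would have to decide how each of them affects the sign index.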
- # = partial breakage. Consecutive signs that are each followed by a # are rendered in classical publications between upper half square brackets, e.g. ITI NE#.NE#.GAR# U4.5(disz).KAM becomes ITI ⸢NE.NE.GAR⸣ U4.5(disz).KAM. Most often these signs will also be annotated, but sometimes they won't, depending on how bad the breakage is.
- ! = corrected reading, often followed by parentheses containing the original wrong reading, e.g. na!(u4): the value we read is 'na' but on the tablet we see 'u4'.
- ? = uncertain identification of a sign.
- [] = completely broken section, which can contain signs, e.g. [IGI {d}]EN.ZU-ZI DUB.SAR. The signs within square brackets are never annotated. It can also just be [...] to indicate that the break contains things we can't estimate. Sometimes the brackets contain a number of x's indicating the number of signs we assume to be there, e.g. [IGI x-x-x-x] DUB.SAR.
- <> = signs that should be added, see above.
- <<>> = signs that should be removed, see above.
- / = we don't know which of the following signs to read, but it should be one of these.
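As an illustration of the # flag, the following TypeScript sketch turns a run of damaged signs into the half-bracket notation used in the example above. The function name renderDamage is hypothetical and not part of this repository. The sketch assumes that signs are separated only by spaces, hyphens or dots, that # is the last character of a damaged sign, and that a run of damaged signs may continue across a space; combined flags such as #? are not handled.

```typescript
// Hypothetical helper for illustration; not part of this repository's API.
// Rewrites runs of signs flagged with "#" as ⸢...⸣.
function renderDamage(line: string): string {
  let out = "";
  let pending = ""; // dividers seen since the last sign
  let open = false; // true while inside a ⸢...⸣ run

  // The capture group keeps the dividers in the split result.
  for (const part of line.split(/([ .-])/)) {
    if (part === "") {
      continue;
    }
    if (part === " " || part === "." || part === "-") {
      pending += part;
      continue;
    }
    const damaged = part.charAt(part.length - 1) === "#";
    if (damaged) {
      const sign = part.slice(0, -1); // drop the trailing "#"
      out += pending + (open ? "" : "⸢") + sign;
      open = true;
    } else {
      // Close the run before the divider that follows it.
      out += (open ? "⸣" : "") + pending + part;
      open = false;
    }
    pending = "";
  }
  return out + (open ? "⸣" : "") + pending;
}

console.log(renderDamage("ITI NE#.NE#.GAR# U4.5(disz).KAM"));
// ITI ⸢NE.NE.GAR⸣ U4.5(disz).KAM
```

Whether a bracket should really span a word boundary is a publication convention, so that choice is an assumption of this sketch rather than a rule of the format.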
Development by GhentCDH - Ghent University.
Funded by the GhentCDH research projects.