punctuation signs interrupting words #351

arlogriffiths · 2025-03-05T23:20:03Z

in cases like

<l n="a" enjamb="yes">d<choice><sic>i</sic><corr>ī</corr></choice>kṣā-nimittaṁ kila sarvva-bhāvā<g type="dotMid">.</g></l>
<l n="b">nāṁ svāvaśeṣaṁ vara-bhasma tatra <g type="dotMid">.</g></l>

or

<l n="a">saṁpady aśeṣāṁ pr̥thivīṁ niyogā<g type="dotMid">.</g>d</l>
<l n="b">yo bhasma-de<lb n="A9" break="yes"/>śām anuśāsti viṣṇum· <g type="dotMid">.</g></l>

(both examples taken from DHARMA_INSCIC00141 — https://dharmalekha.info/texts/INSCIC00141)

we find <g> interrupting words. Is any special measure required to ensure indexability of such words, or can Michaël's machine be taught to ignore any <g> altogether?

does EGD address such cases at all, Dan?

(cc to @salomepichon and @chhomkunthea for their information)


arlogriffiths · 2025-03-06T00:45:34Z

I don't understand why a space is inserted after ° inside sarvva-bhāvā°nāṁ in the physical display.

michaelnmmeyer · 2025-03-06T04:05:15Z

I propose we address both these questions later on.

danbalogh · 2025-03-06T08:36:59Z

On the encoding side, what you have done is sufficient, but there's the option of flagging it as <orig>, which I would recommend, and for the second item, also the option of normalising it, which I would also recommend.
The applicable EGD bits are at the end of §4.2.1 (before the example with paṁcaviṁśati<choice><orig>|n</orig><reg>M|</reg></choice> tat-putro). The situation is also essentially the same as with the space in your <l n="a">jayatīndrādidevāsya<space/>ś</l><l n="b">śrīmān yajñapatīśvaraḥ</l>, which is covered in EGD §4.3.3, although flagging/normalisation is not mentioned there (perhaps it should be; I've made a note of that for myself).
The EGD does not specifically address cases like this in verse, but here is one from my VengiCalukya0078:

<lg n="11" met="anuṣṭubh">
...
<l n="c">sarvva-lokāśraye yasmi<lb n="51" break="no"/><choice><orig>|n</orig><reg>N|</reg></choice></l>
<l n="d">saty<unclear>a</unclear>-rāj<unclear>e</unclear> sthitaṁ jagaT|</l>
</lg>

For word indexing, I'm sure it should not be a problem to ignore g elements.
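
To illustrate what ignoring g elements could look like on the indexing side, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`. It is not the project's actual indexing code, and the names `text_without` and `SKIP` are hypothetical. The idea is to drop the content of each <g> while keeping the text that follows it, so the two halves of the interrupted word rejoin.

```python
import xml.etree.ElementTree as ET

# Assumption: only <g> is dropped; how to treat <choice> is a separate decision.
SKIP = {"g"}

def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}g'."""
    return tag.rsplit("}", 1)[-1]

def text_without(elem, skip=SKIP):
    """Return the text of elem, omitting the content of skipped elements
    but keeping the tail text that follows them."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) not in skip:
            parts.append(text_without(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)

# Second example from this issue, pāda a.
line_a = ET.fromstring(
    '<l n="a">saṁpady aśeṣāṁ pr̥thivīṁ niyogā<g type="dotMid">.</g>d</l>'
)
indexed = text_without(line_a)
print(indexed)  # the last word comes out uninterrupted
```

With this approach the dot is invisible to the indexer while the encoding itself stays untouched, so the display can still render the punctuation sign.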

arlogriffiths added the question Further information is requested label Mar 5, 2025

arlogriffiths assigned danbalogh and michaelnmmeyer Mar 5, 2025
