punctuation signs interrupting words #351

Open
arlogriffiths opened this issue Mar 5, 2025 · 3 comments
@arlogriffiths
Collaborator

@danbalogh @michaelnmmeyer

in cases like

<l n="a" enjamb="yes">d<choice><sic>i</sic><corr>ī</corr></choice>kṣā-nimittaṁ kila sarvva-bhāvā<g type="dotMid">.</g></l>
<l n="b">nāṁ svāvaśeṣaṁ vara-bhasma tatra <g type="dotMid">.</g></l>

or

<l n="a">saṁpady aśeṣāṁ pr̥thivīṁ niyogā<g type="dotMid">.</g>d</l>
<l n="b">yo bhasma-de<lb n="A9" break="yes"/>śām anuśāsti viṣṇum· <g type="dotMid">.</g></l>

(both examples taken from DHARMA_INSCIC00141 — https://dharmalekha.info/texts/INSCIC00141)

we find <g> interrupting words. Is any special measure required to ensure indexability of such words, or can Michaël's machine be taught to ignore any <g> altogether?

does EGD address such cases at all, Dan?

(cc to @salomepichon and @chhomkunthea for their information)

arlogriffiths added the "question" label (Further information is requested) on Mar 5, 2025
@arlogriffiths
Collaborator Author

I don't understand why a space is inserted after ° inside sarvva-bhāvā°nāṁ in the physical display.

Image

@michaelnmmeyer
Member

I propose we address both these questions later on.

@danbalogh
Collaborator

On the encoding side, what you have done is sufficient. There is also the option of flagging the sign as <orig>, which I would recommend, and, for the second item, the further option of normalising it, which I would also recommend.
The applicable EGD bits are at the end of §4.2.1 (before the example with paṁcaviṁśati<choice><orig>|n</orig><reg>M|</reg></choice> tat-putro). The situation is also essentially the same as with the space in your <l n="a">jayatīndrādidevāsya<space/>ś</l><l n="b">śrīmān yajñapatīśvaraḥ</l>, which is covered in EGD §4.3.3, although flagging/normalisation is not mentioned there (perhaps it should be; I've made a note of that for myself).
The EGD does not specifically address cases like this in verse, but here is one from my VengiCalukya0078:

<lg n="11" met="anuṣṭubh">
...
<l n="c">sarvva-lokāśraye yasmi<lb n="51" break="no"/><choice><orig>|n</orig><reg>N|</reg></choice></l>
<l n="d">saty<unclear>a</unclear>-rāj<unclear>e</unclear> sthitaṁ jagaT|</l>
</lg>

For word indexing, I'm sure it should not be a problem to ignore g elements.
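
To illustrate what ignoring g elements could look like, here is a minimal Python sketch. It is not the actual DHARMA indexing code: the function name index_text and the choice to also skip <sic> and <orig> (so that the <corr>/<reg> reading gets indexed) are assumptions made for this example. It drops the content of skipped elements but keeps the text that follows them, so a word interrupted by <g> is rejoined.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: these elements contribute nothing to the index.
# <g> is the punctuation sign; <sic>/<orig> are skipped so that the
# <corr>/<reg> sibling inside <choice> is what gets indexed.
SKIP = {"g", "sic", "orig"}


def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}g'."""
    return tag.rsplit("}", 1)[-1]


def index_text(elem):
    """Collect the text of elem, omitting SKIP elements but keeping their tails."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) not in SKIP:
            parts.append(index_text(child))
        # The tail is the text after the child's closing tag, e.g. the "d"
        # that follows </g> in "niyogā<g>.</g>d".
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(
    '<l n="a">saṁpady aśeṣāṁ pr̥thivīṁ niyogā<g type="dotMid">.</g>d</l>'
)
print(index_text(line))  # → saṁpady aśeṣāṁ pr̥thivīṁ niyogād
```

This works line by line only. Rejoining a word split across two <l> elements, as in sarvva-bhāvā / nāṁ, would additionally need a rule for enjamb="yes", which the sketch does not attempt.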
