Specification: MWEs in ParlaMint #236

Closed
matyaskopp opened this issue May 21, 2022 · 2 comments

@matyaskopp
Collaborator

related to: #204

I have an idea for implementing MWEs (multi-word expressions) in ParlaMint. We currently use <linkGrp> and <link> to annotate relations between pairs of words, but I do not think that mechanism can be used for MWEs.

I suggest using <spanGrp> and <span> instead (see the TEI Guidelines, section 17.3 Spans and Interpretations).

<s>
  <w xml:id="w1">Nice</w>
  <w xml:id="w2">to</w>
  <w xml:id="w3">meet</w>
  <w xml:id="w4">you</w>
  <pc>!</pc>
  <spanGrp>
    <span from="#w1" to="#w4" ana="#greeting"/>
  </spanGrp>
</s>

If I understand the TEI documentation correctly, the span can also be encoded by listing its tokens explicitly:

  <spanGrp>
    <span target="#w1 #w2 #w3 #w4" ana="#greeting"/>
  </spanGrp>
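
To make the difference between the two variants concrete, here is a minimal Python sketch that resolves a <span> to the IDs of the tokens it covers. It is only an illustration, not ParlaMint tooling: the helper name `span_token_ids` is hypothetical, the sample sentence has no TEI namespace, and the from/to range is assumed to count only <w> elements (punctuation is ignored).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The example sentence, carrying both span variants side by side.
SENTENCE = """
<s>
  <w xml:id="w1">Nice</w>
  <w xml:id="w2">to</w>
  <w xml:id="w3">meet</w>
  <w xml:id="w4">you</w>
  <pc>!</pc>
  <spanGrp>
    <span from="#w1" to="#w4" ana="#greeting"/>
    <span target="#w1 #w2 #w3 #w4" ana="#greeting"/>
  </spanGrp>
</s>
"""


def span_token_ids(sentence, span):
    """Return the IDs of the <w> tokens covered by one <span> (hypothetical helper)."""
    target = span.get("target")
    if target is not None:
        # target variant: the tokens are listed explicitly.
        return [ref.lstrip("#") for ref in target.split()]
    # from/to variant: take every <w> between the two end points, inclusive.
    word_ids = [w.get(XML_ID) for w in sentence.iter("w")]
    start = word_ids.index(span.get("from").lstrip("#"))
    end = word_ids.index(span.get("to").lstrip("#"))
    return word_ids[start:end + 1]


sentence = ET.fromstring(SENTENCE)
resolved = [span_token_ids(sentence, span) for span in sentence.iter("span")]
```

For a contiguous expression both variants resolve to the same token list. They differ only when the expression has gaps, which the from/to range cannot express.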

I am not sure whether this is the best solution, or which variant (from/to or target) I prefer. This is just a first draft. I am planning to use MWEs in another project, so I am keen to know the ParlaMint solution and reuse it there.

@matyaskopp matyaskopp added enhancement New feature or request 🕮 Documentation Improvements or additions to documentation labels May 21, 2022
@TomazErjavec
Collaborator

My comments:

  • Do we actually need MWEs in ParlaMint? For what exactly?
  • We annotate MWEs in some Slovene corpora (http://hdl.handle.net/11356/1434), where we simply used linkGrp/link. Its definition, "an association or hypertextual link among elements", seems appropriate. Note also that we used links because (our) MWEs need not involve adjacent words: they can have gaps, which forces you to list the tokens explicitly.
  • But if the MWE words are always contiguous, why not follow Parla-CLARIN, which suggests using in-place <seg> elements?
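
The contiguity question above can be checked mechanically. The following Python sketch is an assumption-laden illustration: `is_contiguous` and `choose_encoding` are hypothetical helper names, and the token IDs are invented for a sentence such as "look the word up". It decides whether an MWE could be wrapped in an in-place <seg> or needs its tokens listed explicitly.

```python
def is_contiguous(sentence_ids, mwe_ids):
    """True if the MWE tokens occupy adjacent positions in the sentence."""
    positions = sorted(sentence_ids.index(token_id) for token_id in mwe_ids)
    first = positions[0]
    return positions == list(range(first, first + len(positions)))


def choose_encoding(sentence_ids, mwe_ids):
    """Pick an encoding: in-place <seg> only works without gaps (hypothetical rule)."""
    return "seg" if is_contiguous(sentence_ids, mwe_ids) else "link"


# Invented token IDs for a four-word sentence, e.g. "look the word up".
sentence_ids = ["w1", "w2", "w3", "w4"]

whole_sentence = choose_encoding(sentence_ids, ["w1", "w2", "w3", "w4"])
with_gap = choose_encoding(sentence_ids, ["w1", "w4"])  # "look ... up"
```

Here `whole_sentence` comes out as "seg" and `with_gap` as "link", matching the reasoning that gaps force an explicit token list.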

@TomazErjavec TomazErjavec changed the title specification: MWEs in ParlaMint Specification: MWEs in ParlaMint May 23, 2022
@TomazErjavec
Collaborator

We now use seg, so this is obsolete & closing.
