Replaced 'break' with 'beginning' for page|line|column|gathering in Guidelines and Specs #2634

Merged
merged 20 commits into from
Jan 20, 2025
Commits
ea3d132
replaced break with beginning for pb and cb
trishaoconnor Dec 10, 2024
9aaa9eb
Replace breaks with beginnings for att.breaking example
trishaoconnor Dec 10, 2024
69b6f21
Replace break with beginning in param
trishaoconnor Dec 10, 2024
77bf022
Replace break with beginning
trishaoconnor Dec 10, 2024
68d7f10
Replace break with beginning
trishaoconnor Dec 10, 2024
fdb054d
Fix missing s per @sydb.
martindholmes Jan 13, 2025
536a267
Apply recommended update to att.breaking
trishaoconnor Jan 16, 2025
b2c4465
Apply recommended changes toWD-NonStandardCharacters
trishaoconnor Jan 16, 2025
5f2fedf
Apply recommended changes toWD-NonStandardCharacters
trishaoconnor Jan 16, 2025
eaa34d6
Applied recommended changes to CO-CoreElements
trishaoconnor Jan 16, 2025
324be44
Applied recommended changes to DI-PrintDictionaries
trishaoconnor Jan 16, 2025
65d9eac
Applied recommended changes to DS-DefaultTextStructure
trishaoconnor Jan 16, 2025
0ae161d
Applied recommended changes to FT-TablesFormulaeGraphics
trishaoconnor Jan 16, 2025
f7faf25
Applied recommended changes to PH-PrimarySources
trishaoconnor Jan 16, 2025
8618a90
Applied recommended changes to SG-GentleIntroduction
trishaoconnor Jan 16, 2025
169067c
Changed 'beginning' back to 'break' in model.xml and param.xml
trishaoconnor Jan 16, 2025
fb8a4a4
Added missing Oxford comma
trishaoconnor Jan 20, 2025
861b9ba
Rewrite line beginning section
trishaoconnor Jan 20, 2025
14e2fbb
Change linebreaks to line divisions
trishaoconnor Jan 20, 2025
4de687a
dev updates
trishaoconnor Jan 20, 2025
10 changes: 5 additions & 5 deletions P5/Source/Guidelines/en/CH-LanguagesCharacterSets.xml
@@ -410,7 +410,7 @@ dates, and predefined value lists.</note></p></div>
depends largely on the purpose, external requirements, local
equipment and so forth, it is thus outside the scope of coverage
for these Guidelines. </p>
<p>It might however nevertheless be helpful to put some of the
<p>It might nevertheless be helpful to put some of the
terminology used for the rendering process in the context of the
discussion of this chapter. As was mentioned above, Unicode
encodes abstract characters, not specific glyphs. For any
@@ -421,12 +421,12 @@ dates, and predefined value lists.</note></p></div>
and which areas have to be left blank. If we want to print a character
from the Latin script, besides the selection of
the overall glyph shape, this process also requires that a
specific weight of the font has been selected, a specific size
specific weight and size of the font has been selected,
and to what degree the shape should be slanted. Beyond
individual characters, the overall typesetting process also
follows specific rules of how to calculate the distance between
characters, how much whitespace occurs between words, at which
points line breaks might occur and so forth. </p>
follows specific rules for calculating the distance between
characters, for determining how much whitespace occurs between any two words, and how long each line should be (and thus at which
points a new line begins), and so forth. </p>
<p>If we concern ourselves only with the rendering process of the
characters themselves, leaving out all these other parameters, we
will realize that of all the information required for this process, only a small
31 changes: 15 additions & 16 deletions P5/Source/Guidelines/en/CO-CoreElements.xml
@@ -329,7 +329,7 @@ problem for text encoders. Suppose, for example, that we wish to
investigate a diachronic English corpus for occurrences of
<mentioned>tea-pot</mentioned> and <mentioned>teapot</mentioned>, to
find evidence for the point at which this compound becomes
lexicalized. Any case where the word is hyphenated across a linebreak,
Member

This use of the term has nothing to do with our <lb> element, and perhaps should just be left as “line break”. (Note the space: 18 of the 25 occurrences of "line.?break" in the Guidelines have a space; one has a hyphen, and six have the single-word version.)

Contributor Author

trishaoconnor Jan 16, 2025

Further down on line 353, "line division" is used. For consistency, I've rewritten line 332 using "division" instead of "beginning" and "break". Hopefully, this substitution is acceptable?

"Any case where the word is hyphenated across a linebreak..."

Thank you very much, too, for spotting the different renderings of the term throughout the Guidelines.

Member

Seems fine. But with respect to the different renderings of the term, no pat on the back for just finding them; I have not gone out and fixed them, yet. 😄

Contributor Author

I searched using the query that @martinascholger kindly provided when opening the issue. I thought that this would have caught every instance, including variations between line-break, line break and linebreak?
(page|line|column|gathering)[\s-]*breaks
I'll search for the occurrences that you found and fix them too. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on what regular expression language you are using, I think. (I am not sure on this, but I think in most languages “\s”, when inside square brackets, means “whitespace”; but in some languages it means “an ‘s’”. It would be quite conceivable that it means “a backslash or an ‘s’”, too.)
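
As a rough illustration of the point above, here is a minimal sketch in Python, whose `re` module treats `\s` inside square brackets as "any whitespace". `PLURAL` is the pattern exactly as quoted in the thread; `ANY_FORM` is a hypothetical variant without the trailing "s", and the sample strings are made up for the demonstration. Other regular expression dialects (POSIX bracket expressions, for instance) may read `[\s-]` differently, so this shows the behaviour of one engine only.

```python
import re

# Pattern as quoted in the review thread. In Python's `re`, `\s` inside a
# character class still means "any whitespace character".
PLURAL = re.compile(r"(page|line|column|gathering)[\s-]*breaks")

# Hypothetical variant (not from the thread): the trailing "s" is dropped so
# that singular forms are found as well.
ANY_FORM = re.compile(r"(page|line|column|gathering)[\s-]*break")

# Made-up sample strings covering the spellings discussed in the thread.
samples = ["line breaks", "line-breaks", "linebreaks", "line break", "page\nbreaks"]


def hits(pattern, texts):
    """Return the texts in which the pattern matches somewhere."""
    return [t for t in texts if pattern.search(t) is not None]


plural_hits = hits(PLURAL, samples)
any_hits = hits(ANY_FORM, samples)

print(plural_hits)
print(any_hits)
```

Because the quoted pattern ends in "breaks", it skips the singular "line break" in this engine, which may be one reason the two searches in the thread returned different counts.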

lexicalized. Any case where the word is hyphenated across a line division,
like this: <eg xml:space="preserve"><![CDATA[tea-
pot]]></eg> is
ambiguous: there is no simple way of deciding which of the two
@@ -381,11 +381,11 @@ whitespace. </p>

<p> The <gi>lb</gi>, <gi>pb</gi>, and <gi>cb</gi> elements are notable
exceptions to this general rule, since their function is precisely to
represent (or replace) line, page, or column breaks, which, as noted
represent (or replace) line, page, or column beginnings, which, as noted
above, are generally considered to be equivalent to whitespace. These
elements provide a more reliable way of preserving the lineation,
pagination, etc of a source document, since the encoder should not
assume that (untagged) line breaks etc. in an XML source file will
pagination, etc. of a source document, since the encoder should not
assume that (untagged) line beginnings etc. in an XML source file will
necessarily be preserved. </p>

<p>To control the intended tokenization, the encoder may use the
@@ -2696,7 +2696,7 @@ appropriate value for the <att>rend</att> attribute. Suggested values
for <att>rend</att> include:
<list rend="bulleted">
<item><term>bulleted</term> (items preceded by bullets or similar markings)</item>
<item><term>inline</term> (items rendered within continuous prose, with no linebreaks)</item>
<item><term>inline</term> (items rendered within continuous prose, with no line divisions)</item>
<item><term>numbered</term> (items preceded by numbers or letters)</item>
<item><term>simple</term> (items rendered as blocks, but with no bullet or number)</item>
</list>
@@ -3492,11 +3492,11 @@ section <ptr target="#CORS6"/> and in section <ptr target="#SACR"/>.
<p>When a text has no pre-existing associated reference system of any
kind, these Guidelines recommend as a minimum that at least the page
boundaries of the source text be marked using one of the methods
outlined in this section. Retaining page breaks in the markup is also
outlined in this section. Retaining page boundaries in the markup is also
recommended for texts which have a detailed reference system of their
own. Line breaks in prose texts may be, but need not be, tagged.<note place="bottom">Many encoders find it convenient to retain the line
breaks of the original during data entry, to simplify proofreading,
but this may be done without inserting a tag for each line break of
own. Line divisions in prose texts may be, but need not be, encoded.<note place="bottom">Many encoders find it convenient to retain the line
divisions of the original during data entry, to simplify proofreading,
but this may be done without inserting an element for the beginning of each line in
the original.</note></p>
<div type="div3" xml:id="CORS1"><head>Using the <att>xml:id</att> and <att>n</att> Attributes</head>
<p>When traditional reference schemes represent a hierarchical
@@ -3900,7 +3900,7 @@ treated as a single word, a tagging such as the following is recommended:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="CORS5-egXML-hq" source="#NONE">
...sed imp<lb break="no"/>erator dixit...
</egXML>
Where hyphenation appears before a line or page break, the encoder may
Where hyphenation appears at the end of a topographic line, a column, or a page, the encoder may
or may not choose to record the fact, either explicitly using an
appropriate Unicode character, or descriptively for example by means
of the <att>rend</att> attribute; see further <ptr target="#COPU-2"/>.</p>
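
The effect of `break="no"` in the hunk above can be sketched from the processing side. The following is an illustration only, not part of the PR or of the Guidelines: the helper name `plain_text` is hypothetical, elements are matched by local name so the sketch is namespace-agnostic, and it assumes that an `lb` without `break="no"` may be treated as equivalent to whitespace.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def _collect(elem, parts):
    """Gather text in document order, handling <lb> as described above."""
    if _local(elem.tag) == "lb":
        # Assumption: a plain <lb/> counts as whitespace, while
        # <lb break="no"/> falls inside a word and contributes nothing.
        if elem.get("break") != "no":
            parts.append(" ")
    else:
        parts.append(elem.text or "")
        for child in elem:
            _collect(child, parts)
    parts.append(elem.tail or "")


def plain_text(xml_fragment):
    """Return the whitespace-normalized text of a small XML fragment."""
    parts = []
    _collect(ET.fromstring(xml_fragment), parts)
    return " ".join("".join(parts).split())


joined = plain_text('<ab>...sed imp<lb break="no"/>erator dixit...</ab>')
spaced = plain_text("<l>Of Mans First Disobedience, and<lb/> the Fruit</l>")
print(joined)
print(spaced)
```

The first fragment comes out with "imperator" as one word; the second keeps "and" and "the" separate.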
@@ -5645,13 +5645,13 @@ metrical rather than typographic lines. In some modern or free verse,
it may be hard to decide whether the typographic line is to be
regarded as a verse line or not, but the distinction is quite clear
for verse following regular metrical patterns. Where a metrical line is
interrupted by a typographic line break, the encoder may choose to
ignore the fact entirely or to use the empty <gi>lb</gi> (line break)
interrupted by the start of a new typographic line, the encoder may choose to
ignore the fact entirely or to use the empty <gi>lb</gi> (line beginning)
element discussed in <ptr target="#CORS"/>. By convention, the start
of a metrical line implies the start of a typographic line; hence
there is no need to introduce an <gi>lb</gi> tag at the start of every
<gi>l</gi> element, but only at places where a new typographic line
starts within a metrical line, as in the following example:
starts within a metrical line, as in the following example:

<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="COVE-egXML-vm" source="#CO-eg-06">
<l>Of Mans First Disobedience, and<lb/> the Fruit</l>
@@ -5664,14 +5664,13 @@ starts within a metrical line, as in the following example:
In the original copy text, the presence of an ornamental capital at
the start of the poem means that the measure is not wide enough to
print the first four lines on four lines; instead each metrical line occupies
two typographic lines, with a break at the point indicated. Note that
two typographic lines, with the second beginning at the point indicated. Note that
this encoding makes no attempt to preserve information about the
whitespace or indentation associated with either kind of line; if regarded
as essential, this information would be recorded using the
<att>rend</att> or <att>rendition</att> attributes discussed in <ptr target="#STGA"/>. </p>
<p>The <gi>l</gi> element should not be used to represent typographic
lines in non-verse materials: if the line-breaking points in a prose
text are considered important for analysis, they should be marked with
lines in non-verse materials: if the lineation of a prose text is considered important for analysis, the beginning of each line should be marked with
the <gi>lb</gi> element. Alternatively, a neutral segmentation element
such as <gi>seg</gi> or <gi>ab</gi> may be used; see further
discussion of these elements in chapter <ptr target="#SA"/>. The
5 changes: 2 additions & 3 deletions P5/Source/Guidelines/en/DI-PrintDictionaries.xml
@@ -2227,8 +2227,7 @@ following three, which help to clarify some issues raised with particular urgenc
dictionaries, on account of the complexity of both their typography and their
information structure.<list rend="bulleted">
<item>(a) the <term>typographic view</term>—the
two-dimensional printed page, including information about line and page breaks
and other features of layout </item>
two-dimensional printed page, including information about lineation, pagination, and other features of layout </item>
<item>(b) the <term>editorial view</term>—the one-dimensional sequence of tokens
which can be seen as the input to the typesetting process; the wording and
punctuation of the text and the sequencing of items are visible in this view,
@@ -2243,7 +2242,7 @@ therefore hyphenated (<q>naut-</q>
<q>ical</q>); the typographic view of the dictionary preserves this information. In a
purely editorial view, the particular form in which the domain name is given in the
particular dictionary (as <q>nautical</q>, rather than <q>naut.</q>, <q>Naut.</q>, etc.)
would be preserved, but the fact of the line break would not. Font shifts might
would be preserved, but the fact that the word was split across two lines with a soft hyphen would not. Font shifts might
plausibly be included in either a strictly typographic or an editorial view. In the
lexical view, the only information preserved concerning domain would be some standard
symbol or string representing the nautical domain (e.g. <q>naut.</q>) regardless of the
6 changes: 3 additions & 3 deletions P5/Source/Guidelines/en/DS-DefaultTextStructure.xml
@@ -1506,7 +1506,7 @@ the <gi>div</gi> elements containing chapters of the text itself. (For the
<!-- ... -->
</div>
</egXML>
Alternatively, the pointers in the index might link to the page breaks
Alternatively, the pointers in the index might link to the page beginnings
at which a chapter begins, assuming that these have been included in
the markup:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="DSFRONT-egXML-xd"><!-- ... -->
@@ -1629,7 +1629,7 @@ the work discussed earlier in this section: <egXML xmlns="http://www.tei-c.org/n
</docImprint>
</titlePage></front></egXML></p>
<p>Second, a characteristically verbose 17th century example. Note the
use of the <gi>lb</gi> tag to mark the line breaks of the original
use of the <gi>lb</gi> tag to mark the line beginnings of the original
where necessary:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="DSTITL-egXML-dk" source="#DS-eg-06"><titlePage>
<docTitle>
@@ -1731,7 +1731,7 @@ transcription):
</div>
</back></egXML>
<!-- Smith, Wealth of Nations, 1776; index to vol 1 -->
Note that if the page breaks in the original source have also been
Note that if the pagination in the original source have also been
explicitly encoded, and given identifiers, the references to them in the
above index can more usefully be recorded as links. For example,
assuming that the encoding of page 461 of the original source starts
2 changes: 1 addition & 1 deletion P5/Source/Guidelines/en/FT-TablesFormulaeGraphics.xml
@@ -255,7 +255,7 @@ this is rarely if ever done in practice.</note-->
</p>
<p>The content of table elements is not limited to <gi>head</gi> and
<gi>row</gi>. Milestone elements such as <gi>cb</gi> and <gi>lb</gi>
allow breaks to be signalled inside tables; <gi>figure</gi> provides an
allow new columns or lines to be signalled inside tables; <gi>figure</gi> provides an
option for including data which is not amenable to normal row and cell
analysis; and other elements such as <gi>epigraph</gi> and
<gi>trailer</gi> provide options for including text which is clearly
4 changes: 2 additions & 2 deletions P5/Source/Guidelines/en/HD-Header.xml
@@ -1410,7 +1410,7 @@ it to mark italicised English words only.</p>
<p>The <att>withId</att> attribute may optionally be used to specify
how many of the occurrences of the element in question bear a value
for the global <att>xml:id</att> attribute, as in the following
example: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="HD57-2-egXML-kf"><tagUsage gi="pb" occurs="321" withId="321"> Marks page breaks in the York
example: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="HD57-2-egXML-kf"><tagUsage gi="pb" occurs="321" withId="321"> Marks page beginnings in the York
(1734) edition only </tagUsage></egXML> This indicates that the
<gi>pb</gi> element occurs 321 times, on each of which an identifier
is provided.</p>
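
The two figures recorded by `occurs` and `withId` in the hunk above could in principle be derived from a transcription. Below is a minimal sketch under stated assumptions: the function name `tag_usage` and the sample fragment are hypothetical, elements are matched by local name, and `xml:id` is looked up under the standard XML namespace, which is how `xml.etree.ElementTree` exposes it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def tag_usage(xml_text, gi):
    """Return (occurrences, occurrences bearing xml:id) for the element `gi`."""
    root = ET.fromstring(xml_text)
    found = [e for e in root.iter() if e.tag.rsplit("}", 1)[-1] == gi]
    with_id = sum(1 for e in found if XML_ID in e.attrib)
    return len(found), with_id


# Made-up fragment: three page beginnings, two of which carry an identifier.
sample = '<text><pb xml:id="p1"/><p>a<pb xml:id="p2"/>b<pb/></p></text>'
occurs, with_id = tag_usage(sample, "pb")
print(occurs, with_id)
```

When the two numbers are equal, as in the `tagUsage` example with 321 and 321, every occurrence of the element carries an identifier.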
@@ -1518,7 +1518,7 @@ not recommended for automatic processing.</p>
text. The <att>n</att> attribute on each <gi>div1</gi> and
<gi>div2</gi> contains the canonical reference for each such
division, in the form 'XX.yyy', where XX is the book number in Roman
numerals, and yyy the section number in arabic. Line breaks are
numerals, and yyy the section number in arabic. Line beginnings are
marked by empty <gi>lb</gi> elements, each of which includes the
through line number in Casaubon's edition as the value of its
<gi>n</gi> attribute.</p>
2 changes: 1 addition & 1 deletion P5/Source/Guidelines/en/NH-Non-hierarchical.xml
@@ -220,7 +220,7 @@ wound.</seg></p></egXML>
typographical and metrical line divisions correspond,
<gi>lb</gi> does not itself make a metrical claim: in encoding
verse from sources, such as Old English manuscripts, where
physical line breaks are not used to indicate metrical
physical line beginnings are not used to indicate metrical
lineation, the correspondence would break down entirely.</p>


10 changes: 5 additions & 5 deletions P5/Source/Guidelines/en/PH-PrimarySources.xml
@@ -749,8 +749,8 @@ used to define a polygon of any shape using this coordinate system:-->
chapter provides ways of encoding such information: <list rend="bulleted">
<item>methods of recording editorial or other alterations to the text, such as expansion
of abbreviations, corrections, conjectures, etc. (section <ptr target="#PHCH"/>)</item>
<item>methods of describing important extra-linguistic phenomena in the source: unusual
spaces, lines, page and line breaks, changes of manuscript hand, etc. (section <ptr target="#PHPH"/>)</item>
<item>methods of describing important extra-linguistic phenomena in the source: pagination, lineation, unusual
spaces, changes of manuscript hand, etc. (section <ptr target="#PHPH"/>)</item>
<item>methods of representing aspects of layout such as spacing or lines <ptr target="#PHLAY"/>
</item>
<item>methods of representing material such as running heads, catch-words, and the like
@@ -2549,9 +2549,9 @@ referring to the zone marked in purple on the right
-->
<p>This approach assumes that the transcription will primarily be organized in the same way as
the physical layout of the source, using embedded transcription elements. Alternatively,
where the a non-embedded transcription has been provided, using the <gi>text</gi> element,
it is still possible to record gathering breaks, page breaks, column breaks, line breaks
etc in the source, using the elements described in section <ptr target="#CORS"/>. Detailed
where a non-embedded transcription has been provided, using the <gi>text</gi> element,
it is still possible to record gathering beginnings, page beginnings, column beginnings, line beginnings
etc. in the source, using the elements described in section <ptr target="#CORS"/>. Detailed
metadata about the physical make-up of a source will usually be summarized by the
<gi>physDesc</gi> component of an <gi>msDesc</gi> element discussed in <ptr target="#msph"/>. </p>

8 changes: 4 additions & 4 deletions P5/Source/Guidelines/en/SG-GentleIntroduction.xml
@@ -229,7 +229,7 @@ forth. And for certain types of analysis (most notably textual
criticism) the physical appearance of one particular printed or
manuscript source may be of importance: paradoxically, one may wish to
use descriptive markup to describe presentational features such as
typeface, line breaks, use of whitespace and so forth.</p>
typeface, original topographic lineation, use of whitespace, and so forth.</p>

<p>These textual structures overlap with one another in complex and
unpredictable ways. Particularly when dealing with texts as
@@ -346,7 +346,7 @@ document.<note place="bottom">The element names here have been chosen for
clarity of exposition; there is, however, a TEI element corresponding to
each<!--, so that this example may be regarded as TEI-conformable in the
sense that this term is defined in <ptr target="#CF"/>-->.</note> It will, however, serve as an introduction to the basic notions of XML.
Whitespace and line breaks have been added to the example for the
Whitespace and line divisions have been added to the example for the
sake of visual clarity only; they have no particular significance in the
XML encoding itself. Also, the line
<eg><![CDATA[<!-- more poems go here -->]]></eg>
@@ -1256,8 +1256,8 @@ only so that it can be clearly distinguished from the
structure of the document. As suggested above, one common example is
the need, when processing an XML document for printed output, to
include a suggestion that the formatting processor might use to
determine where to begin a new page of output. Page-breaking decisions
are usually best made by the formatting engine alone, but there will
determine where to begin a new page of output. It is generally best
to leave pagination of the output to the formatting engine alone, but there will
always be occasions when it may be necessary to override these. An XML
processing instruction inserted into the document is one very simple
and effective way of doing this without interfering with other aspects
5 changes: 2 additions & 3 deletions P5/Source/Guidelines/en/WD-NonStandardCharacters.xml
@@ -1250,11 +1250,10 @@ precinct at Dodona. (L.H. Jeffery Archive)</head>
is reversed, and so is their individual orientation (in fact, we see them
<soCalled>from the back</soCalled>, as it were). <gi>seg</gi> elements
have been used here because these are clearly not <soCalled>lines</soCalled>
in the sense of poetic lines; the text is continuous prose, and linebreaks
are incidental.</p>
in the sense of poetic lines; the text is continuous prose, and the division into separate lines is incidental.</p>

<p>There are obviously some unsatisfactory aspects of this manner of encoding
boustrophedon. In the inscription above, some words run across linebreaks,
boustrophedon. In the inscription above, some words are split across two lines,
so if we wished to tag both words and the right-to-left phenomena, one
hierarchy would have to be privileged over the other. By using a transform
function rather than a writing mode property, we are apparently suggesting
3 changes: 1 addition & 2 deletions P5/Source/Specs/att.breaking.xml
@@ -46,8 +46,7 @@ of any adjacent whitespace</desc>
</valList>
<exemplum xml:lang="en">
<p>In the following lines from the <title level="a">Dream of the Rood</title>,
linebreaks occur in the middle of the words <mentioned>lāðost</mentioned>
and <mentioned>reord-berendum</mentioned>.
the words <mentioned>lāðost</mentioned> and <mentioned>reord-berendum</mentioned> each start on one line and continue onto the next.
</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="class-attr-breaking-egXML-il" xml:lang="ang">
<ab>