PCRE2Project
diff --git a/ChangeLog b/ChangeLog
Lines changed: 7 additions & 2 deletions
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
Lines changed: 8 additions & 5 deletions
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
Lines changed: 11 additions & 4 deletions
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
Lines changed: 29 additions & 17 deletions
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
Lines changed: 12 additions & 6 deletions
diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
Lines changed: 7 additions & 4 deletions
@@ -130,13 +130,18 @@ includes underscore.
 
 33. Changed the meaning of [:xdigit:] in UCP mode to match Perl. It now also
 matches the "fullwidth" versions of the hex digits. Just like it is done for
-[:digit:], PCRE2_EXTRA_ASCII_DIGIT can be used to keep this class ASCII only.
+[:digit:], PCRE2_EXTRA_ASCII_DIGIT can be used to keep this class ASCII only
+without affecting other POSIX classes.
 
 34. GitHub PR305 fixes a potential integer overflow in pcre2_dfa_match().
 
-35. Updated handling of \b and \B in UCP mode to match the changes to \w in 32 
+35. Updated handling of \b and \B in UCP mode to match the changes to \w in 32
 above because \b and \B are defined in terms of \w.
 
+36. Within a pattern (?aT) and (?-aT) set and reset the PCRE2_EXTRA_ASCII_DIGIT
+option, and (?aP) also sets (?aT) so that (?-aP) disables all ASCII
+restrictions on POSIX classes.
+
 
 Version 10.42 11-December-2022
 ------------------------------
 
@@ -2014,13 +2014,16 @@ <h1>pcre2api man page</h1>
   PCRE2_EXTRA_ASCII_DIGIT
 </pre>
 This option forces the POSIX character classes [:digit:] and [:xdigit:] to
-match only ASCII digits, even when PCRE2_UCP is set.
+match only ASCII digits, even when PCRE2_UCP is set. It can be changed within
+a pattern by means of the (?aT) option setting.
 <pre>
   PCRE2_EXTRA_ASCII_POSIX
 </pre>
-This option forces the POSIX character classes to match only ASCII characters,
-even when PCRE2_UCP is set. It can be changed within a pattern by means of the
-(?aP) option setting.
+This option forces all the POSIX character classes, including [:digit:] and
+[:xdigit:], to match only ASCII characters, even when PCRE2_UCP is set. It can
+be changed within a pattern by means of the (?aP) option setting, but note that 
+this also sets PCRE2_EXTRA_ASCII_DIGIT in order to ensure that (?-aP) unsets
+all ASCII restrictions for POSIX classes.
 <pre>
   PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 </pre>
@@ -4137,7 +4140,7 @@ <h1>pcre2api man page</h1>
 </P>
 <br><a name="SEC43" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 September 2023
+Last updated: 12 October 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>
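
Review note on the pcre2api hunks above: the relationship between the two options can be sketched as a small predicate. This is only a model of the documented behaviour, assuming PCRE2_UCP is set. The function name and the SKETCH_* bit values are hypothetical and are not the real constants from pcre2.h.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values for this sketch only; the real option
   constants live in pcre2.h and may have different values. */
#define SKETCH_ASCII_POSIX 0x1u
#define SKETCH_ASCII_DIGIT 0x2u

/* Assuming UCP mode is on, report whether the named POSIX class
   (for example "digit" or "alpha") is restricted to ASCII. */
bool sketch_restricted_to_ascii(const char *class_name, unsigned extra)
{
    bool is_digit_class = strcmp(class_name, "digit") == 0 ||
                          strcmp(class_name, "xdigit") == 0;

    /* The POSIX option covers every class, digit classes included. */
    if (extra & SKETCH_ASCII_POSIX)
        return true;

    /* The DIGIT option covers only [:digit:] and [:xdigit:]. */
    return is_digit_class && (extra & SKETCH_ASCII_DIGIT) != 0;
}
```

With only the digit bit set, "digit" and "xdigit" are restricted while "alpha" still uses Unicode properties, which is the "without affecting other POSIX classes" behaviour described in the ChangeLog entry.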
 
@@ -71,9 +71,10 @@ <h1>pcre2compat man page</h1>
 7. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
-derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
-(surrogate) property, but in PCRE2 its use is limited. See the
+Nd, the derived properties Any and LC (synonym L&), script names such as Greek
+or Han, Bidi_Class, Bidi_Control, and a few binary properties. Both PCRE2 and
+Perl support the Cs (surrogate) property, but in PCRE2 its use is limited. See
+the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation for details. The long synonyms for property names that Perl
 supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
@@ -239,6 +240,12 @@ <h1>pcre2compat man page</h1>
 fall into any stack-overflow limit. PCRE2 made a similar change at release
 10.30, and also has many build-time and run-time customizable limits.
 </P>
+<P>
+21. Unlike Perl, PCRE2 has no character set modifiers and, in particular, no
+way to choose character rules by context as Perl's "/d" does. A pattern using
+PCRE2_UTF and PCRE2_UCP follows rules similar to Perl's "/u"; something closer
+to "/a" can be selected by adding PCRE2_EXTRA_ASCII* options on top.
+</P>
 <br><b>
 AUTHOR
 </b><br>
@@ -254,7 +261,7 @@ <h1>pcre2compat man page</h1>
 REVISION
 </b><br>
 <P>
-Last updated: 19 September 2023
+Last updated: 12 October 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>
 
@@ -1521,7 +1521,7 @@ <h1>pcre2pattern man page</h1>
 The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
 and space (32). If locale-specific matching is taking place, the list of space
 characters may be different; there may be fewer or more of them. "Space" and
-\s match the same set of characters.
+\s match the same set of characters, as do "word" and \w.
 </P>
 <P>
 The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
@@ -1538,15 +1538,15 @@ <h1>pcre2pattern man page</h1>
 By default, characters with values greater than 127 do not match any of the
 POSIX character classes, although this may be different for characters in the
 range 128-255 when locale-specific matching is happening. However, in UCP mode,
-some of the classes are changed so that Unicode character properties are used.
-This is achieved by replacing certain POSIX classes with other sequences, as
-follows:
+unless certain options are set (see below), some of the classes are changed so
+that Unicode character properties are used. This is achieved by replacing
+POSIX classes with other sequences, as follows:
 <pre>
   [:alnum:]  becomes  \p{Xan}
   [:alpha:]  becomes  \p{L}
   [:blank:]  becomes  \h
   [:cntrl:]  becomes  \p{Cc}
-  [:digit:]  becomes  \p{Nd}  unless PCRE2_EXTRA_ASCII_DIGIT is set
+  [:digit:]  becomes  \p{Nd}
   [:lower:]  becomes  \p{Ll}
   [:space:]  becomes  \p{Xps}
   [:upper:]  becomes  \p{Lu}
@@ -1581,16 +1581,20 @@ <h1>pcre2pattern man page</h1>
 <P>
 [:xdigit:]
 In addition to the ASCII hexadecimal digits, this also matches the "fullwidth"
-versions of those characters, whose Unicode code points start at U+FF10. The
-effect of PCRE2_UCP can be negated by setting the PCRE2_EXTRA_ASCII_DIGIT
-option, just like it does for [:digit]. This is a change that was made in
-PCRE release 10.43 for Perl compatibility.
+versions of those characters, whose Unicode code points start at U+FF10. This
+is a change that was made in PCRE2 release 10.43 for Perl compatibility.
 </P>
 <P>
 The other POSIX classes are unchanged by PCRE2_UCP, and match only characters
-with code points less than 256. The effect of PCRE2_UCP on all POSIX classes
-can be negated by setting the PCRE2_EXTRA_ASCII_POSIX option, either when
-calling <b>pcre2_compile()</b> or internally within the pattern.
+with code points less than 256. 
+</P>
+<P>
+There are two options that can be used to restrict the POSIX classes to ASCII
+characters when PCRE2_UCP is set. The option PCRE2_EXTRA_ASCII_DIGIT affects
+just [:digit:] and [:xdigit:]. Within a pattern, this can be set and unset by
+(?aT) and (?-aT). The PCRE2_EXTRA_ASCII_POSIX option disables UCP processing
+for all POSIX classes, including [:digit:] and [:xdigit:]. Within a pattern,
+(?aP) and (?-aP) set and unset both these options for consistency.
 </P>
 <br><a name="SEC11" href="#TOC1">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a><br>
 <P>
@@ -1609,7 +1613,9 @@ <h1>pcre2pattern man page</h1>
 <a href="#smallassertions">"Simple assertions"</a>
 above), and in a Perl-style pattern the preceding or following character
 normally shows which is wanted, without the need for the assertions that are
-used above in order to give exactly the POSIX behaviour.
+used above in order to give exactly the POSIX behaviour. Note also that the 
+PCRE2_UCP option changes the meaning of \w (and therefore \b) by default, so 
+it also affects these POSIX sequences.
 </P>
 <br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br>
 <P>
@@ -1643,8 +1649,8 @@ <h1>pcre2pattern man page</h1>
 </pre>
 For example, (?im) sets caseless, multiline matching. It is also possible to
 unset these options by preceding the relevant letters with a hyphen, for
-example (?-im). The two "extended" options are not independent; unsetting either
-one cancels the effects of both of them.
+example (?-im). The two "extended" options are not independent; unsetting
+either one cancels the effects of both of them.
 </P>
 <P>
 A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
@@ -1665,7 +1671,8 @@ <h1>pcre2pattern man page</h1>
   aD for PCRE2_EXTRA_ASCII_BSD
   aS for PCRE2_EXTRA_ASCII_BSS
   aW for PCRE2_EXTRA_ASCII_BSW
-  aP for PCRE2_EXTRA_ASCII_POSIX
+  aP for PCRE2_EXTRA_ASCII_POSIX and PCRE2_EXTRA_ASCII_DIGIT
+  aT for PCRE2_EXTRA_ASCII_DIGIT
   r  for PCRE2_EXTRA_CASELESS_RESTRICT
   J  for PCRE2_DUPNAMES
   U  for PCRE2_UNGREEDY
@@ -1675,6 +1682,11 @@ <h1>pcre2pattern man page</h1>
 above, it sets (or unsets) all the ASCII options.
 </P>
 <P>
+PCRE2_EXTRA_ASCII_DIGIT has no additional effect when PCRE2_EXTRA_ASCII_POSIX 
+is set, but including it in (?aP) means that (?-aP) suppresses all ASCII 
+restrictions for POSIX classes.
+</P>
+<P>
 When one of these option changes occurs at top level (that is, not inside group
 parentheses), the change applies until a subsequent change, or the end of the
 pattern. An option change within a group (see below for a description of
@@ -3832,7 +3844,7 @@ <h1>pcre2pattern man page</h1>
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 October 2023
+Last updated: 12 October 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>
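
Review note on the [:xdigit:] text above: a minimal sketch of the code points involved, assuming the fullwidth hex digits are the Unicode ranges U+FF10 to U+FF19, U+FF21 to U+FF26 and U+FF41 to U+FF46. The function and parameter names are hypothetical, and this is not how PCRE2 implements the class internally.

```c
#include <stdbool.h>
#include <stdint.h>

static bool sketch_in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* ucp:         models PCRE2_UCP being set.
   ascii_digit: models either ASCII restriction option being in effect
                (PCRE2_EXTRA_ASCII_DIGIT or PCRE2_EXTRA_ASCII_POSIX). */
bool sketch_is_xdigit(uint32_t c, bool ucp, bool ascii_digit)
{
    /* ASCII hex digits always match (ASCII code values written out). */
    if (sketch_in_range(c, 0x30, 0x39) ||   /* 0-9 */
        sketch_in_range(c, 0x41, 0x46) ||   /* A-F */
        sketch_in_range(c, 0x61, 0x66))     /* a-f */
        return true;

    /* Fullwidth forms match only in unrestricted UCP mode. */
    if (!ucp || ascii_digit)
        return false;

    return sketch_in_range(c, 0xFF10, 0xFF19) ||  /* fullwidth 0-9 */
           sketch_in_range(c, 0xFF21, 0xFF26) ||  /* fullwidth A-F */
           sketch_in_range(c, 0xFF41, 0xFF46);    /* fullwidth a-f */
}
```

Under this model U+FF10 (fullwidth zero) matches in UCP mode but stops matching once an ASCII restriction is in effect, while U+FF27 (fullwidth G) never matches.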
 
@@ -391,10 +391,11 @@ <h1>pcre2syntax man page</h1>
 of the group.
 <pre>
   (?a)            all ASCII options
-  (?aD)           restrict \d to ASCII, even in UCP mode
-  (?aS)           restrict \s to ASCII, even in UCP mode
-  (?aW)           restrict \w to ASCII, even in UCP mode
-  (?aP)           restrict POSIX classes to ASCII even in UCP mode
+  (?aD)           restrict \d to ASCII in UCP mode
+  (?aS)           restrict \s to ASCII in UCP mode
+  (?aW)           restrict \w to ASCII in UCP mode
+  (?aP)           restrict all POSIX classes to ASCII in UCP mode
+  (?aT)           restrict POSIX digit classes to ASCII in UCP mode
   (?i)            caseless
   (?J)            allow duplicate named groups
   (?m)            multiline
@@ -404,9 +405,14 @@ <h1>pcre2syntax man page</h1>
   (?U)            default ungreedy (lazy)
   (?x)            ignore white space except in classes or \Q...\E
   (?xx)           as (?x) but also ignore space and tab in classes
-  (?-...)         unset option(s)
+  (?-...)         unset the given option(s)
   (?^)            unset imnrsx options
 </pre>
+(?aP) implies (?aT) as well, though this has no additional effect. However, it 
+means that (?-aP) is really (?-PT) which disables all ASCII restrictions for 
+POSIX classes.
+</P>
+<P>
 Unsetting x or xx unsets both. Several options may be set at once, and a
 mixture of setting and unsetting such as (?i-x) is allowed, but there may be
 only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
@@ -620,7 +626,7 @@ <h1>pcre2syntax man page</h1>
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 September 2023
+Last updated: 12 October 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>
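
Review note on the (?aP) and (?aT) settings above: the set and unset rules can be modelled as simple bit operations. This is a sketch under stated assumptions. The SKETCH_* values and the function name are hypothetical, and the parser accepts only the four settings discussed here rather than PCRE2's full option syntax.

```c
#include <string.h>

/* Hypothetical bit values for this sketch only; not the pcre2.h values. */
#define SKETCH_ASCII_POSIX 0x1u
#define SKETCH_ASCII_DIGIT 0x2u

/* Apply one in-pattern setting, written without the "(?" and ")":
   "aP", "-aP", "aT" or "-aT". Anything else leaves the bits unchanged. */
unsigned sketch_apply_setting(unsigned bits, const char *setting)
{
    int unset = (setting[0] == '-');
    const char *name = unset ? setting + 1 : setting;
    unsigned mask = 0;

    if (strcmp(name, "aP") == 0)
        mask = SKETCH_ASCII_POSIX | SKETCH_ASCII_DIGIT;  /* aP implies aT */
    else if (strcmp(name, "aT") == 0)
        mask = SKETCH_ASCII_DIGIT;

    return unset ? (bits & ~mask) : (bits | mask);
}
```

In this model, applying "aT" and then "-aP" leaves no bits set, which mirrors the statement that (?-aP) removes all ASCII restrictions on POSIX classes. Applying "aP" and then "-aT" leaves the POSIX bit set, so the digit classes would still be restricted through the broader option.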
 
@@ -52,9 +52,12 @@ <h1>pcre2unicode man page</h1>
 \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
 The Unicode properties that can be tested are a subset of those that Perl
 supports. Currently they are limited to the general category properties such as
-Lu for an upper case letter or Nd for a decimal number, the Unicode script
-names such as Arabic or Han, Bidi_Class, Bidi_Control, and the derived
-properties Any and LC (synonym L&). Full lists are given in the
+Lu for an upper case letter or Nd for a decimal number, the derived properties
+Any and LC (synonym L&), the Unicode script names such as Arabic or Han,
+Bidi_Class, Bidi_Control, and a few binary properties.
+</P>
+<P>
+The full lists are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 and
 <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
@@ -510,7 +513,7 @@ <h1>pcre2unicode man page</h1>
 REVISION
 </b><br>
 <P>
-Last updated: 04 February 2023
+Last updated: 12 October 2023
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>