Description
The Problem
Raku currently very lightly wraps internal NQP and VM operations for:
- getting Unicode properties (`uniprop`)
- getting Unicode names (`uniname`/`uninames`)
- matching the first-or-only character of a string to a property or a property/property category pair (`unimatch`)
- producing Unicode characters (`uniparse`)
- producing numerical values from Unicode numbers (`unival`)
As currently implemented, these routines -- methods and subs alike -- do not perform their functions as well as they could.
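For orientation, a minimal sketch of how these routines are called today. The `uniprop` and `unimatch` results are the ones quoted elsewhere in this issue; the remaining lines rely on the documented behavior of each routine and are meant as illustration only.

```raku
# Current routines, called as methods on a string.
say "w".uniprop;                         # Ll (general category by default)
say "w".uniprop("Script");               # Latin
say "w".uniname;                         # LATIN SMALL LETTER W
say "hi".uninames.join(", ");            # LATIN SMALL LETTER H, LATIN SMALL LETTER I
say "w".unimatch("Latin", "Script");     # True
say "LATIN SMALL LETTER W".uniparse;     # w
say "½".unival;                          # 0.5
```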
Issues
- `uniprop` will return 0 if the property provided is not found. This would be better served with a `Failure` or `Exception`, or at the very least an empty string so that the return type is consistent.
- 'Properties' and 'Property Categories' are served via the same mechanism, even though they are only interchangeable from category to property and not the other way around.

      "w".unimatch("Latin", "Script") ==> say() # True
      "w".unimatch("Script", "Latin") ==> say() # False
      "w".uniprop("Script") ==> say() # Latin
      "w".uniprop("Latin") ==> say() # Latin
      "w".uniprop ==> say() # Ll

- All of the methods which take properties could fail at compile time when provided with known nonexistent properties (i.e., given as literals, not stored in a variable).
  - As of right now, there is zero feedback that the provided property is nonexistent. This was already reported 6 years ago in rakudo/rakudo#1496 ("Unrecognized unicode properties shouldn't fail silently").
- Properties are returned from NQP with spaces separating words, but only `_`-separated property names are accepted going the other direction.

      "e".uniprop("Block")       # "Basic Latin"
      "e".uniprop("Basic Latin") # 0
      "e".uniprop("Basic_Latin") # "Basic Latin"

      use nqp;
      my $a = nqp::unipropcode("Basic Latin");
      my $b = nqp::unipropcode("Basic_Latin");
      $a, $b, $a == $b ==> say() # (0 6 False)

- The current implementation is not thread-safe; see rakudo/rakudo#4871 ("Unicode property methods/subs are not thread-safe"). See also the report at the MoarVM level, MoarVM/MoarVM#1717 ("Unicode ops are critically thread-dangerous").
- There are separate methods for "first character" (`uniprop`/`uniname`) versus "full string" (`uniprops`, `uninames`). It feels far too Perl-ish at this point to accept a multi-character string as an argument and then operate only on a single character.
  - Wouldn't it be reasonable to expect `uniprops` to return a comprehensive list of applicable properties for a single character? Currently there is no way (that I am aware of) to get such a comprehensive list.
- (Nit-level deficiency) `smashedtogetherlowercase` is an ugly (at worst) or a non-conformant (at best) naming scheme.
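
To make the first issue concrete, here is a rough user-level sketch of what returning a `Failure` for an unknown property could look like. The name `strict-uniprop` is hypothetical, and the check assumes, as the `nqp::unipropcode` output quoted above suggests, that a property code of 0 means the name was not recognized. It is only an illustration of the desired behavior, not a proposal for the actual implementation.

```raku
use nqp;

# Hypothetical helper, not part of Raku.
# Assumption: nqp::unipropcode returns 0 for names it does not recognize.
sub strict-uniprop(Str:D $char, Str:D $property) {
    fail "Unrecognized Unicode property: $property"
        if nqp::unipropcode($property) == 0;
    $char.uniprop($property)
}

# A recognized property name behaves exactly like plain uniprop.
say strict-uniprop("e", "Block");

# A space-separated name is not recognized, so the caller gets a Failure
# instead of a silent 0. Calling .defined marks the Failure as handled.
my $result = strict-uniprop("e", "Basic Latin");
say $result.defined ?? $result !! "failed: { $result.exception.message }";
```

With a `Failure`, callers that ignore the problem still get a loud error when they use the value, while callers that care can test the result with `.defined` or `without`.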
The solution
The instructions for problem-solving state I'm not meant to spend any space on this in the initial post.
So for now, I'll just mention that -- at the HLL level -- I think we could get all current functionality, and much more, out of even a single method that utilizes adverbs.