pp_ref() builtin_pp_reftype(): strlen()+Newx()+memcpy()->100% pre-made COWs #23391

Open
wants to merge 2 commits into
base: blead
from

Conversation

bulk88
Contributor

@bulk88 bulk88 commented Jun 29, 2025

  • ref() PP keyword has extremely high usage. Grepping my blead repo shows:
    Searched "ref(" 4347 hits in 605 files of 5879 searched

  • The strings keyword ref() returns are part of the Perl 5 BNF grammar.
    This is not up for debate. Changing their spelling or lowercasing them
    is not up for debate, nor is i18n-ing them dynamically in real time
    against glibc.so's current process-global locale, nor is wiring
    inotify/kqueue into the runloop to monitor /etc or /var so this race
    condition works as designed in a unit test:

    $perl -E "dire('hello')"
    Routine indéfinie &cœur::dire aufgerufen bei -e Zeile 1
    
  • sv_reftype() and sv_ref() have very badly designed prototypes, and the
    first time a new Perl in C dev reads their source code, they will think
    these 2 will cause infinite C stack recursion and a SEGV. Probably most
    automated C code analytic tools will complain these 2 functions do
    infinite recursion too.

  • The 2 functions don't return a string length, forcing all callers to
    execute a libc strlen() call on a string, that could be 8 bytes, or 80 MB.

  • The 2 functions don't split, parse, cat, or glue multiple strings to
    create their output. All null term-ed strings that they return, are
    already sitting in virtual address space. Either const HW RO, or
    RCed HEK*s from the PL_strtab pool, that were found inside something
    similar to a GV*/HV*/HE*/CV*/AV*/GP*/OP*/SV* in a OP* (no threads).

  • COW 255 buffers from Newx() under 9 chars can't COW currently by policy.
    CODE is 4, SCALAR is 6. HASH is 4. ARRAY is 5. But very short SV HEK* COWs
    will COW propagate without problems.

  • PP code if(ref($self) eq 'HASH') {} should never involve all 3-4 calls
    Newx()/Realloc()/strlen()/memcpy().

    So this commit fixes all of this, and makes pp_ref()/PP KW ref() closer
    in speed to C/C++/Asm style object type checking, which is almost always
    going to be 1 or 2 or 3 ptr equality tests against a C constant like
    &some_vtbl_some_class, or in Microsoft ecosystem SW, an equality test of
    a 16 byte GUID in memory against a 16 byte SSE literal stored in a SSE
    opcode (TLDR ver). Just convert backends sv_ref()/sv_reftype() to HEK*
    retvals, and convert the front end pp_*() ops to fetch HEK*s and return
    SV*s with POK_on SvPVX() == HEK*. In all likelihood, if the right side of
    the PP code is if (ref($self) eq 'HASH') {}, then during the execution of
    memcmp(pv1, pv2, len) as part of pp_seq, pv1 and pv2 are the same mem
    addr. But I didn't single step the eq operator to verify that yet.

  • inside PP(pp_reftype) previously the branch sv_setsv(TARG, &PL_sv_undef);
    did not fire SMG, after this commit it does. IDK why it wasn't firing
    before, or the consequences of SMG firing now on the sv_set_undef(rsv); path.

  • I suspect "sv_setsv(TARG, &PL_sv_undef);" and "sv_set_undef(rsv);" are
    not perfect behavior copies of each other in extreme/bizarre/user error
    and bad CPAN XS code situations, but I haven't found any side effects of
    the switch from sv_setsv(TARG, &PL_sv_undef); to sv_set_undef(rsv)

    Untested hypothetical cases like
    sv_setsv(gv_star, &PL_sv_undef); sv_setsv(hv_star, &PL_sv_undef);
    sv_setsv(svt_regexp_star, &PL_sv_undef);
    sv_setsv(svt_invlist_star, &PL_sv_undef);
    sv_setsv(svt_object_star, &PL_sv_undef);
    sv_setsv(svt_io_star, &PL_sv_undef);

  • sv_sethek() has a severe pathological performance problem if args
    SV* dsv and HEK* src_hek test true for

    if(SvPVX(dsv) == HEK_KEY(src_hek)) {}.
    

    But its still better than a strlen()/Newx()/memcpy()/push_save_stack()/
    delayed_Safefree(); cycle. Any fix for this would be for the future.

  • these 2 functions are experimental for now, hence undocumented and not
    public API. If they are made public, arg const int ob should be removed
    because of its confusing faux-infinite recursion that is not real life
    infinite recursion. The functions are exported so P5P hackers and
    CPAN XS devs (unsanctioned by P5P) can benchmark and research these 2 new
    functions using Inline::C/EU::PXS.

  • future improvements not done here, make sv_reftype() and sv_ref() wrappers
    around their HEK* counterparts. Note the HEK* must be RC++ed and stuffed
    in a new SV*, or a PAD TARG SV*, before the rpp_replace_1_1_NN(TARG); call
    because in artificial situations/fuzzing, strange things can happen during
    a SvREFCNT_dec_NN(); call, and the HEK* sitting in a C auto might
    get freed during the SvREFCNT_dec_NN();

  • another improvement, sv_sethek(rsv, hek); is somewhat heavy, and doesn't
    have a shortcut, to RC-- an existing SVPV HEK* COW itself, instead it
    uses SV_THINKFIRST_***() and sv_force_normal***() to RC-- an existing
    SVPV HEK* COW. If the SV* PAD TARG, is being used over and over by ref()
    opcode, its always going to have a stale HEK* SVPVX() that needs to be
    RC--ed.

  • another improvement, check if(sv_reftypehek() == SvPVX(targ)) before
    calling sv_sethek(rsv, hek);

  • another improvement, beyond scope for me, make into 1 OP*/opcode:

    if(ref($self) eq 'HASH')
    

    and

    if(ref($self) eq 'ARRAY')
    
  • another improvement, don't deref my_perl->Iop/PL_op many times in a row.
    I didn't do any CPU opcode/instruction stripping in this commit. That's
    for a future commit.

  • another improvement, investigate if most of large switch() inside
    Perl_sv_reftypehek() can be turned into a
    const I8 arr_of_PL_sv_consts_idxs[]; with a couple tiny special cases.

  • todo invert if (!rsv) { branch, so the hot path (yes cached in PL_sv_consts)
    comes first in machine code/asm order.


  • This set of changes requires a perldelta entry, and I need help writing it.
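
The central point of the list above, handing callers a pre-made string that already carries its length instead of a bare char*, can be sketched in standalone C. This is only an illustration and does not use Perl's headers: `counted_str`, `ref_kind`, `reftype_old()`, `reftype_new()` and `is_kind()` are hypothetical stand-ins for a HEK, the SV type switch, sv_reftype() and sv_reftypehek(), not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a HEK: the length travels with the bytes. */
typedef struct {
    size_t len;
    const char *str;
} counted_str;

typedef enum { KIND_SCALAR, KIND_ARRAY, KIND_HASH, KIND_CODE, KIND_COUNT } ref_kind;

/* One pre-made entry per type name, built once at compile time. */
static const counted_str type_names[KIND_COUNT] = {
    { 6, "SCALAR" },
    { 5, "ARRAY"  },
    { 4, "HASH"   },
    { 4, "CODE"   },
};

/* Old style: the caller only gets a char* and must run strlen() itself. */
const char *reftype_old(ref_kind k) {
    assert(k < KIND_COUNT);
    return type_names[k].str;
}

/* New style: the caller gets the pre-made entry, length included. */
const counted_str *reftype_new(ref_kind k) {
    assert(k < KIND_COUNT);
    return &type_names[k];
}

/* Type check: try pointer equality first, fall back to length + memcmp. */
int is_kind(const counted_str *name, ref_kind k) {
    const counted_str *want = reftype_new(k);
    if (name == want)
        return 1;
    return name->len == want->len && memcmp(name->str, want->str, name->len) == 0;
}
```

With the new style, a caller that wants to copy or compare the name never has to scan for the terminating NUL, and when both sides come from the same table the comparison ends at the first pointer test.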

@iabyn
Contributor

iabyn commented Jun 30, 2025 via email

@bulk88
Contributor Author

bulk88 commented Jun 30, 2025

> On Sun, Jun 29, 2025 at 03:56:07PM -0700, bulk88 wrote:
> > - ref() PP keyword has extremely high usage. Grepping my blead repo shows:
>
> [ snip 100 further lines]
>
> Please try to use meaningful commit summary lines and messages.

All tech decisions are documented with rationale. Read them bullet point by bullet point.

If I am the only Subject Matter Expert who knows the Perl VM C code, I can't really help out a React JSX SME or Go SME guru who tries to review the Perl C VM code. At that point I would have to offer a 6 hour pre-conference class at a TPRC or YAPCEU event on P5 VM C level design/optimization/O(n) complexity of interp internals to my students. Not a joke.

> I tried to read the commit message. I had no idea what the commit was about, apart from something to do with a badly designed sv_ref/sv_reftype API perhaps? Looking at the actual diff I guess the commit is about adding two new functions, sv_refhek and sv_reftypehek and then making use of them to speed up pp_ref() etc.? And perhaps adding some new SV constants?

Correct. I didn't invent SV_CONST()/PL but my eyes are telling me its design has similarities to 1990s/2000s era Spidermonkey's analogs of SV* POK Newx() vs SV* POK HEK* COW vs SV* POK SVppv_STATIC COW.

Current sv_ref/sv_reftype's prototype/fn signature is atrocious. Why do those 2 functions not return the string length to the caller through any mechanism? Whoever typed that in and saved it, I would never hire that person to work as an IT employee.

Utf8 isn't original to P5, but those 2 can't return a yes/no utf8 flag either. Also the backing storage and lifetime of those char*s is undefined according to public API AFAIK. Clearly anyone can look at the source code and see the 1 word all upper case char*s are C "" lit strings, and the "::" strings are HEK*s, but that is reverse engineering/non-public API.

Returning HEKs always with the 2 new fns fixes pretty much every design problem I can think of. Returning new SV heads with RC=1, or new SV heads with RC=1+mortal, or accepting an in SV* to set, I believe is a lot of unnecessary overhead, since those SV heads and SVPV bodies would be constantly alloced then dtored in the caller frame, or at the next save_stack or mortal_stack boundary, and 50% of the callers of sv_ref/sv_reftype are printf() style functions that want a "%s" or "%" SVf or "%" HEKf ptr for a very brief moment in time. They aren't interested in long term RC++ed storage of the string. But if the caller wants long term RC++ storage, they can get it very quickly and with COW benefits by calling sv_sethek() or newSVhek().

Also I decided returning the global/permanent SV*s, back to callers is a bad idea, I would have to mark the SV*s SvREADONLY(), and there is a risk of SvREADONLY() marked SV* winding up inside an AV* or inside a HE* without a high level PP level copy/assignment op (aka newSVsv()) to strip the SvREADONLY()-ness flag.

$pvref = \ref($self);
${$pvref} .= ' class is unknown.'; 
die ${$pvref};

Now what? Line 2 fatal errored. But if it's a SVPV holding a HEK, it is silently decowed on line 2 without problems. That's why the new API returns HEKs and doesn't use SV APIs.
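
The difference between a read-only value (a write is a fatal error) and a copy-on-write value (a write silently takes a private copy) can be sketched in standalone C. This is a minimal sketch under assumptions: `shared_buf` plays the role of a refcounted HEK and `str_val` the role of an SVPV pointing at it; both types and all function names are hypothetical and are not Perl's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared, refcounted buffer (plays the role of a HEK). */
typedef struct {
    unsigned refcnt;
    size_t len;
    char data[];              /* flexible array member holding the bytes */
} shared_buf;

/* Hypothetical string value: borrows a shared_buf or owns a private copy. */
typedef struct {
    shared_buf *shared;       /* non-NULL while still sharing */
    char *own;                /* private heap buffer after the first write */
    size_t len;
} str_val;

shared_buf *shared_new(const char *s) {
    size_t len = strlen(s);
    shared_buf *b = malloc(sizeof *b + len + 1);
    assert(b != NULL);
    b->refcnt = 0;
    b->len = len;
    memcpy(b->data, s, len + 1);
    return b;
}

/* Share: bump the refcount, copy no bytes. */
str_val str_share(shared_buf *b) {
    str_val v = { b, NULL, b->len };
    b->refcnt++;
    return v;
}

const char *str_ptr(const str_val *v) {
    return v->shared ? v->shared->data : v->own;
}

/* Append: un-share ("de-COW") into a private buffer, then modify that copy. */
void str_append(str_val *v, const char *suffix) {
    size_t add = strlen(suffix);
    char *buf = malloc(v->len + add + 1);
    assert(buf != NULL);
    memcpy(buf, str_ptr(v), v->len);
    memcpy(buf + v->len, suffix, add + 1);
    if (v->shared) {
        v->shared->refcnt--;  /* drop our reference; the original stays intact */
        v->shared = NULL;
    } else {
        free(v->own);
    }
    v->own = buf;
    v->len += add;
}
```

In this model the append on "line 2" of the Perl snippet succeeds and the shared "HASH" bytes are never modified, which is the behaviour described above for an SVPV holding a HEK.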

The SpiderMonkey JS engine's src code's initial commit is 1 year or max 2 years, after Perl 5's initial commit. So SM JS engine and Perl 5 engine are the same exact age. Since Netscape's/Mozilla's/Firefox's JS engine is very well used, tried, true, and tested for decades, borrowing design choices from it, can not be a bad idea.

Perl's PL_sv_consts is way too short. This PR improves the situation. This PR is a small step in a hot code path towards the goal of solving this meta bug I made #22872

Rest of this is FF JS VM vs P5 VM management of CC/link time constants and how they appear on a C runtime level and at a ECMAScript/PP level.

Spidermonkey calls them "Atoms", Perl calls them "HEK *"s or "U32 hash"s. Spidermonkey uses words like "Pinned" and "JSExternalString", to mean Perl's SVppv_STATIC or RO or RW C data globals.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String.h

Here is a list of what Spidermonkey says are critical "" string/token/identifier literals that are required to run the JS engine.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/CommonPropertyNames.h

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/Keywords.h

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/jsatom.cpp#L56

Spidermonkey Immortals

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/Id.cpp
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Id.h#L149
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Value.h#L1985

Spidermonkey has C global RW HEK*s structs baked into the engine (libperl.so or libspidermonkey.so) at CC time.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/jsatom.cpp#L268

Notice Spidermonkey has 1 byte long (latin 1) Immortal SV*s for 0-9 A-Z and a-z, and IDK if I'm reading the code right, but they also have an array of (26+26+10) x (26+26+10) = 3844 immortal SV*s covering all 2 byte permutations of (0-9 A-Z and a-z) x (0-9 A-Z and a-z). This would allow C-like speed in SM JS or Perl char by char string processing with PP substr(), or C-like speed and C-like memory usage, for splitting a SVPV* into an AV* of 1 byte long SVPV*s.

Currently in Perl, splitting a SVPV into an AV* has a 8+24+16+16=64 bytes per 1 original char expansion ratio overhead.
If some permutation, or all permutations, of lower 7 bit or high ASCII Latin-1, printable and/or unprintable, \w, \d, ., \s, were SV* IMMs, the expansion ratio of splitting a SVPV into a AV* would just be 8 bytes per original 1 byte.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/vm/String.cpp#L710

Detailed math for the 64 bytes above: 8 (SV* slot in the AV*) + 24 (SV head) + 16 (XPV body) + 16 (min buf alloc rule of newSVpvn) = 64 bytes. Counting the 8 byte OS malloc header as well, it is 72 bytes.
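
The arithmetic can be written out as a small standalone C sketch. The byte sizes are the ones quoted in the comment for a 64-bit build and are taken as assumptions here, not measured; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-character costs quoted above (64-bit build); assumptions, not measurements. */
enum {
    AV_SLOT    = 8,   /* SV* stored in the AV* */
    SV_HEAD    = 24,  /* SV head */
    XPV_BODY   = 16,  /* XPV body */
    MALLOC_HDR = 8,   /* OS malloc header */
    MIN_PV_BUF = 16   /* minimum buffer allocated for the string */
};

#define PER_CHAR_FRESH (AV_SLOT + SV_HEAD + XPV_BODY + MALLOC_HDR + MIN_PV_BUF)

/* Bytes used when every character gets its own freshly allocated SV. */
size_t split_cost_fresh(size_t nchars) {
    assert(nchars <= SIZE_MAX / PER_CHAR_FRESH);  /* guard against overflow */
    return nchars * PER_CHAR_FRESH;
}

/* Bytes used when every character is a pointer to a shared immortal SV. */
size_t split_cost_immortal(size_t nchars) {
    assert(nchars <= SIZE_MAX / AV_SLOT);
    return nchars * AV_SLOT;
}
```

Under these assumptions a 1000 character string costs 72000 bytes split the current way and 8000 bytes with immortal single-character SV*s.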

offtopic: stolen buzzword/tech word from Perl VM lol https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/SelfHosting.cpp#L247
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Value.h#L313

Note how SM burns in/attaches/binds XSUBs or does its newXS(); calls. They are const RO arrays of structs, not a long list of serial fn calls in machine code with 2-5 args the way BOOT:{} and EU::PXS do it.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/SelfHosting.cpp#L793
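
The "const RO array of structs" registration style described above can be sketched in standalone C. This is a generic illustration, not SpiderMonkey's or Perl's actual structures: `native_spec`, `native_table` and `find_native()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*native_fn)(int);

/* Hypothetical descriptor: one row per native function. */
typedef struct {
    const char *name;
    native_fn fn;
    unsigned nargs;
} native_spec;

static int fn_double(int x) { return 2 * x; }
static int fn_negate(int x) { return -x; }

/* The whole table is const data laid out at compile/link time; no
 * registration calls run at start-up to build it. */
static const native_spec native_table[] = {
    { "double", fn_double, 1 },
    { "negate", fn_negate, 1 },
};

/* Look a native up by name; returns NULL when it is not in the table. */
const native_spec *find_native(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof native_table / sizeof native_table[0]; i++)
        if (strcmp(native_table[i].name, name) == 0)
            return &native_table[i];
    return NULL;
}
```

The design trade-off is that the table costs no start-up machine code and can live in read-only memory, while a list of serial registration calls has to execute once per function at boot.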

SM's analog of CV* heads/ CV* body structs are stored in const RO C global memory mmaped/disk backed memory, unlike Perl which uses no-malloc-header-bloat arena pool slots from malloc() memory.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/jsapi.h#L2428

Here is your (davem's) short string experiment perl branch , as production code in SM

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String-inl.h#L46

I think machine integers 0-99 can be converted to base 10 RCed ASCII string objects in O(1) time; it's just a single pointer dereference to turn ints 0-99 or 0-256 into JS VM ASCII string objects

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String.h#L1043
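
The O(1) small-integer idea can be sketched as a pre-filled lookup table in standalone C. This is a hypothetical illustration of the technique, not SpiderMonkey's implementation; the limit of 100 and all names are assumptions.

```c
#include <assert.h>
#include <stdio.h>

#define SMALL_INT_LIMIT 100

/* "0".."99" need at most two digits plus the terminating NUL. */
static char small_int_strs[SMALL_INT_LIMIT][3];
static int small_ints_ready = 0;

/* Fill the table once at start-up. */
void small_ints_init(void) {
    for (int i = 0; i < SMALL_INT_LIMIT; i++)
        snprintf(small_int_strs[i], sizeof small_int_strs[i], "%d", i);
    small_ints_ready = 1;
}

/* O(1) conversion: an array index instead of a formatting call.
 * Returns NULL for values outside the table so the caller can fall back
 * to the general, slower conversion. */
const char *small_int_str(int n) {
    assert(small_ints_ready);
    if (n < 0 || n >= SMALL_INT_LIMIT)
        return NULL;
    return small_int_strs[n];
}
```

Every call for the same small integer returns the same address, so the result can also be shared rather than copied.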

This is for another ticket, but SM decided on > 1/4th unused space, or 75% mark, to do a realloc() to shrink operation. Perl's current logic is much much more complicated for deciding when to COW, deCOW, and do shrinking realloc().

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/StringBuffer.cpp#L30
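
A single-threshold shrink rule of that kind can be sketched in standalone C. The reading "shrink when more than a quarter of the capacity is unused" is taken from the sentence above and has not been checked against the SpiderMonkey source; `should_shrink()` and `maybe_shrink()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Nonzero when more than a quarter of the capacity is unused. */
int should_shrink(size_t capacity, size_t length) {
    assert(length <= capacity);
    return capacity - length > capacity / 4;
}

/* Shrink buf to exactly `length` bytes when the rule fires.
 * Returns the (possibly moved) buffer and updates *capacity. */
char *maybe_shrink(char *buf, size_t *capacity, size_t length) {
    if (length == 0 || !should_shrink(*capacity, length))
        return buf;
    char *smaller = realloc(buf, length);
    if (smaller == NULL)
        return buf;           /* shrinking failed: the old block is still valid */
    *capacity = length;
    return smaller;
}
```

The appeal of a rule like this is that the decision is one subtraction and one comparison, with no interaction with COW state.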

offtopic, the JS stack, internally is the OS's C stack with some tiny Asm tricks, generic RISC and stack grows up HPUX PARISC compliant https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/jsnativestack.cpp

> The commit also seems to have snuck in an unrelated change to pp_const().

It is unrelated, but a tiny meaningless change, not worth a PR on its own, and then 2 lines long commit in the P5P repo.

I can BP that line now to see what is inside the SV*. It makes no machine code difference in -O1/-O2 before and after. But I can now set a BP on the line, and see what is inside the SV* struct. If someone doesn't like the change, it means they don't know what a C debugger is, or how to use one, and can't call themselves a professional C dev if the only C level diag tool they know how to use is printf().

@xenu
Member

xenu commented Jul 1, 2025

Let me summarise the above:

"I am very smart."

@iabyn
Contributor

iabyn commented Jul 1, 2025 via email

@bulk88
Contributor Author

bulk88 commented Jul 2, 2025

> Ok, I will try once again to explain what I mean in really simple terms.
>
> A commit message often has three parts:
>
>   1. the subject line;
>
>   2. an initial paragraph or two describing in general terms the purpose of
>     the commit;
>
>   3. further paragraphs to explain in more detail what the problem was and
>     how it was fixed.
>
> These three are potentially for different audiences.
>
> (1) is so that, for example, people bisecting, preparing perldeltas etc
> can get a quick idea of the general category of this commit. It may be
> read by people who have no particular interest in, or expertise in this
> area.
>
> For example a commit subject message of:
>
> "add sv_refhek, sv_reftypehek functions for better ref() performance"

"better ref() performance" is a side effect of this commit. My original hacking attempt 9 months ago was inspired by how bad EU::PXS's/P5P's official my $self = $_[0]; statement converted to C lang/XS lang is, for all CPAN XS authors

if (sv_isa($arg, "$ntype")) {

if (SvROK($arg) && sv_derived_from($arg, "$ntype")) {

and

S_sv_derived_from_svpvn(pTHX_ SV *sv, SV *namesv, const char * name, const STRLEN len, U32 flags)

Perl_sv_derived_from_sv(pTHX_ SV *sv, SV *namesv, U32 flags)

and I'm annoyed at having to always copy-paste reimplement those templates with my own code in every single CPAN or private XS module. The 9 months ago hacking attempt branch started to modify ParseXS.pm itself, then I realized it was getting too complex, and ParseXS.pm itself is unstable/greenfield-forklift-in-progress, and that PR is unlikely to be merged so I abandoned it.

I recently got annoyed again at sv_reftype() while single stepping, and decided to extract the non-ParseXS.pm related C code from the "ParseXS.pm no more sv_isobject(self)+sv_derived_from() branch"

and even more annoyed at this copy-paste if (!sv_isobject(self) || !sv_derived_from(self, "Digest::SHA")) that is all over CPAN, and some parts of p5p/.git

#ifndef INT2PTR
#define INT2PTR(p, i) (p) (i)
#endif

#define MAX_WRITE_SIZE 16384
#define IO_BUFFER_SIZE 4096

static SHA *getSHA(pTHX_ SV *self)
{
	if (!sv_isobject(self) || !sv_derived_from(self, "Digest::SHA"))
		return(NULL);
	return INT2PTR(SHA *, SvIV(SvRV(self)));
}

MODULE = Digest::SHA		PACKAGE = Digest::SHA

Since sv_refhek(), sv_reftypehek() need secret private-api actual usage by perl5xx.dll first, before any talk of making EU::PXS aware of them, or making CPAN authors aware of them, pp_ref()/pp_builtin_ref() was the perfect place to use the new fn calls. And thats how this PR's 1st revision was created.

> shows, at a casual glance, the two main bullet points of the commit: that
> it's adding new functions, and that it's to address a particular
> performance issue.
>
> The proposed commit message of
>
> "pp_ref() builtin_pp_reftype(): strlen()+Newx()+memcpy()->100% pre-made COWs"
>
> says nothing about what the commit actually does, and is entirely cryptic
> as to what issue it is addressing.

I will assume anyone with a commit bit has read https://perldoc.perl.org/perlguts and https://perldoc.perl.org/perlxs at least once in their life, even if they have never clicked "Save" on a .c/.xs file. C programming requires a Firearms License. I'll assume all the important people at P5P have one.

HTML5/AJAX/DHTML/DOM are child safe programming languages. C is not. Even if you have never been at a gun range/actual training, every dev of any high level lang should know to ignore it (C), or stay away from C code when they see C code, or start writing complaints in forums and bug trackers. Any adult knows not to pick up a gun, ask "hey is this thing real?" and pull the trigger. Guns don't come with yellow and orange safety labels on them. .c commits don't come with yellow and orange safety labels on them. People just know not to touch them when they see them, unless they know what they are doing.

Just like I can't read a single optree or regexp engine related commit, PP devs can't read a single .c/.xs related commit. Thats just life.

I can read Python, but I've never clicked the "Save" button in my life. I know not to try it. Beyond my skills. I don't expect the 50th percentile of PP-only devs to understand my commit message. And PP-only devs aren't capable of break-pointing C code or fixing it, or maintaining it. They aren't the audience. If they read https://perldoc.perl.org/perlguts and https://perldoc.perl.org/perlxs top to bottom, at least once in their life, they will understand enough of the commit message, even if they can't judge, debug, criticise, or modify the code.

My original title was very clear explaining what the commit does:

pp_ref() = P5P word
builtin_pp_reftype() = P5P word
strlen() = CS101 C lang class @ any community college
Newx() = P5P word
memcpy() = CS101 C lang class @ any community college
100% pre-made = my words (the benefit)
COWs = basic Unix/Linux word, fork is not an eating utensil

I will revise the commit title to "add sv_refhek, sv_reftypehek functions for better ref() performance" since I don't have any reason not to, if other people with other eyeballs think that sounds better, then it is. I know I'm writing for someone else to read this in the future after I'm dead/bussed. I'm not writing it for me to read later, but someone like me in the future to know what I did and why, and what lead to the choices I made long ago, of all possible choices I had available to me at that time in history.

> In the second part, you can mention briefly that the existing sv_ref() and
> sv_reftype() functions have a poorly designed interface and why this
> is inefficient, and how the new functions address this issue.
>
> Then in the remainder of the commit message you can (but only if
> necessary) address at great length details which aren't immediately
> apparent from the commit diff, such as benchmark results, disassembly
> listings, etc.

Where is the split or balance between .git repo message bloat, and accurate tech info to know what the original author was thinking at the time, years ago in history?

The todo list can definitely be stripped from my original commit message, and just left in this PR, with a short sentence "Go read the PR associated with this commit, for info about things not fixed in this commit, and todo ideas".

> As regards the pp_const() change: even though it's a small change, it
> should be done as a separate commit, with its own commit message. That
> message will explain why that change is useful. And will stop people who
> are examining the main commit trying to understand why pp_const() needs
> changing to address the ref() performance issue.

Agree.

@richardleach
Contributor

Is the use of HEKs like this the ideal approach?

For example, would we get better mileage in the long run out of something like an additional (I hear the groans) COW implementation for internal core consts (like the ref constants) that can also hold the string values of POK CONST OPs? (i.e. If some Perl code uses the same constant more than once, possibly more than 255 times across a large app, there's only one actual string buffer.)

Or perhaps more focussed on reftypes, should they be implemented as SVf_IsCOW|SVppv_STATIC SVs?

If not, what are the downsides or blockers to those alternatives?

@bulk88
Contributor Author

bulk88 commented Jul 2, 2025

> Let me summarise the above:
>
> "I am very smart."

Some people say that about me, but I don't think I am. PCs only do what their owners tell them to do. And they are damn good at it. My degree says EE, not CS. Material properties, entropy, solar noise, stuff overheating, stuff rusting, stuff UV rotting, stuff cracking/abrasion/flex damage, catching fire, estimating service life, human factors (end users, and even all trained professionals make mistakes, how many redundancies are in the design?).

Intel/AMD/ARM chips are created by humans. The veins in a tree leaf or flower petal, or pattern of fur on your cat, are not created by humans. Because humans created Intel/AMD/ARM chips, they only do what people tell them to do. If it wasn't you, some other man or woman sat down, and sketched out the blueprints/code/CAD/FEA, to make your melted nugget of 1/4 part beach sand and 3/4 parts copper https://download.intel.com/newsroom/kits/chipmaking/pdfs/Sand-to-Silicon_32nm-Version.pdf into an Intel Xeon CPU. You're not any better or worse than the other person. You can learn enough to be the other person if you want. Because they did.

Some books I own: click to expand, way too off topic. https://www.amazon.com/Manga-Guide-Electricity-Kazuhiro-Fujitaki/dp/1593271972 I think I met the book's author in person at a convention/conference and bought it when it came out in 2010. Good read.

still on my shelf at home from when I was a kid
attachment (2)

@iabyn
Contributor

iabyn commented Jul 2, 2025 via email

@bulk88
Contributor Author

bulk88 commented Jul 2, 2025

> Is the use of HEKs like this the ideal approach?

They are the ONLY solution. The entirety of Perl 5 lang's package keyword, % token, and :: token, runs off HEK* structs. You can't be seriously proposing to remove all of the following: bless keyword, Perl 5.000 alpha OOP class/package namespace system, Perl's associative array data type called the HV*/HE*/%/$ref->{key} from the P5 VM.

HEK*s are appropriate for any \w+ style string that will eventually become any one of these

  • a CV*
  • a GV*
  • a hash ->{key}
  • a PP sub {}
  • a ->method();
  • a package var/our var

HEK*s are NOT appropriate for certain end user data, at least not from a P5P dev/team view, or not from a CPAN XS author viewpoint. Examples of HEK*s being NOT appropriate:

  • SVIVs like current time, customer balances, counters
  • SVNV floating point numbers with a . dot
  • SVPVs with raw unprintable binary and full of 0-32 control code bytes
  • SVPVs that will never be identifiers in any XML/JSON/YAML/PROTOBUF/Storable/Cap'n Proto/HV*/PP src code/any QWERTY keyboard typeable programming language
  • customer street addresses (highly specialized CPAN XS or private biz logic XS modules excluded)
  • product names
  • telephone numbers (highly specialized CPAN XS or private biz logic XS modules excluded)

Basically any =~ /^\w+$/; written in any source code in any programming language is a good candidate for being stuffed into a HEK*. Things coming from a SQL DB, TCPIP stream, or CSV file, basically never. Column names excluded, but those are Schema/IDL/src code identifier constants, they aren't customer data.

If I find any char * ptr buffer entering hv_common() as arg char * key or arg SV * keysv, you lost the game. It's going to become a RCed HEK* object inside PL_strtab whether you like it or not. After it becomes a RCed HEK* object inside PL_strtab, everyone should take advantage of that COWable, U32 hash precomputed, no-memcmp() function calls ever, HEK* object to the maximum possible. Perl's HEK* object is an identical clone of ECMAScript's Immutable String() class/object type, but with the benefit of a U32 hash struct member concatenated on forever.
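
The properties listed above (one shared refcounted copy per distinct string, hash computed once, equality by address) are those of a string intern pool, which can be sketched in standalone C. This is a toy model, not PL_strtab: the `interned` struct, the fixed bucket count and the FNV-1a hash are assumptions chosen for brevity, and Perl uses its own hash functions and table code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_BUCKETS 64

/* Hypothetical interned string: hash and length are stored with the bytes. */
typedef struct interned {
    struct interned *next;    /* bucket chain */
    uint32_t hash;            /* computed once, at intern time */
    unsigned refcnt;
    size_t len;
    char key[];               /* NUL-terminated copy of the bytes */
} interned;

static interned *pool[POOL_BUCKETS];

/* FNV-1a, used here only as a simple example hash. */
static uint32_t hash_bytes(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique pooled entry for these bytes, creating it if needed. */
interned *intern(const char *s, size_t len) {
    uint32_t h = hash_bytes(s, len);
    interned **slot = &pool[h % POOL_BUCKETS];
    for (interned *e = *slot; e != NULL; e = e->next) {
        if (e->hash == h && e->len == len && memcmp(e->key, s, len) == 0) {
            e->refcnt++;      /* share the existing entry, copy nothing */
            return e;
        }
    }
    interned *e = malloc(sizeof *e + len + 1);
    assert(e != NULL);
    e->next = *slot;
    e->hash = h;
    e->refcnt = 1;
    e->len = len;
    memcpy(e->key, s, len);
    e->key[len] = '\0';
    *slot = e;
    return e;
}

/* Once both sides are interned, equality is a single pointer comparison. */
int interned_eq(const interned *a, const interned *b) {
    return a == b;
}
```

The memcmp() happens only once, when a string enters the pool; every later comparison between pooled strings is an address test.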

> For example, would we get better mileage in the long run out of something like an additional (I hear the groans) COW implementation for internal core consts (like the ref constants) that can also hold the string values of POK CONST OPs? (i.e. If some Perl code uses the same constant more than once, possibly more than 255 times across a large app, there's only one actual string buffer.)

You're walking on thin ice. The technical debate, if COW 255 obj's RC reaches == 255, then "forever-pin" the Newx backed buffer, for the rest of the perl PID's lifetime, has gotten formerly active P5P devs banned by P5P moderators previously in history, see https://perldoc.perl.org/perlpolicy#STANDARDS-OF-CONDUCT

I have no engineering comment on that debate if to forever process pin to "faux-C static storage" any Newx() backed COW 255 object that naturally reaches RC == 255 in the runloop.

If I read your sentence of COW implementation for internal core consts (like the ref constants) that can also hold the string values of POK CONST OPs? the other way, meaning you want to store up to 4/8/16/32 byte long char arrays[] members in OP structs, I would hand-clap for that. 8 bytes is a lot of room. Why on earth should Perl_pp_seq() and Perl_pp_eq() be derefing my_perl->Istack_sp->sv_any->xiv_u.xivu_iv to learn a PP src code const literal integer that can not be legally rewritten in the run loop?

How about *(IV*)(&my_perl->Iop->op_last_field_of_struct_UNOP_called_op_first)? 2 mem reads, not 3 fields. 100% guarantee swear on a religious book the entire contents struct UNOP @ my_perl->Iop is already sitting in L1 cache/GPR registers, before the 1st CPU instruction of Perl_pp_eq() executes :-D

Here is a dump of -O1 PP keyword == from my perl42.dll.

op * Perl_pp_eq(interpreter *my_perl)
{
  signed int cmp_rel_bool; // ebx
  sv **sp_svp; // rdx
  sv *right; // r8
  sv *left; // r9
  int left_and_right_svflags; // eax
  bool is_eq; // zf
  int ret_cmp_rel; // eax

  cmp_rel_bool = 0;
  if ( !(((*my_perl->Istack_sp)->sv_flags | *(_DWORD *)(*((_QWORD *)my_perl->Istack_sp - 1) + 12i64)) & 0x200800)
    || !Perl_try_amagic_bin(my_perl, 21, 16) ) {
    sp_svp = my_perl->Istack_sp;
    right = *my_perl->Istack_sp;
    left = (sv *)*((_QWORD *)my_perl->Istack_sp - 1);
    left_and_right_svflags = left->sv_flags & right->sv_flags;
    if ( _bittest(&left_and_right_svflags, 8u) && ((left->sv_flags | right->sv_flags) & 0x80000000) == 0 ) {
      is_eq = left->sv_any->xiv_u.xivu_iv == right->sv_any->xiv_u.xivu_iv;
    }
    else {
      if ( _bittest(&left_and_right_svflags, 9u) ) {
        if ( left->sv_any->xnv_u.xnv_nv == right->sv_any->xnv_u.xnv_nv )
          cmp_rel_bool = 1;
        goto LABEL_11;
      }
      ret_cmp_rel = Perl_do_ncmp(my_perl, *((sv *const *)my_perl->Istack_sp - 1), right);
      sp_svp = my_perl->Istack_sp;
      is_eq = ret_cmp_rel == 0;
    }
    LOBYTE(cmp_rel_bool) = is_eq;
LABEL_11:
    my_perl->Istack_sp = sp_svp - 1;
    *(sp_svp - 1) = (sv *)((char *)&my_perl->Isv_no + (-(signed __int64)(cmp_rel_bool != 0) & 0xFFFFFFFFFFFFFFD0ui64));
  }
  return my_perl->Iop->op_next;
}
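
The last store in the listing picks the boolean result without a branch: it starts from the address of Isv_no and adds either 0 or 0xFFFFFFFFFFFFFFD0, which is -48 as a signed 64-bit value. A standalone C sketch of that trick follows. It assumes a 24 byte SV head and that the "yes" immortal sits two slots before "no" (inferred from the -48 offset); `fake_sv`, `immortals` and `bool_to_sv()` are hypothetical names, not Perl's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 24-byte stand-in for an SV head. */
typedef struct { char bytes[24]; } fake_sv;

/* Assumed layout: yes, undef, no, zero stored back to back. */
enum { IDX_YES = 0, IDX_UNDEF = 1, IDX_NO = 2, IDX_ZERO = 3 };
static fake_sv immortals[4];

/* Branchless select: start at "no" and step back to "yes" when true. */
fake_sv *bool_to_sv(int is_true) {
    /* 0 when false, all bits set when true */
    intptr_t mask = -(intptr_t)(is_true != 0);
    /* byte distance from "no" to "yes": 24 * (0 - 2) = -48 */
    intptr_t delta = (intptr_t)sizeof(fake_sv) * (IDX_YES - IDX_NO);
    return (fake_sv *)((char *)&immortals[IDX_NO] + (mask & delta));
}
```

`mask & delta` is -48 for a true result and 0 for a false one, so the compiler needs no conditional jump to choose between the two immortal addresses.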

> Or perhaps more focussed on reftypes, should they be implemented as SVf_IsCOW|SVppv_STATIC SVs?

Too dangerous b/c too little usage. 80% of all PP code wants to know $self aka $_[0] is __PACKAGE__ aka My::Module::My::Class, and that the caller isn't playing nonsense code games like my $string = IO::Handle::getline(rand(1337)); or $path = File::Spec::catdir(rand(1337), rand(6969));.

Its hard to find CPAN modules with PUBLIC APIs, that say pass a ref to $SCALAR at $_[3] and pass a ref to {HASH} at $_[3] and pass a ref to [ARRAY] at $_[3], are 3 legit, totally different features.

Also yy_lex()/bison/ck_op_*()/keywords.c currently don't, and it would be hard but not impossible to teach them to, 100% absolutely guarantee de-duping U8 bytes from STDIO FD 3 4 or 5 to exactly 1 SVf_IsCOW|SVppv_STATIC memory address, PL_no/PL_yes style. KHW a month ago had severe problems getting ISO C function pointer == equality to work across TUs on the Z/OS platform. Richard, you are now asking for ISO C "lits" to have == equality across TUs. Works on 99% of Perl platforms, until you find that 1% that are annoying and weird.

ISO C drama

https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170#microsoft-specific-1

hardware read only ISO C string "literals" ARE NOT A DEFAULT OPTIMIZATION!!!!! (obv WinPerl -O0 -Od -DDEBUGGING still turns on this optimization)

https://learn.microsoft.com/en-us/cpp/build/reference/gf-eliminate-duplicate-strings?view=msvc-170

> If not, what are the downsides or blockers to those alternatives?
> Or perhaps more focussed on reftypes, should they be implemented as SVf_IsCOW|SVppv_STATIC SVs?

Also, while I like SVf_IsCOW|SVppv_STATIC, I've gotten 12 or 15 secs using Newx() bufs, to 7 secs using HEK*s down to 4.5 secs using SVf_IsCOW|SVppv_STATIC bufs in an unreleased refresh of CPAN XSConfig.pm.xs. I wouldn't put a wedding ring on SVf_IsCOW|SVppv_STATIC because it can be too viral sometimes circulating in the runloop. SVf_IsCOW|SVppv_STATIC legally requires releasing the Newx() block back to OS malloc(), no buts, no ifs, LEGALLY REQUIRED!!!

What if the next C function is sv_catpvs(some_targ_sv_from_a_pad, ".");

MUHAHAAH!!!!! >:-<

Propagating a HEK* COW is always optional, its always a performance loss if you don't propagate, but never any dangerous/insta-bug ticket side effects. Not propagating a SVf_IsCOW|SVppv_STATIC COW breaks P5P Public C API as currently written.

I haven't really decompiled or researched how $cvref = *{'My::Mod::some_meth'}{CODE}; runs on a C level, but a 0.9 second guess says PP grammar string {CODE} probably has something to do with a HV*/HEK* on a C level. I don't see any performance benefits to using "ARRAY" or "CODE" in a SVf_IsCOW|SVppv_STATIC vs a HEK*. Remember a HEK*'s SvPVX() usually can be == address equality checked, maybe always == equality checked. Note I've seen a HEKf_NOT_IN_PL_STR_TAB flag in the VM somewhere.

I don't think SVf_IsCOW|SVppv_STATIC have any promises that there aren't 4 unique addr "SCALAR\0"s all marked with SVf_IsCOW|SVppv_STATIC floating around the runloop. perl510.dll-perl542.dll has multiple unique "AUTOLOAD\0"s and multiple unique "SCALAR\0"s inside it right now. I can generate an automated list if someone cares.

Contributor Author

bulk88 commented Jul 2, 2025

Benchmarks show an improvement of about 6% in the CPU burn loop. An A/B runtime C bool var, set from an env var, was used for before/after on the exact same perl541.dll file, with no recompile.

sub ben {
    my ($m2, $cnt, $i, $aref) = ($m, 0, 0, \@a);
    foreach my $el (@{$aref}) { $cnt++ if ref($el) eq 'SCALAR';}
    $gcnt += $cnt;
}
cmpthese(undef,{b1 => \&ben, b2 => \&ben});

FORCE OLD MODE

    Rate  b2  b1
b2 276/s  -- -0%
b1 277/s  1%  --
29676828

NEW MODE SV_SETHEK

    Rate  b1  b2
b1 293/s  -- -0%
b2 295/s  0%  --
30874164

Raw data/bench code/instrumented C code hidden b/c it's not that important

```
use Benchmark qw( :all :hireswallclock );
use v5.30;
my $gcnt = 0;
my $m;
#$m = (time() >> 12)+0;
$m = 42761;
#say $m;exit;
my @a;
$a[$m+1] = 1;
{
    my ($i, $scal, $rcode, $mod) = (0, 1, \&Internals::V);
    my $rscal = \$scal;
    for(;$i<$m;$i++){
        $mod = $i % 3;
        $a[$i] = $mod == 0 ? $scal : $mod == 1 ? $rscal : $rcode;
    }
}
sub ben {
    my ($m2, $cnt, $i, $aref) = ($m, 0, 0, \@a);
    #system 'pause';
    #for(;$i < $m2 ; $i++) { $cnt++ if ref($aref->[$i]) eq 'SCALAR';}
    foreach my $el (@{$aref}) { $cnt++ if ref($el) eq 'SCALAR';}
    $gcnt += $cnt;
}
cmpthese(undef,{b1 => \&ben, b2 => \&ben});
print $gcnt ."\n";
```

```
C:\sources\perl5\win32>cd .. && timeit perl.exe -Ilib win32\benchref.pl & cd win32
    Rate  b1  b2
b1 293/s  -- -0%
b2 295/s  0%  --
30874164

Exit code      : 0
Elapsed time   : 7.95
Kernel time    : 0.05 (0.6%)
User time      : 7.85 (98.7%)
page fault #   : 2504
Working set    : 9864 KB
Paged pool     : 94 KB
Non-paged pool : 8 KB
Page file size : 5260 KB

C:\sources\perl5\win32>set PERL_RR=1   ##### FORCE OLD MODE

C:\sources\perl5\win32>cd .. && timeit perl.exe -Ilib win32\benchref.pl & cd win32
    Rate  b2  b1
b2 276/s  -- -0%
b1 277/s  1%  --
29676828

Exit code      : 0
Elapsed time   : 8.06
Kernel time    : 0.05 (0.6%)
User time      : 8.02 (99.5%)
page fault #   : 2500
Working set    : 9848 KB
Paged pool     : 94 KB
Non-paged pool : 8 KB
Page file size : 5244 KB

C:\sources\perl5\win32>
```

temp A/B runtime branch selector new vs old

```
PP(pp_ref)
{
    SV * const sv = *PL_stack_sp;

    SvGETMAGIC(sv);
    if (!SvROK(sv)) {
        rpp_replace_1_IMM_NN(&PL_sv_no);
        return NORMAL;
    }

    /* op is in boolean context? */
    if (   (PL_op->op_private & OPpTRUEBOOL)
        || (   (PL_op->op_private & OPpMAYBE_TRUEBOOL)
            && block_gimme() == G_VOID))
    {
        /* refs are always true - unless it's to an object blessed into a
         * class with a false name, i.e. "0". So we have to check for
         * that remote possibility. The following is is basically an
         * unrolled SvTRUE(sv_reftype(rv)) */
        SV * const rv = SvRV(sv);
        if (SvOBJECT(rv)) {
            HV *stash = SvSTASH(rv);
            HEK *hek = HvNAME_HEK(stash);
            if (hek) {
                I32 len = HEK_LEN(hek);
                /* bail out and do it the hard way? */
                if (UNLIKELY(
                       len == HEf_SVKEY
                    || (len == 1 && HEK_KEY(hek)[0] == '0')
                ))
                    goto do_sv_ref;
            }
        }
        rpp_replace_1_IMM_NN(&PL_sv_yes);
        return NORMAL;
    }

  do_sv_ref:
    {
        dTARGET;
        if(g_old) {
            sv_ref(TARG, SvRV(sv), TRUE);
        } else {
            HEK* hek = sv_refhek(SvRV(sv), TRUE);
            // dTARGET;
            sv_sethek(TARG, hek);
        }
        rpp_replace_1_1_NN(TARG);
        SvSETMAGIC(TARG);
        return NORMAL;
    }
}

U32 g_old = 0;

void
perl_construct(pTHXx)
{

    PERL_ARGS_ASSERT_PERL_CONSTRUCT;

#ifdef MULTIPLICITY
    init_interp();
    PL_perl_destruct_level = 1;
#else
    PERL_UNUSED_ARG(my_perl);
    if (PL_perl_destruct_level > 0)
        init_interp();
#endif
    PL_curcop = &PL_compiling; /* needed by ckWARN, right away */
    extern U32 g_old;
#ifdef PERL_TRACE_OPS
    Zero(PL_op_exec_cnt, OP_max+2, UV);
#endif

    {
        char * pv = PerlEnv_getenv("PERL_RR");
        if(pv && pv[0] && pv[0] != '0')
            g_old = 1;
        else
            g_old = 0;
    }
```

</details>

bulk88 added 2 commits July 2, 2025 20:12
-ref() PP keyword has extremely high usage. Grepping my blead repo shows:
 Searched "ref(" 4347 hits in 605 files of 5879 searched

-High level PP keyword ref(), aka C function Perl_pp_ref(), uses the slow,
 inefficient, badly designed backend public XS/C API functions
 Perl_sv_ref()/Perl_sv_reftype().

-This commit fixes all design problems with Perl_sv_ref()/Perl_sv_reftype(),
 and will speed up the very high usage PP keyword ref(), along with a
 very similar but very new and very little used PP keyword called
 "use builtin qw( reftype );" which is near identical to Perl_pp_ref().

-a crude benchmark, with the array ref in $aref holding
43000 SV*s, split 1/3rd SV* IOK, 1/3rd RV* to SV* IOK,
and 1/3rd RV* to CV*, showed a 6% speed increase for this code

sub benchme {
    foreach my $el (@{$aref}) { $cnt++ if ref($el) eq 'SCALAR';}
}

-The all-UPPERCASE strings that keyword ref() returns are part of the
 Perl 5 BNF grammar. Changing their spelling or lowercasing them is not
 up for debate. Nor is i18n-ing them dynamically in realtime against
 glibc.so's current "OS global locale", with inotify()/kqueue() in the
 runloop monitoring a text file in /etc or /var so that this race
 condition works as designed in a unit test. That will never happen:
     $perl -E "dire('hello')"
     Routine indéfinie &cœur::dire aufgerufen bei -e Zeile 1

-sv_reftype() and sv_ref() have very badly designed prototypes, and the
 first time a new Perl in C dev reads their source code, they will think
 these 2 will cause infinite C stack recursion and a SEGV. Most automated
 C code analysis tools will probably complain that these 2 functions do
 infinite recursion.

-The 2 functions don't return a string length, forcing all callers to
 execute a libc strlen() call on a string, that could be 8 bytes, or 80 MB.

-All null term-ed strings that they return are already sitting in virtual
 address space. Either const HW RO, or RCed HEK*s from the PL_strtab pool,
 that were found inside something similar to a
 GV*/HV*/HE*/CV*/AV*/GP*/OP*/SV* in a OP*(no threads).
-COW 255 buffers from Newx() under 9 chars can't COW currently by policy.
 CODE is 4, SCALAR is 6. HASH is 4. ARRAY is 5. But very short SV HEK* COWs
 will COW propagate without problems. ref() is also used to retrieve
 "Local::My::Class" strings, which have an extremely high chance to wind
 up getting passed to hv_common() through some high level PP keyword like
 bless or @isa, and hv_common() extracts precalculated U32 hash values
 from SV* with HEK* buffers, speeding up hv_common(). So SV* POKs with
 COW 255 and COW SVs_STATIC buffers are bad choices compared to using SV*
 POK HEK* buffers for a new faster version of sv_reftype()/sv_ref().

-PP code "if(ref($self) eq 'HASH') {}" should never involve all 3-5 calls
 Newx()/Realloc()/strlen()/memcpy()/Safefree(), on each execution of the
 line.

 To improve the dev-friendliness of the prototypes of
 Perl_sv_ref()/Perl_sv_reftype(), the speed inside them, and the speed of
 all their libperl callers, make HEK* variants of them. Initially
 the HEK* variants are private to libperl. Maybe 1-3 years into the
 future, they can be made official public C API for CPAN XS authors.
 These 2 new functions are undocumented/private API until further notice.

 Using SV* holding RC-ed HEK* SvPVX() buffers removes all these libc
 C lang logical and/or Asm machine code steps from during execution of
 PP keyword ref(). The pre-allocated PAD TARG SV* just keeps getting
 a RC-- on the old HEK* inside SvPVX(), and a RC++ on the new HEK* written
 to SvPVX() of the PAD TARG SV*. Touching only 6 void*/size_t addresses
 total, each one a single read/write CPU instruction pair.
 SvPVX, SvCUR, SvLEN, old_hek.shared_he.shared_he_he.he_valu.hent_refcount,
 new_hek.shared_he.shared_he_he.he_valu.hent_refcount,
 new_hek.shared_he.shared_he_hek.hek_len. This brings PP KW ref() closer
 to C++ style RTTI that just compares const read-only vtable pointers.

 Some design and optimization problems with the old and new
 pp_ref()/pp_reftype()/sv_ref()/sv_reftype()/sv_refhek()/sv_reftypehek()
 calls are intentionally not being fixed in this commit to keep this
 commit small. Check the associated PR of the commit for details.
…ations

-faster method lookups, faster new SVPV creation (COWs), some of these
 locations were missed by the original branch/PRs/commits that added
 SV_CONST() macro/api.

-I believe all "" C string literals that match a SV_CONST_UPPERCASE SV* HEK*
 cached constant have been replaced with their SV* POK HEK* COW buffer
 equivalents inside libperl with this commit, excluding some instances of
 "__ANON__" strings. Only PERL_CORE files qualify for the SV_CONST()
 optimization, because of design choices made previously about the
 SV_CONST() API. Changing the PERL_CORE-only design choice is out of
 scope of this patch.

-in pp_dbmopen() add SV_CONST(TIEHASH) macros for faster lookup/U32 hash
 pre-calc, and change newSVpvs_flags("AnyDBM_File", SVs_TEMP) to
 newSVpvs_share("AnyDBM_File"), because this sv is used multiple times
 in this pp_*() function, and it is a package name, and it is guaranteed
 to get passed into hv_common() somewhere eventually in some child
 function call we are making.

-some "__ANON__" locations were not changed from sv_*newSV*pvs("__ANON__");
 to sv_*newSV*hek(SV_CONST(__ANON__)); because right after, there is a
 sv_catpvs(""); that will make the SVPV HEK* COW instantly de-COW which
 saved no CPU or memory resources in the end, and only wasted them. Or it
 didn't look "safe" for a SV* COW buffer to be on that line.

-pp_tie() call_method() is a thin inefficient wrapper that makes a
 mortal SVPV around a C string, since the real backend API is call_sv(),
 so switch the call_method() in pp_tie() to the real backend function
 call_sv() and avoid making that mortal SVPV
@bulk88 bulk88 force-pushed the high_speed_ref()_eq_HASH branch from 36e7ab4 to 2ad59ed Compare July 3, 2025 00:19
Contributor Author

bulk88 commented Jul 3, 2025

repushed: less detailed commit title, shorter commit message body, pp_const() optimization removed, short API docs added as C comments, and the switch(){} tree directly using rsv = newSVpvn_share(pv, len, 0); PL_sv_consts[idx] = rsv; now has a comment documenting why it's doing that.

2nd commit expands usage of the new "__ANON__" SVPV HEK ptr, and corrects all missed opportunities for SV_CONST() optimization missed by the initial commit/commits of SV_CONST() API years ago.

4 participants