ParseXS: build an AST for each XSUB #23225

iabyn · 2025-04-24T19:37:03Z

This PR is part of my ParseXS refactoring work. I don't intend for it to be merged until 5.43.1.

This series of about 150 commits changes ExtUtils::ParseXS so that, instead of intermixing parsing and code-emitting for each XSUB, it now parses each XSUB into an Abstract Syntax Tree, and then walks the tree to emit all the C code for that XSUB.

This makes the code generally more understandable and maintainable.

For now it still just discards each tree after the XSUB is parsed; in future work, the AST will be extended so that it holds the whole file (including all the XSUBs) rather than just the current XSUB.

This branch contains six types of commit.

For terminal AST nodes for keywords such as FOO, the old

ExtUtils::ParseXS::handle_FOO()

method is removed and a new ExtUtils::ParseXS::Node::FOO class is added with parse() and as_code() methods which copy over the parsing and code-emitting parts of the handle_FOO() method. For a few keywords like INPUT which have values per line, both a Node::FOO and Node::FOO_line class are created, with several FOO_line nodes being children of the FOO node.

Note that doing the modifications for a single keyword often consists in fact of several commits in sequence.

For higher-level nodes, a Node::foo class is created with parse() and as_code() methods as before, but the contents of these methods are typically populated by moving the relevant bits of code over from the big ExtUtils::ParseXS::process_file() method.
Most of the state fields of the ExtUtils::ParseXS class (especially all the xsub_foo ones) are removed and similar fields added to the various Node subclasses instead.
Fixups to ensure that all parse-time code is in parse() methods or associated helper functions, and similarly for as_code().
Various bug fixes related to state that should be per-CASE rather than per-XSUB. Some of these bugs were pre-existing, some were introduced during this branch.
General tidying-up, fixing code comments, adding POD etc.

tonycoz · 2025-05-07T04:59:52Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

+    #XXX note that for backwards compatibility, matching DIS/ENABLE is
+    #very lax: it is case insensitive, and ignores any trailing garbage
+    # on the line
+    unless ($s =~ /^(ENABLE|DISABLE)\b/i) {
+        my ($keyword) = ($self =~ /(\w+)=/); # final component of class name
+        $pxs->death("Error: $keyword: ENABLE/DISABLE")
+    }
+    $self->{enable} = uc($1) eq 'ENABLE' ? 1 : 0;


The old code was lax, but as far as I can tell this changes the behaviour when the keyword isn't exactly "ENABLE" or "DISABLE".

The original code from one case:

$self->death("Error: PROTOTYPES: ENABLE/DISABLE") unless $setting =~ /^(ENABLE|DISABLE)\b/i; $self->{PROTOTYPES_value} = 1 if $1 eq 'ENABLE'; $self->{PROTOTYPES_value} = 0 if $1 eq 'DISABLE'; $self->{proto_behaviour_specified} = 1;

In the original code like: PROTOTYPES: enable would pass the validation but then ignore the setting (except for suppressing the warning).

With the change prototypes (versioncheck, etc) will be enabled.

This isn't a bad change, but it is backward incompatible, I can see it changing the behaviour of some code.

https://metacpan.org/release/HAG/BSD-Itimer-0.8/source/Itimer.xs#L80
https://metacpan.org/release/JHI/BSD-Resource-1.2911/source/Resource.xs#L568
https://metacpan.org/release/TMM/HEAT-Crypto-0.07/source/Crypto.xs#L30
https://metacpan.org/release/GRICHTER/ExtUtils-XSBuilder-0.28/source/XSBuilder/WrapXS.pm#L1455

(there's more than this, and some obsolete matches like mod_perl 1.x modules)

iabyn · 2025-05-10T11:20:27Z

On Tue, May 06, 2025 at 10:00:14PM -0700, Tony Cook wrote: The old code was lax, but as far as I can tell this changes the behaviour when the keyword isn't exactly "ENABLE" or "DISABLE".

That's a good point. I've looked into this a bit further. Apart from the PROTOTYPES keyword, all the other 'ENABLE'-type keywords seem to be used 'properly' in CPAN. So I propose that for those keywords, I make it so that anything other than this following the keyword is an error: /\s* (ENABLE|DISABLE) \s* $/x I further propose that we special-case the PROTOTYPES keyword for backwards-compatibility. I suspect that the main reason that people have added this keyword is to silence the "Please specify prototyping behavior" warning. They were then inadvertently relying on the fact that any old unrecognised garbage following this keyword is interpreted as DISABLE, which is the default anyway. Based on a CPAN grep, this is a count of how many distros contain at least one match of each of these patterns in *.xs files: 595 /^\s*PROTOTYPES:\s*ENABLE\s*$/ 980 /^\s*PROTOTYPES:\s*DISABLE\s*$/ 12 /^\s*PROTOTYPES:\s*ENABLED\s*$/ 49 /^\s*PROTOTYPES:\s*DISABLED\s*$/ 11 /^\s*PROTOTYPES:\s*[Ee]nabled?\s*$/ 31 /^\s*PROTOTYPES:\s*[Dd]isabled?\s*$/ Only the first pattern above is actually interpreted as enabling prototypes. There are only 4 instances of a PROTOTYPES line which *don't* match /: \s* (ENABLE|DISABLE)D? \s* $/xi which are: PerlIO-via/via.xs: PROTOTYPES: ENABLE; Win32-FindFile: PROTOTYPES: DISABLE; X-Osd: PROTOTYPES: DISABLES Net-LibNFS: PROTOTYPES: DISABLESTATVFS64_OFFSET So I propose that 1) Anything not matching that last regex fails with an error, and we supply fixes for those 4 now-failing distros. 2) Anything other than 'ENABLE' is interpreted as DISABLE (thus maintaining backcompat). 3) anything other than /\s*(ENABLE|DISABLE)\s*$/ issues a warning to the effect that only ENABLE/DISABLE are valid keywords and that the unrecognised value has been interpreted as DISABLE. In total, this means that we break 4 CPAN distros and make another 103 start to emit a warning during build (but would otherwise continue to work as before). Note that this change will be applied early in the development cycle, so we would have nearly a year for distros to be updated or for us to revert this change.

…

-- "Foul and greedy Dwarf - you have eaten the last candle." -- "Hordes of the Things", BBC Radio.

Leont · 2025-05-10T13:18:56Z

PerlIO-via/via.xs: PROTOTYPES: ENABLE;

That's a core module, so we might as well fix it immediately.

jkeenan · 2025-05-11T14:53:36Z

As of Sun May 11 14:47:18 UTC 2025, the GH interface for this p.r. is stating, "This branch cannot be rebased due to conflicts."

I decided to explore this. I did a local checkout of the p.r., rebased it on blead, and proceeded to configure and build. For unknown reasons, my machine slowed to a crawl during the compilation of utf8.c. I had to do a hard reboot.

When I recovered, I did a new checkout of the p.r. but did not rebase it on blead. I then configured, built and tested without incident. I called git clean; git rebase blead. I then configured, built and again retested without incident.

I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.

iabyn · 2025-05-12T12:25:20Z

On Sun, May 11, 2025 at 07:53:59AM -0700, James E Keenan wrote: I decided to explore this. I did a local checkout of the p.r., rebased it on blead, and proceeded to configure and build. For unknown reasons, my machine slowed to a crawl during the compilation of utf8.c. I had to do a hard reboot.

It rebased and built fine for me.

I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.

I've no idea why. My branch only touches files under dist/ExtUtils-ParseXS/, and nothing recent in blead does. I intend to just ignore gitgub's opinion, and merge it manually when the time comes.

…

-- No man treats a motor car as foolishly as he treats another human being. When the car will not go, he does not attribute its annoying behaviour to sin, he does not say, You are a wicked motorcar, and I shall not give you any more petrol until you go. He attempts to find out what is wrong and set it right. -- Bertrand Russell, Has Religion Made Useful Contributions to Civilization?

tonycoz · 2025-05-12T23:04:43Z

So I propose that

Anything not matching that last regex fails with an error, and we supply fixes for those 4 now-failing distros.

Anything other than 'ENABLE' is interpreted as DISABLE (thus maintaining backcompat).

anything other than /\s*(ENABLE|DISABLE)\s*$/ issues a warning to the effect that only ENABLE/DISABLE are valid keywords and that the unrecognised value has been interpreted as DISABLE.

In total, this means that we break 4 CPAN distros and make another 103 start to emit a warning during build (but would otherwise continue to work as before). Note that this change will be applied early in the development cycle, so we would have nearly a year for distros to be updated or for us to revert this change.

This is fine for me.

I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.

It rebased fine for me too, and the automation isn't adding the "hasConficts" label, so I think we're fine.

tonycoz · 2025-05-15T00:55:54Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

@@ -1038,18 +1038,11 @@ EOF
        $ppcode->as_code($self);


commit message "thins commit"

tonycoz · 2025-05-15T01:03:37Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

@@ -1026,25 +1026,18 @@ EOF
      # another way, it indicates an implicit "OUTPUT:\n\tRETVAL".
      my $implicit_OUTPUT_RETVAL;

-      if (/^\s*NOT_IMPLEMENTED_YET/) {


commit message "chuck of code" chunk?

tonycoz · 2025-05-15T04:32:22Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

@@ -405,14 +405,14 @@ sub process_file {

  # Global Constants

-  my ($Is_VMS, $VMS_SymSet);


typo in commit message "mover an unshift"

tonycoz · 2025-05-19T05:37:40Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

@@ -730,6 +730,14 @@ EOM
      next PARAGRAPH;
    }

+    # Initialise more per-XSUB state


xsub_pdecl in the commit message should be xsub_decl I think

tonycoz · 2025-05-21T01:45:44Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

-        my $parent = shift;
+        my ($key, $parent, @fields) = @_;
+
+        die "Internal error: key '$key' not 'parent'\n"
+            unless $key eq 'parent';


Right now we have a bunch of parent => '', rather than forcing an empty parameter that implies a bare ...::Node base class, maybe parent could be optional (perhaps renamed to -parent to avoid parent being the only bare identifier with special behaviour).

So all the places that now do $build_subclass->(parent => '', fields...) would be $build_subclass->(fields...).

Calls that need a non-bare-::Node base class would be $build_subclass->(-parent=> 'parentname', fields...)`.

Yeah, that sounds sensible. I'll probably do something along those lines. (NB: I'm not intending to make any changes until after you've reviewed the entirety of the PR in its current state.)

tonycoz · 2025-05-21T02:19:11Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

+        unless $func_header =~
+            /^(?:([\w:]*)::)?(\w+)\s*\(\s*(.*?)\s*\)\s*(const)?\s*(;\s*)?$/s;
+
+    my ($class, $name, $params_text) = ($1, $2, $3);


I realise this was originally xsub_class, but given the ways it is used and that it includes the const I wonder if the class member here would be better named this_type or something similar.

It's not entirely true, since THIS is actually {class} *, but the "class name" including the const seemed strange to me.

Perhaps leave the field as called "class", but don't store the "const" string as part of it, and add a separate field called 'const' to indicate whether a 'const' string needs to be emitted?

That works for me.

tonycoz · 2025-05-22T03:49:00Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm

+    $self->{prototype}          = $proto;
+    $xsub->{prototype} = $proto;


nit: wildly different = alignment between these two lines

tonycoz · 2025-05-22T04:24:47Z

I'm done going through this, except for the issues above which are mostly cosmetic, it looks good to me.

iabyn · 2025-05-26T13:23:26Z

I've rebased and pushed an updated version. All the existing commits are unchanged apart from some fixed typos in their commit messages, as pointed out in Tony C's various comments. There are also 7 new commits at the head which address the various issues which Tony raised.

tonycoz · 2025-06-04T00:12:08Z

One commit message says:

Unfortunately this string (stored as $xsub->{xsub_decl}{class}) is also
used to generate an autocall line, which could lead to nonsense such as:

    RETVAL = new const X::Y();

While combining the const with the class name in a field called class triggered me, the above syntax is valid, assuming RETVAL is declared as const X::Y *, since new takes a type, and a cv-qualifier is part of the type.

It doesn't strike me as particularly useful in an XS context though.

(I don't think any changes are needed here)

Add file and line_no fields to the base class of all Node types to record where that node was defined within the XS src file. (There aren't many node types yet, so this commit doesn't really do anything useful at the moment.)

Add these Node subclasses: ExtUtils::ParseXS::Node::multiline ExtUtils::ParseXS::Node::code ExtUtils::ParseXS::Node::PREINIT The first is a very generic base class for the many planned node types which will represent multi-line XS keywords, such as FOO: aaa bbb ccc The lines are just read into an array ref. ExtUtils::ParseXS::Node::code is a generic subclass of Node::multiline which represents keywords that contain sections of C code, such as PREINIT, PPCODE etc. It does extra processing such as skipping leading blank lines and wrapping the output in '#line ...'. ExtUtils::ParseXS::Node::PREINIT is a concrete subclass of Node::code which represents the PREINIT keyword. This is kind of a proof-of-concept; other such code keywords (CODE, PPCODE etc) will be added later. The net effect of this commit is to move processing of the PREINIT: keyword into the parse() and as_code() methods of the Node::PREINIT class (and/or its parents) and away from the print_section() and PREINIT_handler() methods in ExtUtils::ParseXS. This is intended as a small incremental step towards having a real AST. A PREINIT object is currently created, parsed, printed and destroyed all at the same point in time. In some future refactoring, the intention is that the object will be stored in a tree, and the parsing and code-emitting steps will be done at different times.

Add these Node subclasses: ExtUtils::ParseXS::Node::multiline_merged ExtUtils::ParseXS::Node::C_ARGS ExtUtils::ParseXS::Node::INTERFACE ExtUtils::ParseXS::Node::INTERFACE_MACRO multiline_merged is a subclass of multiline which merges all the lines associated with a keyword into a single string. It is then used as a base class for the three concrete classes C_ARGS etc which correspond to keywords which are multi-line but treat all lines as a single string. The effect of this commit is to move more keyword processing out of ExtUtils::ParseXS and into Node.pm, in preparation for building an AST.

@isa

Add a helper function, build_subclass, to ExtUtils::ParseXS::Node to simplify the boilerplate required to declare the @isa and @fields of each Node subclass.

Rename the just-added ExtUtils::ParseXS::Node::code class to ExtUtils::ParseXS::Node::codeblock, as that better describes its purpose. (I would have updated the original commit, but reordering squashing commits was too complex due to intervening commits.)

Before this commit, three keywords which take ENABLE/DISABLE as their argument were exceedingly lax about what they would accept. This commit makes them slightly less lax: they now have to match an exact word, not a part word; i.e. the regex has changed from: /^(ENABLE|DISABLE)/i; to: /^(ENABLE|DISABLE)\b/i; Note that it still quietly ignores trailing garbage. So before both these lines were legal; now only the second is: PROTOTYPES: ENABLEaaa bsbsbs stbsbsb PROTOTYPES: EnablE bsbsbs stbsbsb This commit makes VERSIONCHECK, PROTOTYPES, EXPORT_XSUB_SYMBOLS match SCOPE, which already had the \b. This is in preparation for an upcoming commit, which will use a common method to parse such keywords. This commit also changes the test infrastructure slightly: the test_many() function no longer bails out if the eval fails; instead the eval error message is added to any STDERR text, accessible to tests which can now test that ParseXS did indeed call death().

The legacy KEYWORD_handler() methods expect, on entry, for $_ to hold the remainder of the current line (after s/^s*KEYWORD:\s*//), and for @{$self->{line}} to contain any remaining unparsed lines from the current XSUB. On return, they set $_ to the next unparsed line, and @{$self->{line}} to subsequent lines. Recent commits have started added Node (and its subclasses) parse() methods to replace the KEYWORD_handler() methods. Currently they use the same "rely on $_ to pass the current line to and from" scheme. This commit changes them so that they only get lines from @{$pxs->{line}}. This removes one source of weird action-at-a-distance.

The SCOPE: keyword, which enables wrapping an XSUB's main body with ENTER/LEAVE, has been partially broken since 5.12.0. This commit fixes that, adds tests, and updates the very vague documentation for SCOPE in perlxs.pod. AFAIKT, neither the SCOPE keyword, nor it's associated /* SCOPE */ magic comment in typemap files, are used anywhere in core or on CPAN, nor in any tests. ('SCOPE: DISABLE' appears in a single test file, but disabled is the default anyway.) Background: The SCOPE keyword was added by perl-5.003_03-21-gdb3b941461 (with documentation added soon after by perl-5.003_03-34-g84287afe68). This made the SCOPE: keyword an XSUB-body-scoped keyword, e.g. void foo() SCOPE: ENABLED CODE: blah where the emitted 'blah' code would now be wrapped with ENTER/LEAVE. 13 years later, with v5.11.0-30-g28892255e8, this was extended so that the keyword could appear just before the XSUB too: SCOPE: ENABLED void foo() CODE: blah I don't know what the motivation was behind this; the commit was part of a larger upgrade, which just listed among other bug fixes: - Fix the SCOPE keyword [Goro Fuji] but I can't find any trace of a corresponding problem description on p5p or RT. This change had the unfortunate side-effect of breaking the existing XSUB-scoped variant. This is indirectly due to the fact that XSUB-scoped KEYWORD_handler() methods are supposed to set $_ to the next line before returning, while file scoped ones aren't supposed to. That change made SCOPE_handler() both file- and xsub-scoped, and also made it no longer update $_. So the new file-scoped variant worked, while the old xsub-scope variant broke, because it now retuned with $_ set to 'ENABLE' rather than to the next line. The temporary fix in this commit makes SCOPE_handler() check who its caller is and sets $_ (or not) accordingly. A proper fix will occur shortly when a SCOPE Node subclass is added, since the NODE::parse() methods don't pass values back and forth in $_. This commit also updates the pod for SCOPE, which was very vague about what the SCOPE keyword did and where it should go, syntax-wise. I also changed it so that it suggests the magic comment token in a typemap entry should be /* SCOPE */. The actually regex is {/\*.*scope.*\*/}i, which matches a whole bunch of stuff. If we ever make it stricter, insisting on an upper-case SCOPE with just surrounding white space seems the way to go.

Add the following classes: ExtUtils::ParseXS::Node::oneline ExtUtils::ParseXS::Node::enable ExtUtils::ParseXS::Node::EXPORT_XSUB_SYMBOLS ExtUtils::ParseXS::Node::PROTOTYPES ExtUtils::ParseXS::Node::SCOPE ExtUtils::ParseXS::Node::VERSIONCHECK The first two are base classes for XS keywords which consume only a since line of XS src, and which then expect the keyword to have a value of ENABLE/DISABLE. The rest are concrete Node subclasses representing all the purely ENABLE/DISABLE keywords.

This method is no longer used anywhere

Currently the parsing of an XSUB's signature, and the parsing of the individual comma-separated items within that signature, are done in the same function, Params->parse(). This commit is the first of three which will extract out the latter into a separate Param->parse() method. For now, the per-param code is kept in-place (to make the diff easier to understand), but is wrapped within an immediately-called anon sub, in preparation to be moved. So before, the code was (very simplified): for (split /,/, $params_text) { ... parse type, name, init etc ... next if can't parse; my $param = Param->new(var = $var, type => $type, ...); push @{$params->{kids}}, $param; } After this commit, it looks more like: for (split /,/, $params_text) { my $param = Param->new(); sub { my $param = shift; ... ... parse type, name, init etc ... return if can't parse; $param->{var} = $var; ... return 1; }->{$param, ...) or next; push @{$params->{kids}}, $param; } Note that the inner sub leaves pushing the new param, updating the names hash and setting the arg_num to the caller. In theory there are no functional changes, except that when a synthetic RETVAL is being kept (but its position within kids moved), we now keep the Param hash and update its contents, rather than replace it with a new hash. This shouldn't make any difference.

This commit just moves a block of code of the form sub {...}->() into its own named sub. There are no changes to the moved lines of code apart from indentation. This is the second of three commits to create the parse() method. The next commit will do any final tidying up.

This is the third of three commits to create the parse() method. Mainly do s/$param/$self/g, and add a call to set file/line number foer the object.

Move all the code out of ExtUtils::ParseXS::Node::IO_Param::as_input_code() which is responsible for looking up the template initialisation code in the typemap (or elsewhere) and put it in it's own method, lookup_input_typemap(). As well as splitting a 300-line method into two approx 150-line methods, this will also allow us shortly to move the template lookup to earlier, at parse time rather than code-emitting time. Also add some more tests for the length(foo) pseudo-parameter, which I broke while working on this commit, and then noticed it was under-tested.

Move all the code out of ExtUtils::ParseXS::Node::IO_Param::as_output_code() which is responsible for looking up the template output code in the typemap (or elsewhere) and put it in it's own method, lookup_output_typemap(). As well as splitting a 490-line method into two 200 and 340-line methods, this will also allow us shortly to move the template lookup to earlier, at parse time rather than code-emitting time. It may also be possible at some point to merge the two methods added by these last two commits, lookup_intput_typemap and lookup_output_typemap, into a single method, since they share a lot of common code.

Previously these two values were set at the end of parsing an XSUB: XSRETURN_count_basic XSRETURN_count_extra They represent whether a RETVAL SV will be returned by the XSUB, and how many extra SVs are returned due to parameters declared as OUTLIST. This commit sets them earlier, as in particular, the next commit will need to access XSRETURN_count_basic earlier. XSRETURN_count_extra is now set right after parsing the XSUB's declaration, as its value can't change after then. XSRETURN_count_basic is now set after parsing the output part of the each body of the XSUB (an XSUB can have a body per CASE). Its value *aught* to be consistent across all bodies, but it's possible for the CODE_sets_ST0 hack (which looks for code like like 'ST(0) = ...' in any CODE: block) to vary across bodies; so this commit also adds a new warning and test for that.

The last few commits have moved the looking-up and processing of typemap entries (but not the evalling) for parameters from Param::as_input_code() and Param::as_output_code() into their own subs, lookup_input_typemap() and lookup_output_typemap(). This commit takes that one step further, and makes those new subs be called at parse time, rather than at code-generation time. This is needed because in principle, XSUB ASTs should be completely self-contained, and the code they emit shouldn't vary depending on when their top-level as_code() methods are called. But via the TYPEMAP: <<EOF mechanism, its possible for the typemap to change between XSUBs. This commit does this in a very crude way. Formerly, at code-emitting time, as_input_code() etc would do: my ($foo, $bar, ...) = lookup_input_typemap(...); Now, the parsing code does $self->{input_typemap_vals} = [ lookup_input_typemap(...) ]; and as_input_code() etc does: my ($foo, $bar, ...) = @{$self->{input_typemap_vals}}; Note that there are both output_typemap_vals and output_typemap_vals_outlist fields, as it's possible for the same parameter to be used both for updating the original arg (OUTPUT) and for returning the current value as a new SV (OUTLIST). So potentially we save the results of *two* calls to lookup_output_typemap() for each parameter.

Rationalise warning and error messages which appear in Node.pm: - always prefix with Warning: / Error: / Internal error: - lower-case the first letter following Error: etc - fix grammar - ensure full test coverage (except 'Internal error', which shouldn't be reproducible).

Some node types have fields to point to particular children. Make these kids also be in the generic @{$self->{kids}} array. That way, hypothetical generic tree-walking code will be able to access the whole tree just by following @{$self->{kids}}, without needing to know for example that the xsub_decl Node type has a child pointed to by $self->{return_type}.

Do a general tidy-up of this src file: white space, plus wrap long lines and strings.

For aesthetic reasons, give the $build_subclass sub an extra first arg which must be the string 'parent'. Then change invocations from: BEGIN { $build_subclass->('Foo', # parent 'field1', # ... ... } to BEGIN { $build_subclass->(parent => 'Foo', 'field1', # ... ... }

Update the code comments in calls to $build_subclass->() to indicate more consistently the 'type' of each field being declared.

In the INPUT_line and OUTPUT_line subclasses, rename the 'param' field to 'ioparam', to better reflect that it holds an IO_Param object rather than a Param object.

The work in this branch broke the parser under 5.8.9. Fix it, by not trying to autovivify an undef object pointer (which under 5.8.9 is a pseudo-hash thingy and generally behaves weirdly). The attempt to autovivify an undef $xsub was always wrong, but harmless: the value wasn't needed and was soon discarded. But under 5.8.9, it became a runtime error.

whitespace-only

The $build_subclass->() method called in BEGIN by subclasses to create a subclass, has as the first two parameters: a key which must be 'parent', and the suffix of the parent class name. If the latter is '', 'ExtUtils::ParseXS::Node' is assumed. This commit makes these two args optional; if not present, it acts like ('parent' => '') used to. In addition, the 'parent' arg has been renamed to '-parent' to distinguish it from a possible field name. Based on a suggestion from Tony C.

It's possible for an XSUB's name to include a class, in which case the XS parser will treat it as a C++ method and declare a THIS variable. It's also possible for an XSUB to have a postfix 'const' modifier to mimic similar in C++ method declarations. For example this XS declaration: void X::Y::foo(...) const causes this C++ variable to be declared: const X__Y * THIS = ...; The 'const' is supposed to be an optional modifier to a C++ XSUB but the parser currently allows it it on plain XSUBs, e.g. void foo(...) const which leads to 'uninit var' warnings during parsing, and to badly-formed C code or weird errors about missing typemap entries. This commit makes it an error to have 'const' on a non-C++ XSUB.

An XSUB can be a C++ method if its name includes a class; it can also have an optional 'const' postfix modifier; e.g.: int X::Y::foo(...) const Before this commit, both the class of the XSUB and the const modifier were combined into a single string that was used in various places, suvh as to declare the type of the THIS variable, e.g. const X__Y THIS = ...; This commit splits the state into two separate fields during parsing, then these two fields are used later during code generation to Do the Right Thing. So before, the XSUB declaration above would, at parse time, have created an xsub_decl node which looked like: $xsub->{xsub_decl}{class} = 'const X::Y'; and which now looks like: $xsub->{xsub_decl}{class} = 'X::Y'; $xsub->{xsub_decl}{is_const} = TRUE; The three tests added by this commit all failed before this fix.

This line: PROTOTYPES: ENABLE; is actually an error: such declarations shouldn't end with a semicolon. However, the XS parser has historically treated anything it doesn't recognise following the PROTOTYPES: keyword as if was 'DISABLE'. So this commit changes it to 'DISABLE'. This should make no functional difference, but it will avoid this module starting to warn when ParseXS is updated shortly to be stricter.

These four keywords are supposed to have only ENABLE or DISABLE as their value: EXPORT_XSUB_SYMBOLS: SCOPE: VERSIONCHECK: PROTOTYPES: It turns out that the XS parser is very lax and allows all sorts of rubbish as the keyword's value. For example all these are valid, and the first of them was originally interpreted as *DISABLE*, due to case-insensitive validation, but case-sensitive value interpretation: KEYWORD: Enable KEYWORD: ENABLE; KEYWORD: ENABLE # comment KEYWORD: ENABLE the quick brown fox An earlier commit in this branch, in the course of refactoring, silently made the value matching to be case insensitive for *all* keywords (originally it was only CI for SCOPE). So originally, PROTOTYPES: Enable actually disabled prototypes; now it enables them. This commit and the next will restore the original behaviour and/or make things stricter. This commit makes all such keywords stricter, apart from PROTOTYPES, which is much more widely used (often incorrectly) and will require more careful backwards-compatibility handling. It's behaviour is left untouched by this commit; the next commit will update it. For the first three keywords, this commit makes the only acceptable values to match be /^(ENABLE|DISABLE)\s*$/.

See the previous commit for a general discussion on the problems with ENABLE/DISABLE-valued keywords. This commit specifically updates how the PROTOTYPES: keyword is handled. Compared with the previous production release, PROTOTYPES is now stricter in validation. It still accepts ENABLE or DISABLE in a case-insensitive matter, but now treats any trailing text as an error, apart from possibly a 'D' or ':'. So Anything apart these forms are compile-time errors now: ENABLE enable Enabled DISABLE; DisablED; etc For backwards compatibility it still treats anything apart from /^ENABLE/ as false; so all except the first example above are treated as DISABLE. But now, it warns if the keyword has been accepted but isn't one of ENABLE/DISABLE.

jkeenan · 2025-07-06T10:49:07Z

Since this pull request affects a library under dist/, it will presumably at some point be uploaded to CPAN for installation against older versions of perl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?

tonycoz · 2025-07-06T23:21:32Z

Since this pull request affects a library under dist/, it will presumably at some point be uploaded to CPAN for installation against older versions of perl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?

CI tests all dist modules against 5.38, 5.24, 5.18, 5.10 threaded and non-threaded builds on Linux, and against the system perl on Mac OS. See https://github.com/Perl/perl5/actions/runs/16097800911/job/45423184974 for example

jkeenan added the defer-next-dev This PR should not be merged yet, but await the next development cycle label Apr 24, 2025

tonycoz reviewed May 7, 2025

View reviewed changes

tonycoz reviewed May 15, 2025

View reviewed changes

tonycoz reviewed May 19, 2025

View reviewed changes

tonycoz reviewed May 21, 2025

View reviewed changes

tonycoz reviewed May 22, 2025

View reviewed changes

iabyn force-pushed the davem/xs_refactor10 branch from 5d95e16 to cfee024 Compare May 26, 2025 13:20

tonycoz approved these changes Jun 4, 2025

View reviewed changes

khwilliamson removed the defer-next-dev This PR should not be merged yet, but await the next development cycle label Jul 4, 2025

iabyn added 10 commits July 6, 2025 10:43

ParseXS: bump version 3.57 => 3.58

75a0e54

ParseXS: refactor: add file/line_no fields to Node

539712e

Add file and line_no fields to the base class of all Node types to record where that node was defined within the XS src file. (There aren't many node types yet, so this commit doesn't really do anything useful at the moment.)

ParseXS: refactor: Node.pm: add class helper fn

620bd2f

Add a helper function, build_subclass, to ExtUtils::ParseXS::Node to simplify the boilerplate required to declare the @isa and @fields of each Node subclass.

iabyn added 25 commits July 6, 2025 10:43

ParseXS: refactor: remove print_section()

df71d29

This method is no longer used anywhere

ParseXS: refactor: clean-up Param->parse() method

156f4ea

This is the third of three commits to create the parse() method. Mainly do s/$param/$self/g, and add a call to set file/line number foer the object.

ParseXS: refactor: tidy up Node.pm

003f2ca

Do a general tidy-up of this src file: white space, plus wrap long lines and strings.

ParseXS: refactor: fixup some code comments

10adb09

ParseXS: refactor: update field code comments

664dcc0

Update the code comments in calls to $build_subclass->() to indicate more consistently the 'type' of each field being declared.

ParseXS: refactor: rename some fields to ioparam

55be473

In the INPUT_line and OUTPUT_line subclasses, rename the 'param' field to 'ioparam', to better reflect that it holds an IO_Param object rather than a Param object.

ParseXS: refactor: add basic POD for Node.pm

e7db425

ParseXS: refactor: fix trivial indentation issue

373806a

whitespace-only

ParseXS: refactor: tweak some code comments

fec16c7

iabyn force-pushed the davem/xs_refactor10 branch from cfee024 to fec16c7 Compare July 6, 2025 09:56

iabyn merged commit 195fee3 into blead Jul 6, 2025
67 checks passed

iabyn deleted the davem/xs_refactor10 branch July 6, 2025 11:52

		@@ -405,14 +405,14 @@ sub process_file {

		# Global Constants

		my ($Is_VMS, $VMS_SymSet);

ParseXS: build an AST for each XSUB #23225

ParseXS: build an AST for each XSUB #23225

Uh oh!

Conversation

iabyn commented Apr 24, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iabyn commented May 10, 2025 via email

Uh oh!

Leont commented May 10, 2025

Uh oh!

jkeenan commented May 11, 2025

Uh oh!

iabyn commented May 12, 2025 via email

Uh oh!

tonycoz commented May 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tonycoz commented May 22, 2025

Uh oh!

iabyn commented May 26, 2025

Uh oh!

tonycoz commented Jun 4, 2025

Uh oh!

jkeenan commented Jul 6, 2025

Uh oh!

Uh oh!

tonycoz commented Jul 6, 2025

Uh oh!

Uh oh!