Skip to content

ParseXS: build an AST for each XSUB #23225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 156 commits into from
Jul 6, 2025
Merged

ParseXS: build an AST for each XSUB #23225

merged 156 commits into from
Jul 6, 2025

Conversation

iabyn
Copy link
Contributor

@iabyn iabyn commented Apr 24, 2025

This PR is part of my ParseXS refactoring work. I don't intend for it to be merged until 5.43.1.

This series of about 150 commits changes ExtUtils::ParseXS so that, instead of intermixing parsing and code-emitting for each XSUB, it now parses each XSUB into an Abstract Syntax Tree, and then walks the tree to emit all the C code for that XSUB.

This makes the code generally more understandable and maintainable.

For now it still just discards each tree after the XSUB is parsed; in future work, the AST will be extended so that it holds the whole file (including all the XSUBs) rather than just the current XSUB.

This branch contains six types of commit.

  1. For terminal AST nodes for keywords such as FOO, the old

    ExtUtils::ParseXS::handle_FOO()

method is removed and a new ExtUtils::ParseXS::Node::FOO class is added with parse() and as_code() methods which copy over the parsing and code-emitting parts of the handle_FOO() method. For a few keywords like INPUT which have values per line, both a Node::FOO and Node::FOO_line class are created, with several FOO_line nodes being children of the FOO node.

Note that doing the modifications for a single keyword often consists in fact of several commits in sequence.

  1. For higher-level nodes, a Node::foo class is created with parse() and as_code() methods as before, but the contents of these methods are typically populated by moving the relevant bits of code over from the big ExtUtils::ParseXS::process_file() method.

  2. Most of the state fields of the ExtUtils::ParseXS class (especially all the xsub_foo ones) are removed and similar fields added to the various Node subclasses instead.

  3. Fixups to ensure that all parse-time code is in parse() methods or associated helper functions, and similarly for as_code().

  4. Various bug fixes related to state that should be per-CASE rather than per-XSUB. Some of these bugs were pre-existing, some were introduced during this branch.

  5. General tidying-up, fixing code comments, adding POD etc.

@jkeenan jkeenan added the defer-next-dev This PR should not be merged yet, but await the next development cycle label Apr 24, 2025
Comment on lines 1551 to 1558
#XXX note that for backwards compatibility, matching DIS/ENABLE is
#very lax: it is case insensitive, and ignores any trailing garbage
# on the line
unless ($s =~ /^(ENABLE|DISABLE)\b/i) {
my ($keyword) = ($self =~ /(\w+)=/); # final component of class name
$pxs->death("Error: $keyword: ENABLE/DISABLE")
}
$self->{enable} = uc($1) eq 'ENABLE' ? 1 : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code was lax, but as far as I can tell this changes the behaviour when the keyword isn't exactly "ENABLE" or "DISABLE".

The original code from one case:

  $self->death("Error: PROTOTYPES: ENABLE/DISABLE")
    unless $setting =~ /^(ENABLE|DISABLE)\b/i;

  $self->{PROTOTYPES_value} = 1 if $1 eq 'ENABLE';
  $self->{PROTOTYPES_value} = 0 if $1 eq 'DISABLE';
  $self->{proto_behaviour_specified} = 1;

In the original code like: PROTOTYPES: enable would pass the validation but then ignore the setting (except for suppressing the warning).

With the change prototypes (versioncheck, etc) will be enabled.

This isn't a bad change, but it is backward incompatible, I can see it changing the behaviour of some code.

https://metacpan.org/release/HAG/BSD-Itimer-0.8/source/Itimer.xs#L80
https://metacpan.org/release/JHI/BSD-Resource-1.2911/source/Resource.xs#L568
https://metacpan.org/release/TMM/HEAT-Crypto-0.07/source/Crypto.xs#L30
https://metacpan.org/release/GRICHTER/ExtUtils-XSBuilder-0.28/source/XSBuilder/WrapXS.pm#L1455

(there's more than this, and some obsolete matches like mod_perl 1.x modules)

@iabyn
Copy link
Contributor Author

iabyn commented May 10, 2025 via email

@Leont
Copy link
Contributor

Leont commented May 10, 2025

PerlIO-via/via.xs: PROTOTYPES: ENABLE;

That's a core module, so we might as well fix it immediately.

@jkeenan
Copy link
Contributor

jkeenan commented May 11, 2025

As of Sun May 11 14:47:18 UTC 2025, the GH interface for this p.r. is stating, "This branch cannot be rebased due to conflicts."

I decided to explore this. I did a local checkout of the p.r., rebased it on blead, and proceeded to configure and build. For unknown reasons, my machine slowed to a crawl during the compilation of utf8.c. I had to do a hard reboot.

When I recovered, I did a new checkout of the p.r. but did not rebase it on blead. I then configured, built and tested without incident. I called git clean; git rebase blead. I then configured, built and again retested without incident.

I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.

@iabyn
Copy link
Contributor Author

iabyn commented May 12, 2025 via email

@tonycoz
Copy link
Contributor

tonycoz commented May 12, 2025

So I propose that

  1. Anything not matching that last regex fails with an error, and we supply fixes for those 4 now-failing distros.

  2. Anything other than 'ENABLE' is interpreted as DISABLE (thus maintaining backcompat).

  3. anything other than /\s*(ENABLE|DISABLE)\s*$/ issues a warning to the effect that only ENABLE/DISABLE are valid keywords and that the unrecognised value has been interpreted as DISABLE.

In total, this means that we break 4 CPAN distros and make another 103 start to emit a warning during build (but would otherwise continue to work as before). Note that this change will be applied early in the development cycle, so we would have nearly a year for distros to be updated or for us to revert this change.

This is fine for me.

I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.

It rebased fine for me too, and the automation isn't adding the "hasConficts" label, so I think we're fine.

@@ -1038,18 +1038,11 @@ EOF
$ppcode->as_code($self);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message "thins commit"

@@ -1026,25 +1026,18 @@ EOF
# another way, it indicates an implicit "OUTPUT:\n\tRETVAL".
my $implicit_OUTPUT_RETVAL;

if (/^\s*NOT_IMPLEMENTED_YET/) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message "chuck of code" chunk?

@@ -405,14 +405,14 @@ sub process_file {

# Global Constants

my ($Is_VMS, $VMS_SymSet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in commit message "mover an unshift"

@@ -730,6 +730,14 @@ EOM
next PARAGRAPH;
}

# Initialise more per-XSUB state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xsub_pdecl in the commit message should be xsub_decl I think

Comment on lines 51 to 58
my $parent = shift;
my ($key, $parent, @fields) = @_;

die "Internal error: key '$key' not 'parent'\n"
unless $key eq 'parent';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we have a bunch of parent => '', rather than forcing an empty parameter that implies a bare ...::Node base class, maybe parent could be optional (perhaps renamed to -parent to avoid parent being the only bare identifier with special behaviour).

So all the places that now do $build_subclass->(parent => '', fields...) would be $build_subclass->(fields...).

Calls that need a non-bare-::Node base class would be $build_subclass->(-parent=> 'parentname', fields...)`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds sensible. I'll probably do something along those lines. (NB: I'm not intending to make any changes until after you've reviewed the entirety of the PR in its current state.)

unless $func_header =~
/^(?:([\w:]*)::)?(\w+)\s*\(\s*(.*?)\s*\)\s*(const)?\s*(;\s*)?$/s;

my ($class, $name, $params_text) = ($1, $2, $3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this was originally xsub_class, but given the ways it is used and that it includes the const I wonder if the class member here would be better named this_type or something similar.

It's not entirely true, since THIS is actually {class} *, but the "class name" including the const seemed strange to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps leave the field as called "class", but don't store the "const" string as part of it, and add a separate field called 'const' to indicate whether a 'const' string needs to be emitted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me.

Comment on lines 3854 to 3889
$self->{prototype} = $proto;
$xsub->{prototype} = $proto;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wildly different = alignment between these two lines

@tonycoz
Copy link
Contributor

tonycoz commented May 22, 2025

I'm done going through this, except for the issues above which are mostly cosmetic, it looks good to me.

@iabyn iabyn force-pushed the davem/xs_refactor10 branch from 5d95e16 to cfee024 Compare May 26, 2025 13:20
@iabyn
Copy link
Contributor Author

iabyn commented May 26, 2025

I've rebased and pushed an updated version. All the existing commits are unchanged apart from some fixed typos in their commit messages, as pointed out in Tony C's various comments. There are also 7 new commits at the head which address the various issues which Tony raised.

@tonycoz
Copy link
Contributor

tonycoz commented Jun 4, 2025

One commit message says:

Unfortunately this string (stored as $xsub->{xsub_decl}{class}) is also
used to generate an autocall line, which could lead to nonsense such as:

    RETVAL = new const X::Y();

While combining the const with the class name in a field called class triggered me, the above syntax is valid, assuming RETVAL is declared as const X::Y *, since new takes a type, and a cv-qualifier is part of the type.

It doesn't strike me as particularly useful in an XS context though.

(I don't think any changes are needed here)

@khwilliamson khwilliamson removed the defer-next-dev This PR should not be merged yet, but await the next development cycle label Jul 4, 2025
iabyn added 10 commits July 6, 2025 10:43
Add file and line_no fields to the base class of all Node types to
record where that node was defined within the XS src file.

(There aren't many node types yet, so this commit doesn't really do
anything useful at the moment.)
Add these Node subclasses:

    ExtUtils::ParseXS::Node::multiline
    ExtUtils::ParseXS::Node::code
    ExtUtils::ParseXS::Node::PREINIT

The first is a very generic base class for the many planned node types
which will represent multi-line XS keywords, such as

    FOO: aaa
       bbb
       ccc

The lines are just read into an array ref.

ExtUtils::ParseXS::Node::code is a generic subclass of Node::multiline
which represents keywords that contain sections of C code, such as
PREINIT, PPCODE etc. It does extra processing such as skipping leading
blank lines and wrapping the output in '#line ...'.

ExtUtils::ParseXS::Node::PREINIT is a concrete subclass of Node::code
which represents the PREINIT keyword. This is kind of a
proof-of-concept; other such code keywords (CODE, PPCODE etc) will be
added later.

The net effect of this commit is to move processing of the PREINIT:
keyword into the parse() and as_code() methods of the Node::PREINIT
class (and/or its parents) and away from the print_section() and
PREINIT_handler() methods in ExtUtils::ParseXS.

This is intended as a small incremental step towards having a real AST.

A PREINIT object is currently created, parsed, printed and destroyed all
at the same point in time. In some future refactoring, the intention is
that the object will be stored in a tree, and the parsing and
code-emitting steps will be done at different times.
Add these Node subclasses:

    ExtUtils::ParseXS::Node::multiline_merged
    ExtUtils::ParseXS::Node::C_ARGS
    ExtUtils::ParseXS::Node::INTERFACE
    ExtUtils::ParseXS::Node::INTERFACE_MACRO

multiline_merged is a subclass of multiline which merges all the lines
associated with a keyword into a single string. It is then used as a
base class for the three concrete classes C_ARGS etc which correspond to
keywords which are multi-line but treat all lines as a single string.

The effect of this commit is to move more keyword processing out of
ExtUtils::ParseXS and into Node.pm, in preparation for building an AST.
Add a helper function, build_subclass, to ExtUtils::ParseXS::Node
to simplify the boilerplate required to declare the @isa and @fields
of each Node subclass.
Rename the just-added ExtUtils::ParseXS::Node::code class to
ExtUtils::ParseXS::Node::codeblock, as that better describes its
purpose.

(I would have updated the original commit, but reordering squashing
commits was too complex due to intervening commits.)
Before this commit, three keywords which take ENABLE/DISABLE
as their argument were exceedingly lax about what they would accept.
This commit makes them slightly less lax: they now have to match
an exact word, not a part word; i.e. the regex has changed from:

   /^(ENABLE|DISABLE)/i;
to:
   /^(ENABLE|DISABLE)\b/i;

Note that it still quietly ignores trailing garbage. So before both
these lines were legal; now only the second is:

    PROTOTYPES: ENABLEaaa bsbsbs stbsbsb
    PROTOTYPES: EnablE    bsbsbs stbsbsb

This commit makes VERSIONCHECK, PROTOTYPES, EXPORT_XSUB_SYMBOLS match
SCOPE, which already had the \b.

This is in preparation for an upcoming commit, which will use a common
method to parse such keywords.

This commit also changes the test infrastructure slightly: the
test_many() function no longer bails out if the eval fails; instead the
eval error message is added to any STDERR text, accessible to tests
which can now test that ParseXS did indeed call death().
The legacy KEYWORD_handler() methods expect, on entry, for $_ to hold
the remainder of the current line (after s/^s*KEYWORD:\s*//), and for
@{$self->{line}} to contain any remaining unparsed lines from the
current XSUB. On return, they set $_ to the next unparsed line, and
@{$self->{line}} to subsequent lines.

Recent commits have started added Node (and its subclasses) parse()
methods to replace the KEYWORD_handler() methods. Currently they use the
same "rely on $_ to pass the current line to and from" scheme.

This commit changes them so that they only get lines from @{$pxs->{line}}.
This removes one source of weird action-at-a-distance.
The SCOPE: keyword, which enables wrapping an XSUB's main body with
ENTER/LEAVE, has been partially broken since 5.12.0. This commit fixes
that, adds tests, and updates the very vague documentation for SCOPE in
perlxs.pod.

AFAIKT, neither the SCOPE keyword, nor it's associated /* SCOPE */
magic comment in typemap files, are used anywhere in core or on CPAN,
nor in any tests. ('SCOPE: DISABLE' appears in a single test file, but
disabled is the default anyway.)

Background:

The SCOPE keyword was added by perl-5.003_03-21-gdb3b941461 (with
documentation added soon after by perl-5.003_03-34-g84287afe68).
This made the SCOPE: keyword an XSUB-body-scoped keyword, e.g.

    void
    foo()
        SCOPE: ENABLED
        CODE:
            blah

where the emitted 'blah' code would now be wrapped with ENTER/LEAVE.

13 years later, with v5.11.0-30-g28892255e8, this was extended so that
the keyword could appear just before the XSUB too:

    SCOPE: ENABLED
    void
    foo()
        CODE:
            blah

I don't know what the motivation was behind this; the commit was part of
a larger upgrade, which just listed among other bug fixes:

    - Fix the SCOPE keyword [Goro Fuji]

but I can't find any trace of a corresponding problem description on p5p
or RT.

This change had the unfortunate side-effect of breaking the existing
XSUB-scoped variant. This is indirectly due to the fact that XSUB-scoped
KEYWORD_handler() methods are supposed to set $_ to the next line before
returning, while file scoped ones aren't supposed to.

That change made SCOPE_handler() both file- and xsub-scoped, and also
made it no longer update $_. So the new file-scoped variant worked,
while the old xsub-scope variant broke, because it now retuned with
$_ set to 'ENABLE' rather than to the next line.

The temporary fix in this commit makes SCOPE_handler() check who its
caller is and sets $_ (or not) accordingly. A proper fix will occur
shortly when a SCOPE Node subclass is added, since the NODE::parse()
methods don't pass values back and forth in $_.

This commit also updates the pod for SCOPE, which was very vague about
what the SCOPE keyword did and where it should go, syntax-wise.  I also
changed it so that it suggests the magic comment token in a typemap
entry should be /* SCOPE */. The actually regex is {/\*.*scope.*\*/}i,
which matches a whole bunch of stuff. If we ever make it stricter,
insisting on an upper-case SCOPE with just surrounding white space seems
the way to go.
Add the following classes:

    ExtUtils::ParseXS::Node::oneline
    ExtUtils::ParseXS::Node::enable
    ExtUtils::ParseXS::Node::EXPORT_XSUB_SYMBOLS
    ExtUtils::ParseXS::Node::PROTOTYPES
    ExtUtils::ParseXS::Node::SCOPE
    ExtUtils::ParseXS::Node::VERSIONCHECK

The first two are base classes for XS keywords which consume only a
since line of XS src, and which then expect the keyword to have a value
of ENABLE/DISABLE.
The rest are concrete Node subclasses representing all the purely
ENABLE/DISABLE keywords.
iabyn added 25 commits July 6, 2025 10:43
This method is no longer used anywhere
Currently the parsing of an XSUB's signature, and the parsing of
the individual comma-separated items within that signature, are done in
the same function, Params->parse(). This commit is the first of three
which will extract out the latter into a separate Param->parse() method.

For now, the per-param code is kept in-place (to make the diff easier to
understand), but is wrapped within an immediately-called anon sub, in
preparation to be moved.

So before, the code was (very simplified):

    for (split /,/, $params_text) {
        ... parse type, name, init etc ...
        next if can't parse;
        my $param = Param->new(var = $var, type => $type, ...);
        push @{$params->{kids}}, $param;
    }

After this commit, it looks more like:

    for (split /,/, $params_text) {
        my $param = Param->new();
        sub {
            my $param = shift;
            ...
            ... parse type, name, init etc ...
            return if can't parse;
            $param->{var} = $var; ...
            return 1;
        }->{$param, ...)
            or next;

        push @{$params->{kids}}, $param;
    }

Note that the inner sub leaves pushing the new param, updating the names
hash and setting the arg_num to the caller.

In theory there are no functional changes, except that when a synthetic
RETVAL is being kept (but its position within kids moved), we now keep
the Param hash and update its contents, rather than replace it with a new
hash. This shouldn't make any difference.
This commit just moves a block of code of the form sub {...}->()
into its own named sub. There are no changes to the moved lines of code
apart from indentation.

This is the second of three commits to create the parse() method.
The next commit will do any final tidying up.
This is the third of three commits to create the parse() method.
Mainly do s/$param/$self/g, and add a call to set file/line number foer
the object.
Move all the code out of

ExtUtils::ParseXS::Node::IO_Param::as_input_code()

which is responsible for looking up the template initialisation code in
the typemap (or elsewhere) and put it in it's own method,
lookup_input_typemap().

As well as splitting a 300-line method into two approx 150-line methods,
this will also allow us shortly to move the template lookup to earlier,
at parse time rather than code-emitting time.

Also add some more tests for the length(foo) pseudo-parameter, which I
broke while working on this commit, and then noticed it was under-tested.
Move all the code out of

ExtUtils::ParseXS::Node::IO_Param::as_output_code()

which is responsible for looking up the template output code in the
typemap (or elsewhere) and put it in it's own method,
lookup_output_typemap().

As well as splitting a 490-line method into two 200 and 340-line methods,
this will also allow us shortly to move the template lookup to earlier,
at parse time rather than code-emitting time.

It may also be possible at some point to merge the two methods added by
these last two commits, lookup_intput_typemap and lookup_output_typemap,
into a single method, since they share a lot of common code.
Previously these two values were set at the end of parsing an XSUB:

    XSRETURN_count_basic
    XSRETURN_count_extra

They represent whether a RETVAL SV will be returned by the XSUB, and
how many extra SVs are returned due to parameters declared as OUTLIST.

This commit sets them earlier, as in particular, the next commit will
need to access XSRETURN_count_basic earlier.

XSRETURN_count_extra is now set right after parsing the XSUB's
declaration, as its value can't change after then.

XSRETURN_count_basic is now set after parsing the output part of the
each body of the XSUB (an XSUB can have a body per CASE). Its value
*aught* to be consistent across all bodies, but it's possible for the
CODE_sets_ST0 hack (which looks for code like like 'ST(0) = ...' in any
CODE: block) to vary across bodies; so this commit also adds a new
warning and test for that.
The last few commits have moved the looking-up and processing of
typemap entries (but not the evalling) for parameters from
Param::as_input_code() and Param::as_output_code() into their
own subs, lookup_input_typemap() and lookup_output_typemap().

This commit takes that one step further, and makes those new subs be
called at parse time, rather than at code-generation time.
This is needed because in principle, XSUB ASTs should be completely
self-contained, and the code they emit shouldn't vary depending on when
their top-level as_code() methods are called. But via the

    TYPEMAP: <<EOF

mechanism, its possible for the typemap to change between XSUBs.

This commit does this in a very crude way. Formerly, at code-emitting
time, as_input_code() etc would do:

    my ($foo, $bar, ...) = lookup_input_typemap(...);

Now, the parsing code does

    $self->{input_typemap_vals} = [ lookup_input_typemap(...) ];

and as_input_code() etc does:

    my ($foo, $bar, ...) = @{$self->{input_typemap_vals}};

Note that there are both output_typemap_vals and
output_typemap_vals_outlist fields, as it's possible for the same
parameter to be used both for updating the original arg (OUTPUT) and for
returning the current value as a new SV (OUTLIST). So potentially we
save the results of *two* calls to lookup_output_typemap() for each
parameter.
Rationalise warning and error messages which appear in Node.pm:

- always prefix with Warning: / Error: / Internal error:
- lower-case the first letter following Error: etc
- fix grammar
- ensure full test coverage (except 'Internal error', which
  shouldn't be reproducible).
Some node types have fields to point to particular children. Make these
kids also be in the generic @{$self->{kids}} array.

That way, hypothetical generic tree-walking code will be able to access
the whole tree just by following @{$self->{kids}}, without needing to
know for example that the xsub_decl Node type has a child pointed to by
$self->{return_type}.
Do a general tidy-up of this src file: white space, plus wrap long lines
and strings.
For aesthetic reasons, give the $build_subclass sub an extra first arg
which must be the string 'parent'. Then change invocations from:

    BEGIN { $build_subclass->('Foo', # parent
        'field1', # ...
        ...
    }

to
    BEGIN { $build_subclass->(parent => 'Foo',
        'field1', # ...
        ...
    }
Update the code comments in calls to $build_subclass->() to indicate
more consistently the 'type' of each field being declared.
In the INPUT_line and OUTPUT_line subclasses, rename the 'param' field
to 'ioparam', to better reflect that it holds an IO_Param object rather
than a Param object.
The work in this branch broke the parser under 5.8.9. Fix it, by not
trying to autovivify an undef object pointer (which under 5.8.9 is a
pseudo-hash thingy and generally behaves weirdly).

The attempt to autovivify an undef $xsub was always wrong, but harmless:
the value wasn't needed and was soon discarded. But under 5.8.9, it
became a runtime error.
The $build_subclass->() method called in BEGIN by subclasses to create
a subclass, has as the first two parameters: a key which must be
'parent', and the suffix of the parent class name. If the latter is '',
'ExtUtils::ParseXS::Node' is assumed.

This commit makes these two args optional; if not present, it acts like
('parent' => '') used to.

In addition, the 'parent' arg has been renamed to '-parent' to
distinguish it from a possible field name.

Based on a suggestion from Tony C.
It's possible for an XSUB's name to include a class, in which case the
XS parser will treat it as a C++ method and declare a THIS variable.
It's also possible for an XSUB to have a postfix 'const' modifier to
mimic similar in C++ method declarations. For example this XS declaration:

    void
    X::Y::foo(...) const

causes this C++ variable to be declared:

    const X__Y * THIS = ...;

The 'const' is supposed to be an optional modifier to a C++ XSUB but the
parser currently allows it it on plain XSUBs, e.g.

    void
    foo(...) const

which leads to 'uninit var' warnings during parsing, and to badly-formed
C code or weird errors about missing typemap entries.

This commit makes it an error to have 'const' on a non-C++ XSUB.
An XSUB can be a C++ method if its name includes a class; it can also
have an optional 'const' postfix modifier; e.g.:

    int
    X::Y::foo(...) const

Before this commit, both the class of the XSUB and the const modifier
were combined into a single string that was used in various places, suvh
as to declare the type of the THIS variable, e.g.

    const X__Y THIS = ...;

This commit splits the state into two separate fields during parsing,
then these two fields are used later during code generation to Do the
Right Thing.

So before, the XSUB declaration above would, at parse time, have created
an xsub_decl node which looked like:

    $xsub->{xsub_decl}{class} = 'const X::Y';

and which now looks like:

    $xsub->{xsub_decl}{class}    = 'X::Y';
    $xsub->{xsub_decl}{is_const} = TRUE;

The three tests added by this commit all failed before this fix.
This line:

    PROTOTYPES: ENABLE;

is actually an error: such declarations shouldn't end with a semicolon.
However, the XS parser has historically treated anything it doesn't
recognise following the PROTOTYPES: keyword as if was 'DISABLE'.

So this commit changes it to  'DISABLE'. This should make no functional
difference, but it will avoid this module starting to warn when ParseXS
is updated shortly to be stricter.
These four keywords are supposed to have only ENABLE or DISABLE
as their value:

    EXPORT_XSUB_SYMBOLS:
    SCOPE:
    VERSIONCHECK:
    PROTOTYPES:

It turns out that the XS parser is very lax and allows all sorts of
rubbish as the keyword's value.  For example all these are valid, and
the first of them was originally interpreted as *DISABLE*, due to
case-insensitive validation, but case-sensitive value interpretation:

    KEYWORD: Enable
    KEYWORD: ENABLE;
    KEYWORD: ENABLE # comment
    KEYWORD: ENABLE the quick brown fox

An earlier commit in this branch, in the course of refactoring, silently
made the value matching to be case insensitive for *all* keywords
(originally it was only CI for SCOPE).

So originally,

    PROTOTYPES: Enable

actually disabled prototypes; now it enables them. This commit and the
next will restore the original behaviour and/or make things stricter.

This commit makes all such keywords stricter, apart from PROTOTYPES,
which is much more widely used (often incorrectly) and will require more
careful backwards-compatibility handling. It's behaviour is left
untouched by this commit; the next commit will update it.

For the first three keywords, this commit makes the only acceptable
values to match be /^(ENABLE|DISABLE)\s*$/.
See the previous commit for a general discussion on the problems
with ENABLE/DISABLE-valued keywords.

This commit specifically updates how the PROTOTYPES: keyword is handled.

Compared with the previous production release, PROTOTYPES is now
stricter in validation. It still accepts ENABLE or DISABLE in a
case-insensitive matter, but now treats any trailing text as an error,
apart from possibly a 'D' or ':'. So Anything apart these forms are
compile-time errors now:

    ENABLE
    enable
    Enabled
    DISABLE;
    DisablED;
    etc

For backwards compatibility it still treats anything apart from
/^ENABLE/ as false; so all except the first example above are treated
as DISABLE. But now, it warns if the keyword has been accepted but isn't
one of ENABLE/DISABLE.
@iabyn iabyn force-pushed the davem/xs_refactor10 branch from cfee024 to fec16c7 Compare July 6, 2025 09:56
@jkeenan
Copy link
Contributor

jkeenan commented Jul 6, 2025

Since this pull request affects a library under dist/, it will presumably at some point be uploaded to CPAN for installation against older versions of perl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?

@iabyn iabyn merged commit 195fee3 into blead Jul 6, 2025
67 checks passed
@iabyn iabyn deleted the davem/xs_refactor10 branch July 6, 2025 11:52
@tonycoz
Copy link
Contributor

tonycoz commented Jul 6, 2025

Since this pull request affects a library under dist/, it will presumably at some point be uploaded to CPAN for installation against older versions of perl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?

CI tests all dist modules against 5.38, 5.24, 5.18, 5.10 threaded and non-threaded builds on Linux, and against the system perl on Mac OS. See https://github.com/Perl/perl5/actions/runs/16097800911/job/45423184974 for example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants