ParseXS: build an AST for each XSUB #23225

Open: wants to merge 148 commits into blead

@iabyn (Contributor) commented Apr 24, 2025

This PR is part of my ParseXS refactoring work. I don't intend for it to be merged until 5.43.1.

This series of about 150 commits changes ExtUtils::ParseXS so that, instead of intermixing parsing and code-emitting for each XSUB, it now parses each XSUB into an Abstract Syntax Tree, and then walks the tree to emit all the C code for that XSUB.

This makes the code generally more understandable and maintainable.

For now it still just discards each tree after the XSUB is parsed; in future work, the AST will be extended so that it holds the whole file (including all the XSUBs) rather than just the current XSUB.

This branch contains six types of commit.

  1. For terminal AST nodes for keywords such as FOO, the old

    ExtUtils::ParseXS::FOO_handler()

method is removed and a new ExtUtils::ParseXS::Node::FOO class is added with parse() and as_code() methods which copy over the parsing and code-emitting parts of the FOO_handler() method. For a few keywords like INPUT which have values per line, both a Node::FOO and Node::FOO_line class are created, with several FOO_line nodes being children of the FOO node.

Note that the modifications for a single keyword often consist of several commits in sequence.

  2. For higher-level nodes, a Node::foo class is created with parse() and as_code() methods as before, but the contents of these methods are typically populated by moving the relevant bits of code over from the big ExtUtils::ParseXS::process_file() method.

  3. Most of the state fields of the ExtUtils::ParseXS class (especially all the xsub_foo ones) are removed and similar fields added to the various Node subclasses instead.

  4. Fixups to ensure that all parse-time code is in parse() methods or associated helper functions, and similarly for as_code().

  5. Various bug fixes related to state that should be per-CASE rather than per-XSUB. Some of these bugs were pre-existing, some were introduced during this branch.

  6. General tidying-up, fixing code comments, adding POD etc.

iabyn added 30 commits April 23, 2025 12:08
Add file and line_no fields to the base class of all Node types to
record where that node was defined within the XS src file.

(There aren't many node types yet, so this commit doesn't really do
anything useful at the moment.)
Add these Node subclasses:

    ExtUtils::ParseXS::Node::multiline
    ExtUtils::ParseXS::Node::code
    ExtUtils::ParseXS::Node::PREINIT

The first is a very generic base class for the many planned node types
which will represent multi-line XS keywords, such as

    FOO: aaa
       bbb
       ccc

The lines are just read into an array ref.

ExtUtils::ParseXS::Node::code is a generic subclass of Node::multiline
which represents keywords that contain sections of C code, such as
PREINIT, PPCODE etc. It does extra processing such as skipping leading
blank lines and wrapping the output in '#line ...'.

ExtUtils::ParseXS::Node::PREINIT is a concrete subclass of Node::code
which represents the PREINIT keyword. This is kind of a
proof-of-concept; other such code keywords (CODE, PPCODE etc) will be
added later.

The net effect of this commit is to move processing of the PREINIT:
keyword into the parse() and as_code() methods of the Node::PREINIT
class (and/or its parents) and away from the print_section() and
PREINIT_handler() methods in ExtUtils::ParseXS.

This is intended as a small incremental step towards having a real AST.

A PREINIT object is currently created, parsed, printed and destroyed all
at the same point in time. In some future refactoring, the intention is
that the object will be stored in a tree, and the parsing and
code-emitting steps will be done at different times.
Add these Node subclasses:

    ExtUtils::ParseXS::Node::multiline_merged
    ExtUtils::ParseXS::Node::C_ARGS
    ExtUtils::ParseXS::Node::INTERFACE
    ExtUtils::ParseXS::Node::INTERFACE_MACRO

multiline_merged is a subclass of multiline which merges all the lines
associated with a keyword into a single string. It is then used as a
base class for the three concrete classes C_ARGS etc which correspond to
keywords which are multi-line but treat all lines as a single string.

The effect of this commit is to move more keyword processing out of
ExtUtils::ParseXS and into Node.pm, in preparation for building an AST.
Add a helper function, build_subclass, to ExtUtils::ParseXS::Node
to simplify the boilerplate required to declare the @isa and @fields
of each Node subclass.
Rename the just-added ExtUtils::ParseXS::Node::code class to
ExtUtils::ParseXS::Node::codeblock, as that better describes its
purpose.

(I would have updated the original commit, but reordering and squashing
commits was too complex due to intervening commits.)
Before this commit, three keywords which take ENABLE/DISABLE
as their argument were exceedingly lax about what they would accept.
This commit makes them slightly less lax: they now have to match
an exact word, not a part word; i.e. the regex has changed from:

   /^(ENABLE|DISABLE)/i;
to:
   /^(ENABLE|DISABLE)\b/i;

Note that it still quietly ignores trailing garbage. So previously both of
these lines were legal; now only the second is:

    PROTOTYPES: ENABLEaaa bsbsbs stbsbsb
    PROTOTYPES: EnablE    bsbsbs stbsbsb

This commit makes VERSIONCHECK, PROTOTYPES, EXPORT_XSUB_SYMBOLS match
SCOPE, which already had the \b.
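As a minimal illustration (not ParseXS code), the sketch below applies the old and new patterns to the two example values above, assuming the `KEYWORD:` prefix has already been stripped. The helper name `enable_value` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 for ENABLE, 0 for DISABLE, or undef if the
# value is not accepted by the chosen pattern.
sub enable_value {
    my ($text, $strict) = @_;
    my $re = $strict
        ? qr/^(ENABLE|DISABLE)\b/i   # new pattern: whole word required
        : qr/^(ENABLE|DISABLE)/i;    # old pattern: any prefix match
    return unless $text =~ $re;
    return uc($1) eq 'ENABLE' ? 1 : 0;
}

for my $value ('ENABLEaaa bsbsbs stbsbsb', 'EnablE    bsbsbs stbsbsb') {
    for my $strict (0, 1) {
        my $result = enable_value($value, $strict);
        printf "%-4s %-28s => %s\n", ($strict ? 'new' : 'old'), "'$value'",
            defined $result ? $result : 'undef';
    }
}
```

Both patterns still ignore whatever follows the matched word, which is the remaining laxity mentioned above.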

This is in preparation for an upcoming commit, which will use a common
method to parse such keywords.

This commit also changes the test infrastructure slightly: the
test_many() function no longer bails out if the eval fails; instead the
eval error message is added to any STDERR text, accessible to tests
which can now test that ParseXS did indeed call death().
The legacy KEYWORD_handler() methods expect, on entry, for $_ to hold
the remainder of the current line (after s/^\s*KEYWORD:\s*//), and for
@{$self->{line}} to contain any remaining unparsed lines from the
current XSUB. On return, they set $_ to the next unparsed line, and
@{$self->{line}} to subsequent lines.

Recent commits have started adding Node (and its subclasses) parse()
methods to replace the KEYWORD_handler() methods. Currently they use the
same "rely on $_ to pass the current line to and from" scheme.

This commit changes them so that they only get lines from @{$pxs->{line}}.
This removes one source of weird action-at-a-distance.
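A rough sketch of the new convention, under stated assumptions: the parser object is modelled as a plain hash holding only the `line` queue, the keyword-line pattern is a simplification, and `parse_multiline` is a hypothetical name. Everything the parser needs comes from, and is left in, `@{$pxs->{line}}`; `$_` plays no part.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object: only the 'line' queue of
# unparsed lines from the current XSUB is modelled here.
my $pxs = {
    line => [ 'PREINIT: int x;', '    int y;', 'CODE:', '    x = y;' ],
};

# Assumed keyword-line pattern, for illustration only.
my $KEYWORD_RE = qr/^\s*([A-Z_]+)\s*:\s*/;

# Parse one multi-line keyword section purely from $pxs->{line}:
# no values are passed in or out via $_.
sub parse_multiline {
    my ($pxs) = @_;
    my $first = shift @{ $pxs->{line} };
    $first =~ s/$KEYWORD_RE// or die "not a keyword line: $first\n";
    my $keyword = $1;
    my @lines = ($first);
    # Consume lines up to (but not including) the next keyword line,
    # which stays at the front of the queue for the next parser.
    while (@{ $pxs->{line} } && $pxs->{line}[0] !~ $KEYWORD_RE) {
        push @lines, shift @{ $pxs->{line} };
    }
    return { keyword => $keyword, lines => \@lines };
}

my $node = parse_multiline($pxs);
print "$node->{keyword}: ", scalar(@{ $node->{lines} }), " lines\n";
print "next: $pxs->{line}[0]\n";
```

Because the next keyword's line is left at the front of the queue, whichever parser runs next can pick it up without any shared `$_` state.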
The SCOPE: keyword, which enables wrapping an XSUB's main body with
ENTER/LEAVE, has been partially broken since 5.12.0. This commit fixes
that, adds tests, and updates the very vague documentation for SCOPE in
perlxs.pod.

AFAICT, neither the SCOPE keyword nor its associated /* SCOPE */
magic comment in typemap files, are used anywhere in core or on CPAN,
nor in any tests. ('SCOPE: DISABLE' appears in a single test file, but
disabled is the default anyway.)

Background:

The SCOPE keyword was added by perl-5.003_03-21-gdb3b941461 (with
documentation added soon after by perl-5.003_03-34-g84287afe68).
This made the SCOPE: keyword an XSUB-body-scoped keyword, e.g.

    void
    foo()
        SCOPE: ENABLE
        CODE:
            blah

where the emitted 'blah' code would now be wrapped with ENTER/LEAVE.

13 years later, with v5.11.0-30-g28892255e8, this was extended so that
the keyword could appear just before the XSUB too:

    SCOPE: ENABLE
    void
    foo()
        CODE:
            blah

I don't know what the motivation was behind this; the commit was part of
a larger upgrade, which just listed among other bug fixes:

    - Fix the SCOPE keyword [Goro Fuji]

but I can't find any trace of a corresponding problem description on p5p
or RT.

This change had the unfortunate side-effect of breaking the existing
XSUB-scoped variant. This is indirectly due to the fact that XSUB-scoped
KEYWORD_handler() methods are supposed to set $_ to the next line before
returning, while file-scoped ones aren't supposed to.

That change made SCOPE_handler() both file- and xsub-scoped, and also
made it no longer update $_. So the new file-scoped variant worked,
while the old xsub-scoped variant broke, because it now returned with
$_ set to 'ENABLE' rather than to the next line.

The temporary fix in this commit makes SCOPE_handler() check who its
caller is and set $_ (or not) accordingly. A proper fix will come
shortly when a SCOPE Node subclass is added, since the Node::parse()
methods don't pass values back and forth in $_.

This commit also updates the pod for SCOPE, which was very vague about
what the SCOPE keyword did and where it should go, syntax-wise.  I also
changed it so that it suggests the magic comment token in a typemap
entry should be /* SCOPE */. The actual regex is {/\*.*scope.*\*/}i,
which matches a whole bunch of stuff. If we ever make it stricter,
insisting on an upper-case SCOPE with just surrounding white space seems
the way to go.
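To show how loose the current test is, this sketch compares it with a stricter pattern of the kind suggested above. The strict pattern is hypothetical and is not what ParseXS uses.

```perl
use strict;
use warnings;

my $loose_re  = qr{/\*.*scope.*\*/}i;    # current magic-comment test
my $strict_re = qr{/\*\s*SCOPE\s*\*/};   # hypothetical stricter test

for my $text ('/* SCOPE */', '/* scope */', '/* telescope */', 'SCOPE') {
    printf "%-16s loose=%-3s strict=%s\n", $text,
        ($text =~ $loose_re  ? 'yes' : 'no'),
        ($text =~ $strict_re ? 'yes' : 'no');
}
```

Note that the loose form accepts any C comment containing "scope" anywhere, in any case.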
Add the following classes:

    ExtUtils::ParseXS::Node::oneline
    ExtUtils::ParseXS::Node::enable
    ExtUtils::ParseXS::Node::EXPORT_XSUB_SYMBOLS
    ExtUtils::ParseXS::Node::PROTOTYPES
    ExtUtils::ParseXS::Node::SCOPE
    ExtUtils::ParseXS::Node::VERSIONCHECK

The first two are base classes for XS keywords which consume only a
single line of XS src, and which then expect the keyword to have a value
of ENABLE/DISABLE.
The rest are concrete Node subclasses representing all the purely
ENABLE/DISABLE keywords.
Add this Node subclass:

    ExtUtils::ParseXS::Node::PROTOTYPE

This commit moves the parsing code for the PROTOTYPE keyword from the
old PROTOTYPE_handler() method in ExtUtils::ParseXS and into a new
Node subclass parse() method.

Also add a few more tests for PROTOTYPE - especially parsing edge cases.
In Node.pm, replace a bunch of declarations of the form

    package ExtUtils::ParseXS::Node::INTERFACE_MACRO;

    sub parse {
        my ExtUtils::ParseXS::Node::INTERFACE_MACRO $self = shift;
        ...
    }

with

    package ExtUtils::ParseXS::Node::INTERFACE_MACRO;

    sub parse {
        my __PACKAGE__ $self = shift;
        ...
    }
A recent commit expanded test_many() in t/001-basic.t to test for errors
as well as warnings. This commit tweaks that change to work under 5.8.x:
it was emitting "Use of uninitialized value in concatenation" warnings.
When I moved this sub from ParseXS.pm to Node.pm it retained its
2-char indent. Node.pm uses a 4-char indent, so reindent it.

This is a whitespace-only change, apart from splitting a few long lines
and re-wrapping some comment paragraphs.
Add these Node subclasses:

    ExtUtils::ParseXS::Node::keylines
    ExtUtils::ParseXS::Node::keyline
    ExtUtils::ParseXS::Node::ALIAS
    ExtUtils::ParseXS::Node::ALIAS_line

An ALIAS node represents an ALIAS keyword, which can have multiple
ALIAS_line kid nodes, each of which represents one processed line from
an ALIAS section.

keylines and keyline are base classes for ALIAS and ALIAS_line
respectively, which handle the general processing of keywords which are
multi-line but where each line needs treating individually. Other
examples would be INPUT and OUTPUT keywords (not yet done). It's
slightly overkill just for ALIAS (arguably all the data could have just
been stored in a single ALIAS node), but doing it properly now will make
converting INPUT and OUTPUT keywords into nodes easier in the near
future.

The base classes also handle shifting lines off the input queue in such
a way that warnings and errors come from the right line.

Note that this is the first commit which adds an *intermediate* AST tree
node class: the previous commits have just been adding terminal nodes.
In particular, this commit adds a 'kids' array ref field to the base
Node class which allows nodes to have kids; and the parse method for
ALIAS repeatedly creates ALIAS_line objects, calls their parse method,
then adds them to the ALIAS's kids list. Thus it's an embryonic
recursive-descent parser, in the sense that parser subs for 'big' things
call parser subs for smaller things. Technically, while there will be
nested calls to parser methods, there won't be actual recursion, since
the XS syntax isn't recursive.
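The parse structure described above can be sketched roughly as follows. The class names, the constructor and the single `NAME = NUMBER` form per line are simplifying assumptions; real ALIAS lines accept more than this, and this is not the actual Node.pm code.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified node classes.
package My::Node;
sub new { my ($class, %args) = @_; return bless({ kids => [], %args }, $class) }

package My::Node::ALIAS_line;
our @ISA = ('My::Node');
# The 'small' parser: handle exactly one line.
sub parse {
    my ($self, $line) = @_;
    my ($name, $value) = $line =~ /^\s*(\w+(?:::\w+)*)\s*=\s*(\d+)\s*$/
        or return;                      # caller decides how to report errors
    @$self{qw(name value)} = ($name, $value);
    return 1;
}

package My::Node::ALIAS;
our @ISA = ('My::Node');
# The 'big' parser calls the 'small' parser once per line and collects the
# resulting nodes as kids: nested calls, but no actual recursion.
sub parse {
    my ($self, $lines) = @_;
    for my $line (@$lines) {
        my $kid = My::Node::ALIAS_line->new;
        $kid->parse($line) or die "bad ALIAS line: '$line'\n";
        push @{ $self->{kids} }, $kid;
    }
    return 1;
}

package main;

my $alias = My::Node::ALIAS->new;
$alias->parse([ 'Foo::bar = 1', 'Foo::baz = 2' ]);
print "$_->{name} => $_->{value}\n" for @{ $alias->{kids} };
```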

The bulk of this commit consists of moving the get_aliases() sub from
ParseXS.pm into Node.pm and renaming it to
ExtUtils::ParseXS::Node::ALIAS_line::parse(). The code is basically
unchanged except for tweaks required to make it a Node subclass.
Similarly, ALIAS_handler() becomes
ExtUtils::ParseXS::Node::ALIAS::parse().

This commit also adds some more tests for the ALIAS keyword: in
particular, while there were already some tests for alias warnings,
there didn't seem to be any for errors.

The old, existing test code for ALIAS is modified slightly so that 'die' text
isn't lost if something goes horribly wrong. That test code doesn't use
the newer, more general test_many() function from t/001-basic.t which
handles that sort of thing better.
I audited all the Warn(), death() etc calls in Node.pm and added
tests for any which weren't yet covered (apart from hard-to-reproduce
ones like internal errors).
Add
     ExtUtils::ParseXS::Node::ATTRS

class, and add a basic test.
Add
     ExtUtils::ParseXS::Node::OVERLOAD

class, and add a basic test.

Note that currently this code doesn't warn about duplicate op names (it
just silently skips duplicates), nor warn about unknown op names (it
happily accepts them). This commit preserves the current behaviour for
now.
This is #1 of a small series of commits to refactor the INPUT_handler()
method and turn it into a Node subclass method.

This commit changes the main loop from using $_ to hold the current line,
to using the variable $line instead.
This is #2 of a small series of commits to refactor the INPUT_handler()
method and turn it into a Node subclass method.

This commit splits the method into two: a smaller outer one which
has the 'foreach line' loop, and a new method, INPUT_handler_line()
which contains the bulk of the old method and processes a single line
from an INPUT section.
This is #3 of a small series of commits to refactor the INPUT_handler()
method and turn it into a Node subclass method.

This commit moves the ExtUtils::ParseXS methods

    INPUT_handler()
    INPUT_handler_line()

from ParseXS.pm into ParseXS/Node.pm.  For now they temporarily remain
as ExtUtils::ParseXS methods; this is just a straight cut and paste,
except for fully-qualifying the $BLOCK_regexp package variable name and
adding a couple of temporary 'package ExtUtils::ParseXS' declarations.
This is #4 of a small series of commits to refactor the INPUT_handler()
method and turn it into a Node subclass method.

This commit reindents INPUT_handler() and INPUT_handler_line() from
2-indent to 4-indent to match the policy of the file they were moved to
in the previous commit.

Whitespace-only change.
This is #5 of a small series of commits to refactor INPUT keyword
handling. This commit adds these two classes:

     ExtUtils::ParseXS::Node::INPUT
     ExtUtils::ParseXS::Node::INPUT_line

and converts the two ExtUtils::ParseXS methods

    INPUT_handler()
    INPUT_handler_line()

into parse() methods of those two classes.

In a very minor way, this commit also starts separating in time the
parsing and the code emitting. Whereas before, each INPUT line was
parsed and then C code for it immediately emitted, now *all* lines from
an explicit or implicit INPUT section are parsed and stored as an INPUT
node with multiple INPUT_line children, and *then* the as_code() method
is called for each child. This should make no difference to the
generated output code.
This is #6 of a small series of commits to refactor INPUT keyword
handling.

There's no need any more to save the original line in $orig_line, as
$self->{line} now holds that value.

Also, wrap
    ... or blurt(...), return;

in a do block for clarity.
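The two forms behave identically, because the low-precedence `or` binds more loosely than the comma operator, so the old line parses as `EXPR or (blurt(...), return)`. A small sketch comparing them, using a stand-in `blurt()` that just records its message and a simplified line pattern (both assumptions, not the real ParseXS code):

```perl
use strict;
use warnings;

my @errors;
# Hypothetical stand-in for ParseXS's blurt(): just records the message.
sub blurt { push @errors, $_[0] }

# Old style: parsed as EXPR or (blurt(...), return)
sub check_old {
    my ($line) = @_;
    $line =~ /^\s*(\w+)\s+(\w+)\s*$/ or blurt("bad line '$line'"), return;
    return "$1 $2";
}

# New style: the same logic, with the two actions grouped in a do block.
sub check_new {
    my ($line) = @_;
    $line =~ /^\s*(\w+)\s+(\w+)\s*$/ or do {
        blurt("bad line '$line'");
        return;
    };
    return "$1 $2";
}

print check_old('int a'), "\n", check_new('int a'), "\n";
```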
This is #7 of a small series of commits to refactor INPUT keyword
handling.

The main job of parsing an INPUT line is to extract any information on
that line and use it to update the associated Param object (which was
likely created earlier when the XSUB's signature was parsed).

This commit makes that information also be stored in new fields in the
INPUT_line object. These new fields aren't currently used for anything,
but they could in principle become useful if options for deparsing or
exporting were added to ParseXS.
Move the declaration of the 'defer' Node::Param field into
the "values derived from the XSUB's INPUT line" part of the
declaration.

No functional change, just fixing an error in the documentation.
Rename the existing check() method to set_proto().  The only thing the
method was doing was calculating the overridden prototype char
for that parameter based on its type's typemap entry, if any.
So give it a better name.

Also, rationalise where and when the method is called. It was being
called each time a parameter was created, or when its type changed.

Instead, just call the method once on all parameters just after all INPUT
processing is complete, so the types can't change, but before any inline
TYPEMAP entries might change the proto char for that type.

In theory this commit should make no functional change.
The NOT_IMPLEMENTED_YET keyword, used in place of CODE or PPCODE, emits
a stub body that just calls croak(). It's undocumented, untested, and appears to
be used in only one XS file in all of CPAN.

This commit adds some very basic tests.

The next commit will change the
behaviour slightly: currently, K&R-style params get C declarations, but
code emitting stops before ANSI-style declarations and deferred
initialisations would normally be emitted. So a couple of tests are
marked as expected to fail.
The undocumented and almost-entirely-unused NOT_IMPLEMENTED_YET keyword
can be used at the same point in parsing where CODE: or PPCODE: could
appear, and emits a croak() call where a call to a C library function
would otherwise have been auto-generated.

This keyword was checked for partway through emitting the
initialisation code; this meant that K&R-style declarations were
emitted, but ANSI-style ones weren't.

This commit moves the checking for the presence of this keyword to a bit
later: after all initialisation code emitting is complete.

This makes NOT_IMPLEMENTED_YET logically part of the body-processing
section, which now looks roughly like

    if (/NOT_IMPLEMENTED_YET/)
        emit croak
    elsif (/PPCODE:/)
        ...
    elsif (/CODE:/)
        ...
    else
        emit autocall

and so makes the parsing code cleaner.
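A runnable sketch of that ordering, assuming the body keyword (if any) is on the next unparsed line; `body_kind` is a hypothetical name and the returned strings just label what would be emitted.

```perl
use strict;
use warnings;

# Hypothetical sketch of the body dispatch: NOT_IMPLEMENTED_YET is tested
# first, then PPCODE, then CODE, otherwise an autocall is generated.
sub body_kind {
    my ($line) = @_;
    $line = '' unless defined $line;
    return 'croak'  if $line =~ /^\s*NOT_IMPLEMENTED_YET/;
    return 'PPCODE' if $line =~ /^\s*PPCODE\s*:/;
    return 'CODE'   if $line =~ /^\s*CODE\s*:/;
    return 'autocall';           # no body given: auto-generate the C call
}

print body_kind($_), "\n" for 'NOT_IMPLEMENTED_YET', 'PPCODE:', '  CODE:', undef;
```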

Conceptually it means that a NOT_IMPLEMENTED_YET can now appear after
an INIT keyword; in practice, only *INPUT* section parsing is
special-cased to recognise NOT_IMPLEMENTED_YET as another valid keyword
which terminates the current section. So INIT, C_ARGS etc sections
continue to see "NOT_IMPLEMENTED_YET" as just a bit of text to be
consumed from the input stream and added to the init code or the C
signature or whatever.  Some tests have been added to confirm this.
The previous commit altered the structure of a big if/else.
Reindent to match. Whitespace-only change.
iabyn added 29 commits April 23, 2025 12:08
Move this field from Node::xbody to Node::output_part, as it's only
used while generating the code for the output part of the xsub.

No functional change.
Remove the following field from the ExtUtils::ParseXS class:

    xsub_stack_was_reset

and replace it with this new field in the
ExtUtils::ParseXS::Node::output_part class:

    stack_was_reset
Remove the following fields from the ExtUtils::ParseXS class:

    xsub_interface_macro
    xsub_interface_macro_set

and replace them with these new fields in the
ExtUtils::ParseXS::Node::xsub class:

    interface_macro
    interface_macro_set

There is also a slight change in the way these two fields are used.
Formerly they were initialised to the default values "XSINTERFACE_FUNC"
and "XSINTERFACE_FUNC_SET", then potentially changed by the
INTERFACE_MACRO keyword, then the current values were used to emit the
interface function pointer getting and setting code.

Now, the values are initially undef, and the emitting code checks for
definedness and, if a value is undefined, uses the default instead. This
means that the logic for using the default or overridden value is local
to where that value is used rather than being hidden away elsewhere. No
change in functionality though.
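A minimal sketch of applying the default at the point of use; the function name and plain-hash node are hypothetical. It uses `defined ? :` rather than `//`, since `//` needs perl 5.10 and other commits in this branch keep the parser working under 5.8.x.

```perl
use strict;
use warnings;

# Hypothetical emitter helper: undef fields mean "not set by
# INTERFACE_MACRO", so the defaults are applied only here.
sub interface_macro_names {
    my ($xsub) = @_;
    my $get = defined $xsub->{interface_macro}
            ? $xsub->{interface_macro}     : 'XSINTERFACE_FUNC';
    my $set = defined $xsub->{interface_macro_set}
            ? $xsub->{interface_macro_set} : 'XSINTERFACE_FUNC_SET';
    return ($get, $set);
}

my @defaults = interface_macro_names({});
my @custom   = interface_macro_names({ interface_macro => 'MY_FUNC' });
print "@defaults\n@custom\n";
```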
Remove the following fields from the ExtUtils::ParseXS class:

    xsub_map_overload_name_to_seen
    xsub_prototype

and replace them with these new fields in the
ExtUtils::ParseXS::Node::xsub class:

    overload_name_seen
    prototype
This commit renames the ExtUtils::ParseXS class field

    xsub_SCOPE_enabled
to
    file_SCOPE_enabled

and adds a new field in the ExtUtils::ParseXS::Node::xsub class:

    SCOPE_enabled

This is because SCOPE can be used either in file scope:

    SCOPE: ENABLE
    int
    foo(...)

or in XSUB scope,

    int
    foo(...)
        SCOPE: ENABLE

The file_SCOPE_enabled field records whether a SCOPE keyword has been
encountered just before the XSUB, while the Node::xsub SCOPE_enabled
field is initialised to the current value of file_SCOPE_enabled when
XSUB parsing starts, and is updated if the SCOPE keyword is encountered
within the XSUB.
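A sketch of the intended relationship between the two fields, with the parser object and xsub nodes modelled as plain hashes and all sub names hypothetical. Whether and when file_SCOPE_enabled is reset between XSUBs is deliberately not modelled.

```perl
use strict;
use warnings;

# Parser object reduced to the one field of interest.
my %pxs = (file_SCOPE_enabled => 0);

# 'SCOPE: ENABLE' seen in file scope, just before an XSUB.
sub file_scope_keyword { my ($on) = @_; $pxs{file_SCOPE_enabled} = $on }

# Starting a new XSUB: its node inherits the current file-scoped value.
sub new_xsub { return { SCOPE_enabled => $pxs{file_SCOPE_enabled} } }

# 'SCOPE: ENABLE' seen inside an XSUB: only that XSUB's node changes.
sub xsub_scope_keyword { my ($xsub, $on) = @_; $xsub->{SCOPE_enabled} = $on }

my $foo = new_xsub();
xsub_scope_keyword($foo, 1);   # XSUB-scoped SCOPE for foo only
my $baz = new_xsub();          # unaffected by foo's SCOPE
file_scope_keyword(1);         # file-scoped SCOPE before the next XSUB
my $bar = new_xsub();

print "foo=$foo->{SCOPE_enabled} baz=$baz->{SCOPE_enabled} bar=$bar->{SCOPE_enabled}\n";
```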
During the course of the refactoring in this branch, perl code has
gradually been split between doing parsing in Node::FOO::parse() methods
and code emitting in Node::FOO::as_code() methods (before, both were
completely interleaved).

How the current xsub and xbody nodes are tracked varies between those
two types of methods: the as_code() methods pass them as explicit
parameters, while the parse() methods rely on two 'global' fields
within the ExtUtils::ParseXS object, cur_xsub and cur_xbody.

However, some as_code() methods were still relying on
cur_xsub/xbody rather than the passed $xsub and $xbody params. This
commit fixes that. At the moment it is mostly harmless, as each XSUB's
top-level as_code() is called immediately after its top-level parse(),
so cur_xsub still points to the right XSUB. But that will change in
future, so get it right now. The next commit will in fact explicitly
undef cur_xsub/xbody immediately after parsing is finished.

This commit includes a test for one edge case where the cur_xbody being
wrong did make a difference.
Currently, the fields cur_xsub and cur_xbody of ExtUtils::ParseXS track
the current xsub and body nodes during parsing. This commit undefs them
immediately after use so that they can't be inadvertently used
elsewhere. The fixups in the previous commit were all discovered
by this undeffing.
Currently all the Node::FOO::as_code() methods get passed two args,
$xsub and $xbody, to indicate the current Node::xsub and Node::xbody
objects.

Conversely, all the Node::FOO::parse() methods access the current two
objects via two 'global' fields in the ExtUtils::ParseXS object:

    cur_xsub
    cur_xbody

This commit deletes these two fields and instead passes the objects as
extra parameters to all the parse() methods. Less action-at-a-distance.
Add comments about keywords which can be both inside or outside an XSUB.
The Node::Params class has a 'params' field which holds a list of
Node::Param objects. This class was one of the first Node classes to be
created during my recent refactoring work, and at the time, Node
subclasses didn't have a generic 'kids' field. They do now, so just
store the list of Param objects as the 'kids' of the Params object.
Add a ExtUtils::ParseXS::Node::IO_Param class as a subclass of the
existing ExtUtils::ParseXS::Node::Param class.

Param objects will then be used solely to hold the details of a
parameter which have been extracted from an XSUB's signature, while
IO_Param objects contain a copy of that info, but augmented with any
further info gleaned from INPUT or OUTPUT lines. For example with

    void
    foo(a)
        int a
        OUTPUT:
            a

Then the Param object for 'a' will look something like:

   {
     arg_num => 1,
     var     => 'a',
   }

while the corresponding IO_Param object will look something like:

   {
     arg_num   => 1,
     var       => 'a',
     type      => 'int',
     in_input  => 1,
     in_output => 1,
     ....
   }

All the code-emitting methods have been moved from Param to IO_Param,
and the as_code() method has been renamed to as_input_code(), to better
match the naming convention of the existing as_output_code() method:
an IO_Param can generate code both to declare/initialise a var, and to
update/return a var.
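A minimal sketch of the Param / IO_Param split described above, using hypothetical stand-in classes rather than the real Node classes; `new_from_param` is an invented constructor name. The point is that the IO_Param starts as a copy, so augmenting it leaves the signature-derived Param untouched.

```perl
use strict;
use warnings;

package My::Param;       # hypothetical stand-in for Node::Param
sub new { my ($class, %f) = @_; return bless({%f}, $class) }

package My::IO_Param;    # hypothetical stand-in for Node::IO_Param
our @ISA = ('My::Param');

# Start from a shallow copy of the signature-derived Param, then add the
# INPUT/OUTPUT-derived details; the original Param is left untouched.
sub new_from_param {
    my ($class, $param, %extra) = @_;
    return bless({ %$param, %extra }, $class);
}

package main;

my $param   = My::Param->new(arg_num => 1, var => 'a');
my $ioparam = My::IO_Param->new_from_param($param,
    type => 'int', in_input => 1, in_output => 1);
print join(', ', map { "$_=$ioparam->{$_}" } sort keys %$ioparam), "\n";
```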
If the list of aliases for an XSUB doesn't include the XSUB's main name,
an extra alias entry is added, mapping the main name to ix 0.

Move this setting from the code generation phase to the end of the
parsing phase, because the AST should really be complete by the end of
parsing.

Also add a test for this behaviour.

This shouldn't affect what code is generated.
This method is no longer used anywhere
Currently the parsing of an XSUB's signature, and the parsing of
the individual comma-separated items within that signature, are done in
the same function, Params->parse(). This commit is the first of three
which will extract out the latter into a separate Param->parse() method.

For now, the per-param code is kept in-place (to make the diff easier to
understand), but is wrapped within an immediately-called anon sub, in
preparation to be moved.

So before, the code was (very simplified):

    for (split /,/, $params_text) {
        ... parse type, name, init etc ...
        next if can't parse;
        my $param = Param->new(var => $var, type => $type, ...);
        push @{$params->{kids}}, $param;
    }

After this commit, it looks more like:

    for (split /,/, $params_text) {
        my $param = Param->new();
        sub {
            my $param = shift;
            ...
            ... parse type, name, init etc ...
            return if can't parse;
            $param->{var} = $var; ...
            return 1;
        }->($param, ...)
            or next;

        push @{$params->{kids}}, $param;
    }

Note that the inner sub leaves pushing the new param, updating the names
hash and setting the arg_num to the caller.

In theory there are no functional changes, except that when a synthetic
RETVAL is being kept (but its position within kids moved), we now keep
the Param hash and update its contents, rather than replace it with a new
hash. This shouldn't make any difference.
This commit just moves a block of code of the form sub {...}->()
into its own named sub. There are no changes to the moved lines of code
apart from indentation.

This is the second of three commits to create the parse() method.
The next commit will do any final tidying up.
This is the third of three commits to create the parse() method.
Mainly do s/$param/$self/g, and add a call to set file/line number for
the object.
Move all the code out of

ExtUtils::ParseXS::Node::IO_Param::as_input_code()

which is responsible for looking up the template initialisation code in
the typemap (or elsewhere) and put it in its own method,
lookup_input_typemap().

As well as splitting a 300-line method into two approx 150-line methods,
this will also allow us shortly to move the template lookup to earlier,
at parse time rather than code-emitting time.

Also add some more tests for the length(foo) pseudo-parameter, which I
broke while working on this commit, and then noticed it was under-tested.
Move all the code out of

ExtUtils::ParseXS::Node::IO_Param::as_output_code()

which is responsible for looking up the template output code in the
typemap (or elsewhere) and put it in its own method,
lookup_output_typemap().

As well as splitting a 490-line method into two 200 and 340-line methods,
this will also allow us shortly to move the template lookup to earlier,
at parse time rather than code-emitting time.

It may also be possible at some point to merge the two methods added by
these last two commits, lookup_input_typemap and lookup_output_typemap,
into a single method, since they share a lot of common code.
Previously these two values were set at the end of parsing an XSUB:

    XSRETURN_count_basic
    XSRETURN_count_extra

They represent whether a RETVAL SV will be returned by the XSUB, and
how many extra SVs are returned due to parameters declared as OUTLIST.

This commit sets them earlier, as in particular, the next commit will
need to access XSRETURN_count_basic earlier.

XSRETURN_count_extra is now set right after parsing the XSUB's
declaration, as its value can't change after then.

XSRETURN_count_basic is now set after parsing the output part of
each body of the XSUB (an XSUB can have a body per CASE). Its value
*ought* to be consistent across all bodies, but it's possible for the
CODE_sets_ST0 hack (which looks for code like 'ST(0) = ...' in any
CODE: block) to vary across bodies; so this commit also adds a new
warning and test for that.
The last few commits have moved the looking-up and processing of
typemap entries (but not the evalling) for parameters from
Param::as_input_code() and Param::as_output_code() into their
own subs, lookup_input_typemap() and lookup_output_typemap().

This commit takes that one step further, and makes those new subs be
called at parse time, rather than at code-generation time.
This is needed because in principle, XSUB ASTs should be completely
self-contained, and the code they emit shouldn't vary depending on when
their top-level as_code() methods are called. But via the

    TYPEMAP: <<EOF

mechanism, it's possible for the typemap to change between XSUBs.

This commit does this in a very crude way. Formerly, at code-emitting
time, as_input_code() etc would do:

    my ($foo, $bar, ...) = lookup_input_typemap(...);

Now, the parsing code does

    $self->{input_typemap_vals} = [ lookup_input_typemap(...) ];

and as_input_code() etc does:

    my ($foo, $bar, ...) = @{$self->{input_typemap_vals}};

Note that there are both output_typemap_vals and
output_typemap_vals_outlist fields, as it's possible for the same
parameter to be used both for updating the original arg (OUTPUT) and for
returning the current value as a new SV (OUTLIST). So potentially we
save the results of *two* calls to lookup_output_typemap() for each
parameter.
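The reason for capturing the lookup results at parse time can be shown with a small sketch. The typemap is modelled as a plain hash that a later `TYPEMAP: <<EOF` block could modify; `lookup_input_template`, `parse_param`, the `my_int_conv` C function and the simple `$var` substitution are hypothetical stand-ins, and only the `input_typemap_vals` field name comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical typemap: XS type => INPUT template.
my %typemap = (int => '$var = (int)SvIV($arg)');

# Simplified stand-in for lookup_input_typemap().
sub lookup_input_template { my ($type) = @_; return $typemap{$type} }

# Parse time: look up the template now and keep the result in the node.
sub parse_param {
    my ($type, $var) = @_;
    return { var => $var, input_typemap_vals => [ lookup_input_template($type) ] };
}

# Code-emitting time: use only what was captured during parsing.
sub as_input_code {
    my ($self) = @_;
    my ($template) = @{ $self->{input_typemap_vals} };
    (my $code = $template) =~ s/\$var/$self->{var}/g;
    return "$code;";
}

my $node = parse_param(int => 'a');
# A later TYPEMAP block changes the entry before code for 'a' is emitted;
# 'my_int_conv' is an invented C function name.
$typemap{int} = '$var = my_int_conv($arg)';
print as_input_code($node), "\n";   # a = (int)SvIV($arg);
```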
Rationalise warning and error messages which appear in Node.pm:

- always prefix with Warning: / Error: / Internal error:
- lower-case the first letter following Error: etc
- fix grammar
- ensure full test coverage (except 'Internal error', which
  shouldn't be reproducible).
Some node types have fields to point to particular children. Make these
kids also be in the generic @{$self->{kids}} array.

That way, hypothetical generic tree-walking code will be able to access
the whole tree just by following @{$self->{kids}}, without needing to
know for example that the xsub_decl Node type has a child pointed to by
$self->{return_type}.
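With every child in `kids`, a walker needs no per-type knowledge. A minimal depth-first sketch over plain-hash nodes; the `ReturnType` node and the field contents are invented for illustration and do not mirror the real Node classes.

```perl
use strict;
use warnings;

# Hypothetical tree: every child is listed in 'kids', even where a named
# field (here 'return_type') also points to it.
my $ret  = { type => 'ReturnType', kids => [] };
my $decl = { type => 'xsub_decl', return_type => $ret,
             kids => [ $ret, { type => 'Params', kids => [] } ] };
my $xsub = { type => 'xsub', kids => [ $decl ] };

# Depth-first, pre-order walk that knows nothing about node-specific fields.
sub walk {
    my ($node, $visit, $depth) = @_;
    $depth ||= 0;
    $visit->($node, $depth);
    walk($_, $visit, $depth + 1) for @{ $node->{kids} || [] };
}

my @seen;
walk($xsub, sub { my ($node, $depth) = @_; push @seen, "$depth:$node->{type}" });
print "@seen\n";   # 0:xsub 1:xsub_decl 2:ReturnType 2:Params
```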
Do a general tidy-up of this src file: white space, plus wrap long lines
and strings.
For aesthetic reasons, give the $build_subclass sub an extra first arg
which must be the string 'parent'. Then change invocations from:

    BEGIN { $build_subclass->('Foo', # parent
        'field1', # ...
        ...
    )}

to
    BEGIN { $build_subclass->(parent => 'Foo',
        'field1', # ...
        ...
    )}
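For readers unfamiliar with Node.pm, this is a hypothetical sketch of what a helper with that calling convention might do: check the `parent` marker, then set `@ISA` and a package `@fields` list in the calling package. It is not the real implementation, which may store fields differently, and it is called at run time here rather than inside BEGIN.

```perl
use strict;
use warnings;

# Hypothetical helper with the same calling convention as $build_subclass.
my $build_subclass = sub {
    my ($parent_kw, $parent, @fields) = @_;
    die "first argument must be 'parent'\n" unless $parent_kw eq 'parent';
    my $pkg = caller;                    # package the call was made from
    no strict 'refs';
    @{"${pkg}::ISA"}    = ($parent);
    # Inherited fields first, then this class's own fields.
    @{"${pkg}::fields"} = (@{"${parent}::fields"}, @fields);
    return;
};

{
    package My::Node;
    our @fields = qw(file line_no kids);
}

{
    package My::Node::multiline;
    $build_subclass->(parent => 'My::Node',
        'lines',    # ArrayRef of Str: the lines for this keyword
    );
}

my @multiline_fields = do { no strict 'refs'; @{'My::Node::multiline::fields'} };
print "@multiline_fields\n";    # file line_no kids lines
```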
Update the code comments in calls to $build_subclass->() to indicate
more consistently the 'type' of each field being declared.
In the INPUT_line and OUTPUT_line subclasses, rename the 'param' field
to 'ioparam', to better reflect that it holds an IO_Param object rather
than a Param object.
The work in this branch broke the parser under 5.8.9. Fix it, by not
trying to autovivify an undef object pointer (which under 5.8.9 is a
pseudo-hash thingy and generally behaves weirdly).

The attempt to autovivify an undef $xsub was always wrong, but harmless:
the value wasn't needed and was soon discarded. But under 5.8.9, it
became a runtime error.
@jkeenan jkeenan added the defer-next-dev This PR should not be merged yet, but await the next development cycle label Apr 24, 2025