nested ':=' reference assignment fails #6768

AntonNM · 2025-01-29T00:39:56Z

Issues arise when a function is called with the reference assignment operator ':=', which in-turn modifies the dt by reference as well. Below is the example running on the latest dev version. The result of the outer function overwrites the inner function in the incorrect column named 'se', the column named new is left empty.

> library(data.table)
data.table 1.16.99 IN DEVELOPMENT built 2025-01-27 04:18:35 UTC; root using 7 threads (see ?getDTthreads).  Latest news: r-datatable.com

library(data.table)
options(datatable.verbose = TRUE)
dt = data.table(a=1:10)
inner <- function(dt){
    dt[,se:=1:10]
}
outer <- function(dt){
    inner(dt);
    return (11:20)
}
dt[,new:=outer(dt)]
print(dt)

Detected that j uses these columns: <none>
Detected that j uses these columns: <none>
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
        a    se    new
    <int> <int> <NULL>
 1:     1    11   NULL
 2:     2    12   NULL
 3:     3    13   NULL
 4:     4    14   NULL
 5:     5    15   NULL
 6:     6    16   NULL
 7:     7    17   NULL
 8:     8    18   NULL
 9:     9    19   NULL
10:    10    20   NULL

The text was updated successfully, but these errors were encountered:

AntonNM · 2025-01-29T00:41:19Z

There is a workaround of returning a DT containing the results of the inner and outer function and assigning them all at once, however this structure was more desirable.

MichaelChirico · 2025-01-29T00:56:32Z

could you confirm (1) whether the issue is present on CRAN (2) whether updating to the current version of data.table fixes the issue? There are some recent related commits that might fix it

AntonNM · 2025-01-29T01:32:52Z

Thank you, the latest CRAN does indeed resolve this issue!

MichaelChirico · 2025-01-29T02:14:51Z

Great! Please also check on the current devel version, this is probably due to #6551

AntonNM · 2025-01-31T17:58:06Z

Updating to the current dev version 32dfd8d (1.16.99) re-introduces this issue.

I also pulled, and tested the commits for #6551 & #6542 (the commit directly preceding #6551) and the issue is present in both.

Perhaps it was introduced in an earlier commit?

MichaelChirico · 2025-01-31T18:55:26Z

I see the same behavior on 1.16.0, 1.16.4 and current master:

Detected that j uses these columns: <none>
Detected that j uses these columns: <none>
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
        a    se    new
    <int> <int> <NULL>
 1:     1    11   NULL
 2:     2    12   NULL
 3:     3    13   NULL
 4:     4    14   NULL
 5:     5    15   NULL
 6:     6    16   NULL
 7:     7    17   NULL
 8:     8    18   NULL
 9:     9    19   NULL
10:    10    20   NULL

It's definitely odd that you've managed to create a column that's just NULL -- the resulting dt is not a proper data.table.

Could you double check & report which version produces the correct result?

AntonNM · 2025-02-01T16:49:58Z

I went back as far as 1.13.0 and was unable to find a version that produces the correct result, however I believe I found the root of the confusion:

library(data.table)
options(datatable.verbose = T)

dt = data.table(a=1:10)

inner <- function(dt){
    dt[,se:=1:10]
}
outer <- function(dt){
    inner(dt);
    return (11:20)
}

dt[,new:=outer(dt)]
print(dt)
dt[,new:=outer(dt)]
print(dt)

The second occurrence of the assignment statement produces the correct result. I believe this is what occurred when I first tested the latest CRAN version.

*Followed TODO: by mattdowle from resolution to '2-space indentation Rdatatable#2420' *Added tests for jsub that modify DT by-reference *Added test case for interger vector indexing

tdhock · 2025-02-10T12:58:03Z

I'm surprised you would expect this to work at all. Shouldn't it be outer(dt) instead of dt[,new:=outer(dt)] ?

AntonNM · 2025-02-10T15:56:35Z

I would expect dt[, new:=outer(dt)] to create 2 new columns one called 'new' with the return value of outer, (11:20) and another "side-effect" column, called 'se', with the values of 1:10 as created in the inner function.

Calling only outer(dt) would creates the column 'se' correctly, however the return value of outer() is lost.

library(data.table)
options(datatable.verbose = F)

dt =  data.table(a=1:10)

inner <- function(dt){
    dt[,se:=1:10]
}

outer <- function(dt){

    inner(dt);

    return (11:20)
}

outer(dt)

print(dt)

 [1] 11 12 13 14 15 16 17 18 19 20 #lost return value of outer
        a    se
    <int> <int>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10

AntonNM · 2025-02-10T15:57:24Z

The result I would expect would be the following:

        a    se   new
    <int> <int> <int>
 1:     1     1    11
 2:     2     2    12
 3:     3     3    13
 4:     4     4    14
 5:     5     5    15
 6:     6     6    16
 7:     7     7    17
 8:     8     8    18
 9:     9     9    19
10:    10    10    20

TysonStanley · 2025-02-10T16:01:49Z

Thanks for more detail. If just running outer(dt) where would the returned vector get placed into the data.table? The output makes sense to me. It is running inner(dt) and then returning the vector you asked for. I don't see anywhere that it should be added as a column.

AntonNM · 2025-02-18T01:21:13Z

The original example included an assignment to the column 'se' and 'new'

1.) column se should be created with values 1-10 here

2.) the outer function returns the values 11-20 which should be assigned to new

3.) the return value of outer is assigned to new

The original example:

library(data.table)
options(datatable.verbose = TRUE)
dt = data.table(a=1:10)
inner <- function(dt){
    dt[,se:=1:10] #(1) 
}
outer <- function(dt){
    inner(dt);
    return (11:20) #(2)
}
dt[,new:=outer(dt)] # (3)
print(dt)

The results of the above snippet were:

Detected that j uses these columns: <none>
Detected that j uses these columns: <none>
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
        a    se    new
    <int> <int> <NULL>
 1:     1    11   NULL
 2:     2    12   NULL
 3:     3    13   NULL
 4:     4    14   NULL
 5:     5    15   NULL
 6:     6    16   NULL
 7:     7    17   NULL
 8:     8    18   NULL
 9:     9    19   NULL
10:    10    20   NULL

The results I expected were:

        a    se   new
    <int> <int> <int>
 1:     1     1    11
 2:     2     2    12
 3:     3     3    13
 4:     4     4    14
 5:     5     5    15
 6:     6     6    16
 7:     7     7    17
 8:     8     8    18
 9:     9     9    19
10:    10    10    20

I found that this occurs because the evaluation of the left-hand side at points (#1, & #3) both occur at the point when dt only contains the column 'a'. This leads to both new columns being assign the following available integer index, i.e '2'

Hopefully this was a more complete explanation of the bug. Please let me know is there are any other questions, confusion or concerns with the PR!

MichaelChirico added this to the 1.17.0 milestone Jan 31, 2025

MichaelChirico added the regression label Jan 31, 2025

MichaelChirico modified the milestones: 1.17.0, 1.18.0 Jan 31, 2025

MichaelChirico added bug and removed regression labels Jan 31, 2025

This was referenced Feb 2, 2025

Fixed nested ':=' reference assignment fails (#6768) #6788

Closed

Fixed nested ':=' reference assignment fails #6789

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nested ':=' reference assignment fails #6768

nested ':=' reference assignment fails #6768

AntonNM commented Jan 29, 2025 •

edited

Loading

AntonNM commented Jan 29, 2025 •

edited

Loading

MichaelChirico commented Jan 29, 2025

AntonNM commented Jan 29, 2025

MichaelChirico commented Jan 29, 2025

AntonNM commented Jan 31, 2025

MichaelChirico commented Jan 31, 2025

AntonNM commented Feb 1, 2025

tdhock commented Feb 10, 2025

AntonNM commented Feb 10, 2025

AntonNM commented Feb 10, 2025

TysonStanley commented Feb 10, 2025

AntonNM commented Feb 18, 2025

nested ':=' reference assignment fails #6768

nested ':=' reference assignment fails #6768

Comments

AntonNM commented Jan 29, 2025 • edited Loading

AntonNM commented Jan 29, 2025 • edited Loading

MichaelChirico commented Jan 29, 2025

AntonNM commented Jan 29, 2025

MichaelChirico commented Jan 29, 2025

AntonNM commented Jan 31, 2025

MichaelChirico commented Jan 31, 2025

AntonNM commented Feb 1, 2025

tdhock commented Feb 10, 2025

AntonNM commented Feb 10, 2025

AntonNM commented Feb 10, 2025

TysonStanley commented Feb 10, 2025

AntonNM commented Feb 18, 2025

AntonNM commented Jan 29, 2025 •

edited

Loading

AntonNM commented Jan 29, 2025 •

edited

Loading