Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressions as parameters in StringNamespace functions #314

Open
fbx31 opened this issue Feb 24, 2025 · 4 comments
Open

Expressions as parameters in StringNamespace functions #314

fbx31 opened this issue Feb 24, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@fbx31
Copy link
Contributor

fbx31 commented Feb 24, 2025

Hi and sorry again,
I think I found a global issue in StringNamespace implementation.
Comparing with this example of slice implementation between string an list

let df = pl.DataFrame(
    {
        A: ["apple"],
        fruits: [["apple", "banana", "cherry", "dragon fruit"]],
    }
)
df = df.select(
    "A",
    "fruits",
    // "cars",
    pl.col("fruits").lst.slice(0, 2).alias("lst_slice_js"),
    pl.col("fruits").lst.slice(pl.lit(0), pl.lit(2)).alias("lst_slice_expr"),
    pl.col("A").str.slice(0,2).alias("str_slice_js"),
    pl.col("A").str.slice(pl.list(0),pl.lit(2)).alias("str_slice_expr"),
)

This raises the following error for the last expression .str.slice(pl.list(0),pl.lit(2)) :

<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:10
        return (0, expr_1._Expr)(_expr[method](...args));
                                              ^
Error: Failed to recover `JsExpr` type from napi value
    at wrap (<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:10:47)
    at Object.slice (<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:88:20)

Then I compared the code on rust side and js side for both slice implementations

    #[napi(catch_unwind)]
    pub fn list_slice(&self, offset: &JsExpr, length: Option<&JsExpr>) -> JsExpr {
        let length = match length {
            Some(i) => i.inner.clone(),
            None => dsl::lit(i64::MAX),
        };
        self.inner
            .clone()
            .list()
            .slice(offset.inner.clone(), length)
            .into()
...
	#[napi(catch_unwind)]
    pub fn str_slice(&self, offset: &JsExpr, length: &JsExpr) -> JsExpr {
        self.inner
            .clone()
            .str()
            .slice(offset.inner.clone(), length.inner.clone())
            .into()
    }

Except the difference for optional length (which is not an issue, nothing to highlight)
Then in JS:

list:
slice(offset, length) {
      return wrap(
        "listSlice",
        exprToLitOrExpr(offset)._expr,
        exprToLitOrExpr(length)._expr,
      );
    },

string:
	slice(start, length?) {
      if (!Expr.isExpr(start)) {
        start = lit(start)._expr;
      }
      if (!Expr.isExpr(length)) {
        length = lit(length)._expr;
      }

      return wrap("strSlice", start, length);
    },

The processing of the parameter seems to differ between both implementations.

Sorry in advance if I am wrong...

@fbx31 fbx31 added the bug Something isn't working label Feb 24, 2025
@fbx31
Copy link
Contributor Author

fbx31 commented Feb 24, 2025

OK fixed this locally on my computer for str.slice directly inside bin/.lazy/expr/string.js

        slice(start, length) {
            // if (!expr_1.Expr.isExpr(start)) {
            //     start = (0, functions_1.lit)(start)._expr;
            // }
            // if (!expr_1.Expr.isExpr(length)) {
            //     length = (0, functions_1.lit)(length)._expr;
            // }
            // return wrap("strSlice", start, length);
            return wrap("strSlice", (0, expr_1.exprToLitOrExpr)(start)._expr, (0, expr_1.exprToLitOrExpr)(length)._expr);
        },

And looking at the code, I think there are many places where we have the same type of "error/confusion".

For some of them, the rust code should be also reviewed. Exemple with str_replace function where pat and val should be &JsExpr instead of String and then calling pat.inner.clone() and val.inner.clone() and then calling ca.replace(pat.onner.clone(), val.inner.clone().

    #[napi(catch_unwind)]
    pub fn str_replace(&self, pat: String, val: String) -> JsExpr {
        let function = move |s: Column| {
            let ca = s.str()?;
            match ca.replace(&pat, &val) {
                Ok(ca) => Ok(Some(ca.into_column())),
                Err(e) => Err(PolarsError::ComputeError(format!("{:?}", e).into())),
            }
        };
        self.clone()
            .inner
            .map(function, GetOutput::same_type())
            .with_fmt("str.replace")
            .into()
    }

Only a "guess", shall be confirmed by the maintainers and eventually changed inside all functions accepting parameters (TBC), as I am not yet able to locally test modifications of rust code for now.

Thanks

@Bidek56
Copy link
Collaborator

Bidek56 commented Feb 24, 2025

Error: Failed to recover JsExpr type from napi value happens when we try to pass a string instead of a expression.
Some functions expect a string, others can handle string, regex or expression.
I am sure we have cases when thing fail apart due to so many options. We just need to improve our test coverage.
Feel free to submit a PR to add more test cases.

@fbx31
Copy link
Contributor Author

fbx31 commented Feb 25, 2025

Thanks a lot, OK, I will do my best to list and detail all missing test cases I encounter.
Currently trying to rebuild everything on my computer, but a bit hard as you are referring git versions in your toml and not "standard" packages, my personal network configuration is a bit (let's say) restrictive with that but I will manage.
Also... the nightly toolchain to be installed apparently...
Any news on 0.46 polars support that brings a bunch of new functions startsWith, endsWith... ?

@Bidek56
Copy link
Collaborator

Bidek56 commented Feb 25, 2025

  1. Git reference is needed b/c one of the json packages was not been converted to a crate by Ritche
  2. Nightly toolchain is used by the core Polars team, hence needed
  3. 0.46 PR is kind of blocked ATM b/c the core team has remove utf8 serialization, not sure how to fix it ATM
  4. Be sure to lint everything before submitting a PR and test with both yarn and bun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants