Accept logicals for truth (and estimate) inputs #545

First, I am fully aware that this request has already been made and declined ([#450](https://github.com/tidymodels/yardstick/issues/450)). However, I hope that this request is different because it includes a PR ([#544](https://github.com/tidymodels/yardstick/pull/544)) with a resolution that I hope works.

**TLDR: instead of rejecting logical inputs for truth or estimates as errors, just internally convert them to factors and then everything else works fine, with no headaches for `yardstick` developers.**

This request started out as a [bug report on `hardhat`](https://github.com/tidymodels/hardhat/issues/289). It evolved into an explanation for [why `tidymodels` and `yardstick` don't support logical outcomes for binary prediction](https://github.com/tidymodels/hardhat/issues/289#issuecomment-2925006992). Please excuse me for the lengthy quotation, but I think it's important to state here the motivation for my PR.

> 1. The real-world data has qualitative meaning, so users should be forced to encode the outcomes according to their qualitative meaning.
> 2. Since binary data has intrinsic qualitative meaning, it should be encoded as factors, which is the native R datatype for encoding qualitative meaning, just like multinomial outcomes.
> 3. Binary data lacks labels to distinguish the FALSE from the TRUE cases.
> 
> For the first point, I know you did not use the word "forced", but that is how I understood your point. And this is where I think I disagree. On one hand, I agree that as a package author, I should be disciplined to follow conventions such as tidymodels. On the other hand, I think that my packages should be sufficiently flexible to accept all **reasonable** kinds of input data that users provide. I consider it quite unreasonable to reject logical outcomes as "unreasonable" or "illegitimate" on the package level. `logical` is such a fundamental data format (for predictors and outcomes alike). It is the most natural and intuitive format for binary outcomes. I definitely think that any binary prediction modelling package should be able to cleanly and naturally handle logical outcomes without forcing the user to encode them as factors.
> 
> For the second and third points, I understand the logic, but again, I don't think such logic should be imposed on users. The resolution to me is very natural: the two levels should be labelled as "FALSE" and "TRUE". I don't understand why that would be complicated. If users want something better than that, then they should follow your advice and encode the outcomes as factors. But that should be their choice.
> 
> One important point that you implied here but made explicit in your 2012 "rant" is that it is unclear which of the two binary cases should be considered the primary or "positive" class. However, here I think your argument is self-defeating (unless I misunderstand it). This is only a problem when your advice is followed and binary outcomes are encoded as factors--indeed, then there is no natural choice of the first or the second level as the positive case. … **However, when the binary outcome is kept logical, then there is no issue. The clear and natural choice is that TRUE is the positive case.** There's no ambiguity there. …
> 
> My point here is not to convince you or anyone else to change the tidymodels conventions, but **my point is hopefully to convince you to give first-class support for "binary outcomes as logical". …** I hope that my arguments are sufficient to respect the choice to support such functionality for package designers who disagree with this controversial point, rather than shutting out much of the fantastic tidymodels infrastructure from us just because of this point of disagreement.

So, as a proposed resolution to this issue, my PR does the following: instead of rejecting logical inputs as errors, it converts them to factors (and sets `event_level` so that `TRUE` is the event level, where applicable). This solution is very simple and works elegantly in the PR. Its only inconvenience is that it required adding the following code (or versions thereof) in almost 30 places throughout the package, right before each call to `check_class_metric()` or `check_prob_metric()`, which check for valid inputs:

```r
  if (is.logical(truth)) {
    event_level <- "second"  # TRUE is second level of levels(factor(truth))
    truth <- factor(truth)
    estimate <- factor(estimate)
  }
```
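
As a rough sketch of the same idea gathered into one place (this is not the exact code in the PR, and `coerce_logical_inputs()` is a hypothetical name, not part of `yardstick`), the conversion could be written as a small helper. One assumption I make here that the snippet above does not: the levels are set explicitly, so that `TRUE` stays the second level even when only one of the two values happens to be observed.

```r
# Hypothetical helper (not part of yardstick's API): convert logical inputs to
# factors so that downstream factor-only checks can run unchanged.
coerce_logical_inputs <- function(truth, estimate, event_level = "first") {
  if (is.logical(truth)) {
    # Explicit levels keep TRUE as the second level even when only one of
    # the two values is observed, e.g. truth = c(TRUE, TRUE).
    truth <- factor(truth, levels = c(FALSE, TRUE))
    # Only convert the estimate when it is logical too; a numeric estimate
    # (e.g. a probability) is left untouched.
    if (is.logical(estimate)) {
      estimate <- factor(estimate, levels = c(FALSE, TRUE))
    }
    event_level <- "second"
  }
  list(truth = truth, estimate = estimate, event_level = event_level)
}

out <- coerce_logical_inputs(c(TRUE, TRUE), c(TRUE, FALSE))
levels(out$truth)  # both levels are present, in the order FALSE, TRUE
out$event_level
```

Without explicit levels, `factor(c(TRUE, TRUE))` has `"TRUE"` as its only level, so `"second"` would no longer point at it.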


This is done in the PR (with a couple of tests) and it works great. Crucially, it doesn't mess with any code that assumes working with factors, so it should be easy to maintain and extend for future functionality. Additionally, since this converts a previous error condition to an accepted condition, it should not break any working code anywhere downstream or in users' scripts.
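
To illustrate why existing factor-based code is unaffected, here is a minimal base R sketch. The functions `toy_sensitivity()` and `toy_sensitivity_lgl()` are hypothetical stand-ins that I made up for this example; they are not `yardstick` functions and only mimic the pattern of "convert first, then defer to the factor-only code".

```r
# Hypothetical stand-in for a factor-only metric; not yardstick code.
toy_sensitivity <- function(truth, estimate, event_level = "first") {
  stopifnot(is.factor(truth), is.factor(estimate))
  event <- if (event_level == "first") levels(truth)[1] else levels(truth)[2]
  sum(truth == event & estimate == event) / sum(truth == event)
}

# Wrapper that converts logicals first, then defers to the factor-only code.
toy_sensitivity_lgl <- function(truth, estimate, event_level = "first") {
  if (is.logical(truth)) {
    event_level <- "second"  # TRUE is the second level
    truth <- factor(truth, levels = c(FALSE, TRUE))
    estimate <- factor(estimate, levels = c(FALSE, TRUE))
  }
  toy_sensitivity(truth, estimate, event_level)
}

# The same data, once as logicals and once as factors with "yes" first.
truth_lgl <- c(TRUE, TRUE, FALSE, TRUE)
est_lgl   <- c(TRUE, FALSE, FALSE, TRUE)
truth_fct <- factor(c("yes", "yes", "no", "yes"), levels = c("yes", "no"))
est_fct   <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))

from_factor  <- toy_sensitivity_lgl(truth_fct, est_fct)  # factor path, unchanged
from_logical <- toy_sensitivity_lgl(truth_lgl, est_lgl)  # logical path, converted
all.equal(from_factor, from_logical)
```

Factor inputs never enter the `is.logical()` branch, so they take exactly the same path as before.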

Could you please accept it? **Again, the goal of this request is so that developers like me who believe that binary outcomes are more naturally logicals than factors can still benefit from the fantastic `yardstick` infrastructure**, without creating any development or maintenance headaches for the `yardstick` developers.

If this solution works for `yardstick`, it can hopefully be extended to other `tidymodels` packages as relevant.


