Ordinal mode (between classification and regression) #953

I think that a major gap in parsnip is the lack of explicit support for ordinal models, by which I mean **models where the response variable is an ordered factor**.

My proposal here is a follow-up to [closed issue 35](https://github.com/tidymodels/parsnip/issues/35), specifically focusing on **ordinal models**. @topepo wrote:

> The main issue that I had as about how to organize the functions. In parsnip we try to have the main model functions describe the structural aspects of the model (e.g. linear_reg(), rand_forest(), etc). For the ordinal models based on generalized linear model (e.g. cumulative logits, adjacent categories etc), my thinking was to have:
> 
> ordinal_cumulative(link = "logit", odds = "proportional")
> ordinal_adjacent(link = "logit", odds = "proportional")
> and so on. My thinking is that people would probably want to look at the parallel assumption (assuming they have the right design for that) and tuning over the odd argument would be helpful.
> 
> How would you like to see these types of model organized?
> 

I am not a statistical expert on ordinal models, but I have used them in my research code and have studied the different kinds. Based on my understanding, the examples above are the kind of detail that parsnip is supposed to abstract away. I would think that users care mainly about three things:
* The function call to the model versus the mode of the model
* The broom tidier functions for model outputs (tidy, glance, augment)
* The prediction results

## Function call versus mode

In his initial thoughts, @topepo seemed to frame ordinal models as parsnip model types with names like `ordinal_cumulative` or `ordinal_adjacent`. But fundamentally, in parsnip terms, **I do not think of an ordinal model as a model type; I think of it as a mode**, just like classification and regression are modes. An ordinal model is simply one whose response variable is neither an unordered class nor a real number but something in between: an ordered category. This is borne out by his mention of various ordinal implementations (glmnetcr, ordinalNet, ordinalForest, party models for trees, and brms models). Each of these would map to its underlying model type, be it `logistic_reg` or `rand_forest`, with its corresponding engine. `ordinal` would simply be the mode.

I don't think that most users need to get bogged down with the underlying ordinal model, especially when their interest is predictive analytics, which focuses on the prediction itself, rather than prescriptive or interpretive machine learning (IML), which cares about the meaning of the predictors. (Personally, I actually do care very much about these interpretive issues, but I am trying to think about the general user.) Users who are concerned with interpretation can simply specify the appropriate engine (e.g., `MASS::polr` or `ordinal::clm` when the parallel assumption is upheld and `VGAM::vglm` when it is not).

That said, I think it would be responsible to help newer users who are not familiar with the assumptions by running the test for them and warning them if it is violated. For example, if the default engine is `ordinal::clm`, the parsnip call could automatically run a parallel assumption test on the fitted model using something like `brant::brant`. If the test fails (with a default significance threshold of 0.05), then the model fit call could print a warning alerting the user that the data fails the parallel assumption on which the chosen (or default) model depends, with a suggestion for another engine that does not require that assumption. The test should probably be silent if there is no problem and only warn when there is. (Of course, all of this should be customizable with arguments that could skip the test, change the p-value threshold, or print the results regardless of the outcome.)
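Here is a minimal sketch of the check described above, outside of parsnip. `check_parallel()` and its arguments are hypothetical names of my own, not an existing parsnip or ordinal function. I use `ordinal::nominal_test()` rather than `brant::brant` because, as far as I know, `brant` expects a `MASS::polr` fit, while `nominal_test()` works directly on `clm` fits; the `wine` data ships with the ordinal package.

```r
library(ordinal)  # clm(), nominal_test(), and the wine data
data(wine, package = "ordinal")

# Hypothetical helper (not part of parsnip or ordinal): warn when the
# parallel (proportional odds) assumption looks doubtful for a clm fit.
check_parallel <- function(fit, alpha = 0.05, verbose = FALSE) {
  # Likelihood ratio tests of the assumption, one row per predictor
  tests <- ordinal::nominal_test(fit)
  p_values <- tests[["Pr(>Chi)"]]
  # The "<none>" baseline row has no p-value, so drop NAs
  violated <- rownames(tests)[!is.na(p_values) & p_values < alpha]

  if (length(violated) > 0) {
    warning(
      "The parallel assumption may be violated for: ",
      paste(violated, collapse = ", "),
      ". Consider an engine that does not require it, e.g. VGAM::vglm.",
      call. = FALSE
    )
  } else if (verbose) {
    message("No evidence against the parallel assumption at alpha = ", alpha)
  }
  invisible(tests)
}

fit <- clm(rating ~ temp + contact, data = wine)
tests <- check_parallel(fit, verbose = TRUE)
```

In parsnip itself, `alpha`, `verbose`, and a flag to skip the test would presumably be engine arguments; that part is only a suggestion.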

## Model outputs

Perhaps the reason that @topepo suggested distinct model types for different kinds of ordinal models is that the model outputs can be quite different. In particular, proportional (cumulative) odds models have a single coefficient for each predictor, whereas other ordinal models like partial proportional or adjacent models might have as many as one coefficient per response level for each predictor. But as I see it, these are details that are already handled in the Tidyverse by the `broom` package with its `tidy` function. [Specifically for ordinal models](https://haoen-cui.github.io/SOA-Exam-PA-R-Package-Documentation/broom/reference/ordinal_tidiers.html), the documentation says:

> tidy.clm, tidy.clmm, tidy.polr and tidy.svyolr return one row for each coefficient at each level of the response variable, with six columns

So, I don't think parsnip needs to worry about this, other than perhaps prioritizing the implementation of engines that already have tidy handlers.
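To illustrate the coefficient structure of a proportional odds model, here is a small sketch using `MASS::polr` on the `housing` data that ships with MASS (my choice of example, not something from the earlier issue). It only uses base accessors; `broom::tidy()` would arrange the same numbers into a data frame.

```r
library(MASS)  # polr() and the housing data

# Proportional odds model: Sat is an ordered factor (Low < Medium < High)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
            data = housing, Hess = TRUE)

# One slope per predictor term, shared across all response levels
slopes <- coef(fit)

# One threshold (intercept) per boundary between adjacent response levels
thresholds <- fit$zeta

names(slopes)
names(thresholds)
```

A model that relaxes the parallel assumption would instead need a separate set of slopes for each threshold, which is where the outputs start to differ between engines.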

## Prediction results

Unless I misunderstand something, the `predict` function for the ordinal mode should work very much like it does for multiclass classification: there should be options to return just the predicted ordinal class or to return probabilities for each class. Please let me know if there is a fundamental difference here that I am overlooking.
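As a rough sketch of the two kinds of prediction, this is what an engine like `MASS::polr` already offers (again using the `housing` data from MASS as an assumed example). The final `ordered()` call is my own addition, to show how a wrapper could guarantee that the predicted class keeps the ordering of the response levels.

```r
library(MASS)  # polr() and the housing data

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
new_data <- housing[1:3, c("Infl", "Type", "Cont")]

# Hard predictions: the most probable response level for each row
pred_class <- predict(fit, newdata = new_data, type = "class")

# Soft predictions: one probability column per response level
pred_probs <- predict(fit, newdata = new_data, type = "probs")

# Re-wrap as an ordered factor so the level ordering is explicit
pred_ordered <- ordered(as.character(pred_class),
                        levels = levels(housing$Sat))
```

In parsnip terms these would presumably correspond to `type = "class"` and `type = "prob"`, as in classification.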


So, these are my thoughts. How feasible would it be to add the ordinal mode to parsnip?
