More methods needed for residuals.flexsurvreg #78

chjackson · 2020-06-10T11:15:54Z

The current implementation only supports difference between observed survival and predicted mean, which ignores censoring, so is of limited use. The survival package seems to do something more considered for residuals.survreg.

It'd be simple to add deviance-type residuals, as flexsurvreg returns a component $logliki giving each individual's contribution to the maximised log-likelihood.

I sometimes get asked about Cox-Snell residuals, but I'm not familiar with those.

@mattwarkentin are you familiar with any of these methods?

The text was updated successfully, but these errors were encountered:

mattwarkentin · 2020-06-10T19:23:42Z

I definitely agree that residuals.flexsurvreg needs much better support for common survival-type residuals. I am modestly familiar with the types supported by residuals.coxph and residuals.survreg. It would be nice to build toward that kind of support.

As you mention, I think deviance-type residuals shouldn't be too tricky to implement. I also think dfbeta and dfbetas are simple enough, if not potentially computationally taxing. Otherwise, I am interested in digging more into Martingale, Schoenfeld, and Cox-Snell residuals to see what can be supported.

mattwarkentin · 2020-06-10T21:43:35Z

I was flipping through these slides, and in particular slide 4, which defines the Cox-Snell residual as:

Am I mistaken, or is this the value returned by predict(x, type = "cumhaz") when .time is equal to a participants time t? In other words, the value of the model-predicted cumulative hazard function for participant i at time ti.

If this is true, the Martingale residual follows from the above equation as (slide 13):

Where di is the value of the failure code (0 or 1) and ei is the Cox-Snell residual.

And lastly, the deviance residual follows as (slides 18-19):

Where mi is the martingale residual, li is the individual contribution to the log-likelihood, and ~li is the maximum possible log-likelihood.

Does this make some sense? I turn to your expertise.

chjackson · 2020-06-11T14:00:56Z

That's a nice clear set of slides, thanks for finding. That all makes sense. That definition of the Cox-Snell residual assumes a proportional hazards model, but like the slides say, that can be generalised to the fitted individual-specific cumulative hazards for any model, which are compared to Exp(1) to assess the model.

mattwarkentin · 2020-06-12T00:55:15Z

No problem! This has been a good learning experience for me.

I've enjoyed the process of contributing to this package, so I am interested in helping out with this issue too, if you are interested in my help. However, I want to be mindful of not overstaying my welcome, so to speak.

chjackson · 2020-06-12T06:53:46Z

Go for it - I'm happy to have you contribute.

heor-robyoung · 2022-11-03T16:56:20Z

Looks to me like the CS residuals are off for censored observations. Have a look at these modifications of the "true model" scenario:

set.seed(42)
y <- rweibull(10000, 2, 2)
fite <- flexsurvreg(Surv(y) ~ 1, dist="weibull")
cs <- coxsnell_flexsurvreg(fite)
plot(cs$"(qexp)", cs$est, pch=19, xlab="Theoretical quantiles", ylab="Cumulative hazard")
abline(a=0,b=1,col="red",lwd=2)

# Simple cut-off
max_time <- 2 * median(y)
cens_time <- max_time

cens <- y > cens_time
y2 <- ifelse(cens, cens_time, y)
fitf <- flexsurvreg(Surv(y2, 1*(!cens)) ~ 1, dist = "weibull")
cs <- coxsnell_flexsurvreg(fitf)
plot(cs$"(qexp)", cs$est, pch=19, xlab="Theoretical quantiles", ylab="Cumulative hazard")
abline(a=0,b=1,col="red",lwd=2)

# Random censoring
censor_rate <- 1 / median(y)
cens_time <- rexp(length(y), censor_rate)
cens <- y > cens_time
y2 <- ifelse(cens, cens_time, y)
fitf <- flexsurvreg(Surv(y2, 1*(!cens)) ~ 1, dist = "weibull")
cs <- coxsnell_flexsurvreg(fitf)
plot(cs$"(qexp)", cs$est, pch=19, xlab="Theoretical quantiles", ylab="Cumulative hazard")
abline(a=0,b=1,col="red",lwd=2)

The suggestion here: https://uu.diva-portal.org/smash/get/diva2:826234/FULLTEXT01.pdf - equation 2.1.4 - is to add the expected value (1) to censored observations. I'm not sure it's a happy solution for the heavily censored case, but it does at least shift the line back onto mean = 1 in this "perfect" case.

chjackson · 2022-11-04T17:32:13Z

I'm having trouble following the theory behind this modification. Why would we favour adding any specific number to the residual for censored observations? What is meant by making them "compatible"? In general I wouldn't expect censored observations to tell us much useful about the goodness of fit of the model, e.g as the censoring time gets earlier they don't tell us anything.

heor-robyoung · 2022-11-04T18:34:38Z

Censored information can be highly informative about fit, particularly if you have decreasing hazard and a lot of administrative censoring, such that your Kaplan - Meier is going through an extended plateau up to maximum follow-up. If the model predicts high density in this plateau period when you have evidence to the contrary in the form of censored observations at greater times, that's poor fit.

Removing the censored observations outright will cause issues of its own, as the remaining obs won't be unit exponential (unless censoring is somehow completely independent of event time, which is only going to happen if someone goes back into a dataset and randomly picks observations to censor - otherwise it is inevitable that longer survival times are more likely to be censored). So they need to be incorporated somehow, and the current implementation gives a clear bias to the residual plot.

Best,

Rob

chjackson added the enhancement label Sep 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More methods needed for residuals.flexsurvreg #78

More methods needed for residuals.flexsurvreg #78

chjackson commented Jun 10, 2020

mattwarkentin commented Jun 10, 2020

mattwarkentin commented Jun 10, 2020 •

edited

Loading

chjackson commented Jun 11, 2020 •

edited

Loading

mattwarkentin commented Jun 12, 2020

chjackson commented Jun 12, 2020

heor-robyoung commented Nov 3, 2022 •

edited

Loading

chjackson commented Nov 4, 2022

heor-robyoung commented Nov 4, 2022

More methods needed for residuals.flexsurvreg #78

More methods needed for residuals.flexsurvreg #78

Comments

chjackson commented Jun 10, 2020

mattwarkentin commented Jun 10, 2020

mattwarkentin commented Jun 10, 2020 • edited Loading

chjackson commented Jun 11, 2020 • edited Loading

mattwarkentin commented Jun 12, 2020

chjackson commented Jun 12, 2020

heor-robyoung commented Nov 3, 2022 • edited Loading

chjackson commented Nov 4, 2022

heor-robyoung commented Nov 4, 2022

mattwarkentin commented Jun 10, 2020 •

edited

Loading

chjackson commented Jun 11, 2020 •

edited

Loading

heor-robyoung commented Nov 3, 2022 •

edited

Loading