LinDA Unusual p-value Distribution #54
Comments
Hi @DarioStrbenac, Thank you for reporting this issue. We've conducted extensive testing of the p-value distribution in LinDA, and I'd like to share our findings:
Our test metrics showed:
~ Age * `Tissue Type` + Smoking * `Tissue Type` + Gender * `Tissue Type` + `Primary Site` + (1|Patient)
To improve the p-value distribution, consider:
a) Sample Size: Ensure you have sufficient samples relative to the number of parameters in your model
b) Model Complexity:
c) Data Quality:
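As a rough illustration of the sample-size point in (a), one option is to count how many fixed-effect coefficients each formula expands to and compare that with the number of samples. The sketch below is not LinDA's own code: it uses base R's `model.matrix` on synthetic metadata and drops the `(1|Patient)` term, which `model.matrix` cannot parse. The sample count, the level names, the three-level `Primary Site`, and the helper `random_factor` are all assumptions made for the example.

```r
# Synthetic stand-in for the metadata; every value here is made up.
set.seed(1)
n <- 100  # hypothetical number of samples

# Hypothetical helper: draw a random factor with a fixed set of levels.
random_factor <- function(levels) {
  factor(sample(levels, n, replace = TRUE), levels = levels)
}

meta <- data.frame(
  Age            = random_factor(c("Young", "Old")),
  `Tissue Type`  = random_factor(c("Normal", "Cancer")),
  Smoking        = random_factor(c("No", "Yes")),
  Gender         = random_factor(c("Female", "Male")),
  `Primary Site` = random_factor(c("SiteA", "SiteB", "SiteC")),  # assumed levels
  check.names = FALSE  # keep the column names with spaces as written
)

# Fixed-effect parts only; the random intercept (1|Patient) is left out.
full_fixed   <- ~ Age * `Tissue Type` + Smoking * `Tissue Type` +
  Gender * `Tissue Type` + `Primary Site`
simple_fixed <- ~ Age + `Tissue Type` + Smoking + Gender

X_full   <- model.matrix(full_fixed, data = meta)
X_simple <- model.matrix(simple_fixed, data = meta)

# Number of coefficients per model and samples available per coefficient.
n_par <- c(full = ncol(X_full), simple = ncol(X_simple))
print(n_par)
print(n / n_par)

# A rank below the column count would mean some effects are not estimable.
print(qr(X_full)$rank == ncol(X_full))
```

Running the same three lines on the real metadata (with the real factor levels) shows how many coefficients each feature-level regression has to estimate, which is one way to judge whether the interaction model is too large for the data.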
# Check sample sizes in each group
table(clinicalTablePairs$`Tissue Type`)
table(clinicalTablePairs$Smoking)
table(clinicalTablePairs$Gender)
# Look at Age distribution
hist(clinicalTablePairs$Age)
# Consider a simpler model first
simpleModel <- linda(bacteriaMatrix, clinicalTablePairs,
                     "~ Age + `Tissue Type` + Smoking + Gender + (1|Patient)",
                     "proportion", is.winsor = FALSE)
Could you share:
This information would help us provide more specific recommendations for your case.
Best regards,
Thank you for evaluating it so thoroughly. Data set is 105 samples and 54 species. Age is categorical, not continuous.
> table(clinicalTablePairs$Age)
Young   Old
   64    41
> table(clinicalTablePairs$`Tissue Type`)
Normal Cancer
    51     54
> table(clinicalTablePairs$Smoking)
 No Yes
 74  31
> table(clinicalTablePairs$Gender)
Female   Male
    38     67
> table(Tissue = clinicalTablePairs$`Tissue Type`, Smoking = clinicalTablePairs$Smoking)
        Smoking
Tissue   No Yes
  Normal 36  15
  Cancer 38  16
> table(Tissue = clinicalTablePairs$`Tissue Type`, Gender = clinicalTablePairs$Gender)
        Gender
Tissue   Female Male
  Normal     19   32
  Cancer     19   35
Fitting the simpler model with only main effects, I also see a strange histogram.
Using the data set shared previously via e-mail and fitting
I get a strange-looking p-value distribution for many of the coefficients. For example,
How can it be made more uniform, as expected under statistical theory?