Too many join keys #177
Comments
The size is not the problem; we have modelled 4k samples and 10 million cells. There must be something strange in how the dataset factors map to samples. Do you have 119 unique donor_id? Please send the cell metadata, anonymised if needed.
Had to filter out some columns because of size, but here's the metadata.
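One way to check the two points raised above (the number of unique donor_id values, and whether the factors are consistent within each sample) is sketched below in base R. The toy `metadata` data frame and its values are made up for illustration, and the assumption that a covariate taking more than one value within a donor is relevant to this error is not confirmed anywhere in this thread.

```r
# Toy cell-level metadata (hypothetical values); in practice this would be
# the real table, e.g. the one read from the attached CSV.
metadata <- data.frame(
  donor_id        = c("d1", "d1", "d2", "d2", "d3"),
  tissue_location = c("left", "left", "left", "right", "right"),
  age2            = c("young", "young", "old", "old", "old"),
  stringsAsFactors = TRUE
)

# How many unique samples are there?
n_donors <- length(unique(metadata$donor_id))

# Collapse to one row per distinct (donor, covariates) combination.
covariates   <- c("tissue_location", "age2")
sample_level <- unique(metadata[, c("donor_id", covariates)])

# A donor that appears more than once here carries several values
# for at least one covariate.
rows_per_donor      <- table(sample_level$donor_id)
inconsistent_donors <- names(rows_per_donor)[rows_per_donor > 1]

print(n_donors)
print(inconsistent_donors)
```

If `inconsistent_donors` is non-empty on the real metadata, those donors would be the first place to look.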
Thanks, please test with your own metadata first. I get
> library(sccomp)
> read_csv("~/Downloads/navin_metadata.csv") |>
+ sccomp_estimate(
+ formula_composition = ~ BPA_score + self_reported_ethnicity + age2 + tissue_location + (1|source),
+ .sample = donor_id,
+ .cell_group = author_cell_type,
+ bimodal_mean_variability_association = TRUE,
+ cores = 16
+ )
New names:
• `` -> `...1`
• `...1` -> `...2`
Rows: 714331 Columns: 10
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): ...2, sample_id, donor_id, author_cell_type, BPA_bin, source, self_reported_ethnicity, tissue_location, age2
dbl (1): ...1
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Error in check_columns_exist(.data, c(quo_name(.sample), quo_name(.cell_group), :
The columns BPA_score are not present in your tibble
Please paste here the full code and the error you get when running on your metadata directly.
Sorry, forgot to check! Here's the metadata with the right columns.
Thanks, can you please attach the code you execute, starting from read_csv("navin_metadata.csv") |> ..., and the error message?
The code that came before filtered the metadata to the samples with complete data for all the covariates, and then dropped the unused factor levels in the actual Seurat object. Let me know if you need anything else! Thanks again.
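The preprocessing described above was not posted, so the following is only a minimal base R sketch of that kind of step: keep the cells with complete covariates, then drop the factor levels that no longer occur. The toy `metadata` data frame, its values, and the `covariates` vector are assumptions for illustration, and the sketch works on a plain metadata table rather than on the Seurat object itself.

```r
# Toy cell-level metadata (hypothetical values) standing in for the real table.
metadata <- data.frame(
  donor_id         = c("d1", "d1", "d2", "d3"),
  author_cell_type = c("T", "B", "T", "B"),
  BPA_score        = c(0.1, 0.1, NA, 0.3),
  source           = c("s1", "s1", "s2", "s2"),
  stringsAsFactors = TRUE
)

# Covariates that must be non-missing (a subset, for illustration).
covariates <- c("BPA_score", "source")

# Keep only cells with complete data for every covariate.
keep <- complete.cases(metadata[, covariates])

# Drop factor levels that no longer have any cells after filtering.
metadata_complete <- droplevels(metadata[keep, ])

print(levels(metadata_complete$donor_id))
```

Checking `levels()` after the filter shows whether donors removed by the NA filter still linger as empty factor levels.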
Thanks,
sccomp_result =
navin_data_nona |>
sccomp_estimate(
formula_composition = ~ BPA_score + self_reported_ethnicity + age2 + tissue_location + (1|source),
.sample = donor_id,
.cell_group = author_cell_type,
bimodal_mean_variability_association = TRUE,
cores = 16
)
Hope this helps!
Please do the following to facilitate the reproducible example:
# Get cell metadata
seurat_object[[]] |> saveRDS("seurat_object_metadata.rds")
# Execute sccomp
readRDS("seurat_object_metadata.rds") |>
sccomp_estimate(
formula_composition = ~ BPA_score + self_reported_ethnicity + age2 + tissue_location + (1|source),
.sample = donor_id,
.cell_group = author_cell_type,
bimodal_mean_variability_association = TRUE,
cores = 16
)
Then send me
Thanks for actively maintaining this @stemangiola!
I received the following message when running sccomp on the HBCA data (link) with the following specification:
sccomp code
BPA_score is a continuous score of geneset activity calculated from AUCell. The other covariates are from the study itself. NAs were removed before feeding into sccomp.
Error Message
Perhaps there are too many covariates for the size of the data (119 samples, 658825 cells)? I'm trying with a subset now but would like to model jointly across all cell types if possible.
Thanks!