Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xiang's implementation of RSSp and related methods #4

Open
xiangzhu opened this issue Sep 27, 2018 · 3 comments
Open

Xiang's implementation of RSSp and related methods #4

xiangzhu opened this issue Sep 27, 2018 · 3 comments

Comments

@xiangzhu
Copy link
Collaborator

@CreRecombinase Hi Nick,

Below are a few links that accompany my notes posted on Slack — i spent some time after our meeting to clean up these codes; hopefully they are not too hard to read and use:

  • fit_rssp.m: fit a RSSp model by maximizing its marginal likelihood
  • estimate_pve.m: obtain posterior mean and simulate posterior sample for PVE, given a fitted RSSp

You should have access to these scripts since i have asked Peter to added you to my developer repo. Please let me know if this is not the case.

Before including these scripts into your work, please first make sure they work on your side. To this end I have created a simple example script single_test_nick.m for you to run (you need add input data files there). I only tested these scripts in Matlab 2017b: module load matlab/2017b.

A few more remarks:

  • The baseline model that you are mainly interested in corresponds to my RSSp model with only c0 term; that is,
k = 1;
[cvec_1, obj_1] = fit_rssp(vhat, dvec, zeros(k,1));
  • fit_rssp.m is built on the fmincon function in Matlab, and so you can use it to confirm your R/C++ implementation of baseline model.

  • Can you please apply estimate_pve.m to re-estimate PVE for Framingham eQTL data? It should be not that hard, since all the RSSp models have been fitted and all EVDs have been precomputed.

  • Can you please also apply my RSSp models with confounding correction (i.e. with c1, c2, c3, ... terms) to some genes with extremely large PVE estimates in Framingham data? Ideally we want to see reduced PVE estimates for these "problematic" genes.

@CreRecombinase
Copy link
Contributor

I have added inst/m_code/negtwo_loglik.m and a "direct" transliteration to R inst/m_code/negtwo_loglik.R . I have added tests that my likelihood function is equivalent (except for the factor of 2 difference, see https://github.com/CreRecombinase/RSSp/blob/2cab5d5ac6f6fef2198edec5784de2d268ae5768/tests/testthat/test_dnorm.R#L3)

@xiangzhu
Copy link
Collaborator Author

@CreRecombinase Thanks for the update! Good to see two sets of codes written by us produce equivalent likelihood values (up to a constant 2).

However I am not sure if your tests above may complicate the package development or not.

I was suggesting a very simple consistency check in CreRecombinase/PolygenicRSS#5:

  1. simulate data
  2. run your R codes on the simulated data
  3. run my Matlab codes on the same data
  4. compare output: likelihood, PVE estimates, etc

To perform this consistency check, it seems we don't need to add any additional test codes to your R package. (Perhaps you have other reasons for doing so?)

To keep it simple, if we can proof that two sets of codes produce almost identical results on the same data, then we can probably move on at this point?

In the long run we may consider including these codes for more rigorous testing.

@xiangzhu
Copy link
Collaborator Author

@CreRecombinase as we discussed yesterday, I have generated the following input and output data using my own implementations in Matlab to facilitate your consistency check.

Please let me know if you have any question. Thanks!

Note that I will be using this thread to document all data files that are generated for your tests.

All data files are saved at /project/compbio/RSSp/consistency_check_data/, to which you have full access.

2019-03-12

In this file I simulate vhat from Normal(0, 4^2) and dvec from Unif(0, 2), and I create 100 replicates. For each replicate, I fit RSS-P(0) model, and then record output obj (-2 * log likelihood), par (MLE) and pve (PVE estimate).

You can easily load these data in R as follows:

> suppressPackageStartupMessages(library(R.matlab))
> res_file <- "/project/compbio/RSSp/consistency_check_data/rand_seed_312_k_1_rep_100.mat"
> res_data <- R.matlab::readMat(res_file)
> names(res_data)
[1] "dvec" "obj"  "par"  "pve"  "vhat"
> dim(res_data$dvec)
[1] 10000   100
> dim(res_data$vhat)
[1] 10000   100
> dim(res_data$obj)
[1] 100   1
> dim(res_data$par)
[1] 100   1
> dim(res_data$pve)
[1] 100   1

When applying your R codes on dvec[,j] and vhat[,j], please do not adjust the row order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants