Index should accept CSV/JSON #26

Closed
averagehat opened this issue Aug 27, 2015 · 14 comments

Comments

@averagehat
Collaborator

I was thinking we could add a create (distinct from index) command to load JSON/CSV/FASTA files to initialize an existing database. But I think this could be included in the index command.

If the database does not exist, index would create a new one; otherwise it would add records to the existing database (which seems to work out of the box). There is no reason index couldn't accept any of these file types, telling them apart by file extension. This is useful because of #23, and because it is hard to store metadata in FASTA format.

Another factor is that the swissprot data already has a computed index field, while other JSON/FASTA files will not. Maybe we should just check for the "index" field and compute the index only if it is not there. In that case the "index" field would be a sort of reserved field that users shouldn't use.
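
To make the proposal concrete, here is a rough sketch in Python of extension-based dispatch with a reserved "index" field. It is not the project's code: the function names, the accepted extensions, and the `compute_index` callable are all hypothetical, and the JSON and CSV layouts are assumptions (a JSON array of objects, a CSV with a header row).

```python
import csv
import json
from pathlib import Path


def read_fasta(path):
    """Yield one record per FASTA entry; the header line becomes the id."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield {"id": header, "sequence": "".join(chunks)}
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield {"id": header, "sequence": "".join(chunks)}


def read_json(path):
    """Assumes the file holds a JSON array of record objects."""
    with open(path) as handle:
        return json.load(handle)


def read_csv(path):
    """Assumes a header row naming the fields, including 'sequence'."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical mapping from file extension to reader.
READERS = {".json": read_json, ".csv": read_csv,
           ".fasta": read_fasta, ".fa": read_fasta}


def load_records(path, compute_index):
    """Pick a reader by extension; fill in 'index' only where it is missing."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    records = []
    for record in READERS[suffix](path):
        record = dict(record)
        if "index" not in record:  # 'index' is treated as a reserved field
            record["index"] = compute_index(record["sequence"])
        records.append(record)
    return records
```

Records that already carry an "index" value (as the swissprot data does) pass through untouched; everything else gets one computed from its "sequence" field.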

@DCGenomics
Contributor

Hi Everyone,

I just want to make sure you guys are comfortable with me moving this repo over to the NCBI hackathon org this afternoon.

I'll wait one more hour, since I've seen a lot of recent commits.

Cheers!

Ben

What have you done today to make the world a better place?

@lewisg-ncbi
Collaborator

Ben,

What's the downside of the move? Just that we have to check out from another repository?

Best,
Lewis

@averagehat
Collaborator Author

I'd like to use my GitHub account on Travis CI for this project. We don't have that set up now. I'm not sure what permissions I will need for that to work within an organization, but it works for the organizations I'm a part of (although I'm an admin there).

@DCGenomics
Contributor

Exactly. The upside is that we have some expanded admin capabilities.

Cheers!

Ben

@lewisg-ncbi
Collaborator

Sounds good to me; that is, to have an update and a create command.

Best,
Lewis

@DCGenomics
Contributor

I've moved it over. Let me know if it has the functionality you guys want. If not, Matt or I will make the appropriate adjustments.

Cheers!

Ben

@nyetsche
Collaborator

The permissions Travis CI requires (http://docs.travis-ci.com/user/github-oauth-scopes/) are tame, so I think everyone who's a current collaborator should be able to start using Travis CI. Admittedly, I've never used it.

That being said, I just made @lianyi, @lewisg-ncbi, and @averagehat 'admin' for the NCBI-Hackathon/seqr repo, so each of you has extra privileges. You can even add new collaborators!

@averagehat
Collaborator Author

@nyetsche I think I need to be a member of the organization hosting the repository in order to add it in travis.

@DCGenomics
Contributor

I already added it in travis

@averagehat
Collaborator Author

My mistake; apparently Travis CI URLs are case-sensitive. The builds are live here:
https://travis-ci.org/NCBI-Hackathons/seqr
Thanks!

@nyetsche
Collaborator

👍

@averagehat
Collaborator Author

@lianyi When inserting documents, is it necessary to use the FindIndex class manually, or is Solr set up to automatically index them? I know the latter is true for searching, but is it true for indexing? Thanks.

@lianyi
Collaborator

lianyi commented Aug 30, 2015

With the most recent update, FindIndex is no longer required for indexing. When the sequence is provided in the "sequence" field, e.g. {sequence:"AAAAAAA",id:6,...}, it is automatically tokenized, as we used to do in FindIndex.
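
For anyone scripting inserts, a minimal sketch of what such a document could look like on the wire, assuming Solr's JSON update handler is used. The host, port, and core name in the URL are placeholders rather than the project's actual configuration, and the helper name is made up for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port, and core name are placeholders.
SOLR_UPDATE_URL = "http://localhost:8983/solr/seqr/update?commit=true"


def build_update_request(documents, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON update request for a list of documents."""
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Only the raw sequence is supplied; no precomputed index field is attached.
request = build_update_request([{"id": 6, "sequence": "AAAAAAA"}])
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

The point of the sketch is that the client only ships the raw "sequence" value; any tokenization is left to the server side.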

@lianyi
Collaborator

lianyi commented Sep 1, 2015

Also, we could probably add an option that lets the user wipe all of the indexes clean before indexing new FASTA/JSON files, e.g. -clean.

Without the -clean option, indexing is basically an incremental update mode.
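
A rough sketch of the two modes, in Python for illustration only. The parser layout and the `run_index` helper are hypothetical, and a plain dict keyed by record id stands in for the real index.

```python
import argparse


def build_parser():
    """Hypothetical command line: positional files plus an optional -clean."""
    parser = argparse.ArgumentParser(prog="index")
    parser.add_argument("files", nargs="+", help="FASTA/JSON files to index")
    parser.add_argument("-clean", action="store_true",
                        help="wipe all existing entries before indexing")
    return parser


def run_index(store, new_records, clean=False):
    """Apply new_records to store, a dict keyed by record id."""
    if clean:
        store.clear()  # start from an empty index
    for record in new_records:
        store[record["id"]] = record  # add new ids, overwrite existing ones
    return store
```

With `clean=False`, existing entries survive and only matching ids are overwritten, which is the incremental behaviour described above; with `clean=True`, only the newly indexed records remain.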

lianyi added a commit that referenced this issue Sep 23, 2015
@lianyi lianyi closed this as completed Sep 23, 2015