Index should accept CSV/JSON #26

averagehat · 2015-08-27T16:38:42Z

I was thinking we could add a create (distinct from index) command to load JSON/CSV/FASTA files to initialize an existing database. But I think this could be included in the index command.

If the database does not exist, index creates a new one; otherwise it adds to the existing database (which seems to work out of the box). There is no reason index couldn't accept any of these file types (telling them apart by file extension). This is useful because of #23, and because it is hard to store metadata in FASTA format.
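To illustrate the extension-based dispatch, here is a minimal Python sketch. It is not seqr's code: the parser functions, the `PARSERS` table, the recognised suffixes, and the `id`/`sequence` record layout for FASTA are all assumptions made for the example.

```python
import csv
import io
import json
from pathlib import Path


def parse_fasta(text):
    """Yield one record per FASTA entry; the header line becomes the id."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield {"id": header, "sequence": "".join(chunks)}
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield {"id": header, "sequence": "".join(chunks)}


def parse_csv(text):
    """Yield one record per CSV row, keyed by the header row."""
    yield from csv.DictReader(io.StringIO(text))


def parse_json(text):
    """Yield records from a JSON array (or a single JSON object)."""
    data = json.loads(text)
    yield from (data if isinstance(data, list) else [data])


# Hypothetical extension table; the real command may recognise other suffixes.
PARSERS = {
    ".fasta": parse_fasta,
    ".fa": parse_fasta,
    ".csv": parse_csv,
    ".json": parse_json,
}


def parse_records(filename, text):
    """Pick a parser from the file extension and return a list of records."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return list(PARSERS[suffix](text))
```

With something like this, CSV and JSON inputs can carry arbitrary metadata columns alongside the sequence, which is the part that is awkward in FASTA.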

Another factor is that the swissprot data already has a computed index field, while other json/fasta files will not. Maybe we should just check for the "index" field and compute the index only if it is not there. In that case the "index" field would be a sort of reserved field that users shouldn't use.
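A rough Python sketch of the "compute only if missing" idea follows. The function name `ensure_index` and the `compute` callback are hypothetical; the callback stands in for whatever seqr actually uses to build the index, which is not shown here.

```python
RESERVED_FIELD = "index"  # field name taken from the discussion above


def ensure_index(record, compute):
    """Return a record that has the reserved "index" field.

    `compute` is a caller-supplied function (an assumption of this sketch)
    that turns a sequence string into an index value. It is only called
    when the record does not already carry the field.
    """
    if RESERVED_FIELD in record:
        return record  # e.g. data that ships with a precomputed index
    indexed = dict(record)  # copy so the caller's record is not mutated
    indexed[RESERVED_FIELD] = compute(record["sequence"])
    return indexed
```

The trade-off is the one noted above: user data that happens to use a field named "index" for something else would be silently trusted, so the name has to be treated as reserved.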

DCGenomics · 2015-08-27T18:49:42Z

Hi Everyone,

I just want to make sure you guys are comfortable with me moving this repo
over to the NCBI hackathon org this afternoon.

I'll wait one more hour, since I've seen a lot of recent commits.

Cheers!

Ben

lewisg-ncbi · 2015-08-27T19:02:47Z

Ben,

What's the downside of the move? Just that we have to check out from another repository?

Best,
Lewis

averagehat · 2015-08-27T19:23:17Z

I'd like to use my GitHub account on Travis CI for this project. We don't have that set up now. I'm not sure what permissions I'll need for that to work within an organization, but it works for the organization I'm part of (though I'm an admin there).

DCGenomics · 2015-08-27T20:27:10Z

Exactly. The upside is that we have some expanded admin capabilities.

Cheers!

Ben

lewisg-ncbi · 2015-08-27T21:33:47Z

Sounds good to me; that is, having an update and a create command.

Best,
Lewis

DCGenomics · 2015-08-28T12:49:44Z

I've moved it over. Let me know if it has the functionality you guys want.
If not, Matt or I will make the appropriate adjustments.

Cheers!

Ben

nyetsche · 2015-08-28T14:45:50Z

The permissions Travis CI requires (http://docs.travis-ci.com/user/github-oauth-scopes/) are tame, so I think everyone who's a current collaborator should be able to start using Travis CI. Admittedly, I've never used it.

That being said, I just made @lianyi, @lewisg-ncbi, and @averagehat 'admin' for the NCBI-Hackathons/seqr repo, so each of you has extra privileges. You can even add new collaborators!

averagehat · 2015-08-28T16:44:31Z

@nyetsche I think I need to be a member of the organization hosting the repository in order to add it in travis.

DCGenomics · 2015-08-28T17:02:57Z

I already added it in travis

averagehat · 2015-08-28T17:19:25Z

My mistake; apparently Travis CI URLs are case-sensitive. The builds are live here:
https://travis-ci.org/NCBI-Hackathons/seqr
Thanks!

nyetsche · 2015-08-28T17:29:33Z

👍

averagehat · 2015-08-28T21:53:30Z

@lianyi When inserting documents, is it necessary to use the FindIndex class manually, or is Solr set up to automatically index them? I know the latter is true for searching, but is it true for indexing? Thanks.

lianyi · 2015-08-30T00:12:16Z

The most recent update doesn't require FindIndex for indexing. When the sequence is provided in the "sequence" field, e.g. {sequence:"AAAAAAA",id:6,...}, it is automatically tokenized the same way we used to do it in FindIndex.

lianyi · 2015-09-01T13:51:54Z

We could probably also add an option, e.g. -clean, that lets the user wipe all of the indexes before indexing new FASTA/JSON files.

Without the -clean option, indexing is basically an incremental update.
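As a rough illustration of the two modes, here is a small Python sketch. It uses an in-memory dict as a stand-in for the real index, and the `index` program name, the positional `files` argument, and the rule that a record with an existing id overwrites the old entry are all assumptions for the example, not seqr's actual behaviour.

```python
import argparse


def build_parser():
    """Command-line parser with the proposed -clean switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="index")
    parser.add_argument("-clean", action="store_true",
                        help="wipe existing entries before indexing")
    parser.add_argument("files", nargs="+", help="FASTA/JSON files to index")
    return parser


def index_records(store, records, clean=False):
    """Add records to `store`, a dict keyed by record id.

    clean=True empties the store first; otherwise existing entries are kept
    and new records are added on top (incremental update).
    """
    if clean:
        store.clear()
    for record in records:
        store[record["id"]] = record
    return store
```

For example, `build_parser().parse_args(["-clean", "a.fasta"])` sets `clean` to true, and passing that value to `index_records` decides whether earlier entries survive.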

averagehat mentioned this issue Aug 28, 2015

JSON/CSV file loading #27

Merged

averagehat mentioned this issue Aug 28, 2015

Continuous Integration #14

Closed

lianyi added a commit that referenced this issue Sep 23, 2015

Merge pull request #47 from NCBI-Hackathons/index-cmd

0a69f3d

Index cmd , also covers #26

lianyi closed this as completed Sep 23, 2015
