CSV inputs don't yield results #23

Open
acolorado1 opened this issue Oct 29, 2024 · 2 comments

@acolorado1
Collaborator

acolorado1 commented Oct 29, 2024

Despite the documentation saying that CSV files are an acceptable input format, this is not the case. Two scenarios:

  1. Combination of .txt and .csv files (-i and --other_gene_set) resulted in all compounds and KOs being mapped to the .txt file
  2. Only using .csv file (-i) resulted in the following error:
raise ValueError('KEGG has forbidden request after %s attempts for url %s , which returns a response status of %s' %
                     (attempts, url, response.status_code))

While I think that this is a relatively easy fix, it certainly took a while to troubleshoot.

What has worked for me are .txt files containing rows with either one compound or KO not surrounded by quotes. For example:

K00001
K00002
K00003

I have not tested TSVs, but at this point it seems unlikely that they would be an acceptable format. Additionally, all my CSVs had a column name (KO), so removing the column name might allow CSVs to work. Nevertheless, this should be tested and the documentation updated.

@sterrettJD
Collaborator

Hey @acolorado1 ,

I think it should be able to read .csv files, based on the section of the KO reader function linked here:

elif file_loc.endswith('.tsv') or file_loc.endswith('.csv'):

How is your CSV file formatted? AMON assumes that the KOs are column headers. So if your CSV file has KOs as rows (or inside of a column), it will get read incorrectly. I'll run some tests to verify, but that's my first idea - trying to put it out there quick since I know you're on a bit of a time crunch.

CSV reading is tested here:

def test_read_in_ids_csv(ids_csv, list_of_kos):

The input CSV would need to be formatted as:

,K00001,K00002,K00003
sample_A,0,0,1
sample_B,1,0,1

The .txt file parser is different from the .csv parser, given that the .txt parser splits based on any whitespace, and the .csv parser is effectively pd.read_table().columns.
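To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is not AMON's actual code: the function names are hypothetical, and it uses the standard-library `csv` module in place of pandas to stay dependency-free. Dropping the empty label of the index column is also an assumption.

```python
import csv
import io


def ids_from_txt(text):
    """Hypothetical .txt-style reader: every whitespace-separated token is an ID."""
    return text.split()


def ids_from_csv_header(text, delimiter=","):
    """Hypothetical .csv-style reader: only the header row supplies IDs."""
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Assumption: the empty label of the index column is not an ID.
    return [name for name in header if name.strip()]


# KOs as column headers, the layout the CSV reader expects.
wide = ",K00001,K00002,K00003\nsample_A,0,0,1\nsample_B,1,0,1\n"
# KOs as rows under a "KO" column name, the layout from the original report.
row_wise = "KO\nK00001\nK00002\nK00003\n"

print(ids_from_csv_header(wide))      # ['K00001', 'K00002', 'K00003']
print(ids_from_csv_header(row_wise))  # ['KO']
print(ids_from_txt("K00001\nK00002\nK00003\n"))  # ['K00001', 'K00002', 'K00003']
```

With the row-wise file, a header-based reader sees a single "ID" named `KO`, which would explain why nothing useful comes back.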

If this is what's going on, we can definitely fix the documentation for that!

Also - what is the actual error you're getting from this? It'd be helpful to know what url it's trying to access.

raise ValueError('KEGG has forbidden request after %s attempts for url %s , which returns a response status of %s' %
                     (attempts, url, response.status_code))

@acolorado1
Collaborator Author

Hey @sterrettJD,

It appears that I did not format the file correctly, then: I gave the KOs as rows, which is how the .txt file is laid out.
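For anyone with the same row-wise layout, a rough sketch of converting it to the header layout described earlier in the thread might look like the following. The helper name, the `sample_A` label, and filling every cell with `1` are all assumptions for illustration, not something AMON provides or requires.

```python
import csv
import io


def rows_to_header_csv(text, column_name="KO", sample_name="sample_A"):
    """Hypothetical helper: turn one-ID-per-line text into a CSV with IDs as headers."""
    ids = []
    for line in text.splitlines():
        # Strip whitespace and any surrounding quotes from each row.
        token = line.strip().strip('"').strip("'")
        # Skip blank lines and the column name, if the file has one.
        if token and token != column_name:
            ids.append(token)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + ids)                     # empty label for the sample column
    writer.writerow([sample_name] + [1] * len(ids))  # assumption: 1 marks presence
    return out.getvalue()


print(rows_to_header_csv("KO\nK00001\nK00002\nK00003\n"))
```

Using the existing .txt format is the simpler route; this only helps if the input has to be a CSV.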

Unfortunately, I don't have the full error available, but it ended with the ValueError. If I remember correctly, it referenced lines of code within the API, but I am not sure, as I could not find where it went wrong. Unfortunately, half the time there is no error or any indication that anything is wrong until you get the results, which are clearly inaccurate (which is why this formatting error took me so long to catch).

Regardless, I have gotten it to run without a problem, but it would be helpful to add example inputs to the README.
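One way to surface this kind of formatting mistake before any request is made would be a quick sanity check on the parsed IDs. This is only a sketch with hypothetical names, not part of AMON; it assumes inputs are KEGG orthology IDs (`K` plus five digits) or KEGG compound IDs (`C` plus five digits).

```python
import re

# Assumption: valid inputs look like K00001 (KO) or C00031 (compound).
KEGG_ID = re.compile(r"[KC]\d{5}")


def find_suspect_ids(ids):
    """Return the entries that do not look like a KO or compound ID."""
    return [i for i in ids if not KEGG_ID.fullmatch(i)]


def check_ids(ids, source="input"):
    """Raise early with a readable message instead of failing later or silently."""
    if not ids:
        raise ValueError(f"No IDs were read from {source}; check the file layout.")
    suspect = find_suspect_ids(ids)
    if suspect:
        raise ValueError(
            f"{len(suspect)} entries from {source} do not look like KEGG IDs, "
            f"for example {suspect[:3]}; check the file layout."
        )


# A column name or quoted values would be flagged before any lookup.
print(find_suspect_ids(["K00001", "C00031", "KO", '"K00002"']))  # ['KO', '"K00002"']
```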
