Add 2022_Liu_Micronesia #199

Closed

Conversation

bamorim-bio
Contributor

@bamorim-bio bamorim-bio commented Aug 15, 2024

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

@nevrome nevrome added the only .janno This PR does not feature a full package, but only a .janno file label Aug 18, 2024
@stschiff
Member

Thanks, @bamorim-bio, great. We'll take a look.

@nevrome
Member

nevrome commented Aug 19, 2024

Thanks for preparing this .janno file! I see the following issues:

  • The package name does not follow our expected standard of Year_AuthorName_RelevantKeyword. I propose 2022_Liu_Micronesia.
  • Please remove all columns that are completely empty/filled only with n/a.
  • In the lists for Relation_Degree there cannot be any n/a values. Documenting a relationship of "unknown" degree would be a bit pointless, right?
  • What do the many other values in Relation_Degree mean? They are not specified in Relation_Type either and are thus not informative. Is there no way to specify them further? At least there should be an entry in Relation_Note to explain them.
  • The column Endogenous contains a lot of mixed entries, where numbers are expected.
  • Date_C14_Labnr should also include the Laboratory identifier PSUAMS-, not just the number of the age.
  • Date_Note should only document additional details of the dating, not the actual dating. Contextual dates like [1000-1668 CE] should be entered in the columns Date_BC_AD_Start, Date_BC_AD_Median and Date_BC_AD_Stop.
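Several of the points above can be pre-checked before running trident. The following is only a minimal sketch, not part of trident: the helper names, the sample IDs and values in the usage note, and the lab-number pattern (letters, a hyphen, then the number) are assumptions made for illustration.

```python
import csv
import io
import re

NA_VALUES = ("", "n/a")
# Assumed shape of a lab number: letters, a hyphen, then the rest (e.g. PSUAMS-1234).
LABNR_PATTERN = re.compile(r"^[A-Za-z]+-\S+$")


def read_janno(text):
    """Parse tab-separated .janno text into a list of dicts, one per sample."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def cell(row, column):
    """Return the stripped cell value, or '' if the cell is missing."""
    return (row.get(column) or "").strip()


def empty_columns(rows):
    """Return the columns whose every entry is empty or n/a."""
    if not rows:
        return []
    columns = [c for c in rows[0] if c is not None]
    return [c for c in columns if all(cell(r, c) in NA_VALUES for r in rows)]


def non_numeric(rows, column="Endogenous"):
    """Return (Poseidon_ID, value) pairs that are neither n/a nor a number."""
    bad = []
    for row in rows:
        value = cell(row, column)
        if value in NA_VALUES:
            continue
        try:
            float(value)
        except ValueError:
            bad.append((row.get("Poseidon_ID"), value))
    return bad


def labnr_missing_prefix(rows, column="Date_C14_Labnr"):
    """Return (Poseidon_ID, entry) pairs that lack a laboratory prefix."""
    bad = []
    for row in rows:
        value = cell(row, column)
        if value in NA_VALUES:
            continue
        for entry in value.split(";"):
            if not LABNR_PATTERN.match(entry.strip()):
                bad.append((row.get("Poseidon_ID"), entry.strip()))
    return bad
```

Calling `read_janno(open(path, encoding="utf-8").read())` and passing the result to each helper lists the offending columns and samples. trident validate --janno remains the authoritative check.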

Maybe you could quickly have a look 👍. Please run trident validate --janno on the file after you implemented the necessary changes.

@bamorim-bio
Contributor Author

Thanks Clemens, here are my changes:

  • The package name does not follow our expected standard of Year_AuthorName_RelevantKeyword. I propose 2022_Liu_Micronesia.
  • Please remove all columns that are completely empty/filled only with n/a.
  • In the lists for Relation_Degree there cannot be any n/a values. Documenting a relationship of "unknown" degree would be a bit pointless, right?
  • What do the many other values in Relation_Degree mean? They are not specified in Relation_Type either and are thus not informative. Is there no way to specify them further? At least there should be an entry in Relation_Note to explain them.
  • The column Endogenous contains a lot of mixed entries, where numbers are expected.
  • Date_C14_Labnr should also include the Laboratory identifier PSUAMS-, not just the number of the age.
  • Date_Note should only document additional details of the dating, not the actual dating. Contextual dates like [1000-1668 CE] should be entered in the columns Date_BC_AD_Start, Date_BC_AD_Median and Date_BC_AD_Stop.

@nevrome nevrome self-assigned this Sep 6, 2024
@stschiff
Member

stschiff commented Dec 3, 2024

I will review this .janno file once more, and we would like to add the genotype data from AADR here (it's David's lab that made the data, so that should be fine).

@stschiff stschiff self-assigned this Dec 3, 2024
@nevrome nevrome deleted the branch poseidon-framework:dev January 17, 2025 11:58
@nevrome nevrome closed this Jan 17, 2025
@stschiff stschiff assigned bamorim-bio and unassigned nevrome and stschiff Jan 31, 2025
@stschiff stschiff self-requested a review January 31, 2025 14:46
@stschiff
Member

We need to decide who can grep the genotype data from AADR for this package.

@stschiff stschiff removed their request for review January 31, 2025 14:47
@stschiff stschiff removed the only .janno This PR does not feature a full package, but only a .janno file label Jan 31, 2025
@bamorim-bio
Contributor Author

We need to decide who can grep the genotype data from AADR for this package.

I can do this Stephan!

@nevrome nevrome reopened this Feb 1, 2025
@stschiff stschiff marked this pull request as draft February 3, 2025 14:19
@stschiff stschiff deleted the branch poseidon-framework:dev February 13, 2025 07:01
@stschiff stschiff closed this Feb 13, 2025
@stschiff stschiff reopened this Feb 13, 2025
@stschiff
Member

Sorry for closing this @bamorim-bio. This is still open, of course.

@bamorim-bio bamorim-bio marked this pull request as ready for review February 17, 2025 15:52
@bamorim-bio bamorim-bio marked this pull request as draft February 17, 2025 15:53
@bamorim-bio bamorim-bio changed the title Create 2022_Liu_Science_Ancient.janno Add 2022_Liu_Science_Ancient Feb 17, 2025
@bamorim-bio
Contributor Author

@nevrome Can you help me with the .janno file? I added some more details, but for some reason I cannot validate it. I keep getting these errors:

[Warning] Contributor missing in POSEIDON.yml file of package 2022_Liu_Science_Ancient-0.1.0
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 2 (Poseidon_ID: I7111.AG): Contamination, Contamination_Err and Contamination_Meas do not have the same lengths
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 14 (Poseidon_ID: I24236.AG): Contamination, Contamination_Err and Contamination_Meas do not have the same lengths
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 15: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 52, 2014.052.126; P8290 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 16: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 50, 2014.052.125; P8291 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 21: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 48, 2014.052.123; P8296 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)
[Warning] Some packages were skipped due to issues:
[Warning] In the package described in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/POSEIDON.yml:
[Warning] Consistency issues in file /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno: Broken lines.

@bamorim-bio bamorim-bio marked this pull request as ready for review February 17, 2025 16:04
@bamorim-bio
Contributor Author

I also forgot to add the YML file, but I can't seem to push the commit. I might be getting a permission error.

@nevrome nevrome changed the title Add 2022_Liu_Science_Ancient Add 2022_Liu_Micronesia Feb 18, 2025
@nevrome
Member

nevrome commented Feb 18, 2025

Hey @bamorim-bio - OK.

Your Git permission error is hard to debug from afar. Maybe we can have a call about this. Feel free to write to me on Mattermost if the issue persists.

Regarding the .janno file: Two samples have pretty straightforward issues:

[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 2 (Poseidon_ID: I7111.AG): Contamination, Contamination_Err and Contamination_Meas do not have the same lengths
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 14 (Poseidon_ID: I24236.AG): Contamination, Contamination_Err and Contamination_Meas do not have the same lengths

The message "Contamination, Contamination_Err and Contamination_Meas do not have the same lengths" means that you have different numbers of ;-separated values in these columns. I didn't check the data, so let me know if this cannot be easily resolved.
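To find such rows outside of trident, a small script can count the ;-separated entries per column. This is only a sketch under assumptions: the helper names are made up, and treating an empty or n/a cell as a list of length zero is a guess about the intended semantics, not a statement about how trident counts.

```python
import csv
import io

CONTAMINATION_COLUMNS = ("Contamination", "Contamination_Err", "Contamination_Meas")


def list_length(value):
    """Count the ;-separated entries in a cell; empty or n/a counts as zero."""
    value = (value or "").strip()
    if value in ("", "n/a"):
        return 0
    return len(value.split(";"))


def mismatched_rows(janno_text, columns=CONTAMINATION_COLUMNS):
    """Return (Poseidon_ID, {column: length}) for rows with unequal list lengths."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    bad = []
    for row in reader:
        lengths = {c: list_length(row.get(c)) for c in columns}
        # More than one distinct length means the columns disagree.
        if len(set(lengths.values())) > 1:
            bad.append((row.get("Poseidon_ID"), lengths))
    return bad
```

Each reported row shows which of the three columns has too few or too many entries, which is usually enough to see where a value or a separator went missing.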

The other broken samples are a bit more tricky:

[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 15: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 52, 2014.052.126; P8290 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 16: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 50, 2014.052.125; P8291 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)
[Error] Can't read sample in /mnt/scratch/bamorim/poseidon/2022_Liu_Science_Ancient/2022_Liu_Science_Ancient.janno in line 21: parse error (Failed reading: satisfy) at "2nd or 3rd degree relatives" 48, 2014.052.123; P8296 USA n/a n/a Naton Beach Site (Guam, Tamuning) (truncated)

Parsing errors like this typically point to an issue with the tabs separating the columns, or to some rogue quotes. I suggest you open the .janno file in LibreOffice as a tab-separated file and then check whether something is off about these rows. You can probably spot and fix the error there quickly.
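If a spreadsheet view does not reveal the problem, the raw lines can also be scanned for the usual suspects. The sketch below is a hypothetical helper, not trident's parser: it assumes the header is line 1, that no field legitimately contains a TAB, and it only flags lines rather than explaining the exact parse failure.

```python
def suspicious_lines(janno_text):
    """Return (line_number, problems) for data lines that look malformed.

    Line numbers are 1-based and count the header as line 1.
    """
    lines = janno_text.splitlines()
    if not lines:
        return []
    expected = lines[0].count("\t") + 1
    report = []
    for number, line in enumerate(lines[1:], start=2):
        problems = []
        fields = line.split("\t")
        if len(fields) != expected:
            problems.append(f"{len(fields)} columns, expected {expected}")
        if line.count('"') % 2 == 1:
            problems.append("unbalanced double quote")
        for field in fields:
            # A quoted field must be quoted as a whole, e.g. "text", not "text" 52.
            if field.startswith('"') and (len(field) == 1 or not field.endswith('"')):
                problems.append("field opens with a quote but does not end with one")
                break
        if problems:
            report.append((number, problems))
    return report
```

A field such as `"2nd or 3rd degree relatives" 52`, where text follows the closing quote, would be flagged by the last check. Whether that is what happened in the rows above is an assumption that needs checking against the file itself.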

@nevrome
Member

nevrome commented Feb 19, 2025

@bamorim-bio I just realized that this package never got checked with the main submission checklist. It only started as a .janno file and not a regular package. That's why that was missed.

I added the list now to your initial comment here: #199 (comment)

Maybe you could quickly go through and check the boxes. You already invested quite some work into this, so I understand that this may start to feel tedious. Feel free to ask for help here.

@nevrome
Member

nevrome commented Feb 19, 2025

This PR was superseded by #257. I am closing this one now.

@nevrome nevrome closed this Feb 19, 2025