internal fragmentation #82

Open
pavel-shliaha opened this issue Mar 20, 2016 · 13 comments

@pavel-shliaha

I am now doing some intact protein analysis and it was recently demonstrated that when you fragment proteins you produce a lot of internal fragments:

http://www.ncbi.nlm.nih.gov/pubmed/25716753

Considering these internal fragments results in a huge boost in coverage. Could internal fragmentation be introduced in calculateFragments?

@sgibb sgibb self-assigned this Mar 20, 2016
@sgibb
Collaborator

sgibb commented Mar 26, 2016

I just implemented a first approach to the internal fragments problem (currently in another branch).

calculateFragments("PQRST", type=c("b", "bIy"))
#         mz       ion type pos z seq
# 1 303.1775  bIy[2-3]  bIy   2 1  QR
# 2 262.1510  bIy[3-4]  bIy   3 1  RS
# 3 390.2096  bIy[2-4]  bIy   2 1 QRS
# 4 270.1799 bIy[2-3]_ bIy_   2 1  QR
# 5 229.1533 bIy[3-4]_ bIy_   3 1  RS
# 6 357.2119 bIy[2-4]_ bIy_   2 1 QRS
# 7 286.1510 bIy[2-3]* bIy*   2 1  QR
# 8 373.1830 bIy[2-4]* bIy*   2 1 QRS

Because of my minimal chemical background I am unsure whether all these calculations are correct and reasonable.

I use the following additions:

  add <- c(a=-(mass["C"]+mass["O"]),            # + H - CO
           b=0,                                 # + H
           c=mass["N"]+3*mass["H"],             # + H + NH3
           x=mass["C"]+2*mass["O"],             # + CO + OH
           y=2*mass["H"]+mass["O"],             # + H2 + OH
           z=-(mass["N"]+mass["H"])+mass["O"],  # - NH2 + OH
           ### internal fragments
           aIx=mass["O"],                       # (- CO + CO) + OH
           bIy=2*mass["H"]+mass["O"],           # + H2 + OH
           cIz=mass["H"]+mass["O"])             # + NH3 - NH2 + OH
## an additional H+ is added later

Is neutral loss reasonable for aIx, bIy and cIz, or are there any limitations?

(neutral loss is discussed in #47)
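As a cross-check of the bIy addition above, the following base-R sketch recomputes the three singly charged bIy masses for "PQRST" by hand. It is only an illustration: `internalBIy` is a hypothetical helper and not part of MSnbase, the monoisotopic masses are hard-coded, and it assumes that internal fragments exclude both termini and span at least two residues, as the example output suggests.

```r
## Monoisotopic residue masses (Da) for the residues of the example peptide.
residue <- c(P = 97.05276, Q = 128.05858, R = 156.10111,
             S = 87.03203, T = 101.04768)
mass <- c(H = 1.007825, O = 15.994915)
proton <- 1.007276

## Hypothetical helper (not an MSnbase function): enumerate internal
## fragments and apply the bIy addition (2 * H + O) plus z protons.
internalBIy <- function(sequence, z = 1) {
  aa <- strsplit(sequence, "")[[1]]
  n <- length(aa)
  if (n < 4) stop("need at least 4 residues for an internal fragment")
  ## all start/end pairs that exclude both termini (width >= 2)
  idx <- t(combn(2:(n - 1), 2))
  mz <- apply(idx, 1, function(ij) {
    neutral <- sum(residue[aa[ij[1]:ij[2]]]) + 2 * mass[["H"]] + mass[["O"]]
    (neutral + z * proton) / z
  })
  data.frame(
    mz = mz,
    ion = sprintf("bIy[%d-%d]", idx[, 1], idx[, 2]),
    seq = apply(idx, 1, function(ij) paste(aa[ij[1]:ij[2]], collapse = "")),
    stringsAsFactors = FALSE
  )
}

frags <- internalBIy("PQRST")
print(frags)
```

Under these assumptions the values for QR, RS and QRS agree with rows 1 to 3 of the output above to four decimal places. The neutral-loss variants (bIy_, bIy*) are deliberately left out of the sketch.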

@lgatto
Owner

lgatto commented Jun 26, 2016

@pavel-shliaha @sgibb any news on this front?

@sgibb
Collaborator

sgibb commented Jun 26, 2016

The code is ready and could be merged if it is chemically correct. I am waiting for @pavel-shliaha's review.

@lgatto
Owner

lgatto commented Jun 27, 2016

Ok, thanks.

@lgatto
Owner

lgatto commented Sep 8, 2017

Any news on this front?

@pavel-shliaha
Author

I will play around with this for proteins in the next few months, once I finish the work on top-down with normal fragments.

@sgibb
Collaborator

sgibb commented Nov 29, 2017

@pavel-shliaha As far as I understand internal fragments, there could be all kinds of combinations, e.g. aIx, aIy, aIz, bIx, bIy, bIz, cIx, cIy, cIz. Should we focus on aIx, bIy and cIz first? Or should I also implement the other ones?
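For reference, the nine combinations listed above can be generated in base R as follows. This is just a naming sketch: the variable names are made up for illustration, and it says nothing about which combinations are chemically plausible or implemented.

```r
## N-terminal (a, b, c) and C-terminal (x, y, z) cleavage types
nTerm <- c("a", "b", "c")
cTerm <- c("x", "y", "z")

## expand.grid varies its first argument fastest, so listing cTerm first
## yields the order aIx, aIy, aIz, bIx, ...
combos <- expand.grid(cTerm = cTerm, nTerm = nTerm, stringsAsFactors = FALSE)
internalTypes <- paste0(combos$nTerm, "I", combos$cTerm)
print(internalTypes)
```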

@pavel-shliaha
Author

pavel-shliaha commented May 27, 2018

Sorry to keep you waiting, but I needed to submit the work we had already done. Yes, it could be any sort of combination, and yes, having aIx, bIy and cIz is a good start. I think the top-down dataset we have is perfect for this work, since I expect more internal fragments than with small peptides. Can I ask you to implement the internal fragments in topdownr? This would allow me to test internal fragmentation much more easily and systematically, and we need this functionality for top-down work anyway.

The internal fragments should also support adducts, since many of the fragments require adducts for identification. As a reminder, these are the fragment types we observe. If the probability of fragmenting a fragment is the same as that of fragmenting the precursor, then bIy should indeed be the preferred type in CID:

[Figure: distribution of fragments]

sgibb added a commit to sgibb/topdownr that referenced this issue Jun 3, 2018
@sgibb
Collaborator

sgibb commented Jun 3, 2018

Currently the internal fragment feature lives in an extra branch (a special side project) of MSnbase. Because I am not sure that the feature works as expected, I am not merging it into the official MSnbase (@lgatto, if you think this doesn't really matter, you can merge the issue82-internalFragments branch).
@pavel-shliaha: You could test it with this special MSnbase version and the newest topdownr. To install both please use the following:

devtools::install_github("lgatto/MSnbase@issue82-internalFragments")
devtools::install_github("sgibb/topdownr")

Next you could simply add aIx, bIy and/or cIz to the type argument. Even with just these three additional fragment types, the number of matched fragments increases about fivefold for the myoglobin example data set, and the number of theoretical fragments grows from 2700 to 102999:

library("topdownr")

## default: fragments a, b, c, x, y, z
tdsDflt <- readTopDownFiles(topdownrdata::topDownDataPath("myo"))
tdsDflt
# TopDownSet object (4.33 Mb)
# - - - Protein data - - -
# Amino acid sequence (153): GLSDGEWQQVLNVWGKVEADIAGHGQ...GAMTKALELFRNDIAAKYKELGFQG
# Mass : 16964.96
# Modifications (3): Carbamidomethyl, Acetyl, Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 2700
# Theoretical fragment types (18): a, a_, a*, b, b_, ..., y_, y*, z, z_, z*
# Theoretical mass range: [68.03;16925.98]
# - - - Condition data - - -
# Number of conditions: 1852
# Number of scans: 5882
# Condition variables (61): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 2700x5882 (0.67% != 0)
# Number of matched fragments: 106282
# Intensity range: [109.29;10704001.00]
# - - - Processing information - - -
# [2018-06-03 20:41:53] 106282 fragments [2700;5882] matched (tolerance: 5 ppm, strategies ion/fragment: remove/remove).
# [2018-06-03 20:41:53] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1852 conditions.
# [2018-06-03 20:41:54] Recalculate median injection time based on: Mz, AgcTarget.

## internal fragments
tdsIntF <- readTopDownFiles(topdownrdata::topDownDataPath("myo"),
                            type=c("a", "b", "c", "x", "y", "z",
                                   "aIx", "bIy", "cIz"))
tdsIntF
# TopDownSet object (24.70 Mb)
# - - - Protein data - - -
# Amino acid sequence (153): GLSDGEWQQVLNVWGKVEADIAGHGQ...GAMTKALELFRNDIAAKYKELGFQG
# Mass : 16964.96
# Modifications (3): Carbamidomethyl, Acetyl, Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 102999
# Theoretical fragment types (27): a, a_, a*, aIx, aIx_, ..., y_, y*, z, z_, z*
# Theoretical mass range: [68.03;16925.98]
# - - - Condition data - - -
# Number of conditions: 1852
# Number of scans: 5882
# Condition variables (61): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 102999x5882 (0.09% != 0)
# Number of matched fragments: 515232
# Intensity range: [71.40;10704001.00]
# - - - Processing information - - -
# [2018-06-03 20:42:20] 515232 fragments [102999;5882] matched (tolerance: 5 ppm, strategies ion/fragment: remove/remove).
# [2018-06-03 20:42:20] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1852 conditions.
# [2018-06-03 20:42:20] Recalculate median injection time based on: Mz, AgcTarget.

## rowViews
rowViews(tdsDflt)
# FragmentViews on a 153-letter sequence:
#   GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPE...SKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG
# Mass:
#   16964.964625
# Modifications:
#   Carbamidomethyl
#   Acetyl
#   Met-loss
# Views:
#        start end width     mass name  type   z
#    [1]     1   1     1    68.03 z1_   z_     1 [G]
#    [2]     1   1     1    72.04 a1    a      1 [G]
#    [3]     1   1     1    85.05 y1_   y_     1 [G]
#    [4]     1   1     1   100.04 b1    b      1 [G]
#    [5]     1   1     1   101.02 z1    z      1 [G]
#    ...   ... ...   ...      ... ...   ...  ... ...
# [2696]     2 153   152 16893.90 x152* x*     1 [LSDGEWQQVLNVWG...NDIAAKYKELGFQG]
# [2697]     1 152   152 16908.95 b152  b      1 [GLSDGEWQQVLNVW...RNDIAAKYKELGFQ]
# [2698]     1 152   152 16908.95 c152* c*     1 [GLSDGEWQQVLNVW...RNDIAAKYKELGFQ]
# [2699]     2 153   152 16910.93 x152  x      1 [LSDGEWQQVLNVWG...NDIAAKYKELGFQG]
# [2700]     1 152   152 16925.98 c152  c      1 [GLSDGEWQQVLNVW...RNDIAAKYKELGFQ]

rowViews(tdsIntF)
# FragmentViews on a 153-letter sequence:
#   GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPE...SKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG
# Mass:
#   16964.964625
# Modifications:
#   Carbamidomethyl
#   Acetyl
#   Met-loss
# Views:
#          start end width     mass name        type   z                          
#      [1]     1   1     1    68.03 z1_         z_     1 [G]                      
#      [2]     1   1     1    72.04 a1          a      1 [G]                      
#      [3]     1   1     1    85.05 y1_         y_     1 [G]                      
#      [4]    73  74     2    98.05 aIx[73-74]_ aIx_   1 [GG]                     
#      [5]    73  74     2    99.06 cIz[73-74]_ cIz_   1 [GG]                     
#      ...   ... ...   ...      ... ...         ...  ... ...                      
# [102995]     2 153   152 16893.90 x152*       x*     1 [LSDGEWQQVL...AKYKELGFQG]
# [102996]     1 152   152 16908.95 b152        b      1 [GLSDGEWQQV...AAKYKELGFQ]
# [102997]     1 152   152 16908.95 c152*       c*     1 [GLSDGEWQQV...AAKYKELGFQ]
# [102998]     2 153   152 16910.93 x152        x      1 [LSDGEWQQVL...AKYKELGFQG]
# [102999]     1 152   152 16925.98 c152        c      1 [GLSDGEWQQV...AAKYKELGFQ]

## select only internal fragments
tdsIntF[c("aIx", "bIy", "cIz"),]
# TopDownSet object (9.74 Mb)
# - - - Protein data - - -
# Amino acid sequence (153): GLSDGEWQQVLNVWGKVEADIAGHGQ...GAMTKALELFRNDIAAKYKELGFQG
# Mass : 16964.96
# Modifications (3): Carbamidomethyl, Acetyl, Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 33975
# Theoretical fragment types (3): aIx, bIy, cIz
# Theoretical mass range: [131.05;16827.93]
# - - - Condition data - - -
# Number of conditions: 1852
# Number of scans: 5882
# Condition variables (61): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 33975x5882 (0.07% != 0)
# Number of matched fragments: 148392
# Intensity range: [71.40;1966595.88]
# - - - Processing information - - -
# [2018-06-03 20:42:20] 515232 fragments [102999;5882] matched (tolerance: 5 ppm, strategies ion/fragment: remove/remove).
# [2018-06-03 20:42:20] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1852 conditions.
# [2018-06-03 20:42:20] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-06-03 20:47:51] Subsetted 515232 fragments [102999;5882] to 148392 fragments [33975;5882].
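The fragment counts reported above can be sanity-checked with a little arithmetic. The sketch below assumes, as in the PQRST example, that an internal fragment excludes both termini and spans at least two residues; all numbers are copied from the output above, and the variable names are only illustrative.

```r
n <- 153L       # sequence length reported in the protein data
nTypes <- 3L    # aIx, bIy, cIz (without neutral-loss variants)

## start/end pairs with 2 <= start < end <= n - 1
nPerType <- choose(n - 2L, 2L)    # 11325
nInternal <- nTypes * nPerType    # 33975, the size of the subsetted object

## growth relative to the default a, b, c, x, y, z fragments
theoreticalRatio <- 102999 / 2700   # about 38.1
matchedRatio <- 515232 / 106282     # about 4.85

print(c(nPerType = nPerType, nInternal = nInternal))
print(round(c(theoretical = theoreticalRatio, matched = matchedRatio), 2))
```

The theoretical search space therefore grows far faster than the number of matches, which is consistent with the drop in non-zero entries from 0.67% to 0.09% in the intensity array.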

@sgibb
Collaborator

sgibb commented Jun 3, 2018

sgibb added a commit to sgibb/topdownr that referenced this issue Jul 17, 2018
@sgibb
Collaborator

sgibb commented Jul 17, 2018

At @pavel-shliaha's request I reverted the changes in topdownr.

To test this feature you have to install both packages from specific branches:

devtools::install_github("lgatto/MSnbase@issue82-internalFragments")
devtools::install_github("sgibb/topdownr@internalFragments")

@lgatto
Owner

lgatto commented Oct 28, 2020

What's the status of this?

@sgibb
Collaborator

sgibb commented Oct 29, 2020 via email
