Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix #109

@mrauha

Hi all,

I was a bit baffled when opening biopandas PDB output with MDAnalysis: instead of a few dozen segments, I got thousands. Here's why, and my hacky fix:

Biopandas writes the rows like this:

ATOM  50786  CB  ASP q  96     219.123 233.404 332.880  1.00 97.39           C
ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N

while MDAnalysis expects this format:

ATOM  51419  O   UNK r 113     214.624 201.542 285.597  1.00 99.63           O
ATOM  51420  CB  UNK r 113     217.297 202.297 286.117  1.00100.32           C

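In the biopandas output there is always a space between the occupancy and the B-factor, so when a B-factor has five digits (>99.99) its last digit spills into column 67. MDAnalysis reads that digit as the segid, and because the segids are then no longer all empty, it never falls back to the chain IDs. See the parser code:
Line 297: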

                segids.append(line[66:76].strip())

Lines 304-306:

        # If segids not present, try to use chainids
        if not any(segids):
            segids = chainids
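
To make the column shift concrete, here is a small sketch that applies the same slice as the parser line quoted above to two of the ATOM records from this issue. The helper name `segid_field` is made up for illustration and is not part of MDAnalysis.

```python
# Two ATOM records copied from the examples above.
# "shifted" is the biopandas-style row, "aligned" is the layout MDAnalysis expects.
shifted = "ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N"
aligned = "ATOM  51420  CB  UNK r 113     217.297 202.297 286.117  1.00100.32           C"


def segid_field(line: str) -> str:
    """Hypothetical helper: the same slice as the quoted parser line."""
    return line[66:76].strip()


# In the shifted row the last B-factor digit sits at index 66 (column 67).
print(repr(segid_field(shifted)))  # -> '6'
print(repr(segid_field(aligned)))  # -> ''
```

Rows with a B-factor below 100 still give an empty segid, so the segid flips between an empty string and a stray digit along the file, which would explain the large number of segments.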

As a quick fix, I commented out that last if statement in MDAnalysis, so that the chain IDs are always used as segids.
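
An alternative that avoids patching MDAnalysis would be to re-align the B-factor column in the written file before loading it. The following is only a rough sketch: `realign_bfactor` and `realign_file` are hypothetical helpers, and it assumes that an overflowing row is shifted right as a whole from the B-factor onwards, as in the row quoted above. B-factors that do not fit in six characters at all are left untouched.

```python
import re

# A decimal number, optionally preceded by spaces, at the start of the tail.
_BFACTOR = re.compile(r"\s*(-?\d+\.\d+)")


def realign_bfactor(line: str) -> str:
    """Hypothetical helper: pull an overflowing B-factor back into columns 61-66."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    tail = line[60:]  # everything after the occupancy field
    match = _BFACTOR.match(tail)
    if match is None or match.end() <= 6:
        return line  # no B-factor found, or it already fits
    number = match.group(1)
    if len(number) > 6:
        return line  # cannot be represented in six columns; leave as-is
    # Dropping the extra leading space shifts the rest of the row back left.
    return line[:60] + number.rjust(6) + tail[match.end():]


def realign_file(src_path: str, dst_path: str) -> None:
    """Rewrite a PDB file line by line (both paths are placeholders)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(realign_bfactor(line))


shifted = "ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N"
fixed = realign_bfactor(shifted)
print(repr(fixed[60:66]))          # -> '100.66'
print(repr(fixed[66:76].strip()))  # -> ''
```

With the segid columns empty again, the chain-ID fallback quoted above should apply without any change to MDAnalysis.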
