Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix

Hi all,

I was a bit baffled when opening biopandas PDB output with MDAnalysis: instead of a few dozen segments, I got thousands. Here's why, and my hacky fix:

Biopandas outputs the rows in the following way:
```
ATOM  50786  CB  ASP q  96     219.123 233.404 332.880  1.00 97.39           C
ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N
```
while MDAnalysis expects this format:

```
ATOM  51419  O   UNK r 113     214.624 201.542 285.597  1.00 99.63           O
ATOM  51420  CB  UNK r 113     217.297 202.297 286.117  1.00100.32           C
```

Because of this formatting, when a B-factor has five digits (values above 99.99), its last digit spills over into column 67. MDAnalysis then parses that digit as the segid and uses the segids as chains, see [the code for the parser](https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/PDBParser.py):
Line 297:
```
                segids.append(line[66:76].strip())
```

Lines 304-306:
```
        # If segids not present, try to use chainids
        if not any(segids):
            segids = chainids
```

As a quick fix, I commented out the last if statement in MDAnalysis.
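If patching MDAnalysis is not an option, another possible workaround is to realign the affected lines before loading the file. The following is only a sketch under the assumption that the shift always looks like the example above (a space in column 61 followed by a six-character B-factor); `realign_bfactor` is a hypothetical helper, not part of biopandas or MDAnalysis.

```python
def realign_bfactor(line):
    """Drop the extra space in front of an overflowing B-factor.

    Assumption: a shifted record has a space at index 60 and six
    non-space characters at indices 61-66 (e.g. ' 100.66').
    """
    if line.startswith(("ATOM", "HETATM")) and len(line) > 66:
        if line[60] == " " and " " not in line[61:67]:
            # Removing the space shifts the B-factor and everything after it
            # (including the element symbol) back by one column.
            return line[:60] + line[61:]
    return line


shifted = "ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N"
normal = "ATOM  50786  CB  ASP q  96     219.123 233.404 332.880  1.00 97.39           C"

fixed = realign_bfactor(shifted)
print(repr(fixed[60:66]), repr(fixed[66:76].strip()))
print(realign_bfactor(normal) == normal)
```

Running each line of the biopandas output through such a function and writing the result to a new file should leave the segid columns empty again, so the chainid fallback can kick in.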

Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix #109

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix #109

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions