[BUG] ctd (conjoint triad descriptors) features are not normalised and vary on each run. #34

MattElt · 2024-01-30T11:37:41Z

Hello.

I've found this library very useful, but I recently noticed a bug.
When calculating the ctd features, I get different results from the same list of sequences on different runs.
At first I thought the column names (ctd_desc) were being assigned in a random order. However, even when I ignore the column headers and try to match up columns by their data, the values (ctd_arr) are not identical between runs.
Also, according to the referenced paper describing the ctd calculation, the output is supposed to be normalised, i.e. between 0 and 1. The output from ctd is given as integers.
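
To illustrate what I mean by normalised, here is a minimal sketch of min-max scaling applied to one sequence's raw counts. The helper name `min_max_normalise` is hypothetical and not part of protlearn, and this is one common min-max form; the exact formula in the referenced paper may differ.

```python
def min_max_normalise(counts):
    """Scale one sequence's raw counts into the range [0, 1].

    Hypothetical helper for illustration; not part of protlearn.
    """
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # All counts are equal, so there is no spread to scale by.
        return [0.0 for _ in counts]
    return [(c - lo) / (hi - lo) for c in counts]


# Made-up integer counts standing in for one row of ctd_arr.
row = [0, 3, 1, 6]
normalised = min_max_normalise(row)
print(normalised)
```

With scaling like this, every value in a row would fall between 0 and 1, which is what I expected from the description in the paper.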

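import pandas as pd
import protlearn.features as ftr
# df is an existing DataFrame; protein_sequence_column_name names its sequence column
seqs = list(df[protein_sequence_column_name])
ctd_arr, ctd_desc = ftr.ctd(seqs)
df = pd.DataFrame(data=ctd_arr, columns=ctd_desc)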

It looks like there is some error in the implementation of the ctd function.
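
For anyone wanting to reproduce this, a rough sketch of the kind of check I used is below. It compares two runs after sorting the columns by name, so a difference in column order alone would not count as a mismatch. The helpers `align_columns` and `runs_match` are hypothetical names of my own, and the sample data is made up; each run is assumed to be a `(rows, names)` pair like the `(ctd_arr, ctd_desc)` returned above.

```python
def align_columns(rows, names):
    """Reorder the columns of rows so that the column names are sorted."""
    order = sorted(range(len(names)), key=lambda i: names[i])
    sorted_names = [names[i] for i in order]
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_rows, sorted_names


def runs_match(run_a, run_b):
    """Return True if two (rows, names) runs agree once columns are aligned."""
    rows_a, names_a = align_columns(*run_a)
    rows_b, names_b = align_columns(*run_b)
    return names_a == names_b and rows_a == rows_b


# Made-up example: same data, columns in a different order.
first = ([[1, 2], [3, 4]], ["b", "a"])
second = ([[2, 1], [4, 3]], ["a", "b"])
print(runs_match(first, second))

# Assumed usage with the snippet above (not executed here):
# print(runs_match(ftr.ctd(seqs), ftr.ctd(seqs)))
```

If the only problem were column ordering, a check like this should report a match between runs; in my case the values still differed.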

Versions:
python 3.11.3, protlearn 0.0.3, pandas 2.0.3
