Hello.
I've found this library very useful, but have recently noticed a bug.
When calculating the ctd features, I get different results from the same list of sequences.
At first I thought the column names (ctd_desc) were being assigned in a random order. However, even after matching up columns with similar data and ignoring the column headers, the data (ctd_arr) was not identical on subsequent runs.
Also, according to the referenced paper describing the ctd calculation, the output is supposed to be normalised, i.e. between 0 and 1, but the output from ctd is given in integers.
import pandas as pd
import protlearn.features as ftr

# df is an existing DataFrame that holds the protein sequences
seqs = list(df[protein_sequence_column_name])
ctd_arr, ctd_desc = ftr.ctd(seqs)
df = pd.DataFrame(data=ctd_arr, columns=ctd_desc)
There appears to be an error in the implementation of the ctd function.
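For anyone trying to reproduce this, here is a minimal sketch of the two checks described above: whether two results agree once column order is ignored, and whether all values lie between 0 and 1. The helper names (`sort_columns`, `same_features`, `all_in_unit_interval`) and the toy descriptor names are made up for illustration and are not part of protlearn. The sketch only assumes that the array can be iterated row by row and that the descriptors are sortable strings.

```python
def sort_columns(rows, names):
    """Reorder the columns so that the descriptor names are in sorted order."""
    order = sorted(range(len(names)), key=lambda i: names[i])
    sorted_names = [names[i] for i in order]
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_rows, sorted_names


def same_features(run_a, run_b):
    """Return True if two (rows, names) results agree, ignoring column order."""
    rows_a, names_a = sort_columns(*run_a)
    rows_b, names_b = sort_columns(*run_b)
    return names_a == names_b and rows_a == rows_b


def all_in_unit_interval(rows):
    """Return True if every value lies between 0 and 1 inclusive."""
    return all(0 <= value <= 1 for row in rows for value in row)


# Toy data with hypothetical descriptor names, not real ctd output.
run_1 = ([[1, 0, 2], [0, 3, 1]], ["AAB", "ABA", "BBA"])
run_2 = ([[0, 1, 2], [3, 0, 1]], ["ABA", "AAB", "BBA"])  # same data, columns shuffled
run_3 = ([[1, 0, 2], [0, 3, 2]], ["AAB", "ABA", "BBA"])  # one value differs

print(same_features(run_1, run_2))
print(same_features(run_1, run_3))
print(all_in_unit_interval(run_1[0]))
```

With the snippet above, the same comparison would be `same_features((ctd_arr_1, ctd_desc_1), (ctd_arr_2, ctd_desc_2))`, where the two pairs come from two separate calls to `ftr.ctd(seqs)` on the same sequences. If shuffled column order were the only problem, that call would return True.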
Versions:
python 3.11.3, protlearn 0.0.3, pandas 2.0.3