Open
Labels: bug, scikit-learn, trees
Description
I trained a model using scikit-learn's DecisionTreeClassifier on a dataset with 1,600,000 rows and 15 features, with max_depth=7. When I try to convert a model larger than about 2 MB with coremltools, I get the following error:
malloc: *** error for object 0x7fb096a0f738: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug
I might be doing this part wrong, but when I try to log and debug, I get:
Segmentation fault: 11
To recreate:
import pandas as pd
import numpy as np
import sklearn
import coremltools
import random
import string
from sklearn.tree import DecisionTreeClassifier

# 1000 rows of 15 random binary features
X = np.random.choice([0, 1], size=(15*1000,), p=[1./3, 2./3])
X = np.split(X, 1000)

# One random 5-letter lowercase label per row
y = []
for i in range(0, 1000):
    label = ''.join(random.choice(string.ascii_lowercase) for _ in range(5))
    y.append(label)

clf = DecisionTreeClassifier()
clf.fit(X, y)

d = {'arr': X, 'str': y}
df = pd.DataFrame(data=d)

coreml_model = coremltools.converters.sklearn.convert(clf, "arr", "str")  ## malloc error
coreml_model.save('mymodel.mlmodel')
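A note for anyone triaging this: the script above draws one random 5-letter label per row, so the classifier ends up with close to one class per row. The stdlib-only sketch below counts the distinct labels such a loop produces. The helper name `make_labels` and the fixed seed are hypothetical, added only for illustration, and whether the class count has anything to do with the crash is an untested assumption, not a diagnosis.

```python
import random
import string


def make_labels(n_rows, length=5, seed=0):
    """Generate n_rows random lowercase labels, mirroring the loop in the script above.

    The seed is a hypothetical addition so that the count is repeatable.
    """
    rng = random.Random(seed)
    return [
        ''.join(rng.choice(string.ascii_lowercase) for _ in range(length))
        for _ in range(n_rows)
    ]


labels = make_labels(1000)
n_classes = len(set(labels))

# With 26**5 possible labels, duplicates among 1000 draws are rare,
# so the number of classes is close to the number of rows.
print("rows:", len(labels), "distinct classes:", n_classes)
```

Lowering `n_rows`, or drawing labels from a small fixed set, would give a variant of the script with far fewer classes, which might help show whether the failure depends on the class count or on the tree size.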