AutoAi is a high-level AI automation library that supports automatic training of a large number of different models and automatic data preprocessing. It also supports custom class implementations, making it work with any neural network you can imagine.
- Added support in the AutoPreprocessor for applying custom functions (or lambdas) to specific columns at specific preprocessing steps using the addApplyFunctionForColumn method
- Added examples for the addApplyFunctionForColumn method in the AutoPreprocessor class
- Added support for exporting the scalers when calling the export method in the AutoPreprocessor class
- Added support in the AutoPreprocessor class for adding new columns
- Added support in the AutoPreprocessor class for setting column values
- Tested with Python 3.7.5; should work with all Python 3.7+ versions.
- AutoAi cannot train multiple models in parallel; it trains them sequentially.
- Classes
- Interfaces
- Download the repository
- Install the requirements from the requirements.txt file (pip install -r requirements.txt)
Class that allows neural network / machine learning model handling with auto-training support.
This example creates an AIModel object with a given machine learning model from the scikit-learn library, then trains it.
# Get the dataset
import pandas as pd
df = pd.read_csv("Test_Dataset\\iris_preprocessed_predict_all.csv")
x = df.iloc[:, 0:4]
y = df.iloc[:, 4:]
# Create a machine learning model
from sklearn.ensemble import RandomForestClassifier
mlModel = RandomForestClassifier()
# Import AIModel
from autoAi.AIModel import AIModel
# Create the AIModel
model = AIModel("MyModel_CustomModel", baseDumpPath="Output_Models")
# Update the AIModel model
model.updateModel(mlModel)
# Update the AIModel dataset
model.updateDataSet(x, y, test_size=0.2)
# Train the AIModel
model.train(max_iter=50, batchSize=10, dumpEachIter=25, verboseLevel=2)
# Load the best model from all trained models located in "Output_Models/MyModel_CustomModel"
model.loadBestModel()
When we run the previous code, we should get verbose training output.
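For reference, the same train-and-evaluate flow written directly against scikit-learn (without AutoAi's checkpoint dumping and best-model selection) takes only a few lines. This sketch loads the iris data from scikit-learn instead of the preprocessed CSV above:

```python
# Plain scikit-learn equivalent of the workflow above (no AutoAi involved)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the iris dataset directly from scikit-learn
x, y = load_iris(return_X_y=True)

# Hold out 20% of the data, mirroring test_size=0.2 above
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=42)

# Train the model and evaluate it on the held-out data
mlModel = RandomForestClassifier()
mlModel.fit(xTrain, yTrain)
print("Accuracy:", mlModel.score(xTest, yTest))
```

What AIModel adds on top of this is the periodic model dumping (dumpEachIter) and the ability to reload the best checkpoint afterwards.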
This example creates an AIModel object with the auto trainer enabled, then trains it on every model available in the default AutoTrainer class.
# Get the dataset
import pandas as pd
df = pd.read_csv("Test_Dataset\\iris_preprocessed_predict_all.csv")
x = df.iloc[:, 0:4]
y = df.iloc[:, 4:]
# Import AIModel
from autoAi.AIModel import AIModel
# Create the AIModel with the auto trainer enabled
model = AIModel("MyModel_AutoTraining", baseDumpPath="Output_Models", autoTrainer=True)
# Update the AIModel dataset
model.updateDataSet(x, y, test_size=0.2)
# Train the AIModel
model.train(max_iter=50, batchSize=10, dumpEachIter=25, verboseLevel=2)
# Load the best model from all trained models located in "Output_Models/MyModel_AutoTraining"
model.loadBestModel()
When we run the previous code, the output should include a line telling us that the auto trainer is activated.
Class that contains all supported auto-training models
This example creates a custom AutoTrainer object, feeds it to an AIModel object, then trains the AIModel. The getModelsTypes method of the AutoTrainer class returns the list of machine learning / neural network model instances to train, each constructed with its desired parameters.
# Get the dataset
import pandas as pd
df = pd.read_csv("Test_Dataset\\iris_preprocessed_predict_all.csv")
x = df.iloc[:, 0:4]
y = df.iloc[:, 4:]
# Creating the custom AutoTrainer class.
class AutoTrainer():
    def getModelsTypes(self):
        import sklearn.ensemble
        import sklearn.linear_model
        return [
            sklearn.ensemble.RandomForestClassifier(n_estimators=100, random_state=42),
            sklearn.ensemble.RandomForestRegressor(),
            sklearn.linear_model.LinearRegression()
        ]
# Import AIModel
from autoAi.AIModel import AIModel
# Create the AIModel
model = AIModel("MyModel_CustomTrainer", baseDumpPath="Output_Models", autoTrainer=True, autoTrainerInstance=AutoTrainer())
# Update the AIModel dataset
model.updateDataSet(x, y, test_size=0.2)
# Train the AIModel
model.train(max_iter=50, batchSize=10, dumpEachIter=25, verboseLevel=2)
# Load the best model from all trained models located in "Output_Models/MyModel_CustomTrainer"
model.loadBestModel()
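Conceptually, auto-training amounts to fitting every candidate model and keeping the one with the best held-out score. The following is a simplified sketch of that idea in plain scikit-learn (not AutoAi's actual internals, which also handle dumping and batching):

```python
# Simplified sketch of auto-training: fit several candidate models,
# keep the one that scores best on held-out data
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=42)

# Candidate models, analogous to what getModelsTypes returns
candidates = [RandomForestClassifier(), LogisticRegression(max_iter=1000), KNeighborsClassifier()]

bestModel, bestScore = None, -1.0
for candidate in candidates:
    candidate.fit(xTrain, yTrain)
    score = candidate.score(xTest, yTest)
    if score > bestScore:
        bestModel, bestScore = candidate, score

print(type(bestModel).__name__, bestScore)
```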
Class that allows automatic data preprocessing
This example uses the AIModel class to create a model for each column in the dataset that has NaN values and predict those values. The AutoTrainer can either be specified or not; if not, as in this case, the default AutoTrainer class is used.
from autoAi.AutoPreprocessor import AutoPreprocessor
# Create the AutoPreprocessor object
obj = AutoPreprocessor(datasetPath='Test_Dataset\\iris.csv',
datasetType='csv', yDataNames=['species'])
# Specify the dataset categorical names
obj.updateCategoricalColumns(categoricalNames=['species'])
# Specify the current data scale type
obj.updateScaleData(scaleDataType=['minmax'])
# Specify the dataset data handling method. In this case 'predict', which
# will use the autoAi.AIModel to build models that will predict the NaN values
obj.updateNaNHandlingMethod(nanDataHandling='predict', predictMaxIter=50, predictBatchSize=10,
predictDumpEachIter=25, predictVerboseLevel=2)
# Execute the preprocessing with the current settings
obj.execute()
# Export the preprocessed dataset
obj.export(filePath="Test_Dataset\\iris_preprocessed_predict_all.csv",
fileType='csv')
# Print the preprocessed data
print(obj.getFullDataset())
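The 'predict' NaN-handling strategy can be illustrated with plain pandas and scikit-learn. This is a simplified sketch of the idea, not AutoPreprocessor's actual implementation: rows where a column is known train a model, which then fills in the rows where that column is NaN.

```python
# Sketch of predictive NaN imputation: train on known rows, predict missing ones
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy dataset with NaNs in one column
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan, 4.6, 5.0, np.nan],
    "sepal_width":  [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
})

target = "sepal_length"
features = [c for c in df.columns if c != target]

# Train on the rows where the target column is known...
known = df[df[target].notna()]
model = RandomForestRegressor(random_state=42)
model.fit(known[features], known[target])

# ...then predict the missing values and write them back
missing = df[df[target].isna()]
df.loc[df[target].isna(), target] = model.predict(missing[features])

print(df)
```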
This example uses the AIModel class to create a model for each column in the dataset that has NaN values and predict those values. The AutoTrainer is created and passed as a parameter to the AutoPreprocessor instance.
from autoAi.AutoPreprocessor import AutoPreprocessor
# Creating the custom AutoTrainer class
class CustomAutoTrainer():
    def getModelsTypes(self):
        import sklearn.ensemble
        import sklearn.linear_model
        return [
            sklearn.ensemble.VotingRegressor(estimators=[
                ('lr', sklearn.linear_model.LinearRegression()),
                ('rf', sklearn.ensemble.RandomForestRegressor(n_estimators=50))])
        ]
# Create the AutoPreprocessor object
obj = AutoPreprocessor(datasetPath='Test_Dataset\\iris.csv',
datasetType='csv', yDataNames=['species'])
# Specify the dataset categorical names
obj.updateCategoricalColumns(categoricalNames=['species'])
# Specify the current data scale type
obj.updateScaleData(scaleDataType=['minmax'])
# Specify the dataset data handling method. In this case 'predict', which
# will use the autoAi.AIModel to build models that will predict the NaN values
obj.updateNaNHandlingMethod(nanDataHandling='predict', predictAutoTrainer=CustomAutoTrainer(),
predictMaxIter=50, predictBatchSize=10, predictDumpEachIter=25,
predictVerboseLevel=2)
# Execute the preprocessing with the current settings
obj.execute()
# Export the preprocessed dataset
obj.export(filePath="Test_Dataset\\iris_preprocessed_predict_custom.csv",
fileType='csv')
# Print the preprocessed data
print(obj.getFullDataset())
This example applies a first lambda expression to the column 'sepal_length' after the categorical preprocessing, then a second lambda expression to the same column after all preprocessing steps have occurred.
from autoAi.AutoPreprocessor import AutoPreprocessor
# Create the AutoPreprocessor object
obj = AutoPreprocessor(datasetPath='Test_Dataset\\iris.csv',
datasetType='csv', yDataNames=['species'])
# Specify the dataset categorical names
obj.updateCategoricalColumns(categoricalNames=['species'])
# Specify the current data scale type
obj.updateScaleData(scaleDataType=['minmax'])
# Multiplying by 1000 each element in the column 'sepal_length' after doing the
# categorical preprocessing (1)
obj.addApplyFunctionForColumn('sepal_length', lambda x: x * 1000, step=1)
# Dividing by 80 each element in the column 'sepal_length' after all preprocessing steps (5)
obj.addApplyFunctionForColumn('sepal_length', lambda x: x / 80, step=5)
# Execute the preprocessing with the current settings
obj.execute()
# Export the preprocessed dataset
obj.export(filePath="Test_Dataset\\iris_preprocessed_lambdas.csv",
fileType='csv')
# Print the preprocessed data
print(obj.getFullDataset())
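Applying a function to a column amounts to a pandas Series.apply under the hood. The two lambdas above are equivalent to the following sketch, assuming the dataset is held in a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7]})

# Step 1: multiply each element by 1000 (after categorical preprocessing)
df["sepal_length"] = df["sepal_length"].apply(lambda x: x * 1000)

# Step 5: divide each element by 80 (after all preprocessing steps)
df["sepal_length"] = df["sepal_length"].apply(lambda x: x / 80)

# Net effect: each value scaled by 1000 / 80 = 12.5
print(df["sepal_length"].tolist())
```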
Interface for custom neural network implementation
This example creates an AIModel object with a custom neural network model from the Keras library using the ICustomWrapper interface, then trains it.
# Get the dataset
import pandas as pd
df = pd.read_csv("Test_Dataset\\iris_preprocessed_predict_all.csv")
x = df.iloc[:, 0:4]
y = df.iloc[:, 4:]
# Import the ICustomWrapper interface
from autoAi.Interfaces import ICustomWrapper
# Import the Keras libraries
from keras.layers import Dense
from keras.models import Sequential
# Create the Wrapper using the ICustomWrapper interface
class CustomKerasWrapper(ICustomWrapper):
    def __init__(self):
        self.model = Sequential()
        self.model.add(Dense(7, input_dim=4, activation='relu'))
        self.model.add(Dense(7, activation='relu'))
        self.model.add(Dense(2, activation='sigmoid'))
        self.model.compile(loss='categorical_crossentropy',
                           optimizer='adam',
                           metrics=['mean_squared_error'])

    def fit(self, X, y):
        self.model.fit(X, y, verbose=0)

    def predict(self, X):
        return self.model.predict(X)
# Import AIModel
from autoAi.AIModel import AIModel
# Create the AIModel
model = AIModel("MyModel_CustomModelKeras", baseDumpPath="Output_Models")
# Update the AIModel model with a CustomKerasWrapper
model.updateModel(CustomKerasWrapper())
# Update the AIModel dataset
model.updateDataSet(x, y, test_size=0.2)
# Train the AIModel
model.train(max_iter=1000, batchSize=10, dumpEachIter=500, verboseLevel=2)
# Load the best model from all trained models located in "Output_Models/MyModel_CustomModelKeras"
model.loadBestModel()
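The wrapper contract is small: any class exposing fit(X, y) and predict(X) can stand in for a model. Here is a Keras-free sketch of the same idea backed by scikit-learn, assuming ICustomWrapper only requires those two methods (the base class is omitted so the sketch runs without autoAi installed):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Minimal wrapper exposing the same fit/predict contract as ICustomWrapper,
# backed here by a scikit-learn model instead of Keras
class CustomSklearnWrapper:
    def __init__(self):
        self.model = LogisticRegression(max_iter=1000)

    def fit(self, X, y):
        self.model.fit(X, y)

    def predict(self, X):
        return self.model.predict(X)

x, y = load_iris(return_X_y=True)
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=42)

wrapper = CustomSklearnWrapper()
wrapper.fit(xTrain, yTrain)
print("Accuracy:", (wrapper.predict(xTest) == yTest).mean())
```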