Notes:
- Versioning: The versioning scheme depends on the Khiops version supported (first 3 digits) and
a Khiops Python Library correlative (4th digit).
- Example: 10.2.1.4 is the 5th version that supports Khiops 10.2.1.
- Internals: Changes in Internals sections are unlikely to be of interest for data scientists.
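As an illustration of the versioning note above, here is a minimal sketch that splits a four-digit version string into the supported Khiops version and the library correlative. The helper name `split_version` is hypothetical (it is not part of the library), and pre-release tags such as `11.0.0-b.0` are deliberately not handled.

```python
from typing import Tuple


def split_version(version: str) -> Tuple[str, int]:
    """Split 'A.B.C.D' into the Khiops version 'A.B.C' and the correlative D.

    Hypothetical helper for illustration only; it rejects anything that is
    not exactly four dot-separated integers.
    """
    parts = version.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"Expected a four-digit version, got: {version!r}")
    return ".".join(parts[:3]), int(parts[3])


khiops_version, correlative = split_version("10.2.1.4")
print(khiops_version)   # supported Khiops version
print(correlative + 1)  # rank of this release for that Khiops version
```

The correlative starts at 0, which is why `10.2.1.4` is the 5th release for Khiops 10.2.1.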
- (`sklearn`) `keep_selected_variables_only` parameter to the predictors (`KhiopsClassifier` and `KhiopsRegressor`)
- (`core`) Rename `variable_part_dimensions` to `inner_variable_dimensions` in Coclustering results.
- (`sklearn`) `n_feature_parts` parameter to the supervised estimators
- (`sklearn`) Default value of `n_features` for the supervised estimators.
- Internals:
  - Detection of unsupported installation modes on Windows operating systems.
- (`core`) Samples dir path construction when HOME is a remote path
- (General) Automatic Credentials Discovery-based credential retrieval for Google cloud storage (GCS).
- (`sklearn`) Temporary directory race condition in estimators.
- (General) Support for Python 3.14.
- (General) Warnings and error messages for unsupported installation setups.
- (`core`) Dictionary API support for dictionary, variable and variable block comments, and dictionary and variable block internal comments.
- (`core`) Dictionary `Rule` class and supporting API for serializing `Rule` instances.
- (`core`) New way to add a variable to a dictionary using a complete specification.
- (`core`) New API constants for rules used in automatic variable construction:
  - `DEFAULT_CONSTRUCTION_RULES`: names of table and entity construction rules, which are applied by default
  - `CALENDRICAL_CONSTRUCTION_RULES`: names of date, time and timestamp rules.
- (`sklearn`) `Text` Khiops type support at the estimator level.
- (General) Pip packages are published on PyPI.
- (General) Conda packages only depend on the `conda-forge` channel and are published on `conda-forge`.
- (`core`) Dictionary API (DictionaryDomain, Dictionary, MetaData), when a requested key is not found in getters, return `None` instead of raising a `KeyError` exception.
- (`sklearn`) Remove the `n_features_evaluated_`, `feature_evaluated_names`, `feature_evaluated_importances_`, `n_features_used_`, `feature_used_names_` and `feature_used_importances_` Khiops classifier and regressor estimator attributes.
- (General) Inconsistency between the `tools.download_datasets` function and the current samples directory according to `core.api.get_samples_dir()`.
- (`core`) API support for predictor interpretation and reinforcement.
- (`core`) API support for instance-variable coclustering model training.
- (`core`) Support for text types in prediction and coclustering models.
- (`core`) Analysis and coclustering report JSON serialization support.
- (`sklearn`) Automatic removal of newline characters in strings on Pandas dataframe columns. This is to ensure the proper working of the Khiops engine.
- (`core`) Syntax for additional data tables specification, which uses the data paths.
- (`core`) API specification of the results path: full paths to report files are now used instead of result directories.
- (`sklearn`) Specification of the hierarchical multi-table schemata, which now uses data paths as in the Core API.
- (`general`) Various other changes and updates for Khiops 11.0.0-b.0 compatibility.
- (`core`) The results directory parameter of the Core API functions. The full path to the reports must now be specified instead.
- (`core`) The "`"-based secondary table path specification. The "/"-based data paths must now be used instead.
- (`sklearn`) The specification syntax for hierarchical multi-table datasets. The "/"-based data paths must now be used instead, as in the Core API.
- (`general`) All functions, attributes and features that had been deprecated in the 10.3.2.0 version.
- (`sklearn`) Documentation display for the `train_test_split_dataset` sklearn helper function.
- (`sklearn`) Support for boolean and float targets in `KhiopsClassifier`.
- (`sklearn`) Crash when there were no informative trees in predictors.
- (`core`) The `build_multi_table_dictionary_domain` helper function.
- (`core`) Dictionary file `.json` extension check in the `khiops.dictionary.read_dictionary_file` function.
- (`sklearn`) The `train_test_split_dataset` helper has been moved from `khiops.utils` to `khiops.sklearn`.
- (`sklearn`) The `transform_pairs` parameter of the `KhiopsEncoder` sklearn estimator has been renamed to `transform_type_pairs`.
- (`sklearn`) The `is_fitted_` estimator attribute. The Scikit-learn `check_is_fitted` function can be used to test the fitted state of the estimators.
- (`sklearn`) The `n_pairs` parameter of the `KhiopsRegressor` sklearn estimator. It was never supported.
- (General) Support for Python 3.13.
- (General) `visualize_report` helper function to open reports with the Khiops Visualization and Khiops Co-Visualization app.
- (General) Initialization failing in Conda-based environments.
- (`core`) Support for system parameters has been moved from the `KhiopsLocalRunner` to the `core` API.
- (`core`) System parameter `max_memory_mb` has been renamed to `memory_limit_mb`.
- (`core`) System parameter `khiops_temp_dir` has been renamed to `temp_dir`.
- (General) Khiops Python 9 compatibility.
- (`sklearn`) `train_test_split_dataset` helper function to ease the splitting in train/test for multi-table datasets.
- (`sklearn`) Complete support for `core` API functions parameters in the `sklearn` estimators.
- (General) The Conda package only depends on the `conda-forge` and `khiops` channels.
- Internals:
  - Improve and simplify the integration with the `khiops-core` package via its `khiops_env` script.
sklearn) Sklearn's attributes for supervised estimators.
- (`core`) API functions handling of unknown parameters: they now fail.
- Internals:
  - Detection of the path to the MPI command: the real path to the executable is now used.
- (`core`) Documentation of the `specific_pairs` parameter for the `train_predictor` and `train_recoder` core API functions.
- (`core`) The following parameters of the `train_predictor` core API functions:
  - `max_groups`
  - `max_intervals`
  - `min_group_frequency`
  - `min_interval_frequency`
  - `results_prefix`
  - `snb_predictor`
  - `univariate_predictor_number`
  - `discretization_method` for supervised learning
  - `grouping_method` for supervised learning
- Internals:
  - The OpenMPI backend now executes with the `--allow-run-as-root` option.
- (`sklearn`) Support for sparse arrays in sklearn estimators.
- Internals:
  - MPI backend from MPICH to OpenMPI for native + Pip-based Linux installations.
- `core`
  - Metric name search in estimator analysis report.
- (`sklearn`) 1:1 relations to multi-table datasets.
- (`sklearn`) Estimators' `fit` methods now accept single-column pandas dataframes as `y` target.
- (`core`) Improve user error and warning messaging.
- (General) Reinstate Rocky Linux 8 support.
Note: This release marks the open sourcing of Khiops:
- The `khiops` package replaces the old `pykhiops` package. We recommend uninstalling `pykhiops` before installing `khiops`. More information at the Khiops site.
- The `khiops` package uses a new four digit versioning convention.
- The `khiops` conda package is available for many environments. See the Khiops site for more information.
- General:
  - `khiops-python` is now available with the conda `khiops` package. This package bundles `khiops-python` and the Khiops binaries so no system-wide Khiops installation is needed. More information at the Khiops website.
  - Support for Python 3.12.
- `sklearn`
  - Estimator classes can now be trained from Numpy arrays in single-table mode.
- `core`
  - `stdout_file_path` and `stderr_file_path` parameters for `khiops.core.api` functions. These parameters allow saving the stdout/stderr output of the Khiops execution.
- `sklearn`
  - Estimator classes now comply with scikit-learn standards.
- `core`
  - The JSON initialization of `AnalysisResults`, `CoclusteringResults` and its component classes is coherent with the empty initialization.
- `core`
  - Wrong default discretization and grouping methods in `train_predictor` and `train_recoder`.
  - `KhiopsDockerRunner` checking the existence of `shared_dir` on remote paths.
- `sklearn`:
  - Direct support for coclustering simplification, via the `KhiopsCoclustering.simplify` method.
- Internals:
  - The `TaskRegistry.set_task_end_version` method for specifying the ending Khiops version for a task.
- `sklearn`:
  - Verbose mode support is now complete for coclustering.
- Internals:
  - User-provided scenario prologue is now taken into account into the tasks.
- General:
  - License has been updated to BSD-3 Clear.
- `sklearn`:
  - `auto_sort` replaces `internal_sort` to control input table sorting in estimators.
  - The multi-table documentation has been streamlined to be more precise and clearer.
- `sklearn`:
  - The `max_part_numbers` parameter of `KhiopsCoclustering` `fit` method. The `KhiopsCoclustering` `simplify` method now contains the simplification feature.
  - The `internal_sort` estimator parameter. The `auto_sort` estimator parameter replaces it.
- `core`:
  - The `build_multi_table_dictionary` API function. The `build_multi_table_dictionary_domain` helper function provides the same functionality.
- Internals:
  - The `build_multi_table_dictionary` task. This task will not be supported after Khiops 11.
- `sklearn`:
  - Support for snowflake database schemas.
- `core`:
  - Support for Khiops on MacOS.
- `core`:
  - Khiops coclustering is not executed with MPI anymore.
  - Bug when the JSON reports had colliding character ranges but no particular colliding character.
- Internals:
  - The transformation of the `core.api` function parameters to scenario files now has an additional layer mediated by the `KhiopsTask` class. These objects have all the necessary information about a particular Khiops task (ex. `train_predictor`) to transform its parameters to a scenario file. Furthermore, this allows exporting the task signatures to API description languages such as Protocol Buffers.
  - The `core.filesystem` now exposes its API as a set of functions instead of resource objects. The resource objects are still available, but the function API should be preferred.
- General:
  - Support for Python 3.6, pyKhiops 10.1.1 was the last version to support it.
- General:
  - Jupyter notebooks tutorials to the documentation site.
  - `pk-status` script to check the pyKhiops installation.
- General:
  - Code samples scripts not being installed: They are located in `<pykhiops_install_dir>/samples`.
- `sklearn`
  - `KhiopsCoclustering` raising an exception instead of a warning when no informative coclustering was found.
  - `internal_sort` parameter being ignored in `KhiopsCoclustering`.
- `core`
  - `detect_format` failing when the Khiops log had extra output.
- `sklearn`:
  - Estimators now accept dataframes with numerical column indexes.
  - `KhiopsClassifier` now accepts integer target vectors.
  - `classes_` estimator attribute for `KhiopsClassifier` (available once fitted).
  - `feature_names_out_` estimator attribute for `KhiopsEncoder` (available once fitted).
  - `export_report_file` and `export_dictionary_file` to export Khiops report and dictionary files once the estimators are fitted.
  - `internal_sort` parameter for estimators that may be used to not sort the tables on the internal procedures of pyKhiops (default is `True`). Disabling it may give speed gains in large datasets.
  - `verbose` flag for debugging estimators: It shows internal information and doesn't erase temporary files.
- `core`:
  - `get_khiops_version` API function.
  - New `LocalTimestamp` rule for AutoML feature generation (requires Khiops 10.1).
  - `max_total_parts` parameter to `simplify_coclustering` core API function (requires Khiops 10.1).
- Internals:
  - Khiops samples directory in Linux now defaults to `/opt/khiops/samples` which is where it is installed by default.
- `sklearn`:
  - Breaking: Estimators return NumPy arrays instead of dataframes in `predict`, `predict_proba`, `transform`, `fit_predict` and `fit_transform` methods.
- `core`:
  - `train_recoder` API function does not build trees by default anymore.
  - When pyKhiops reads a legacy Khiops JSON report/dictionary with Unicode decoding errors it now only warns and loads it anyway with the `errors="replace"` setting. Before it raised an exception.
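The warn-and-load behavior described in the last entry can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the function name `load_legacy_json` and the sample bytes are hypothetical.

```python
import json
import warnings


def load_legacy_json(raw: bytes) -> dict:
    """Decode JSON bytes as UTF-8, falling back to errors="replace" with a warning.

    Hypothetical helper: strict decoding is tried first; on failure the
    undecodable bytes are replaced by U+FFFD and the data is loaded anyway.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as error:
        warnings.warn(f"Decoding errors found, replacing bad bytes: {error}")
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# 0xE9 is "é" in Latin-1 but an invalid sequence in UTF-8
raw = b'{"tool": "Khiops", "name": "caf\xe9"}'
report = load_legacy_json(raw)
print(report["tool"])
```

The trade-off is that the replaced characters are lost, which is why a warning rather than silence accompanies the fallback.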
- General:
  - Simpler multi-table samples in the documentation.
- `sklearn`:
  - Datasets based on file paths. From pyKhiops 11 only in-memory datasets will be accepted. File-based treatments can be handled with the `core` API.
  - `max_part_number` as instance parameter of `KhiopsCoclustering`. It is now a `fit` parameter. It will be eliminated in pyKhiops 11.
- `core`:
  - `get_khiops_info` and `get_khiops_coclustering` API functions. From Khiops 10.1 there is no need for a license key, so these methods have no use anymore. They are kept deprecated for backwards compatibility only. They will be eliminated in pyKhiops 11.
- Internals:
  - `legacy_mode` in `PyKhiopsRunner`. In its place there is a generic versioning scheme to handle features and Khiops scenarios.
- `sklearn`:
  - Bug in dataframe-based datasets with numerical key columns
- `sklearn`:
  - A new way to specify multi-table inputs for estimators via a `dict`. From now on it is the standard way to specify multi-table datasets and the others are deprecated. See the documentation for more details.
  - New examples of use of `sklearn` in the script `samples_sklearn.py`. Available also in the documentation.
- `core`:
  - It now fully supports remote filesystems, provided that the extra dependencies are installed (it is still necessary to install Khiops remote filesystem plugins).
- Other:
  - Most methods that accept containers now additionally accept classes implementing their abstract interface (eg. `collections.abc.Sequence`, `collections.abc.Mapping`).
- Internals:
  - The default value of `samples_dir` of the `PyKhiopsLocalRunner` class can now be set via the environment variable `KHIOPS_SAMPLES_DIR`.
  - New classes `Dataset` and `DatasetTable` to `sklearn.tables` to handle sklearn table transformations for Khiops.
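To illustrate the "Other" entry above about abstract container interfaces, here is a minimal sketch of what accepting any `collections.abc.Sequence` or `collections.abc.Mapping` looks like. The class `VariableNames` and the function `describe_container` are hypothetical and only show the type-checking idea; they are not part of pyKhiops.

```python
from collections.abc import Mapping, Sequence


class VariableNames(Sequence):
    """Hypothetical read-only sequence that is neither a list nor a tuple."""

    def __init__(self, *names):
        self._names = tuple(names)

    def __getitem__(self, index):
        return self._names[index]

    def __len__(self):
        return len(self._names)


def describe_container(value):
    """Classify a value by its abstract interface rather than its concrete type."""
    if isinstance(value, Mapping):
        return "mapping"
    # Strings and byte strings are sequences too, so they are excluded explicitly
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray)):
        return "sequence"
    raise TypeError(f"Unsupported container type: {type(value).__name__}")


print(describe_container(["age", "salary"]))
print(describe_container(VariableNames("age", "salary")))
print(describe_container({"age": "Numerical"}))
```

Checking against the abstract base classes means user-defined containers work without inheriting from `list` or `dict`.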
- General:
  - Improved documentation completeness and layout.
- `sklearn`
  - Estimators do not depend anymore on local files. This fixes many issues including those related to serialization.
  - `KhiopsRegressor` now warns when `n_trees > 0`.
- `core`
  - Functions `deploy_coclustering` and `deploy_model_for_metrics` are moved from `core.api` to `core.helpers`. The latter module will contain non-basic functionality, whereas `core.api` will contain only the official Khiops API.
- `sklearn`:
  - `tuple` and `list` multi-table input modes in estimators.
  - `key` parameter of estimators.
  - `variables` parameter of `KhiopsCoclustering` estimator.
- `sklearn`:
  - Breaking `computation_dir` parameter in `sklearn` estimators. Khiops output files can still be saved with the parameter `output_dir`.
- Other:
  - Breaking Support for Python 2: 10.0.4 was the last version to support it.
- `sklearn`:
  - Data-race when using many estimators in parallel.
  - Bug in `KhiopsCoclustering` when the trained coclustering did not cluster the key variable.
  - Bug in `KhiopsEncoder` that happened because of a bad handling of OS-dependent line separators.
- Other:
  - Bug with `KHIOPS_HOME` environment variable not properly being taken into account when initializing the runner.
- Class `PyKhiopsDockerRunner` in package `pykhiops.extras`, which allows running pyKhiops with a remote Khiops Docker image as backend.
- Bug with `PyKhiopsRunner`'s `scenario_prologue` failing to execute.
- Bug in `pykhiops.sklearn` estimators not taking into account the target variable as `Used`.
- Bug in CentOS not taking into account environment variables and failing to execute.
- `extract_clusters` core API function to extract a dimension's clusters into a file.
- `deploy_predictor_for_metrics` core API function to evaluate performance metrics with third-party libraries.
- `detect_data_table_format` core API function to obtain (heuristically) the `header_line` and `field_separator` from a data file (requires Khiops >= 10.0.1).
- `train_predictor` and `evaluate_predictor` now accept a `main_target_value` parameter
- Various ease-of-access methods:
  - `AnalysisResults`: `get_reports`
  - `EvaluationReport`: `get_snb_performance`, `get_snb_lift_curve` and `get_snb_rec_curve`
  - `PredictorPerformance`: `get_metric` and `get_metric_names`
- New examples to `samples.py`.
Internals:
- Support for remote filesystems `s3` and `gcs` in `sklearn` module (installation with extra dependencies required).
- New `scenario` module containing classes to write templatized-scenarios and that also handle character encoding (see Fixed below).
- Support for the new `subTool` key of Khiops JSON files.
- Command-line options for `samples.py` to specify which samples to run.
- `str` parameters of core API functions may now also be `bytes` and `bytearray`
- Changed all `core` module docstrings to the "NumPy" style.
- `write` and `writeln` methods of classes in the `dictionary`, `analysis_results` and `coclustering_results` now require a `PyKhiopsOutputFile` object as argument.
- Query methods such as `get_dictionary` from `DictionaryDomain` now raise `KeyError` instead of returning `None` if the query fails.
- Core API functions that use a `field_separator` parameter now accept the string "\t".
- `KhiopsClassifier` and `KhiopsRegressor` now warn of incorrect types of target variable.
- Internals:
  - `PyKhiopsLocalRunner` now calls directly the `MODL` executables instead of the Khiops launch scripts (only for Khiops >= 10.0.1).
  - Specific pair parameter is not handled anymore with a temporary file.
  - Improved temporary file services in `PyKhiopsRunner`.
- Field separator constructor parameter for estimator classes of `sklearn`
- Dictionary files created with pyKhiops are now guaranteed to be free of character encoding errors unless the new JSON field `khiops_encoding` is non-existent or set to `colliding_ansi_utf8` in which case a warning is emitted
- Khiops execution problems due to the character encoding of certain parameters
- Khiops error reporting problems due to character encoding
- `train_coclustering` now returns the path of the JSON coclustering report (`.khcj`)
- `get_dimensions` not working at all in `CoclusteringReport`
- Some Python 2 incompatibilities in Linux
- `get_samples_dir` core API function (works only with a local runner).
- `train_predictor`, `evaluate_model`, `train_recoder`, `train_coclustering` and `deploy_coclustering` now have return values (paths of relevant output files).
- `transfer_database` core API function renamed to `deploy_model`.
- `build_transferred_dictionary` core API function renamed to `build_deployed_dictionary`.
- In general the "model deployment" concept replaces that of "database transfer" in all code and in particular in the samples scripts.
- It is not necessary to specify a relative path as `./path` for the `results_dir` argument.
- Messages enabled with the `trace` parameter go again to `stdout`.
- `sklearn` sub-module updated for pyKhiops 10.
- `sklearn` samples notebooks.
- `deploy_coclustering` core API function.
- `build_multitable_dictionary` core API function.
- The information messages of `sklearn` are now deactivated by default (they can be reactivated manually).
- `sklearn` dependency on `overrides` package.
- Small transformation bug in `convert-pk10`.
- `detect_format` parameters to API methods that read databases. It is enabled by default and Khiops will try to automatically detect the format of input data tables. See the docstrings for the new behavior of `header_line` and `field_separator`.
- `specific_pairs` option replacing `only_pairs_with`. It allows the methods `train_predictor` and `train_recoder` more options to generate pairs of variables (`only_pairs_with` kept in legacy mode).
- `PyKhiopsRunner` class to extend pyKhiops to different backends. `PyKhiopsLocalRunner` implements the current functionality and is the default runner.
- `dictionary_domain` parameter removed from all relevant API methods. Now methods accepting a dictionary file path as argument also accept a `DictionaryDomain` object.
- Renamed various parameters. Until the next major release pykhiops will warn when these old parameters are used.
- All optional parameters of API methods are now proper named parameters (no kwargs).
- All errors are now handled with custom `PyKhiops*` exceptions.
- Updated default values to those of Khiops 10. Notably `max_trees == 10` by default.
- `tools/convert-pk10.py` script no longer exists. Now when installing pykhiops a `convert-pk10` script will automatically be installed in the user's local Python scripts directory. Optionally, the function `pykhiops.tools.convert_pk10` provides the same functionality.
- `samples.py` script is now in snake case and improved.
- Simplified `samples.ipynb`.
- Messages in `trace` mode now go to `stderr`.
- Naive Bayes classifier option from `train_predictor`.
- `simplify_coclustering`: `results_prefix` now works.
- `subprocess.Popen` returning 1 in Linux even when the Khiops process ended correctly. This made the legacy mode detection fail.
- API functions failing when `stderr` was not empty even though the Khiops process ended correctly. Now it just emits a warning.
- Compatibility for Khiops 10
- Legacy support for Khiops 9
- Partial compatibility for Khiops 10 JSON reports (no tree report)
- Script `tools/convert_pk10.py` to migrate from pyKhiops 9 to 10. See Changed below
- Extraction of dictionary data paths: See `core.DictionaryDomain.extract_data_paths`
- Robust JSON loading: tries `utf-8` encoding, then the system's default.
- Licence file
- Now all variable/method names follow the PEP8 convention: All methods are now in snake_case
- In `core.train_predictor`: `fill_test_database_settings` and `map` (kept in legacy mode).
- Sources (first commit)
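The "Robust JSON loading" entry above (try `utf-8`, then the system's default encoding) can be sketched as follows. The function name `load_json_robust` is hypothetical and the code is only an illustration of the fallback idea, not the library's implementation.

```python
import json
import locale
import os
import tempfile


def load_json_robust(path):
    """Load a JSON file as UTF-8, retrying with the system's default encoding."""
    try:
        with open(path, encoding="utf-8") as json_file:
            return json.load(json_file)
    except UnicodeDecodeError:
        # Assumption: the file was written with the platform's preferred encoding
        fallback_encoding = locale.getpreferredencoding(False)
        with open(path, encoding=fallback_encoding) as json_file:
            return json.load(json_file)


# Usage on a temporary UTF-8 file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "report.json")
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump({"tool": "Khiops"}, json_file)
    loaded = load_json_robust(path)
print(loaded)
```

Note that on systems whose default encoding is already UTF-8 the second attempt fails in the same way, so the fallback only helps on platforms with a different default.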