Release which includes image support for the Content Analyzer and RecSys modules!
- This release was co-developed with @m-elio
NOTE: The minimum Python version has been bumped up from Python 3.7 to Python 3.8 in order to use the @functools.cached_property decorator
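For readers unfamiliar with the decorator, here is a minimal sketch of what functools.cached_property (available since Python 3.8) does. The ImageFeatures class and its attributes are hypothetical and are not taken from the library.

```python
from functools import cached_property


class ImageFeatures:
    """Hypothetical class, used only to illustrate the decorator."""

    def __init__(self, pixels):
        self.pixels = pixels
        self.computations = 0  # counts how many times the property body runs

    @cached_property
    def mean_intensity(self):
        # Runs once per instance; the result is then stored on the instance.
        self.computations += 1
        return sum(self.pixels) / len(self.pixels)


features = ImageFeatures([10, 20, 30])
first = features.mean_intensity
second = features.mean_intensity  # served from the cache, body not re-run
```

The second access does not recompute the value, which is the behaviour that makes the decorator useful for expensive derived attributes.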
Added
Content Analyzer
- Implemented visual preprocessors thanks to the torchvision library
  - torch augmenters were also implemented
  - All of them can be checked in the docs
- Implemented postprocessor techniques, which also work for textual techniques
  - Visual bag of words (with count and tfidf weighting schemes)
  - Scipy vector quantization
  - Dimensionality reduction techniques from sklearn (PCA, Gaussian random projections, Feature agglomeration)
- The path of each image to process, as specified in the raw source, can be an absolute path, a relative path, or an online url!
- Implemented several content techniques which extract embedding features from images
  - Pre-trained models from timm
  - Pre-trained caffe models using opencv.dnn
  - HOG descriptor, Canny edge detector, LBP, SIFT from skimage
  - Color histogram
  - Custom filter convolution
- Implemented FromNPY technique, which imports features from a numpy serialized matrix
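As a rough illustration of the visual bag of words postprocessor listed above, the sketch below quantizes local descriptors against a codebook and then applies count and tf-idf weighting. It is a dependency-free approximation under stated assumptions: the function names are hypothetical, the codebook is given rather than learned, and the idf is the plain log(N / df) form, which may differ from what the library actually computes.

```python
import math
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Vector quantization step: index of the closest codeword (squared Euclidean distance)."""
    distances = [
        sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        for codeword in codebook
    ]
    return distances.index(min(distances))


def count_bow(descriptors, codebook):
    """Count weighting: how many descriptors of one image fall on each codeword."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]


def tfidf_bow(count_vectors):
    """Tf-idf weighting over a collection of count vectors (assumed idf: log(N / df))."""
    n_images = len(count_vectors)
    n_words = len(count_vectors[0])
    doc_freq = [sum(1 for vec in count_vectors if vec[k] > 0) for k in range(n_words)]
    idf = [math.log(n_images / df) if df else 0.0 for df in doc_freq]
    return [[tf * w for tf, w in zip(vec, idf)] for vec in count_vectors]


# Toy data: a codebook of 2 visual words and 2 images with 2-dimensional descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
image_a = [[0.1, 0.0], [0.9, 1.0], [1.1, 0.8]]
image_b = [[0.0, 0.2], [0.1, 0.1]]

counts = [count_bow(image_a, codebook), count_bow(image_b, codebook)]
weighted = tfidf_bow(counts)
```

In this toy case the first visual word appears in both images, so its idf is zero and only the second word keeps a non-zero tf-idf weight.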
RecSys
- Implemented VBPR technique following the corresponding paper
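For orientation, the preference score that VBPR assigns to a user-item pair, as it is usually written in the VBPR paper by He and McAuley, can be sketched as follows. This is only an illustration of the formula, not the library's implementation (which is neural and trained with a ranking loss); all function and parameter names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equally sized sequences."""
    return sum(x * y for x, y in zip(a, b))


def vbpr_score(alpha, beta_u, beta_i, gamma_u, gamma_i, theta_u, E, beta_prime, f_i):
    """Score = alpha + beta_u + beta_i + gamma_u.gamma_i + theta_u.(E f_i) + beta_prime.f_i"""
    # Project the raw visual feature f_i into the low-dimensional visual space.
    visual_item = [dot(row, f_i) for row in E]
    return (
        alpha + beta_u + beta_i
        + dot(gamma_u, gamma_i)      # latent (non-visual) interaction
        + dot(theta_u, visual_item)  # visual interaction
        + dot(beta_prime, f_i)       # visual bias
    )


# Toy values, chosen only so the arithmetic is easy to follow.
score = vbpr_score(
    alpha=0.0, beta_u=0.1, beta_i=0.2,
    gamma_u=[1.0, 0.0], gamma_i=[0.5, 2.0],
    theta_u=[2.0], E=[[1.0, 0.0, 1.0]],
    beta_prime=[0.0, 1.0, 0.0], f_i=[0.5, 0.25, 1.0],
)
```

Here f_i stands for the visual feature vector of the item, for example one extracted by the content techniques listed under Content Analyzer.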
Changed
Content Analyzer
- Changed Ratings class to use numpy arrays and integer mappings instead of relying on python dictionaries and strings
- Adapted FieldContentProductionTechnique to consider the distinction between textual and visual techniques
- Added possibility to serialize contents produced with multithreading
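The idea behind the integer mappings used by the Ratings class can be pictured with the sketch below: string ids are translated once into consecutive integers, and the interactions are then stored in encoded form. This is an assumption-laden illustration, not the class's real internals; build_mapping is a hypothetical helper, and plain lists stand in for the numpy arrays to keep the example dependency-free.

```python
def build_mapping(ids):
    """Assign a consecutive integer to each distinct string id, in order of first appearance."""
    mapping = {}
    for identifier in ids:
        mapping.setdefault(identifier, len(mapping))
    return mapping


# Raw interactions as (user id, item id, score) triples.
raw = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)]

user_map = build_mapping(u for u, _, _ in raw)
item_map = build_mapping(i for _, i, _ in raw)

# Encoded interactions: integers index directly into array-like structures.
encoded = [(user_map[u], item_map[i], score) for u, i, score in raw]
```

Integer indices allow fast array lookups and slicing, where string keys would require a dictionary access for every interaction.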
RecSys
- Vectorized computation of CentroidVector algorithm
- Adapted content based algorithm abstraction to make room for neural algorithms
- Fixed missing Bootstrap partitioning technique from online documentation
- AllItemsMethodology by default now considers as items catalog the union between train and test set
- HoldOutPartitioningTechnique can now accept an integer value representing the number of instances to hold rather than a percentage
- Changed log of users skipped in partitioning/algorithm fitting: a single print with total number of skipped users is fired instead of a single one for each skipped user
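The difference between an integer and a percentage hold-out size can be sketched as below. The function name, the test_size parameter, and the rounding rule are assumptions made for illustration and do not reflect the actual HoldOutPartitioningTechnique signature.

```python
import random


def hold_out_split(instances, test_size, seed=None):
    """Split instances into (train, test).

    test_size: an int is read as a number of instances,
               a float in (0, 1) as a fraction of the instances.
    """
    shuffled = list(instances)
    if isinstance(test_size, bool):
        raise TypeError("test_size must be an int or a float")
    if isinstance(test_size, int):
        n_test = test_size
    elif isinstance(test_size, float) and 0 < test_size < 1:
        n_test = round(len(shuffled) * test_size)  # assumed rounding rule
    else:
        raise ValueError("test_size must be an int or a float in (0, 1)")
    if not 0 <= n_test <= len(shuffled):
        raise ValueError("test_size exceeds the number of instances")

    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


train_int, test_int = hold_out_split(range(10), 3, seed=0)    # hold exactly 3 instances
train_pct, test_pct = hold_out_split(range(10), 0.2, seed=0)  # hold 20% of the instances
```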
EvalModel
- Changed NDCG implementation to allow the choice of the gain weights (linear or exponential) and the definition of a discount function
- Improved visualization of statistical tests results
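To make the NDCG options above concrete, here is a minimal sketch of NDCG with a selectable gain and a pluggable discount. The function names and parameters are hypothetical and may not match the EvalModel API; the default discount log2(rank + 1) and the exponential gain 2^rel - 1 are the common textbook choices, assumed here.

```python
import math


def default_discount(rank):
    """Standard logarithmic discount for a 1-based rank."""
    return math.log2(rank + 1)


def dcg(relevances, gain="linear", discount=default_discount):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    if gain == "linear":
        gains = list(relevances)
    elif gain == "exponential":
        gains = [2 ** rel - 1 for rel in relevances]
    else:
        raise ValueError("gain must be 'linear' or 'exponential'")
    return sum(g / discount(rank) for rank, g in enumerate(gains, start=1))


def ndcg(relevances, gain="linear", discount=default_discount):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), gain, discount)
    return dcg(relevances, gain, discount) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])                         # already in ideal order
linear = ndcg([1, 2, 3], gain="linear")
exponential = ndcg([1, 2, 3], gain="exponential")
flat = ndcg([1, 2, 3], discount=lambda rank: 1.0)  # custom discount: position ignored
```

The exponential gain penalizes the misplaced highly relevant item more strongly than the linear one, while a constant discount makes the metric insensitive to ordering.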