
AttributeError: 'implicit.evaluation._memoryviewslice' object has no attribute 'dtype' when calling mean_average_precision_at_k #726

Open · MRossa157 opened this issue Jan 7, 2025 · 4 comments

@MRossa157

Hello! I encountered an issue when using the mean_average_precision_at_k function from the implicit library.

Problem Description:

When calling the mean_average_precision_at_k function, the following error occurs:

AttributeError: 'implicit.evaluation._memoryviewslice' object has no attribute 'dtype'

Context:

  • Operating System: Windows 10
  • Python: 3.10.7
  • implicit library version: 0.7.2
  • Installed dependencies:
    [tool.poetry.dependencies]
    python = "^3.10.7"
    pandas = "^2.2.2"
    implicit = "^0.7.2"

Steps to Reproduce:

  1. Installed the implicit library version 0.7.2.
  2. Called the mean_average_precision_at_k function with the following parameters:
    metric_map = mean_average_precision_at_k(
        model,
        csr_train,
        csr_test,
        K=6,
        show_progress=True,
    )
  3. Encountered the above-mentioned error.

Expected Behavior:

The function should return the MAP@K metric value without errors.

Additional Information:

  • Tried reinstalling the library and its dependencies, but the error persists.
  • The code includes the following imports:
    import numpy as np
    import pandas as pd
    from implicit.cpu.als import AlternatingLeastSquares as ALScpu
    from implicit.evaluation import mean_average_precision_at_k
    from scipy.sparse import coo_matrix

I would appreciate any assistance in resolving this issue.
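
For completeness, a simplified sketch of how the matrices and the model are set up (the real dataset and ALS parameters differ; everything below is illustrative and only meant to make the call above reproducible):

    import pandas as pd
    from implicit.cpu.als import AlternatingLeastSquares as ALScpu
    from implicit.evaluation import mean_average_precision_at_k, train_test_split
    from scipy.sparse import coo_matrix

    # toy interaction data; the real dataset is much larger
    df = pd.DataFrame({
        "user_id": [0, 0, 1, 2, 2, 3],
        "item_id": [1, 2, 0, 5, 3, 7],
        "weight":  [1, 1, 1, 1, 1, 1],
    })

    # build the user-item matrix and split it into train/test parts
    user_items = coo_matrix((df.weight, (df.user_id, df.item_id))).tocsr()
    csr_train, csr_test = train_test_split(user_items, train_percentage=0.8)

    model = ALScpu(factors=64, regularization=0.05)
    model.fit(csr_train)

    metric_map = mean_average_precision_at_k(
        model,
        csr_train,
        csr_test,
        K=6,
        show_progress=True,
    )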

fkurushin commented Jan 10, 2025

Hi @MRossa157! I trained a model with the Python implicit package and ran into the same problem.

A minimal example to reproduce the error:

import os
import random
import pandas as pd
from scipy.sparse import csr_matrix
from implicit.evaluation import train_test_split, ndcg_at_k, mean_average_precision_at_k
from implicit.gpu.als import AlternatingLeastSquares

os.environ['OPENBLAS_NUM_THREADS']="1"
os.environ['CUDA_VISIBLE_DEVICES']="0"

# init random data
n_actions = 100000
max_uid = 100000
max_action_id = 10000

df = pd.DataFrame(data={
    "user_id" : [random.randint(1, max_uid) for i in range(0, n_actions)],
    "action" : [random.randint(1, max_action_id) for i in range(0, n_actions)],
    "impression" : [1 for i in range(0, n_actions)]
})

# convert to sparse format
user_rows = df.user_id.tolist()
query_cols = df.action.tolist()
qvecs = csr_matrix((df.impression, (user_rows, query_cols)))

# train test split and model training
train_user_items, test_user_items = train_test_split(qvecs, train_percentage=0.9, random_state=19)

model = AlternatingLeastSquares(factors=130, regularization=0.05, alpha=1.0, calculate_training_loss=True)
model.fit(train_user_items)

# calculate ndcg
ndcg = ndcg_at_k(model, train_user_items, test_user_items, K=14, show_progress=True, num_threads=1)

Package versions:
  • implicit 0.7.2 (built from source)
  • python 3.11.2
  • cuda 12.3

OS: Debian GNU/Linux 12

@sorlandet

Updating to scipy 1.14.1 should resolve the issue.
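
To double-check which scipy version the environment actually resolves at runtime (a quick sanity check, nothing implicit-specific):

import scipy

# the fix above is reported for scipy >= 1.14.1
print(scipy.__version__)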

@gdragotto

Updating to scipy 1.14.1 should resolve the issue.

It does not with the prebuilt wheels, at least on my side. Did you compile from source?

@gdragotto

Update: a Python workaround to perform the evaluation "manually":

import numpy as np
import tqdm


def ranking_metrics_at_k(model, train_user_items, test_user_items, K=10, show_progress=True):
    """
    Calculates ranking metrics (Precision@K, MAP@K, NDCG@K, AUC) for a trained model.

    Parameters:
        model : Trained ALS model (or other Implicit model).
        train_user_items : csr_matrix
            User-item interaction matrix used for training.
        test_user_items : csr_matrix
            User-item interaction matrix for evaluation.
        K : int
            Number of items to evaluate.
        show_progress : bool
            Show a progress bar during evaluation.

    Returns:
        dict : Dictionary with precision, MAP, NDCG, and AUC scores.
    """

    # Ensure matrices are in CSR format
    train_user_items = train_user_items.tocsr()
    test_user_items = test_user_items.tocsr()

    num_users, num_items = test_user_items.shape
    relevant = 0
    total_precision_div = 0
    total_map = 0
    total_ndcg = 0
    total_auc = 0
    total_users = 0

    # Compute cumulative gain for NDCG normalization
    cg = 1.0 / np.log2(np.arange(2, K + 2))  # Discount factor
    cg_sum = np.cumsum(cg)  # Ideal DCG normalization

    # Get users with at least one item in the test set
    users_with_test_data = np.where(np.diff(test_user_items.indptr) > 0)[0]

    # Progress bar
    progress = tqdm.tqdm(total=len(users_with_test_data), disable=not show_progress)

    batch_size = 1000
    start_idx = 0

    while start_idx < len(users_with_test_data):
        batch_users = users_with_test_data[start_idx:start_idx + batch_size]
        recommended_items, _ = model.recommend(batch_users, train_user_items[batch_users], N=K)
        start_idx += batch_size

        for user_idx, user_id in enumerate(batch_users):
            test_items = set(test_user_items.indices[test_user_items.indptr[user_id]:test_user_items.indptr[user_id + 1]])
            
            if not test_items:
                continue  # Skip users without test data

            num_relevant = len(test_items)
            total_precision_div += min(K, num_relevant)

            ap = 0
            hit_count = 0
            auc = 0
            idcg = cg_sum[min(K, num_relevant) - 1]  # Ideal Discounted Cumulative Gain (IDCG)
            num_negative = num_items - num_relevant

            for rank, item in enumerate(recommended_items[user_idx]):
                if item in test_items:
                    relevant += 1
                    hit_count += 1
                    ap += hit_count / (rank + 1)
                    total_ndcg += cg[rank] / idcg
                else:
                    auc += hit_count  # Accumulate hits for AUC calculation

            auc += ((hit_count + num_relevant) / 2.0) * (num_negative - (K - hit_count))
            total_map += ap / min(K, num_relevant)
            total_auc += auc / (num_relevant * num_negative)
            total_users += 1
        
        progress.update(len(batch_users))

    progress.close()

    # Compute final metrics
    precision = relevant / total_precision_div if total_precision_div > 0 else 0
    mean_ap = total_map / total_users if total_users > 0 else 0
    mean_ndcg = total_ndcg / total_users if total_users > 0 else 0
    mean_auc = total_auc / total_users if total_users > 0 else 0

    return {
        "precision": precision,
        "map": mean_ap,
        "ndcg": mean_ndcg,
        "auc": mean_auc
    }
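
Example call, assuming model, csr_train, and csr_test as in the original report (the names are just placeholders for your own objects):

metrics = ranking_metrics_at_k(model, csr_train, csr_test, K=6, show_progress=True)
print(metrics)  # {'precision': ..., 'map': ..., 'ndcg': ..., 'auc': ...}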
