Title: Integrate Custom Dataset from Databricks for Aspect Sentiment Triplet Extraction #406

Open
Surajhulketa opened this issue Jun 17, 2024 · 0 comments
Labels: bug (Something isn't working)

Description
I am trying to integrate a custom dataset stored in Databricks for Aspect Sentiment Triplet Extraction (ASTE) using the pyabsa library. However, I am encountering an error related to dataset loading. Below are the details of my implementation and the issues I am facing.

Code Implementation
from pyabsa import (
ModelSaveOption,
DeviceTypeOption,
DatasetItem,
)

from pyabsa import AspectSentimentTripletExtraction as ASTE
import pandas as pd

if __name__ == "__main__":
    config = ASTE.ASTEConfigManager.get_aste_config_english()
    config.max_seq_len = 120
    config.log_step = -1
    config.pretrained_bert = "bert-base-chinese"
    config.num_epoch = 100
    config.learning_rate = 2e-5
    config.use_amp = True
    config.cache_dataset = True
    config.spacy_model = "zh_core_web_sm"

    # Load dataset from Databricks (path to the exported file; currently unused below)
    dataset_path = "datasets/atepc_datasets/300.vokols/vokols.test.txt.atepc"
    # Keyword for findfile; pyabsa searches integrated_datasets/ for a folder matching this name
    dataset = "300.vokols"

    trainer = ASTE.ASTETrainer(
        config=config,
        dataset=dataset,
        checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
        auto_device=True,
    )
    triplet_extractor = trainer.load_trained_model()

    examples = [
        "I love this laptop, it is very good.",
        "I hate this laptop, it is very bad.",
        "I like this laptop, it is very good.",
        "I dislike this laptop, it is very bad.",
    ]
    for example in examples:
        prediction = triplet_extractor.predict(example)
        print(prediction)

Error Encountered
ValueError: Cannot find dataset: 300.vokols, you may need to remove existing integrated_datasets and try again. Please note that if you are using keywords to let findfile search the dataset, you need to save your dataset(s) in integrated_datasets/task_name/dataset_name
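
From the error text, my understanding is that keyword lookup only succeeds when the files already sit under integrated_datasets/<task_name>/<dataset_name>. Below is a hedged sketch of the layout it appears to expect; the task folder name (aste_datasets) and the file suffixes are assumptions on my part, not something confirmed by the pyabsa documentation:

integrated_datasets/
    aste_datasets/                 # assumed task folder for ASTE
        300.vokols/
            vokols.train.txt.aste  # assumed suffix; the path in the code above points at an .atepc file
            vokols.test.txt.aste

I also notice that the path in my code (datasets/atepc_datasets/...) points at an ATEPC-style folder rather than an ASTE one, which may be related to why findfile cannot locate the dataset.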
Issues Faced
Dataset Loading: Clarification is needed on how to properly format and load a custom dataset from Databricks into the pyabsa library.
Integration: Guidance on ensuring that the custom dataset is correctly integrated and used during training (a hedged DatasetItem sketch follows this list).
Directory Structure: Instructions on the directory layout pyabsa requires before it will recognize a custom dataset.
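
As a possible workaround, passing the dataset to the trainer via DatasetItem (already imported above) instead of a bare keyword might avoid the lookup problem. This is only a sketch: it assumes DatasetItem accepts a name plus an explicit path, as it does for other pyabsa tasks, and the local copy path is hypothetical.

from pyabsa import DatasetItem

# Hypothetical local copy of the files exported from Databricks; adjust to your workspace.
# Assumes DatasetItem(name, path) points the trainer at that path instead of searching by keyword.
my_dataset = DatasetItem("300.vokols", "integrated_datasets/aste_datasets/300.vokols")

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=my_dataset,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    auto_device=True,
)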
Steps to Reproduce
1. Place a custom dataset in Databricks (ensure it is in .atepc format).
2. Use the provided code to load the dataset and attempt to train the model.
3. Observe the error related to dataset loading.
Expected Behavior
The custom dataset should be loaded correctly, and the model should train and predict without errors.
