Title: Integrate Custom Dataset from Databricks for Aspect Sentiment Triplet Extraction #406

Open
Surajhulketa opened this issue Jun 17, 2024 · 0 comments
Labels: bug (Something isn't working)

Description
I am trying to integrate a custom dataset stored in Databricks for Aspect Sentiment Triplet Extraction (ASTE) using the pyabsa library. However, I am encountering an error related to dataset loading. Below are the details of my implementation and the issues I am facing.

Code Implementation
from pyabsa import (
ModelSaveOption,
DeviceTypeOption,
DatasetItem,
)

from pyabsa import AspectSentimentTripletExtraction as ASTE
import pandas as pd

if __name__ == "__main__":
    config = ASTE.ASTEConfigManager.get_aste_config_english()
    config.max_seq_len = 120
    config.log_step = -1
    config.pretrained_bert = "bert-base-chinese"
    config.num_epoch = 100
    config.learning_rate = 2e-5
    config.use_amp = True
    config.cache_dataset = True
    config.spacy_model = "zh_core_web_sm"

    # Load dataset from Databricks (path to the exported file; currently unused below)
    dataset_path = "datasets/atepc_datasets/300.vokols/vokols.test.txt.atepc"
    # Keyword for findfile; pyabsa searches integrated_datasets/ for a folder matching this name
    dataset = "300.vokols"

    trainer = ASTE.ASTETrainer(
        config=config,
        dataset=dataset,
        checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
        auto_device=True,
    )
    triplet_extractor = trainer.load_trained_model()

    examples = [
        "I love this laptop, it is very good.",
        "I hate this laptop, it is very bad.",
        "I like this laptop, it is very good.",
        "I dislike this laptop, it is very bad.",
    ]
    for example in examples:
        prediction = triplet_extractor.predict(example)
        print(prediction)

Error Encountered
ValueError: Cannot find dataset: 300.vokols, you may need to remove existing integrated_datasets and try again. Please note that if you are using keywords to let findfile search the dataset, you need to save your dataset(s) in integrated_datasets/task_name/dataset_name
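
From the error text, my understanding is that keyword lookup only succeeds when the files already sit under integrated_datasets/<task_name>/<dataset_name>. Below is a hedged sketch of the layout it appears to expect; the task folder name (aste_datasets) and the file suffixes are assumptions on my part, not something confirmed by the pyabsa documentation:

integrated_datasets/
    aste_datasets/                 # assumed task folder for ASTE
        300.vokols/
            vokols.train.txt.aste  # assumed suffix; the path in the code above points at an .atepc file
            vokols.test.txt.aste

I also notice that the path in my code (datasets/atepc_datasets/...) points at an ATEPC-style folder rather than an ASTE one, which may be related to why findfile cannot locate the dataset.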
Issues Faced
Dataset Loading: Clarification is needed on how to properly format and load a custom dataset from Databricks into the pyabsa library.
Integration: Guidance on ensuring that the custom dataset is correctly integrated and used during training (a hedged DatasetItem sketch follows this list).
Directory Structure: Instructions on the directory layout pyabsa requires before it will recognize a custom dataset.
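
As a possible workaround, passing the dataset to the trainer via DatasetItem (already imported above) instead of a bare keyword might avoid the lookup problem. This is only a sketch: it assumes DatasetItem accepts a name plus an explicit path, as it does for other pyabsa tasks, and the local copy path is hypothetical.

from pyabsa import DatasetItem

# Hypothetical local copy of the files exported from Databricks; adjust to your workspace.
# Assumes DatasetItem(name, path) points the trainer at that path instead of searching by keyword.
my_dataset = DatasetItem("300.vokols", "integrated_datasets/aste_datasets/300.vokols")

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=my_dataset,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    auto_device=True,
)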
Steps to Reproduce
1. Place a custom dataset in Databricks (ensure it is in .atepc format).
2. Use the provided code to load the dataset and attempt to train the model.
3. Observe the error related to dataset loading.
Expected Behavior
The custom dataset should be loaded correctly, and the model should train and predict without errors.
