ML.NET: how to handle data classification for extra-long texts? #7456

@dfengpo

Description

  1. I have a large number of recorded conversations between staff and customers. Running them through ASR produces long text transcripts in the format shown below (original conversation example).
  2. Each line of the sample data begins with the start and end time of a sentence, followed by the speaker role, then a colon, then the sentence content.
  3. Over the course of a day, a staff member talks with multiple groups of customers. Each group may be one person or several people.
  4. I need to train a 'session boundary detection' model that takes a segment of multiple dialogue sentences as input.
  5. The model should predict whether the current input segment contains a boundary point (the start or end of a session), and return the start time of the corresponding sentence together with a boundary label of 1 or 0. The model needs to be able to segment the dialogue between the staff and each customer group.
  6. The following is an example of the data.
    11:03:42-11:03:42 :Hello, do you need help?
    ........
    ........
    ........
    12:03:42-12:03:42 :Please walk slowly and welcome to the next visit.
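
One way to make the full-day transcript manageable is to parse each line into a structured record and then classify short overlapping windows of sentences rather than the whole text at once. The sketch below illustrates that preprocessing step only. It is written in Python for brevity and would need to be ported to C# for use with ML.NET. The line pattern is inferred from the two sample lines above, and the names `Utterance`, `parse_transcript`, `make_windows`, the `context` parameter, the `[SEP]` separator, and the `boundary_starts` set of manually annotated boundary times are all assumptions made for illustration, not part of the original question.

```python
import re
from dataclasses import dataclass

# Assumed line layout: "HH:MM:SS-HH:MM:SS <role>:<sentence>".
# The role may be empty, as in the sample lines.
LINE_RE = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2})-(\d{2}:\d{2}:\d{2})\s*([^:]*?)\s*:(.*)$"
)


@dataclass
class Utterance:
    start: str
    end: str
    role: str
    text: str


def parse_transcript(raw):
    """Turn raw ASR output into a list of Utterance records."""
    utterances = []
    for line in raw.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank, elided, or malformed lines
        start, end, role, text = match.groups()
        utterances.append(Utterance(start, end, role, text.strip()))
    return utterances


def make_windows(utterances, boundary_starts, context=2):
    """Build one (start_time, window_text, label) example per sentence.

    Each window holds the sentence plus up to `context` neighbours on
    each side. The label is 1 when the sentence's start time is in
    `boundary_starts` (hypothetical manual annotations), otherwise 0.
    """
    examples = []
    for i, current in enumerate(utterances):
        lo = max(0, i - context)
        hi = min(len(utterances), i + context + 1)
        parts = [
            f"{u.role}: {u.text}" if u.role else u.text
            for u in utterances[lo:hi]
        ]
        label = 1 if current.start in boundary_starts else 0
        examples.append((current.start, " [SEP] ".join(parts), label))
    return examples
```

Each resulting row pairs a sentence start time with a short text window and a 0/1 label, which matches the output described in item 5. Because every example is a fixed-size window, the length of the full-day transcript no longer limits the input size of the classifier. Whether a window of a few sentences carries enough signal to detect a session boundary would have to be checked on real annotated data.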


Labels: untriaged (new issue has not been triaged)
