Skip to content

Add Sentiment Analysis Example #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 198 additions & 0 deletions techniques/sentiment-analysis/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
# Sentiment Analysis

<details>
<summary>How to run this example</summary>
<br/>

```bash
# Set your API key as an environment variable.
export SUBSTRATE_API_KEY=ENTER_YOUR_KEY

# Run the TypeScript example

# If using tsx:
cd typescript # Navigate to the typescript example
npm install # Install dependencies
npx tsx example.ts # Run the example

# If using Deno:
cd typescript
deno run example.ts

# Run the Python example

# If using Poetry:
cd python # Navigate to the python example
poetry install # Install dependencies and build the example
poetry run main # Run the example

# If using Rye:
# Update pyproject.toml to switch to Rye.
cd python
rye sync
rye run main
```

</details>

![hero](hero.png)

Sentiment analysis is used to extract and quantify feelings from text. It's used in many contexts such
as customer support, customer reviews, recommendations, and more.

In this example, we'll use LLMs via Substrate to build a simple sentiment analysis pipeline that will
output scores for how positive, negative, or neutral some given text is and use this to determine an
overall classification.

The first thing we're going to do is to is use an LLM to analyze some content and generate a report by
asking the LLM to write out it's interpretation of the text and score the text. This can be done with
the `ComputeText` node and some prompt engineering.

In TypeScript this could look like this:

```typescript
const message = "what a great product, NOT!";

const assess = new ComputeText({
prompt: `
Your job is to assess a piece of CONTENT on the underlying emotional sentiment it communicates.
Please follow the RULES of the assignment and be mindful of the TIPS.

=== RULES
Analyze the content and report on POSITIVE sentiment if there is any, NEUTRAL sentiment if there is any, and NEGATIVE sentiment if there is any.
Record your rationale for your conclusions.

Once you have done so, please assign a score for each:
POSITIVE (0-100): ___
NEUTRAL (0-100): ___
NEGATIVE (0-100): ___

=== TIPS
* Sometimes content may seem positive when it is negative, for example when using sarcasm or negation
* When the sentiment is mixed, do your best to represent that across the scores

=== CONTENT
${message}
`,
temperature: 0.2,
})
```

And similarly in Python it could look like this:

```python
message = "what a great product, NOT!"

assess = ComputeText(
prompt=f"""
Your job is to assess a piece of CONTENT on the underlying emotional sentiment it communicates.
Please follow the RULES of the assignment and be mindful of the TIPS.

=== RULES
Analyze the content and report on POSITIVE sentiment if there is any, NEUTRAL sentiment if there is any, and NEGATIVE sentiment if there is any.
Record your rationale for your conclusions.

Once you have done so, please assign a score for each:
POSITIVE (0-100): ___
NEUTRAL (0-100): ___
NEGATIVE (0-100): ___

=== TIPS
* Sometimes content may seem positive when it is negative, for example when using sarcasm or negation
* When the sentiment is mixed, do your best to represent that across the scores

=== CONTENT
${message}""",
temperature=0.2,
)
```

The output of this node will be a `string` of the report, but next we can use a `ComputeJSON` node to extract
structured data from this report to be used in a real system.

In TypeScript the next step could look like this:

```typescript
const extract = new ComputeJSON({
prompt: sb.interpolate`
Your job is to extract the relevant details of the sentiment analysis REPORT.
Please do so as accurately as possible and include a short summary of the rationale.

=== REPORT
${assess.future.text}
`,
json_schema: {
type: "object",
properties: {
positive_score: { type: "integer", minimum: 0, maximum: 100 },
neutral_score: { type: "integer", minimum: 0, maximum: 100 },
negative_score: { type: "integer", minimum: 0, maximum: 100 },
sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
summary: { type: "string" },
}
}
})
```

Or in Python, like this:

```python
extract = ComputeJSON(
prompt=sb.format(
"""
Your job is to extract the relevant details of the sentiment analysis REPORT.
Please do so as accurately as possible and include a short summary of the rationale.

=== REPORT
${report}
""",
report=assess.future.text,
),
json_schema={
"type": "object",
"properties": {
"positive_score": {"type": "integer", "minimum": 0, "maximum": 100},
"neutral_score": {"type": "integer", "minimum": 0, "maximum": 100},
"negative_score": {"type": "integer", "minimum": 0, "maximum": 100},
"sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
"summary": {"type": "string"},
},
},
)
```

We're using a JSON Schema here to structure the outputs exactly how we want them and can ensure the fields
are constrained as we like. Note here that we're passing in the output of the previous node using the _future_
provided by the `assess` node we created. This way we can run the entire workflow with a single API request.

Finally we can run this graph like so in TypeScript:

```typescript
const res = await substrate.run(extract);
console.log(res.get(extract).json_object);
```

Or in Python:

```python
res = substrate.run(extract)
print(res.get(extract).json_object)
```

The final result will look like the following:

```json
{
"positive_score": 0,
"neutral_score": 0,
"negative_score": 100,
"sentiment": "negative",
"summary": "The sentiment is primarily negative due to the use of sarcasm and negation in the phrase 'NOT!' at the end of the sentence."
}
```

You can build on this example further if you like. For example, perhaps there are other dimensions you would like to assess for or
perhaps there are more details you would like the LLM to pay closer attention too. You could even run the same pipeline multiple
times on the same data to collect some aggregated assessment of several passes.

We hope this example is helpful for you and that you enjoy working with Substrate as much as we do!
43 changes: 43 additions & 0 deletions techniques/sentiment-analysis/diagram.d2
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
direction: right
classes: {
substrate: {
label: "substrate"
style: {
font: mono
font-color: gray
font-size: 20
stroke: gray
stroke-dash: 1
fill: "transparent"
border-radius: 16
}
}
node: {
style: {
font: mono
font-size: 24
stroke-width: 2
fill: transparent
stroke: gray
border-radius: 16
stroke-dash: 1
3d: true
}
}
edge: {
style: {
stroke: "#000"
stroke-dash: 2
}
}
}

substrate.class: substrate
substrate.a.class: node
substrate.b.class: node
substrate.c.class: node
substrate.a.label: heuristic
substrate.b.label: symbolic
substrate.c.label: computation
substrate.a->substrate.c { class: edge }
substrate.b->substrate.c { class: edge }
Loading