---
title: "RAG on Hacker News comments to generate a research summary"
description: "Learn how to search Hacker News comments for a topic, extract sentiment, and generate a research summary in 34 lines of Substrate code. Runs dozens of LLM calls in parallel and streams markdown. Built in 15 minutes, easy to remix."
date: 2024-07-15
image: "/hnrag.png"
---

<div class="hero-image">
  <img width={1020} height={510} src="/hnrag.png" alt="RAG on Hacker News comments to generate a research summary" />
</div>

In this post, we'll show you how to search Hacker News comments for a topic, extract sentiment, and generate a research summary in 34 lines of code using Substrate.

- [Read on Twitter](https://x.com/vprtwn/status/1812844236401762513)
- [Read on LinkedIn](https://www.linkedin.com/pulse/rag-hacker-news-comments-34-lines-code-substratelabs-pouje)

<br/>

This concise RAG implementation runs dozens of LLM calls in parallel and streams the markdown in no time. It's easy to remix, and genuinely useful. Internally, we've already written several scripts like this for Reddit, LinkedIn, and Twitter, and set up alerts to Slack.

<iframe width="100%" height="600px" src="https://www.val.town/embed/substrate/hackerNewsRAG" title="Val Town" frameborder="0" allow="web-share" allowfullscreen></iframe>

<br/>

First, we search Hacker News comments using the [Algolia HN Search API](https://hn.algolia.com/api).

```typescript
const searchResults = await hnSearch({
  query: query,
  // Only include comments created within the last 4 weeks
  numericFilters: `created_at_i>${Math.floor(Date.now() / 1000) - 60 * 60 * 24 * 7 * 4}`,
  tags: "comment",
});
```
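
The `hnSearch` helper lives in the val rather than in the snippet above. A minimal sketch against the Algolia endpoint might look like this (the helper and its parameter names here are assumptions, not the val's actual code):

```typescript
// Hypothetical sketch of hnSearch. URL construction is factored out so it
// can be tested without hitting the network.
function buildHnSearchUrl(params: { query: string; numericFilters?: string; tags?: string }): string {
  const url = new URL("https://hn.algolia.com/api/v1/search");
  url.searchParams.set("query", params.query);
  if (params.numericFilters) url.searchParams.set("numericFilters", params.numericFilters);
  if (params.tags) url.searchParams.set("tags", params.tags);
  return url.toString();
}

async function hnSearch(params: { query: string; numericFilters?: string; tags?: string }) {
  const res = await fetch(buildHnSearchUrl(params));
  if (!res.ok) throw new Error(`HN search failed: ${res.status}`);
  return res.json(); // response body includes a `hits` array
}
```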

<br/>

Next, we use ComputeJSON to extract a summary, sentiment, and other metadata from each comment. Structured JSON generation is ergonomic, reliable, and blazing fast on Substrate compared to other providers. This is critical for multi-step workflows.

```typescript
// Create a ComputeJSON node for each search hit. The nodes aren't awaited
// individually here; Substrate runs them all in parallel when the graph runs.
const summaries = [];
for (const hit of searchResults.hits) {
  summaries.push(
    new ComputeJSON({
      prompt: `Summarize this comment and how it relates to the topic: ${query}
      Use "negative" sentiment for posts about API, abstraction, documentation, tutorial, general quality, slowness, or performance issues.
      COMMENT: ${JSON.stringify(hit)}`,
      json_schema: zodToJsonSchema(commentInfo),
    }),
  );
}
```
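
The `commentInfo` zod schema isn't shown above; whatever its exact shape, `zodToJsonSchema` compiles it to a plain JSON Schema object. A hypothetical equivalent (the field names are illustrative assumptions, not the val's actual schema) might be:

```typescript
// Hypothetical JSON Schema roughly equivalent to zodToJsonSchema(commentInfo).
// The real fields live in the val; these are assumptions for illustration.
const commentInfoSchema = {
  type: "object",
  properties: {
    summary: { type: "string", description: "One-sentence summary of the comment" },
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    storyTitle: { type: "string" },
    objectID: { type: "string" },
  },
  required: ["summary", "sentiment"],
} as const;
```

Constraining the model's output to a schema like this is what makes the downstream markdown step reliable: every parallel call yields the same fields.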

<br/>

Finally, we use ComputeText to generate a markdown summary of all the extracted JSON, and stream the results. Streaming on Substrate is really cool: you can of course stream the response of an individual LLM, but you can also stream the incremental steps of your workflow.

```typescript
const markdown = new ComputeText({
  prompt: sb.concat(
    `Below is a list of summarized comments about ${query} on Hacker News.
    Generate concise markdown summarizing the results.
    Summarize the contents of the comment and the sentiment about ${query}.
    Categorize results under sentiment headers.
    Order from most negative to least negative within each category.
    Add a link to the original story URL in this format: [<story title>](https://news.ycombinator.com/item?id=<objectID>)
    Filter out posts that do not seem to be about ${query}.
    RESULTS:\n`,
    // Each node's json_object is a future, resolved when the graph runs
    ...summaries.map((s) => sb.jq(s.future.json_object, "@json")),
  ),
  model: "Llama3Instruct70B",
});
const stream = await substrate.stream(markdown);
```
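
The exact event shape of `substrate.stream` is best checked against the Substrate SDK docs; the consumption pattern itself is just async iteration. A mock sketch (the stream here is a stand-in, not the real SDK object):

```typescript
// Mock sketch of consuming a streamed markdown response. The real Substrate
// stream events may differ; here we assume an async iterable of text chunks.
async function* mockStream(): AsyncGenerator<string> {
  yield "## Negative\n";
  yield "- Complaints about docs\n";
}

async function collectStream(stream: AsyncIterable<string>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    out += chunk; // in practice: process.stdout.write(chunk)
  }
  return out;
}
```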

<br/>

The code we wrote was really simple. Implicitly, we were creating the graph below, but we didn't have to think about the graph at all! With Substrate, simply by relating tasks to each other, we get automatic parallelization of dozens of LLM calls for free, with zero roundtrips.

Great power with great simplicity.

View the full source, fork, and remix here: https://www.val.town/v/substrate/hackerNewsRAG

- [Read on Twitter](https://x.com/vprtwn/status/1812844236401762513)
- [Read on LinkedIn](https://www.linkedin.com/pulse/rag-hacker-news-comments-34-lines-code-substratelabs-pouje)