docs: update apify integration #29553

Merged
merged 13 commits into from
Feb 13, 2025
122 changes: 94 additions & 28 deletions docs/docs/integrations/document_loaders/apify_dataset.ipynb
@@ -2,7 +2,9 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "xwiDq5fOuoRn"
},
"source": [
"# Apify Dataset\n",
"\n",
@@ -20,33 +22,63 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qRW2-mokuoRp",
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet apify-client"
"%pip install --upgrade --quiet langchain langchain-apify langchain-openai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "8jRVq16LuoRq"
},
"source": [
"First, import `ApifyDatasetLoader` into your source code:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"execution_count": 2,
"metadata": {
"id": "umXQHqIJuoRq"
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import ApifyDatasetLoader\n",
"from langchain_apify import ApifyDatasetLoader\n",
"from langchain_core.documents import Document"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "NjGwKy59vz1X"
},
"source": [
"Find your [Apify API token](https://console.apify.com/account/integrations) and [OpenAI API key](https://platform.openai.com/account/api-keys) and set them as environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "AvzNtyCxwDdr"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"APIFY_API_TOKEN\"] = \"your-apify-api-token\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d1O-KL48uoRr"
},
"source": [
"Then provide a function that maps Apify dataset record fields to LangChain `Document` format.\n",
"\n",
@@ -64,8 +96,10 @@
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"execution_count": 8,
"metadata": {
"id": "m1SpA7XZuoRr"
},
"outputs": [],
"source": [
"loader = ApifyDatasetLoader(\n",
@@ -78,16 +112,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"execution_count": 9,
"metadata": {
"id": "0hWX7ABsuoRs"
},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "EJCVFVKNuoRs"
},
"source": [
"## An example with question answering\n",
"\n",
@@ -96,21 +134,26 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"execution_count": 14,
"metadata": {
"id": "sNisJKzZuoRt"
},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain_community.utilities import ApifyWrapper\n",
"from langchain_apify import ApifyWrapper\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAI\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"from langchain_openai import ChatOpenAI\n",
"from langchain_openai.embeddings import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"execution_count": 15,
"metadata": {
"id": "qcfmnbdDuoRu"
},
"outputs": [],
"source": [
"loader = ApifyDatasetLoader(\n",
@@ -123,27 +166,47 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"execution_count": 16,
"metadata": {
"id": "8b0xzKJxuoRv"
},
"outputs": [],
"source": [
"index = VectorstoreIndexCreator(embedding=OpenAIEmbeddings()).from_loaders([loader])"
"index = VectorstoreIndexCreator(\n",
" vectorstore_cls=InMemoryVectorStore, embedding=OpenAIEmbeddings()\n",
").from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"execution_count": 17,
"metadata": {
"id": "7zPXGsVFwUGA"
},
"outputs": [],
"source": [
"llm = ChatOpenAI(model=\"gpt-4o-mini\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"id": "ecWrdM4guoRv"
},
"outputs": [],
"source": [
"query = \"What is Apify?\"\n",
"result = index.query_with_sources(query, llm=OpenAI())"
"result = index.query_with_sources(query, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"execution_count": null,
"metadata": {
"id": "QH8r44e9uoRv",
"outputId": "361fe050-f75d-4d5a-c327-5e7bd190fba5"
},
"outputs": [
{
"name": "stdout",
@@ -162,6 +225,9 @@
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
@@ -181,5 +247,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 4
}
"nbformat_minor": 0
}
29 changes: 23 additions & 6 deletions docs/docs/integrations/providers/apify.mdx
@@ -14,28 +14,45 @@ blogs, or knowledge bases.

## Installation and Setup

- Install the Apify API client for Python with `pip install apify-client`
- Install the LangChain Apify package for Python with:
```bash
pip install langchain-apify
```
- Get your [Apify API token](https://console.apify.com/account/integrations) and either set it as
an environment variable (`APIFY_API_TOKEN`) or pass it to the `ApifyWrapper` as `apify_api_token` in the constructor.
an environment variable (`APIFY_API_TOKEN`) or pass it as `apify_api_token` in the constructor.
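The two token options above can be sketched as a small lookup: use an explicitly supplied token if there is one, otherwise read `APIFY_API_TOKEN` from the environment. The helper name `resolve_apify_token` is hypothetical and not part of `langchain-apify`; it only illustrates the precedence.

```python
import os
from typing import Optional

# Environment variable name taken from the setup notes above.
ENV_VAR = "APIFY_API_TOKEN"


def resolve_apify_token(explicit_token: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit token, else fall back to the environment."""
    token = explicit_token or os.environ.get(ENV_VAR)
    if not token:
        raise RuntimeError(f"Set {ENV_VAR} or pass the token explicitly.")
    return token
```

The returned value is what you would pass as `apify_api_token` to a constructor when you prefer not to rely on the environment variable.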

## Tool

You can use the `ApifyActorsTool` to use Apify Actors with agents.

```python
from langchain_apify import ApifyActorsTool
```

## Utility
See [this notebook](/docs/integrations/tools/apify_actors) for example usage.

For more information on how to use this tool, visit [the Apify integration documentation](https://docs.apify.com/platform/integrations/langgraph).

## Wrapper

You can use the `ApifyWrapper` to run Actors on the Apify platform.

```python
from langchain_community.utilities import ApifyWrapper
from langchain_apify import ApifyWrapper
```

For more information on this wrapper, see [the API reference](https://python.langchain.com/api_reference/community/utilities/langchain_community.utilities.apify.ApifyWrapper.html).
For more information on how to use this wrapper, see [the Apify integration documentation](https://docs.apify.com/platform/integrations/langchain).


## Document loader

You can also use our `ApifyDatasetLoader` to get data from an Apify dataset.

```python
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_apify import ApifyDatasetLoader
```

For a more detailed walkthrough of this loader, see [this notebook](/docs/integrations/document_loaders/apify_dataset).
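The loader relies on a function that maps each dataset record to a LangChain `Document`. A minimal sketch of such a mapping function follows, assuming records carry `text` and `url` fields (the actual field names depend on the Actor that produced the dataset). The fallback `Document` dataclass is a stand-in so the sketch runs where `langchain-core` is not installed; it is not the real class.

```python
from typing import Any, Dict

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, used only when langchain-core is missing.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: Dict[str, Any] = field(default_factory=dict)


def map_dataset_item(item: Dict[str, Any]) -> Document:
    """Map one dataset record to a Document; the `text` and `url` keys are assumed."""
    return Document(
        page_content=item.get("text") or "",  # tolerate missing or null text
        metadata={"source": item.get("url", "")},
    )


doc = map_dataset_item({"url": "https://example.com", "text": "Hello"})
print(doc.page_content, doc.metadata["source"])
```

A function like this would be handed to `ApifyDatasetLoader` together with the dataset ID; see the linked notebook for the exact constructor arguments.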


Source code for this integration can be found in the [LangChain Apify repository](https://github.com/apify/langchain-apify).