feat(search_toolkit): Add Alibaba Tongxiao Search API Support #2127

Open
wants to merge 13 commits into
base: master
from

Conversation

RonaldJEN
Collaborator

Add Alibaba Tongxiao Search API Support

Feature Description

This PR introduces the search_ali method to the SearchToolkit class, integrating the Alibaba Tongxiao Search API. Tongxiao Search is a powerful real-time search API providing structured data from various search engines and knowledge bases, with specific optimizations for Chinese content search.

API Reference: Standard Search API - GenericSearch

Implementation Details

  • Retrieves API credentials via the TONGXIAO_API_KEY environment variable.

  • Supports the following search parameters:

    • timeRange: Time frame filter (OneDay/OneWeek/OneMonth/OneYear/NoLimit)
    • industry: Industry filter (finance, law, medical, etc.)
    • page: Result pagination
    • returnMainText: Whether to return webpage main text
    • returnMarkdownText: Whether to return Markdown formatted content
    • enableRerank: Whether to enable result reranking (disabling it can reduce response time)
  • Return Structure Formatting:

    • Extracts and standardizes key fields (title, snippet, url, etc.)
    • Prioritizes the summary field; uses mainText as a fallback summary if summary is unavailable.
    • Maintains consistency with the return structure of other search methods (e.g., search_bing).
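
The summary fallback described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `format_results` and the raw field names (`summary`, `mainText`, `title`, `url`) are assumptions based on the description in this PR.

```python
from typing import Any, Dict, List


def format_results(pages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize raw result items into a uniform structure (hypothetical helper)."""
    formatted = []
    for idx, page in enumerate(pages, start=1):
        # Prefer 'summary'; fall back to 'mainText' when it is missing or empty.
        snippet = page.get("summary") or page.get("mainText") or ""
        formatted.append(
            {
                "result_id": idx,
                "title": page.get("title", ""),
                "snippet": snippet,
                # Raw key name 'url' is an assumption for illustration.
                "url": page.get("url", ""),
            }
        )
    return formatted
```

Using `or` rather than a default in `get` means an empty `summary` string also triggers the fallback, which seems to match the intent of "if summary is unavailable".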

Documentation

The method includes a detailed docstring covering:

  • Method description
  • Parameter explanations
  • Return value format
  • Error handling mechanisms

This PR enhances CAMEL's search capabilities, particularly for Chinese content, offering users more diverse search options.

Member

@Wendong-Fan Wendong-Fan left a comment

Thanks @RonaldJEN for the contribution! There are some conflicts, could you resolve them?

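Add a new search_ali method that calls the Alibaba Tongxiao Search API, with the following features:
- Supports time range and industry filtering
- Supports paginated queries
- Automatically extracts summary information (prefers summary, falls back to mainText)
- Optionally returns the webpage main text and Markdown formatted content
- Supports reranking of search results
- Keeps the return structure consistent with other search methods
- fix(search_toolkit): resolve linting issues in search_ali method
Add a new search_ali method that calls the Alibaba Tongxiao Search API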
@RonaldJEN RonaldJEN reopened this Apr 10, 2025
@RonaldJEN
Collaborator Author

@Wendong-Fan Thanks for pointing out the conflicts! I've resolved them now.

Member

@Wendong-Fan Wendong-Fan left a comment

Thanks for the contribution, @RonaldJEN. I left some comments below; could we also add unit tests?

effective for Chinese language queries.

Args:
query (str): The search query string (length >= 1 and <= 100).
Member

add validation for length limit?
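
One possible shape for that check, as a rough sketch only: the helper name `validate_query` and the error wording are hypothetical, and the 1 to 100 bound is taken from the docstring under review.

```python
from typing import Optional

MAX_QUERY_LENGTH = 100  # upper bound stated in the docstring under review


def validate_query(query: str) -> Optional[str]:
    """Return an error message if the query is invalid, otherwise None.

    Hypothetical helper; not part of the PR as submitted.
    """
    if not isinstance(query, str) or not (1 <= len(query) <= MAX_QUERY_LENGTH):
        return (
            f"Query length must be between 1 and {MAX_QUERY_LENGTH} characters."
        )
    return None
```

Returning a message instead of raising would let the method fold it into the existing `{'error': ...}` return shape, assuming that convention is kept.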

Comment on lines +1087 to +1093
timeRange (str): Time frame filter for search results. Default
is "NoLimit". Options include:
- 'OneDay': Past day.
- 'OneWeek': Past week.
- 'OneMonth': Past month.
- 'OneYear': Past year.
- 'NoLimit': No time limit (default).
Member

Use Literal instead of str here? For the variable naming, follow the current CAMEL style and use time_range instead of timeRange.
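
A minimal sketch of what this suggestion might look like. The function `search_ali_stub` is a hypothetical stand-in that only builds request parameters; it does not call the real API, and the camelCase key `timeRange` is assumed from the parameter list in the PR description.

```python
from typing import Any, Dict, Literal, get_args

TimeRange = Literal["OneDay", "OneWeek", "OneMonth", "OneYear", "NoLimit"]


def search_ali_stub(query: str, time_range: TimeRange = "NoLimit") -> Dict[str, Any]:
    """Hypothetical stand-in showing snake_case naming with a Literal type."""
    # Literal is only checked by static type checkers, so validate at runtime too.
    if time_range not in get_args(TimeRange):
        return {"error": f"Invalid time_range: {time_range!r}"}
    # Map the snake_case argument back to the camelCase key (assumed API name).
    return {"query": query, "timeRange": time_range}
```

The Python-facing name follows the toolkit's snake_case style while the outgoing parameter keeps whatever name the API expects.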

Comment on lines +1094 to +1104
industry (Optional[str]): Industry-specific search filter. When
specified, only returns results from sites in the specified
industries. Multiple industries can be comma-separated.
Options include:
- 'finance': Financial industry.
- 'law': Legal industry.
- 'medical': Medical industry.
- 'internet': Internet (curated).
- 'tax': Tax industry.
- 'news_province': Provincial news.
- 'news_center': Central news.
Member

use literal instead of str here?
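
Since the docstring allows several comma-separated industries, a single Literal would not cover every valid input. One option, sketched here under that assumption, is to accept a list of Literal values and join them; the helper name `join_industries` is hypothetical.

```python
from typing import List, Literal, Optional, get_args

Industry = Literal[
    "finance", "law", "medical", "internet", "tax",
    "news_province", "news_center",
]


def join_industries(industries: Optional[List[Industry]]) -> Optional[str]:
    """Validate industry names and join them into a comma-separated string."""
    if not industries:
        return None
    allowed = get_args(Industry)
    unknown = [name for name in industries if name not in allowed]
    if unknown:
        raise ValueError(f"Unknown industry value(s): {unknown}")
    return ",".join(industries)
```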

- 'tax': Tax industry.
- 'news_province': Provincial news.
- 'news_center': Central news.
page (int): Page number for results pagination. Default is 1.
Member

For the default value, use the format (default: :obj:`1`), same as the others.

Comment on lines +1115 to +1134
Dict[str, Any]: A dictionary containing search results or an error
message. The structure includes:
- 'requestId': A unique identifier for the request.
- 'results': A list of dictionaries, each representing a
search result with the following keys:
- 'result_id': The index of the result.
- 'title': The title of the webpage.
- 'snippet': A dynamic summary of relevant content matching
the query keywords.
- 'mainText': The main content of the webpage (if
returnMainText is True).
- 'markdownText': Markdown formatted content (if
returnMarkdownText is True).
- 'hostname': The name of the website.
- 'url': The URL of the webpage.
- 'publishTime': Publication timestamp in milliseconds.
- 'score': Relevance score.
- 'searchInformation': Additional metadata about the search
operation.
- or 'error': An error message if something went wrong.
Member

Could we simplify the returns to make them tidier? Since the toolkit will be used by agents, a simpler structure would be more LLM-friendly.
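
As a rough illustration of the kind of trimming this might mean: the set of keys to keep below is an assumption chosen for the example, not something decided in this review, and `simplify_response` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed minimal set of fields an agent needs; adjust as the review decides.
KEPT_KEYS = ("result_id", "title", "snippet", "url")


def simplify_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Drop request metadata and per-result extras from a full response."""
    if "error" in response:
        return {"error": response["error"]}
    return {
        "results": [
            {key: item[key] for key in KEPT_KEYS if key in item}
            for item in response.get("results", [])
        ]
    }
```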

@@ -394,3 +394,20 @@ class PersonInfo(BaseModel):
on improving efficiency, fault tolerance, and minimizing resource overheads.
===============================================================================
"""

search_ali_response = SearchToolkit().search_ali(
query="阿里巴巴2025年的芯片投入",
Member

Using an English query as the example would be better.

@GitHoobar
Collaborator

@RonaldJEN once you fix everything mentioned, it's good to go.

Successfully merging this pull request may close these issues.

[Feature Request] Add Alibaba Tongxiao Search API Support
3 participants