feat: Add `statistics_log_format` parameter to `BasicCrawler` #1061
Merged
Commits (11):
- `e4cf4cb` add `use_table_logs` parameter (Mantisus)
- `739b08f` add example (Mantisus)
- `8e43d6e` fix type checking for example with external tool (Mantisus)
- `b3d92e6` add test for logs without table (Mantisus)
- `1b2b2b3` Update docs/examples/json_logging.mdx (Mantisus)
- `4a59af5` Update docs/examples/code_examples/configure_json_logging.py (Mantisus)
- `e6468a5` update code example (Mantisus)
- `cc0aa0c` bool to literal for statistics_log_format (Mantisus)
- `3018bd9` move stats data to `extra` for log (Mantisus)
- `2ef1798` resolve (Mantisus)
- `40c390f` update log format (Mantisus)
`docs/examples/code_examples/configure_json_logging.py`:
```python
from __future__ import annotations

import asyncio
import inspect
import logging
import sys
from typing import TYPE_CHECKING

from loguru import logger

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

if TYPE_CHECKING:
    from loguru import Record


# Configure a loguru interceptor to capture standard logging output
class InterceptHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        # Get the corresponding loguru level if it exists
        try:
            level: str | int = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno

        # Find the caller from which the logged message originated
        frame, depth = inspect.currentframe(), 0
        while frame:
            filename = frame.f_code.co_filename
            is_logging = filename == logging.__file__
            is_frozen = 'importlib' in filename and '_bootstrap' in filename
            if depth > 0 and not (is_logging or is_frozen):
                break
            frame = frame.f_back
            depth += 1

        # Any attribute not present on a fresh LogRecord was passed via `extra`
        dummy_record = logging.LogRecord('dummy', 0, 'dummy', 0, 'dummy', None, None)
        standard_attrs = set(dummy_record.__dict__.keys())
        extra_dict = {
            key: value
            for key, value in record.__dict__.items()
            if key not in standard_attrs
        }

        (
            logger.bind(**extra_dict)
            .opt(depth=depth, exception=record.exc_info)
            .patch(lambda loguru_record: loguru_record.update({'name': record.name}))
            .log(level, record.getMessage())
        )


# Configure the loguru formatter
def formatter(record: Record) -> str:
    basic_format = '[{name}] | <level>{level: ^8}</level> | - {message}'
    if record['extra']:
        basic_format = basic_format + ' {extra}'
    return f'{basic_format}\n'


# Remove the default loguru handler
logger.remove()

# Set up loguru with JSONL serialization into the file `crawler.log`
logger.add('crawler.log', format=formatter, serialize=True, level='INFO')

# Set up a loguru handler for the console
logger.add(sys.stderr, format=formatter, colorize=True, level='INFO')

# Route standard logging through the interceptor
logging.basicConfig(handlers=[InterceptHandler()], level=logging.INFO, force=True)


async def main() -> None:
    # Initialize the crawler with table-formatted statistics logs disabled
    crawler = HttpCrawler(
        configure_logging=False,  # Disable the default logging configuration
        statistics_log_format='inline',  # Log statistics inline instead of as a table
    )

    # Define the default request handler, which will be called for every request
    @crawler.router.default_handler
    async def request_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    # Run the crawler
    await crawler.run(['https://www.crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```
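The key trick in `InterceptHandler` above is recovering the `extra` fields from a standard `LogRecord`: any attribute that does not exist on a freshly constructed record must have been supplied by the caller via `extra=`. A minimal stdlib-only sketch of that technique (the `extract_extra` name is ours, not part of the example):

```python
import logging


def extract_extra(record: logging.LogRecord) -> dict:
    # A freshly created LogRecord carries only the standard attributes;
    # anything beyond that on `record` arrived via `extra=` at the call site.
    dummy = logging.LogRecord('dummy', 0, 'dummy', 0, 'dummy', None, None)
    standard_attrs = set(dummy.__dict__.keys())
    return {k: v for k, v in record.__dict__.items() if k not in standard_attrs}


# Simulate logging.info('Final request statistics:', extra={'requests_finished': 1})
record = logging.LogRecord(
    'app', logging.INFO, 'app.py', 1, 'Final request statistics:', None, None
)
record.requests_finished = 1
stats = extract_extra(record)  # {'requests_finished': 1}
```

Comparing against a dummy record, rather than a hard-coded attribute list, keeps the sketch robust across Python versions, where the set of standard `LogRecord` attributes can change.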
`docs/examples/json_logging.mdx`:
---
id: configure-json-logging
title: Configure JSON logging
---

import ApiLink from '@site/src/components/ApiLink';
import CodeBlock from '@theme/CodeBlock';

import JsonLoggingExample from '!!raw-loader!./code_examples/configure_json_logging.py';

This example demonstrates how to configure JSON Lines (JSONL) logging with Crawlee. By setting the `statistics_log_format='inline'` parameter, you can disable table-formatted statistics logs, which makes it easier to parse logs with external tools or to serialize them as JSON.

The example shows how to integrate the popular [`loguru`](https://github.com/delgan/loguru) library to capture Crawlee logs and format them as JSONL (one JSON object per line). This approach works well when you need to collect logs for analysis or monitoring, or when you integrate with logging platforms such as the ELK Stack, Grafana Loki, or similar systems.

<CodeBlock className="language-python">
{JsonLoggingExample}
</CodeBlock>
Here's an example of what a crawler statistics log entry looks like in JSONL format:
```json
{
  "text": "[HttpCrawler] | INFO | - Final request statistics: {'requests_finished': 1, 'requests_failed': 0, 'retry_histogram': [1], 'request_avg_failed_duration': None, 'request_avg_finished_duration': 3.57098, 'requests_finished_per_minute': 17, 'requests_failed_per_minute': 0, 'request_total_duration': 3.57098, 'requests_total': 1, 'crawler_runtime': 3.59165}\n",
  "record": {
    "elapsed": { "repr": "0:00:05.604568", "seconds": 5.604568 },
    "exception": null,
    "extra": {
      "requests_finished": 1,
      "requests_failed": 0,
      "retry_histogram": [1],
      "request_avg_failed_duration": null,
      "request_avg_finished_duration": 3.57098,
      "requests_finished_per_minute": 17,
      "requests_failed_per_minute": 0,
      "request_total_duration": 3.57098,
      "requests_total": 1,
      "crawler_runtime": 3.59165
    },
    "file": {
      "name": "_basic_crawler.py",
      "path": "/crawlers/_basic/_basic_crawler.py"
    },
    "function": "run",
    "level": { "icon": "ℹ️", "name": "INFO", "no": 20 },
    "line": 583,
    "message": "Final request statistics:",
    "module": "_basic_crawler",
    "name": "HttpCrawler",
    "process": { "id": 198383, "name": "MainProcess" },
    "thread": { "id": 135312814966592, "name": "MainThread" },
    "time": {
      "repr": "2025-03-17 17:14:45.339150+00:00",
      "timestamp": 1742231685.33915
    }
  }
}
```
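Because each line in `crawler.log` is a complete JSON object, downstream tooling can recover the statistics without any parsing heuristics. A minimal sketch, assuming the field layout shown in the sample entry above (the line below is abbreviated to the relevant fields):

```python
import json

# One serialized log line, abbreviated to the fields used here
line = (
    '{"record": {"message": "Final request statistics:", '
    '"extra": {"requests_finished": 1, "crawler_runtime": 3.59165}}}'
)

entry = json.loads(line)
stats = entry['record']['extra']
print(stats['requests_finished'], stats['crawler_runtime'])
```

In a real pipeline you would iterate over the file line by line (`for line in open('crawler.log')`) and apply `json.loads` to each one.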