feat: Add use_table_logs parameter to control using tables in logs #1061

Open
wants to merge 3 commits into master
Conversation

@Mantisus (Collaborator) commented Mar 7, 2025

Description

  • Add the use_table_logs parameter that allows disabling tables in logs. This makes log parsing easier when needed.

Issues

@Mantisus (Collaborator, Author) commented Mar 7, 2025

Thinking about this task, I believe we shouldn't add any third-party logger. A flag that disables the tables in logs, which make data parsing difficult, is sufficient.

This will allow users to plug in any logger compatible with the standard logging module and customize the log output however they like.
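As a minimal sketch of what this enables with the standard library alone (no third-party logger): the JsonFormatter below is a hypothetical helper written only for illustration, while configure_logging and the proposed use_table_logs are the crawler parameters discussed in this PR.

import json
import logging

from crawlee.crawlers import BeautifulSoupCrawler


class JsonFormatter(logging.Formatter):
    # Hypothetical stdlib-only formatter: render each record as one JSON line.
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            'time': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        if record.exc_info:
            payload['exception'] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(handlers=[handler], level=logging.INFO, force=True)

# With use_table_logs=False the statistics end up in `message` as plain key-value
# text instead of an ASCII table, so every record stays machine-parsable.
crawler = BeautifulSoupCrawler(configure_logging=False, use_table_logs=False)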

Example for loguru

import inspect
import logging

from loguru import logger

from crawlee.crawlers import BeautifulSoupCrawler

class InterceptHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        # Get corresponding Loguru level if it exists.
        try:
            level: str | int = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno

        # Find caller from where originated the logged message.
        frame, depth = inspect.currentframe(), 0
        while frame:
            filename = frame.f_code.co_filename
            is_logging = filename == logging.__file__
            is_frozen = 'importlib' in filename and '_bootstrap' in filename
            if depth > 0 and not (is_logging or is_frozen):
                break
            frame = frame.f_back
            depth += 1

        logger.opt(depth=depth, exception=record.exc_info).log(level, record.getMessage())


logger.add('crawler.log', serialize=True, level='INFO')
logging.basicConfig(handlers=[InterceptHandler()], level=logging.INFO, force=True)

crawler = BeautifulSoupCrawler(configure_logging=False, use_table_logs=False)

Log record:

{
    "text": "2025-03-07 16:51:09.947 | INFO     | crawlee.crawlers._basic._basic_crawler:run:580 - Final request statistics: requests_finished: 1; requests_failed: 0; retry_histogram: [1]; request_avg_failed_duration: None; request_avg_finished_duration: 0.795506; requests_finished_per_minute: 73; requests_failed_per_minute: 0; request_total_duration: 0.795506; requests_total: 1; crawler_runtime: 0.818803\n",
    "record": {
        "elapsed": { "repr": "0:00:01.921982", "seconds": 1.921982 },
        "exception": null,
        "extra": {},
        "file": {
            "name": "_basic_crawler.py",
            "path": "/src/crawlee/crawlers/_basic/_basic_crawler.py"
        },
        "function": "run",
        "level": { "icon": "ℹ️", "name": "INFO", "no": 20 },
        "line": 580,
        "message": "Final request statistics: requests_finished: 1; requests_failed: 0; retry_histogram: [1]; request_avg_failed_duration: None; request_avg_finished_duration: 0.795506; requests_finished_per_minute: 73; requests_failed_per_minute: 0; request_total_duration: 0.795506; requests_total: 1; crawler_runtime: 0.818803",
        "module": "_basic_crawler",
        "name": "crawlee.crawlers._basic._basic_crawler",
        "process": { "id": 32118, "name": "MainProcess" },
        "thread": { "id": 139760540858176, "name": "MainThread" },
        "time": {
            "repr": "2025-03-07 16:51:09.947345+00:00",
            "timestamp": 1741366269.947345
        }
    }
}
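As a follow-up, here is a small sketch (my own, not part of the PR) of how the serialized records in crawler.log could be consumed downstream, e.g. to pick out the final statistics line:

import json

# Each line of crawler.log is one serialized loguru record with 'text' and 'record' keys.
with open('crawler.log') as log_file:
    for line in log_file:
        record = json.loads(line)['record']
        if record['message'].startswith('Final request statistics'):
            # With use_table_logs=False the statistics arrive as plain 'key: value' pairs.
            _, _, stats_text = record['message'].partition(': ')
            stats = dict(item.split(': ', 1) for item in stats_text.split('; '))
            print(stats['requests_finished'], stats['crawler_runtime'])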

@janbuchar (Collaborator) commented:

I like this take. Could you add this as an example to the docs about setting up JSON logs?

@Mantisus Mantisus requested review from janbuchar and vdusek and removed request for janbuchar and vdusek March 7, 2025 17:27
@Mantisus Mantisus self-assigned this Mar 7, 2025
Development

Successfully merging this pull request may close these issues.

Add an option for JSON-compatible logs
2 participants