
Add Configurable HTML Table Formatter for DataFusion DataFrames in Python #1100

Merged: 37 commits merged into apache:main on Apr 21, 2025

kosiew commented Apr 8, 2025

Which issue does this PR close?

Closes #1096.

Rationale for this change

This PR introduces a flexible, customizable HTML rendering mechanism for DataFusion DataFrames in the Python API. The existing _repr_html_ method hardcoded its output and offered no customization hooks. Delegating to a dedicated HTML formatter makes rendering modular, configurable, and testable, which improves the user experience, especially in notebook environments.

What changes are included in this PR?

  • Added datafusion/html_formatter.py:
    • DataFrameHtmlFormatter class
    • StyleProvider and CellFormatter protocols for customization
    • Global configuration via configure_formatter, get_formatter, reset_formatter
    • Expandable cell rendering with embedded JavaScript
  • Refactored PyDataFrame._repr_html_() to delegate to the Python HTML formatter
  • Updated __init__.py to expose configure_formatter
  • Added extensive tests:
    • Style configuration and override
    • Type-based formatting
    • Custom header and cell rendering
    • Style-sharing logic across renders
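
For readers skimming the list above, here is a minimal sketch of how a StyleProvider protocol can decouple CSS from table rendering. It is not the code in datafusion/html_formatter.py: `PlainStyleProvider` and `render_table` are hypothetical names invented for illustration, and only `StyleProvider`, `get_cell_style`, and `get_header_style` come from this PR.

```python
from html import escape
from typing import Protocol


class StyleProvider(Protocol):
    """Anything that can supply inline CSS for cells and headers."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class PlainStyleProvider:
    """Hypothetical provider; it satisfies StyleProvider structurally."""

    def get_cell_style(self) -> str:
        return "padding: 4px;"

    def get_header_style(self) -> str:
        return "font-weight: bold;"


def render_table(columns: list[str], rows: list[list[object]], style: StyleProvider) -> str:
    """Hypothetical renderer: all CSS comes from the provider."""
    parts = ["<table>", "<tr>"]
    for name in columns:
        parts.append(f'<th style="{style.get_header_style()}">{escape(name)}</th>')
    parts.append("</tr>")
    for row in rows:
        parts.append("<tr>")
        for value in row:
            # Escape cell text so values such as "<x>" cannot inject markup.
            parts.append(f'<td style="{style.get_cell_style()}">{escape(str(value))}</td>')
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


html_out = render_table(["a", "b"], [[1, "<x>"]], PlainStyleProvider())
```

Because the protocol is structural, a custom provider does not need to inherit from anything; it only has to define the two methods.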

Are there any user-facing changes?

✅ Yes. Users can now:

  • Customize how DataFrames render in Jupyter notebooks
  • Use custom styling, formatters, and rendering behavior
  • Control expand/collapse behavior for long cell values
  • Use configure_formatter() to globally control output formatting

Documentation should be updated to include usage examples of the HTML formatter (e.g., in the Python API docs or notebook examples). This is tracked in #1101.
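
Until those docs land, the sketch below illustrates the global-configuration pattern behind configure_formatter, get_formatter, and reset_formatter. It is a self-contained stand-in, not the datafusion module: `FormatterConfig` and its two option names are assumptions made for illustration, and the real formatter may accept different options.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FormatterConfig:
    # Hypothetical options, chosen only to illustrate the pattern.
    max_cell_length: int = 25
    enable_cell_expansion: bool = True


_default = FormatterConfig()
_current = _default


def configure_formatter(**options) -> None:
    """Swap the module-level config for one built from the defaults."""
    global _current
    # replace() raises TypeError for unknown option names.
    _current = replace(_default, **options)


def get_formatter() -> FormatterConfig:
    """Return the config every subsequent render would use."""
    return _current


def reset_formatter() -> None:
    """Restore the default config."""
    global _current
    _current = _default


configure_formatter(max_cell_length=10)
assert get_formatter().max_cell_length == 10
reset_formatter()
assert get_formatter().max_cell_length == 25
```

Keeping the state at module level is what lets a single call affect every DataFrame rendered afterwards, and it is also why tests need a way to reset it.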

kosiew added 30 commits April 8, 2025 11:03
- Added List import to typing for type hints.
- Refactored format_html method to modularize HTML component generation.
- Created separate methods for building HTML header, table container, header, body, expandable cells, regular cells, and footer for better readability and maintainability.
- Updated table_uuid generation to use f-string for consistency.
- Ensured all HTML components are returned as lists for efficient joining.
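
A rough illustration of the "builders return lists, join once" approach from this commit. The function names here are simplified stand-ins, not the private methods in the PR.

```python
from html import escape


def build_header(columns):
    """Return header fragments as a list rather than a concatenated string."""
    cells = [f"<th>{escape(c)}</th>" for c in columns]
    return ["<thead><tr>", *cells, "</tr></thead>"]


def build_body(rows):
    """Return body fragments as a list, one entry per tag or cell."""
    parts = ["<tbody>"]
    for row in rows:
        parts.append("<tr>")
        parts.extend(f"<td>{escape(str(v))}</td>" for v in row)
        parts.append("</tr>")
    parts.append("</tbody>")
    return parts


def format_html(columns, rows):
    """Assemble all fragments and join them in a single pass."""
    parts = ["<table>"]
    parts.extend(build_header(columns))
    parts.extend(build_body(rows))
    parts.append("</table>")
    return "\n".join(parts)
```

Joining once at the end avoids building many intermediate strings, and each builder stays small enough to test on its own.
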
…eader styles

- Added methods `get_cell_style()` and `get_header_style()` to allow subclasses to customize the CSS styles for table cells and headers.
- Updated `_build_table_header()` and `_build_regular_cell()` methods to utilize the new styling methods for improved maintainability.
- Introduced a registry for custom type formatters in `DataFrameHtmlFormatter` to enable flexible formatting of cell values based on their types.
- Enhanced `_format_cell_value()` to check for registered formatters before defaulting to string conversion, improving extensibility.
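
A minimal sketch of the type-formatter registry idea from this commit, assuming a plain dict keyed by type and an isinstance lookup. `TypeFormatterRegistry` and its method names are hypothetical; the PR places the registry inside `DataFrameHtmlFormatter`.

```python
from typing import Any, Callable


class TypeFormatterRegistry:
    """Hypothetical stand-in for the registry described above."""

    def __init__(self) -> None:
        self._formatters: dict[type, Callable[[Any], str]] = {}

    def register_formatter(self, type_class: type, formatter: Callable[[Any], str]) -> None:
        self._formatters[type_class] = formatter

    def format_cell_value(self, value: Any) -> str:
        # The first registered type that matches wins, in registration order.
        for type_class, formatter in self._formatters.items():
            if isinstance(value, type_class):
                return formatter(value)
        # No formatter registered for this type: fall back to str().
        return str(value)


registry = TypeFormatterRegistry()
registry.register_formatter(float, lambda v: f"{v:.2f}")
print(registry.format_cell_value(3.14159))  # 3.14
print(registry.format_cell_value("text"))   # text
```

One consequence of the isinstance lookup: a formatter registered for int also applies to bool values, since bool subclasses int.
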
…builders

- Introduced CellFormatter and StyleProvider protocols for better extensibility.
- Added DefaultStyleProvider class with default CSS styles for cells and headers.
- Updated DataFrameHtmlFormatter to support custom cell and header builders.
- Refactored methods to utilize the new style provider for consistent styling.
- Improved documentation for methods and classes to clarify usage and customization options.
…xamples and enhancing cell formatting methods

- Removed lengthy examples from the docstring of DataFrameHtmlFormatter to improve readability.
- Added methods for extracting and formatting cell values, enhancing the clarity and maintainability of the code.
- Updated cell building methods to utilize the new formatting logic, ensuring consistent application of styles and behaviors.
- Introduced a reset fixture for tests to ensure the formatter is returned to default settings after each test case.
- Added tests for HTML formatter configuration, custom style providers, type formatters, custom cell builders, and complex customizations to ensure robust functionality.
…tilizing raw values for custom cell builders and optimizing expandable cell creation
…est formatter and improving cell value formatting logic
…ta collection and schema retrieval for clarity

refactor: enhance reset_formatter fixture to preserve original formatter configuration during tests
…and enhance debugging output in DataFrameHtmlFormatter
…nd add HTML formatter integration tests

- Removed debug print statements from format_html, _build_table_body, and get_formatter methods in DataFrameHtmlFormatter to clean up the code.
- Introduced a new debug_utils.py file containing a function to check HTML formatter integration.
- Updated __init__.py to include configure_formatter for easier access.
- Enhanced DataFrame class to include a docstring for _repr_html_ method.
- Added comprehensive tests for HTML formatter configuration, custom style providers, type formatters, and cell/header builders in test_dataframe.py.
…n tests

- Removed redundant import of `configure_formatter` in `__init__.py`.
- Added `configure_formatter` to `__all__` in `__init__.py` for better module exposure.
- Cleaned up import statements in `html_formatter.py` for clarity.
- Consolidated import statements in `test_dataframe.py` for improved readability.
- Simplified the `reset_formatter` fixture by removing unnecessary imports and comments.
… state

- Implemented reset_formatter to create a new default DataFrame HTML formatter and update the global reference.
- Added clean_formatter_state fixture in tests to ensure a fresh formatter state for each test case.
- Updated test cases to use clean_formatter_state instead of the previous reset_formatter implementation.
…eset functionality

- Added `use_shared_styles` parameter to control loading of styles/scripts.
- Implemented logic to conditionally include styles based on `use_shared_styles`.
- Updated the constructor to validate `use_shared_styles` as a boolean.
- Introduced `reset_styles_loaded_state` function to reset the styles loaded state.
- Modified `reset_formatter` to reset the `_styles_loaded` flag.
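
A sketch of how the shared-styles flag could behave, under the assumption that styles are emitted on the first render only when sharing is enabled and on every render when it is disabled. `STYLE_BLOCK` and `render` are hypothetical; `use_shared_styles`, `_styles_loaded`, and `reset_styles_loaded_state` are the names from this commit.

```python
_styles_loaded = False  # module-level flag, named after the one in this commit

STYLE_BLOCK = "<style>.cell { padding: 4px; }</style>"  # hypothetical CSS


def render(table_html: str, use_shared_styles: bool = True) -> str:
    """Prefix the table with the style block only when it is needed."""
    global _styles_loaded
    if not isinstance(use_shared_styles, bool):
        raise TypeError("use_shared_styles must be a boolean")
    # Without sharing, every output carries its own copy of the styles.
    include_styles = (not use_shared_styles) or (not _styles_loaded)
    if use_shared_styles:
        _styles_loaded = True
    return (STYLE_BLOCK + table_html) if include_styles else table_html


def reset_styles_loaded_state() -> None:
    """Forget that styles were emitted, e.g. between tests."""
    global _styles_loaded
    _styles_loaded = False
```

In a notebook this keeps repeated outputs small, at the cost of later cells depending on the first one having been rendered.
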
@timsaucer

I'm sorry I haven't had time to look through it until now. This looks incredible!

I want to take some time to play with it myself before I hit approve. Thank you so much for this PR!

@timsaucer

I've tested this locally and it works well. I'd love to see the documentation, but I see that's tracked on another issue since this one is so large. Thank you so much for the contribution!

timsaucer merged commit 818975b into apache:main on Apr 21, 2025 (17 checks passed)

kosiew commented Apr 22, 2025

@timsaucer

thanks for the review and merge.

Development

Successfully merging this pull request may close these issues.

Move DataFrame HTML Rendering to Configurable Python Formatter