Add Configurable HTML Table Formatter for DataFusion DataFrames in Python #1100
Merged
+1,061
−114
- Added List import to typing for type hints. - Refactored format_html method to modularize HTML component generation. - Created separate methods for building HTML header, table container, header, body, expandable cells, regular cells, and footer for better readability and maintainability. - Updated table_uuid generation to use f-string for consistency. - Ensured all HTML components are returned as lists for efficient joining.
…eader styles - Added methods `get_cell_style()` and `get_header_style()` to allow subclasses to customize the CSS styles for table cells and headers. - Updated `_build_table_header()` and `_build_regular_cell()` methods to utilize the new styling methods for improved maintainability. - Introduced a registry for custom type formatters in `DataFrameHtmlFormatter` to enable flexible formatting of cell values based on their types. - Enhanced `_format_cell_value()` to check for registered formatters before defaulting to string conversion, improving extensibility.
…builders - Introduced CellFormatter and StyleProvider protocols for better extensibility. - Added DefaultStyleProvider class with default CSS styles for cells and headers. - Updated DataFrameHtmlFormatter to support custom cell and header builders. - Refactored methods to utilize the new style provider for consistent styling. - Improved documentation for methods and classes to clarify usage and customization options.
…amples and customization options
…xamples and enhancing cell formatting methods - Removed lengthy examples from the docstring of DataFrameHtmlFormatter to improve readability. - Added methods for extracting and formatting cell values, enhancing the clarity and maintainability of the code. - Updated cell building methods to utilize the new formatting logic, ensuring consistent application of styles and behaviors. - Introduced a reset fixture for tests to ensure the formatter is returned to default settings after each test case. - Added tests for HTML formatter configuration, custom style providers, type formatters, custom cell builders, and complex customizations to ensure robust functionality.
…tilizing raw values for custom cell builders and optimizing expandable cell creation
…est formatter and improving cell value formatting logic
…ta collection and schema retrieval for clarity refactor: enhance reset_formatter fixture to preserve original formatter configuration during tests
…and enhance debugging output in DataFrameHtmlFormatter
…lue retrieval in cell formatting
…rite_compressed_parquet
…nternal representation
…nd add HTML formatter integration tests - Removed debug print statements from format_html, _build_table_body, and get_formatter methods in DataFrameHtmlFormatter to clean up the code. - Introduced a new debug_utils.py file containing a function to check HTML formatter integration. - Updated __init__.py to include configure_formatter for easier access. - Enhanced DataFrame class to include a docstring for _repr_html_ method. - Added comprehensive tests for HTML formatter configuration, custom style providers, type formatters, and cell/header builders in test_dataframe.py.
…n tests - Removed redundant import of `configure_formatter` in `__init__.py`. - Added `configure_formatter` to `__all__` in `__init__.py` for better module exposure. - Cleaned up import statements in `html_formatter.py` for clarity. - Consolidated import statements in `test_dataframe.py` for improved readability. - Simplified the `reset_formatter` fixture by removing unnecessary imports and comments.
… state - Implemented reset_formatter to create a new default DataFrame HTML formatter and update the global reference. - Added clean_formatter_state fixture in tests to ensure a fresh formatter state for each test case. - Updated test cases to use clean_formatter_state instead of the previous reset_formatter implementation.
…eset functionality - Added `use_shared_styles` parameter to control loading of styles/scripts. - Implemented logic to conditionally include styles based on `use_shared_styles`. - Updated the constructor to validate `use_shared_styles` as a boolean. - Introduced `reset_styles_loaded_state` function to reset the styles loaded state. - Modified `reset_formatter` to reset the `_styles_loaded` flag.
…idation in DataFrameHtmlFormatter
…clarity and maintainability
…and enhancing regex patterns for body data
I'm sorry I haven't had time to look through it until now. This looks incredible! I want to take some time to play with it myself before I hit approve. Thank you so much for this PR!
I've tested this locally and it works well. I'd love to see the documentation, but I see that's tracked on another issue since this one is so large. Thank you so much for the contribution!
Thanks for the review and merge.
Which issue does this PR close?
Closes #1096.
Rationale for this change
This PR introduces a flexible and customizable HTML rendering mechanism for DataFusion DataFrames in the Python API. The existing `_repr_html_` method was hardcoded and inflexible. By refactoring to use a dedicated HTML formatter, we gain modularity, customization options, and testability, improving the user experience, especially in notebook environments.
What changes are included in this PR?
- `datafusion/html_formatter.py`:
  - `DataFrameHtmlFormatter` class
  - `StyleProvider` and `CellFormatter` protocols for customization
  - `configure_formatter`, `get_formatter`, `reset_formatter`
- `PyDataFrame._repr_html_()` now delegates to the Python HTML formatter
- `__init__.py` now exposes `configure_formatter`
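To make the customization points above more concrete, here is a minimal, self-contained sketch of how a style provider and a per-type formatter registry could fit together. It does not import or reproduce the merged code: `MiniHtmlFormatter` and `register_formatter` are hypothetical names chosen for illustration, the CSS strings are placeholders, and only `StyleProvider`, `DefaultStyleProvider`, `get_cell_style()`, `get_header_style()`, `_format_cell_value()`, and `format_html` are names taken from the commit notes.

```python
from html import escape
from typing import Any, Callable, Dict, List, Optional, Protocol, Sequence


class StyleProvider(Protocol):
    """Supplies CSS for cells and headers (method names from the commit notes)."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class DefaultStyleProvider:
    """Placeholder default styles; the real defaults may differ."""

    def get_cell_style(self) -> str:
        return "border: 1px solid black; padding: 8px;"

    def get_header_style(self) -> str:
        return "border: 1px solid black; padding: 8px; font-weight: bold;"


class MiniHtmlFormatter:
    """Hypothetical stand-in for DataFrameHtmlFormatter, not the real class."""

    def __init__(self, style_provider: Optional[StyleProvider] = None) -> None:
        self.style_provider = style_provider or DefaultStyleProvider()
        # Registry mapping a Python type to a function that renders its values.
        self._type_formatters: Dict[type, Callable[[Any], str]] = {}

    def register_formatter(self, type_class: type, formatter: Callable[[Any], str]) -> None:
        # Assumed method name; the source only says a registry exists.
        self._type_formatters[type_class] = formatter

    def _format_cell_value(self, value: Any) -> str:
        # Check registered formatters first, then fall back to escaped str().
        for type_class, formatter in self._type_formatters.items():
            if isinstance(value, type_class):
                return formatter(value)
        return escape(str(value))

    def format_html(self, columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
        # Collect fragments in a list and join once at the end.
        parts: List[str] = ["<table>", "<thead><tr>"]
        header_style = self.style_provider.get_header_style()
        for name in columns:
            parts.append(f'<th style="{header_style}">{escape(name)}</th>')
        parts.append("</tr></thead><tbody>")
        cell_style = self.style_provider.get_cell_style()
        for row in rows:
            parts.append("<tr>")
            for value in row:
                parts.append(f'<td style="{cell_style}">{self._format_cell_value(value)}</td>')
            parts.append("</tr>")
        parts.append("</tbody></table>")
        return "".join(parts)
```

Under these assumptions, calling `register_formatter(float, lambda v: f"{v:.2f}")` on an instance would render every float cell with two decimals, while all other values fall back to escaped string conversion.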
Are there any user-facing changes?
✅ Yes. Users can now:
- Use `configure_formatter()` to globally control output formatting

Documentation should be updated to include usage examples of the HTML formatter (e.g., in the Python API docs or notebook examples). This is tracked in #1101.
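As a rough illustration of the global configure/get/reset pattern described in this PR, the sketch below keeps one module-level formatter object and replaces it on demand. It is a simplified model, not the merged implementation: `FormatterConfig` is a hypothetical stand-in for the formatter, `max_cell_length` is an assumed option used only for illustration, and `use_shared_styles` with its boolean validation follows the commit notes.

```python
from dataclasses import dataclass


@dataclass
class FormatterConfig:
    """Hypothetical stand-in for the formatter's settings."""

    max_cell_length: int = 25       # assumed option, for illustration only
    use_shared_styles: bool = True  # option named in the commit notes

    def __post_init__(self) -> None:
        # Mirror the boolean validation mentioned in the commit notes.
        if not isinstance(self.use_shared_styles, bool):
            raise TypeError("use_shared_styles must be a boolean")


# Single module-level instance shared by every DataFrame rendering call.
_default_formatter = FormatterConfig()


def get_formatter() -> FormatterConfig:
    """Return the formatter currently in effect."""
    return _default_formatter


def configure_formatter(**kwargs) -> None:
    """Replace the global formatter with one built from the given options."""
    global _default_formatter
    _default_formatter = FormatterConfig(**kwargs)


def reset_formatter() -> None:
    """Restore a fresh formatter with default settings."""
    global _default_formatter
    _default_formatter = FormatterConfig()
```

In this sketch, an invalid option raises before the global reference is reassigned, so a failed `configure_formatter()` call leaves the previous formatter in place. A test fixture can call `reset_formatter()` after each case to avoid state leaking between tests, which is the role the commit notes give to `clean_formatter_state`.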