-
Notifications
You must be signed in to change notification settings - Fork 75
Fix startup time of cli #3887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix startup time of cli #3887
Conversation
Moved cea/plugin.py to cea/plugin/plugin.py and added an __init__.py to expose instantiate_plugin and add_plugins. This improves the organization and import structure of the plugin package.
Refactored the parse_string_to_list function from cea.config to cea.utilities for better modularity and reuse. Updated imports in relevant files to use the new location.
Moved PluginPlotCategory and PluginPlotBase from plugin.py to a new plot.py module for better code organization. Also updated imports and usage in plugin.py to reflect this change, and simplified the import of parse_string_to_list.
Moved instantiate_plugin and add_plugins functions from plugin.py to a new loader.py module for better separation of concerns. Updated __init__.py to import these functions from loader.py instead of plugin.py.
Renamed plugin.py to base.py and updated __init__.py to import and expose the CeaPlugin base class. This change clarifies the purpose of the base class and makes it available for user plugin imports.
Relocated pandas imports from the module level to within the CsvSchemaIo class methods that require them. This reduces unnecessary imports when pandas is not needed, improving module import performance and avoiding potential import errors if pandas is not installed.
SchemaIo now inherits from abc.ABC and its methods read, write, and new are marked as abstract. A new DefaultSchemaIo class is introduced to handle default cases, replacing SchemaIo in create_schema_io when file_type is unknown. This improves code clarity and enforces implementation of required methods in subclasses.
The pandas import is now performed within the load_cached_value method instead of at the module level. This change reduces unnecessary imports when the method is not used, potentially improving module load performance.
Changed the return type of SchemaIo.new to Any and updated the typing imports to include Any.
WalkthroughMoves string parsing into utilities, adds a plugin API (base, loader, public exports) that patches configs from plugin-provided config, makes pandas import lazy in plot cache, and converts SchemaIo into an abstract base with a DefaultSchemaIo fallback. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as cea_config.py
participant Utils as cea.utilities
participant Loader as cea/plugin/loader.py
participant Plugin as CeaPlugin
participant Defaults as default_config
participant UserCfg as user_config
User->>CLI: run add-plugins
CLI->>Utils: parse_string_to_list(plugins_line)
CLI->>Loader: add_plugins(defaults, user_cfg)
Loader->>Loader: iterate plugin names
Loader->>Plugin: instantiate_plugin(fqname)
Plugin-->>Loader: plugin instance or None
alt plugin instance
Loader->>Plugin: read plugin.config (sections/options)
Loader->>Defaults: create/patch sections, set option defaults
Loader->>UserCfg: ensure sections, mirror simple options
end
Loader-->>CLI: configs updated in-place
CLI-->>User: finished
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cea/plugin/plot_category.py (1)
36-56: Fix late‑binding bug in nested Plot classes (plot_config/plugin captured by reference).
plot_configandpluginare looked up when__init__runs, so all generated Plot classes will use the last loop’s values. Freeze them via defaults or a small factory.- class Plot(PluginPlotBase): + class Plot(PluginPlotBase): name = plot_label category_name = category.name category_path = category.name - expected_parameters = plot_config.get("expected-parameters", {}) + expected_parameters = dict(plot_config.get("expected-parameters", {})) if "scenario-name" not in expected_parameters: expected_parameters["scenario-name"] = "general:scenario-name" - def __init__(self, project, parameters, cache): - super(Plot, self).__init__(project, parameters, cache, plugin, plot_config) + def __init__(self, project, parameters, cache, + _plugin=plugin, _plot_config=plot_config, _cat_name=category.name): + super(Plot, self).__init__(project, parameters, cache, _plugin, _plot_config) - self.category_name = category.name - self.category_path = category.name + self.category_name = _cat_name + self.category_path = _cat_name
🧹 Nitpick comments (5)
cea/plots/cache.py (1)
123-125: Lazy import of pandas: good; add a friendly ImportError guard.Defers pandas cost until needed. Consider guarding the import to produce a clearer runtime message if pandas isn’t installed.
def load_cached_value(self, data_path, parameters): """Load a Dataframe from disk""" - import pandas as pd + try: + import pandas as pd + except ImportError as e: + raise RuntimeError("pandas is required to load cached plot data. Please install pandas.") from e return pd.read_pickle(self._cached_data_file(data_path, parameters))cea/plugin/plot_category.py (1)
29-35: Use “self” in property and return a list, not a generator, if callers expect a list.Minor readability: rename the parameter to
self. If callers expect a list, wrapyieldinto a list comprehension.- @property - def plots(category): + @property + def plots(self): @@ - for plot_config in category.plot_configs: + for plot_config in self.plot_configs:Or:
- for plot_config in self.plot_configs: - ... - yield Plot + return [make_plot_class for ...]cea/plugin/base.py (1)
5-10: Optional: defer importing PyYAML to reduce baseline CLI import cost.Move
import yamlinto the properties (scripts,plot_categories,schemas) to shave a few ms off startup on “help”/light commands.cea/schemas.py (1)
253-264: Make DefaultSchemaIo’s intent explicit.Instead of calling
super()(which raises), raise directly for clarity.class DefaultSchemaIo(SchemaIo): """Read and write default files - and attempt to validate them.""" - def read(self, *args, **kwargs): - super().read(*args, **kwargs) + def read(self, *args, **kwargs): + raise AttributeError(f"{self.lm}: don't know how to read file_type {self.schema['file_type']}") - def write(self, df, *args, **kwargs): - super().write(df, *args, **kwargs) + def write(self, df, *args, **kwargs): + raise AttributeError(f"{self.lm}: don't know how to write file_type {self.schema['file_type']}") - def new(self): - super().new() + def new(self): + raise AttributeError(f"{self.lm}: don't know how to create a new Dataframe for file_type {self.schema['file_type']}")cea/plugin/loader.py (1)
21-48: Consider using f-strings consistently for string formatting.The function logic correctly validates plugin configurations and patches both config objects. However, there's a minor style inconsistency: line 17 uses f-strings while lines 37 and 44 use
.format(). Consider standardizing on f-strings throughout the file.Optional refactor for consistency:
if section_name in default_config.sections(): - raise ValueError("Plugin tried to redefine config section {section_name}".format( - section_name=section_name)) + raise ValueError(f"Plugin tried to redefine config section {section_name}")if option_name in default_config.options(section_name): - raise ValueError("Plugin tried to redefine parameter {section_name}:{option_name}".format( - section_name=section_name, option_name=option_name)) + raise ValueError(f"Plugin tried to redefine parameter {section_name}:{option_name}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
cea/config.py(1 hunks)cea/interfaces/cli/cea_config.py(2 hunks)cea/plots/cache.py(1 hunks)cea/plugin/__init__.py(1 hunks)cea/plugin/base.py(1 hunks)cea/plugin/loader.py(1 hunks)cea/plugin/plot_category.py(1 hunks)cea/schemas.py(8 hunks)cea/utilities/__init__.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cea/plugin/__init__.py (2)
cea/plugin/loader.py (2)
instantiate_plugin(7-18)add_plugins(21-48)cea/plugin/base.py (1)
CeaPlugin(22-87)
cea/interfaces/cli/cea_config.py (2)
cea/utilities/__init__.py (1)
parse_string_to_list(90-96)cea/config.py (19)
encode(430-432)encode(505-511)encode(526-527)encode(557-558)encode(626-627)encode(642-647)encode(674-679)encode(695-704)encode(717-720)encode(769-770)encode(787-791)encode(812-813)encode(867-879)encode(896-902)encode(960-968)encode(999-1002)get(259-261)get(438-445)get(946-954)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
PluginPlotCategory(11-56)
cea/plugin/loader.py (2)
cea/utilities/__init__.py (1)
parse_string_to_list(90-96)cea/plugin/base.py (1)
config(69-83)
cea/config.py (1)
cea/utilities/__init__.py (2)
unique(74-87)parse_string_to_list(90-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (macos-latest)
- GitHub Check: tests (windows-latest)
- GitHub Check: tests (ubuntu-latest)
🔇 Additional comments (8)
cea/plugin/plot_category.py (1)
6-8: Import path and base class switch: LGTM.Using
from cea.plots.categories import PlotCategoryand deriving from it is consistent with the new plugin base.Also applies to: 11-11
cea/schemas.py (3)
146-148: Factory fallback to DefaultSchemaIo: LGTM.Gracefully handles unknown file types without instantiating the abstract base.
167-213: SchemaIo as ABC: LGTM.Abstracting
read/writeclarifies intent and prevents accidental instantiation.
279-303: Lazy pandas imports: LGTM.
CsvSchemaIo.read/newnow import pandas on demand, matching the startup-time goal.cea/interfaces/cli/cea_config.py (1)
10-10: Utility migration: LGTM.Using
cea.utilities.parse_string_to_listkeeps the CLI light and aligns with the new utility location.Also applies to: 49-50
cea/config.py (1)
20-20: LGTM! Clean refactor to centralize utility function.The refactor to import
parse_string_to_listfromcea.utilitiesimproves code reusability and reduces duplication. All usage sites within this file remain compatible.cea/utilities/__init__.py (1)
90-96: LGTM! Well-implemented utility function.The
parse_string_to_listfunction correctly handles edge cases (None input, line breaks) and provides clean CSV-style parsing with whitespace normalization.cea/plugin/__init__.py (1)
1-6: LGTM! Clean public API surface.The module correctly re-exports the plugin system's public interface and uses
__all__to make the API explicit. This is a standard Python pattern for package organization.
- Replace yaml.CLoader with yaml.safe_load to avoid arbitrary code execution and improve compatibility - Return empty list instead of dict when plots.yml is missing to match documented return type - Catch Exception instead of BaseException to allow system signals to propagate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cea/plugin/base.py (1)
38-52: Past issues resolved; consider simplifying iteration.The previous concerns about unsafe YAML loading and incorrect return type have been fully addressed. The code now uses
yaml.safe_loadand returns an empty list when the file is missing.Optional: The
.keys()call on line 51 is redundant since iterating over a dict already yields its keys:- return [PluginPlotCategory(category_label, categories[category_label], self) for category_label in - categories.keys()] + return [PluginPlotCategory(category_label, categories[category_label], self) for category_label in categories]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cea/plugin/base.py(1 hunks)cea/plugin/loader.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cea/plugin/loader.py
🧰 Additional context used
🧬 Code graph analysis (1)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
PluginPlotCategory(11-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (windows-latest)
- GitHub Check: tests (ubuntu-latest)
- GitHub Check: tests (macos-latest)
🔇 Additional comments (3)
cea/plugin/base.py (3)
54-66: Past issue resolved.The previous concern about unsafe YAML loading has been addressed. The code now uses
yaml.safe_loadwith an appropriate fallback.
68-83: LGTM!The config property correctly handles missing configuration files and uses the standard ConfigParser API appropriately.
85-87: LGTM!The
__str__method provides a clear, fully qualified class identifier suitable for serialization.
Ensures that loading scripts_yml with yaml.safe_load returns an empty dict if the file is empty or contains null, preventing potential errors when accessing scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cea/plugin/base.py (2)
28-66: Consider adding exception handling for malformed YAML files.Currently, if any of the YAML files (
scripts.yml,plots.yml,schemas.yml) contain invalid YAML syntax,yaml.safe_loadwill raise ayaml.YAMLError, potentially crashing the plugin loading process. Consider wrapping the YAML loading in try-except blocks to provide more graceful degradation and helpful error messages.Example approach for the
scriptsproperty:@property def scripts(self): """Return the scripts.yml dictionary.""" scripts_yml = os.path.join(os.path.dirname(inspect.getmodule(self).__file__), "scripts.yml") if not os.path.exists(scripts_yml): return {} - with open(scripts_yml, "r") as scripts_yml_fp: - scripts = yaml.safe_load(scripts_yml_fp) or {} + try: + with open(scripts_yml, "r") as scripts_yml_fp: + scripts = yaml.safe_load(scripts_yml_fp) or {} + except (yaml.YAMLError, IOError) as e: + # Log warning or raise more informative error + return {} return scriptsApply similar patterns to
plot_categoriesandschemasproperties.
28-66: Optional: Extract YAML loading to a helper method.The YAML loading pattern is repeated three times with slight variations. Consider extracting this to a private helper method to reduce duplication and centralize any error handling logic.
Example:
def _load_plugin_yaml(self, filename, default_value): """Load a YAML file from the plugin directory, returning default_value if missing.""" yaml_path = os.path.join(os.path.dirname(inspect.getmodule(self).__file__), filename) if not os.path.exists(yaml_path): return default_value with open(yaml_path, "r") as fp: return yaml.safe_load(fp) or default_valueThen simplify each property:
@property def scripts(self): """Return the scripts.yml dictionary.""" return self._load_plugin_yaml("scripts.yml", {}) @property def schemas(self): """Return the schemas dict for this plugin...""" return self._load_plugin_yaml("schemas.yml", {})Note:
plot_categorieswould need slightly different handling since it processes the loaded data.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cea/plugin/base.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
PluginPlotCategory(11-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (windows-latest)
- GitHub Check: tests (ubuntu-latest)
- GitHub Check: tests (macos-latest)
🔇 Additional comments (2)
cea/plugin/base.py (2)
1-20: LGTM! Clean imports align with startup time goals.The imports are minimal and appropriate for a base plugin class. No heavy dependencies like pandas, which aligns with the PR's objective to reduce CLI startup time.
68-87: LGTM! Config and string representation are well-implemented.The
configproperty correctly usesConfigParserwith graceful handling of missing files. The__str__method appropriately returns the fully qualified class name for serialization purposes.
This pull request mainly tries to improve the startup time of the cea cli during the first install. When running cea cli for the first time e.g.
cea dashboard, it takes around 30 seconds before it starts the command.After investigation, it is found that the bulk of the time was done during the initialization of pandas during import (~25s). One of the main ways to solve this is to import pandas lazily by moving the import where it is needed and not at the module level. However this might not totally solve the issue as any other command would eventually have to load pandas and wait for the initial 25s anyway. We could look into initializing pandas during install to prevent this, but this fix is sufficient for now.
It also introduces a refactor and modularization of the plugin system in the CEA codebase, with a focus on improving plugin management and schema handling. The most significant changes include moving plugin base and loader logic to dedicated modules, enhancing schema IO with abstract base classes, and cleaning up utility imports and pandas dependencies.
Summary by CodeRabbit
New Features
Performance
Chores