Skip to content

Conversation

@reyery
Copy link
Member

@reyery reyery commented Aug 18, 2025

This pull request mainly tries to improve the startup time of the cea cli during the first install. When running cea cli for the first time e.g. cea dashboard, it takes around 30 seconds before it starts the command.

After investigation, it is found that the bulk of the time was done during the initialization of pandas during import (~25s). One of the main ways to solve this is to import pandas lazily by moving the import where it is needed and not at the module level. However this might not totally solve the issue as any other command would eventually have to load pandas and wait for the initial 25s anyway. We could look into initializing pandas during install to prevent this, but this fix is sufficient for now.

It also introduces a refactor and modularization of the plugin system in the CEA codebase, with a focus on improving plugin management and schema handling. The most significant changes include moving plugin base and loader logic to dedicated modules, enhancing schema IO with abstract base classes, and cleaning up utility imports and pandas dependencies.

Summary by CodeRabbit

  • New Features

    • Public plugin APIs for discovering, instantiating and managing plugins.
    • Plugins can expose metadata, scripts and plots via adjacent configuration files.
    • Schema I/O converted to an abstract-base pattern for more consistent extensibility.
    • CLI/plugin handling now integrates plugin-provided configuration into app configs.
  • Performance

    • Cache loading is now lazy to reduce upfront resource usage.
  • Chores

    • Added a utility to parse comma-separated strings into lists.

reyery added 9 commits August 17, 2025 21:59
Moved cea/plugin.py to cea/plugin/plugin.py and added an __init__.py to expose instantiate_plugin and add_plugins. This improves the organization and import structure of the plugin package.
Refactored the parse_string_to_list function from cea.config to cea.utilities for better modularity and reuse. Updated imports in relevant files to use the new location.
Moved PluginPlotCategory and PluginPlotBase from plugin.py to a new plot.py module for better code organization. Also updated imports and usage in plugin.py to reflect this change, and simplified the import of parse_string_to_list.
Moved instantiate_plugin and add_plugins functions from plugin.py to a new loader.py module for better separation of concerns. Updated __init__.py to import these functions from loader.py instead of plugin.py.
Renamed plugin.py to base.py and updated __init__.py to import and expose the CeaPlugin base class. This change clarifies the purpose of the base class and makes it available for user plugin imports.
Relocated pandas imports from the module level to within the CsvSchemaIo class methods that require them. This reduces unnecessary imports when pandas is not needed, improving module import performance and avoiding potential import errors if pandas is not installed.
SchemaIo now inherits from abc.ABC and its methods read, write, and new are marked as abstract. A new DefaultSchemaIo class is introduced to handle default cases, replacing SchemaIo in create_schema_io when file_type is unknown. This improves code clarity and enforces implementation of required methods in subclasses.
The pandas import is now performed within the load_cached_value method instead of at the module level. This change reduces unnecessary imports when the method is not used, potentially improving module load performance.
Changed the return type of SchemaIo.new to Any and updated the typing imports to include Any.
@reyery reyery marked this pull request as ready for review October 20, 2025 11:23
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 20, 2025

Walkthrough

Moves string parsing into utilities, adds a plugin API (base, loader, public exports) that patches configs from plugin-provided config, makes pandas import lazy in plot cache, and converts SchemaIo into an abstract base with a DefaultSchemaIo fallback.

Changes

Cohort / File(s) Summary
Plugin public API & exports
cea/plugin/__init__.py
Re-exports plugin utilities: instantiate_plugin, add_plugins, and CeaPlugin via __all__.
Plugin base & metadata
cea/plugin/base.py
Adds CeaPlugin exposing scripts, plot_categories, schemas, config properties and __str__, loading adjacent YAML/config files.
Plugin loader & config patching
cea/plugin/loader.py
Adds instantiate_plugin(fqname) for dynamic import/instantiation and add_plugins(default_config, user_config) that instantiates plugins and patches configs with conflict checks.
Plugin plot category
cea/plugin/plot_category.py
Simplifies PluginPlotCategory to inherit PlotCategory (imported from cea.plots.categories) and removes legacy scaffolding.
Utilities / config callers
cea/utilities/__init__.py, cea/config.py, cea/interfaces/cli/cea_config.py
Adds parse_string_to_list() to utilities; cea.config now imports it instead of defining it; CLI uses parse_string_to_list() and removes an unused cea.inputlocator import.
Lazy pandas import
cea/plots/cache.py
Moves import pandas as pd from module level into load_cached_value() to defer pandas import until needed.
Schema I/O abstraction
cea/schemas.py
Converts SchemaIo into an abc.ABC with abstract read/write/new; adds DefaultSchemaIo and updates factory to return DefaultSchemaIo for unknown file types.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as cea_config.py
    participant Utils as cea.utilities
    participant Loader as cea/plugin/loader.py
    participant Plugin as CeaPlugin
    participant Defaults as default_config
    participant UserCfg as user_config

    User->>CLI: run add-plugins
    CLI->>Utils: parse_string_to_list(plugins_line)
    CLI->>Loader: add_plugins(defaults, user_cfg)

    Loader->>Loader: iterate plugin names
    Loader->>Plugin: instantiate_plugin(fqname)
    Plugin-->>Loader: plugin instance or None
    alt plugin instance
      Loader->>Plugin: read plugin.config (sections/options)
      Loader->>Defaults: create/patch sections, set option defaults
      Loader->>UserCfg: ensure sections, mirror simple options
    end
    Loader-->>CLI: configs updated in-place
    CLI-->>User: finished
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 From burrows of YAML and ink,

I hop through configs in a blink,
Parsers moved, pandas waits at bay,
Plugins bring their own display,
Schemas now stand proud and think.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "Fix startup time of cli" directly corresponds to the primary objective documented in the PR summary, which identifies slow CLI startup on first run as the main issue and proposes lazy-loading pandas as the solution. The changeset contains multiple targeted improvements aligned with this goal: lazy importing of pandas in cea/plots/cache.py (addressing the ~25s delay), refactoring the plugin system to reduce unnecessary imports, and consolidating utility functions to streamline the import chain. The title is concise, specific, and clearly communicates the main change without vagueness or off-topic elements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-startup-time

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cea/plugin/plot_category.py (1)

36-56: Fix late‑binding bug in nested Plot classes (plot_config/plugin captured by reference).

plot_config and plugin are looked up when __init__ runs, so all generated Plot classes will use the last loop’s values. Freeze them via defaults or a small factory.

-            class Plot(PluginPlotBase):
+            class Plot(PluginPlotBase):
                 name = plot_label
                 category_name = category.name
                 category_path = category.name
-                expected_parameters = plot_config.get("expected-parameters", {})
+                expected_parameters = dict(plot_config.get("expected-parameters", {}))
                 if "scenario-name" not in expected_parameters:
                     expected_parameters["scenario-name"] = "general:scenario-name"

-                def __init__(self, project, parameters, cache):
-                    super(Plot, self).__init__(project, parameters, cache, plugin, plot_config)
+                def __init__(self, project, parameters, cache,
+                             _plugin=plugin, _plot_config=plot_config, _cat_name=category.name):
+                    super(Plot, self).__init__(project, parameters, cache, _plugin, _plot_config)
-                    self.category_name = category.name
-                    self.category_path = category.name
+                    self.category_name = _cat_name
+                    self.category_path = _cat_name
🧹 Nitpick comments (5)
cea/plots/cache.py (1)

123-125: Lazy import of pandas: good; add a friendly ImportError guard.

Defers pandas cost until needed. Consider guarding the import to produce a clearer runtime message if pandas isn’t installed.

     def load_cached_value(self, data_path, parameters):
         """Load a Dataframe from disk"""
-        import pandas as pd
+        try:
+            import pandas as pd
+        except ImportError as e:
+            raise RuntimeError("pandas is required to load cached plot data. Please install pandas.") from e
         return pd.read_pickle(self._cached_data_file(data_path, parameters))
cea/plugin/plot_category.py (1)

29-35: Use “self” in property and return a list, not a generator, if callers expect a list.

Minor readability: rename the parameter to self. If callers expect a list, wrap yield into a list comprehension.

-    @property
-    def plots(category):
+    @property
+    def plots(self):
@@
-        for plot_config in category.plot_configs:
+        for plot_config in self.plot_configs:

Or:

-        for plot_config in self.plot_configs:
-            ...
-            yield Plot
+        return [make_plot_class for ...]
cea/plugin/base.py (1)

5-10: Optional: defer importing PyYAML to reduce baseline CLI import cost.

Move import yaml into the properties (scripts, plot_categories, schemas) to shave a few ms off startup on “help”/light commands.

cea/schemas.py (1)

253-264: Make DefaultSchemaIo’s intent explicit.

Instead of calling super() (which raises), raise directly for clarity.

 class DefaultSchemaIo(SchemaIo):
     """Read and write default files - and attempt to validate them."""
-    def read(self, *args, **kwargs):
-        super().read(*args, **kwargs)
+    def read(self, *args, **kwargs):
+        raise AttributeError(f"{self.lm}: don't know how to read file_type {self.schema['file_type']}")
 
-    def write(self, df, *args, **kwargs):
-        super().write(df, *args, **kwargs)
+    def write(self, df, *args, **kwargs):
+        raise AttributeError(f"{self.lm}: don't know how to write file_type {self.schema['file_type']}")
 
-    def new(self):
-        super().new()
+    def new(self):
+        raise AttributeError(f"{self.lm}: don't know how to create a new Dataframe for file_type {self.schema['file_type']}")
cea/plugin/loader.py (1)

21-48: Consider using f-strings consistently for string formatting.

The function logic correctly validates plugin configurations and patches both config objects. However, there's a minor style inconsistency: line 17 uses f-strings while lines 37 and 44 use .format(). Consider standardizing on f-strings throughout the file.

Optional refactor for consistency:

             if section_name in default_config.sections():
-                raise ValueError("Plugin tried to redefine config section {section_name}".format(
-                    section_name=section_name))
+                raise ValueError(f"Plugin tried to redefine config section {section_name}")
                 if option_name in default_config.options(section_name):
-                    raise ValueError("Plugin tried to redefine parameter {section_name}:{option_name}".format(
-                        section_name=section_name, option_name=option_name))
+                    raise ValueError(f"Plugin tried to redefine parameter {section_name}:{option_name}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad31cb0 and d2fb0aa.

📒 Files selected for processing (9)
  • cea/config.py (1 hunks)
  • cea/interfaces/cli/cea_config.py (2 hunks)
  • cea/plots/cache.py (1 hunks)
  • cea/plugin/__init__.py (1 hunks)
  • cea/plugin/base.py (1 hunks)
  • cea/plugin/loader.py (1 hunks)
  • cea/plugin/plot_category.py (1 hunks)
  • cea/schemas.py (8 hunks)
  • cea/utilities/__init__.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cea/plugin/__init__.py (2)
cea/plugin/loader.py (2)
  • instantiate_plugin (7-18)
  • add_plugins (21-48)
cea/plugin/base.py (1)
  • CeaPlugin (22-87)
cea/interfaces/cli/cea_config.py (2)
cea/utilities/__init__.py (1)
  • parse_string_to_list (90-96)
cea/config.py (19)
  • encode (430-432)
  • encode (505-511)
  • encode (526-527)
  • encode (557-558)
  • encode (626-627)
  • encode (642-647)
  • encode (674-679)
  • encode (695-704)
  • encode (717-720)
  • encode (769-770)
  • encode (787-791)
  • encode (812-813)
  • encode (867-879)
  • encode (896-902)
  • encode (960-968)
  • encode (999-1002)
  • get (259-261)
  • get (438-445)
  • get (946-954)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
  • PluginPlotCategory (11-56)
cea/plugin/loader.py (2)
cea/utilities/__init__.py (1)
  • parse_string_to_list (90-96)
cea/plugin/base.py (1)
  • config (69-83)
cea/config.py (1)
cea/utilities/__init__.py (2)
  • unique (74-87)
  • parse_string_to_list (90-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (macos-latest)
  • GitHub Check: tests (windows-latest)
  • GitHub Check: tests (ubuntu-latest)
🔇 Additional comments (8)
cea/plugin/plot_category.py (1)

6-8: Import path and base class switch: LGTM.

Using from cea.plots.categories import PlotCategory and deriving from it is consistent with the new plugin base.

Also applies to: 11-11

cea/schemas.py (3)

146-148: Factory fallback to DefaultSchemaIo: LGTM.

Gracefully handles unknown file types without instantiating the abstract base.


167-213: SchemaIo as ABC: LGTM.

Abstracting read/write clarifies intent and prevents accidental instantiation.


279-303: Lazy pandas imports: LGTM.

CsvSchemaIo.read/new now import pandas on demand, matching the startup-time goal.

cea/interfaces/cli/cea_config.py (1)

10-10: Utility migration: LGTM.

Using cea.utilities.parse_string_to_list keeps the CLI light and aligns with the new utility location.

Also applies to: 49-50

cea/config.py (1)

20-20: LGTM! Clean refactor to centralize utility function.

The refactor to import parse_string_to_list from cea.utilities improves code reusability and reduces duplication. All usage sites within this file remain compatible.

cea/utilities/__init__.py (1)

90-96: LGTM! Well-implemented utility function.

The parse_string_to_list function correctly handles edge cases (None input, line breaks) and provides clean CSV-style parsing with whitespace normalization.

cea/plugin/__init__.py (1)

1-6: LGTM! Clean public API surface.

The module correctly re-exports the plugin system's public interface and uses __all__ to make the API explicit. This is a standard Python pattern for package organization.

- Replace yaml.CLoader with yaml.safe_load to avoid arbitrary code execution and improve compatibility
- Return empty list instead of dict when plots.yml is missing to match documented return type
- Catch Exception instead of BaseException to allow system signals to propagate
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cea/plugin/base.py (1)

38-52: Past issues resolved; consider simplifying iteration.

The previous concerns about unsafe YAML loading and incorrect return type have been fully addressed. The code now uses yaml.safe_load and returns an empty list when the file is missing.

Optional: The .keys() call on line 51 is redundant since iterating over a dict already yields its keys:

-        return [PluginPlotCategory(category_label, categories[category_label], self) for category_label in
-                categories.keys()]
+        return [PluginPlotCategory(category_label, categories[category_label], self) for category_label in categories]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2fb0aa and dc33a8d.

📒 Files selected for processing (2)
  • cea/plugin/base.py (1 hunks)
  • cea/plugin/loader.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cea/plugin/loader.py
🧰 Additional context used
🧬 Code graph analysis (1)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
  • PluginPlotCategory (11-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (windows-latest)
  • GitHub Check: tests (ubuntu-latest)
  • GitHub Check: tests (macos-latest)
🔇 Additional comments (3)
cea/plugin/base.py (3)

54-66: Past issue resolved.

The previous concern about unsafe YAML loading has been addressed. The code now uses yaml.safe_load with an appropriate fallback.


68-83: LGTM!

The config property correctly handles missing configuration files and uses the standard ConfigParser API appropriately.


85-87: LGTM!

The __str__ method provides a clear, fully qualified class identifier suitable for serialization.

Ensures that loading scripts_yml with yaml.safe_load returns an empty dict if the file is empty or contains null, preventing potential errors when accessing scripts.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cea/plugin/base.py (2)

28-66: Consider adding exception handling for malformed YAML files.

Currently, if any of the YAML files (scripts.yml, plots.yml, schemas.yml) contain invalid YAML syntax, yaml.safe_load will raise a yaml.YAMLError, potentially crashing the plugin loading process. Consider wrapping the YAML loading in try-except blocks to provide more graceful degradation and helpful error messages.

Example approach for the scripts property:

     @property
     def scripts(self):
         """Return the scripts.yml dictionary."""
         scripts_yml = os.path.join(os.path.dirname(inspect.getmodule(self).__file__), "scripts.yml")
         if not os.path.exists(scripts_yml):
             return {}
-        with open(scripts_yml, "r") as scripts_yml_fp:
-            scripts = yaml.safe_load(scripts_yml_fp) or {}
+        try:
+            with open(scripts_yml, "r") as scripts_yml_fp:
+                scripts = yaml.safe_load(scripts_yml_fp) or {}
+        except (yaml.YAMLError, IOError) as e:
+            # Log warning or raise more informative error
+            return {}
         return scripts

Apply similar patterns to plot_categories and schemas properties.


28-66: Optional: Extract YAML loading to a helper method.

The YAML loading pattern is repeated three times with slight variations. Consider extracting this to a private helper method to reduce duplication and centralize any error handling logic.

Example:

def _load_plugin_yaml(self, filename, default_value):
    """Load a YAML file from the plugin directory, returning default_value if missing."""
    yaml_path = os.path.join(os.path.dirname(inspect.getmodule(self).__file__), filename)
    if not os.path.exists(yaml_path):
        return default_value
    with open(yaml_path, "r") as fp:
        return yaml.safe_load(fp) or default_value

Then simplify each property:

@property
def scripts(self):
    """Return the scripts.yml dictionary."""
    return self._load_plugin_yaml("scripts.yml", {})

@property
def schemas(self):
    """Return the schemas dict for this plugin..."""
    return self._load_plugin_yaml("schemas.yml", {})

Note: plot_categories would need slightly different handling since it processes the loaded data.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc33a8d and 402ae91.

📒 Files selected for processing (1)
  • cea/plugin/base.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cea/plugin/base.py (1)
cea/plugin/plot_category.py (1)
  • PluginPlotCategory (11-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (windows-latest)
  • GitHub Check: tests (ubuntu-latest)
  • GitHub Check: tests (macos-latest)
🔇 Additional comments (2)
cea/plugin/base.py (2)

1-20: LGTM! Clean imports align with startup time goals.

The imports are minimal and appropriate for a base plugin class. No heavy dependencies like pandas, which aligns with the PR's objective to reduce CLI startup time.


68-87: LGTM! Config and string representation are well-implemented.

The config property correctly uses ConfigParser with graceful handling of missing files. The __str__ method appropriately returns the fully qualified class name for serialization purposes.

@reyery reyery merged commit 8662527 into master Oct 20, 2025
8 checks passed
@reyery reyery deleted the fix-startup-time branch October 20, 2025 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants