
Conversation

@t0mdavid-m
Member

@t0mdavid-m t0mdavid-m commented Oct 5, 2025

Summary by CodeRabbit

  • New Features

    • Added Feature Table and TIC Chromatogram views; sequence-dependent views (Sequence view, Internal fragment map) now appear when sequence data is present.
  • Performance

    • Switched key data materialization to streaming collection to reduce memory and improve loading for heatmaps and compression.
  • Chores

    • UI wiring and component registration updated to expose feature and TIC data for the new views.
  • Bug Fixes

    • Feature-level filtering for heatmap selection now returns matching feature data or clears gracefully.
  • Documentation

    • Minor README formatting tweak.


@coderabbitai
Contributor

coderabbitai bot commented Oct 5, 2025

Walkthrough

Adds "Feature table" and "TIC Chromatogram" UI components and their wiring; makes the spec1 TSV required in parseDeconv; builds feature_table, mass_table (with FeatureIndex arrays), and tic datasets; switches several Polars collects to streaming; and adds feature-level filtering in the update path.

Changes

  • Layout & UI options — content/FLASHDeconv/FLASHDeconvLayoutManager.py
    Adds "Feature table" and "TIC Chromatogram" to COMPONENT_OPTIONS and adds feature_table / tic_chromatogram to COMPONENT_NAMES; setSequenceView() appends sequence-dependent options/names when sequence data is present.
  • Parser: feature & TIC construction, signature change — src/parse/deconv.py
    Makes spec1_tsv required; loads spec1/spec2 TSVs with Level tags; merges and indexes scans; builds per-scan feature_dfs; aggregates feature_table (Apex, RTStart/RTEnd, NumScans, RTDuration); constructs tic from the MS1/MS2 heatmaps; extends mass_table with per-scan FeatureIndex arrays; replaces some LazyFrame materializations with collect(streaming=True).
  • Renderer components — src/render/components.py
    Adds a Chromatogram class (title 'TIC', componentName 'TICChromatogram'); adds a Tabulator branch for table_type == 'FeatureTable' (title 'Feature Table', componentName 'TabulatorFeatureTable'); flips the _RELEASE constant to False.
  • Renderer initialization / wiring — src/render/initialize.py
    Imports Chromatogram; exposes feature_data in the MS1 heatmap path; for comp_name == 'tic_chromatogram', fetches tic, an optional feature_table, and feature_dfs (converted to a JSON-serializable form) and instantiates Chromatogram; for comp_name == 'feature_table', fetches feature_table and instantiates Tabulator('FeatureTable').
  • Runtime update logic (feature filtering) — src/render/update.py
    Adds feature-level filtering for the Deconvolved MS1 Heatmap: when the selection contains scanIndex and massIndex, filters feature_data (streaming) to derive a FeatureIndex and re-filters feature_data by that FeatureIndex, or sets an empty DataFrame.
  • Streaming materialization tweak — src/render/compression.py
    Replaces sorted_data.collect() with sorted_data.collect(streaming=True) before NumPy conversions.
  • Submodule bump — openms-streamlit-vue-component
    Updates the submodule commit reference.
  • Docs formatting — README.md
    Adds a trailing period (formatting-only).
  • Ignore list — .gitignore
    Adds CLAUDE.md to the ignore list.
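The scan/mass indexing described for src/parse/deconv.py (ScanNum mapped to a ScanIndex in order of first occurrence, plus a 0-based MassIndex within each scan) can be sketched in plain Python. This is an illustrative stand-in for the Polars implementation, assuming rows arrive pre-sorted by retention time with each scan's rows contiguous; the function name and list-of-dicts row shape are not from the PR:

```python
def index_scans(rows):
    """Assign ScanIndex by first appearance of ScanNum and a 0-based
    MassIndex within each scan (rows assumed sorted by RetentionTime)."""
    scan_index = {}      # ScanNum -> ScanIndex, in first-occurrence order
    next_mass_idx = {}   # ScanNum -> next MassIndex to hand out
    out = []
    for row in rows:
        scan = row["ScanNum"]
        if scan not in scan_index:
            scan_index[scan] = len(scan_index)
        mass_idx = next_mass_idx.get(scan, 0)
        next_mass_idx[scan] = mass_idx + 1
        out.append({**row, "ScanIndex": scan_index[scan], "MassIndex": mass_idx})
    return out

rows = [
    {"ScanNum": 7, "RetentionTime": 1.0},
    {"ScanNum": 7, "RetentionTime": 1.0},
    {"ScanNum": 3, "RetentionTime": 2.0},
]
indexed = index_scans(rows)
```

In the actual code the same effect is achieved with `unique(maintain_order=True).with_row_count("ScanIndex")` for the scan map and a windowed `RowID - RowID.min().over("ScanNum")` for MassIndex.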

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant UI as Layout Manager / Browser
  participant Init as Renderer Initialize
  participant Parser as parseDeconv / Data Store
  participant Comp as Component (Chromatogram / Tabulator)

  User->>UI: Select component ("TIC Chromatogram" or "Feature table")
  UI->>Init: Request initialization with comp_name
  Init->>Parser: Request datasets (tic, feature_table, feature_dfs, mass_table)
  Note over Parser: parseDeconv returns streamed datasets (feature_dfs, feature_table, mass_table, tic)
  Parser-->>Init: Return requested streamed dataset(s)
  Init->>Comp: Instantiate component with data_to_send (tic / feature_table / feature_dfs)
  Init-->>UI: component_arguments + data_to_send
  UI-->>User: Render component with streamed data

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • src/parse/deconv.py: signature change for spec1_tsv, ScanIndex mapping, Apex/RT aggregation, and construction of FeatureIndex arrays in mass_table.
    • Use of collect(streaming=True) for Polars LazyFrames and ordering/memory implications.
    • Wiring in src/render/initialize.py and src/render/update.py for tic, feature_table, feature_dfs, and the new feature-level filtering.
    • UI component contracts in src/render/components.py (TabulatorFeatureTable and TICChromatogram) vs. frontend expectations.

Possibly related PRs

  • update vue #22 — Submodule updates overlap (openms-streamlit-vue-component) and may be the same submodule bump.
  • New Features #37 — Overlaps in layout manager and parse/update changes related to feature components and heatmap logic.
  • New Features #50 — Likely related work touching parseDeconv, renderer components, and initialization wiring.

Poem

I nibble rows and chase the peaks,
I hop through scans and count the tweaks,
A TIC that twinkles, a table that speaks,
Carrot-coded charts and streaming streaks,
Tiny rabbit cheers — data bliss in squeaks. 🐇📈

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check — ❓ Inconclusive: The title 'Add TicChromaogram' is related to the changeset but contains a typo ('TicChromaogram' instead of 'TIC Chromatogram') and is incomplete — the PR adds multiple features including the Feature Table, TIC Chromatogram, Sequence view, and enhancements to deconvolution parsing. Consider a more accurate title, such as 'Add TIC Chromatogram and Feature Table components'.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bf00bd and 3a727dd.

📒 Files selected for processing (2)
  • .gitignore (1 hunks)
  • openms-streamlit-vue-component (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
  • openms-streamlit-vue-component
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-full-app
  • GitHub Check: build-openms



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
src/parse/deconv.py (1)

276-283: Consider vectorized alternative for MS2 array creation.

The map_elements approach with a lambda function works correctly but may be slower for large datasets compared to vectorized operations.

If performance becomes a concern, consider a genuinely vectorized alternative built on Expr.repeat_by, which expands a scalar into a per-row list without a Python callback:

             pl.when(pl.col('MSLevel') == 2)
               .then(
-                  pl.col('num_masses').map_elements(
-                      lambda n: [-1] * n,
-                      return_dtype=pl.List(pl.Int64)
-                  )
+                  pl.lit(-1, dtype=pl.Int64).repeat_by(pl.col('num_masses'))
               )
               .otherwise(pl.col('FeatureIndices'))

However, the current implementation is correct and is likely sufficient unless profiling shows this step to be a bottleneck.
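Whichever Polars expression is chosen, the intended row-wise semantics are simple and can be checked outside Polars. A plain-Python sketch (field names mirror the snippet above; this is not the PR's code): MS2 rows get a placeholder list of -1s sized by num_masses, and all other rows keep their existing FeatureIndices list.

```python
def feature_indices_column(rows):
    """Emit the FeatureIndices column: [-1] * num_masses for MS2 rows,
    the existing FeatureIndices list otherwise."""
    return [
        [-1] * row["num_masses"] if row["MSLevel"] == 2 else row["FeatureIndices"]
        for row in rows
    ]

rows = [
    {"MSLevel": 1, "num_masses": 2, "FeatureIndices": [4, 7]},
    {"MSLevel": 2, "num_masses": 3, "FeatureIndices": None},
]
cols = feature_indices_column(rows)
```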

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68a7615 and aee5eb7.

📒 Files selected for processing (6)
  • content/FLASHDeconv/FLASHDeconvLayoutManager.py (2 hunks)
  • openms-streamlit-vue-component (1 hunks)
  • src/parse/deconv.py (5 hunks)
  • src/render/components.py (3 hunks)
  • src/render/initialize.py (3 hunks)
  • src/render/update.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • openms-streamlit-vue-component
🚧 Files skipped from review as they are similar to previous changes (1)
  • content/FLASHDeconv/FLASHDeconvLayoutManager.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/render/initialize.py (2)
src/render/components.py (4)
  • FDRPlotly (72-75)
  • FLASHQuant (101-104)
  • Chromatogram (107-110)
  • Tabulator (48-64)
src/workflow/FileManager.py (1)
  • get_results (428-478)
src/parse/deconv.py (1)
src/workflow/FileManager.py (2)
  • store_data (327-345)
  • get_results (428-478)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-openms
  • GitHub Check: build-full-app
🔇 Additional comments (8)
src/render/initialize.py (3)

6-6: LGTM!

The Chromatogram import and feature data wiring follow the existing patterns. The use_polars=True flag is correctly used to load the LazyFrame stored in deconv.py.

Also applies to: 36-41


182-185: LGTM!

The TIC chromatogram component initialization follows the established pattern and correctly wires the data stored in deconv.py.


186-189: LGTM!

The feature table component initialization is consistent with other Tabulator components and correctly references the data stored in deconv.py.

src/render/components.py (3)

62-64: LGTM!

The FeatureTable addition to Tabulator follows the established pattern and naming conventions.


107-110: LGTM!

The Chromatogram component class follows the established pattern for component definitions.


9-9: No action required. The _RELEASE = False setting is intentional and correct for development mode. The comments on lines 7-8 clearly document this pattern: development uses localhost:5173 (the local dev server), while production uses the built distribution from js-component/dist. This is the proper state for an active development branch.

Likely an incorrect or invalid review comment.

src/parse/deconv.py (2)

81-104: LGTM!

The feature table aggregation logic correctly computes per-feature statistics including apex detection, retention time ranges, and intensity summaries.


154-175: LGTM!

The TIC table construction correctly aggregates total ion current across MS levels and maintains chronological order.
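A minimal stdlib sketch of that aggregation (the real implementation uses Polars; the function name and tuple layout here are illustrative): sum intensities per (MS level, retention time), then emit each level's points in chronological order.

```python
from collections import defaultdict

def build_tic(points):
    """points: iterable of (ms_level, retention_time, intensity).
    Returns {ms_level: [(rt, total_intensity), ...]} sorted by rt."""
    acc = defaultdict(lambda: defaultdict(float))
    for level, rt, intensity in points:
        acc[level][rt] += intensity  # total ion current at this rt
    return {level: sorted(by_rt.items()) for level, by_rt in acc.items()}

tic = build_tic([
    (1, 10.0, 100.0),
    (1, 10.0, 50.0),
    (1, 12.0, 20.0),
    (2, 11.0, 5.0),
])
```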

  def parseDeconv(
      file_manager, dataset_id, out_deconv_mzML, anno_annotated_mzML,
-     spec1_tsv=None, spec2_tsv=None, logger=None
+     spec1_tsv, spec2_tsv=None, logger=None
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

Handle optional spec2_tsv parameter.

While spec1_tsv is now required, spec2_tsv remains optional (default None). However, Lines 33-38 attempt to read spec2_tsv without checking if it's None, which will cause a crash.

Apply this diff to handle the optional parameter:

     spec1_df.loc[:,'Level'] = 1
     file_manager.store_data(dataset_id, 'spec1_df', spec1_df)
-    spec2_df = pd.read_csv(
-        spec2_tsv, sep='\t', usecols=[
-            'FeatureIndex', 'MonoisotopicMass', 'SumIntensity', 'RetentionTime', 
-            'ScanNum'
-        ]
-    )
-    spec2_df.loc[:,'Level'] = 2
-    file_manager.store_data(dataset_id, 'spec2_df', spec2_df)
-    del spec1_df
-    del spec2_df
+    if spec2_tsv is not None:
+        spec2_df = pd.read_csv(
+            spec2_tsv, sep='\t', usecols=[
+                'FeatureIndex', 'MonoisotopicMass', 'SumIntensity', 'RetentionTime', 
+                'ScanNum'
+            ]
+        )
+        spec2_df.loc[:,'Level'] = 2
+        file_manager.store_data(dataset_id, 'spec2_df', spec2_df)
+        del spec2_df
+    else:
+        # Store empty dataframe to maintain consistent interface
+        spec2_df = pd.DataFrame(columns=['FeatureIndex', 'MonoisotopicMass', 'SumIntensity', 'RetentionTime', 'ScanNum', 'Level'])
+        file_manager.store_data(dataset_id, 'spec2_df', spec2_df)
+        del spec2_df
+    del spec1_df

Also applies to: 33-40

🤖 Prompt for AI Agents
In src/parse/deconv.py around lines 11 and 33-40, the function signature left
spec2_tsv defaulting to None but the code later unconditionally attempts to read
and process spec2_tsv, which will crash when None; update the code to check if
spec2_tsv is not None before attempting to open/read it and only build related
variables (e.g., spec2_df/spec2_spectra) when provided, otherwise set those
variables to None or empty structures and skip any downstream processing that
requires spec2 data; ensure any logging reflects whether spec2 was supplied.
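The guard being requested follows a common pattern: read the optional input only when a path is provided, otherwise fall back to an empty structure with the same shape so downstream concatenation still works. A stdlib-only sketch (the `read_rows` callable stands in for pd.read_csv, and the helper name is hypothetical):

```python
def load_spec2(spec2_tsv, read_rows):
    """Return spec2 rows tagged Level=2, or an empty same-shaped
    result when no MS2 TSV was supplied."""
    if spec2_tsv is None:
        return []  # consistent (empty) interface for downstream concat
    rows = read_rows(spec2_tsv)
    for row in rows:
        row["Level"] = 2
    return rows

fake_reader = lambda path: [{"FeatureIndex": 0, "ScanNum": 1}]
with_ms2 = load_spec2("spec2.tsv", fake_reader)
without_ms2 = load_spec2(None, fake_reader)
```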

Comment on lines +44 to +79
    features = file_manager.get_results(
        dataset_id, ['spec1_df', 'spec2_df'], use_polars=True
    )
    # Build the base once
    base = pl.concat([features["spec1_df"], features["spec2_df"]])

    # Sort first so indices reflect first appearance order in the data
    sorted_base = base.sort("RetentionTime")

    # Create a ScanNum -> ScanIndex mapping in order of first occurrence
    scan_index_map = (
        sorted_base
        .select("ScanNum")
        .unique(maintain_order=True)
        .with_row_count("ScanIndex")
    )

    # Build dataframe
    features = (
        sorted_base
        # needed for MassIndex; global index after sort
        .with_row_count("RowID")
        .with_columns(
            # per-ScanNum 0-based MassIndex using RowID
            (pl.col("RowID") - pl.col("RowID").min().over("ScanNum")).alias("MassIndex"),
            # Retention time in seconds to comply with other data structures
            (pl.col("RetentionTime") * 60).alias("RetentionTime"),
        )
        # Attach scan index
        .join(scan_index_map, on="ScanNum", how="left")
        # For now we only consider features at MS1 level
        .filter(pl.col("Level") == 1)
        # Drop helper columns
        .drop(["Level", "RowID"])
    )
    file_manager.store_data(dataset_id, 'feature_dfs', features)
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg "spec2_df" --type py -C 3

Repository: OpenMS/FLASHApp

Length of output: 174


🏁 Script executed:

rg "spec2_tsv" --type py -C 3

Repository: OpenMS/FLASHApp

Length of output: 6886


🏁 Script executed:

rg "file_manager.get_results\|file_manager.store_data" --type py -C 2 -A 2

Repository: OpenMS/FLASHApp

Length of output: 41


🏁 Script executed:

cat -n src/parse/deconv.py | head -100

Repository: OpenMS/FLASHApp

Length of output: 4566


🏁 Script executed:

rg "store_data.*spec2_df|spec2_df.*store" --type py -C 3

Repository: OpenMS/FLASHApp

Length of output: 347


spec2_df unconditional read will crash when spec2_tsv is None.

Lines 33–40 read spec2_tsv without checking if it's None. When called from Workflow.py with spec2_tsv=None (when MS2 data is absent), pd.read_csv(None, ...) will raise a TypeError. Add a conditional check: wrap lines 33–40 in if spec2_tsv is not None: before attempting to read and store spec2_df.

🤖 Prompt for AI Agents
In src/parse/deconv.py around lines 44 to 79, the code unconditionally reads and
stores spec2_df which will crash when spec2_tsv is None; wrap the block that
reads/creates/stores spec2_df (the lines that call pd.read_csv and
file_manager.store_data for spec2) in an if spec2_tsv is not None: guard so you
only attempt to read and store spec2_df when a path is provided; ensure
downstream code that concatenates features handles the missing spec2_df (e.g.,
only include spec2_df in pl.concat if it exists or create an empty DataFrame
placeholder) so concatenation and subsequent operations do not fail.

            )
            mass_row = feature_info.collect(streaming=True)
            if mass_row.height == 0:
                data['feature_data'] = pd.DataFrame()
Copy link
Contributor


⚠️ Potential issue | 🟡 Minor

Fix type inconsistency in empty DataFrame assignments.

The empty cases assign pd.DataFrame() (Pandas), but Line 205 collects a Polars DataFrame with collect(streaming=True). This creates type inconsistency for data['feature_data'].

Apply this diff to use Polars consistently:

             mass_row = feature_info.collect(streaming=True)
             if mass_row.height == 0:
-                data['feature_data'] = pd.DataFrame()
+                data['feature_data'] = pl.DataFrame()
             else:
                 idx = mass_row.row(0, named=True)['FeatureIndex']
                 if idx is None:
-                    data['feature_data'] = pd.DataFrame()
+                    data['feature_data'] = pl.DataFrame()
                 else:
                     feature_data = (
                         feature_data
                         .filter(pl.col("FeatureIndex") == idx)
                         .sort("RetentionTime")
                     )
                     data['feature_data'] = feature_data.collect(streaming=True)
         else:
-            data['feature_data'] = pd.DataFrame()
+            data['feature_data'] = pl.DataFrame()

Also applies to: 198-198, 207-207

🤖 Prompt for AI Agents
In src/render/update.py around lines 194, 198 and 207, empty cases assign pandas
DataFrame (pd.DataFrame()) which conflicts with later code that expects a Polars
DataFrame collected via collect(streaming=True); replace those pd.DataFrame()
assignments with an equivalent empty polars DataFrame (e.g., pl.DataFrame()) so
the variable type is consistently a Polars DataFrame throughout; ensure the
import for polars (pl) is available at the top of the file if not already.

- Send feature_table and feature_dfs data to TIC chromatogram component
- Vue component displays selected feature as red line trace
- Both TIC and feature traces share same y-axis scale

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/render/initialize.py (2)

182-202: Consider using use_polars=True for consistency and simplify type handling.

Two observations:

  1. Inconsistency: Line 183 doesn't specify use_polars=True, so feature_dfs defaults to pandas. However, ms1_deconv_heat_map (line 39) fetches the same feature_dfs with use_polars=True. This makes the LazyFrame check (lines 190-192) and Polars DataFrame check (lines 193-195) dead code in the current implementation.

  2. Type checking: Using isinstance is more explicit and maintainable than duck-typing with hasattr.

If you want to use Polars consistently:

-        data = file_manager.get_results(selected_data,  ['tic', 'feature_table', 'feature_dfs'])
+        data = file_manager.get_results(selected_data,  ['tic', 'feature_table', 'feature_dfs'], use_polars=True)
         data_to_send['tic'] = data['tic']
         data_to_send['feature_table'] = data.get('feature_table')
         # feature_dfs contains per-scan intensity data for each feature
         feature_dfs = data.get('feature_dfs')
         if feature_dfs is not None:
             # Convert DataFrame to list of dicts for JSON serialization
-            if hasattr(feature_dfs, 'collect'):
+            if isinstance(feature_dfs, pl.LazyFrame):
                 # It's a Polars LazyFrame
                 df = feature_dfs.collect()
-            elif hasattr(feature_dfs, 'to_dicts'):
+            elif isinstance(feature_dfs, pl.DataFrame):
                 # It's a Polars DataFrame
                 df = feature_dfs
             else:

36-41: Inconsistent handling of Polars LazyFrame across branches.

The ms1_deconv_heat_map branch (lines 36-41) passes the feature_dfs LazyFrame to data_to_send without collecting, while the tic_chromatogram branch (lines 190-201) collects and converts it to dicts immediately. This inconsistency makes the code harder to follow. Consider collecting and converting to dicts here for consistency:

feature_data = file_manager.get_results(
    selected_data, ['feature_dfs'], use_polars=True
)['feature_dfs']
if hasattr(feature_data, 'collect'):
    feature_data = feature_data.collect().to_dicts()
data_to_send['feature_data'] = feature_data
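Both nitpicks point toward one shared normalization helper: explicit isinstance checks instead of hasattr duck-typing, used by every branch that hands frame data to the frontend. A testable stdlib sketch with stand-in classes in place of pl.LazyFrame / pl.DataFrame (in the real code the isinstance targets would be the Polars types):

```python
class LazyFrame:   # stand-in for pl.LazyFrame
    def __init__(self, rows): self._rows = rows
    def collect(self): return DataFrame(self._rows)

class DataFrame:   # stand-in for pl.DataFrame
    def __init__(self, rows): self._rows = rows
    def to_dicts(self): return list(self._rows)

def to_records(obj):
    """Normalize LazyFrame / DataFrame / plain-sequence input into a
    JSON-serializable list of dicts via explicit isinstance checks."""
    if isinstance(obj, LazyFrame):
        obj = obj.collect()          # materialize lazily-built frame
    if isinstance(obj, DataFrame):
        return obj.to_dicts()
    return list(obj)                 # already record-like

records = to_records(LazyFrame([{"rt": 1.0}]))
```

Routing both the ms1_deconv_heat_map and tic_chromatogram branches through such a helper would also resolve the consistency concern in the second comment.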
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aee5eb7 and 0bf00bd.

📒 Files selected for processing (2)
  • openms-streamlit-vue-component (1 hunks)
  • src/render/initialize.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • openms-streamlit-vue-component
🧰 Additional context used
🧬 Code graph analysis (1)
src/render/initialize.py (2)
src/render/components.py (1)
  • Chromatogram (107-110)
src/workflow/FileManager.py (1)
  • get_results (428-478)
🔇 Additional comments (2)
src/render/initialize.py (2)

6-6: LGTM!

Import is valid and used in the new tic_chromatogram branch.


203-206: LGTM!

Follows the established pattern for table components.

- Clicking on the red feature trace line automatically sets the integration
  bounds to the feature's RTStart/RTEnd and calculates both the TIC and
  feature areas.
- Feature trace and integration are MS1-only features. When the MS2 filter
  is active: hide the feature trace, disable the toggle button, and clear any
  existing integration.
