Utility web app to analyze & compare files (images and videos) logged to Neptune #32

LeoRoccoBreedt · 2025-10-07T17:01:07Z

Description

Include a summary of the changes and the related issue.

Related to: <ClickUp/JIRA task name>

Any expected test failures?

Add a [X] to relevant checklist items

❔ This change

adds a new feature
fixes breaking code
is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

Refactored code (sourcery)
Tested code locally
Precommit installed and run before pushing changes
Added code to GitHub tests (notebooks, scripts)
Updated GitHub README
Updated the projects overview page on Notion

🧪 Test Configuration

OS: Windows 11
Python version: 3.12
Neptune version: 0.27
Affected libraries with version: streamlit

Summary by Sourcery

Introduce a new Streamlit-based File Comparison App for downloading, visualizing, and comparing image and video series logged in Neptune experiments, and update documentation to list the new visualization tool.

New Features:

Add a Streamlit application to fetch and display media files from Neptune experiments with interactive gallery, filtering, and pagination controls
Implement support for image and video media types with flexible layout options (step-based and experiment-based columns)

Build:

Add requirements.txt and .streamlit configuration for the new file comparison app

Documentation:

Update main README to include a Visualization tools section listing the File Comparer
Add a dedicated README for the Neptune File Series Comparison App

…iments

sourcery-ai · 2025-10-07T17:01:14Z

Reviewer's Guide

This PR introduces a new Streamlit-based utility for downloading, visualizing, and comparing media files (images and videos) from local folders or Neptune experiments, and updates the project documentation to include this new tool.

File-Level Changes

Change	Details	Files
Add Streamlit-based file comparison utility	Implement interactive app with Neptune integration, file discovery, download, and caching; Build gallery UI with experiment filters, pagination, orientation controls, and media previews; Extract file metadata, support image/video display, and step-based comparison grid; Cache download results and manage state via Streamlit session; Define app dependencies and theme configuration	`utils/visualization_tools/file_comparison_app/file_comparison_app.py` `utils/visualization_tools/file_comparison_app/README.md` `utils/visualization_tools/file_comparison_app/requirements.txt` `utils/visualization_tools/file_comparison_app/.streamlit/config.toml`
Update project README to include new visualization tool	Reformat existing tutorial tables for alignment and clarity; Add a new Visualization tools section with File Comparer entry; Adjust link references for the file comparison app	`README.md`

utils/visualization_tools/file_analyzer/file_analyzer_app.py

… input, read env variables

utils/visualization_tools/file_comparer/file_comparer_app.py

utils/visualization_tools/file_comparison_app/file_comparison_app.py

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

The Streamlit app file is very large and mixes UI, data fetching, and utilities—consider refactoring into separate modules (e.g., data layer, UI components) to improve maintainability and readability.
The download_neptune_files function currently pulls all file types then filters locally; adding an extension or attribute filter before download could reduce bandwidth and speed up processing.
Rendering large galleries and downloading many files may block the UI—consider adding explicit progress indicators or lazy loading to improve responsiveness on big datasets.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The Streamlit app file is very large and mixes UI, data fetching, and utilities—consider refactoring into separate modules (e.g., data layer, UI components) to improve maintainability and readability.
- The download_neptune_files function currently pulls all file types then filters locally; adding an extension or attribute filter before download could reduce bandwidth and speed up processing.
- Rendering large galleries and downloading many files may block the UI—consider adding explicit progress indicators or lazy loading to improve responsiveness on big datasets.

## Individual Comments

### Comment 1
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:179-188` </location>
<code_context>
+    with st.sidebar.expander("Neptune Configuration", icon=":material/settings:", expanded=True):
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Session state is updated for neptune_project but not for neptune_api_token.

Please update st.session_state.neptune_api_token when the user enters a new token to ensure consistency.
</issue_to_address>
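
One way to keep the token and the project consistent is to route both through the same small helper. The sketch below is only an illustration: `resolve_setting` is a hypothetical helper that does not exist in the PR, and a plain `dict` stands in for `st.session_state` so the snippet runs without Streamlit.

```python
import os
from typing import MutableMapping, Optional


def resolve_setting(
    state: MutableMapping[str, object],
    key: str,
    env_var: str,
    user_input: Optional[str],
) -> Optional[str]:
    """Return the effective value for a setting and persist it in `state`.

    Precedence: fresh user input, then the stored value, then the environment.
    """
    value = user_input or state.get(key) or os.getenv(env_var)
    if value:
        # Keep the stored value in sync, mirroring what the app does for the project
        state[key] = value
    return value


# A plain dict stands in for st.session_state (assumption for this sketch)
session = {}
token = resolve_setting(
    session, "neptune_api_token", "NEPTUNE_API_TOKEN", "token-from-input"
)
```

In the app, the same call could be made for `neptune_project` so that both fields follow one code path.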

### Comment 2
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:245-252` </location>
<code_context>
+            experiment_regex_valid = True
+
+        # Attribute regex
+        attribute_regex = st.text_input(
+            "Attribute Regex",
+            value=st.session_state.get("attribute_regex"),
</code_context>

<issue_to_address>
**suggestion:** Default value for attribute_regex may be None, which could cause issues.

If the value is None, the text input will display None instead of a usable default. Use an empty string or a default pattern to improve user experience.

```suggestion
        # Attribute regex
        attribute_regex = st.text_input(
            "Attribute Regex",
            value=st.session_state.get("attribute_regex") or "",
            help="Regex pattern to match file attribute names. Defaults to `None` (all attributes)",
            placeholder="image_.*",
            icon=":material/search:",
        )
```
</issue_to_address>

### Comment 3
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:267-270` </location>
<code_context>
+            "Download and Visualize", icon=":material/download:", width="stretch", type="primary"
+        ):
+            # Check if experiment regex is valid before proceeding
+            if not experiment_regex_valid or not experiment_regex or not experiment_regex.strip():
+                st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
+                st.stop()
</code_context>

<issue_to_address>
**suggestion:** Redundant experiment regex validation logic.

Simplify the conditional by relying solely on experiment_regex_valid, as it already covers regex validity.

```suggestion
            # Check if experiment regex is valid before proceeding
            if not experiment_regex_valid:
                st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
                st.stop()
```
</issue_to_address>
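
Relying on a single flag is safest when that flag is computed in one place. A minimal sketch of that idea follows; `validate_experiment_regex` is a hypothetical helper name, and the `re.compile` check is an extra step that the PR code shown here does not perform.

```python
import re
from typing import Optional, Tuple


def validate_experiment_regex(pattern: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error_message) for a user-supplied experiment regex."""
    # Empty or whitespace-only input is rejected, as in the existing sidebar check
    if not pattern or not pattern.strip():
        return False, "Experiment regex is required."
    # Additional check (assumption, not in the PR): make sure the pattern compiles
    try:
        re.compile(pattern)
    except re.error as exc:
        return False, f"Invalid regex: {exc}"
    return True, None


experiment_regex_valid, error_message = validate_experiment_regex("exp_.*")
```

The button handler would then only need `if not experiment_regex_valid:`, with `error_message` passed to `st.error`.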

### Comment 4
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:316-327` </location>
<code_context>
+
+        # Filter files by media type and regex patterns
+        filtered_files = []
+        for file_info in st.session_state.files:
+            # Check if it's a media file
+            if file_info.get("is_media", False):
</code_context>

<issue_to_address>
**suggestion:** Filtering logic may skip files with missing or malformed relative_path.

Add a check to handle cases where relative_path is missing or malformed to prevent errors and ensure valid files are not skipped.

```suggestion
        filtered_files = []
        for file_info in st.session_state.files:
            # Check if it's a media file
            if file_info.get("is_media", False):
                # Check for valid relative_path
                relative_path = file_info.get("relative_path")
                if (
                    isinstance(relative_path, str)
                    and relative_path.strip() != ""
                ):
                    try:
                        path_parts = Path(relative_path).parts
                        if len(path_parts) > 1:
                            experiment_name = path_parts[0]
                            if re.search(experiment_pattern, experiment_name) and re.search(
                                attribute_pattern, file_info["name"]
                            ):
                                filtered_files.append(file_info)
                    except Exception:
                        # Optionally log or handle malformed path
                        continue
                else:
                    # Optionally log or handle missing/invalid relative_path
                    continue
```
</issue_to_address>

### Comment 5
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:512-514` </location>
<code_context>
+            # Pagination controls
+            col1, col2, col3, col4, col5 = st.columns([1, 1, 2, 1, 1])
+
+            # Calculate current column range
+            current_column_index = st.session_state.current_column_index
+            current_columns = column_items[
+                current_column_index : current_column_index + columns_per_page
+            ]
</code_context>

<issue_to_address>
**issue:** Pagination logic may result in empty current_columns if index is out of bounds.

Add a check to ensure current_column_index does not exceed the length of column_items to prevent empty current_columns and potential UI issues.
</issue_to_address>
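
A possible shape for that bounds check, written as a pure function so it can be tested outside Streamlit. `clamp_column_index` is a hypothetical name; the app would call it before slicing `column_items`.

```python
def clamp_column_index(index: int, total_columns: int, columns_per_page: int) -> int:
    """Clamp a page start index so the resulting slice is never empty."""
    if total_columns <= 0:
        return 0
    per_page = max(1, columns_per_page)
    # Start index of the last page that still contains at least one column
    last_page_start = ((total_columns - 1) // per_page) * per_page
    return max(0, min(index, last_page_start))


column_items = ["step_0", "step_1", "step_2", "step_3", "step_4", "step_5", "step_6"]
columns_per_page = 3

# A stale index (for example after the user narrows the filters) is pulled back in range
start = clamp_column_index(12, len(column_items), columns_per_page)
current_columns = column_items[start : start + columns_per_page]
```

This matters most when `columns_per_page` or the filtered experiment list changes between reruns while `st.session_state.current_column_index` keeps its old value.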

### Comment 6
<location> `utils/visualization_tools/file_comparison_app/README.md:66` </location>
<code_context>
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and limitations under the License.
+
+[Github issues]: https://github.com/neptune-ai/scale-examples/issues/new
+[Support center]: https://support.neptune.ai/
</code_context>

<issue_to_address>
**suggestion (typo):** Typo: 'Github' should be 'GitHub' for consistency.

Please update 'Github issues' to 'GitHub issues' for consistent spelling.

```suggestion
[GitHub issues]: https://github.com/neptune-ai/scale-examples/issues/new
```
</issue_to_address>

### Comment 7
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:1` </location>
<code_context>
+import os
+import re
+from pathlib import Path
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the code into focused modules for utilities, Neptune downloading, and gallery rendering to reduce duplication and nested branches.

Here are a few low-risk refactorings that will shave this 1,700-line file down into focused modules, remove almost all duplication and nested branches, and keep every feature exactly as-is.

1) Extract all your simple file utilities into `utils.py`  
```python
# utils.py
import os
from pathlib import Path
import pandas as pd
from typing import List, Dict, Any

SUPPORTED_IMG = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}
SUPPORTED_VID = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def get_file_size_mb(path: str) -> float:
    try:
        return os.path.getsize(path) / (1024 * 1024)
    except OSError:
        return 0.0


def is_media_file(path: str) -> bool:
    return Path(path).suffix.lower() in SUPPORTED_IMG | SUPPORTED_VID


def create_file_statistics(files: List[Dict[str, Any]]) -> pd.DataFrame:
    if not files:
        return pd.DataFrame()
    df = pd.DataFrame(files)
    df["modified_date"] = pd.to_datetime(df["modified"], unit="s")
    return df
```

2) Move **all** your Neptune download/fetch code into `neptune_downloader.py`  
```python
# neptune_downloader.py
import re
from pathlib import Path
import streamlit as st
import neptune_query as nq
from neptune_query.filters import Filter
from typing import List, Dict, Any, Tuple
from utils import get_file_size_mb, is_media_file

@st.cache_data
def download_neptune_files(
    project_name: str,
    exp_regex: str,
    attr_regex: str,
    download_dir: str,
    include_archived: bool,
) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    if not nq:
        st.error("…neptune‐query missing…")
        return [], {}
    # …everything from listing exps through building downloaded_files & download_info…
    return downloaded_files, download_info
```

3) Collapse the two huge “Steps vs Experiments” branches into one generic renderer in `gallery.py`  
```python
# gallery.py
import re
import streamlit as st
from pathlib import Path
from typing import Dict

def build_folder_step_grid(media_files, folder_toggles):
    grid, all_steps = {}, set()
    for f in media_files:
        parts = Path(f["relative_path"]).parts
        if len(parts) > 1 and folder_toggles.get(parts[0], True):
            # re.search returns None when no step marker is present; fall back to step 0
            match = re.search(r"step[_-]?(\d+)", f["name"] + f["relative_path"])
            step = int(match.group(1)) if match else 0
            grid.setdefault(parts[0], {})[step] = f
            all_steps.add(step)
    return grid, sorted(all_steps), sorted(grid)

def render_grid(grid:Dict, steps, folders, layout, cols_per_page):
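    # pagination + common header/row logic
    # NOTE: get_current_page is not defined in this sketch; it is assumed to
    # return the slice of column keys for the current page
    current = get_current_page(steps if layout=="Steps" else folders, cols_per_page)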
    headers = ["Experiment" if layout=="Steps" else "Step"] + [
        f"{'Step' if layout=='Steps' else ''} {c}" for c in current
    ]
    col_cfg = [15] + [ (100-15)/len(current) ]*len(current)
    header_cols = st.columns(col_cfg)
    for i,h in enumerate(headers): header_cols[i].write(f"**{h}**")

    for row_key in (folders if layout=="Steps" else steps):
        row = st.columns(col_cfg)
        row[0].write(f"**{row_key}**")
        for idx, col_key in enumerate(current):
            cell = grid.get(row_key,{}).get(col_key) if layout=="Steps" else grid.get(col_key,{}).get(row_key)
            with row[idx+1]:
                if cell:
                    if Path(cell["path"]).suffix.lower() in {".mp4"}:
                        st.video(cell["path"])
                    else:
                        st.image(cell["path"], width="stretch")
                else:
                    st.write("—")
```

4) Finally, your `main.py` collapses to ~100 lines:  
```python
# main.py
import streamlit as st
from utils import create_file_statistics, is_media_file
from neptune_downloader import download_neptune_files
from gallery import build_folder_step_grid, render_grid

def main():
    # …sidebar inputs…
    files, info = download_neptune_files(...)
    st.session_state.update(files=files, download_info=info)

    filtered = [f for f in files if is_media_file(f["path"]) and <regex filters>]
    grid, steps, folders = build_folder_step_grid(filtered, folder_toggles)
    stats = create_file_statistics(filtered)
    if not stats.empty:
        render_grid(grid, steps, folders, layout_orientation, images_per_page)
    else:
        st.info("No media files…")

if __name__=="__main__":
    main()
```

By extracting:
- `utils.py` (pure helpers),
- `neptune_downloader.py` (all Neptune I/O),
- `gallery.py` (single renderer with one branch),
you remove almost all duplication, collapse nesting, and keep every feature intact.
</issue_to_address>

### Comment 8
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:95-97` </location>
<code_context>
@st.cache_data()
def download_neptune_files(
    project_name: str,
    experiment_regex: str,
    attribute_regex: str,
    download_dir: str,
    include_archived: bool,
) -> List[Dict[str, Any]]:
    """Download files from Neptune and return file information"""
    if not NEPTUNE_AVAILABLE:
        st.error("Neptune Query is not available. Please install neptune-query package.")
        return []

    try:
        # List experiments
        filter = Filter.name(experiment_regex)
        if not include_archived:
            filter = filter & Filter.eq("sys/archived", False)

        exps = nq.list_experiments(project=project_name, experiments=filter)

        if not exps:
            st.warning(f"No experiments found matching pattern: {experiment_regex}")
            return [], {}

        # Fetch files from experiments using the attribute regex
        files = nq.fetch_series(project=project_name, experiments=exps, attributes=attribute_regex)

        # Create project-specific download directory
        # Use project name as top-level folder to prevent mixing experiments from different projects
        project_download_dir = Path(download_dir) / project_name.replace("/", "_")
        project_download_dir.mkdir(parents=True, exist_ok=True)

        # Download files to project-specific directory
        # TODO: Download only supported file types
        nq.download_files(files=files, destination=str(project_download_dir))

        # Convert to our file format
        downloaded_files = []

        # Scan the project-specific download directory for files
        # Only include files from folders that match the experiment regex
        all_files = []
        for item in project_download_dir.iterdir():
            if item.is_dir() and re.search(experiment_regex, item.name):
                # Add all files from this matching folder
                all_files.extend(item.rglob("*.*"))

        media_count = 0
        for file_path in all_files:
            try:
                file_info = {
                    "name": file_path.name,
                    "path": str(file_path),
                    "relative_path": str(file_path.relative_to(project_download_dir)),
                    "size_mb": get_file_size_mb(str(file_path)),
                    "extension": file_path.suffix.lower(),
                    "is_media": is_media_file(str(file_path)),
                    "is_video": is_video_file(str(file_path)),
                    "modified": file_path.stat().st_mtime,
                }
                downloaded_files.append(file_info)

                if file_info["is_media"]:
                    media_count += 1

            except Exception as e:
                st.warning(f"Error processing file {file_path}: {e}")

        # Store download info for display in expander
        download_info = {
            "project_name": project_name,
            "experiments": exps,
            "attribute_regex": attribute_regex or ".*",
            "files_fetched": len(files),
            "download_dir": str(project_download_dir),
            "total_files": len(all_files),
            "media_files": media_count,
            "total_processed": len(downloaded_files),
        }

        return downloaded_files, download_info

    except Exception as e:
        st.error(f"Error downloading from Neptune: {e}")
        return [], {}

</code_context>

<issue_to_address>
**issue (code-quality):** Don't assign to builtin variable `filter` [×2] ([`avoid-builtin-shadow`](https://docs.sourcery.ai/Reference/Default-Rules/comments/avoid-builtin-shadow/))

<br/><details><summary>Explanation</summary>Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:

```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.

How can you solve this?

Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.
</details>
</issue_to_address>
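
To make the third point concrete, the sketch below shows what happens when `filter` is rebound inside a function and the builtin is needed later in the same scope. It uses a plain string in place of the `Filter` object from the PR so that it runs without `neptune-query`; `experiment_filter` is only a suggested replacement name.

```python
def names_with_shadowing(names):
    filter = "exp_.*"  # rebinds `filter` locally; the builtin is unreachable in this scope
    try:
        return list(filter(None, names))
    except TypeError as exc:
        # Calling the string fails because it is not the builtin function anymore
        return f"TypeError: {exc}"


def names_without_shadowing(names):
    experiment_filter = "exp_.*"  # suggested, more specific name
    # The builtin still works here: drop empty strings from the list
    return experiment_filter, list(filter(None, names))


shadowed_result = names_with_shadowing(["exp_a", "", "exp_b"])
clean_result = names_without_shadowing(["exp_a", "", "exp_b"])
```

In `download_neptune_files`, the same rename would apply to both assignments flagged above.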

### Comment 9
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:438` </location>
<code_context>
def main():
    st.title("Neptune File Comparison App")
    st.text("Visualize and compare media file series across different Neptune experiments")

    # Project configuration in expandable container
    st.sidebar.markdown(f"**Version:** {__version__}")

    if not NEPTUNE_AVAILABLE:
        st.error("Neptune Query not available. Install using `pip install -U neptune-query`")
        st.stop()

    with st.sidebar.expander("Neptune Configuration", icon=":material/settings:", expanded=True):
        # Neptune API token
        _neptune_api_token = st.session_state.get("neptune_api_token") or os.getenv(
            "NEPTUNE_API_TOKEN"
        )
        neptune_api_token = st.text_input(
            "Neptune API Token",
            value=_neptune_api_token,
            placeholder="your_api_token",
            type="password",
            help="Defaults to `NEPTUNE_API_TOKEN` environment variable",
            icon=":material/password:",
        )
        if neptune_api_token:
            os.environ["NEPTUNE_API_TOKEN"] = neptune_api_token

        # Neptune project
        _neptune_project = st.session_state.get("neptune_project") or os.getenv("NEPTUNE_PROJECT")
        neptune_project = st.text_input(
            "Neptune Project",
            value=_neptune_project,
            placeholder="workspace_name/project_name",
            help="In the format `workspace_name/project_name`. Defaults to `NEPTUNE_PROJECT` environment variable.",
            icon=":material/folder:",
        )
        st.session_state.neptune_project = neptune_project

    with st.sidebar.expander("Download Configuration", icon=":material/tune:", expanded=True):
        # Download directory
        download_directory = st.text_input(
            "Download Directory",
            value=st.session_state.get("download_directory", "neptune_downloads"),
            help="Directory to download Neptune files to. Defaults to `neptune_downloads` in the current working directory.",
            icon=":material/folder:",
        )
        st.session_state.download_directory = download_directory

        # Experiment regex (required field)
        # TODO: Support passing a list of experiment names
        experiment_regex = st.text_input(
            "Experiments Regex",
            value=st.session_state.get("experiment_regex", ""),
            help="Regex specifying the experiments names to download from",
            placeholder="exp_.*",
            icon=":material/search:",
        )
        st.session_state.experiment_regex = experiment_regex

        include_archived = st.toggle("Include archived experiments", value=False)

        # Validate experiment regex is valid
        if not experiment_regex or not experiment_regex.strip():
            st.error(
                "Experiment regex is required. Please enter a pattern to match experiment names.",
                icon=":material/warning:",
            )
            experiment_regex_valid = False
        elif experiment_regex.strip() == ".*":
            st.warning(
                "Experiment regex is set to `.*`. This will download all experiments from the project.",
                icon=":material/warning:",
            )
            experiment_regex_valid = True
        else:
            experiment_regex_valid = True

        # Attribute regex
        attribute_regex = st.text_input(
            "Attribute Regex",
            value=st.session_state.get("attribute_regex"),
            help="Regex pattern to match file attribute names. Defaults to `None` (all attributes)",
            placeholder="image_.*",
            icon=":material/search:",
        )
        st.session_state.attribute_regex = attribute_regex

        if st.button(
            "Clear cache",
            icon=":material/delete:",
            width="stretch",
            help="Clear the cache to fetch latest files",
        ):
            st.cache_data.clear()
            st.rerun()

        if st.button(
            "Download and Visualize", icon=":material/download:", width="stretch", type="primary"
        ):
            # Check if experiment regex is valid before proceeding
            if not experiment_regex_valid or not experiment_regex or not experiment_regex.strip():
                st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
                st.stop()

            with st.spinner("Downloading files from Neptune...", show_time=True):
                files, download_info = download_neptune_files(
                    neptune_project,
                    experiment_regex,
                    attribute_regex,
                    download_directory,
                    include_archived,
                )
                st.session_state.files = files
                st.session_state.download_info = download_info
                st.session_state.directory_scanned = True

            # Show success/warning message
            if files:
                st.success(f"Successfully downloaded {len(files)} files", icon=":material/check:")
            else:
                st.warning("No files were downloaded. Check your project name and regex patterns.")

        # Show download details in expander if available
        if "download_info" in st.session_state and st.session_state.download_info:
            with st.sidebar.expander(
                "Download Details", icon=":material/download:", expanded=False
            ):
                info = st.session_state.download_info
                st.write(f"**Project:** {info.get('project_name', 'N/A')}")
                with st.expander(
                    f"Experiments Found: **{len(info['experiments'])}**", icon=":material/science:"
                ):
                    for experiment in info["experiments"]:
                        st.write(experiment)
                st.write(f"**Attribute Regex:** `{info['attribute_regex']}`")
                st.write(f"**Download Directory:** {info['download_dir']}")
                st.write(f"**Total Files:** {info['total_files']}")
                st.write(f"**Files Processed:** {info['total_processed']}")
                st.write(f"**Media Files:** {info['media_files']}")

    # Gallery view options
    if "files" in st.session_state and st.session_state.files:
        # Apply regex filters to get experiments and media files

        experiment_pattern = st.session_state.get("experiment_regex", ".*")
        attribute_pattern = st.session_state.get("attribute_regex", ".*") or ".*"

        # Filter files by media type and regex patterns
        filtered_files = []
        for file_info in st.session_state.files:
            # Check if it's a media file
            if file_info.get("is_media", False):
                # Check experiment regex (folder name)
                path_parts = Path(file_info["relative_path"]).parts
                if len(path_parts) > 1:
                    experiment_name = path_parts[0]
                    if re.search(experiment_pattern, experiment_name) and re.search(
                        attribute_pattern, file_info["name"]
                    ):
                        filtered_files.append(file_info)

        # Get unique experiments from filtered files
        top_level_folders = set()
        for file_info in filtered_files:
            path_parts = Path(file_info["relative_path"]).parts
            if len(path_parts) > 1:
                top_level_folders.add(path_parts[0])

        # Create individual toggles for each experiment
        folder_toggles = {}
        if top_level_folders:
            with st.sidebar.expander(
                "Select experiments to view",
                icon=":material/visibility:",
                expanded=True,
            ):
                # Add select all / deselect all buttons
                col1, col2 = st.columns(2)
                with col1:
                    if st.button("Select All", icon=":material/check_box:", width="stretch"):
                        for folder in top_level_folders:
                            st.session_state[f"folder_toggle_{folder}"] = True
                        st.rerun()

                with col2:
                    if st.button(
                        "Deselect All", icon=":material/check_box_outline_blank:", width="stretch"
                    ):
                        for folder in top_level_folders:
                            st.session_state[f"folder_toggle_{folder}"] = False
                        st.rerun()

                for folder in sorted(top_level_folders):
                    folder_toggles[folder] = st.checkbox(
                        folder,
                        value=True,  # Default to showing all folders
                        key=f"folder_toggle_{folder}",
                    )

        # Gallery layout controls
        st.sidebar.subheader("📄 Gallery Layout")

        # Layout orientation
        layout_orientation = st.sidebar.segmented_control(
            "Column headers", options=["Steps", "Experiments"], default="Steps", width="stretch"
        )

        # Pagination controls
        columns_per_page = st.sidebar.slider(
            "Columns per page",
            min_value=1,
            max_value=10,
            value=5,
            help="Number of columns to show at once",
        )

        # Image sizing controls
        # st.sidebar.subheader("🖼️ Media Display")
        # consistent_sizing = st.sidebar.checkbox(
        #     "Consistent media size",
        #     value=False,
        #     help="Resize all images to the same dimensions for easier comparison",
        # )

        # if consistent_sizing:
        #     image_width = st.sidebar.slider(
        #         "Media width (pixels)", min_value=100, max_value=500, value=200, step=10
        #     )
        #     image_height = st.sidebar.slider(
        #         "Media height (pixels)", min_value=100, max_value=500, value=200, step=10
        #     )
        #     consistent_size = (image_width, image_height)
        # else:
        #     consistent_size = None

        st.session_state.filtered_files = filtered_files
        st.session_state.folder_toggles = folder_toggles
        st.session_state.images_per_page = columns_per_page
        # st.session_state.consistent_size = consistent_size
        st.session_state.layout_orientation = layout_orientation

    # Main content area
    if "files" not in st.session_state or not st.session_state.files:
        st.info(
            "Configure the download and click 'Download and Visualize' to get started",
            icon=":material/arrow_circle_left:",
        )
        return

    # Image comparison gallery

    filtered_df = create_file_statistics(
        st.session_state.get("filtered_files", st.session_state.files)
    )

    if not filtered_df.empty:
        # Grid gallery: rows = experiments, columns = steps
        media_files = [f for f in filtered_df.itertuples() if f.is_media]

        if media_files:
            st.subheader("Comparison Grid")

            # Get folder toggles, pagination settings, and image sizing
            folder_toggles = st.session_state.get("folder_toggles", {})
            columns_per_page = st.session_state.get("images_per_page", 3)
            consistent_size = st.session_state.get("consistent_size", None)
            layout_orientation = st.session_state.get("layout_orientation", "Steps")

            # Extract step number from filename or path
            def extract_step_number(file_info):
                try:
                    # Try to extract step from filename first
                    match = re.search(r"step_(\d+)", file_info.name)
                    if match:
                        return int(match.group(1))

                    # Try to extract step from path (for Neptune downloads)
                    match = re.search(r"step[_-]?(\d+)", file_info.path)
                    if match:
                        return int(match.group(1))

                    # Try to extract step from relative path
                    match = re.search(r"step[_-]?(\d+)", file_info.relative_path)
                    if match:
                        return int(match.group(1))

                    # If no step found, try to extract any number from filename
                    match = re.search(r"(\d+)", file_info.name)
                    if match:
                        return int(match.group(1))

                    return 0
                except:
                    return 0

            # Organize images by folder and step
            folder_step_grid = {}
            all_steps = set()

            for file_info in media_files:
                # Get first level folder from relative path
                path_parts = Path(file_info.relative_path).parts
                if len(path_parts) > 1:  # Only include actual folders, not root files
                    folder_name = path_parts[0]

                    # Only include if folder is enabled
                    if folder_toggles.get(folder_name, True):
                        step_num = extract_step_number(file_info)
                        all_steps.add(step_num)

                        if folder_name not in folder_step_grid:
                            folder_step_grid[folder_name] = {}
                        folder_step_grid[folder_name][step_num] = file_info

            if not folder_step_grid:
                st.info(
                    "No experiments are selected to display. Use the sidebar toggles to select which experiments to show in the gallery.",
                    icon=":material/arrow_circle_left:",
                )
                return

            # Sort steps and folders
            sorted_steps = sorted(all_steps)
            sorted_folders = sorted(folder_step_grid.keys())

            # Calculate column-based navigation
            if layout_orientation == "Steps":
                # When steps are columns, paginate through steps
                total_columns = len(sorted_steps)
                column_items = sorted_steps
                column_type = "steps"
            else:
                # When experiments are columns, paginate through experiments
                total_columns = len(sorted_folders)
                column_items = sorted_folders
                column_type = "experiments"

            # Initialize current column index in session state
            if "current_column_index" not in st.session_state:
                st.session_state.current_column_index = 0

            # Pagination controls
            col1, col2, col3, col4, col5 = st.columns([1, 1, 2, 1, 1])

            # Calculate current column range
            current_column_index = st.session_state.current_column_index
            current_columns = column_items[
                current_column_index : current_column_index + columns_per_page
            ]

            with col1:
                if st.button(
                    "First",
                    disabled=current_column_index == 0,
                    icon=":material/first_page:",
                    width="stretch",
                ):
                    st.session_state.current_column_index = 0
                    st.rerun()

            with col2:
                if st.button(
                    "Previous",
                    disabled=current_column_index == 0,
                    icon=":material/arrow_back:",
                    width="stretch",
                ):
                    # Move back by 1 column
                    st.session_state.current_column_index = max(0, current_column_index - 1)
                    st.rerun()

            with col3:
                if current_columns:
                    first_col = current_columns[0]
                    last_col = current_columns[-1]
                    st.write(
                        f"**{column_type.title()} {first_col} to {last_col}** ({total_columns} total {column_type})"
                    )
                else:
                    st.write(
                        f"**No {column_type} available** ({total_columns} total {column_type})"
                    )

            with col4:
                # Check if we can move forward by 1 column
                can_move_next = current_column_index + 1 + columns_per_page <= total_columns
                if st.button(
                    "Next",
                    disabled=not can_move_next,
                    icon=":material/arrow_forward:",
                    width="stretch",
                ):
                    # Move forward by 1 column
                    st.session_state.current_column_index = min(
                        total_columns - columns_per_page, current_column_index + 1
                    )
                    st.rerun()

            with col5:
                # Check if we're at the last possible position
                is_at_last = current_column_index + columns_per_page >= total_columns
                if st.button(
                    "Last", disabled=is_at_last, icon=":material/last_page:", width="stretch"
                ):
                    # Move to the last possible position
                    st.session_state.current_column_index = max(0, total_columns - columns_per_page)
                    st.rerun()

            # Get current columns based on column index
            current_columns = column_items[
                current_column_index : current_column_index + columns_per_page
            ]

            # Add column scrubber slider
            if column_items:
                # Create slider using actual column values
                current_first_col = current_columns[0] if current_columns else column_items[0]

                # Create slider for column selection using select_slider for discrete values
                selected_col = st.select_slider(
                    f"Jump to {column_type[:-1]}",
                    options=column_items,
                    value=current_first_col,
                    help=f"Use this slider to quickly jump to any {column_type[:-1]} in the series",
                )

                # Update current column index if slider value changed
                if selected_col in column_items:
                    new_index = column_items.index(selected_col)
                    if new_index != current_column_index:
                        st.session_state.current_column_index = new_index
                        st.rerun()

            if layout_orientation == "Steps":
                # Original layout: experiments as rows, steps as columns
                # Calculate optimal column widths with smart size limiting
                if sorted_folders:
                    # Find the longest experiment name
                    max_name_length = max(len(folder) for folder in sorted_folders)
                    # Add padding and convert to relative width (experiment names are typically 10-30 chars)
                    experiment_col_width = min(
                        max(max_name_length * 0.8, 12), 25
                    )  # Between 12 and 25

                    # Calculate available width for step columns
                    available_width = 100 - experiment_col_width

                    # Calculate step column width - each column can be smaller when more columns are added
                    step_col_width = available_width / len(current_columns)

                    # Smart size limiting: prevent any single image from being too large
                    # Reference layout: 4 step columns sharing the full row width
                    reference_experiment_width = 0  # Reference assumes no experiment column
                    reference_available_width = 100 - reference_experiment_width
                    reference_step_width = reference_available_width / 4  # 4 files per page
                    # Cap each step column at the reference width (scale factor of 1)
                    max_step_width = (
                        reference_step_width * 1
                    )  # This is our maximum allowed step width

                    # Apply the size limit
                    if step_col_width > max_step_width:
                        step_col_width = max_step_width

                    # Create column configuration
                    col_config = [experiment_col_width] + [step_col_width] * len(current_columns)
                else:
                    col_config = [1] * (len(current_columns) + 1)

                # Create grid header with step numbers
                header_cols = st.columns(col_config)
                with header_cols[0]:
                    st.write("**Experiment**")
                for idx, step in enumerate(current_columns):
                    with header_cols[idx + 1]:
                        st.write(f"**Step {step}**")

                # Create grid rows (one per experiment)
                for folder_name in sorted_folders:
                    row_cols = st.columns(col_config)

                    # Experiment name in first column
                    with row_cols[0]:
                        st.write(folder_name)

                    # Images for each step in remaining columns
                    for idx, step in enumerate(current_columns):
                        with row_cols[idx + 1]:
                            if step in folder_step_grid[folder_name]:
                                file_info = folder_step_grid[folder_name][step]
                                try:
                                    # Check if it's a video file by extension
                                    file_extension = Path(file_info.path).suffix.lower()
                                    is_video = file_extension in _SUPPORTED_VIDEO_EXTENSIONS

                                    if is_video:
                                        # Display video using st.video
                                        st.video(file_info.path)
                                    else:
                                        # # Display image using PIL
                                        # image = Image.open(file_info.path)

                                        # # Apply consistent sizing if enabled
                                        # if consistent_size:
                                        #     image = image.resize(
                                        #         consistent_size, Image.Resampling.LANCZOS
                                        #     )

                                        # Display image
                                        st.image(file_info.path, width="stretch")
                                except Exception as e:
                                    st.error(f"Error loading {file_info.name}: {e}")
                            else:
                                st.write("—")  # No image for this step

            else:  # Experiments as Columns
                # New layout: steps as rows, experiments as columns
                # Calculate optimal column widths
                if sorted_folders:
                    # Find the longest experiment name
                    max_name_length = max(len(folder) for folder in sorted_folders)
                    # Add padding and convert to relative width
                    experiment_col_width = min(max(max_name_length * 0.8, 12), 25)

                    # Calculate available width for experiment columns
                    available_width = 100 - 15  # Reserve 15% for step labels

                    # Calculate experiment column width
                    experiment_col_width = available_width / len(current_columns)

                    # Apply smart size limiting for experiments too
                    max_experiment_width = 25  # Maximum 25% per experiment
                    if experiment_col_width > max_experiment_width:
                        experiment_col_width = max_experiment_width

                    # Create column configuration
                    col_config = [15] + [experiment_col_width] * len(current_columns)
                else:
                    col_config = [1] * (len(current_columns) + 1)

                # Create grid header with experiment names
                header_cols = st.columns(col_config)
                with header_cols[0]:
                    st.write("**Step**")
                for idx, folder_name in enumerate(current_columns):
                    with header_cols[idx + 1]:
                        st.write(f"**{folder_name}**")

                # Create grid rows (one per step)
                for step in sorted_steps:
                    row_cols = st.columns(col_config)

                    # Step number in first column
                    with row_cols[0]:
                        st.write(f"**{step}**")

                    # Images for each experiment in remaining columns
                    for idx, folder_name in enumerate(current_columns):
                        with row_cols[idx + 1]:
                            if step in folder_step_grid[folder_name]:
                                file_info = folder_step_grid[folder_name][step]
                                try:
                                    # Check if it's a video file by extension
                                    file_extension = Path(file_info.path).suffix.lower()
                                    is_video = file_extension in _SUPPORTED_VIDEO_EXTENSIONS

                                    if is_video:
                                        # Display video using st.video
                                        st.video(file_info.path)
                                    else:
                                        # Display image using PIL
                                        # image = Image.open(file_info.path)

                                        # # Apply consistent sizing if enabled
                                        # if consistent_size:
                                        #     image = Image.open(file_info.path).resize(
                                        #         consistent_size, Image.Resampling.LANCZOS
                                        #     )

                                        # Display image
                                        st.image(file_info.path, width="stretch")
                                except Exception as e:
                                    st.error(f"Error loading {file_info.name}: {e}")
                            else:
                                st.write("—")  # No image for this step

            # Show column range info
            if current_columns:
                st.info(f"Showing {column_type} {current_columns[0]} to {current_columns[-1]}")

        else:
            st.info("No media files found matching the current filters")

    else:
        st.warning("No media files found", icon=":material/info:")

</code_context>
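
The pagination arithmetic in the code context above (a window of `columns_per_page` items that moves one column at a time) can be separated from the Streamlit widgets and checked on its own. The following is a minimal sketch, not the code in this PR; the helper names `window_bounds` and `can_move_next` are hypothetical.

```python
def window_bounds(index: int, per_page: int, total: int) -> tuple[int, int]:
    """Clamp a window start so the slice [start:end] stays inside 0..total."""
    last_start = max(0, total - per_page)  # right-most start that still fills a page
    start = min(max(0, index), last_start)
    return start, min(total, start + per_page)


def can_move_next(index: int, per_page: int, total: int) -> bool:
    """Mirror the reviewed check: a full page still fits after moving right by one."""
    return index + 1 + per_page <= total


steps = [0, 10, 20, 30, 40, 50, 60]
start, end = window_bounds(index=5, per_page=5, total=len(steps))
visible = steps[start:end]  # the last full page of steps
```

Note one deliberate difference: in the reviewed code the "Jump to" slider can place the window start on any item, including the last one, which shows fewer columns than `columns_per_page`. This sketch always clamps to a full page where one exists.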

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
- Use `except Exception:` rather than bare `except:` ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Replace m.group(x) with m[x] for re.Match objects ([`use-getitem-for-re-match-groups`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/use-getitem-for-re-match-groups/))
</issue_to_address>
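
As a rough illustration of the last two suggestions (and not the change that was actually merged), the step-extraction helper from the code context could be written along these lines. `FileInfo` is a hypothetical stand-in for the rows produced by `filtered_df.itertuples()`; only the `name`, `path`, and `relative_path` fields are assumed.

```python
import re
from typing import NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical stand-in for a row from filtered_df.itertuples()."""
    name: str
    path: str
    relative_path: str


# (attribute, pattern) pairs, tried in the same order as the reviewed code.
_STEP_PATTERNS = (
    ("name", r"step_(\d+)"),
    ("path", r"step[_-]?(\d+)"),
    ("relative_path", r"step[_-]?(\d+)"),
    ("name", r"(\d+)"),  # fallback: any number in the filename
)


def extract_step_number(file_info) -> int:
    """Return the first step number found, or 0 if none is found."""
    for attr, pattern in _STEP_PATTERNS:
        try:
            match = re.search(pattern, getattr(file_info, attr))
        except Exception:  # named exception class instead of a bare except
            return 0
        if match:
            return int(match[1])  # m[1] instead of m.group(1)
    return 0
```

Looping over the patterns keeps the fallback order visible in one place, and `except Exception:` no longer swallows `KeyboardInterrupt` or `SystemExit` the way a bare `except:` does.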


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


LeoRoccoBreedt added 6 commits October 6, 2025 19:55

feat: utility app to fetch and visualize files across steps and experiments

16d9207

chore: add readme with changelog

420285f

small updates to UI

fdb2058

add functionality to toggle between steps as rows or colums

1a833f0

refactor: default location storage to project_workspace

4e5c568

add support for video files using steamlits native video support

b5b215c

chore: update readme

3edf0c7

SiddhantSadangi assigned LeoRoccoBreedt Oct 8, 2025

SiddhantSadangi added the enhancement New feature or request label Oct 8, 2025

LeoRoccoBreedt added 10 commits October 8, 2025 13:34

chore: update favicon

d8cb859

chore: address some comments

be98514

chore: removed unused function for getting emoji

9e783c1

chore: removed unused code for selecting emoji

ea71818

chore: remove code for displaying an image preview

296f8c0

chore: removed unused code and imports and fixed indentation errors

127aa80

refactor: remove redundant experiment filtering

20acd61

add File configuration into its own dropdown

80cb8be

refactor: media displau copy

2acbadf

refactor: use a toggle switch for the gallery orientation

9494318

github-advanced-security bot found potential problems Oct 8, 2025

utils/visualization_tools/file_analyzer/file_analyzer_app.py (fixed)

utils/visualization_tools/file_analyzer/file_analyzer_app.py (fixed)

LeoRoccoBreedt and others added 2 commits October 8, 2025 14:44

refactor: code change suggestions from cursor

5b4229e

WIP: bump versions, rename utility, update repo README, add API token input, read env variables

1c5bb71

github-advanced-security bot found potential problems Oct 8, 2025

utils/visualization_tools/file_comparer/file_comparer_app.py (fixed)

utils/visualization_tools/file_comparer/file_comparer_app.py (fixed)

LeoRoccoBreedt added 6 commits October 9, 2025 10:37

update regex pattern matching logic for substrings

62e8a1e

add logic to ensure experiment regex is complete

12d3aeb

add caching for stremalit to assist downloading many files

acba100

fix: deprecation warning

21d96e5

add app favicon

0a0284b

remove: old naming of app

4810967

updates for release

2f6738c

SiddhantSadangi marked this pull request as ready for review October 9, 2025 15:51

github-advanced-security bot found potential problems Oct 9, 2025

utils/visualization_tools/file_comparison_app/file_comparison_app.py (dismissed)

utils/visualization_tools/file_comparison_app/file_comparison_app.py (dismissed)

sourcery-ai bot reviewed Oct 9, 2025

This comment was marked as outdated.

chatgpt-codex-connector bot reviewed Oct 9, 2025

utils/visualization_tools/file_comparison_app/file_comparison_app.py (resolved)

update: add file types to .gitignore and improve error handling in file comparison app

1ff5cd1

SiddhantSadangi requested review from a team October 9, 2025 16:10

normandy7 reviewed Oct 10, 2025


LeoRoccoBreedt and others added 4 commits October 10, 2025 09:44

Update utils/visualization_tools/file_comparison_app/README.md

e185161

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Signed-off-by: Leo Breedt <[email protected]>

Update README.md

9441403

Co-authored-by: Sabine Ståhlberg <[email protected]>
Signed-off-by: Leo Breedt <[email protected]>

Apply suggestions from code review

a904a14

Co-authored-by: Sabine Ståhlberg <[email protected]>
Signed-off-by: Leo Breedt <[email protected]>

Update README.md and overflow menu items

58a4c95

LeoRoccoBreedt requested a review from SiddhantSadangi October 10, 2025 08:24

SiddhantSadangi approved these changes Oct 10, 2025


normandy7 approved these changes Oct 10, 2025


LeoRoccoBreedt merged commit 9292bd5 into main Oct 10, 2025
8 checks passed

LeoRoccoBreedt deleted the lb/file_analyzer branch October 10, 2025 08:39


3 participants
