Utility web app to analyze & compare files (images and videos) logged to Neptune #32
Reviewer's Guide

This PR introduces a new Streamlit-based utility for downloading, visualizing, and comparing media files (images and videos) from local folders or Neptune experiments, and updates the project documentation to include this new tool.
Hey there - I've reviewed your changes - here's some feedback:
- The Streamlit app file is very large and mixes UI, data fetching, and utilities—consider refactoring into separate modules (e.g., data layer, UI components) to improve maintainability and readability.
- The download_neptune_files function currently pulls all file types then filters locally; adding an extension or attribute filter before download could reduce bandwidth and speed up processing.
- Rendering large galleries and downloading many files may block the UI—consider adding explicit progress indicators or lazy loading to improve responsiveness on big datasets.
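
To make the second point above concrete, here is a minimal sketch of filtering by extension before anything is downloaded. It assumes the candidate file paths are available as plain strings ahead of the download call, which this review does not confirm for the Neptune client. `MEDIA_EXTENSIONS` and `keep_media_paths` are hypothetical names, and the extension set is copied from the refactoring sketch in Comment 7.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set

# Assumed extension set (taken from the review's refactoring sketch);
# the app's real list may differ.
MEDIA_EXTENSIONS: Set[str] = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp",
    ".mp4", ".avi", ".mov", ".mkv", ".webm",
}


def keep_media_paths(
    paths: Iterable[str], allowed: Set[str] = MEDIA_EXTENSIONS
) -> List[str]:
    """Return only the paths whose suffix is in `allowed` (case-insensitive)."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in allowed]


candidates = ["exp_a/step_1.PNG", "exp_a/log.txt", "exp_b/clip.mp4"]
to_download = keep_media_paths(candidates)
```

Only `to_download` would then be handed to the download step, so non-media files never leave the server.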
## Individual Comments
### Comment 1
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:179-188` </location>
<code_context>
+ with st.sidebar.expander("Neptune Configuration", icon=":material/settings:", expanded=True):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Session state is updated for neptune_project but not for neptune_api_token.
Please update st.session_state.neptune_api_token when the user enters a new token to ensure consistency.
</issue_to_address>
### Comment 2
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:245-252` </location>
<code_context>
+ experiment_regex_valid = True
+
+ # Attribute regex
+ attribute_regex = st.text_input(
+ "Attribute Regex",
+ value=st.session_state.get("attribute_regex"),
</code_context>
<issue_to_address>
**suggestion:** Default value for attribute_regex may be None, which could cause issues.
If the value is None, the text input will display None instead of a usable default. Use an empty string or a default pattern to improve user experience.
```suggestion
# Attribute regex
attribute_regex = st.text_input(
    "Attribute Regex",
    value=st.session_state.get("attribute_regex") or "",
    help="Regex pattern to match file attribute names. Defaults to `None` (all attributes)",
    placeholder="image_.*",
    icon=":material/search:",
)
```
</issue_to_address>
### Comment 3
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:267-270` </location>
<code_context>
+ "Download and Visualize", icon=":material/download:", width="stretch", type="primary"
+ ):
+ # Check if experiment regex is valid before proceeding
+ if not experiment_regex_valid or not experiment_regex or not experiment_regex.strip():
+ st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
+ st.stop()
</code_context>
<issue_to_address>
**suggestion:** Redundant experiment regex validation logic.
Simplify the conditional by relying solely on experiment_regex_valid, as it already covers regex validity.
```suggestion
# Check if experiment regex is valid before proceeding
if not experiment_regex_valid:
    st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
    st.stop()
```
</issue_to_address>
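
One caveat on Comment 3: in the quoted app code, `experiment_regex_valid` is only set from an empty-string check, so the flag does not tell you whether the pattern compiles. If the flag is to be the single source of truth, a small validator could cover both cases. The sketch below is an illustration only; `validate_experiment_regex` is a hypothetical helper, not something in the PR.

```python
import re
from typing import Optional, Tuple


def validate_experiment_regex(pattern: Optional[str]) -> Tuple[bool, str]:
    """Return (is_valid, message) for a user-supplied experiment regex."""
    # Empty or whitespace-only input is rejected, as in the quoted app code.
    if pattern is None or not pattern.strip():
        return False, "Experiment regex is required."
    # Additionally reject patterns that `re` cannot compile (assumed extra check).
    try:
        re.compile(pattern)
    except re.error as exc:
        return False, f"Invalid regex: {exc}"
    return True, ""
```

The button handler could then check the returned flag once and show the message with `st.error`.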
### Comment 4
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:316-327` </location>
<code_context>
+
+ # Filter files by media type and regex patterns
+ filtered_files = []
+ for file_info in st.session_state.files:
+ # Check if it's a media file
+ if file_info.get("is_media", False):
</code_context>
<issue_to_address>
**suggestion:** Filtering logic may skip files with missing or malformed relative_path.
Add a check to handle cases where relative_path is missing or malformed to prevent errors and ensure valid files are not skipped.
```suggestion
filtered_files = []
for file_info in st.session_state.files:
    # Check if it's a media file
    if file_info.get("is_media", False):
        # Check for valid relative_path
        relative_path = file_info.get("relative_path")
        if (
            isinstance(relative_path, str)
            and relative_path.strip() != ""
        ):
            try:
                path_parts = Path(relative_path).parts
                if len(path_parts) > 1:
                    experiment_name = path_parts[0]
                    if re.search(experiment_pattern, experiment_name) and re.search(
                        attribute_pattern, file_info["name"]
                    ):
                        filtered_files.append(file_info)
            except Exception:
                # Optionally log or handle malformed path
                continue
        else:
            # Optionally log or handle missing/invalid relative_path
            continue
```
</issue_to_address>
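
The same checks can also be written as a flat helper with early `continue`s, which avoids the extra nesting in the suggestion above. This is a sketch under the assumption that each file record is a dict with `is_media`, `relative_path`, and `name` keys, as in the quoted code. `filter_media_files` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Any, Dict, Iterable, List


def filter_media_files(
    files: Iterable[Dict[str, Any]],
    experiment_pattern: str,
    attribute_pattern: str,
) -> List[Dict[str, Any]]:
    """Keep media files whose top-level folder and name match the patterns."""
    kept = []
    for info in files:
        if not info.get("is_media", False):
            continue
        relative_path = info.get("relative_path")
        # Skip records with a missing or blank relative_path
        if not isinstance(relative_path, str) or not relative_path.strip():
            continue
        parts = Path(relative_path).parts
        # Files directly in the download root have no experiment folder
        if len(parts) < 2:
            continue
        if re.search(experiment_pattern, parts[0]) and re.search(
            attribute_pattern, info.get("name", "")
        ):
            kept.append(info)
    return kept
```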
### Comment 5
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:512-514` </location>
<code_context>
+ # Pagination controls
+ col1, col2, col3, col4, col5 = st.columns([1, 1, 2, 1, 1])
+
+ # Calculate current column range
+ current_column_index = st.session_state.current_column_index
+ current_columns = column_items[
+ current_column_index : current_column_index + columns_per_page
+ ]
</code_context>
<issue_to_address>
**issue:** Pagination logic may result in empty current_columns if index is out of bounds.
Add a check to ensure current_column_index does not exceed the length of column_items to prevent empty current_columns and potential UI issues.
</issue_to_address>
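
A minimal sketch of the bounds check for Comment 5, assuming the index stored in session state can go stale when the set of columns shrinks (for example after deselecting experiments). `clamp_column_index` is a hypothetical helper. It clamps to the last valid index rather than to the last full page, because the quoted app also lets the user jump to any column with a slider.

```python
from typing import List, Sequence


def clamp_column_index(index: int, total_columns: int) -> int:
    """Clamp a possibly stale start index into [0, total_columns - 1]."""
    last_valid = max(0, total_columns - 1)
    return min(max(0, index), last_valid)


def current_page(items: Sequence, index: int, columns_per_page: int) -> List:
    """Return the slice of columns to display, never empty if items exist."""
    start = clamp_column_index(index, len(items))
    return list(items[start : start + columns_per_page])
```

In the app, the clamped value would be written back to `st.session_state.current_column_index` before slicing `column_items`.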
### Comment 6
<location> `utils/visualization_tools/file_comparison_app/README.md:66` </location>
<code_context>
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and limitations under the License.
+
+[Github issues]: https://github.com/neptune-ai/scale-examples/issues/new
+[Support center]: https://support.neptune.ai/
</code_context>
<issue_to_address>
**suggestion (typo):** Typo: 'Github' should be 'GitHub' for consistency.
Please update 'Github issues' to 'GitHub issues' for consistent spelling.
```suggestion
[GitHub issues]: https://github.com/neptune-ai/scale-examples/issues/new
```
</issue_to_address>
### Comment 7
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:1` </location>
<code_context>
+import os
+import re
+from pathlib import Path
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the code into focused modules for utilities, Neptune downloading, and gallery rendering to reduce duplication and nested branches.
Here are a few low-risk refactorings that will shave this 1,700-line file down into focused modules, remove almost all duplication and nested branches, and keep every feature exactly as-is.
1) Extract all your simple file utilities into `utils.py`
```python
# utils.py
import os
from pathlib import Path
import pandas as pd
from typing import List, Dict, Any

SUPPORTED_IMG = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}
SUPPORTED_VID = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def get_file_size_mb(path: str) -> float:
    try:
        return os.path.getsize(path) / (1024 * 1024)
    except OSError:
        return 0.0


def is_media_file(path: str) -> bool:
    return Path(path).suffix.lower() in SUPPORTED_IMG | SUPPORTED_VID


def create_file_statistics(files: List[Dict[str, Any]]) -> pd.DataFrame:
    if not files:
        return pd.DataFrame()
    df = pd.DataFrame(files)
    df["modified_date"] = pd.to_datetime(df["modified"], unit="s")
    return df
```
2) Move **all** your Neptune download/fetch code into `neptune_downloader.py`
```python
# neptune_downloader.py
import re
from pathlib import Path
import streamlit as st
import neptune_query as nq
from neptune_query.filters import Filter
from typing import List, Dict, Any, Tuple
from utils import get_file_size_mb, is_media_file


@st.cache_data
def download_neptune_files(
    project_name: str,
    exp_regex: str,
    attr_regex: str,
    download_dir: str,
    include_archived: bool,
) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    if not nq:
        st.error("…neptune-query missing…")
        return [], {}
    # …everything from listing exps through building downloaded_files & download_info…
    return downloaded_files, download_info
```
3) Collapse the two huge “Steps vs Experiments” branches into one generic renderer in `gallery.py`
```python
# gallery.py
import re
import streamlit as st
from pathlib import Path
from typing import Dict


def build_folder_step_grid(media_files, folder_toggles):
    grid, all_steps = {}, set()
    for f in media_files:
        parts = Path(f["relative_path"]).parts
        if len(parts) > 1 and folder_toggles.get(parts[0], True):
            # Fall back to step 0 when no "step_<n>" marker is found
            match = re.search(r"step[_-]?(\d+)", f["name"] + f["relative_path"])
            step = int(match.group(1)) if match else 0
            grid.setdefault(parts[0], {})[step] = f
            all_steps.add(step)
    return grid, sorted(all_steps), sorted(grid)


def render_grid(grid: Dict, steps, folders, layout, cols_per_page):
    # pagination + common header/row logic
    # (get_current_page is a pagination helper that is not shown in this sketch)
    current = get_current_page(steps if layout == "Steps" else folders, cols_per_page)
    headers = ["Experiment" if layout == "Steps" else "Step"] + [
        f"{'Step' if layout=='Steps' else ''} {c}" for c in current
    ]
    col_cfg = [15] + [(100 - 15) / len(current)] * len(current)
    header_cols = st.columns(col_cfg)
    for i, h in enumerate(headers):
        header_cols[i].write(f"**{h}**")
    for row_key in (folders if layout == "Steps" else steps):
        row = st.columns(col_cfg)
        row[0].write(f"**{row_key}**")
        for idx, col_key in enumerate(current):
            cell = (
                grid.get(row_key, {}).get(col_key)
                if layout == "Steps"
                else grid.get(col_key, {}).get(row_key)
            )
            with row[idx + 1]:
                if cell:
                    if Path(cell["path"]).suffix.lower() in {".mp4"}:
                        st.video(cell["path"])
                    else:
                        st.image(cell["path"], width="stretch")
                else:
                    st.write("—")
```
4) Finally, your `main.py` collapses to ~100 lines:
```python
# main.py
import streamlit as st
from utils import create_file_statistics, is_media_file
from neptune_downloader import download_neptune_files
from gallery import build_folder_step_grid, render_grid


def main():
    # …sidebar inputs…
    files, info = download_neptune_files(...)
    st.session_state.update(files=files, download_info=info)
    filtered = [f for f in files if is_media_file(f["path"]) and <regex filters>]
    grid, steps, folders = build_folder_step_grid(filtered, folder_toggles)
    stats = create_file_statistics(filtered)
    if not stats.empty:
        render_grid(grid, steps, folders, layout_orientation, images_per_page)
    else:
        st.info("No media files…")


if __name__ == "__main__":
    main()
```
By extracting:
- `utils.py` (pure helpers),
- `neptune_downloader.py` (all Neptune I/O),
- `gallery.py` (single renderer with one branch),
you remove almost all duplication, collapse nesting, and keep every feature intact.
</issue_to_address>
### Comment 8
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:95-97` </location>
<code_context>
@st.cache_data()
def download_neptune_files(
    project_name: str,
    experiment_regex: str,
    attribute_regex: str,
    download_dir: str,
    include_archived: bool,
) -> List[Dict[str, Any]]:
    """Download files from Neptune and return file information"""
    if not NEPTUNE_AVAILABLE:
        st.error("Neptune Query is not available. Please install neptune-query package.")
        return []
    try:
        # List experiments
        filter = Filter.name(experiment_regex)
        if not include_archived:
            filter = filter & Filter.eq("sys/archived", False)
        exps = nq.list_experiments(project=project_name, experiments=filter)
        if not exps:
            st.warning(f"No experiments found matching pattern: {experiment_regex}")
            return [], {}
        # Fetch files from experiments using the attribute regex
        files = nq.fetch_series(project=project_name, experiments=exps, attributes=attribute_regex)
        # Create project-specific download directory
        # Use project name as top-level folder to prevent mixing experiments from different projects
        project_download_dir = Path(download_dir) / project_name.replace("/", "_")
        project_download_dir.mkdir(parents=True, exist_ok=True)
        # Download files to project-specific directory
        # TODO: Download only supported file types
        nq.download_files(files=files, destination=str(project_download_dir))
        # Convert to our file format
        downloaded_files = []
        # Scan the project-specific download directory for files
        # Only include files from folders that match the experiment regex
        all_files = []
        for item in project_download_dir.iterdir():
            if item.is_dir() and re.search(experiment_regex, item.name):
                # Add all files from this matching folder
                all_files.extend(item.rglob("*.*"))
        media_count = 0
        for file_path in all_files:
            try:
                file_info = {
                    "name": file_path.name,
                    "path": str(file_path),
                    "relative_path": str(file_path.relative_to(project_download_dir)),
                    "size_mb": get_file_size_mb(str(file_path)),
                    "extension": file_path.suffix.lower(),
                    "is_media": is_media_file(str(file_path)),
                    "is_video": is_video_file(str(file_path)),
                    "modified": file_path.stat().st_mtime,
                }
                downloaded_files.append(file_info)
                if file_info["is_media"]:
                    media_count += 1
            except Exception as e:
                st.warning(f"Error processing file {file_path}: {e}")
        # Store download info for display in expander
        download_info = {
            "project_name": project_name,
            "experiments": exps,
            "attribute_regex": attribute_regex or ".*",
            "files_fetched": len(files),
            "download_dir": str(project_download_dir),
            "total_files": len(all_files),
            "media_files": media_count,
            "total_processed": len(downloaded_files),
        }
        return downloaded_files, download_info
    except Exception as e:
        st.error(f"Error downloading from Neptune: {e}")
        return [], {}
</code_context>
<issue_to_address>
**issue (code-quality):** Don't assign to builtin variable `filter` [×2] ([`avoid-builtin-shadow`](https://docs.sourcery.ai/Reference/Default-Rules/comments/avoid-builtin-shadow/))
<br/><details><summary>Explanation</summary>Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.
</details>
</issue_to_address>
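
To illustrate why the shadowing in Comment 8 matters, here is a small self-contained sketch. The string value is only a stand-in for the Neptune filter object, and `experiment_filter` is a suggested name, not one taken from the PR. Once `filter` is rebound inside a function, the builtin is unreachable in that scope.

```python
from typing import List, Union


def select_even_shadowed(values: List[int]) -> Union[List[int], str]:
    # Rebinding `filter` hides the builtin for the rest of this function.
    filter = "name matches exp_.*"  # stand-in for a Neptune filter object
    try:
        return list(filter(lambda v: v % 2 == 0, values))
    except TypeError as exc:
        return f"TypeError: {exc}"


def select_even_renamed(values: List[int]) -> List[int]:
    # A more specific name keeps the builtin `filter` usable.
    experiment_filter = "name matches exp_.*"  # stand-in, unused below
    return list(filter(lambda v: v % 2 == 0, values))
```

In the quoted function, the equivalent change would be renaming both assignments and the `experiments=` argument to use the new name.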
### Comment 9
<location> `utils/visualization_tools/file_comparison_app/file_comparison_app.py:438` </location>
<code_context>
def main():
st.title("Neptune File Comparison App")
st.text("Visualize and compare media file series across different Neptune experiments")
# Project configuration in expandable container
st.sidebar.markdown(f"**Version:** {__version__}")
if not NEPTUNE_AVAILABLE:
st.error("Neptune Query not available. Install using `pip install -U neptune-query`")
st.stop()
with st.sidebar.expander("Neptune Configuration", icon=":material/settings:", expanded=True):
# Neptune API token
_neptune_api_token = st.session_state.get("neptune_api_token") or os.getenv(
"NEPTUNE_API_TOKEN"
)
neptune_api_token = st.text_input(
"Neptune API Token",
value=_neptune_api_token,
placeholder="your_api_token",
type="password",
help="Defaults to `NEPTUNE_API_TOKEN` environment variable",
icon=":material/password:",
)
if neptune_api_token:
os.environ["NEPTUNE_API_TOKEN"] = neptune_api_token
# Neptune project
_neptune_project = st.session_state.get("neptune_project") or os.getenv("NEPTUNE_PROJECT")
neptune_project = st.text_input(
"Neptune Project",
value=_neptune_project,
placeholder="workspace_name/project_name",
help="In the format `workspace_name/project_name`. Defaults to `NEPTUNE_PROJECT` environment variable.",
icon=":material/folder:",
)
st.session_state.neptune_project = neptune_project
with st.sidebar.expander("Download Configuration", icon=":material/tune:", expanded=True):
# Download directory
download_directory = st.text_input(
"Download Directory",
value=st.session_state.get("download_directory", "neptune_downloads"),
help="Directory to download Neptune files to. Defaults to `neptune_downloads` in the current working directory.",
icon=":material/folder:",
)
st.session_state.download_directory = download_directory
# Experiment regex (required field)
# TODO: Support passing a list of experiment names
experiment_regex = st.text_input(
"Experiments Regex",
value=st.session_state.get("experiment_regex", ""),
help="Regex specifying the experiments names to download from",
placeholder="exp_.*",
icon=":material/search:",
)
st.session_state.experiment_regex = experiment_regex
include_archived = st.toggle("Include archived experiments", value=False)
# Validate experiment regex is valid
if not experiment_regex or not experiment_regex.strip():
st.error(
"Experiment regex is required. Please enter a pattern to match experiment names.",
icon=":material/warning:",
)
experiment_regex_valid = False
elif experiment_regex.strip() == ".*":
st.warning(
"Experiment regex is set to `.*`. This will download all experiments from the project.",
icon=":material/warning:",
)
experiment_regex_valid = True
else:
experiment_regex_valid = True
# Attribute regex
attribute_regex = st.text_input(
"Attribute Regex",
value=st.session_state.get("attribute_regex"),
help="Regex pattern to match file attribute names. Defaults to `None` (all attributes)",
placeholder="image_.*",
icon=":material/search:",
)
st.session_state.attribute_regex = attribute_regex
if st.button(
"Clear cache",
icon=":material/delete:",
width="stretch",
help="Clear the cache to fetch latest files",
):
st.cache_data.clear()
st.rerun()
if st.button(
"Download and Visualize", icon=":material/download:", width="stretch", type="primary"
):
# Check if experiment regex is valid before proceeding
if not experiment_regex_valid or not experiment_regex or not experiment_regex.strip():
st.error("Cannot proceed: Experiment regex is required!", icon=":material/error:")
st.stop()
with st.spinner("Downloading files from Neptune...", show_time=True):
files, download_info = download_neptune_files(
neptune_project,
experiment_regex,
attribute_regex,
download_directory,
include_archived,
)
st.session_state.files = files
st.session_state.download_info = download_info
st.session_state.directory_scanned = True
# Show success/warning message
if files:
st.success(f"Successfully downloaded {len(files)} files", icon=":material/check:")
else:
st.warning("No files were downloaded. Check your project name and regex patterns.")
# Show download details in expander if available
if "download_info" in st.session_state and st.session_state.download_info:
with st.sidebar.expander(
"Download Details", icon=":material/download:", expanded=False
):
info = st.session_state.download_info
st.write(f"**Project:** {info.get('project_name', 'N/A')}")
with st.expander(
f"Experiments Found: **{len(info['experiments'])}**", icon=":material/science:"
):
for experiment in info["experiments"]:
st.write(experiment)
st.write(f"**Attribute Regex:** `{info['attribute_regex']}`")
st.write(f"**Download Directory:** {info['download_dir']}")
st.write(f"**Total Files:** {info['total_files']}")
st.write(f"**Files Processed:** {info['total_processed']}")
st.write(f"**Media Files:** {info['media_files']}")
# Gallery view options
if "files" in st.session_state and st.session_state.files:
# Apply regex filters to get experiments and media files
experiment_pattern = st.session_state.get("experiment_regex", ".*")
attribute_pattern = st.session_state.get("attribute_regex", ".*") or ".*"
# Filter files by media type and regex patterns
filtered_files = []
for file_info in st.session_state.files:
# Check if it's a media file
if file_info.get("is_media", False):
# Check experiment regex (folder name)
path_parts = Path(file_info["relative_path"]).parts
if len(path_parts) > 1:
experiment_name = path_parts[0]
if re.search(experiment_pattern, experiment_name) and re.search(
attribute_pattern, file_info["name"]
):
filtered_files.append(file_info)
# Get unique experiments from filtered files
top_level_folders = set()
for file_info in filtered_files:
path_parts = Path(file_info["relative_path"]).parts
if len(path_parts) > 1:
top_level_folders.add(path_parts[0])
# Create individual toggles for each experiment
folder_toggles = {}
if top_level_folders:
with st.sidebar.expander(
"Select experiments to view",
icon=":material/visibility:",
expanded=True,
):
# Add select all / deselect all buttons
col1, col2 = st.columns(2)
with col1:
if st.button("Select All", icon=":material/check_box:", width="stretch"):
for folder in top_level_folders:
st.session_state[f"folder_toggle_{folder}"] = True
st.rerun()
with col2:
if st.button(
"Deselect All", icon=":material/check_box_outline_blank:", width="stretch"
):
for folder in top_level_folders:
st.session_state[f"folder_toggle_{folder}"] = False
st.rerun()
for folder in sorted(top_level_folders):
folder_toggles[folder] = st.checkbox(
folder,
value=True, # Default to showing all folders
key=f"folder_toggle_{folder}",
)
# Gallery layout controls
st.sidebar.subheader("📄 Gallery Layout")
# Layout orientation
layout_orientation = st.sidebar.segmented_control(
"Column headers", options=["Steps", "Experiments"], default="Steps", width="stretch"
)
# Pagination controls
columns_per_page = st.sidebar.slider(
"Columns per page",
min_value=1,
max_value=10,
value=5,
help="Number of columns to show at once",
)
# Image sizing controls
# st.sidebar.subheader("🖼️ Media Display")
# consistent_sizing = st.sidebar.checkbox(
# "Consistent media size",
# value=False,
# help="Resize all images to the same dimensions for easier comparison",
# )
# if consistent_sizing:
# image_width = st.sidebar.slider(
# "Media width (pixels)", min_value=100, max_value=500, value=200, step=10
# )
# image_height = st.sidebar.slider(
# "Media height (pixels)", min_value=100, max_value=500, value=200, step=10
# )
# consistent_size = (image_width, image_height)
# else:
# consistent_size = None
st.session_state.filtered_files = filtered_files
st.session_state.folder_toggles = folder_toggles
st.session_state.images_per_page = columns_per_page
# st.session_state.consistent_size = consistent_size
st.session_state.layout_orientation = layout_orientation
# Main content area
if "files" not in st.session_state or not st.session_state.files:
st.info(
"Configure the download and click 'Download and Visualize' to get started",
icon=":material/arrow_circle_left:",
)
return
# Image comparison gallery
filtered_df = create_file_statistics(
st.session_state.get("filtered_files", st.session_state.files)
)
if not filtered_df.empty:
# Grid gallery: rows = experiments, columns = steps
media_files = [f for f in filtered_df.itertuples() if f.is_media]
if media_files:
st.subheader("Comparison Grid")
# Get folder toggles, pagination settings, and image sizing
folder_toggles = st.session_state.get("folder_toggles", {})
columns_per_page = st.session_state.get("images_per_page", 3)
consistent_size = st.session_state.get("consistent_size", None)
layout_orientation = st.session_state.get("layout_orientation", "Steps")
# Extract step number from filename or path
def extract_step_number(file_info):
try:
# Try to extract step from filename first
match = re.search(r"step_(\d+)", file_info.name)
if match:
return int(match.group(1))
# Try to extract step from path (for Neptune downloads)
match = re.search(r"step[_-]?(\d+)", file_info.path)
if match:
return int(match.group(1))
# Try to extract step from relative path
match = re.search(r"step[_-]?(\d+)", file_info.relative_path)
if match:
return int(match.group(1))
# If no step found, try to extract any number from filename
match = re.search(r"(\d+)", file_info.name)
if match:
return int(match.group(1))
return 0
except:
return 0
# Organize images by folder and step
folder_step_grid = {}
all_steps = set()
for file_info in media_files:
# Get first level folder from relative path
path_parts = Path(file_info.relative_path).parts
if len(path_parts) > 1: # Only include actual folders, not root files
folder_name = path_parts[0]
# Only include if folder is enabled
if folder_toggles.get(folder_name, True):
step_num = extract_step_number(file_info)
all_steps.add(step_num)
if folder_name not in folder_step_grid:
folder_step_grid[folder_name] = {}
folder_step_grid[folder_name][step_num] = file_info
if not folder_step_grid:
st.info(
"No experiments are selected to display. Use the sidebar toggles to select which experiments to show in the gallery.",
icon=":material/arrow_circle_left:",
)
return
# Sort steps and folders
sorted_steps = sorted(all_steps)
sorted_folders = sorted(folder_step_grid.keys())
# Calculate column-based navigation
if layout_orientation == "Steps":
# When steps are columns, paginate through steps
total_columns = len(sorted_steps)
column_items = sorted_steps
column_type = "steps"
else:
# When experiments are columns, paginate through experiments
total_columns = len(sorted_folders)
column_items = sorted_folders
column_type = "experiments"
# Initialize current column index in session state
if "current_column_index" not in st.session_state:
st.session_state.current_column_index = 0
# Pagination controls
col1, col2, col3, col4, col5 = st.columns([1, 1, 2, 1, 1])
# Calculate current column range
current_column_index = st.session_state.current_column_index
current_columns = column_items[
current_column_index : current_column_index + columns_per_page
]
with col1:
if st.button(
"First",
disabled=current_column_index == 0,
icon=":material/first_page:",
width="stretch",
):
st.session_state.current_column_index = 0
st.rerun()
with col2:
if st.button(
"Previous",
disabled=current_column_index == 0,
icon=":material/arrow_back:",
width="stretch",
):
# Move back by 1 column
st.session_state.current_column_index = max(0, current_column_index - 1)
st.rerun()
with col3:
if current_columns:
first_col = current_columns[0]
last_col = current_columns[-1]
st.write(
f"**{column_type.title()} {first_col} to {last_col}** ({total_columns} total {column_type})"
)
else:
st.write(
f"**No {column_type} available** ({total_columns} total {column_type})"
)
with col4:
# Check if we can move forward by 1 column
can_move_next = current_column_index + 1 + columns_per_page <= total_columns
if st.button(
"Next",
disabled=not can_move_next,
icon=":material/arrow_forward:",
width="stretch",
):
# Move forward by 1 column
st.session_state.current_column_index = min(
total_columns - columns_per_page, current_column_index + 1
)
st.rerun()
with col5:
# Check if we're at the last possible position
is_at_last = current_column_index + columns_per_page >= total_columns
if st.button(
"Last", disabled=is_at_last, icon=":material/last_page:", width="stretch"
):
# Move to the last possible position
st.session_state.current_column_index = max(0, total_columns - columns_per_page)
st.rerun()
# Get current columns based on column index
current_columns = column_items[
current_column_index : current_column_index + columns_per_page
]
# Add column scrubber slider
if column_items:
# Create slider using actual column values
current_first_col = current_columns[0] if current_columns else column_items[0]
# Create slider for column selection using selectbox for discrete values
selected_col = st.select_slider(
f"Jump to {column_type[:-1]}",
options=column_items,
value=current_first_col,
help=f"Use this slider to quickly jump to any {column_type[:-1]} in the series",
)
# Update current column index if slider value changed
if selected_col in column_items:
new_index = column_items.index(selected_col)
if new_index != current_column_index:
st.session_state.current_column_index = new_index
st.rerun()
if layout_orientation == "Steps":
# Original layout: experiments as rows, steps as columns
# Calculate optimal column widths with smart size limiting
if sorted_folders:
# Find the longest experiment name
max_name_length = max(len(folder) for folder in sorted_folders)
# Add padding and convert to relative width (experiment names are typically 10-30 chars)
experiment_col_width = min(
max(max_name_length * 0.8, 12), 25
) # Between 12 and 25
# Calculate available width for step columns
available_width = 100 - experiment_col_width
# Calculate step column width - each column can be smaller when more columns are added
step_col_width = available_width / len(current_columns)
# Smart size limiting: prevent any single image from being too large
# Reference size: 3 experiments, 4 files per page, but with smaller individual images
reference_experiment_width = 0 # Typical experiment column width
reference_available_width = 100 - reference_experiment_width
reference_step_width = reference_available_width / 4 # 4 files per page
# Reduce the maximum to 60% of the reference size for more reasonable single image size
max_step_width = (
reference_step_width * 1
) # This is our maximum allowed step width
# Apply the size limit
if step_col_width > max_step_width:
step_col_width = max_step_width
# Create column configuration
col_config = [experiment_col_width] + [step_col_width] * len(current_columns)
else:
col_config = [1] * (len(current_columns) + 1)
# Create grid header with step numbers
header_cols = st.columns(col_config)
with header_cols[0]:
st.write("**Experiment**")
for idx, step in enumerate(current_columns):
with header_cols[idx + 1]:
st.write(f"**Step {step}**")
# Create grid rows (one per experiment)
for folder_name in sorted_folders:
row_cols = st.columns(col_config)
# Experiment name in first column
with row_cols[0]:
st.write(folder_name)
# Images for each step in remaining columns
for idx, step in enumerate(current_columns):
with row_cols[idx + 1]:
if step in folder_step_grid[folder_name]:
file_info = folder_step_grid[folder_name][step]
try:
# Check if it's a video file by extension
file_extension = Path(file_info.path).suffix.lower()
is_video = file_extension in _SUPPORTED_VIDEO_EXTENSIONS
if is_video:
# Display video using st.video
st.video(file_info.path)
else:
# Display image at the full width of its column
st.image(file_info.path, width="stretch")
except Exception as e:
st.error(f"Error loading {file_info.name}: {e}")
else:
st.write("—") # No image for this step
else: # Experiments as Columns
# New layout: steps as rows, experiments as columns
# Calculate optimal column widths
if sorted_folders:
# Reserve 15% of the row for step labels
available_width = 100 - 15
# Split the remaining width evenly across the experiment columns
experiment_col_width = available_width / len(current_columns)
# Apply smart size limiting for experiments too
max_experiment_width = 25 # Maximum 25% per experiment
if experiment_col_width > max_experiment_width:
experiment_col_width = max_experiment_width
# Create column configuration
col_config = [15] + [experiment_col_width] * len(current_columns)
else:
col_config = [1] * (len(current_columns) + 1)
# Create grid header with experiment names
header_cols = st.columns(col_config)
with header_cols[0]:
st.write("**Step**")
for idx, folder_name in enumerate(current_columns):
with header_cols[idx + 1]:
st.write(f"**{folder_name}**")
# Create grid rows (one per step)
for step in sorted_steps:
row_cols = st.columns(col_config)
# Step number in first column
with row_cols[0]:
st.write(f"**{step}**")
# Images for each experiment in remaining columns
for idx, folder_name in enumerate(current_columns):
with row_cols[idx + 1]:
if step in folder_step_grid[folder_name]:
file_info = folder_step_grid[folder_name][step]
try:
# Check if it's a video file by extension
file_extension = Path(file_info.path).suffix.lower()
is_video = file_extension in _SUPPORTED_VIDEO_EXTENSIONS
if is_video:
# Display video using st.video
st.video(file_info.path)
else:
# Display image at the full width of its column
st.image(file_info.path, width="stretch")
except Exception as e:
st.error(f"Error loading {file_info.name}: {e}")
else:
st.write("—") # No image for this step
# Show column range info
if current_columns:
st.info(f"Showing {column_type} {current_columns[0]} to {current_columns[-1]}")
else:
st.info("No media files found matching the current filters")
else:
st.warning("No media files found", icon=":material/info:")
</code_context>
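Both layout branches above repeat the same two pieces of logic: clamping a per-column width and deciding between `st.video` and `st.image`. A minimal sketch of how these could be pulled into small pure helpers follows. The helper names `build_col_config` and `is_video` are hypothetical, and the contents of `_SUPPORTED_VIDEO_EXTENSIONS` are assumed here because the real constant is defined outside the quoted context.

```python
from pathlib import Path
from typing import List

# Assumed value for illustration; the app defines the real constant elsewhere.
_SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov"}


def build_col_config(n_columns: int, label_width: float, max_col_width: float) -> List[float]:
    """Return relative widths for st.columns: one label column plus n data columns.

    Hypothetical helper: each data column gets an equal share of the width left
    after the label column, capped at max_col_width.
    """
    if n_columns <= 0:
        # Nothing to lay out; fall back to a single column
        return [1]
    available = 100 - label_width
    col_width = min(available / n_columns, max_col_width)
    return [label_width] + [col_width] * n_columns


def is_video(path: str) -> bool:
    """Classify a file as video by its lower-cased extension."""
    return Path(path).suffix.lower() in _SUPPORTED_VIDEO_EXTENSIONS
```

With helpers like these, each branch could call `st.columns(build_col_config(len(current_columns), 15, 25))` and pick the renderer with `is_video(file_info.path)`, which would also keep the width arithmetic testable without running Streamlit.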
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
- Use `except Exception:` rather than bare `except:` ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Replace m.group(x) with m[x] for re.Match objects ([`use-getitem-for-re-match-groups`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/use-getitem-for-re-match-groups/))
</issue_to_address>
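For readers unfamiliar with these rules, here is a small illustrative sketch of what each rewrite typically looks like. It does not reproduce the app's code: the regex, file-name format, and function names are hypothetical and chosen only to show the patterns.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: file names such as "step_12.png"
_STEP_PATTERN = re.compile(r"step_(\d+)")


def extract_step(file_name: str) -> Optional[int]:
    match = _STEP_PATTERN.search(file_name)
    # assign-if-exp: one conditional expression instead of an if/else assignment
    # use-getitem-for-re-match-groups: match[1] instead of match.group(1)
    return int(match[1]) if match else None


def to_float(value: object, default: float = 0.0) -> float:
    try:
        return float(value)
    except Exception:
        # do-not-use-bare-except: unlike a bare `except:`, this lets
        # KeyboardInterrupt and SystemExit propagate
        return default


def split_steps(file_names: List[str]) -> Tuple[List[int], List[str]]:
    steps, skipped = [], []
    for name in file_names:
        step = extract_step(name)
        # reintroduce-else: an explicit else instead of `continue` followed by
        # the remaining loop body
        if step is None:
            skipped.append(name)
        else:
            steps.append(step)
    return steps, skipped
```

Catching a narrower exception type, such as `(TypeError, ValueError)` in `to_float`, would usually be preferable to `Exception` when the failure modes are known.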
…le comparison app
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Signed-off-by: Leo Breedt <[email protected]>
Co-authored-by: Sabine Ståhlberg <[email protected]> Signed-off-by: Leo Breedt <[email protected]>
Description
Include a summary of the changes and the related issue.
Related to: <ClickUp/JIRA task name>
Any expected test failures?
Add a [X] to relevant checklist items
❔ This change
✔️ Pre-merge checklist
🧪 Test Configuration
Summary by Sourcery
Introduce a new Streamlit-based File Comparison App for downloading, visualizing, and comparing image and video series logged in Neptune experiments, and update documentation to list the new visualization tool.
New Features:
Build:
Documentation: