A Python utility for processing and analyzing SurveyCTO server usage reports across multiple reporting periods.
This script processes SurveyCTO server usage reports, merges data across time periods, and generates statistical summaries. It handles both aggregated (team-level) and detailed (form-level) reports, automatically extracting data from usage reports available from the SurveyCTO server in batches.
- Automatically extracts zip files with usage reports (less than 100KB in size)
- Generates four types of summary reports:
- Aggregated report summary - Shows total mobile and web submissions across all periods
- Detailed report summary - Provides form-level statistics from all detailed reports
- Aggregated report by period - Shows period-by-period breakdown of submission data
- Team-specific reports - Creates separate reports for each team when multiple teams exist
- Choose between a consolidated Excel report or separate CSV files
- Python 3.6+
- Required packages:
- pandas
- openpyxl (for Excel output)
It's recommended to run this script in a virtual environment to manage dependencies properly.
-
Navigate to your project directory:
cd /path/to/project -
Create a virtual environment:
python -m venv venv-scto-usage-reports
On Windows:
scto-venv\Scripts\activateOn macOS and Linux:
source scto-venv/bin/activateWith the virtual environment activated, install the required packages:
pip install pandas openpyxlBasic usage (processes files in current directory and outputs as Excel):
python scto-usage-reports.pyProcess files in a specific directory:
python scto-usage-reports.py /path/to/reportsOutput as CSV files instead of Excel:
python scto-usage-reports.py --exportformat csvProcess multiple servers in batch mode:
python scto-usage-reports.py /path/to/root --batchProcess multiple servers with consolidated output:
python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidatedThe script processes CSV files with the following naming pattern:
servername_aggregated_report_Month_Day_Year_to_Month_Day_Year.csv
servername_detailed_report_Month_Day_Year_to_Month_Day_Year.csv
Example: servername_aggregated_report_April_4_2025_to_May_1_2025.csv
It also automatically extracts zip files (regardless of filename) that contain properly formatted CSV reports.
A single workbook named scto_usage_reports.xlsx containing these sheets:
-
aggregated_report_summary - Shows:
- Date range of all processed reports
- Total mobile submissions across all periods
- Total web submissions across all periods
- Total of all submissions (mobile + web)
-
detailed_report_summary - Shows:
- Date range of all processed reports
- Form-level data from the most recent report
- Sum of mobile and web submissions for each form across all periods
-
aggregated_report_by_period - Shows:
- Date range of all processed reports
- Period-by-period breakdown of submission data
- Columns for period start/end dates, allocated space, and submission counts
-
Team-specific sheets - Shows:
- One sheet per team (when multiple teams exist)
- Period-by-period submission data for each team
- Team sheets are named following the pattern "team_teamname"
Multiple CSV files with the same content as the Excel sheets:
aggregated_report_summary.csvdetailed_report_summary.csvaggregated_report_by_period.csv- One CSV file per team (if multiple teams exist)
- Bold formatting for report titles and headers
- Proper date formatting (MM/DD/YYYY)
- Auto-adjusted column widths (excluding title row)
- Consistent formatting across all sheets
The script supports batch processing of usage reports from multiple SurveyCTO servers. This feature allows you to process reports from different servers in a single run, with options for organizing outputs.
Organize your server reports in subdirectories under a root directory:
root_directory/
├── server1/
│ ├── server1_usage_reports_April_4_2025_to_May_1_2025.zip
│ ├── server1_aggregated_report_April_4_2025_to_May_1_2025.csv
│ └── server1_detailed_report_April_4_2025_to_May_1_2025.csv
├── server2/
│ ├── server2_usage_reports_April_4_2025_to_May_1_2025.zip
│ └── (extracted CSV files)
└── myserver/
├── myserver_usage_reports_April_4_2025_to_May_1_2025.zip
└── (extracted CSV files)
Important: Each subdirectory name will be used as the server identifier in output files.
Process all server subdirectories, saving outputs in each subdirectory:
python scto-usage-reports.py /path/to/root --batchProcess all servers but save all outputs in a single directory:
python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidatedpython scto-usage-reports.py /path/to/root --batch --exportformat csvpython scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated --exportformat csvWhen using batch processing, output files are prefixed with the server name:
- Individual outputs: Each subdirectory gets
servername_scto_usage_reports.xlsx - Consolidated outputs: All files saved to specified directory as
servername_scto_usage_reports.xlsx
- Individual outputs: Files like
servername_aggregated_report_summary.csvin each subdirectory - Consolidated outputs: All server files saved to specified directory with server prefixes
- The script automatically detects all subdirectories in the root directory
- Each subdirectory is processed independently as a separate server
- Progress is displayed for each server being processed
- Subdirectories without valid report files are skipped with appropriate messages
- A summary shows how many servers were successfully processed
With individual outputs (default):
root_directory/
├── server1/
│ ├── (original files)
│ └── server1_scto_usage_reports.xlsx
├── server2/
│ ├── (original files)
│ └── server2_scto_usage_reports.xlsx
└── myserver/
├── (original files)
└── myserver_scto_usage_reports.xlsx
With consolidated output:
consolidated_output/
├── server1_scto_usage_reports.xlsx
├── server2_scto_usage_reports.xlsx
└── myserver_scto_usage_reports.xlsx
All existing single-directory functionality remains unchanged. The script automatically detects whether you're processing a single directory or using batch mode based on the --batch flag.
When processing multiple servers in batch mode, the script automatically generates an additional cross-server aggregate report that provides statistical comparisons across all processed servers.
The cross-server aggregate report is automatically created when:
- Using
--batchmode - Successfully processing 2 or more servers
- At least one server contains valid usage report data
The aggregate report (all_server_aggregate_report.xlsx) contains two sheets:
This sheet provides statistical comparisons across all servers and contains two separate tables:
Table 1: Total Submissions Across All Periods
- Shows total submission activity for each server (sum of all reporting periods)
- Useful for understanding overall server usage and scale differences
- Displays average, maximum, minimum, and standard deviation across all servers
Table 2: Average Submissions Per Period
- Shows typical submission activity per reporting period for each server
- Useful for understanding normal operational levels and comparing server activity rates
- Displays average, maximum, minimum, and standard deviation of per-period averages across all servers
Both tables include metrics for:
- Mobile Submissions
- Web Submissions
- All Submissions (Mobile + Web)
This sheet provides a comparative view of each server's most recent reporting period:
| Column | Description |
|---|---|
| Server name | Server identifier from directory name |
| Period | Date range of most recent report (MM/DD/YYYY - MM/DD/YYYY format) |
| Allocated space (MB) | Storage allocation for most recent period |
| Mobile submissions | Mobile submissions in most recent period |
| Web submissions | Web submissions in most recent period |
| All submissions | Total submissions in most recent period |
Why use two tables in the summary?
- Total submissions help identify which servers handle the most overall activity
- Average per period help identify which servers are most active in typical operations
Example interpretation:
- If "Average Submissions Per Period" shows ~10,000 but "Total Submissions" shows ~30,000, this suggests servers typically have 3 reporting periods
- If one server shows much higher per-period averages, it may be handling more active data collection
The aggregate report is saved as all_server_aggregate_report.xlsx in:
- The
--output-dirdirectory (if specified) - The root directory containing all server subdirectories (if no output directory specified)
Batch processing complete. Successfully processed 15 out of 16 servers.
Created cross-server aggregate report: /path/to/output/all_server_aggregate_report.xlsx
All outputs saved to: /path/to/output
If you encounter any issues:
- Missing files: Ensure your CSV files follow the expected naming convention
- Header errors: Check that CSV files contain all the required headers
- Date parsing errors: Verify that dates in filenames follow the Month_Day_Year format
If the script identifies issues with any files, it will display reasons why they were skipped.
- Files in zip archives are automatically extracted before processing
- The script will overwrite existing output files without warning
- When multiple teams exist, values are summed per period in the aggregated report
