SurveyCTO Usage Report Generator

A Python utility for processing and analyzing SurveyCTO server usage reports across multiple reporting periods.

Overview

This script processes SurveyCTO server usage reports, merges data across time periods, and generates statistical summaries. It handles both aggregated (team-level) and detailed (form-level) reports, automatically extracts the zipped usage reports downloaded from the SurveyCTO server, and can process several servers in one batch run.

Features

Automatically extracts zip files with usage reports (less than 100KB in size)
Generates four types of summary reports:
1. Aggregated report summary - Shows total mobile and web submissions across all periods
2. Detailed report summary - Provides form-level statistics from all detailed reports
3. Aggregated report by period - Shows period-by-period breakdown of submission data
4. Team-specific reports - Creates separate reports for each team when multiple teams exist
Choose between a consolidated Excel report or separate CSV files

Requirements

Python 3.6+
Required packages:
- pandas
- openpyxl (for Excel output)

Installation

Setting Up a Virtual Environment

It's recommended to run this script in a virtual environment to manage dependencies properly.

Creating a Virtual Environment

Navigate to your project directory:
```
cd /path/to/project
```
Create a virtual environment:
```
python -m venv venv-scto-usage-reports
```

Activating the Virtual Environment

On Windows:

venv-scto-usage-reports\Scripts\activate

On macOS and Linux:

source venv-scto-usage-reports/bin/activate

Installing Dependencies

With the virtual environment activated, install the required packages:

pip install pandas openpyxl

Usage

Basic usage (processes files in current directory and outputs as Excel):

python scto-usage-reports.py

Process files in a specific directory:

python scto-usage-reports.py /path/to/reports

Output as CSV files instead of Excel:

python scto-usage-reports.py --exportformat csv

Process multiple servers in batch mode:

python scto-usage-reports.py /path/to/root --batch

Process multiple servers with consolidated output:

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated
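
The command-line options used above can be modeled with a small `argparse` parser. This is a minimal sketch, not the script's actual parser: the flag names come from the examples, while the `excel` choice name and the function name `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="scto-usage-reports.py")
    # Optional positional directory; falls back to the current directory.
    parser.add_argument("directory", nargs="?", default=".")
    # "csv" appears in the examples; "excel" is an assumed name for the default.
    parser.add_argument("--exportformat", choices=["excel", "csv"], default="excel")
    # Treat each subdirectory of `directory` as a separate server.
    parser.add_argument("--batch", action="store_true")
    # Optional single directory for consolidated outputs.
    parser.add_argument("--output-dir", default=None)
    return parser


args = build_parser().parse_args(["/path/to/root", "--batch", "--exportformat", "csv"])
print(args.directory, args.batch, args.exportformat, args.output_dir)
# prints: /path/to/root True csv None
```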

Input File Requirements

The script processes CSV files with the following naming pattern:

servername_aggregated_report_Month_Day_Year_to_Month_Day_Year.csv
servername_detailed_report_Month_Day_Year_to_Month_Day_Year.csv

Example: servername_aggregated_report_April_4_2025_to_May_1_2025.csv
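
To check whether a file name fits this pattern, a regular expression along the following lines can help. It is a sketch under assumptions: the pattern and the helper `parse_report_name` are hypothetical, and the script's own matching rules may differ.

```python
import re
from datetime import datetime

# Hypothetical pattern for <server>_<type>_report_<start>_to_<end>.csv
REPORT_NAME = re.compile(
    r"^(?P<server>.+)_(?P<kind>aggregated|detailed)_report_"
    r"(?P<start>[A-Za-z]+_\d{1,2}_\d{4})_to_"
    r"(?P<end>[A-Za-z]+_\d{1,2}_\d{4})\.csv$"
)


def parse_report_name(filename):
    """Return server, report kind, and period dates, or None if the name does not fit."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    try:
        # %B expects a full month name and assumes an English locale.
        start = datetime.strptime(match["start"], "%B_%d_%Y").date()
        end = datetime.strptime(match["end"], "%B_%d_%Y").date()
    except ValueError:
        return None
    return {"server": match["server"], "kind": match["kind"], "start": start, "end": end}


info = parse_report_name("servername_aggregated_report_April_4_2025_to_May_1_2025.csv")
print(info["server"], info["kind"], info["start"], info["end"])
# prints: servername aggregated 2025-04-04 2025-05-01
```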

It also automatically extracts zip files (regardless of filename) that contain properly formatted CSV reports.
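
The extraction step might look roughly like the sketch below. It assumes the 100KB limit from the feature list applies to the size of the zip file itself, and the function name `extract_small_zips` is hypothetical rather than taken from the script.

```python
import zipfile
from pathlib import Path

MAX_ZIP_BYTES = 100 * 1024  # assumed reading of the "less than 100KB" limit


def extract_small_zips(directory):
    """Extract CSV members from every small zip in `directory`; return their names."""
    directory = Path(directory)
    extracted = []
    for zip_path in sorted(directory.glob("*.zip")):
        # Skip large archives and files that are not valid zips.
        if zip_path.stat().st_size >= MAX_ZIP_BYTES:
            continue
        if not zipfile.is_zipfile(zip_path):
            continue
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.infolist():
                # Keep only the base name so nested paths land in `directory`.
                name = Path(member.filename).name
                if member.is_dir() or not name.lower().endswith(".csv"):
                    continue
                (directory / name).write_bytes(archive.read(member))
                extracted.append(name)
    return extracted
```

Writing each member under its base name keeps every extracted file inside the target directory, whatever paths the archive stores.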

Output

Excel Output (Default)

A single workbook named scto_usage_reports.xlsx containing these sheets:

aggregated_report_summary - Shows:
- Date range of all processed reports
- Total mobile submissions across all periods
- Total web submissions across all periods
- Total of all submissions (mobile + web)
detailed_report_summary - Shows:
- Date range of all processed reports
- Form-level data from the most recent report
- Sum of mobile and web submissions for each form across all periods
aggregated_report_by_period - Shows:
- Date range of all processed reports
- Period-by-period breakdown of submission data
- Columns for period start/end dates, allocated space, and submission counts
Team-specific sheets - Shows:
- One sheet per team (when multiple teams exist)
- Period-by-period submission data for each team
- Team sheets are named following the pattern "team_teamname"
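
The arithmetic behind the summary and by-period sheets can be illustrated with a few rows of made-up data. This is a plain-Python sketch for clarity; the field names and numbers are hypothetical, and the script's own processing may be organized differently.

```python
from collections import defaultdict

# Hypothetical per-period rows; field names and values are illustrative only.
rows = [
    {"period_start": "2025-03-07", "team": "team_a", "mobile": 120, "web": 30},
    {"period_start": "2025-03-07", "team": "team_b", "mobile": 80, "web": 10},
    {"period_start": "2025-04-04", "team": "team_a", "mobile": 200, "web": 50},
]


def summarize(rows):
    """Return (per-period totals, overall totals) for mobile, web, and all submissions."""
    by_period = defaultdict(lambda: {"mobile": 0, "web": 0, "all": 0})
    for row in rows:
        # Teams reporting in the same period are summed into one bucket.
        bucket = by_period[row["period_start"]]
        bucket["mobile"] += row["mobile"]
        bucket["web"] += row["web"]
        bucket["all"] += row["mobile"] + row["web"]
    overall = {
        key: sum(bucket[key] for bucket in by_period.values())
        for key in ("mobile", "web", "all")
    }
    return dict(by_period), overall


by_period, overall = summarize(rows)
print(overall)
# prints: {'mobile': 400, 'web': 90, 'all': 490}
```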

CSV Output

Multiple CSV files with the same content as the Excel sheets:

aggregated_report_summary.csv
detailed_report_summary.csv
aggregated_report_by_period.csv
One CSV file per team (if multiple teams exist)

Excel Formatting Features

Bold formatting for report titles and headers
Proper date formatting (MM/DD/YYYY)
Auto-adjusted column widths (excluding title row)
Consistent formatting across all sheets

Batch Processing Multiple Servers

The script supports batch processing of usage reports from multiple SurveyCTO servers. This feature allows you to process reports from different servers in a single run, with options for organizing outputs.

Setting Up for Batch Processing

Organize your server reports in subdirectories under a root directory:

root_directory/
├── server1/
│   ├── server1_usage_reports_April_4_2025_to_May_1_2025.zip
│   ├── server1_aggregated_report_April_4_2025_to_May_1_2025.csv
│   └── server1_detailed_report_April_4_2025_to_May_1_2025.csv
├── server2/
│   ├── server2_usage_reports_April_4_2025_to_May_1_2025.zip
│   └── (extracted CSV files)
└── myserver/
    ├── myserver_usage_reports_April_4_2025_to_May_1_2025.zip
    └── (extracted CSV files)

Important: Each subdirectory name will be used as the server identifier in output files.

Batch Processing Commands

Basic batch processing

Process all server subdirectories, saving outputs in each subdirectory:

python scto-usage-reports.py /path/to/root --batch

Batch processing with consolidated output

Process all servers but save all outputs in a single directory:

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated

Batch processing with CSV output

python scto-usage-reports.py /path/to/root --batch --exportformat csv

Batch processing with both consolidated output and CSV format

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated --exportformat csv

Batch Processing Output

When using batch processing, output files are prefixed with the server name:

Excel Output (Default)

Individual outputs: Each subdirectory gets servername_scto_usage_reports.xlsx
Consolidated outputs: All files saved to specified directory as servername_scto_usage_reports.xlsx

CSV Output

Individual outputs: Files like servername_aggregated_report_summary.csv in each subdirectory
Consolidated outputs: All server files saved to specified directory with server prefixes
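
The naming rule for the Excel case can be written as a short helper. This is only a sketch: `excel_output_path` is a hypothetical name, and it assumes the server prefix is the subdirectory name, as stated earlier.

```python
from pathlib import Path


def excel_output_path(server_dir, output_dir=None):
    """Return where the workbook for one server would be written."""
    server_dir = Path(server_dir)
    # Consolidated mode writes to output_dir; otherwise to the server's own folder.
    target_dir = Path(output_dir) if output_dir else server_dir
    return target_dir / f"{server_dir.name}_scto_usage_reports.xlsx"


print(excel_output_path("root_directory/server1").as_posix())
# prints: root_directory/server1/server1_scto_usage_reports.xlsx
print(excel_output_path("root_directory/server1", "consolidated_output").as_posix())
# prints: consolidated_output/server1_scto_usage_reports.xlsx
```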

Batch Processing Behavior

The script automatically detects all subdirectories in the root directory
Each subdirectory is processed independently as a separate server
Progress is displayed for each server being processed
Subdirectories without valid report files are skipped with appropriate messages
A summary shows how many servers were successfully processed
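
The discovery and skip logic described above could be sketched as follows. The glob patterns used to decide whether a subdirectory has input files are assumptions, and `find_server_dirs` is a hypothetical helper, not a function from the script.

```python
from pathlib import Path


def find_server_dirs(root):
    """Split the subdirectories of `root` into (with input files, skipped)."""
    with_reports, skipped = [], []
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue  # plain files in the root are ignored
        # Assumed test: any report-like CSV or any zip counts as input.
        has_input = any(child.glob("*_report_*.csv")) or any(child.glob("*.zip"))
        (with_reports if has_input else skipped).append(child.name)
    return with_reports, skipped


def summary_line(processed, total):
    """Format a completion message like the one the script prints."""
    return f"Batch processing complete. Successfully processed {processed} out of {total} servers."
```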

Example Batch Output Structure

With individual outputs (default):

root_directory/
├── server1/
│   ├── (original files)
│   └── server1_scto_usage_reports.xlsx
├── server2/
│   ├── (original files)
│   └── server2_scto_usage_reports.xlsx
└── myserver/
    ├── (original files)
    └── myserver_scto_usage_reports.xlsx

With consolidated output:

consolidated_output/
├── server1_scto_usage_reports.xlsx
├── server2_scto_usage_reports.xlsx
└── myserver_scto_usage_reports.xlsx

Note on Backward Compatibility

All existing single-directory functionality remains unchanged. The script processes a single directory by default and switches to batch mode only when the --batch flag is given.

Cross-Server Aggregate Reporting

When processing multiple servers in batch mode, the script automatically generates an additional cross-server aggregate report that provides statistical comparisons across all processed servers.

When Cross-Server Reports Are Generated

The cross-server aggregate report is automatically created when:

Using --batch mode
Successfully processing 2 or more servers
At least one server contains valid usage report data

Cross-Server Report Contents

The aggregate report (all_server_aggregate_report.xlsx) contains two sheets:

1. Aggregated Report Summary (statistics from all servers)

This sheet provides statistical comparisons across all servers and contains two separate tables:

Table 1: Total Submissions Across All Periods

Shows total submission activity for each server (sum of all reporting periods)
Useful for understanding overall server usage and scale differences
Displays average, maximum, minimum, and standard deviation across all servers

Table 2: Average Submissions Per Period

Shows typical submission activity per reporting period for each server
Useful for understanding normal operational levels and comparing server activity rates
Displays average, maximum, minimum, and standard deviation of per-period averages across all servers

Both tables include metrics for:

Mobile Submissions
Web Submissions
All Submissions (Mobile + Web)
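
The two tables differ only in which per-server value feeds the statistics. The sketch below shows the calculation for one metric using the standard library; the server counts are made up, and the use of the sample standard deviation is an assumption about how the script computes it.

```python
from statistics import mean, stdev

# Hypothetical per-period "all submissions" counts for three servers.
periods_by_server = {
    "server1": [9000, 10000, 11000],
    "server2": [4000, 6000],
    "myserver": [15000],
}


def describe(values):
    """Average, maximum, minimum, and sample standard deviation of `values`."""
    return {
        "average": mean(values),
        "maximum": max(values),
        "minimum": min(values),
        # stdev needs at least two values; sample vs. population is an assumption.
        "std_dev": stdev(values) if len(values) > 1 else 0.0,
    }


# Table 1 input: one total per server (sum over all its periods).
totals = {name: sum(counts) for name, counts in periods_by_server.items()}
# Table 2 input: one per-period average per server.
per_period = {name: mean(counts) for name, counts in periods_by_server.items()}

table_1 = describe(list(totals.values()))
table_2 = describe(list(per_period.values()))
print(table_1)
print(table_2)
```

With these numbers, server1 has a total of 30,000 and a per-period average of 10,000, which reflects its three reporting periods.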

2. Most Recent Data By Server

This sheet provides a comparative view of each server's most recent reporting period:

| Column | Description |
| --- | --- |
| Server name | Server identifier from directory name |
| Period | Date range of most recent report (MM/DD/YYYY - MM/DD/YYYY format) |
| Allocated space (MB) | Storage allocation for most recent period |
| Mobile submissions | Mobile submissions in most recent period |
| Web submissions | Web submissions in most recent period |
| All submissions | Total submissions in most recent period |
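
Picking each server's most recent period amounts to keeping the record with the latest end date. The following sketch uses hypothetical records and field names, and omits the allocated space column for brevity.

```python
from datetime import date

# Hypothetical per-period records; field names and values are illustrative.
records = [
    {"server": "server1", "start": date(2025, 3, 7), "end": date(2025, 4, 3), "mobile": 120, "web": 30},
    {"server": "server1", "start": date(2025, 4, 4), "end": date(2025, 5, 1), "mobile": 200, "web": 50},
    {"server": "server2", "start": date(2025, 4, 4), "end": date(2025, 5, 1), "mobile": 10, "web": 5},
]


def most_recent_by_server(records):
    """Return one row per server, built from its period with the latest end date."""
    latest = {}
    for rec in records:
        current = latest.get(rec["server"])
        if current is None or rec["end"] > current["end"]:
            latest[rec["server"]] = rec
    rows = []
    for server in sorted(latest):
        rec = latest[server]
        rows.append({
            "Server name": server,
            # MM/DD/YYYY - MM/DD/YYYY, as in the table above
            "Period": f"{rec['start']:%m/%d/%Y} - {rec['end']:%m/%d/%Y}",
            "Mobile submissions": rec["mobile"],
            "Web submissions": rec["web"],
            "All submissions": rec["mobile"] + rec["web"],
        })
    return rows


for row in most_recent_by_server(records):
    print(row["Server name"], row["Period"], row["All submissions"])
# prints: server1 04/04/2025 - 05/01/2025 250
# prints: server2 04/04/2025 - 05/01/2025 15
```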

Understanding the Statistics

Why use two tables in the summary?

Total submissions help identify which servers handle the most overall activity
Average per period helps identify which servers are most active in typical operations

Example interpretation:

If "Average Submissions Per Period" shows ~10,000 but "Total Submissions" shows ~30,000, this suggests servers typically have 3 reporting periods
If one server shows much higher per-period averages, it may be handling more active data collection

Cross-Server Report Location

The aggregate report is saved as all_server_aggregate_report.xlsx in:

The --output-dir directory (if specified)
The root directory containing all server subdirectories (if no output directory specified)

Example Output Message

Batch processing complete. Successfully processed 15 out of 16 servers.
Created cross-server aggregate report: /path/to/output/all_server_aggregate_report.xlsx
All outputs saved to: /path/to/output

Troubleshooting

If you encounter any issues:

Missing files: Ensure your CSV files follow the expected naming convention
Header errors: Check that CSV files contain all the required headers
Date parsing errors: Verify that dates in filenames follow the Month_Day_Year format

If the script identifies issues with any files, it will display reasons why they were skipped.
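
To investigate a header error by hand, you can compare a file's first row against the headers you expect. The header names below are placeholders, since the required headers are defined by the script itself; `missing_headers` is a hypothetical helper.

```python
import csv
import io

# Placeholder names; substitute the headers the script actually requires.
REQUIRED_HEADERS = {"team", "mobile_submissions", "web_submissions"}


def missing_headers(csv_text, required=REQUIRED_HEADERS):
    """Return the sorted required headers that are absent from the first CSV row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header row
    present = {cell.strip() for cell in header}
    return sorted(set(required) - present)


print(missing_headers("team,mobile_submissions\nteam_a,12\n"))
# prints: ['web_submissions']
```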

Notes

Files in zip archives are automatically extracted before processing
The script will overwrite existing output files without warning
When multiple teams exist, values are summed per period in the aggregated report
