surveycto/scto-usage-reports

SurveyCTO Usage Report Generator

A Python utility for processing and analyzing SurveyCTO server usage reports across multiple reporting periods.

Overview

This script processes SurveyCTO server usage reports, merges data across time periods, and generates statistical summaries. It handles both aggregated (team-level) and detailed (form-level) reports, automatically extracting data from usage reports available from the SurveyCTO server in batches.

Features

  • Automatically extracts zip files containing usage reports (archives smaller than 100 KB)
  • Generates four types of summary reports:
    1. Aggregated report summary - Shows total mobile and web submissions across all periods
    2. Detailed report summary - Provides form-level statistics from all detailed reports
    3. Aggregated report by period - Shows period-by-period breakdown of submission data
    4. Team-specific reports - Creates separate reports for each team when multiple teams exist
  • Choose between a consolidated Excel report or separate CSV files

Requirements

  • Python 3.6+
  • Required packages:
    • pandas
    • openpyxl (for Excel output)

Installation

Setting Up a Virtual Environment

It's recommended to run this script in a virtual environment to manage dependencies properly.

Creating a Virtual Environment

  1. Navigate to your project directory:

    cd /path/to/project
  2. Create a virtual environment:

    python -m venv venv-scto-usage-reports

Activating the Virtual Environment

On Windows:

venv-scto-usage-reports\Scripts\activate

On macOS and Linux:

source venv-scto-usage-reports/bin/activate

Installing Dependencies

With the virtual environment activated, install the required packages:

pip install pandas openpyxl

Usage

Basic usage (processes files in current directory and outputs as Excel):

python scto-usage-reports.py

Process files in a specific directory:

python scto-usage-reports.py /path/to/reports

Output as CSV files instead of Excel:

python scto-usage-reports.py --exportformat csv

Process multiple servers in batch mode:

python scto-usage-reports.py /path/to/root --batch

Process multiple servers with consolidated output:

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated
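
For readers who want to wrap or script these commands, the flags above could be parsed roughly as follows. This is only a sketch built from the commands shown in this README, not the script's actual source: the function name `build_parser` is hypothetical, and `excel` as the name of the default format value is an assumption (only `csv` appears above).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the Usage commands."""
    parser = argparse.ArgumentParser(prog="scto-usage-reports.py")
    # Optional positional directory; the README says the default is the current directory.
    parser.add_argument("directory", nargs="?", default=".")
    # "csv" is documented; the value name "excel" for the default is an assumption.
    parser.add_argument("--exportformat", choices=["excel", "csv"], default="excel")
    # Treat each subdirectory of `directory` as a separate server.
    parser.add_argument("--batch", action="store_true")
    # Optional single directory that receives all outputs.
    parser.add_argument("--output-dir", default=None)
    return parser


args = build_parser().parse_args(["/path/to/root", "--batch", "--exportformat", "csv"])
print(args.directory, args.batch, args.exportformat, args.output_dir)
```

Note that argparse exposes `--output-dir` as `args.output_dir`.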

Input File Requirements

The script processes CSV files with the following naming pattern:

servername_aggregated_report_Month_Day_Year_to_Month_Day_Year.csv
servername_detailed_report_Month_Day_Year_to_Month_Day_Year.csv

Example: servername_aggregated_report_April_4_2025_to_May_1_2025.csv
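
To check whether a file matches this convention before running the script, a filename can be tested with a regular expression along these lines. This is a sketch under assumptions: the pattern is inferred from the example above, `parse_report_name` is a hypothetical helper, and month names are parsed with `%B`, which expects English month names in the default locale.

```python
import re
from datetime import datetime

# Inferred pattern: <server>_<aggregated|detailed>_report_<Month>_<Day>_<Year>_to_<Month>_<Day>_<Year>.csv
REPORT_NAME = re.compile(
    r"^(?P<server>.+)_(?P<kind>aggregated|detailed)_report_"
    r"(?P<start>[A-Za-z]+_\d{1,2}_\d{4})_to_(?P<end>[A-Za-z]+_\d{1,2}_\d{4})\.csv$"
)


def parse_report_name(filename):
    """Return server, report kind, and period dates, or None if the name does not match."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    try:
        start = datetime.strptime(match["start"], "%B_%d_%Y").date()
        end = datetime.strptime(match["end"], "%B_%d_%Y").date()
    except ValueError:
        # Name has the right shape but is not a real date (e.g. an unknown month name).
        return None
    return {"server": match["server"], "kind": match["kind"], "start": start, "end": end}


info = parse_report_name("servername_aggregated_report_April_4_2025_to_May_1_2025.csv")
print(info["server"], info["kind"], info["start"].isoformat(), info["end"].isoformat())
```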

It also automatically extracts zip files (regardless of filename) that contain properly formatted CSV reports.
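
The extraction step could look roughly like the sketch below. It is an illustration, not the script's own code: `extract_report_zips` and `MAX_ZIP_BYTES` are hypothetical names, the 100 KB threshold is taken from the Features list and read here as "skip archives of 100 KB or more", and only `.csv` members are written out.

```python
import zipfile
from pathlib import Path

# Assumed reading of the "less than 100KB" limit mentioned under Features.
MAX_ZIP_BYTES = 100 * 1024


def extract_report_zips(directory):
    """Extract CSV members from every small zip file in `directory`."""
    directory = Path(directory)
    extracted = []
    for zip_path in sorted(directory.glob("*.zip")):
        if zip_path.stat().st_size >= MAX_ZIP_BYTES:
            continue  # too large to be a usage report archive under this assumption
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.lower().endswith(".csv"):
                    # Keep only the base name so nothing is written outside `directory`.
                    target = directory / Path(member).name
                    target.write_bytes(archive.read(member))
                    extracted.append(target)
    return extracted
```

Calling `extract_report_zips(".")` returns the list of CSV paths that were written, which can then be checked against the naming pattern above.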

Output

Excel Output (Default)

A single workbook named scto_usage_reports.xlsx containing these sheets:

  1. aggregated_report_summary - Shows:

    • Date range of all processed reports
    • Total mobile submissions across all periods
    • Total web submissions across all periods
    • Total of all submissions (mobile + web)
  2. detailed_report_summary - Shows:

    • Date range of all processed reports
    • Form-level data from the most recent report
    • Sum of mobile and web submissions for each form across all periods
  3. aggregated_report_by_period - Shows:

    • Date range of all processed reports
    • Period-by-period breakdown of submission data
    • Columns for period start/end dates, allocated space, and submission counts
  4. Team-specific sheets - Shows:

    • One sheet per team (when multiple teams exist)
    • Period-by-period submission data for each team
    • Team sheets are named following the pattern "team_teamname"
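
As a rough illustration of what the aggregated_report_summary sheet contains, the totals can be thought of as a sum over per-period rows. The sketch below uses plain dictionaries rather than the script's pandas code, and the keys (`start`, `end`, `mobile`, `web`) and the function name are hypothetical, not the actual column names in the reports.

```python
from datetime import date


def summarize_periods(periods):
    """Collapse per-period rows into one summary covering the whole date range."""
    mobile = sum(p["mobile"] for p in periods)
    web = sum(p["web"] for p in periods)
    return {
        "start": min(p["start"] for p in periods),  # earliest period start
        "end": max(p["end"] for p in periods),      # latest period end
        "mobile": mobile,
        "web": web,
        "all": mobile + web,
    }


periods = [
    {"start": date(2025, 3, 7), "end": date(2025, 4, 3), "mobile": 120, "web": 30},
    {"start": date(2025, 4, 4), "end": date(2025, 5, 1), "mobile": 200, "web": 50},
]
summary = summarize_periods(periods)
print(summary["mobile"], summary["web"], summary["all"])
```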

CSV Output

Multiple CSV files with the same content as the Excel sheets:

  • aggregated_report_summary.csv
  • detailed_report_summary.csv
  • aggregated_report_by_period.csv
  • One CSV file per team (if multiple teams exist)

Excel Formatting Features

  • Bold formatting for report titles and headers
  • Proper date formatting (MM/DD/YYYY)
  • Auto-adjusted column widths (excluding title row)
  • Consistent formatting across all sheets

Batch Processing Multiple Servers

The script supports batch processing of usage reports from multiple SurveyCTO servers. This feature allows you to process reports from different servers in a single run, with options for organizing outputs.

Setting Up for Batch Processing

Organize your server reports in subdirectories under a root directory:

root_directory/
├── server1/
│   ├── server1_usage_reports_April_4_2025_to_May_1_2025.zip
│   ├── server1_aggregated_report_April_4_2025_to_May_1_2025.csv
│   └── server1_detailed_report_April_4_2025_to_May_1_2025.csv
├── server2/
│   ├── server2_usage_reports_April_4_2025_to_May_1_2025.zip
│   └── (extracted CSV files)
└── myserver/
    ├── myserver_usage_reports_April_4_2025_to_May_1_2025.zip
    └── (extracted CSV files)

Important: Each subdirectory name will be used as the server identifier in output files.

Batch Processing Commands

Basic batch processing

Process all server subdirectories, saving outputs in each subdirectory:

python scto-usage-reports.py /path/to/root --batch

Batch processing with consolidated output

Process all servers but save all outputs in a single directory:

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated

Batch processing with CSV output

python scto-usage-reports.py /path/to/root --batch --exportformat csv

Batch processing with both consolidated output and CSV format

python scto-usage-reports.py /path/to/root --batch --output-dir /path/to/consolidated --exportformat csv

Batch Processing Output

When using batch processing, output files are prefixed with the server name:

Excel Output (Default)

  • Individual outputs: Each subdirectory gets servername_scto_usage_reports.xlsx
  • Consolidated outputs: All files saved to specified directory as servername_scto_usage_reports.xlsx

CSV Output

  • Individual outputs: Files like servername_aggregated_report_summary.csv in each subdirectory
  • Consolidated outputs: All server files saved to specified directory with server prefixes
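
The naming rule for the Excel output can be sketched as a small path helper. The function name `excel_output_path` is hypothetical; the file name pattern and the choice between the server subdirectory and the `--output-dir` directory follow the description above.

```python
from pathlib import Path


def excel_output_path(server_dir, output_dir=None):
    """Build the workbook path for one server in batch mode."""
    server_dir = Path(server_dir)
    # Consolidated mode writes to output_dir; otherwise the server's own subdirectory is used.
    target_dir = Path(output_dir) if output_dir else server_dir
    # The subdirectory name doubles as the server identifier.
    return target_dir / f"{server_dir.name}_scto_usage_reports.xlsx"


individual = excel_output_path("root_directory/server1")
consolidated = excel_output_path("root_directory/server1", "consolidated_output")
print(individual.name, consolidated.parent.name)
```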

Batch Processing Behavior

  • The script automatically detects all subdirectories in the root directory
  • Each subdirectory is processed independently as a separate server
  • Progress is displayed for each server being processed
  • Subdirectories without valid report files are skipped with appropriate messages
  • A summary shows how many servers were successfully processed
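
The behavior listed above can be pictured as a loop over subdirectories. This is a simplified sketch, not the script's implementation: the function names are hypothetical, the actual report processing is replaced by a placeholder, and a directory counts as valid here if it already contains extracted report CSV files (zip extraction is assumed to have run first).

```python
from pathlib import Path


def find_server_dirs(root):
    """Every immediate subdirectory of `root` is treated as one server."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


def has_reports(server_dir):
    """True if the directory holds at least one *_report_*.csv file."""
    return any(server_dir.glob("*_report_*.csv"))


def run_batch(root):
    servers = find_server_dirs(root)
    processed = []
    for index, server_dir in enumerate(servers, start=1):
        print(f"[{index}/{len(servers)}] Processing {server_dir.name}")
        if not has_reports(server_dir):
            print(f"  Skipping {server_dir.name}: no report CSV files found")
            continue
        # Placeholder: the real script would merge and summarize the reports here.
        processed.append(server_dir.name)
    print(f"Successfully processed {len(processed)} out of {len(servers)} servers.")
    return processed
```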

Example Batch Output Structure

With individual outputs (default):

root_directory/
├── server1/
│   ├── (original files)
│   └── server1_scto_usage_reports.xlsx
├── server2/
│   ├── (original files)
│   └── server2_scto_usage_reports.xlsx
└── myserver/
    ├── (original files)
    └── myserver_scto_usage_reports.xlsx

With consolidated output:

consolidated_output/
├── server1_scto_usage_reports.xlsx
├── server2_scto_usage_reports.xlsx
└── myserver_scto_usage_reports.xlsx

Note on Backward Compatibility

All existing single-directory functionality remains unchanged. The script automatically detects whether you're processing a single directory or using batch mode based on the --batch flag.

Cross-Server Aggregate Reporting

When processing multiple servers in batch mode, the script automatically generates an additional cross-server aggregate report that provides statistical comparisons across all processed servers.

When Cross-Server Reports Are Generated

The cross-server aggregate report is automatically created when:

  • Using --batch mode
  • Successfully processing 2 or more servers
  • At least one server contains valid usage report data

Cross-Server Report Contents

The aggregate report (all_server_aggregate_report.xlsx) contains two sheets:

1. Aggregated Report Summary (statistics from all servers)

This sheet provides statistical comparisons across all servers and contains two separate tables:

Table 1: Total Submissions Across All Periods

  • Shows total submission activity for each server (sum of all reporting periods)
  • Useful for understanding overall server usage and scale differences
  • Displays average, maximum, minimum, and standard deviation across all servers

Table 2: Average Submissions Per Period

  • Shows typical submission activity per reporting period for each server
  • Useful for understanding normal operational levels and comparing server activity rates
  • Displays average, maximum, minimum, and standard deviation of per-period averages across all servers

Both tables include metrics for:

  • Mobile Submissions
  • Web Submissions
  • All Submissions (Mobile + Web)
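
The difference between the two tables can be illustrated with a small calculation. The data and names below are made up for illustration, and the sketch assumes the sample standard deviation (the pandas default); this README does not say which variant the script uses.

```python
from statistics import mean, stdev


def describe(values):
    """Average, maximum, minimum, and sample standard deviation across servers."""
    return {
        "average": mean(values),
        "maximum": max(values),
        "minimum": min(values),
        "std_dev": stdev(values),  # assumption: sample standard deviation (n - 1)
    }


# Hypothetical "All Submissions" counts per reporting period for each server.
per_server_periods = {"server1": [100, 200, 300], "server2": [50, 50]}

# Table 1: one total per server, then statistics across servers.
totals = {name: sum(counts) for name, counts in per_server_periods.items()}
table1 = describe(list(totals.values()))

# Table 2: one per-period average per server, then statistics across servers.
averages = {name: mean(counts) for name, counts in per_server_periods.items()}
table2 = describe(list(averages.values()))

print(table1["average"], table1["maximum"], table1["minimum"])  # 350 600 100
print(table2["average"], table2["maximum"], table2["minimum"])  # 125 200 50
```

In this made-up data, server1 dominates both tables, but the gap is smaller in Table 2 because its total is spread over three periods.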

2. Most Recent Data By Server

This sheet provides a comparative view of each server's most recent reporting period:

Column                 Description
Server name            Server identifier from directory name
Period                 Date range of most recent report (MM/DD/YYYY - MM/DD/YYYY format)
Allocated space (MB)   Storage allocation for most recent period
Mobile submissions     Mobile submissions in most recent period
Web submissions        Web submissions in most recent period
All submissions        Total submissions in most recent period
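
Selecting a server's most recent period and formatting the Period value could be done as in the sketch below. The dictionary keys and the function name are hypothetical, and "most recent" is assumed to mean the period with the latest end date.

```python
from datetime import date


def most_recent_period(periods):
    """Return the latest period and its label in MM/DD/YYYY - MM/DD/YYYY format."""
    latest = max(periods, key=lambda p: p["end"])
    label = f"{latest['start']:%m/%d/%Y} - {latest['end']:%m/%d/%Y}"
    return label, latest


periods = [
    {"start": date(2025, 3, 7), "end": date(2025, 4, 3), "mobile": 120, "web": 30},
    {"start": date(2025, 4, 4), "end": date(2025, 5, 1), "mobile": 200, "web": 50},
]
label, latest = most_recent_period(periods)
print(label)  # 04/04/2025 - 05/01/2025
```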

Understanding the Statistics

Why use two tables in the summary?

  • Total submissions help identify which servers handle the most overall activity
  • Average submissions per period helps identify which servers are most active in typical operations

Example interpretation:

  • If "Average Submissions Per Period" shows ~10,000 but "Total Submissions" shows ~30,000, this suggests servers typically have 3 reporting periods
  • If one server shows much higher per-period averages, it may be handling more active data collection

Cross-Server Report Location

The aggregate report is saved as all_server_aggregate_report.xlsx in:

  • The --output-dir directory (if specified)
  • The root directory containing all server subdirectories (if no output directory specified)

Example Output Message

Batch processing complete. Successfully processed 15 out of 16 servers.
Created cross-server aggregate report: /path/to/output/all_server_aggregate_report.xlsx
All outputs saved to: /path/to/output

Troubleshooting

If you encounter any issues:

  1. Missing files: Ensure your CSV files follow the expected naming convention
  2. Header errors: Check that CSV files contain all the required headers
  3. Date parsing errors: Verify that dates in filenames follow the Month_Day_Year format

If the script identifies issues with any files, it will display reasons why they were skipped.
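
To investigate a header error by hand, a quick check like the following can list which expected columns are absent from a file. This README does not list the required headers, so the header names in the example are placeholders and `missing_headers` is a hypothetical helper.

```python
import csv
import io


def missing_headers(csv_text, required):
    """Return the required header names that do not appear in the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty file: treat as having no headers
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Placeholder header names; substitute the headers your reports actually use.
required = ["Team", "Mobile submissions", "Web submissions"]
sample = "Team,Mobile submissions\nteam_a,120\n"
print(missing_headers(sample, required))  # ['Web submissions']
```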

Notes

  • Files in zip archives are automatically extracted before processing
  • The script will overwrite existing output files without warning
  • When multiple teams exist, values are summed per period in the aggregated report
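
The last note, summing team values per period, amounts to a group-by on the period dates. The sketch below uses the standard library instead of pandas, and its row keys and function name are hypothetical.

```python
from collections import defaultdict


def sum_teams_by_period(rows):
    """Add up mobile and web submissions for all teams within each period."""
    totals = defaultdict(lambda: {"mobile": 0, "web": 0})
    for row in rows:
        key = (row["period_start"], row["period_end"])
        totals[key]["mobile"] += row["mobile"]
        totals[key]["web"] += row["web"]
    return dict(totals)


rows = [
    {"team": "a", "period_start": "2025-04-04", "period_end": "2025-05-01", "mobile": 10, "web": 1},
    {"team": "b", "period_start": "2025-04-04", "period_end": "2025-05-01", "mobile": 5, "web": 2},
    {"team": "a", "period_start": "2025-05-02", "period_end": "2025-05-29", "mobile": 7, "web": 0},
]
by_period = sum_teams_by_period(rows)
print(by_period[("2025-04-04", "2025-05-01")])  # {'mobile': 15, 'web': 3}
```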
