3 changes: 3 additions & 0 deletions .gitignore
@@ -25,3 +25,6 @@ wled-update.sh
/wled00/Release
/wled00/wled00.ino.cpp
/wled00/html_*.h

# Temporary fork statistics results
tempresults.json
201 changes: 201 additions & 0 deletions tools/README_fork_stats.md
@@ -0,0 +1,201 @@
# Fork Statistics Analysis Tool

This tool analyzes GitHub repository forks to provide insights into fork activity and health for the WLED project.

## Features

The script analyzes and reports on:

- **Branch Analysis**: Which forks have branches that do not exist in the main repo
- **Recency Analysis**: Which forks track a recent version of main and which are outdated
- **Contribution Analysis**: Which fork repos have been the source of PRs into the main repo
- **Activity Detection**: Which forks have active development but haven't contributed PRs
- **Owner Commit Analysis**: Statistics about commits made by fork owners to their own repositories
- **Age Statistics**: Distribution of how far behind forks are (1 month, 3 months, 6 months, 1 year, 2+ years)
- **Incremental Saving**: Automatically saves intermediate results every 10 forks to prevent data loss
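The age buckets listed above can be illustrated with a minimal sketch. It assumes a fork's age is measured from its `pushed_at` timestamp (a field the GitHub API returns for every repository) and approximates months as 30 days and years as 365 days. The function name `age_bucket` and the exact cutoffs are hypothetical, not taken from `fork_stats.py`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoffs mirroring the age buckets listed above.
# Months are approximated as 30 days and years as 365 days.
BUCKETS = [
    (timedelta(days=30), "≤ 1 month"),
    (timedelta(days=90), "≤ 3 months"),
    (timedelta(days=180), "≤ 6 months"),
    (timedelta(days=365), "≤ 1 year"),
    (timedelta(days=730), "≤ 2 years"),
]


def age_bucket(pushed_at: str, now: datetime) -> str:
    """Classify a fork by the time elapsed since its last push.

    `pushed_at` is an ISO 8601 timestamp such as "2024-01-15T12:00:00Z",
    the format GitHub uses for repository timestamps.
    """
    # fromisoformat() in older Pythons does not accept "Z", so rewrite it.
    last_push = datetime.fromisoformat(pushed_at.replace("Z", "+00:00"))
    age = now - last_push
    for limit, label in BUCKETS:
        if age <= limit:
            return label
    return "> 2 years"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(age_bucket("2024-12-20T00:00:00Z", now))  # ≤ 1 month
print(age_bucket("2021-06-01T00:00:00Z", now))  # > 2 years
```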

## Requirements

- Python 3.7+
- `requests` library (included in WLED requirements.txt)
- GitHub personal access token (recommended for analyzing large numbers of forks)

## Usage

### Quick Demo

To see what the output looks like with sample data:

```bash
python3 tools/fork_stats.py --demo
```

### Basic Analysis (Rate Limited)

Analyze the first 10 forks without a token (this uses GitHub's unauthenticated API, which is limited to 60 requests per hour):

```bash
python3 tools/fork_stats.py --max-forks 10
```

### Full Analysis with Token

For comprehensive analysis, create a GitHub personal access token:

1. Go to GitHub Settings > Developer settings > Personal access tokens > Tokens (classic)
2. Generate a new token with `public_repo` scope
3. Set the token as an environment variable:

```bash
export GITHUB_TOKEN="your_token_here"
python3 tools/fork_stats.py
```

Or pass it directly:

```bash
python3 tools/fork_stats.py --token "your_token_here"
```
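One plausible way for a script to pick up the token is sketched below. It assumes an explicit `--token` value takes precedence over the `GITHUB_TOKEN` environment variable, which this README does not state. `resolve_token` and `build_headers` are hypothetical helpers; the `Authorization: Bearer` header is how GitHub's REST API accepts a personal access token.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_token(cli_token: Optional[str],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token to use, preferring the command-line value (an assumption)."""
    if cli_token:
        return cli_token
    # An empty GITHUB_TOKEN is treated the same as an unset one.
    return env.get("GITHUB_TOKEN") or None


def build_headers(token: Optional[str]) -> Dict[str, str]:
    """Build request headers; without a token, requests are unauthenticated."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


print(build_headers(resolve_token(None, {"GITHUB_TOKEN": "abc"})))
```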

### Advanced Options

```bash
# Analyze specific repository
python3 tools/fork_stats.py --repo owner/repo

# Limit number of forks analyzed
python3 tools/fork_stats.py --max-forks 50

# Fast mode: skip detailed analysis of very old forks for better performance
python3 tools/fork_stats.py --fast --max-forks 100

# Save detailed JSON results
python3 tools/fork_stats.py --output results.json

# Check what would be analyzed without making API calls
python3 tools/fork_stats.py --dry-run

# Emit JSON instead of the default summary
python3 tools/fork_stats.py --format json
```
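The options above map naturally onto a standard `argparse` parser. The sketch below is a hypothetical reconstruction for illustration only: the flag names come from this README, but the defaults (`wled/WLED` as the repository, `summary` as the format name) and the argument types are assumptions, and `fork_stats.py` may define them differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Flag names are taken from this README; defaults and types are assumptions.
    parser = argparse.ArgumentParser(description="Analyze GitHub repository forks")
    parser.add_argument("--repo", default="wled/WLED", help="repository as owner/repo")
    parser.add_argument("--token", help="GitHub personal access token")
    parser.add_argument("--max-forks", type=int, help="limit the number of forks analyzed")
    parser.add_argument("--fast", action="store_true",
                        help="skip detailed analysis of very old forks")
    parser.add_argument("--output", help="path for detailed JSON results")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be analyzed without making API calls")
    parser.add_argument("--format", choices=["summary", "json"], default="summary")
    parser.add_argument("--demo", action="store_true", help="print sample output")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse turns "--max-forks" into the attribute "max_forks".
    return build_parser().parse_args(argv)


args = parse(["--fast", "--max-forks", "100"])
print(args.repo, args.max_forks, args.fast)
```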

## Output

### Summary Format (Default)

The tool provides a human-readable summary including:

- Repository statistics (total forks, stars, watchers)
- Fork age distribution showing staleness
- Activity analysis showing contribution patterns
- Key insights about fork health

### JSON Format

Detailed machine-readable output including:

- Complete fork metadata for each analyzed fork
- Branch information and unique branches
- Contribution history and activity metrics
- Owner commit statistics for each fork
- Full statistical breakdown

Intermediate results are also saved automatically to `tempresults.json` every 10 forks, so an interrupted run does not lose its progress.
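The incremental save can be sketched as a checkpoint written every N forks. This is an illustration, not the script's code: it assumes results are kept as a list of per-fork dicts, writes to a temporary file and then renames it so an interruption mid-write cannot leave a truncated `tempresults.json`, and uses the hypothetical names `save_checkpoint` and `analyze_all`.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, List

CHECKPOINT_EVERY = 10  # matches the "every 10 forks" interval described above


def save_checkpoint(results: List[Dict[str, Any]], path: str) -> None:
    """Write results by dumping to a temp file, then replacing the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(results, handle, indent=2)
    os.replace(tmp_path, path)  # the rename is atomic on POSIX file systems


def analyze_all(forks: Iterable[str],
                analyze: Callable[[str], Dict[str, Any]],
                path: str = "tempresults.json") -> List[Dict[str, Any]]:
    """Analyze each fork, checkpointing every CHECKPOINT_EVERY forks."""
    results: List[Dict[str, Any]] = []
    for count, fork in enumerate(forks, start=1):
        results.append(analyze(fork))
        if count % CHECKPOINT_EVERY == 0:
            save_checkpoint(results, path)
    save_checkpoint(results, path)  # final save covers any remainder
    return results
```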

### Visualization

For advanced visualization and analysis of the JSON results, use the companion visualizer tool:

```bash
# Generate visualizations from collected data
python3 tools/fork_stats_visualizer.py results.json --save-plots

# Text-only statistics (no graphs)
python3 tools/fork_stats_visualizer.py results.json --no-graphs
```

See [README_fork_stats_visualizer.md](README_fork_stats_visualizer.md) for complete documentation.

## Performance Considerations

### Execution Speed
- **Without Token**: 60 requests/hour (very slow, only suitable for testing)
- **With Token**: 5000 requests/hour (much faster, recommended for real analysis)
- **Each fork requires 3-8 API requests** depending on the fork's complexity
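These figures give a rough upper bound on throughput. The sketch below simply divides the hourly request budget by the per-fork request cost; it ignores request latency and any requests spent on repository-level metadata, so real runs will be slower. The helper names are illustrative.

```python
def forks_per_hour(requests_per_hour: int, requests_per_fork: int) -> int:
    """Upper bound on forks analyzed per hour for a given request budget."""
    return requests_per_hour // requests_per_fork


def hours_needed(num_forks: int, requests_per_hour: int, requests_per_fork: int) -> float:
    """Hours of rate-limit budget needed to analyze num_forks forks."""
    return num_forks * requests_per_fork / requests_per_hour


for label, budget in [("without token", 60), ("with token", 5000)]:
    # 8 requests per fork is the worst case, 3 the best case.
    print(f"{label}: {forks_per_hour(budget, 8)}-{forks_per_hour(budget, 3)} forks/hour")
# without token: 7-20 forks/hour
# with token: 625-1666 forks/hour

# Worst case for 1,243 forks with a token: about 2 hours of budget.
print(f"{hours_needed(1243, 5000, 8):.1f}")  # 2.0
```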

### Fast Mode
Use `--fast` flag to improve performance:
- Skips detailed analysis of forks inactive for 3+ years
- Reduces API calls for very old forks by ~80%
- Maintains statistical accuracy while dramatically improving speed
- Recommended for initial analysis or large repository scans
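The fast-mode rule can be expressed as a simple predicate on a fork's last push time. This is a minimal sketch: the 3-year threshold comes from the text above, while approximating a year as 365 days and the name `should_skip_details` are assumptions.

```python
from datetime import datetime, timedelta, timezone

# "Inactive for 3+ years", with 365-day years assumed.
FAST_MODE_CUTOFF = timedelta(days=3 * 365)


def should_skip_details(pushed_at: datetime, now: datetime, fast: bool) -> bool:
    """Return True when fast mode is on and the fork has been inactive 3+ years."""
    return fast and (now - pushed_at) >= FAST_MODE_CUTOFF


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
old = datetime(2020, 5, 1, tzinfo=timezone.utc)
print(should_skip_details(old, now, fast=True))   # True
print(should_skip_details(old, now, fast=False))  # False
```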

### Progress Tracking
The tool provides detailed progress information including:
- Current fork being analyzed
- Time taken per fork analysis
- API requests made and remaining rate limit
- Estimated completion time
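An estimated completion time can be derived from the average time per fork observed so far, extended when the remaining rate-limit budget would run out first. The sketch below is an assumption about how such an estimate might work, not the tool's formula; `estimate_remaining_seconds` and its parameters are hypothetical.

```python
from typing import Sequence


def estimate_remaining_seconds(durations: Sequence[float], forks_left: int,
                               requests_left: int, requests_per_fork: int,
                               seconds_until_reset: float) -> float:
    """Estimate seconds until all remaining forks are analyzed.

    Uses the mean per-fork time observed so far. If the remaining rate-limit
    budget cannot cover the remaining forks, adds one wait until the reset
    (assuming a single reset restores enough budget).
    """
    if forks_left <= 0:
        return 0.0
    average = sum(durations) / len(durations) if durations else 0.0
    estimate = average * forks_left
    if forks_left * requests_per_fork > requests_left:
        estimate += seconds_until_reset
    return estimate


print(estimate_remaining_seconds([2.0, 3.0, 4.0], forks_left=10,
                                 requests_left=500, requests_per_fork=5,
                                 seconds_until_reset=1200))  # 30.0
```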

## Example Output

```
============================================================
FORK ANALYSIS SUMMARY FOR wled/WLED
============================================================

Repository Details:
- Total Forks: 1,243
- Analyzed: 100
- Stars: 15,500
- Watchers: 326

Fork Age Distribution:
- Last updated ≤ 1 month: 8 ( 8.0%)
- Last updated ≤ 3 months: 12 ( 12.0%)
- Last updated ≤ 6 months: 15 ( 15.0%)
- Last updated ≤ 1 year: 23 ( 23.0%)
- Last updated ≤ 2 years: 25 ( 25.0%)
- Last updated > 2 years: 17 ( 17.0%)

Fork Activity Analysis:
- Forks with unique branches: 34 (34.0%)
- Forks with recent main branch: 42 (42.0%)
- Forks that contributed PRs: 18 (18.0%)
- Active forks (no PR contributions): 23 (23.0%)

Owner Commit Analysis:
- Forks with owner commits: 67 (67.0%)
- Total commits by fork owners: 2845
- Average commits per fork: 28.5

Key Insights:
- Most forks are significantly behind main branch
- Significant number of forks have custom development
- Majority of forks show some owner development activity
```

## Use Cases

- **Project Maintenance**: Identify which forks are actively maintained
- **Community Engagement**: Find potential contributors who haven't submitted PRs
- **Code Discovery**: Locate interesting custom features in fork branches
- **Health Assessment**: Monitor overall ecosystem health of the project
- **Outreach Planning**: Target active fork maintainers for collaboration

## Implementation Details

The script uses the GitHub REST API v3 and implements:

- Rate limiting with automatic backoff
- Error handling for private/deleted repositories
- Efficient pagination for large fork lists
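- Branch comparison against the main repository to find fork-only branches
- PR attribution, linking pull requests in the main repository to the forks they came from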
- Commit recency detection
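Two of these analyses reduce to simple set and counting operations once the data has been fetched. The sketch below works on already-decoded JSON and makes no API calls. It assumes branch lists shaped like the response of `GET /repos/{owner}/{repo}/branches` and pull requests shaped like `GET /repos/{owner}/{repo}/pulls?state=all`, where `head.repo.full_name` names the repository a PR was opened from (and `head.repo` is `null` when that fork has been deleted). The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Set


def unique_branches(fork_branches: Iterable[Dict[str, Any]],
                    upstream_branches: Iterable[Dict[str, Any]]) -> Set[str]:
    """Branch names present in the fork but not upstream (compared by name only)."""
    upstream = {b["name"] for b in upstream_branches}
    return {b["name"] for b in fork_branches} - upstream


def prs_by_source_repo(pulls: Iterable[Dict[str, Any]]) -> Counter:
    """Count upstream pull requests per source repository (head.repo.full_name)."""
    counts: Counter = Counter()
    for pr in pulls:
        repo = (pr.get("head") or {}).get("repo")
        if repo is None:  # the source fork no longer exists
            continue
        counts[repo["full_name"]] += 1
    return counts


upstream = [{"name": "main"}, {"name": "dev"}]
fork = [{"name": "main"}, {"name": "my-effect"}]
print(unique_branches(fork, upstream))  # {'my-effect'}

pulls = [
    {"head": {"repo": {"full_name": "alice/WLED"}}},
    {"head": {"repo": {"full_name": "alice/WLED"}}},
    {"head": {"repo": None}},
]
print(prs_by_source_repo(pulls))  # Counter({'alice/WLED': 2})
```

Comparing by name only means a fork's `main` is never reported as unique even if its contents differ; detecting that requires a commit comparison. PRs opened from branches of the main repository itself will also be counted under its own name and would need to be filtered out.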

## Troubleshooting

- **Rate Limit Errors**: Use a GitHub token or reduce `--max-forks`
- **Permission Errors**: Ensure token has `public_repo` scope
- **Network Errors**: Check internet connection and GitHub status
- **Large Repository Timeouts**: Use `--max-forks` to limit analysis scope
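When a rate-limit error does occur, the usual remedy is to wait until the limit resets. GitHub reports the remaining budget and the reset time (in UTC epoch seconds) in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. The sketch below computes how long to sleep from those headers; it is one possible backoff policy, not necessarily the one `fork_stats.py` uses, and `seconds_to_wait` is a hypothetical name.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None,
                    margin: float = 1.0) -> float:
    """Return how long to sleep before the next request (0 if budget remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # A small margin guards against clock skew between client and GitHub.
    return max(0.0, reset_at - current + margin)
```

With `requests`, `response.headers` is a case-insensitive mapping, so it can be passed directly, for example `time.sleep(seconds_to_wait(response.headers))`.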