6 changes: 6 additions & 0 deletions README.md
@@ -114,6 +114,12 @@ sudo systemctl start connectivity-monitor@YOUR_USER
sudo systemctl status connectivity-monitor@YOUR_USER
```

For 24/7 Raspberry Pi operation, the Python directory also includes:

- `python/systemd/connectivity-monitor-healthcheck@.service|.timer` (scheduled API health probe)
- `python/systemd/connectivity-monitor-archive@.service|.timer` (daily archive/prune of logs/reports)
- `python/ops/recover_service.sh` (one-command recovery + validation)

> 📖 Full details: [python/README.md](python/README.md)

---
127 changes: 127 additions & 0 deletions python/README.md
@@ -102,6 +102,16 @@ sudo systemctl status connectivity-monitor@YOUR_USER
sudo journalctl -u connectivity-monitor@YOUR_USER -f
```

### Hardened 24/7 service behavior

The included `connectivity-monitor@.service` is hardened for always-on usage:

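- Restarts automatically (`Restart=always`, `RestartSec=5`). If it is started more than 10 times within 5 minutes (`StartLimitBurst=10`, `StartLimitIntervalSec=300`), systemd stops retrying and leaves the unit failed.
- Waits for `network-online.target` on boot
- Applies 30-second start and stop timeouts (`TimeoutStartSec`, `TimeoutStopSec`)
- Raises the open-file limit (`LimitNOFILE=65536`) and caps the task count (`TasksMax=512`) for long-running operation
- Blocks privilege escalation (`NoNewPrivileges=true`) and uses a private `/tmp`. It mounts `/usr`, `/boot`, and `/etc` read-only (`ProtectSystem=full`) and explicitly lists `~/ConnectivityMonitor` in `ReadWritePaths`.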
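
You can check whether `Restart=always` has actually fired by reading systemd's `NRestarts` counter through `systemctl show`. The sketch below assumes a systemd version that exposes this property. `restart_count` and its injectable `run` parameter are illustrative helpers, not part of this repository.

```python
import subprocess


def restart_count(unit, run=subprocess.run):
    """Return systemd's NRestarts counter for `unit`, or None if unavailable.

    `systemctl show --property=NRestarts --value` prints a bare integer on
    systemd versions that track automatic restarts. `run` is injectable so the
    parsing can be exercised without systemd (hypothetical helper).
    """
    proc = run(
        ["systemctl", "show", "--property=NRestarts", "--value", unit],
        capture_output=True,
        text=True,
    )
    value = proc.stdout.strip()
    # Older systemd prints nothing for unknown properties; treat that as None.
    return int(value) if proc.returncode == 0 and value.isdigit() else None
```

Here is how to use it during the soak test. Read `restart_count("connectivity-monitor@YOUR_USER.service")`, kill the monitor process, wait longer than `RestartSec`, and read it again. The counter should have increased.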

## Raspberry Pi Setup

The Python version works natively on Raspberry Pi:
@@ -118,6 +128,123 @@ python3 -m connectivity_monitor --headless --web-port 8080

For always-on monitoring, set it up as a systemd service (see above). Then access the dashboard from any device on your network at `http://<pi-ip>:8080`.

## Raspberry Pi 24/7 Production Setup

### 1) Keep the Pi's address stable so it is always reachable

Use one of these:

- DHCP reservation on your router (recommended), or
- Static IP on Raspberry Pi OS

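If your LAN supports it, also reach the Pi by a stable name such as `connectivity-monitor.local`.
`.local` names are resolved over mDNS, which requires a responder such as `avahi-daemon` running on the Pi. The advertised name is the Pi's hostname, so `connectivity-monitor.local` only works if the hostname is `connectivity-monitor`.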

```bash
sudo systemctl status avahi-daemon
sudo systemctl enable --now avahi-daemon
```
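
Once the reservation and mDNS are in place, check from another machine that the name resolves to the reserved address. The sketch below uses only the standard library, and `resolve_ipv4` is a hypothetical helper. On Linux clients, `.local` lookups also need mDNS support on the client side (for example `nss-mdns`).

```python
import socket


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to.

    Returns an empty list if the name cannot be resolved (hypothetical helper).
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_ipv4("connectivity-monitor.local")` should return a list containing the address you reserved. An empty list means the name did not resolve.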

### 2) Install hardened service + timers

```bash
cd ~/ConnectivityMonitor/python

# Main monitor service
sudo cp connectivity-monitor@.service /etc/systemd/system/

# Health-check and archive timer units
sudo cp systemd/connectivity-monitor-healthcheck@.service /etc/systemd/system/
sudo cp systemd/connectivity-monitor-healthcheck@.timer /etc/systemd/system/
sudo cp systemd/connectivity-monitor-archive@.service /etc/systemd/system/
sudo cp systemd/connectivity-monitor-archive@.timer /etc/systemd/system/

sudo systemctl daemon-reload

# Enable monitor + safety timers
sudo systemctl enable connectivity-monitor@YOUR_USER
sudo systemctl start connectivity-monitor@YOUR_USER
sudo systemctl enable --now connectivity-monitor-healthcheck@YOUR_USER.timer
sudo systemctl enable --now connectivity-monitor-archive@YOUR_USER.timer
```
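
After enabling, you can confirm the service and both timers with `systemctl is-enabled` and `systemctl is-active`. Both commands exit with status 0 when the answer is yes. The helper names below are hypothetical, and `run` is injectable so the logic can be tried without systemd.

```python
import subprocess


def unit_names(user):
    """Build the unit names this setup enables for one user (hypothetical helper)."""
    return [
        "connectivity-monitor@{}.service".format(user),
        "connectivity-monitor-healthcheck@{}.timer".format(user),
        "connectivity-monitor-archive@{}.timer".format(user),
    ]


def check_units(user, run=subprocess.run):
    """Return {unit: (enabled, active)} based on systemctl exit codes."""
    results = {}
    for unit in unit_names(user):
        enabled = run(["systemctl", "is-enabled", "--quiet", unit]).returncode == 0
        active = run(["systemctl", "is-active", "--quiet", unit]).returncode == 0
        results[unit] = (enabled, active)
    return results
```

On a healthy install, every entry of `check_units("YOUR_USER")` should be `(True, True)`. A timer reports as active while it waits for its next trigger.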

If your monitor runs on a non-default web port, override the health-check unit port:

```bash
sudo systemctl edit connectivity-monitor-healthcheck@YOUR_USER.service
# Add:
# [Service]
# Environment=WEB_PORT=9090
sudo systemctl daemon-reload
sudo systemctl restart connectivity-monitor-healthcheck@YOUR_USER.timer
```

### 3) Reverse proxy for controlled remote access (TLS/auth)

Keep the monitor on localhost and publish through a reverse proxy (Nginx/Caddy/Traefik) to add:

- HTTPS/TLS certificates
- Basic auth or SSO
- IP allow-listing/rate limits

Proxy upstream target: `http://127.0.0.1:8080`
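
To confirm the dashboard is reachable only through the proxy, try a plain TCP connection from another machine on the LAN. This is a minimal sketch, and `port_reachable` is a hypothetical helper. How the monitor itself is kept off the LAN (bind address or firewall) depends on your setup.

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

With the proxy in place, `port_reachable("<pi-ip>", 8080)` should return `False`, while the proxy's HTTPS port (`port_reachable("<pi-ip>", 443)`) should return `True`.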

## Operational Safety Checks

### API health probe + alert trigger

`ops/health_probe.py` checks `/api/status` and exits non-zero when:

- Endpoint is unreachable
- Health score is below threshold
- Packet loss exceeds threshold

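This probe is scheduled every minute by `connectivity-monitor-healthcheck@.timer`. Its output and exit status are recorded in the journal (`journalctl -u connectivity-monitor-healthcheck@YOUR_USER.service`).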
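
The checks listed above can be sketched as follows. This is not the code in `ops/health_probe.py`. The JSON keys `health_score` and `packet_loss` are assumptions about the `/api/status` payload, and `evaluate` and `probe` are illustrative names.

```python
import json
import urllib.request


def fetch_status(url, timeout=5.0):
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def evaluate(status, min_health, max_loss):
    """Return a list of failure messages; an empty list means healthy.

    ASSUMPTION: the payload has `health_score` and `packet_loss` (percent)
    keys. Adjust to the real /api/status schema.
    """
    failures = []
    health = status.get("health_score")
    loss = status.get("packet_loss")
    if health is None or health < min_health:
        failures.append("health_score {} < {}".format(health, min_health))
    if loss is None or loss > max_loss:
        failures.append("packet_loss {} > {}".format(loss, max_loss))
    return failures


def probe(url, min_health, max_loss):
    """Return exit code 0 when healthy, 1 otherwise."""
    try:
        failures = evaluate(fetch_status(url), min_health, max_loss)
    except (OSError, ValueError) as exc:  # unreachable endpoint or invalid JSON
        failures = ["unreachable: {}".format(exc)]
    for failure in failures:
        print(failure)
    return 1 if failures else 0
```

A script built this way would end with `raise SystemExit(probe(...))` so that systemd records the non-zero status.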

Optional auto-reboot trigger example:

```bash
python3 ops/health_probe.py \
--url http://127.0.0.1:8080/api/status \
--min-health 60 --max-loss 10 \
--reboot-after-failures 15 \
--allow-reboot
```

Auto-reboot requires root privileges (or an explicit sudo policy that allows the reboot command non-interactively).
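
The timer starts a fresh process every minute, so a threshold like `--reboot-after-failures 15` only works if the failure streak survives between runs. One way to do that is a small state file that is rewritten atomically, sketched below. The state-file layout and helper names are hypothetical, and this is not necessarily how `ops/health_probe.py` counts failures.

```python
import json
import os
import tempfile


def update_failure_streak(state_path, healthy):
    """Persist and return the number of consecutive failed probes.

    A healthy probe resets the streak to 0. The file is written to a temp file
    and then swapped in with os.replace, so a crash cannot leave it half-written.
    """
    try:
        with open(state_path) as fh:
            streak = int(json.load(fh).get("consecutive_failures", 0))
    except (OSError, ValueError, TypeError, AttributeError):
        streak = 0  # missing or corrupt state starts a fresh streak
    streak = 0 if healthy else streak + 1
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump({"consecutive_failures": streak}, fh)
    os.replace(tmp_path, state_path)
    return streak


def should_reboot(streak, reboot_after_failures, allow_reboot):
    """Reboot only when explicitly allowed and the streak reaches the threshold."""
    return allow_reboot and streak >= reboot_after_failures
```

Keep the state file somewhere the probe's user can write. Resetting on every healthy probe means only consecutive failures count toward a reboot.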

### Log/report persistence and archival

The monitor writes logs and reports under `~/ConnectivityMonitor`.

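`ops/archive_artifacts.py` runs daily via `connectivity-monitor-archive@.timer`. It handles files in `logs/` and `reports/` that are older than `--older-than-days` (default 1):

- Packs them into timestamped `archive/logs_*.tar.gz` and `archive/reports_*.tar.gz` files
- Deletes the originals only when `--delete-after-archive` is passed
- Removes archives older than `--keep-archive-days` (default 30; `0` disables pruning)

Without `--delete-after-archive`, the same source files are packed again on every run.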
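
To spot-check the daily timer, you can open the newest archive and list its contents. This minimal sketch relies on the `logs_<UTC timestamp>.tar.gz` / `reports_<UTC timestamp>.tar.gz` naming used by `ops/archive_artifacts.py`. The helper names are hypothetical.

```python
import glob
import os
import tarfile


def latest_archive(archive_dir, prefix):
    """Return the newest `<prefix>_*.tar.gz` in `archive_dir`, or None.

    The UTC timestamp in the name (YYYYmmdd_HHMMSSZ) sorts lexically in
    chronological order, so the last name after sorting is the newest.
    """
    matches = sorted(glob.glob(os.path.join(archive_dir, prefix + "_*.tar.gz")))
    return matches[-1] if matches else None


def archive_members(path):
    """List the relative paths stored in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()
```

For example: `archive_members(latest_archive(os.path.expanduser("~/ConnectivityMonitor/archive"), "logs"))`.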

## One-command Recovery and Validation

Use:

```bash
bash ~/ConnectivityMonitor/python/ops/recover_service.sh YOUR_USER
```

This command:

- Reloads systemd units
- Re-enables and restarts the monitor service
- Prints service status and recent journal logs
- Validates API response from `/api/status`
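
Right after a restart, the API can take a few seconds to come up, so a single request may fail even when recovery worked. The polling sketch below retries until a deadline. `wait_for_api` and its parameters are hypothetical and not taken from `recover_service.sh`.

```python
import json
import time
import urllib.request


def http_get_json(url, timeout=5.0):
    """GET `url` and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_api(url, deadline_s=60.0, interval_s=2.0, fetch=http_get_json,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until it returns valid JSON or `deadline_s` elapses.

    Returns the decoded body, or None if the API never answered in time.
    `fetch`, `sleep`, and `clock` are injectable for testing.
    """
    end = clock() + deadline_s
    while True:
        try:
            return fetch(url)
        except (OSError, ValueError):
            pass  # service still starting, or an invalid response
        if clock() + interval_s > end:
            return None
        sleep(interval_s)
```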

## Soak Test Checklist (48–72h)

Before calling the deployment production-ready, run it for 48–72 hours and verify:

- Service survives reboot (`systemctl is-enabled` + post-reboot status)
- Auto-restart works when the process is killed
- API remains queryable (`/api/status`, `/api/history`, `/api/drops`, `/api/targets`, `/api/heatmap`)
- Logs and reports continue to generate
- Health-check timer executes and records status
- Daily archival timer creates archives and prunes old ones
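
You can automate the API item of the checklist with a small script run periodically during the soak. This sketch assumes each endpoint answers `GET` with a JSON body, which is an assumption about the API. `check_endpoints` is a hypothetical helper.

```python
import json
import urllib.request

# Endpoints listed in the soak-test checklist above.
ENDPOINTS = ["/api/status", "/api/history", "/api/drops", "/api/targets", "/api/heatmap"]


def http_get_json(url, timeout=5.0):
    """GET `url` and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_endpoints(base_url, fetch=http_get_json):
    """Return {path: None on success, or an error string on failure}."""
    results = {}
    for path in ENDPOINTS:
        try:
            fetch(base_url.rstrip("/") + path)
            results[path] = None
        except (OSError, ValueError) as exc:  # unreachable or invalid JSON
            results[path] = "{}: {}".format(type(exc).__name__, exc)
    return results
```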

## Package Structure

```
21 changes: 18 additions & 3 deletions python/connectivity-monitor@.service
@@ -2,14 +2,29 @@
Description=Connectivity Monitor v4.0
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=300
StartLimitBurst=10

[Service]
Type=simple
User=%i
Group=%i
# %h ignores User= and resolves to /root in system units, so the
# user's home is spelled out here (assumes homes live under /home).
WorkingDirectory=/home/%i/ConnectivityMonitor/python
Environment=PYTHONUNBUFFERED=1
ExecStart=/usr/bin/python3 -m connectivity_monitor --headless
Restart=always
RestartSec=5
TimeoutStartSec=30
TimeoutStopSec=30
UMask=0027
LimitNOFILE=65536
TasksMax=512
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ReadWritePaths=/home/%i/ConnectivityMonitor
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
101 changes: 101 additions & 0 deletions python/ops/archive_artifacts.py
@@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""Archive logs/reports into compressed tar files and prune old archives."""

import argparse
import datetime
import os
import tarfile
import time


def _collect_files(path, older_than_days):
    """Return regular files directly inside `path` at least `older_than_days` old."""
    if not os.path.isdir(path):
        return []
    now = time.time()
    min_age = older_than_days * 86400
    files = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        if now - os.path.getmtime(full) >= min_age:
            files.append(full)
    return sorted(files)


def _archive_group(files, archive_path, base_dir):
    """Write `files` into a gzip-compressed tar, storing paths relative to `base_dir`."""
    if not files:
        return 0
    os.makedirs(os.path.dirname(archive_path), exist_ok=True)
    with tarfile.open(archive_path, "w:gz") as tar:
        for file_path in files:
            rel = os.path.relpath(file_path, base_dir)
            tar.add(file_path, arcname=rel)
    return len(files)


def _prune_archives(archive_dir, keep_days):
    """Delete *.tar.gz archives older than `keep_days`; keep_days <= 0 disables pruning."""
    if keep_days <= 0 or not os.path.isdir(archive_dir):
        return 0
    now = time.time()
    max_age = keep_days * 86400
    removed = 0
    for name in os.listdir(archive_dir):
        if not name.endswith(".tar.gz"):
            continue
        full = os.path.join(archive_dir, name)
        if os.path.isfile(full) and (now - os.path.getmtime(full)) > max_age:
            os.remove(full)
            removed += 1
    return removed


def main():
    parser = argparse.ArgumentParser(description="Archive ConnectivityMonitor logs/reports")
    parser.add_argument("--base-dir", default=os.path.expanduser("~/ConnectivityMonitor"))
    parser.add_argument("--older-than-days", type=int, default=1)
    parser.add_argument("--delete-after-archive", action="store_true")
    parser.add_argument("--keep-archive-days", type=int, default=30)
    args = parser.parse_args()

    ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%SZ")
    logs_dir = os.path.join(args.base_dir, "logs")
    reports_dir = os.path.join(args.base_dir, "reports")
    archive_dir = os.path.join(args.base_dir, "archive")

    logs_files = _collect_files(logs_dir, args.older_than_days)
    reports_files = _collect_files(reports_dir, args.older_than_days)

    logs_archive = os.path.join(archive_dir, "logs_{}.tar.gz".format(ts))
    reports_archive = os.path.join(archive_dir, "reports_{}.tar.gz".format(ts))

    logs_count = _archive_group(logs_files, logs_archive, args.base_dir)
    reports_count = _archive_group(reports_files, reports_archive, args.base_dir)

    # Sources are only removed after both archives were written successfully.
    delete_errors = []
    if args.delete_after_archive:
        for file_path in logs_files + reports_files:
            try:
                os.remove(file_path)
            except OSError as exc:
                delete_errors.append("{} ({})".format(file_path, exc))

    pruned = _prune_archives(archive_dir, args.keep_archive_days)

    print(
        "Archived logs={}, reports={}, deleted_source={}, pruned_archives={}, delete_errors={}".format(
            logs_count,
            reports_count,
            bool(args.delete_after_archive),
            pruned,
            len(delete_errors),
        )
    )
    if delete_errors:
        for err in delete_errors:
            print("Delete error: {}".format(err))
        raise SystemExit(1)


if __name__ == "__main__":
    main()