Drawing inspiration from cryptocurrency mining pools, W5XDE (WIIIIIDE) lets you train deep learning models across multiple nodes with ease. It's designed to be simple to use, yet powerful enough to handle your training needs.
Training deep learning models can be painfully slow. WIIIIIDE is here to (hopefully) change that by offering:
- Effortless Centralized Deployment: Set up a central coordinator, and let your nodes do the rest.
- Peer-to-Peer Communication: Distribute your training workload across nodes.
- Plug-and-Play Simplicity: Get started with minimal configuration headaches. (We hope...)
To run WIIIIIDE, you'll need:

- Python 3.8+
- PyTorch
There are two ways to install. The quickest is via pip:

```bash
pip install w5xde
```

Alternatively, install from source:

```bash
git clone https://github.com/rndmcoolawsmgrbg/wiiiiide.git
cd wiiiiide
pip install -r requirements.txt
```
More detailed instructions can be found in the setup guide.
WIIIIIDE is designed to be straightforward to use. Here's a minimal example to get you started:
First, define your model. The server and every training node must use the same architecture:

```python
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, 2)
        )

    def forward(self, x):
        return self.layers(x)
```
Next, start the central server:

```python
from w5xde import CentralServer
from torch.utils.data import Dataset

# Your dataset should implement the PyTorch Dataset interface
dataset = YourDataset()
model = SimpleModel()

server = CentralServer(
    model=model,
    dataset=dataset,
    batch_size=32,
    ip="0.0.0.0",  # Use your actual IP for remote connections
    port=5555
)
server.start()
```
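Here, `YourDataset` is a placeholder. If you don't have a dataset class yet, a minimal sketch might look like the following (the random tensors and their shapes are purely illustrative, chosen to match `SimpleModel`'s 10-feature input; nothing here is part of the W5XDE API):

```python
import torch
from torch.utils.data import Dataset

class YourDataset(Dataset):
    """Illustrative dataset: 1000 random samples shaped to match
    SimpleModel (10 input features, 2 output classes)."""

    def __init__(self):
        self.inputs = torch.randn(1000, 10)         # feature vectors
        self.labels = torch.randint(0, 2, (1000,))  # class labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]
```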
Finally, on each worker machine, connect a training node:

```python
from w5xde import TrainingNode

# Create the same model architecture as the server
model = SimpleModel()

# Optional: define a callback to monitor training
def loss_callback(loss, batch_id):
    print(f"Batch {batch_id}: Loss = {loss:.4f}")

node = TrainingNode(
    model=model,
    server_address=('server_ip', 5555),  # Replace with the server's address
    collect_metrics=True
)
node.train(loss_callback=loss_callback)
```
That's it! WIIIIIDE will handle the distributed training process automatically. For a complete working example, check out test_distributed.py.
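For a quick single-machine smoke test, something like the following should work (a sketch only; the threading setup and timing are assumptions, and in real use the server and nodes run on separate machines):

```python
import threading
import time

# Start the coordinator in a background thread (assumes start() blocks).
server = CentralServer(
    model=SimpleModel(),
    dataset=YourDataset(),
    batch_size=32,
    ip="127.0.0.1",
    port=5555
)
threading.Thread(target=server.start, daemon=True).start()
time.sleep(1)  # give the server a moment to bind

# Connect a node on the same machine.
node = TrainingNode(
    model=SimpleModel(),
    server_address=("127.0.0.1", 5555),
    collect_metrics=True
)
node.train(loss_callback=lambda loss, batch_id: None)
```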
WIIIIIDE handles training like a pro. Here's a quick look at how it works under the hood:
```mermaid
flowchart TD
    A[CentralServer] -->|Distributes Batches| B[TrainingNode 1]
    A -->|Distributes Batches| C[TrainingNode 2]
    A -->|Distributes Batches| D[TrainingNode N]
    B -->|Sends Gradients| A
    C -->|Sends Gradients| A
    D -->|Sends Gradients| A

    subgraph Communication Flow
        direction LR
        E[Batch Distribution]
        F[Gradient Aggregation]
    end

    A <-->|Socket Communication| G{Communication Mechanisms}
    G -->|Pickle Serialization| H[Data Serialization]
    G -->|ZLib Compression| I[Data Compression]
    G -->|Struct Packing| J[Message Length Handling]
```
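The serialization path in the diagram is standard Python plumbing: messages are pickled, compressed with zlib, and prefixed with a struct-packed length so the receiver knows how many bytes to expect. Here's a rough sketch of that pattern (the `send_msg`/`recv_msg`/`recv_exact` helpers are illustrative, not WIIIIIDE's actual wire format):

```python
import pickle
import struct
import zlib

def send_msg(sock, obj):
    # Pickle the object, compress it, and prefix it with a 4-byte
    # big-endian length header.
    payload = zlib.compress(pickle.dumps(obj))
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    # Read the length header, then exactly that many payload bytes,
    # and undo the compression and serialization.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(zlib.decompress(recv_exact(sock, length)))

def recv_exact(sock, n):
    # recv() may return fewer bytes than requested; loop until done.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        data += chunk
    return data
```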
We're transparent about where WIIIIIDE is right now:
- Network Dependency: Requires a stable, high-bandwidth connection, especially for the central server.
- Security Considerations: Only minimal encryption is in place, so don't trust it with sensitive data on untrusted networks.
- Resource Management: Best run on dedicated servers.
- Performance: We're continually improving and optimizing.