# Multi-GPU and Distributed Workflow
EvoX has experimental support for distributed workflows, allowing you to run ordinary evolutionary algorithms across multiple GPUs or even multiple machines. This can significantly speed up optimization, especially for time-consuming problems.
## How to use
To use the distributed workflow, you need to set up a few things:
- Make sure you have manually fixed the seed of the random number generator:
  ```python
  import numpy as np
  import torch

  seed = 42  # choose any fixed value
  torch.manual_seed(seed)
  # Optional: set the seed for numpy
  np.random.seed(seed)
  # Optional: use deterministic algorithms
  torch.use_deterministic_algorithms(True)
  ```
  Important: Set the seed for all random number generators before any torch or numpy operations, so that every generator is in a known state before the workflow runs.
- Use `torch.distributed` or the `torchrun` command to launch your script (a minimal example script is sketched at the end of this section). For example:
  ```shell
  torchrun \
      --standalone \
      --nnodes=1 \
      --nproc-per-node=$NUM_GPUS \
      your_program.py (--arg1 ... train script args...)
  ```
Tip: `torchrun` is the recommended way to launch distributed torch programs. For more information, see the PyTorch documentation.
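For reference, below is a minimal sketch of what `your_program.py` could look like when launched with `torchrun`. The environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`), the `init_process_group` call, and the seed value `42` are standard PyTorch distributed boilerplate rather than EvoX-specific API; the construction of the EvoX algorithm, problem, and workflow is left as a placeholder.

```python
import os

import numpy as np
import torch
import torch.distributed as dist

# Step 1: fix the seeds before any torch / numpy operations.
seed = 42
torch.manual_seed(seed)
np.random.seed(seed)
torch.use_deterministic_algorithms(True)

# torchrun sets these environment variables for every process it launches.
rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
world_size = int(os.environ["WORLD_SIZE"])

# Step 2: initialize the default process group.
# Use the "nccl" backend for GPU runs, "gloo" for CPU-only runs.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(local_rank)

# ... construct your EvoX algorithm, problem, and workflow here,
# then run the usual optimization loop ...

dist.destroy_process_group()
```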