I would use the Greedy allocator rather than RoundRobin. Greedy doesn’t do anything to explicitly minimize inter-chip communication, but it typically ends up with lower inter-chip communication than RoundRobin.
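To give a rough intuition for why that tends to happen, here is a toy sketch. It is not the actual allocator code: the function names, the chain-shaped network, and the "fill one chip before moving to the next" rule are all simplifying assumptions for illustration. It just counts how many connections cross a chip boundary under each placement strategy.

```python
# Toy illustration only: these helpers are hypothetical and are NOT
# the real allocator implementations.

def round_robin_placement(n_blocks, n_chips):
    """Assign block i to chip i % n_chips (alternate between chips)."""
    return [i % n_chips for i in range(n_blocks)]


def greedy_placement(n_blocks, blocks_per_chip):
    """Fill one chip completely before moving on to the next one."""
    return [i // blocks_per_chip for i in range(n_blocks)]


def interchip_connections(placement, connections):
    """Count connections whose two endpoints sit on different chips."""
    return sum(1 for pre, post in connections if placement[pre] != placement[post])


# Assumed network: 8 blocks connected in a feedforward chain 0 -> 1 -> ... -> 7.
n_blocks = 8
connections = [(i, i + 1) for i in range(n_blocks - 1)]

rr = round_robin_placement(n_blocks, n_chips=2)
gr = greedy_placement(n_blocks, blocks_per_chip=4)

print("round robin:", rr, interchip_connections(rr, connections))
print("greedy:     ", gr, interchip_connections(gr, connections))
```

In this chain, alternating chips puts every connection across a chip boundary (7 of 7), while filling chips in order leaves only the single connection between block 3 and block 4 crossing chips. Real networks are not simple chains, so the gap will vary, but neighbouring blocks are often the ones that talk to each other.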
We also have some new allocators that explicitly minimize inter-chip communication: GreedyComms and PartitionComms. They’re in this PR. PartitionComms is the most effective, but it does require that you install nxmetis. Often, it’s the inter-chip communication that makes networks slow, so reducing it could help considerably.