I have a sub-network of ensembles and passthrough nodes. When I include it in an SNN, it takes a long time to compile. I followed @xchoo’s suggestion here, but it doesn’t help in my case and it still takes forever to compile.
Some extra info about the sub-network: if I remove the ensembles and associated connections, i.e. only passthrough nodes remain… then it doesn’t take much time to compile (less than an hour, compared to around 6 hours with the ensembles in). So I guess… the presence of ensembles (and associated connections) in my sub-network is the root cause of the significantly increased compilation time, and I probably can’t get past it.
So, with a larger/deeper SNN, I expect the compilation time to shoot up to days! I was wondering if there’s a way in NengoDL to save the intermediate stages of compilation so that I can resume the compilation at my convenience? Please let me know!
Can you provide some details about the network you are trying to train? Like, how big are the ensembles, how many connections are there? Are they neuron-to-neuron, or decoded connections? Etc.
Hello @xchoo! Thank you for looking into it. I have messaged you the code and details. Please let me know if any link is inaccessible to you or if you need any more info.
I believe that for NengoDL, compilation is a 2-step process. In the first step, the network operators are optimized to try and group as many of the ops together as possible. As far as I’m aware, this step is not parallelized. The second step is handled by TensorFlow (in the backend), so parallelization would depend on whether TensorFlow’s compile function is itself parallelizable (which, from my quick search, it is not).
Oh…, then it looks like I am stuck with serial compilation. If you get some time to look into the network I messaged you, please let me know your suggestions.
I took a look at the network, and there isn’t much to it. It’s three ensembles of 2 neurons each, so it shouldn’t take a long time to compile on any machine (it took less than a second on my machine). Is there something else you are doing with the network? Are you assigning more neurons to the ensembles? If so, how many? Is there another network that this is embedded in, etc.?
That’s correct. An individual deployment of one such network doesn’t take much time to compile. But when many such networks are embedded in the main SNN, I guess that is when it takes time to compile. Moreover, I have to connect each group of neurons from the previous Ensemble to each such network. Do these individual neuron-to-input-node connections cause an increase in compile time? Please note that in the standalone network, each input (passthrough) node is connected to two other components => each individual neuron is connected to two other immediate components.
Other than the above, I am not modifying the standalone network in any way. However, if the above-mentioned connections are a necessity, is there any way to save a partially compiled model (and, by extension… resume the compilation after loading it)? I am just not able to access the shared compute resources for more than 5 days or so. Please help!
I believe your issue with the compile time comes from the large number of connections that are made by your main SNN. Do you have an idea of how many connections there are in total in your network?
One thing you’ll want to try, to debug the build time, is building the network in regular Nengo (i.e., non-NengoDL). How long does that take? If it takes as much time as NengoDL, then it is likely that the number of connections in your network is causing the issue.
You can also try to reduce the number of connections by removing the passthrough nodes. For your subnetworks, you can write each one as a Python class with a connect function that takes in the pre and post ensembles and makes the appropriate connections directly, without using the passthrough node.
No. Unfortunately, there is no way to save a partially compiled model. As a point of comparison, the Spaun model (1.5 million neurons, using the old Nengo 1.4 codebase) took about half a day on the CanadaCompute cloud to build. So, there must be something about your network that is causing the excessive compile times.