DARTS is a popular gradient-based method for Neural Architecture Search (NAS). Many extensions have been introduced in the literature, resulting in various state-of-the-art models on many datasets; DARTS can therefore be regarded as a family of methods. Most proposed extensions focus on reducing DARTS' computational and memory demands and on improving its effectiveness in generating competent architectures. Nonetheless, as with most NAS methods, DARTS remains computationally expensive. Furthermore, despite the method's popularity, there is little research on the feasibility of parallelizing it or on the behavior of parallel DARTS methods. This paper studies the speedup, efficiency, and quality of a synchronous data-parallel DARTS scheme on the Fashion-MNIST dataset. We argue that although data-parallel methods can introduce noise into the search phase, this should not significantly affect the final results, because pruning takes place before the final network is extracted. As a result, we achieve a speedup of 1.82 with two GPU workers and 3.18 with four GPU workers, while retaining the same qualitative results as the serial execution of DARTS.
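The abstract does not detail the implementation, but a minimal sketch of one synchronous data-parallel, first-order DARTS search step may make the scheme concrete. The sketch assumes PyTorch's DistributedDataParallel launched with torchrun, and a hypothetical `SuperNet` class whose architecture parameters (alphas) and network weights are held in separate optimizers; none of these names come from the paper.

```python
# Hedged sketch: one synchronous data-parallel, first-order DARTS search step.
# Assumptions (not from the paper): PyTorch DDP over NCCL, launched with
# torchrun, and a hypothetical `SuperNet` whose alphas and weights are
# registered as ordinary parameters and split across two optimizers.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_worker(supernet_cls):
    dist.init_process_group("nccl")      # RANK / WORLD_SIZE come from torchrun
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    # Each worker holds a full replica of the supernet; DDP averages
    # gradients across replicas inside every backward() call.
    model = DDP(supernet_cls().cuda(rank), device_ids=[rank])
    return model, rank

def search_step(model, w_optim, alpha_optim, train_batch, val_batch, criterion):
    """One first-order DARTS step. The all-reduce (mean) of gradients inside
    each backward() is what makes the scheme synchronous data-parallel."""
    x_tr, y_tr = train_batch    # this worker's shard of the training data
    x_val, y_val = val_batch    # this worker's shard of the validation data

    # 1) Architecture step: update the alphas on the validation shard.
    alpha_optim.zero_grad()
    criterion(model(x_val), y_val).backward()   # grads averaged across workers
    alpha_optim.step()                          # touches only the alphas

    # 2) Weight step: update the network weights on the training shard.
    w_optim.zero_grad()
    criterion(model(x_tr), y_tr).backward()     # grads averaged across workers
    w_optim.step()                              # touches only the weights
```

Because the averaging is synchronous, all replicas stay identical after every step; the sharding only adds gradient noise, which, per the abstract's argument, the pruning performed before extracting the final network should largely absorb.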