Abstract: This work explores a distributed computing setting where K nodes are assigned fractions (subtasks) of a computational task in order to perform the computation in parallel. In this setting, a well-known main bottleneck has been the inter-node communication cost required to parallelize the task, because unlike the computational cost which could keep decreasing as $K$ increases, the communication cost remains approximately constant, thus bounding the total speedup gains associated to having more computing nodes. This bottleneck was substantially ameliorated by the recent introduction of coded MapReduce techniques which allowed each node --- at the computational cost of having to preprocess approximately $t$ times more subtasks --- to reduce its communication cost by approximately $t$ times. In reality though, the associated speed up gains were severely limited by the requirement that larger $t$ and $K$ necessitated that the original task be divided into an extremely large number of subtasks. In this work we show how node cooperation, along with a novel assignment of tasks, can help to dramatically ameliorate this limitation. The result applies to wired as well as wireless distributed computing, and it is based on the idea of having groups of nodes compute identical parallelization (mapping) tasks and then employing a here-proposed novel D2D coded caching algorithm.