C. Peng, Z. Zhang, K. Wong, X. Zhang, D.E. Keyes
IOS Press, 21, pp. 1463-1485, (2017)
Community detection in graphs is widely used in social and biologicalnetworks, and the stochastic block model is a powerful probabilistic toolfor describing graphs with community structures. However, in the era of“big data,” traditional inference algorithms for such a model are increas-ingly limited due to their high time complexity and poor scalability. Inthis paper, we propose a multi-stage maximum likelihood approach to re-cover the latent parameters of the stochastic block model, in time linearwith respect to the number of edges. We also propose a parallel algorithmbased on message passing. Our algorithm can overlap communication andcomputation, providing speedup without compromising accuracy as thenumber of processors grows. For example, to process a real-world graphwith about 1.3 million nodes and 10 million edges, our algorithm requiresabout 6 seconds on 64 cores of a contemporary commodity Linux cluster.Experiments demonstrate that the algorithm can produce high qualityresults on both benchmark and real-world graphs. An example of findingmore meaningful communities is illustrated consequently in comparisonwith a popular modularity maximization algorithm.