Constrained Community Detection in Social Networks
Volume 2, Issue 3 (2024), pp. 368–379
Pub. online: 28 April 2023
Type: Statistical Methodology
Open Access
Accepted
5 March 2023
5 March 2023
Published
28 April 2023
28 April 2023
Abstract
Community detection in networks is the process by which unusually well-connected sub-networks are identified–a central component of many applied network analyses. The paradigm of modularity quality function optimization stipulates a partition of the network’s vertexes that maximizes the difference between the fraction of edges within communities and the corresponding expected fraction if edges were randomly allocated among all vertex pairs while conserving the degree distribution. The modularity quality function incorporates exclusively the network’s topology and has been extensively studied whereas the integration of constraints or external information on community composition has largely remained unexplored. We define a greedy, recursive-backtracking search procedure to identify the constitution of high-quality network communities that satisfy the global constraint that each community be comprised of at least one vertex among a set of so-called special vertexes and apply our methodology to identifying health care communities (HCCs) within a network of hospitals such that each HCC consists of at least one hospital wherein at least a minimum number of cardiac defibrillator surgeries were performed. This restriction permits meaningful comparisons in cardiac care among the resulting health care communities by standardizing the distribution of cardiac care across the hospital network.
References
Abramowitz, M. and Stegun, I. A. (1964) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover. MR0415956
Andrade, R. F. S., Rocha-Neto, I. C., Santos, L. B. L., de Santana, C. N., Diniz, M. V. C., Lobão, T. P., Góes-Neto, A., Pinho, S. T. R. and El-Hani, C. N. (2011). Detecting Network Communities: An Application to Phylogenetic Analysis. PLoS Computational Biology 7. https://doi.org/10.1371/journal.pcbi.1001131. MR2821650
Chung, F. R. K. (1997) Spectral Graph Theory. American Mathematical Society. MR1421568
Fortunato, S. (2010). Community detection in graphs. Physics Reports 486(3-5) 75–174. https://doi.org/10.1016/j.physrep.2009.11.002. MR2580414
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12) 7821–7826. https://doi.org/10.1073/pnas.122653799. MR1908073
Hendrickson, B. and Kolda, T. G. (2000). Graph Partitioning Models for Parallel Computing. Parallel Computing 26(12) 1519–1534. https://doi.org/10.1016/S0167-8191(00)00048-X. MR1786938
Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E 83 016107. https://doi.org/10.1103/PhysRevE.83.016107. MR2788206
Leskovec, J., Lang, K. J., Dasgupta, A. and Mahoney, M. W. (2009). Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics 6(1) 29–123. MR2736090
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. and Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980) 876–878. https://doi.org/10.1126/science.1184819. MR2662590
Newman, M. (2010) Networks: An Introduction. Oxford University Press, Inc., USA. https://doi.org/10.1093/acprof:oso/9780199206650.001.0001. MR2676073
R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.