Persistence probability is a statistical index for detecting one or more communities embedded in a network. However, despite its straightforward definition (the probability that a random walk remains in a group of nodes after a time period), it is seldom used, possibly because of the difficulty of developing an efficient algorithm to calculate it. Proposed here is a new mathematical programming model to find the community with the largest persistence probability. The model is formulated as integer fractional programming and then is reduced to mixed-integer linear programming with appropriate variable substitutions. Nevertheless, the exact problem is solved in a reasonable time only for small networks, so heuristic procedures are developed to approximate the optimal solution for large networks. First, a randomized greedy-ascent method is elaborated, taking advantage of a particular data structure to generate feasible solutions quickly. After analyzing the greedy output and determining where the optimal solution is eventually located, improvement procedures are implemented based on a local exchange but applying different long-term diversification principles, and based on tree neighborhood search and random restart. When applied to simulated graphs to determine the reliability and effectiveness of the proposed method, it accurately reproduces the clustering characteristics found in real networks. Finally, it is applied to two real networks, and comparing the results to those from two well-known community-detection procedures confirms that this index complements the other methods well.
Avellone, A., Benati, S., Grassi, R., Rizzini, G. (2023). On finding the community with maximum persistence probability. 4OR [10.1007/s10288-023-00559-z].
On finding the community with maximum persistence probability
Avellone, A;Grassi, R
;
2023
Abstract
Persistence probability is a statistical index for detecting one or more communities embedded in a network. However, despite its straightforward definition (the probability that a random walk remains in a group of nodes after a time period), it is seldom used, possibly because of the difficulty of developing an efficient algorithm to calculate it. Proposed here is a new mathematical programming model to find the community with the largest persistence probability. The model is formulated as integer fractional programming and then is reduced to mixed-integer linear programming with appropriate variable substitutions. Nevertheless, the exact problem is solved in a reasonable time only for small networks, so heuristic procedures are developed to approximate the optimal solution for large networks. First, a randomized greedy-ascent method is elaborated, taking advantage of a particular data structure to generate feasible solutions quickly. After analyzing the greedy output and determining where the optimal solution is eventually located, improvement procedures are implemented based on a local exchange but applying different long-term diversification principles, and based on tree neighborhood search and random restart. When applied to simulated graphs to determine the reliability and effectiveness of the proposed method, it accurately reproduces the clustering characteristics found in real networks. Finally, it is applied to two real networks, and comparing the results to those from two well-known community-detection procedures confirms that this index complements the other methods well.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.