Research article

Hybrid Constructive Heuristics for the Critical Node Problem

Version 1 Released on 04 May 2015 under Creative Commons Attribution 4.0 International License


Authors' affiliations

  1.  Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). INRIA - CNRS : UMR7503 - Université de Lorraine
  2.  Dipartimento di Informatica - Università di Torino

Keywords

  • Greedy algorithms
  • Network interdiction problems
  • Network optimization

Abstract

We consider the Critical Node Problem: given an undirected graph and an integer $K$, at most $K$ nodes have to be deleted from the graph in order to minimize a connectivity measure in the residual graph. We combine the basic steps used in common greedy algorithms with some flavour of local search, in order to obtain simple hybrid heuristic algorithms. The resulting algorithms are shown to be effective, delivering improved performance (solution quality and speed) with respect to known greedy algorithms and to other, more sophisticated state-of-the-art methods.

 

Introduction

The Critical Node Problem (CNP)  is to determine a set of vertices in a graph whose deletion results in a graph having the minimum pairwise connectivity.

To the authors' knowledge, the origin of the problem can be traced back to the so-called network interdiction problems studied by Wollmer [22] and later by Wood [23], although these seminal papers focused on arc deletion. Due also to the renewed emphasis on security-related research in networks (see [9]), the attention has recently moved to node deletion problems. In particular, when the robustness of communication networks is assessed (see [11,6]), the deleted nodes in a solution of the CNP represent the critical nodes of the network. Further applications of the CNP arise in different contexts. In [4] the CNP is stated in the context of detecting so-called “key players” in a relational network. In [2] and [18] contagion control via vaccination of a limited number of individuals is considered: the nodes of the graph are potentially infected individuals and the edges represent contacts occurring between them.

In this paper we consider the CNP formulated as follows. Given an undirected graph $G(V,E)$ and an integer $K$, determine a subset $S\subseteq V$ of $K$ nodes such that the number of node pairs still connected in the induced subgraph $G[V\setminus S]$, \[ f(S)= \bigl\lvert \{\, \{i,j\}\subseteq V\setminus S \colon \text{$i$ and $j$ are connected by a path in $G[V\setminus S]$} \,\} \bigr\rvert, \] is as small as possible. The CNP is known to be NP-complete [2], and polynomially solvable via dynamic programming on trees [5] and other specially structured graphs: graphs with bounded tree-width [1] and series-parallel graphs [15].
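For illustration, $f(S)$ can be evaluated by computing the connected components of $G[V\setminus S]$ and summing the number of node pairs inside each component; a component of size $c$ contributes $c(c-1)/2$ pairs. The following minimal sketch (the function name and the adjacency-dict representation are ours, not taken from the paper) does exactly this with a breadth-first search.

    from collections import deque

    def pairwise_connectivity(adj, removed):
        """Compute f(S): the number of node pairs still connected in the residual graph.
        adj: dict mapping each node to the set of its neighbours (undirected graph).
        removed: the set S of deleted nodes."""
        seen = set(removed)
        total = 0
        for start in adj:
            if start in seen:
                continue
            # BFS over one connected component of the residual graph
            size, queue = 0, deque([start])
            seen.add(start)
            while queue:
                u = queue.popleft()
                size += 1
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        queue.append(v)
            total += size * (size - 1) // 2  # connected pairs inside this component
        return total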

The CNP on general graphs has been tackled by branch and cut methods. A MILP model is proposed in [2]; a very large model relying on constraint separation is used in the branch and cut methods proposed in [17]; recently a compact MILP formulation has been proposed in [21].

A heuristic approach based on finding an independent set, coupled with a 2-exchange local search phase, is proposed in [2]. Metaheuristics, namely population-based incremental learning and simulated annealing, are studied and experimentally compared in [18]. An approximation algorithm based on a randomized rounding of the relaxed linear programming model is proposed in [19].

For a broad literature review, including problems with different metrics about graph fragmentation, we refer to comprehensive references given in other works, for example [16,9].

The state-of-the-art literature considers sets of benchmark instances whose size is up to a few thousand nodes (but usually smaller). On the other hand, instances coming from applications can be considerably larger. For this reason, computationally efficient algorithms capable of dealing with such instances are still worth investigating.

Our focus is on developing computationally efficient heuristics able to deal with large instances of CNP. Instead of adapting classical metaheuristics to the problem, we concentrate on ad-hoc greedy methods proposed for CNP, combining them in order to obtain new ad-hoc heuristics with some flavor of local search.

Simple constructive heuristics and their hybridization are discussed in Section 2. Computational tests are reported and discussed in Section 3.

 

The algorithms

Observe that deleting any $S\subseteq V$ that is a vertex cover for $G$ completely disconnects the graph, giving $f(S)=0$ (only the complementary independent set survives). Since in general a vertex cover has more than $K$ nodes, such an $S$ is infeasible for the CNP. However, a feasible solution can be built by iteratively adding nodes back to the graph, one at a time. Based on this rule, Arulselvan et al. [2] propose Algorithm 1, which we call Greedy1.

An efficient implementation of Greedy1 (Algorithm 1) deals with the key step on line 3 by keeping track of the connected components of $G[V\setminus S]$. It looks at all neighbours of all nodes in $S$ in order to determine the connected components that can be fused together and the size of the resulting component. If the connected components of the residual graph are stored, this requires at most $\mathcal{O}(|E|)$ operations; after the best node has been chosen and added back to the graph (i.e. deleted from $S$), the other nodes in $V\setminus S$ are explored to update the component they now belong to. The complexity of identifying and/or merging components does not exceed $\mathcal{O}(|V|+|E|)$. These operations are repeated until only $K$ nodes remain deleted from the graph, i.e. for fewer than $|V|-1-K$ add-back iterations. The complexity of (heuristically, greedily) generating the initial vertex cover is bounded by $\mathcal{O}(|E|)$.
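As a complement to this description, here is a minimal sketch of the add-back logic of Greedy1, reusing the pairwise_connectivity helper sketched above. It deliberately re-evaluates the objective from scratch at every step instead of maintaining the component bookkeeping described in the text, so it conveys the greedy rule rather than the efficient implementation; the initial vertex cover is assumed to be given (e.g. by a greedy cover heuristic).

    def greedy1(adj, K, vertex_cover):
        """Add-back greedy: start from a vertex cover (f = 0) and re-insert,
        one at a time, the node whose return increases connectivity the least,
        until only K nodes remain deleted. Naive re-evaluation of f at each step."""
        S = set(vertex_cover)
        while len(S) > K:
            best_node = min(S, key=lambda i: pairwise_connectivity(adj, S - {i}))
            S.remove(best_node)  # add the node back to the graph
        return S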

Greedy1 has the following fundamental weakness: if a node $k$ belongs to an optimal (or even just good) solution $S^\ast$ but is not in the initial vertex cover, there is no chance of bringing it into the solution. Consider for example the graph in Figure 1 with $K=1$. The optimal solution is obviously $S^\ast= \{ 6 \}$; if the minimal vertex cover computed at step 1 is $\{ 2,3,4,5,7,8,9,10 \}$, Greedy1 is doomed to deliver the worst possible solution, leaving the graph with a single, large connected component.

 

Figure 1. Example.

One might think of a mirror-image approach that, starting from the whole graph, sequentially applies remove operations, leading to Greedy2 (see Algorithm 2).

This algorithm may seem to make perfect sense but, at a closer look, it reveals a striking weakness. Indeed, it results in a completely random choice of $i$ unless $G[V\setminus S]$ contains a so-called articulation point, i.e. a node whose removal splits the graph into two or more connected components. For example, even if Greedy2 easily handles the example of Figure 1, on the graph in Figure 2 with $K=2$ it is likely to deliver the worst possible solution, a single large connected component, whereas the optimal choice would obviously be to delete $\{ 6,7 \}$.

An efficient implementation of Greedy2 is a bit trickier than that of Greedy1. The search for an articulation point is done through a modified graph visit, tracking the presence of articulation points and the sizes of the connected components rooted at them (see [10] for details). The search for the $K$ nodes to delete thus applies $K$ times an algorithm of complexity $\mathcal{O}(|V|+|E|)$. The recent work [20] also proposes an efficient implementation of algorithm Greedy2.
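As an illustration only, a single remove step in the spirit of Greedy2 can be sketched with the networkx library, whose articulation_points routine is based on a depth-first-search visit in the spirit of [10]; unlike the single-visit implementation described above, this naive version re-evaluates the residual connectivity for each candidate node.

    import networkx as nx

    def residual_pairs(H):
        # number of connected node pairs in graph H
        return sum(len(c) * (len(c) - 1) // 2 for c in nx.connected_components(H))

    def remove_step(G):
        """One remove step in the spirit of Greedy2: prefer articulation points;
        if none exists, the choice among the remaining nodes is essentially arbitrary."""
        cut_nodes = list(nx.articulation_points(G))
        candidates = cut_nodes if cut_nodes else list(G.nodes)
        best = min(candidates,
                   key=lambda v: residual_pairs(G.subgraph(set(G) - {v})))
        G.remove_node(best)
        return best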

 

Figure 2. Example.
In order to take advantage of both the efficient add-back step and the detection of articulation points, we developed a hybrid approach that alternates the application of the two basic greedy steps, add-back and remove (see lines 3-4 of Greedy1 and Greedy2).

Furthermore, aiming at recovering from possibly wrong decisions, after finding a feasible solution we allow the algorithm to move into the infeasible region, performing $\Delta_K$ additional add-back or remove steps, before returning to a feasible solution. The exploration of the solution space thus benefits from a somewhat stronger perturbation than that achieved by a basic add-and-drop move, while still possibly preserving some good structure of the previously generated feasible solutions. The resulting algorithm, which we call Greedy3 (see Algorithm 3), acquires a flavour of local search while retaining most of the simple structure of greedy algorithms; that is why we still call it “greedy”.

Greedy3 alternates stages where a number of add-back steps are applied with stages where remove operations take place. In each of these stages a feasible solution is found and, if it is better than the current best solution, it is saved as the record. As a stopping criterion, a maximum number $N_S$ of generated feasible solutions is allowed to be evaluated, where $N_S$ is an exogenous parameter (set by the user).

As a matter of fact, Greedy3 is considerably less myopic than the other two, since it has a chance of undoing wrong choices taken at early iterations. Experimental comparisons show that a single run of Greedy3 is considerably more effective than a multistart application of the basic greedy algorithms.

Following the same logic we also define Greedy4 (see Algorithm 4), which starts, similarly to Greedy2, with a remove stage.
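The following skeleton, written under the same illustrative assumptions as the sketches above (adjacency-dict representation, naive re-evaluation of $f$ via the pairwise_connectivity helper), shows the alternation mechanism of Greedy3: each add-back or remove stage overshoots the feasibility boundary $|S|=K$ by $\Delta_K$ nodes, and a feasible solution is recorded every time $|S|$ passes through $K$. Greedy4 differs in starting with a remove stage, as Greedy2 does.

    def greedy3(adj, K, delta_K, N_S, vertex_cover):
        """Hybrid heuristic skeleton: alternate add-back and remove stages around
        the feasibility boundary |S| = K, overshooting by delta_K nodes each time."""
        S = set(vertex_cover)                       # start as in Greedy1
        best_S, best_f, evaluated = None, float("inf"), 0

        def record():                               # save the record if improved
            nonlocal best_S, best_f, evaluated
            evaluated += 1
            val = pairwise_connectivity(adj, S)
            if val < best_f:
                best_S, best_f = set(S), val

        while evaluated < N_S:                      # coarse stopping criterion
            # add-back stage: re-insert nodes down to |S| = K - delta_K
            while len(S) > max(K - delta_K, 0):
                S.remove(min(S, key=lambda i: pairwise_connectivity(adj, S - {i})))
                if len(S) == K:
                    record()
            # remove stage: delete nodes up to |S| = K + delta_K
            while len(S) < min(K + delta_K, len(adj) - 1):
                S.add(min((v for v in adj if v not in S),
                          key=lambda v: pairwise_connectivity(adj, S | {v})))
                if len(S) == K:
                    record()
        return best_S, best_f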

Dynamically restarting the search.

Computational experience showed that our mechanism of sequentially adding and removing $\Delta_K$ nodes around the feasible solution, implemented in Greedy3 and Greedy4, is not always sufficient to avoid being trapped in “bad” solutions. Therefore, when the solution does not improve after a given number of iterations $\mathcal{I}$, we allow the algorithm to perform a complete restart (i.e., generating a new vertex cover from scratch). We kept the same stopping criterion, allowing $N_S$ feasible solutions to be evaluated. This new algorithm is called Greedy3d (that is, Greedy3 with dynamic restart). Performance improves, at the modest cost of handling the additional parameter $\mathcal{I}$. A similar modification was applied to Greedy4, leading to the dynamically restarted procedure Greedy4d.
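A sketch of the restart rule, under the same illustrative assumptions as above (and assuming, for the simple cover heuristic, that node identifiers are comparable, e.g. integers): when the record has not improved for $\mathcal{I}$ consecutive feasible solutions, a new vertex cover is rebuilt from scratch with the classical randomized 2-approximation (take both endpoints of an uncovered edge), which stays within the $\mathcal{O}(|E|)$ bound mentioned earlier, and the search resumes from it.

    import random

    def maybe_restart(S, adj, non_improving, restart_after):
        """Dynamic restart rule (illustrative): if no improvement occurred over the
        last `restart_after` feasible solutions, rebuild a vertex cover from scratch."""
        if non_improving < restart_after:
            return S, non_improving
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        random.shuffle(edges)
        cover = set()
        for u, v in edges:            # classical 2-approximate vertex cover
            if u not in cover and v not in cover:
                cover.update((u, v))
        return cover, 0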

 

Computational results

In order to experimentally evaluate the algorithms we relied on two test beds. The first set is built on the basis of the 16 graphs used in [18]. They are divided into 4 types of graphs with different structures, with the number of nodes ranging from 250 to 5000 (see Table I).

We obtained several CNP instances by choosing various values of the parameter $K$ from the set $\{5,10,20,30,50,75,100,150,200,300,500,1000,1500,3000\}$ (always keeping $K<|V|$); we decided, however, to discard each instance for which we could certify a solution of value 0, regarding such instances as easy ones. This left a set of 159 instances, labelled “set 1” in the following.

 

Table I. Sizes of the graphs from [18] from which set 1 is derived.
In order to obtain larger instances we downloaded example graphs from the Stanford Large Network Dataset Collection (SNAP)1, mainly graphs from the Internet peer-to-peer networks category and the Collaboration networks category. These instances represent, respectively, 8 networks of computers connected to the internet and exchanging files, and 5 networks of physicists collaborating around the world and linked by their published works on the website www.arxiv.org. The sizes of these graphs are reported in Table II.

Given the large number of nodes and the substantial difference between the graphs' sizes (up to a factor of 6), we chose to select the values of $K$ as fractions of the number of nodes $|V|$. In practice we chose five values of $K$ for each graph. The two types of graphs do not have the same density; therefore, in order to obtain interesting instances, we set the values of $K$ as: \[ \text{peer-to-peer:}\quad K\ \in\ \{0.01|V|,0.05|V|,0.1|V|,0.15|V|,0.2|V|\}, \] \[ \text{scientific collaborations:}\quad K\ \in\ \{0.1|V|,0.2|V|,0.3|V|,0.4|V|,0.5|V|\}. \] Larger values of $K$ easily lead to completely disconnected graphs with an optimum that is easily detected by all the algorithms. The total number of instances in this test bed is 65; we call it “set 2” in what follows.

 

Table II. Graphs taken from SNAP from which set 2 is derived.

We compare the relative performances of the various algorithms by means of performance profiles. The performance profile of an algorithm $A$ on a collection of instances $T$ is the function defined by \[p_A(n)=\frac{\lvert\{I\in T\colon A(I)\leq 2^n\,\text{BEST}(I)\}\rvert}{\lvert T\rvert}\] where $A(I)$ is the result of algorithm $A$ on instance $I$ and $\text{BEST}(I)$ is the best result obtained in the test campaign (over all the considered algorithms) for instance $I$; the point $(n,p_A(n))$ on the curve denotes the fraction of tested instances for which algorithm $A$ delivered a solution with a relative error2 less than or equal to $2^n-1$ with respect to the best result found. Note that the curve uses a logarithmic scale; in particular, the relative error allowed grows exponentially as a function of $n$.

When plotting the curves for two algorithms $A$ and $A'$, algorithm $A$ can be considered better than $A'$ if the performance profile of $A$ lies to the north-west of the profile of $A'$. See [7] for a detailed introduction.
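A compact sketch of how such profiles can be computed from a table of results (names are illustrative; plotting is omitted): for each algorithm, $p_A(n)$ is simply the fraction of instances whose value is within a factor $2^n$ of the best known value.

    def performance_profile(results, best, exponents):
        """results: dict {instance: A(I)} for one algorithm A.
        best: dict {instance: BEST(I)} over all algorithms.
        Returns [p_A(n) for n in exponents], with p_A(n) the fraction of
        instances satisfying A(I) <= 2**n * BEST(I)."""
        profile = []
        for n in exponents:
            hits = sum(1 for I, val in results.items() if val <= (2 ** n) * best[I])
            profile.append(hits / len(results))
        return profile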

All tests were performed on a server equipped with two AMD Opteron 8425HE processors (2.1 GHz clock) and 16 GB of RAM, running Linux; the code was developed in C++.

As for the algorithms' parameters, we set the values of $\Delta_K$ and $\mathcal{I}$ after a few preliminary experiments. We chose $\Delta_K=K/2$ for all the new algorithms and ${\mathcal I}=5$ for the dynamically restarted versions. This choice gave satisfactory results without requiring an excessive calibration effort.

Comparing the basic approaches.

We tested Greedy3 and Greedy4 on all instances in set 1, allowing a single run on each instance and fixing $N_S=60$ for both.

In order to fairly compare Greedy3 and Greedy4 against Greedy1 and Greedy2, we ran Greedy1 and Greedy2 on a multistart basis, allowing 60 runs for each algorithm and keeping the best solution delivered.

The overall results are presented in the performance profile shown in Figure 3.

 

Figure 3. Performance profiles for Greedy1, Greedy2, Greedy3, Greedy4, tested on set 1: (a) complete curves; (b) initial part of the same curves.

Algorithm Greedy2 exhibits by far the worst behavior, being strongly outperformed by all the others. Greedy3 and Greedy4 offer the best performances, with Greedy3 being slightly better. Although the performance of Greedy1 seems not too far from that of Greedy3 and Greedy4, a relevant difference shows up in the first part of the profiles: Greedy3 and Greedy4 are significantly better at delivering best solutions, with Greedy1 only able to deliver a best solution in $20\%$ of the tests, while Greedy3 and Greedy4 deliver the best solution in approximately $35\%$ of the tests. Furthermore, looking in more detail (see Figure 3(b)), we can observe that, accepting a relative error smaller than 10%, Greedy3 and Greedy4 are able to “solve” 90% of the instances, while Greedy1 only around 70%.

We also ran the same tests with all the algorithms being stopped after the same amount of running time. Each instance was solved by Greedy3 with $N_S=60$, keeping track of the running time; the same amount of CPU time was then allotted to the multistart application of the other algorithms. We obtained very similar results, with even a slightly improved performance record for Greedy3 and Greedy4.

Dynamic restart.

We tested Greedy3d and Greedy4d with a single run on each instance of set 1, with $N_S=60$; Greedy1 and Greedy2 were tested in a multistart fashion as described above over 60 runs. The results over set 1 are summarized by the performance profiles in Figure 4.

 

Figure 4. Performance profiles on set 1.

The profiles show that Greedy1 and Greedy2 are strongly outperformed by Greedy3d and Greedy4d, while Greedy3 and Greedy4 are also outperformed by Greedy3d. Hence adding the dynamic restart to Greedy3 proves fruitful. It is interesting to note that adding the dynamic restart to Greedy4 is not as beneficial as in the Greedy3 case.

Comparison with previous approaches.

Since we used graphs generated in previous works on the CNP [18], we compare the results of our algorithms Greedy3d and Greedy4d with those delivered by the metaheuristic algorithms presented in [18], namely Simulated Annealing (SA) and Population-Based Incremental Learning (PBIL). Results for these two algorithms are reported in the cited paper for 30 runs, with information on the best, worst and average values of the objective. We ran a single execution of our algorithms with $N_S=30$ and compare our results with the best results found by SA and PBIL3. The results are summarized in Table III. We warn the reader that the tests for SA and PBIL reported in [18] ran on a different machine (Intel i7-2600K, 3.4 GHz clock, 8 GB RAM). The CPU times reported in Table III for SA and PBIL are the cumulative times over 30 runs, with the objective function value being the best value delivered over all such runs. It is interesting to observe that Greedy3d and Greedy4d find the best results in all cases except one, often with a large gap with respect to SA and PBIL. We do not aim at a fine-grained comparison, but the large differences in solution quality and computation times cannot be explained only by the different machines. Greedy3d and Greedy4d can thus outperform metaheuristics like SA and PBIL, which perform an extensive exploration of the solution space. Hence we take the figures of Table III as an indication of the effectiveness of our algorithms.

 

Table III. Results of the algorithms SA and PBIL from [18], Greedy3d and Greedy4d presented here, on the graphs introduced by [18]. The best of four results is displayed in bold font.

Results on large real instances.

Figure 5 contains performance profiles drawn for the testing of Greedy3, Greedy4, Greedy3d and Greedy4d on the instance set named set 2. Algorithms Greedy3d and Greedy4d appear to strongly dominate Greedy3 and Greedy4. Also, on this test set Greedy4d shows much better performance than on set 1; hence on set 2 Greedy4 strongly benefits from the addition of the dynamic restart feature. By examining the behavior of the algorithms on some instances, we explain this phenomenon as follows. Our algorithms, although incorporating some flavour of neighborhood search through perturbation of the solutions, are still greedy-like (and this was within the scope of the research), in the sense that the neighborhood exploration is kept limited. The first feasible solution generated by Greedy4 is basically obtained by an application of Greedy2, which has proved to offer really poor performance. A very bad solution generated at the first stage is unlikely to be strongly improved in successive stages. The dynamic restart offers a chance to restart the search from a (possibly very) different solution, especially on a large graph. On the other hand, Greedy3 benefits from a first stage that applies the logic of Greedy1, which is much more effective than Greedy2; hence the dynamic restart still has a beneficial but somewhat milder impact.

On the large instances of set 2, Greedy3d and Greedy4d offer the best performances. Greedy4d delivered a substantially larger number of best solutions on set 2, in more than $80\%$ of the tests, while Greedy3d hits the best result in only $45\%$ of the instances. Nevertheless, if we accept a relative error of up to $3\%$, both algorithms reach such precision on $96\%$ of the instances, while on the remaining instances Greedy4d can have a relative error of over $100\%$, whereas Greedy3d remains within a relative error of $20\%$.

 

Figure 5. Performance profiles obtained on set 2 for Greedy3, Greedy4, Greedy3d, Greedy4d: (a) complete curves; (b) initial part of the same curves.

Conclusion

Building on top of basic greedy algorithms, we proposed “hybrid” heuristic algorithms to tackle the Critical Node Problem by combining the two most basic greedy rules. While more straightforward greedy algorithms start from the original graph or from a completely fragmented graph and respectively delete or add back the nodes one by one, we chose to alternately add and delete nodes around a feasible solution, in order to escape local minima. The results are clearly in favour of our new approach, especially when allowing for a dynamic restart of the algorithm after a certain number of non-improving iterations. We further validated these results by applying the algorithms to larger graphs representing peer-to-peer networks and scientific collaborations. The results are encouraging and can be taken as a hint that greedy methods for the CNP should be applied in practical applications such as computational biology, as suggested by [3], or any field that requires finding a maximal fragmentation of large networks. One possible step forward would be the inclusion of global information on the graph at hand, in order to help reduce the bad random choices made by the greedy algorithms.

References

  1. B. Addis, M. Di Summa, and A. Grosso. Removing critical nodes from a graph: complexity results and polynomial algorithms for the case of bounded treewidth. Discrete Applied Mathematics, 16-17:2349–2360, 2013.
  2. A. Arulselvan, C. W. Commander, L. Elefteriadou, and P. M. Pardalos. Detecting critical nodes in sparse graphs. Computers & Operations Research, 36:2193–2200, 2009.
  3. V. Boginski and C. W. Commander. Identifying critical nodes in protein-protein interaction networks. In S. Butenko, W. A. Chaovalitwongse, and P. M. Pardalos, editors, Clustering Challenges in Biological Networks, pages 153–168. World Scientific Publishing, 2009.
  4. S. P. Borgatti. Identifying sets of key players in a network. Computational and Mathematical Organization Theory, 12:21–34, 2006.
  5. M. Di Summa, A. Grosso, and M. Locatelli. The critical node problem over trees. Computers and Operations Research, 38:1766–1774, 2011.
  6. T. N. Dinh and M. T. Thai. Precise structural vulnerability assessment via mathematical programming. In MILCOM 2011 – 2011 IEEE Military Communications Conference, pages 1351–1356. IEEE, 2011.
  7. E. Dolan and J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002.
  8. M. Edalatmanesh. Heuristics for the Critical Node Detection Problem in Large Complex Networks. PhD thesis, Faculty of Mathematics and Science, Brock University, St. Catharines, Ontario, 2013.
  9. B. L. Golden and D. R. Shier, editors. Network Interdiction Applications and Extensions, 2014. Virtual issue on Networks.
  10. John Hopcroft and Robert Tarjan. Algorithm 447: Efficient algorithms for graph manipulation. Communications of the ACM, 16(6):372–378, June 1973.
  11. T. N. Dinh, Y. Xuan, M. T. Thai, P. M. Pardalos, and T. Znati. On new approaches of assessing network vulnerability: hardness and approximation. IEEE/ACM Transactions on Networking, 20:609–619, 2012.
  15. S. Shen and J. Cole Smith. Polynomial-time algorithms for solving a class of critical node problems on trees and series-parallel graphs. Networks, 60(2):103–119, 2012.
  16. S. Shen, J. C. Smith, and R. Goli. Exact interdiction models and algorithms for disconnecting networks via node deletions. Discrete Optimization, 9:172–188, 2012.
  17. M. Di Summa, A. Grosso, and M. Locatelli. Branch and cut algorithms for detecting critical nodes in undirected graphs. Computational Optimization and Applications, 53:649–680, 2012.
  18. M. Ventresca. Global search algorithms using a combinatorial unranking-based problem representation for the critical node detection problem. Computers & Operations Research, 39:2763–2775, 2012.
  19. M. Ventresca and D. Aleman. A derandomized approximation algorithm for the critical node detection problem. Computers and Operations Research, 43:261–270, 2014.
  20. M. Ventresca and D. Aleman. A fast greedy algorithm for the critical node detection problem. In Combinatorial Optimization and Applications, volume 8881 of Lecture Notes in Computer Science, pages 603–612. Springer-Verlag, 2014.
  21. A. Veremyev, V. Boginski, and E. Pasiliao. Exact identification of critical nodes in sparse networks via new compact formulations. Optimization Letters, 8:1245–1259, 2014.
  22. R. Wollmer. Removing arcs from a network. Operations Research, 12:934–940, 1964.
  23. R. K. Wood. Deterministic network interdiction. Mathematical and Computer Modelling, 17:1–18, 1993.

Footnotes

 

1. http://snap.stanford.edu/data/

 

2. Calculated as \[\frac{A(I)-\text{BEST}(I)}{\text{BEST}(I)}\]

 

3. The values of $K$ for the FF graphs are different from those printed in [18]; they have been corrected in [8].