Abstract
Sampling from large graphs is an area which is of great interest, particularly with the recent emergence of huge structures such as Online Social Networks. These often contain hundreds of millions of vertices and billions of edges. The large size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer a problem.
In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biassing the random walk to acquire a complete sample of high degree vertices of social networks, or similar graphs. The preferential attachment model is a common method to generate graphs with a power law degree sequence. For this model, we prove that this sampling method is successful with high probability. We also make experimental studies of the method on various real world networks.
For t-vertex graphs G(t) generated by a preferential attachment process, we analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to [d(x)d(y)]b, where d(x) is the degree of vertex x and b > 0 is a constant parameter. Let S(a) be the set of all vertices of degree at least ta in G(t). We show that for some b approx 2/3, if the biassed random walk starts at an arbitrary vertex of S(a), then with high probability the set S(a) can be discovered completely in ~O(t1-(4/3)a+d) steps, where d is a very small positive constant. The notation ~O ignores poly-log t factors.
The preferential attachment process generates graphs with power law 3, so the above example is a special case of this result. For graphs with degree sequence power law c>2 generated by a generalized preferential attachment process, a random walk with transitions along undirected edges {x,y} proportional to (d(x)d(y))(c-2)/2, discovers the set S(a) completely in ~O(t1-a(c-2)+d) steps with high probability. The cover time of the graph is ~O(t).
Our results say that if we search preferential attachment graphs with a bias b=(c-2)/2 proportional to the power law c then, (i) we can find all high degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.
Original language | English |
---|---|
Title of host publication | Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume) |
Editors | Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, Steffen Staab |
Publisher | ACM |
Pages | 1007-1016 |
Number of pages | 10 |
ISBN (Print) | 978-1-4503-1230-1 |
DOIs | |
Publication status | Published - 2012 |