Measurement-calibrated Graph Models for Social Network Experiments

Alessandra Sala

Lili Cao

Christo Wilson

Robert Zablit

Haitao Zheng

Ben Y. Zhao

*Proceedings of the 19th International World Wide Web Conference (WWW 2010)
*

[Full Text in GZIP PS
Format, 337KB]

[Full Text in PDF Format,
527KB]

[Code and Instructions]

Paper Abstract

Access to realistic, complex graph datasets is critical to research on social networking systems and applications. Simulations on graph data provide critical evaluation of new systems and applications ranging from community detection to spam filtering and social web search. Due to the high time and resource costs of gathering real graph datasets through direct measurements, researchers are anonymizing and sharing a small number of valuable datasets with the community. However, performing experiments using shared real datasets faces three key disadvantages: concerns that graphs can be de-anonymized to reveal private information, increasing costs of distributing large datasets, and that a small number of available social graphs limits the statistical confidence in the results.

The use of measurement-calibrated graph models is an attractive alternative to
sharing datasets. Researchers can "fit" a graph model to a real social
graph, extract a set of model parameters, and use them to generate multiple
synthetic graphs statistically similar to the original graph. While
numerous graph models have been proposed, it is unclear if they can produce
synthetic graphs that accurately match the properties of the original
graphs. In this paper, we explore the feasibility of measurement-calibrated
synthetic graphs using six popular graph models and a variety of real
social graphs gathered from the Facebook social network ranging from 30,000
to 3 million edges. We find that two models consistently produce synthetic
graphs with common graph metric values similar to those of the original
graphs. However, only one produces high fidelity results in our
application-level benchmarks. While this shows that graph models can
produce realistic synthetic graphs, it also highlights the fact that
current graph metrics remain incomplete, and some applications expose graph
properties that do not map to existing metrics.