As fundamental abstractions of network structure, graphs are everywhere, ranging from Internet topologies and biological networks to social networks. Studying graphs is critical to understanding the fundamental processes in these networks, and of practical implications in real world applications. In the past, most studies focused on small or synthetic graphs. Recently, more big real graphs have become accessible, and bring a number of new challenges. First, big real graphs are much larger in scale, posing new challenges to query processing. Second, they show much higher level of dynamics over time, providing opportunities to validate and modify existing models. Third, structures of big real graphs are very different from small or synthetic graphs studied in prior work, and often contain sensitive information with privacy risks.
In this defense, I will present our solutions to address two practical problems in dealing with big real graphs: analyzing dynamics in online social networks, and preventing data leakage in sharing graph datasets. First, I will describe my work on analyzing dynamics in a large online social network at multiple scales. We show that a number of interrelated processes drive dynamics in social networks, while many of the prior studies focused on capturing dynamics via a single process. In this work, I am interested in the question “how are individual user dynamics influenced by processes at different scales?” To answer this question, I conduct detailed analysis on dynamics in the largest Chinese online social network at three levels of granularity, including individual user level, user community level, and global network level. At each level, I make a number of intriguing observations. Second, I will introduce graph watermarks, a novel solution to strengthen data owners’ control on the graph datasets shared with authorized collaborators. Graph watermarks are small unique subgraphs tailored for a given graph and a secure user key, and when embedded in a graph, serve as a deterrent against data leakage. In this work, I design robust and efficient schemes to create, embed, and extract watermarks. Moreover, I develop techniques to defend against potential attacks. Through theoretical and empirical analysis on big real graphs, the proposed graph watermark system produces unique, robust watermarked graphs with low structural distortion.