Users of social networking services can connect with each other by forming communities for online interaction. Yet as
the number of communities hosted by such websites grows over time, users have an even greater need for effective community
recommendations in order to meet more users. In this paper, we investigate two algorithms from very different domains and
evaluate their effectiveness for personalized community recommendation. First is association rule mining (ARM), which
discovers associations between sets of communities that are shared across many users. Second is latent Dirichlet
allocation (LDA), which models user-community co-occurrences using latent aspects. In comparing LDA with ARM, we are
interested in discovering whether modeling low-rank latent structure is more effective for recommendations than directly
mining rules from the observed data. We experiment on an Orkut data set consisting of 492,104 users and 118,002 communities.
We show that LDA consistently performs better than ARM under the top-k recommendation ranking metric, and we analyze
examples of the latent information learned by LDA to explain this finding. To efficiently handle the large-scale data
set, we parallelize LDA on distributed computers and demonstrate our parallel implementation's scalability with varying
numbers of machines.
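As a rough illustration of the ARM side of this comparison, the sketch below mines rules of the form "members of community A also tend to join community B" from user-community memberships and uses them to rank recommendations. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function names (mine_pair_rules, recommend), the thresholds, the restriction to single-community antecedents, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(memberships, min_support=2, min_confidence=0.5):
    """Mine rules {a} -> {b} from a mapping of user -> set of communities.

    Assumption: only single-community antecedents are considered, and
    support is an absolute count of users who joined both communities.
    """
    single = Counter()  # number of users in each community
    pair = Counter()    # number of users in each pair of communities
    for communities in memberships.values():
        items = sorted(set(communities))
        single.update(items)
        pair.update(combinations(items, 2))

    rules = {}
    for (a, b), n_ab in pair.items():
        if n_ab < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = n_ab / single[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def recommend(user_communities, rules, k=3):
    """Score unjoined communities by summing confidences of matching rules."""
    joined = set(user_communities)
    scores = Counter()
    for (x, y), confidence in rules.items():
        if x in joined and y not in joined:
            scores[y] += confidence
    return [community for community, _ in scores.most_common(k)]


# Toy data (hypothetical): four users and the communities they joined.
memberships = {
    "u1": {"jazz", "blues", "vinyl"},
    "u2": {"jazz", "blues"},
    "u3": {"jazz", "vinyl"},
    "u4": {"blues", "chess"},
}
rules = mine_pair_rules(memberships)
print(recommend({"blues", "vinyl"}, rules))  # prints ['jazz']
```

Rules are mined directly from observed co-memberships, so a community is only recommended when it co-occurs explicitly with one the user has already joined. LDA instead scores communities through shared latent aspects, which is the contrast the study examines.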
Publications
Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior
Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Yi Wang, and Edward Y. Chang
International World Wide Web Conference (WWW)
Madrid, Spain, April 2009 (11% accepted).
[PDF (320KB)]
PLDA: Parallel Latent Dirichlet Allocation
Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang
International Conference on Algorithmic Aspects in Information and Management (AAIM)
San Francisco, CA, June 2009.
[PDF (217KB)]
Rapid growth in the amount of data available on social-network sites has made information retrieval increasingly
challenging for users. We propose Combinational Collaborative Filtering (CCF) to perform personalized community
recommendations by considering multiple types of co-occurrences in social data at the same time. This filtering
method fuses semantic and user information, then applies a hybrid training strategy that combines Gibbs sampling
and the Expectation-Maximization algorithm. To handle the large-scale data set, parallel computing is used to speed up
the model training. Through an empirical study on the Orkut data set, we show CCF to be both effective and scalable.
Publications
Combinational Collaborative Filtering for Personalized Community Recommendation
Wen-Yen Chen, Dong Zhang, and Edward Y. Chang
ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining (KDD)
Las Vegas, NV, August 2008 (10% accepted).
[PDF (471KB)]
Spectral clustering has been shown to be more effective in
finding clusters than some traditional algorithms. However, spectral
clustering suffers from a scalability problem in both memory use
and computational time when the size of a data set is large. To perform
clustering on large data sets, we investigate various ways of approximating
the dense similarity matrix. We compare two approaches: sparsifying
the matrix and the Nyström method. We then pick the
strategy of sparsifying the matrix by retaining nearest neighbors and investigate
its parallelization. We parallelize both memory
use and computation on distributed computers. Through an empirical
study on a large document data set of 193,844 instances and a
large photo data set of 2,121,863 instances, we demonstrate that our parallel
algorithm can effectively alleviate the scalability problem. A short version
appears at ECML/PKDD 2008.
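The sparsification step described above can be sketched on a single machine as follows. This is only an illustration under assumptions, not the parallel algorithm from the paper: the function name knn_sparse_similarity, the Gaussian similarity with bandwidth sigma, and the symmetrization rule (keep an entry if either point is among the other's t nearest neighbors) are choices made for the example, and the eigen-decomposition and distributed steps are omitted.

```python
import math


def knn_sparse_similarity(points, t, sigma=1.0):
    """Build a sparse, symmetric similarity matrix as {i: {j: s_ij}}.

    Assumptions: Gaussian similarity exp(-||x_i - x_j||^2 / (2 sigma^2)),
    and an entry is kept if j is among i's t nearest neighbors or i is
    among j's. This brute-force version still compares every pair of points.
    """
    n = len(points)
    neighbors = {}
    for i in range(n):
        sims = []
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sims.append((math.exp(-d2 / (2.0 * sigma ** 2)), j))
        # Highest similarity first, which is the same as nearest first.
        sims.sort(reverse=True)
        neighbors[i] = {j: s for s, j in sims[:t]}

    # Symmetrize so the matrix can be used to build a graph Laplacian.
    sparse = {i: dict(row) for i, row in neighbors.items()}
    for i, row in neighbors.items():
        for j, s in row.items():
            sparse[j][i] = s
    return sparse


# Toy data (hypothetical): two well-separated pairs of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
S = knn_sparse_similarity(points, t=1)
stored = sum(len(row) for row in S.values())
print(stored, "nonzero entries instead of", len(points) * (len(points) - 1))
```

With t fixed, the number of stored entries grows roughly linearly with the number of points rather than quadratically, which is the memory saving that makes this strategy attractive for large data sets.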
Submitted
PSC: Parallel Spectral Clustering
Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang
[PDF (2.4MB)] (submitted to PAMI, under review)
Publications
Parallel Spectral Clustering
Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang
European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD)
Antwerp, Belgium, September 2008 (18% accepted).
Also appears in Lecture Notes in Artificial Intelligence (LNAI), Vol. 5212, pp. 374-389, 2008.
[PDF (4.4MB)]
Fotofiti is a research platform for automating semantic annotation of digital photographs.
It not only provides a web-based user interface for managing social networks, events, and photographs,
but also makes good use of a variety of metadata geared towards image classification and similarity assessment.
Fotofiti features real-time online semantic annotation using global features from both content and context.
A manual annotation web interface is created to provide training examples for our classifier. Classification
experiments using various learning techniques were performed on a real-world data set. Additionally, a scalable landmark
recognition system that uses local features is discussed.
Publications
A Scalable Service for Photo Annotation, Sharing, and Search
Ben Lee, Wen-Yen Chen, and Edward Y. Chang
ACM Int'l Conference on Multimedia (MM)
Santa Barbara, CA, October 2006.
[PDF (4.0MB)]
Fotofiti: Web Service for Photo Management
Ben Lee, Wen-Yen Chen, and Edward Y. Chang
ACM Int'l Conference on Multimedia (MM)
Santa Barbara, CA, October 2006.
[PDF (262KB)]
Fotowiki is a wiki-based map service that integrates visual and textual information with a map. Fotowiki divides
a geographical area into sub-areas. An individual responsible for providing information about a sub-area enters
collected data into a wiki page. Fotowiki uploads the distributed wiki pages and overlays the information on the map.
In addition to the traditional aerial images provided by Google Maps, Fotowiki adds both street-level
views of the surrounding area and a 360-degree panorama tour of a spot to the map. More importantly, the fine-grained
information about a particular location, provided by its incentive-based information collection strategy, can substantially
improve the usefulness of the information.
Publications