Rapid growth in the amount of data available on social-network sites has made information retrieval increasingly
challenging for users. We propose Combinational Collaborative Filtering (CCF) to perform personalized community
recommendations by considering multiple types of co-occurrences in social data at the same time. CCF fuses
semantic and user information, then applies a hybrid training strategy that combines Gibbs sampling and the
Expectation-Maximization (EM) algorithm. To handle the large-scale data set, parallel computing is used to speed up
model training. Through an empirical study on the Orkut data set, we show CCF to be both effective and scalable.
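To make "considering multiple types of co-occurrences at the same time" concrete, here is a minimal Python sketch of collapsed Gibbs sampling for a simplified latent-aspect model. It assumes (our assumption, not the paper's exact formulation) that each community mixes K shared aspects and that every aspect emits both users (community-user co-occurrences) and description words (community-word co-occurrences), so the two kinds of evidence shape the same aspects. The names `ccf_gibbs` and `recommend`, the hyperparameters, and the ranking rule are hypothetical, and the sketch covers only the Gibbs-sampling half of CCF's hybrid Gibbs/EM training, with no parallelism.

```python
import numpy as np

# Hypothetical sketch: one latent-aspect model shared by two co-occurrence types.
def ccf_gibbs(user_pairs, word_pairs, n_comm, n_users, n_words,
              n_aspects=4, alpha=0.1, beta=0.01, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Each token is (community, item, kind); kind 0 = user, kind 1 = word.
    tokens = [(c, u, 0) for c, u in user_pairs] + [(c, w, 1) for c, w in word_pairs]
    n_ck = np.zeros((n_comm, n_aspects))   # aspect counts per community (both kinds)
    n_ku = np.zeros((n_aspects, n_users))  # user counts per aspect
    n_kw = np.zeros((n_aspects, n_words))  # word counts per aspect
    z = rng.integers(n_aspects, size=len(tokens))  # random initial aspects
    for (c, x, kind), k in zip(tokens, z):
        n_ck[c, k] += 1
        if kind == 0:
            n_ku[k, x] += 1
        else:
            n_kw[k, x] += 1
    for _ in range(iters):
        for i, (c, x, kind) in enumerate(tokens):
            emit, vocab = (n_ku, n_users) if kind == 0 else (n_kw, n_words)
            k = z[i]
            n_ck[c, k] -= 1  # remove the token's current assignment
            emit[k, x] -= 1
            # p(z = k | rest) is proportional to
            # (n_ck + alpha) * (n_kx + beta) / (n_k + V * beta)
            p = (n_ck[c] + alpha) * (emit[:, x] + beta) / (emit.sum(axis=1) + vocab * beta)
            k = rng.choice(n_aspects, p=p / p.sum())
            z[i] = k
            n_ck[c, k] += 1
            emit[k, x] += 1
    theta = (n_ck + alpha) / (n_ck + alpha).sum(axis=1, keepdims=True)  # p(aspect | community)
    phi = (n_ku + beta) / (n_ku + beta).sum(axis=1, keepdims=True)      # p(user | aspect)
    return theta, phi

def recommend(theta, phi, user, top_n=3):
    # Rank communities by p(user | community) = sum_k theta[c, k] * phi[k, user].
    scores = theta @ phi[:, user]
    return [int(c) for c in np.argsort(-scores)[:top_n]]

# Toy data: (community, user) and (community, word) pairs.
users = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3), (2, 4)]
words = [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3)]
theta, phi = ccf_gibbs(users, words, n_comm=3, n_users=5, n_words=4, n_aspects=2)
print(recommend(theta, phi, user=1, top_n=2))
```

Because the community-aspect counts `n_ck` are shared by user tokens and word tokens, a community with few members can still be placed near related communities through its description words, which is the intuition behind combining co-occurrence types.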
Publications
Combinational Collaborative Filtering for Personalized Community Recommendation
Wen-Yen Chen, Dong Zhang, Edward Chang
ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining (KDD)
Las Vegas, NV, August 2008 (10% accepted).
[PDF (471KB)]
Spectral clustering has been shown to be more effective at
finding clusters than some traditional algorithms. However, spectral
clustering scales poorly in both memory use and computation time
when the data set is large. To perform
clustering on large data sets, we investigate ways of approximating
the dense similarity matrix, comparing two approaches: sparsifying
the matrix and the Nyström method. We then adopt the
strategy of sparsifying the matrix by retaining only nearest neighbors and investigate
its parallelization, distributing both memory
use and computation across machines. Through an empirical
study on a large document data set of 193,844 instances and a
large photo data set of 2,121,863 instances, we demonstrate that our parallel
algorithm can effectively alleviate the scalability problem. A short version
appears at ECML/PKDD 2008.
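The nearest-neighbor sparsification followed by spectral clustering can be sketched in Python with NumPy as below. This is not the PSC implementation: the Gaussian kernel, the bandwidth `sigma`, the neighbor count `t`, and the helper names are illustrative assumptions; the matrix is stored densely for brevity; and the row blocks processed in a loop only stand in for the row ranges that separate machines could own. The Nyström alternative is not shown. Clustering then follows the standard normalized spectral recipe: top-k eigenvectors of D^{-1/2} S D^{-1/2}, row normalization, then k-means.

```python
import numpy as np

def tnn_similarity(X, t=5, sigma=1.0, n_blocks=2):
    """Gaussian similarities kept only for each point's t nearest neighbors.

    Rows are handled block by block; each block stands in for the row range
    a separate machine might own (hypothetical partitioning). Stored densely
    here for brevity; a large-scale version would use a sparse format."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for rows in np.array_split(np.arange(n), n_blocks):
        d2 = ((X[rows][:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # block of squared distances
        d2[np.arange(len(rows)), rows] = np.inf                         # ignore self-similarity
        nn = np.argsort(d2, axis=1)[:, :t]
        for local, i in enumerate(rows):
            S[i, nn[local]] = np.exp(-d2[local, nn[local]] / (2 * sigma ** 2))
    return np.maximum(S, S.T)  # symmetrize: keep an edge if either endpoint chose it

def spectral_clustering(S, k, iters=100):
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]  # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    V = vecs[:, -k:]                                   # eigenvectors of the k largest
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    # k-means on the embedded rows, with farthest-first initialization
    centers = [V[0]]
    for _ in range(1, k):
        dist = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers = np.array([V[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

# Two well-separated groups of points on a line.
line = np.arange(10) * 0.1
X = np.vstack([np.column_stack([line, np.zeros(10)]),
               np.column_stack([line + 10.0, np.zeros(10)])])
labels = spectral_clustering(tnn_similarity(X, t=3), k=2)
print(labels)
```

Each row's neighbor list depends only on that row and the full data, so row blocks can be computed independently; this is what makes the sparsified matrix a natural fit for distribution. The dense n-by-n storage here is exactly the memory cost the approach exists to avoid at scale.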
Submitted
PSC: Parallel Spectral Clustering
Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Chang
[PDF (2.4MB)]
Publications
Parallel Spectral Clustering
Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, Edward Chang
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)
Antwerp, Belgium, September 2008 (18% accepted).
[PDF (4.4MB)]
Fotofiti is a research platform for automating semantic annotation of digital photographs.
It not only provides a web-based user interface for managing social networks, events, and photographs,
but also makes use of a variety of metadata geared towards image classification and similarity assessment.
Fotofiti features real-time online semantic annotation using global features from both content and context.
A web interface for manual annotation provides training examples for our classifier. Classification
experiments using various learning techniques were performed on a real-world data set. Additionally, a scalable landmark
recognition system that uses local features is discussed.
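Annotation from global content and context features can be illustrated with simple early fusion: compute one feature vector per source, concatenate them, and classify. The specific features below (a quantized RGB histogram for content, time of day and GPS coordinates for context), the weight `w_context`, the class names, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity; the description above does not specify which features or learners Fotofiti uses.

```python
import numpy as np

def color_histogram(pixels, bins=4):
    """Global content feature: L1-normalized joint RGB histogram of an (N, 3) pixel array."""
    q = (np.asarray(pixels, dtype=int) * bins) // 256  # quantize each channel to `bins` levels
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    h = np.bincount(idx, minlength=bins ** 3).astype(float)
    return h / h.sum()

def context_features(hour, lat, lon):
    """Global context feature (hypothetical): time of day on a circle plus scaled GPS."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle), lat / 90.0, lon / 180.0])

def fuse(pixels, hour, lat, lon, w_context=0.5):
    # Early fusion: concatenate content and (weighted) context feature vectors.
    return np.concatenate([color_histogram(pixels), w_context * context_features(hour, lat, lon)])

class NearestCentroid:
    """Assigns each photo the label of the closest class-mean feature vector."""
    def fit(self, X, y):
        y = np.asarray(y)
        self.labels = sorted(set(y.tolist()))
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return [self.labels[i] for i in d.argmin(axis=1)]

# Synthetic photos: bluish daytime shots vs dark night shots.
rng = np.random.default_rng(0)

def photo(base, hour, lat=25.0, lon=-80.0):
    px = np.clip(np.array(base) + rng.integers(-20, 21, size=(100, 3)), 0, 255)
    return fuse(px, hour, lat, lon)

X = np.array([photo([60, 140, 220], 12), photo([70, 150, 210], 13),
              photo([10, 10, 20], 23), photo([15, 12, 25], 22)])
y = ["beach", "beach", "night", "night"]
clf = NearestCentroid().fit(X, y)
print(clf.predict(np.array([photo([65, 145, 215], 14)])))
```

Global features like these are cheap to compute per photo, which suits real-time annotation; the trade-off is that they cannot localize objects, which is where local features (as in the landmark recognition system) come in.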
Publications
A Scalable Service for Photo Annotation, Sharing, and Search
Ben Lee, Wen-Yen Chen, Edward Chang
ACM Int'l Conference on Multimedia (MM)
Santa Barbara, CA, October 2006.
[PDF (4.0MB)]
Fotofiti: Web Service for Photo Management
Ben Lee, Wen-Yen Chen, Edward Chang
ACM Int'l Conference on Multimedia (MM)
Santa Barbara, CA, October 2006.
[PDF (262KB)]
Fotowiki is a wiki-based map service that integrates visual and textual information with a map. Fotowiki divides
a geographical area into sub-areas. The individual responsible for providing information about a sub-area enters
the collected data into a wiki page. Fotowiki uploads the distributed wiki pages and overlays their information on the map.
In addition to the traditional aerial images provided by Google Maps, Fotowiki adds to the map both street-level
views of the surrounding area and 360-degree panorama tours of individual spots. More importantly, the fine-grained
information about particular locations, gathered through Fotowiki's incentive-based information collection strategy,
can substantially improve the usefulness of the information presented.
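One simple way to divide a geographical area into sub-areas and collect each sub-area's wiki pages for overlay is a uniform latitude/longitude grid, sketched below in Python. The grid scheme, the `BBox` type, the helper names, and the page tuples are assumptions for illustration; the description above does not say how Fotowiki actually partitions an area.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    south: float
    west: float
    north: float
    east: float

def cell_of(lat, lon, box, rows, cols):
    """Return the (row, col) grid cell containing a point, or None if outside the area."""
    if not (box.south <= lat <= box.north and box.west <= lon <= box.east):
        return None
    r = min(int((lat - box.south) / (box.north - box.south) * rows), rows - 1)
    c = min(int((lon - box.west) / (box.east - box.west) * cols), cols - 1)
    return r, c

def cell_bounds(r, c, box, rows, cols):
    """Bounding box of one sub-area, e.g. for drawing its overlay on the map."""
    dlat = (box.north - box.south) / rows
    dlon = (box.east - box.west) / cols
    s, w = box.south + r * dlat, box.west + c * dlon
    return BBox(s, w, s + dlat, w + dlon)

def group_pages(pages, box, rows, cols):
    """Group (title, lat, lon) wiki pages by sub-area; pages outside the area are dropped."""
    overlay = defaultdict(list)
    for title, lat, lon in pages:
        cell = cell_of(lat, lon, box, rows, cols)
        if cell is not None:
            overlay[cell].append(title)
    return dict(overlay)

area = BBox(south=37.0, west=-122.5, north=38.0, east=-121.5)
pages = [("Page A", 37.80, -122.40), ("Page B", 37.43, -122.17), ("Page C", 40.0, -120.0)]
print(group_pages(pages, area, rows=4, cols=4))
# {(3, 0): ['Page A'], (1, 1): ['Page B']}
```

A uniform grid keeps the mapping from coordinates to the responsible sub-area trivial, but it ignores map-projection distortion and uneven data density; an adaptive scheme such as a quadtree would balance contributors' workloads better.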
Publications