Combinational Collaborative Filtering for Personalized Community Recommendation


The rapid growth of data on social-network sites has made information retrieval increasingly challenging for users. We propose Combinational Collaborative Filtering (CCF) to perform personalized community recommendation by considering multiple types of co-occurrences in social data at the same time. This filtering method fuses semantic and user information, then applies a hybrid training strategy that combines Gibbs sampling and the Expectation-Maximization algorithm. To handle large-scale data sets, parallel computing is used to speed up model training. Through an empirical study on the Orkut data set, we show CCF to be both effective and scalable.

Publications

  • Combinational Collaborative Filtering for Personalized Community Recommendation
    Wen-Yen Chen, Dong Zhang, Edward Chang
    ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining (KDD)
    Las Vegas, NV, August 2008 (10% accepted).
    [PDF (471KB)]

Parallel Spectral Clustering


Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the data set is large. To perform clustering on large data sets, we investigate ways of approximating the dense similarity matrix. We compare sparsifying the matrix with the Nyström method. We then pick the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation across distributed computers. Through an empirical study on a large document data set of 193,844 instances and a large photo data set of 2,121,863 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem. A short version appears at ECML/PKDD 2008.
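The sparsification idea can be illustrated with a small sketch. This is not the PSC implementation: the function name `sparse_knn_similarity`, the Gaussian kernel, the `sigma` parameter, and the "keep an entry if either point lists the other as a neighbor" symmetrization rule are all assumptions chosen for illustration, and the code runs on a single machine with a brute-force neighbor search.

```python
import math


def sparse_knn_similarity(points, t, sigma=1.0):
    """Return a symmetric sparse similarity matrix as {i: {j: s_ij}}.

    Hypothetical helper: each point keeps only its t nearest neighbors.
    An entry (i, j) is stored if j is among i's neighbors or i is among
    j's, which is one common way to keep the matrix symmetric.
    """
    n = len(points)
    sim = {i: {} for i in range(n)}
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        for d2, j in dists[:t]:
            # Gaussian similarity (an assumed kernel choice).
            s = math.exp(-d2 / (2.0 * sigma ** 2))
            sim[i][j] = s
            sim[j][i] = s
    return sim


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
S = sparse_knn_similarity(points, t=2)
nonzeros = sum(len(row) for row in S.values())
dense_entries = len(points) ** 2
```

In this toy example the sparse structure stores far fewer entries than the dense matrix would, and no similarity is stored between the two groups. With n points and t neighbors the storage grows roughly with n·t rather than n², which is the memory saving the paragraph above refers to.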

Submitted

  • PSC: Parallel Spectral Clustering
    Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Chang
    [PDF (2.4MB)]

Publications

  • Parallel Spectral Clustering
    Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, Edward Chang
    European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)
    Antwerp, Belgium, September 2008 (18% accepted).
    [PDF (4.4MB)]

Software (click here)


Fotofiti


Fotofiti is a research platform for automating semantic annotation of digital photographs. It not only provides a web-based user interface for managing social networks, events, and photographs, but also uses a variety of metadata for image classification and similarity assessment. Fotofiti features real-time online semantic annotation using global features from both content and context. A manual annotation web interface provides training examples for our classifier. Classification experiments using various learning techniques were performed on a real-world data set. Additionally, a scalable landmark recognition system that uses local features is discussed.

Publications

  • A Scalable Service for Photo Annotation, Sharing, and Search
    Ben Lee, Wen-Yen Chen, Edward Chang
    ACM Int'l Conference on Multimedia (MM)
    Santa Barbara, CA, October 2006.
    [PDF (4.0MB)]


  • Fotofiti: Web Service for Photo Management
    Ben Lee, Wen-Yen Chen, Edward Chang
    ACM Int'l Conference on Multimedia (MM)
    Santa Barbara, CA, October 2006.
    [PDF (262KB)]

Demo (click here)


Fotowiki


Fotowiki is a wiki-based map service that integrates visual and textual information with maps. Fotowiki divides a geographical area into sub-areas. An individual responsible for providing information about a sub-area enters collected data into a wiki page. Fotowiki uploads the distributed wiki pages and overlays the information on the map. In addition to the traditional aerial images provided by Google Maps, Fotowiki adds both street-level views of the surrounding area and a 360-degree panorama tour of a spot to the map. More importantly, the fine-grained information about a particular location, gathered through its incentive-based information collection strategy, can substantially improve the usefulness of the information.

Publications

  • Fotowiki - Distributed Map Enhancement Service
    Wen-Yen Chen, Ben Lee, Edward Chang
    ACM Int'l Conference on Multimedia (MM)
    Santa Barbara, CA, October 2006.
    [PDF (1.9MB)]

Demo (click here)