Xin Jin

PhD Candidate
Department of Computer Science
University of California, Santa Barbara

Email: xin_jin AT cs DOT ucsb DOT edu


I am a sixth-year Ph.D. candidate in the Department of Computer Science at the University of California, Santa Barbara, advised by Professor Tao Yang. My research interest is information retrieval.

My current projects center on cache-conscious ranking optimization, multi-version search, all-pairs similarity search, and secure search. My work aims to improve online ranking efficiency and accuracy and to provide a secure search engine environment. [LinkedIn] [CV]


Education

University of California, Santa Barbara
Ph.D. in computer science, 09/2011-03/2017 (expected)

National University of Singapore
One year of graduate study in the computer science department, 08/2010-05/2011

Peking University
B.S. in computer science, 09/2006-07/2010


Projects

A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation
Machine-learned classification and ranking techniques often use ensembles to aggregate partial scores of feature vectors for high accuracy, and the runtime score computation can become expensive when employing a large number of ensembles. Previous work has shown that judicious use of the memory hierarchy in a modern CPU architecture can effectively shorten the time of score computation. However, different traversal methods and blocking parameter settings can exhibit different cache and cost behavior depending on data and architectural characteristics. This project provides an approximate analytic comparison of the data access performance of cache blocking methods and proposes a fast guided sampling scheme to select a traversal method and blocking parameters for effective use of the memory hierarchy. Our study shows that, within a reasonable amount of time, the proposed scheme can identify a highly competitive solution that significantly accelerates score calculation.
[Project Publication]

Multi-version Search
This project studies key challenges and cost-sensitive technical aspects of integrated archival and search support for managing large versioned datasets. The main tasks include an efficient software architecture and optimization for detecting duplicated content on a cloud cluster, and fast multi-phase search with a hybrid index structure that exploits content similarity and query characteristics for top-result ranking.
[Project Page]

Cache-Conscious Runtime Optimization for Ranking Ensembles
Multi-tree ensemble models have proven effective for document ranking. Using a large number of trees can improve accuracy, but it takes time to calculate the ranking scores of matched documents. This project investigates data traversal methods for fast score calculation with a large ensemble. We propose a 2D blocking scheme that achieves better cache utilization with a simpler code structure than previous work. Experiments with several benchmarks show significant acceleration in score calculation without loss of ranking accuracy.
[Project Publication]

Scalable Similarity Computing
Similarity comparison is one of the key operations in many data-intensive mining/search applications and cloud systems. Conducting similarity search on large datasets is time-consuming and becomes more challenging when data is updated continuously. This project studies scalable algorithms and system support for high-performance similarity computing on modern computer architectures. Techniques for data partitioning, data layout design, and computation balancing are developed to optimize communication, memory hierarchy performance, and computing resource usage. The project starts with incremental duplicate detection for web data analysis and search, and continues with similarity computing in other applications and cloud storage systems.
[Project Page]


Publications

Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval
Xin Jin, Daniel Agun, Tao Yang, Qinghao Wu, Yifan Shen, Susen Zhao
ACM CIKM, October 2016
[PDF]

Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation
Xin Jin, Tao Yang, Xun Tang
ACM SIGIR, July 2016
[PDF]

Partitioned Similarity Search with Cache-Conscious Data Traversal
Xun Tang, Maha Alabduljalil, Xin Jin, Tao Yang
ACM Transactions on Knowledge Discovery from Data (TKDD), 2015
[PDF]

Cache-Conscious Runtime Optimization for Ranking Ensembles
Xun Tang, Xin Jin (equal contribution with the first author), Tao Yang
ACM SIGIR, August 2014
[PDF]

Load Balancing for Partition-based Similarity Search
Maha Alabduljalil, Xun Tang, Xin Jin, Tao Yang
ACM SIGIR, August 2014
[PDF]

Temporal and Social Context based Burst Detection from Folksonomies
Junjie Yao, Bin Cui, Yuxin Huang, Xin Jin
AAAI, July 2010
[PDF]

Efficient Privacy-Preservation Top-K Search with Multi-Keyword Ranking
Daniel Agun, Xin Jin, Jinjin Shao, Stefano Tessaro, Tao Yang
in submission


Work Experience

Google
Summer Intern, 06/2016-09/2016
Search Team at Google Express

Yahoo!
Summer Intern, 06/2015-09/2015
Yahoo! Search Group

Electronic Arts
Summer Intern, 06/2014-09/2014
Player-User Relationship Management Group at EA Data Platform Team


Awards and Honors

Outstanding Publication Award, Computer Science Department, UCSB, 2016

Outstanding Teaching Assistant Award, Computer Science Department, UCSB, 2013