I'm a fifth-year Ph.D. candidate in the Department of Computer Science, University of California, Santa Barbara. I am currently working with Prof. Xifeng Yan and Prof. Kenneth S. Kosik. Before I came to UCSB, I obtained my B.E. and M.S. in Computer Science from Northeastern University, China.
I'm mostly interested in data mining and machine learning (including deep learning). In particular, my research has focused on developing better knowledge extraction tools for sequence data (e.g., biological sequences, event streams, text corpora). I also have some experience with bioinformatics research such as DNA sequence assembly, SNP calling and gene expression analysis (sadly, none of this work got published T_T).
Find more details about me in my CV.
Conferences / Journals:
- Honglei Liu, Bian Wu, “Active Learning of Functional Networks from Spike Trains”, SIAM Int. Conf. on Data Mining (SDM 2017). [paper] [supplementary materials] [source code]
- Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, Kenneth S. Kosik, “Fast Motif Discovery in Short Sequences”, Proc. of Int. Conf. on Data Engineering (ICDE 2016). [paper] [slides] [poster] [source code]
- Xiaochun Yang, Honglei Liu, Bin Wang, “ALAE: Accelerating Local Alignment with Affine Gap Exactly in Biosequence Databases”, Proc. of Int. Conf. on Very Large Data Bases (VLDB 2012). [paper] [source code]
- Honglei Liu, Xiaochun Yang, Bin Wang, Rong Jin, “Approximate Substring Query Algorithms Supporting Local Optimal Matching”, Journal of Frontiers of Computer Science and Technology, 2011. [source code]
Patents:
- Honglei Liu, Jocelyne Bruand, “Off-target detection tool for strings with multi-level cache”, pending.
- Honglei Liu, Xiaochun Yang, Jiaying Wang, Bin Wang, “Biological sequence local comparison method capable of obtaining complete solution”, CN102750461, issued April 22, 2015.
- Honglei Liu, Xiangfei Meng, “An electric automobile battery replacing device”, CN202089042, issued Dec. 28, 2011.
Classification on heterogeneous sequence data with a deep learning approach
We address the problem of classification on multiple groups of heterogeneous, intra-dependent sequence sets, where there is no easy way to use the raw sequences directly as inputs to train an end-to-end classification model. Carefully extracting features by hand can partially solve the problem, but this approach has obvious drawbacks, such as generalizing poorly. We address this challenge with a deep learning framework.
Active learning of functional networks from spike trains [Github]
We consider the problem of accurately inferring a functional network from neural event streams (spike trains), an essential task in many real-world applications such as diagnosing neurodegenerative diseases. We improve the accuracy of the inferred functional network by adopting an active learning framework that intelligently generates and utilizes interventional data.
Fast Specificity Checking
We study the problem of specificity checking: given a set of primer sequence pairs and a reference sequence, identify all potential targets, defined as regions in the reference sequence that could match the primer pairs according to user-specified rules. We propose an algorithm that dramatically reduces running time by reusing calculations and filtering spurious candidates with a pre-trained predictive model.
Fast Motif Discovery in Short Sequences [ICDE'16 Paper] [Project homepage] [Github]
Sequence motif discovery aims to find frequent patterns in a set of sequences. Most existing algorithms do not scale well with the number of sequences or the size of the alphabet. We propose an anchor-based clustering (ASC) algorithm that groups sequences containing the same motif together, reducing the running time of a very popular motif finding algorithm, MEME, from weeks to a few minutes with even better accuracy.
Jun. 2016 - Sep. 2016
Topics: Correlation Finding and Indexing for Extremely Large Scale Time Series Data
Mar. 2016 - Jun. 2016
Teaching Assistant, Advanced Data Mining (CS291K)
Topics: Neural Networks, CNN, RNN, LSTM, TensorFlow
Jul. 2015 - Sep. 2015
Bioinformatics intern, Illumina Inc.
Topic: Fast Specificity Checking for Multiplex PCR Primer Design
Jun. 2014 - Aug. 2014
Mentor, Research Mentorship Program (RMP), UCSB
Student topic: Review Rating Adjustment to Incorporate User Preferences
Jul. 2014 - Sep. 2014
Mentor, Research Internships in Science and Engineering (RISE) Program, UCSB
Student topic: Detecting Spam Emails using Machine Learning Algorithms
Intelligent car
An intelligent car that can navigate by itself and follow a road track. I was in charge of the software part. We won the first prize in a national competition. Check more photos here.
Acoustic positioning car
A car that can locate its position by sending and receiving sound wave signals and then perform a series of tasks. I was in charge of the software part. We won the second prize in a national competition with our design. Check more photos here.
- Nov. 2015 - Mar. 2016
Track Chair of Graduate Student Workshop on Computing (GSWC) 2016, UCSB
- 2013 - 2014
Graduate Student Representative for the Department of Computer Science, UCSB
- 2013 - 2014
Food Bank Committee Member, UCSB
- Summer 2013
International Student Orientation Assistant in OISS
Email: honglei [at] cs.ucsb.edu
Address: Rm 1413, Phelps Hall, University of California Santa Barbara, CA 93106-5110, USA