Sequence data are ubiquitous in diverse domains such as bioinformatics, computational neuroscience, and user behavior analysis. As a result, many critical applications require extracting knowledge from sequences at multiple levels. For example, mining frequent patterns is the central goal of motif discovery in biological sequences, while in computational neuroscience, one essential task is to infer causal networks from neural event sequences (spike trains). Despite these differences, most existing knowledge extraction tools face new challenges posed by modern instruments: as large-scale, high-resolution sequence data become available, we need knowledge extraction tools with better efficiency and higher accuracy.
In this talk, I will introduce my research on improving existing knowledge extraction tools to meet these new challenges in efficiency and accuracy. I will first present our work on scaling existing motif discovery algorithms. We propose an anchor-based clustering algorithm that can improve the scalability of all existing motif-finding algorithms without any loss of accuracy. Second, I will discuss how to infer a functional network from spike trains with better accuracy. Our work improves the accuracy of the inferred functional network by adopting an active learning framework that intelligently generates and utilizes interventional data.