|
CIAA2003 |
START ConferenceManager |
An Efficient Pre-Determinization Algorithm
Cyril Allauzen and Mehryar Mohri
Presented at
Eighth International Conference on Implementation and Application of
Automata (CIAA 2003),
July 16-18, 2003
Santa Barbara, CA, USA
Abstract
We present a general algorithm that makes an arbitrary weighted transducer over the tropical semiring determinizable by inserting in it
transitions labeled with special symbols. After determinization, the special symbols can be removed or replaced with epsilon-transitions. The
resulting transducer, while not deterministic in general, contains less non-determinism. We report empirical results showing that our algorithm
leads to a substantial speed-up in large-vocabulary speech recognition. Our algorithm for inserting transitions makes use of on an efficient
algorithm for testing a general 'twins property', a sufficient condition for the determinizability of all weighted transducers over the tropical
semiring. It inserts new transitions just when needed to guarantee that the resulting transducer has the twins property and thus is
determinizable. It also uses a single-source shortest-paths algorithm over the min-max semiring for carefully choosing the positions for
insertion of new transitions to benefit from the subsequent application of determinization. These positions are proved to be optimal in a sense
that we describe.