Project Overview
This project provides a library for estimating, storing, and accessing large n-gram language models in memory efficiently. It is described in this paper. Its data structures are faster and smaller than SRILM's and nearly as fast as KenLM's, despite being written in Java instead of C++. It also achieves the best published lossless encoding of the Google n-gram corpus.
See here for some documentation.
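To make the "storing and accessing n-gram models in memory" idea concrete, here is a toy sketch in Java. Note that this is an illustration of the general technique only, not this library's API or its compact data structures: it stores a bigram model in plain hash maps and scores with stupid backoff (a hypothetical fixed penalty of 0.4 for unseen bigrams).

```java
import java.util.HashMap;
import java.util.Map;

// Toy bigram language model: hash-map counts plus stupid-backoff scoring.
// Real n-gram libraries replace the maps with compact sorted/hashed arrays.
public class ToyBigramLm {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();
    private int total = 0;

    public void train(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            unigrams.merge(tokens[i], 1, Integer::sum);
            total++;
            if (i > 0) {
                bigrams.merge(tokens[i - 1] + " " + tokens[i], 1, Integer::sum);
            }
        }
    }

    // Use the relative bigram frequency if the bigram was seen;
    // otherwise back off to the unigram probability with a 0.4 penalty.
    public double score(String prev, String word) {
        Integer count = bigrams.get(prev + " " + word);
        if (count != null) {
            return (double) count / unigrams.get(prev);
        }
        return 0.4 * unigrams.getOrDefault(word, 0) / (double) total;
    }

    public static void main(String[] args) {
        ToyBigramLm lm = new ToyBigramLm();
        lm.train("the cat sat on the mat".split(" "));
        System.out.println(lm.score("the", "cat")); // seen bigram → 0.5
        System.out.println(lm.score("cat", "on"));  // unseen bigram: backoff path
    }
}
```

The interesting engineering in a real implementation is precisely what this sketch skips: encoding billions of such counts in a few bytes per n-gram while keeping lookups fast.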
News
July 16, 2014: The project has been migrated to github. Any future updates will happen there.
December 6, 2014: Since Google has deprec