Resource Overview
jTokeniser is a set of classes that provides a variety of tokenisers for Java projects. Simple tokenisers such as WhiteSpaceTokeniser and StringTokeniser provide basic token extraction, whereas RegexTokeniser and BreakIteratorTokeniser offer more advanced options for more thorough tokenisation, including discarding punctuation. Recent additions include RegexSeparatorTokeniser, which allows complex definitions of token delimiters. A SentenceTokeniser is also provided for segmenting text into sentences.
A GUI front end is also included for experimenting with the tokenisers without writing any code.
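
To make the differences between these tokenisation styles concrete, here is a minimal sketch that uses only standard JDK classes (java.util.regex and java.text.BreakIterator). It does not use jTokeniser's own API, whose class constructors and method signatures are not described here; the class name TokeniserSketch and all of its method names are hypothetical and chosen purely for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; NOT part of jTokeniser.
class TokeniserSketch {

    // Whitespace style: split on runs of whitespace, punctuation stays attached.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Regex style: keep only runs of letters and digits, so punctuation is discarded.
    static List<String> regexTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Regex-separator style: the caller supplies a pattern describing the delimiters.
    static List<String> separatorTokens(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(separatorRegex).split(text)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Sentence style: segment text into sentences with the JDK's BreakIterator.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Hello, world! This is a test.";
        System.out.println(whitespaceTokens(sample));
        System.out.println(regexTokens(sample));
        System.out.println(separatorTokens("a, b;c", "\\s*[,;]\\s*"));
        System.out.println(sentences(sample));
    }
}
```

The whitespace variant leaves punctuation attached to tokens ("Hello," stays one token), while the regex variant drops it, which mirrors the distinction drawn above between the basic and the more thorough tokenisers.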