A Java library containing a set of tokenisers that split text into its constituent words.

  • Resource size: 83.13 kB
  • Upload date: 2021-06-29
  • Downloads: 0
  • Views: 0
  • Resource points: 1 point
  • Tags: java NLP tokenizer

Resource Description

jTokeniser is a set of classes that provides a variety of tokenisers for Java projects. Simple tokenisers such as WhiteSpaceTokeniser and StringTokeniser provide basic token extraction, whereas RegexTokeniser and BreakIteratorTokeniser offer more advanced options for more thorough tokenisation, including discarding punctuation. Recent additions include RegexSeparatorTokeniser, which allows complex definitions of token delimiters. A SentenceTokeniser is also provided for segmenting text into sentences. There is also a GUI front end for experimenting without writing code.
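
The tokenising strategies named above can be illustrated with standard JDK classes alone. The sketch below is not jTokeniser's API: the class name TokeniserSketch and all of its method names are hypothetical, and it uses only String.split, java.util.regex and java.text.BreakIterator to show the kind of behaviour each approach typically has (whitespace splitting, regex matching that discards punctuation, regex-defined delimiters, and sentence segmentation).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; it does not use or mirror jTokeniser's classes.
public class TokeniserSketch {

    // Whitespace splitting: punctuation stays attached to the words.
    public static List<String> whitespaceTokens(String text) {
        return separatorTokens(text.trim(), "\\s+");
    }

    // Separator-based splitting: the regex describes the delimiters, not the tokens.
    public static List<String> separatorTokens(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(delimiterRegex).split(text)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Regex matching: keep only runs of letters or digits, so punctuation is discarded.
    public static List<String> regexTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Word boundaries from BreakIterator; segments that do not start with a
    // letter or digit (spaces, punctuation) are skipped.
    public static List<String> breakIteratorTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    // Sentence segmentation using BreakIterator's sentence instance.
    public static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, world! Bye.";
        System.out.println(whitespaceTokens(text));
        System.out.println(regexTokens(text));
        System.out.println(breakIteratorTokens(text));
        System.out.println(separatorTokens("a, b;c", "[,;]\\s*"));
        System.out.println(sentences("Hello world. How are you?"));
    }
}
```

The difference between the simple and the more thorough approaches shows up in how punctuation is treated: whitespace splitting leaves "Hello," as a single token, while the regex and BreakIterator variants return only the word itself.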

File List

jTokeniser-2.0.jar
lib
swing-layout-1.0.jar
README.txt