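A cross-platform toolbox for extracting, filtering, aligning, and converting text data from multilingual documents into parallel training corpora for statistical machine translation systems.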

Resource Description

* **Media filter graph metaphor**
* **Workflow manager for parallel language data**
* **Configuration-driven, modular filters**
* **Reusable plug-in architecture**
* **Standardized base classes**

Statistical machine translation (SMT) is growing from an academic novelty into a commercially viable capability. High-quality parallel linguistic corpora drive SMT's high-quality translations. If you want to transform your existing translation memories (and other parallel language data) into a valuable training corpus that can drive new, accurate SMT operations, this tool is for you.

This toolbox provides a common framework, reusable filtering interfaces, and an aligned-document workflow to manage the transformation of ad-hoc data, spread across thousands of documents with millions of sentence pairs, into a catalogued set of parallel language corpora. This common framework can manage the workflow for any open-source NLP tool, such as sentence breaking or word segmentation (e.g. MeCab for Japanese text).
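The filter-graph idea, with configuration-driven filters built on a standardized base class, can be illustrated with a small Python sketch. This is not the toolbox's actual code: the base class `PairFilter`, the registry `FILTER_REGISTRY`, the filter names, and the `{"name": ..., "params": ...}` config format are all hypothetical, chosen only to show how pluggable filters might be chained lazily over a stream of sentence pairs.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Tuple, Type

# A parallel segment: (source sentence, target sentence).
Pair = Tuple[str, str]

# Hypothetical plug-in registry: filter name -> filter class.
FILTER_REGISTRY: Dict[str, Type["PairFilter"]] = {}


def register(name: str):
    """Class decorator that makes a filter available to the config loader."""
    def wrap(cls):
        FILTER_REGISTRY[name] = cls
        return cls
    return wrap


class PairFilter(ABC):
    """Hypothetical base class: every filter maps a stream of pairs to a stream of pairs."""

    @abstractmethod
    def process(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        ...


@register("normalize_whitespace")
class NormalizeWhitespace(PairFilter):
    def process(self, pairs):
        for src, tgt in pairs:
            # Collapse runs of whitespace and trim both sides.
            yield " ".join(src.split()), " ".join(tgt.split())


@register("drop_empty")
class DropEmpty(PairFilter):
    def process(self, pairs):
        for src, tgt in pairs:
            if src and tgt:
                yield src, tgt


@register("length_ratio")
class LengthRatio(PairFilter):
    def __init__(self, max_ratio: float = 3.0):
        self.max_ratio = max_ratio

    def process(self, pairs):
        for src, tgt in pairs:
            longer, shorter = max(len(src), len(tgt)), min(len(src), len(tgt))
            # Drop pairs whose character lengths differ too much (likely misaligned).
            if shorter > 0 and longer / shorter <= self.max_ratio:
                yield src, tgt


@register("dedup")
class Dedup(PairFilter):
    def process(self, pairs):
        seen = set()
        for pair in pairs:
            if pair not in seen:
                seen.add(pair)
                yield pair


def build_pipeline(config: List[dict]) -> List[PairFilter]:
    """Instantiate filters in order from a config list of {"name": ..., "params": {...}}."""
    return [FILTER_REGISTRY[step["name"]](**step.get("params", {})) for step in config]


def run(pipeline: List[PairFilter], pairs: Iterable[Pair]) -> List[Pair]:
    stream: Iterable[Pair] = pairs
    for f in pipeline:
        stream = f.process(stream)  # chain lazily, like nodes in a filter graph
    return list(stream)


if __name__ == "__main__":
    config = [
        {"name": "normalize_whitespace"},
        {"name": "drop_empty"},
        {"name": "length_ratio", "params": {"max_ratio": 2.0}},
        {"name": "dedup"},
    ]
    data = [
        ("Hello  world", "Hallo Welt"),
        ("Hello world", "Hallo Welt"),
        ("", "leer"),
        ("Yes", "Ja, das ist eine sehr lange Antwort"),
    ]
    print(run(build_pipeline(config), data))  # [('Hello world', 'Hallo Welt')]
```

Because each filter only consumes and yields pairs, new filters (for example, a wrapper around an external word segmenter) could be added by registering another subclass and listing it in the config, without touching the pipeline code. Generators keep memory use flat even for millions of sentence pairs.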