EVBCorpus: An English-Vietnamese Bilingual Corpus

Resource Description

EVBCorpus is an English-Vietnamese bilingual corpus. It contains over 10 million words from 15 bilingual books, 100 parallel English-Vietnamese / Vietnamese-English texts, 250 parallel law and ordinance texts, and 1,000 news articles. The composition, annotation, encoding, and availability of the corpus are meant to facilitate the development of language technology and studies in bilingual terminology extraction, primarily for the English-Vietnamese-English language pair.

Building EVBCorpus involves four main steps: (1) collect data and align the bitext at the paragraph level; (2) align the bitext at the sentence level; (3) perform linguistic analysis and tagging; (4) annotate and correct the corpus with toolkits. As a result, EVBCorpus is aligned at the sentence level, and the part of the corpus containing 1,000 news articles is aligned semi-automatically at the word level. If you ar