Java Open-Source Crawler

  • Resource size: 312.04 kB
  • Upload date: 2021-06-29
  • Downloads: 0
  • Views: 0
  • Resource points: 1 point
  • Tags: java, crawler, open source

Resource Description

LightCrawler is an open-source crawler for Java. Its features are listed below:

  • Depth control: the crawler stops at the configured depth.
  • Multi-threaded, and quick and easy to set up.
  • URL filtering: LightCrawler decides which URLs to crawl and which to skip based on configured allowed and forbidden regular expressions.
  • Automatic parser selection: LightCrawler detects whether a document is an RSS feed or HTML and chooses the matching parser.
  • Metadata extraction: the fetcher extracts the title, language, encoding, content type, MD5, and SimHash fingerprint, which is important information for users.
  • In-memory fetch queue: the queue is stored in memory, so reads and writes are fast.

Example single-thread use: ``` String start_url = "http://www.###.c
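
The depth limit, the allowed/forbidden regex filtering, and the in-memory fetch queue described above can be illustrated with a minimal sketch. This is not LightCrawler's API: the class `CrawlFrontierSketch`, its methods, and the `links` map (a stand-in for fetching and parsing real pages) are all hypothetical names chosen for illustration, and the sketch is single-threaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; LightCrawler's real classes may look different.
class CrawlFrontierSketch {
    private final Pattern allowed;    // a URL must fully match this to be crawled
    private final Pattern forbidden;  // a URL that fully matches this is skipped
    private final int maxDepth;       // the start URL is at depth 0

    CrawlFrontierSketch(String allowedRegex, String forbiddenRegex, int maxDepth) {
        this.allowed = Pattern.compile(allowedRegex);
        this.forbidden = Pattern.compile(forbiddenRegex);
        this.maxDepth = maxDepth;
    }

    /** True if the URL is within the depth limit, allowed, and not forbidden. */
    boolean shouldCrawl(String url, int depth) {
        return depth <= maxDepth
                && allowed.matcher(url).matches()
                && !forbidden.matcher(url).matches();
    }

    /**
     * Breadth-first traversal over a precomputed link graph.
     * The map stands in for "fetch the page and extract its links".
     * Returns the URLs in the order they would be fetched.
     */
    List<String> crawl(String startUrl, Map<String, List<String>> links) {
        List<String> order = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the "seen" set
        Deque<String> queue = new ArrayDeque<>();       // in-memory fetch queue

        if (shouldCrawl(startUrl, 0)) {
            depthOf.put(startUrl, 0);
            queue.add(startUrl);
        }
        while (!queue.isEmpty()) {
            String url = queue.poll();
            order.add(url);
            int nextDepth = depthOf.get(url) + 1;
            for (String link : links.getOrDefault(url, Collections.emptyList())) {
                // Enqueue each URL at most once, and only if the filters accept it.
                if (!depthOf.containsKey(link) && shouldCrawl(link, nextDepth)) {
                    depthOf.put(link, nextDepth);
                    queue.add(link);
                }
            }
        }
        return order;
    }
}
```

Constructing the sketch with an allowed pattern such as `http://example\\.com/.*`, a forbidden pattern such as `.*/private/.*`, and a maximum depth of 2 would skip off-site and private links and stop following links discovered beyond the second hop. A real crawler would replace the map lookup with an HTTP fetch and a parser, and would need a thread-safe queue for multi-threaded use.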

File List

lightcrawler-config.properties
LightCrawler-v1.08.jar
log4j.properties