A modular, flexible, extensible, multithreaded web crawler framework/application written in Python

Resource Description

HarvestMan is a modular, extensible, and flexible web crawler that serves as both a program and a framework, written in pure Python. It can download files from websites according to a set of customizable rules and constraints, and it can search websites for information matching keywords or regular expressions. The project's long-term goal is a full-fledged semantic personal data mining platform for retrieving information from the Internet in a highly customizable manner, so that users can fetch information from the web the way they want it, when they want it. To that end, the HarvestMan project plans to support Web 2.0 and 3.0 technologies such as RSS, RDF, and OWL.
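To make the idea of rule-based downloading and keyword or regular-expression matching concrete, here is a minimal sketch using only the Python standard library. It does not use HarvestMan's own API or configuration format; the names ALLOWED_DOMAIN, ALLOWED_EXTENSIONS, PATTERN, url_allowed, and find_matches are hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules for illustration only; these are NOT HarvestMan settings.
ALLOWED_DOMAIN = "example.com"
ALLOWED_EXTENSIONS = (".html", ".pdf")
PATTERN = re.compile(r"\bcrawler\b|\bRSS\b", re.IGNORECASE)


def url_allowed(url):
    """Return True if the URL is on the allowed domain and has an allowed extension."""
    parts = urlparse(url)
    return (
        parts.netloc == ALLOWED_DOMAIN
        and parts.path.lower().endswith(ALLOWED_EXTENSIONS)
    )


def find_matches(text):
    """Return every substring of text that matches the keyword pattern."""
    return PATTERN.findall(text)
```

A crawler built along these lines would presumably call something like url_allowed before fetching a link and find_matches on each fetched page, keeping only pages where the result is non-empty.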

File List

HarvestMan-2.0.5beta
dist
schema
build
.svn
pydocgen.sh
RELEASE-NOTES
Readme.txt
harvestman
ez_setup.py
MANIFEST
LICENSE.txt
setup.py
setup.cfg
doc
tarhm.py
deps
HarvestMan.egg-info
__init__.py