

An engine for parsing HTML into structured data.

  • Resource size: 31.38 kB
  • Uploaded: 2021-06-30
  • Downloads: 0
  • Views: 0
  • Resource points: 1 point
  • Tags: html Parsing Record regex structured

Resource Description

Introduction
Html-structured-engine is an engine that fetches URLs, follows links, and parses HTML into structured record data. It is a crawler that extracts the targeted content directly, so it does not need large storage to keep the whole HTML content.

Configurable
The engine is configurable: given a set of rules and patterns, it can follow links to collect data. It can extract multiple records from a single page, combine data from several pages into one record, or collect paged data across pages. The caller can pass in parameters, headers, cookies, and so on. You can pass in the rules and patterns when you call html-structured-engine, or save them and refer to them by an alias name.

Flexible
You can use html-structured-engine as a simple RPC caller to get structured data directly from a URL. Or you can run it as a standalone service that performs the task on a schedule, and fetch the saved data later.

Frame
It comes with a small optional frame. The frame can be used to
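
To make the rules-and-patterns idea under "Configurable" more concrete, here is a minimal Python sketch of extracting several records from one page with regular expressions. It does not use html-structured-engine itself: the `RULES` layout, the `extract_records` function, and the sample HTML are all hypothetical and chosen only for illustration, so the engine's real rule format may look quite different.

```python
import re

# Hypothetical rule set (NOT the engine's real format):
# "record" matches one record block, "fields" maps field names
# to patterns applied inside each block.
RULES = {
    "record": r'<li class="item">(.*?)</li>',
    "fields": {
        "title": r'<a href="[^"]*">(.*?)</a>',
        "link": r'<a href="([^"]*)">',
    },
}


def extract_records(html, rules):
    """Return one dict per record block found in the HTML string."""
    records = []
    for block in re.findall(rules["record"], html, flags=re.DOTALL):
        record = {}
        for name, pattern in rules["fields"].items():
            match = re.search(pattern, block, flags=re.DOTALL)
            # A field that does not match is stored as None.
            record[name] = match.group(1).strip() if match else None
        records.append(record)
    return records


# Illustrative input; a real crawler would fetch this from a URL.
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/a">First</a></li>
  <li class="item"><a href="/b">Second</a></li>
</ul>
"""

records = extract_records(SAMPLE_HTML, RULES)
for record in records:
    print(record)
```

Keeping the patterns in a plain dictionary means they could be saved and looked up by an alias name, in the spirit of the description above, without touching the extraction code.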