WPCrawler

Project Url: johnhany/WPCrawler
Introduction: a web crawler for a single WordPress site

A web crawler for a single WordPress site

It uses the following open-source libraries:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

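UTF-8 encoding is used so that Chinese tags are recorded correctly.

It connects to MySQL on XAMPP's default port, localhost:3306.

A local XAMPP environment is required.

The next update will add a feature that counts the tags used by each article.

A detailed explanation of how it works is available on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

(The blog hosting was set up only recently. If you have trouble accessing it, please let me know and I will try to fix it.)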

=========

A web crawler for a single WordPress site

Open-source libraries used:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

A local XAMPP environment is required.

The program assumes that a MySQL database named "crawler" exists on localhost, port 3306.
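
As a rough illustration of that assumption, the sketch below builds a JDBC URL for the "crawler" database on localhost:3306 and tries to open a connection with MySQL Connector/J. The class name `CrawlerDb`, its helper methods, and the `root` user with an empty password are hypothetical choices for this example and are not taken from the WPCrawler source. The `useUnicode` and `characterEncoding` properties are added on the assumption that they match the README's note about using UTF-8 to record Chinese tags.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of WPCrawler itself.
final class CrawlerDb {

    // Builds a MySQL JDBC URL; the two query properties ask Connector/J to use UTF-8.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection to the "crawler" database on localhost:3306.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl("localhost", 3306, "crawler"), user, password);
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", 3306, "crawler"));
        // Assumed credentials: adjust to whatever your local MySQL setup uses.
        try (Connection conn = open("root", "")) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            // Reached when MySQL is not running, the database is missing,
            // or the Connector/J jar is not on the classpath.
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

If the connection fails, it is worth checking that MySQL is started from the XAMPP control panel and that the "crawler" database has been created before running the crawler.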

Per-article tag analysis will be added in the next update.

You can read more about how it works on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

My blog is new and may still be unstable. If you have any problems accessing it, please let me know :)
