
698 Commits (800f66c4cc7e1e4b3e485af5236e3c9b8d54f028)

Author SHA1 Message Date
yihua.huang 2a15bc0289 contributor 2014-06-04 22:27:16 +08:00
yihua.huang 5e8ca02ec6 contributor 2014-06-04 22:26:56 +08:00
yihua.huang db0195babb update version in docs 2014-06-04 17:35:31 +08:00
yihua.huang 5f8c3fd5c5 update version 2014-06-04 17:33:30 +08:00
yihua.huang 0e9042eefa update pom 2014-06-04 17:17:48 +08:00
yihua.huang 03170178c4 update pom 2014-06-04 17:01:37 +08:00
yihua.huang c83b74f0f4 update pom for deploy 2014-06-04 16:55:34 +08:00
yihua.huang 7a64847a3c Bugfix: selector does not work well in element #113 2014-06-03 20:03:33 +08:00
yihua.huang 8d67fd0357 change back return proxy from spider to httpclientdownloader #128 2014-05-28 08:08:51 +08:00
yihua.huang 40bf8ca58f change return proxy from spider to httpclientdownloader #128 2014-05-28 07:57:42 +08:00
yihua.huang 1f21d9cc14 spell mistake fix #128 2014-05-28 07:29:19 +08:00
Yihua Huang e310139d00 Merge pull request #128 from yxssfxwzy/proxy
Management of multiple proxies
2014-05-28 07:22:08 +08:00
yihua.huang b165090434 Bugfix:Type convert error in JsonPathSelector #129 2014-05-27 21:19:22 +08:00
yihua.huang 95bdb30296 update xsoup version to release #113 2014-05-27 20:46:48 +08:00
yihua.huang a5d1b56e44 fix ut #113 2014-05-27 18:07:53 +08:00
yihua.huang 3939074a23 Bugfix: nodes() only return the first element #113 2014-05-27 17:53:06 +08:00
yihua.huang 41c2ea9498 refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang f9825c214a refactor selectable for html fragment #113 2014-05-27 16:00:51 +08:00
yihua.huang 03d26c169b Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. invite moco as server test
2014-05-26 17:45:30 +08:00
zwf c146e2c7b4 add proxy pool 2014-05-19 15:59:31 +08:00
zwf 07ea04223f change_gitignore 2014-05-19 15:56:22 +08:00
yihua.huang 21982d3460 remove cpdetector temporary #126 2014-05-14 23:52:27 +08:00
fengwuze fcbfb75608 Refactor the code that automatically obtains the character encoding from the web page by extracting it into a separate method. 2014-05-14 19:14:42 +08:00
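fengwuze 95494d3c4d Add logic for handling meta tags.
Outstanding:
3. When the page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.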
2014-05-14 14:53:54 +08:00
yihua.huang dde2d89bbe Ignore content in json when bracket when remove padding #124 2014-05-08 23:37:18 +08:00
Yihua Huang 2913da4763 Merge pull request #123 from gsh199449/master
Update JsonFilePipeline.java #122
2014-05-08 15:20:02 +08:00
yihua.huang 928f98dd93 auto create folder in JsonFilePipeline #122 2014-05-08 15:12:17 +08:00
GaoShen 5883ed93d7 Update JsonFilePipeline.java
JsonFilePipeline can automatically create folders that do not yet exist
2014-05-08 15:08:55 +08:00
Yihua Huang 4e65dac249 Merge pull request #121 from ywooer/master
Create a Writer with the specified encoding
2014-05-06 20:14:35 +08:00
ywooer 259f0a16c5 Update FilePipeline.java 2014-05-06 18:33:00 +08:00
ywooer 26d38851b5 add charset to Writer 2014-05-06 18:28:50 +08:00
yihua.huang 7fbe18b8c0 implementation of PageMapper #120 2014-05-05 08:01:39 +08:00
yihua.huang 5dc9fe95a9 interface of PageMapper #120 2014-05-05 07:43:32 +08:00
yihua.huang 7668731f08 update version to snapshot 2014-05-05 07:03:55 +08:00
yihua.huang 5f6f489314 deperate in user manual 2014-05-03 06:29:37 +08:00
yihua.huang 81e6e772ac versions back to 0.5.1 2014-05-03 06:18:57 +08:00
yihua.huang dbebcbe44f docs 2014-05-03 06:14:31 +08:00
yihua.huang 358e906379 [maven-release-plugin] prepare for next development iteration 2014-05-03 00:00:13 +08:00
yihua.huang 470750fc0d [maven-release-plugin] prepare release WebMagic-0.5.1 2014-05-02 23:59:55 +08:00
yihua.huang fc3d2906b0 remove avalon from pom temporary 2014-05-02 23:47:14 +08:00
yihua.huang 01aec7e1ab extension point of geturl #118 2014-05-02 23:23:23 +08:00
yihua.huang ec1c2e8cbc test and so on 2014-05-02 23:19:11 +08:00
yihua.huang 4f22f1210e some bug fix #118 2014-05-02 20:38:49 +08:00
yihua.huang 186b90512e refactor redisscheduler #118 2014-05-02 20:24:15 +08:00
yihua.huang 56f033ce8d set setDuplicateRemover for chain api #118 2014-05-02 20:21:23 +08:00
yihua.huang d1140b9e29 add bloom filter for scheduler #118 2014-05-02 20:20:22 +08:00
yihua.huang 64293cba20 samples 2014-05-02 19:12:38 +08:00
yihua.huang bc1d14fed4 sample 2014-05-02 17:54:21 +08:00
yihua.huang 8e4814bdc5 fix path separator 2014-05-02 17:06:34 +08:00
yihua.huang e8d4a9be2b fix remove duplicate error #117 2014-04-29 20:32:06 +08:00