300 Commits (37cb43b667b1896d01c3e5c3d6ab6929aa6fe61a)

Author SHA1 Message Date
yihua.huang 2400ff7e1a resolve conflict 2016-05-08 20:31:43 +08:00
yihua.huang b7f3c4bba0 Merge branch 'master' of git://github.com/hepan/webmagic into hepan-master 2016-05-08 20:27:47 +08:00
yihua.huang d8f978fd20 fix test in JsonPathSelectorTest #289 2016-05-08 19:32:03 +08:00
yihua.huang 61c28a0130 refactor on proxypool 2016-05-08 17:53:15 +08:00
yihua.huang b871b210c5 Merge branch 'proxy-strategy' of github.com:EdwardsBean/webmagic into EdwardsBean-proxy-strategy 2016-05-08 17:53:02 +08:00
yihua.huang b5413368de update ut 2016-05-08 16:23:41 +08:00
Jon 83c27ebbc4 Add IP proxy authentication support 2016-05-08 16:17:58 +08:00
yihua.huang ca072c5575 fix URL regex in GithubRepoPageProcessor #305 2016-05-08 12:09:45 +08:00
hepan 89c6e52863 Add username/password authentication for proxies 2016-04-13 15:16:57 +08:00
yihua.huang 7edfa26f90 complete javadoc 2016-01-21 18:34:07 +08:00
yihua.huang 8b90b91e33 complete some javadoc 2016-01-21 18:14:10 +08:00
yihua.huang 9c5716a543 complete javadoc 2016-01-21 18:05:12 +08:00
yihua.huang 81ce1ffc5f fix ignore 2016-01-21 12:36:49 +08:00
yihua.huang 93764fa2c9 ignore some test 2016-01-21 12:28:32 +08:00
yihua.huang 5706bb90af update xsoup to 0.3.1 2016-01-20 12:59:11 +08:00
yihua.huang 7586e3d75c add some test for github repo downloader 2016-01-19 08:05:53 +08:00
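x1ny 90e14b31b0 Fix FileCacheQueueScheduler preventing the program from exiting normally and leaving streams unclosed
FileCacheQueueScheduler starts a thread that runs periodically to save data, but the thread is not stopped after the crawler finishes, so the program cannot exit; the IO streams are not closed either.

Solution:
Make FileCacheQueueScheduler implement the Closeable interface and shut down the thread and the streams in its close method.
Add a call that closes the scheduler to Spider's close method.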
2015-11-12 23:10:20 +08:00
yihua.huang 56e0cd513a compile error fix 2015-04-15 23:21:06 +08:00
yihua.huang c5740b1840 change assert #200 2015-04-15 08:32:08 +08:00
yihua.huang 67eb632f4d test for issue #200 2015-04-15 08:31:45 +08:00
高军 590561a6e4 Fix bug where site.setHttpProxy() has no effect 2015-03-09 15:54:15 +08:00
edwardsbean 19474e4716 add SimpleProxyPool and IProxyPool 2015-02-28 17:50:10 +08:00
edwardsbean 4978665633 add retry sleep time 2015-01-21 13:30:02 +08:00
yihua.huang 8ffc1a7093 add NPE check for POST method 2015-01-13 14:10:00 +08:00
zhugw bc666e927d Update Site.java
The javadoc of setCycleRetryTimes says: Set cycleRetryTimes times when download fail, 0 by default. Only work in RedisScheduler.
However, the source code does not appear to enforce that restriction, i.e. that it only works with RedisScheduler. Is this javadoc out of date?
2014-09-12 12:42:57 +08:00
yihua.huang 147401ce5e remove duplicate setPath in ProxyPool 2014-09-09 22:58:44 +08:00
yihua.huang e7668e01b8 fix SourceRegion error and add some tests on it #144 2014-08-21 14:29:06 +08:00
yihua.huang 4446669c24 fix test 2014-08-18 10:54:24 +08:00
yihua.huang 9866297ec4 Disable jsoup entity escape by Default. Set Html.DISABLE_HTML_ENTITY_ESCAPE to false to enable it. #149 2014-08-14 08:04:56 +08:00
yihua.huang 4e6e946dd7 more friendly exception message in PlainText #144 2014-08-13 10:02:16 +08:00
yihua.huang af9939622b move thread package out of selector (because it was added by mistake at the beginning) 2014-06-25 18:19:50 +08:00
yihua.huang eae37c868b new sample 2014-06-10 17:38:54 +08:00
yihua.huang b3a282e58d some fix for tests #130 2014-06-10 00:05:30 +08:00
yihua.huang 074d767f45 Merge branch 'proxy' of github.com:yxssfxwzy/webmagic into yxssfxwzy-proxy 2014-06-09 23:51:36 +08:00
zwf 2f89cfc31a add test and fix bug of proxy module 2014-06-09 13:32:02 +08:00
yihua.huang eb89d66566 fix test 2014-06-04 22:28:27 +08:00
yihua.huang 5e8ca02ec6 contributor 2014-06-04 22:26:56 +08:00
yihua.huang 7a64847a3c Bugfix: selector does not work well in element #113 2014-06-03 20:03:33 +08:00
yihua.huang 8d67fd0357 change back return proxy from spider to httpclientdownloader #128 2014-05-28 08:08:51 +08:00
yihua.huang 40bf8ca58f change return proxy from spider to httpclientdownloader #128 2014-05-28 07:57:42 +08:00
yihua.huang 1f21d9cc14 spell mistake fix #128 2014-05-28 07:29:19 +08:00
Yihua Huang e310139d00 Merge pull request #128 from yxssfxwzy/proxy
Management of multiple proxies
2014-05-28 07:22:08 +08:00
yihua.huang b165090434 Bugfix:Type convert error in JsonPathSelector #129 2014-05-27 21:19:22 +08:00
yihua.huang a5d1b56e44 fix ut #113 2014-05-27 18:07:53 +08:00
yihua.huang 3939074a23 Bugfix: nodes() only return the first element #113 2014-05-27 17:53:06 +08:00
yihua.huang 41c2ea9498 refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang f9825c214a refactor selectable for html fragment #113 2014-05-27 16:00:51 +08:00
yihua.huang 03d26c169b Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. invite moco as server test
2014-05-26 17:45:30 +08:00
zwf c146e2c7b4 add proxy pool 2014-05-19 15:59:31 +08:00
yihua.huang 21982d3460 remove cpdetector temporary #126 2014-05-14 23:52:27 +08:00