yihua.huang
03c251237b
add Json parse support
2014-04-13 10:23:00 +08:00
yihua.huang
969ad1766b
change logger style to slf4j for cleaner code
2014-04-06 21:32:20 +08:00
yihua.huang
9b2cb43f47
ConfigurablePageProcessor #91
2014-04-05 23:40:10 +08:00
Bo LIANG
b043ac76d6
change the formatter of log.
To use slf4j, we should insert {} into the formatter string.
2014-04-05 11:31:56 +08:00
yihua.huang
7aaf837e15
change logger to slf4j style for performance #84
2014-04-04 20:10:00 +08:00
yihua.huang
f9b157951d
Merge branch 'master' of github.com:code4craft/webmagic
2014-04-04 20:01:14 +08:00
yihua.huang
22c394e629
[doc]
2014-04-04 20:00:58 +08:00
Bo LIANG
762a3973fd
Modify the log levels of LocalDuplicatedRemovedScheduler.java
The old version prints a debug-level log each time the push method is
called. So when an HTML page has multiple links to the same page, the
debug log appears more than once. A log is also printed whenever a
duplicate URL is encountered, which is confusing.
I changed the level of that log to trace. A debug-level log is now
printed only when a URL is actually pushed into the queue.
2014-04-04 15:53:46 +08:00
yihua.huang
a1c7e826f7
fix dep of slf4j-log4j12
2014-04-03 23:04:31 +08:00
yihua.huang
01848301d4
encode illegal characters in url #80
2014-04-01 22:14:30 +08:00
yihua.huang
2780423e60
enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80
2014-04-01 20:35:11 +08:00
yihua.huang
97b6f46280
Bugfix: break loop in addTargetRequests #81
2014-04-01 20:12:25 +08:00
yihua.huang
8d8194bee4
Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76
2014-03-25 08:23:20 +08:00
yihua.huang
8b35d79569
Do not cache document in Selectable for selected Html element #73
2014-03-19 22:19:06 +08:00
yihua.huang
6201fd6966
add worker as container
2014-03-17 23:01:58 +08:00
yihua.huang
6c11718566
Clean project structure #70
2014-03-14 23:24:38 +08:00
yihua.huang
9606a173cd
fix ZipCodePageProcessor
2014-03-13 22:55:50 +08:00
yihua.huang
757cc9b942
[maven-release-plugin] prepare for next development iteration
2014-03-13 07:49:51 +08:00
yihua.huang
63ffb5c792
[maven-release-plugin] prepare release webmaigc-0.4.3
2014-03-13 07:49:27 +08:00
yihua.huang
66d4d3c192
Merge branch 'master' into 0.4.x
2014-03-13 07:12:29 +08:00
yihua.huang
af07280176
remove defensive code for httpclient 4.3.1 because it is fixed in 4.3.3 #59
2014-03-13 07:11:56 +08:00
yihua.huang
d5a978e00f
update version back to 0.4.3
2014-03-13 06:55:05 +08:00
yihua.huang
55368919df
add attribute 'text' support for CssSelector #66
2014-03-11 13:18:34 +08:00
yihua.huang
88b50d4182
bugfix: cycleTry will not work when spawnUrl is set to false #62
2014-03-04 07:33:07 +08:00
yihua.huang
2768a1cae4
add test for cycleTriedTimes and fix cycleTriedTimes inc error #60
2014-03-01 15:10:38 +08:00
yihua.huang
bbd0d7e600
update httpclient version to 4.3.3 #59
2014-02-28 21:17:02 +08:00
yihua.huang
571061454a
#58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler
2014-02-27 23:54:30 +08:00
yihua.huang
0e98183f74
Change log4j to slf4j #55
2014-02-12 09:35:57 +08:00
yihua.huang
fa33b15843
property loader
2014-02-11 23:07:31 +08:00
yihua.huang
af809c4d55
update version to 0.5.0-snapshot
2014-02-11 22:16:01 +08:00
Almark Ming
2b46b11e55
Update RegexSelector.java
Optimize regex format check
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang
b51fb2696b
update ut for cookie
2013-12-06 00:30:01 +08:00
yihua.huang
ff2f588c41
#48 nullpointer exception
2013-12-04 22:11:20 +08:00
yihua.huang
d274310cb2
[maven-release-plugin] prepare for next development iteration
2013-12-03 23:35:06 +08:00
yihua.huang
e8c32a32dc
[maven-release-plugin] prepare release webmagic-0.4.2
2013-12-03 23:34:57 +08:00
yihua.huang
6a828e923c
#46 Downloader thread hang up when timeout
2013-12-03 09:59:54 +08:00
shijinping
9a524aa364
Fetch the httpClient value again inside the double-check
2013-11-28 14:38:30 +08:00
yihua.huang
e7083dc39d
[maven-release-plugin] prepare for next development iteration
2013-11-28 13:04:32 +08:00
yihua.huang
ae623567b3
[maven-release-plugin] prepare release webmagic-0.4.1
2013-11-28 13:04:22 +08:00
yihua.huang
59ad4cad27
#42 Add jsonpath in annotation mode for json result
2013-11-28 08:25:16 +08:00
yihua.huang
c2d6d495b3
#41 add getThreadAlive(),getStatus,getPageCount() to spider
2013-11-28 07:59:24 +08:00
yihua.huang
cf62d707e0
#36 Spider does not exit when success
2013-11-27 23:33:18 +08:00
yihua.huang
a01312930a
#39 Parsing html after page.getHtml()
2013-11-27 22:01:34 +08:00
yihua.huang
f63d33b457
update some comments
2013-11-27 21:06:53 +08:00
yihua.huang
04fcf3193f
#38 Change algorithm of SmartContentSelector
2013-11-23 13:56:55 +08:00
yihua.huang
296a68920e
fix javadoc and add setPipelines() for spider
2013-11-14 13:23:29 +08:00
yihua.huang
47a0360783
#35 add status code to page
2013-11-12 11:51:34 +08:00
yihua.huang
bc5c30de17
update scripts
2013-11-12 08:20:59 +08:00
yihua.huang
f9daae39cf
[maven-release-plugin] prepare for next development iteration
2013-11-11 14:33:11 +08:00
yihua.huang
fdb9441519
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-11 14:33:01 +08:00