yihua.huang
c6661899fd
new thread pool #110
2014-04-25 17:33:48 +08:00
yihua.huang
179baa7a22
return when page is null
2014-04-25 16:07:41 +08:00
yihua.huang
0336f4cdb4
remove IllegalStateException when download error for less error log
2014-04-25 16:06:29 +08:00
yihua.huang
11ba5beb42
[refactor]move monitor to webmagic-extension #98
2014-04-25 13:17:13 +08:00
yihua.huang
d61f65cef8
update mbean to mxbean #98
2014-04-25 11:31:43 +08:00
yihua.huang
ad6a273b12
update test url
2014-04-25 11:28:35 +08:00
yihua.huang
30af23d003
split monitor to server and client mode #98
2014-04-25 11:25:52 +08:00
yihua.huang
ced79630d3
specify jndi and jmx #98
2014-04-25 11:11:15 +08:00
yihua.huang
95d3802e77
add formdata support for post request #108
2014-04-24 11:48:58 +08:00
yihua.huang
f49bb877c8
clean some code #109
2014-04-24 11:38:13 +08:00
yihua.huang
e1aaf1dd11
fix mistake of guava Table #109
2014-04-24 11:05:49 +08:00
yihua.huang
8ba2da146c
request method #108 and more cookie #109 config
2014-04-24 10:51:37 +08:00
yihua.huang
b06aa489fb
[BugFix]Only one url from sourceRegion can be extracted #107
2014-04-18 17:48:26 +08:00
Bo LIANG
08fa3b01c1
when download error, throw an exception instead of calling onError and returning peacefully. #105
2014-04-17 17:53:12 +08:00
yihua.huang
27b37e8164
extension point and sample for JMX support #98
2014-04-17 08:12:37 +08:00
yihua.huang
a5db6cf292
some monitor and JMX support #98
2014-04-17 00:35:09 +08:00
yihua.huang
f39aa435cf
add null check #104
2014-04-16 19:46:32 +08:00
yihua.huang
42bbe40a37
[Bugfix]Urls will be lost when calling setScheduler() #104
2014-04-16 19:45:17 +08:00
Bo LIANG
163773af6b
combine two try-catch blocks into one, make it cleaner.
2014-04-16 16:05:08 +08:00
yihua.huang
ec446277b1
some refactor in httpclientdownloader
2014-04-15 15:30:37 +08:00
yihua.huang
4a035e729a
extension point for LocalDuplicatedRemovedScheduler #95
2014-04-13 23:31:13 +08:00
yihua.huang
b249e49748
[Bugfix]loop error when add TargetRequest #99
2014-04-13 23:04:09 +08:00
Yihua Huang
da2f023c12
Merge pull request #96 from ouyanghuangzheng/master
...
Modified several comments in Spider and Site
2014-04-13 13:12:12 +08:00
yihua.huang
f7950ebcab
fix tests
2014-04-13 13:00:31 +08:00
愤怒的番茄
32ba1b8889
Fix several comment issues
2014-04-13 12:41:15 +08:00
yihua.huang
84b897f83b
update AngularJSProcessor
2014-04-13 12:20:57 +08:00
yihua.huang
03c251237b
add Json parse support
2014-04-13 10:23:00 +08:00
愤怒的番茄
644e8d1f72
Sync with the official source code
2014-04-12 22:32:22 +08:00
yihua.huang
969ad1766b
change logger style to slf4j for cleaner code
2014-04-06 21:32:20 +08:00
yihua.huang
9b2cb43f47
ConfigurablePageProcessor #91
2014-04-05 23:40:10 +08:00
Bo LIANG
b043ac76d6
change the formatter of log.
...
To use slf4j, we should insert {} into the formatter string.
2014-04-05 11:31:56 +08:00
yihua.huang
7aaf837e15
change logger to slf4j style for performance #84
2014-04-04 20:10:00 +08:00
yihua.huang
f9b157951d
Merge branch 'master' of github.com:code4craft/webmagic
2014-04-04 20:01:14 +08:00
yihua.huang
22c394e629
[doc]
2014-04-04 20:00:58 +08:00
Bo LIANG
762a3973fd
Modify the log levels of LocalDuplicatedRemovedScheduler.java
...
The old version prints a debug-level log each time the push method is
called. So when an HTML page has multiple links to the same
page, the debug log appears more than once. Also, when we meet a
duplicate URL, it prints a log as well, which is confusing.
I changed the level of that log to trace. Now a debug-level log is printed
only when a URL is actually pushed into the queue.
2014-04-04 15:53:46 +08:00
yihua.huang
a1c7e826f7
fix dep of slf4j-log4j12
2014-04-03 23:04:31 +08:00
yihua.huang
01848301d4
encode illegal characters in url #80
2014-04-01 22:14:30 +08:00
yihua.huang
2780423e60
enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80
2014-04-01 20:35:11 +08:00
yihua.huang
97b6f46280
Bugfix: break loop in addTargetRequests #81
2014-04-01 20:12:25 +08:00
yihua.huang
8d8194bee4
Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76
2014-03-25 08:23:20 +08:00
yihua.huang
8b35d79569
Do not cache document in Selectable for selected Html element #73
2014-03-19 22:19:06 +08:00
yihua.huang
6201fd6966
add worker as container
2014-03-17 23:01:58 +08:00
yihua.huang
6c11718566
Clean project structure #70
2014-03-14 23:24:38 +08:00
yihua.huang
9606a173cd
fix ZipCodePageProcessor
2014-03-13 22:55:50 +08:00
yihua.huang
757cc9b942
[maven-release-plugin] prepare for next development iteration
2014-03-13 07:49:51 +08:00
yihua.huang
63ffb5c792
[maven-release-plugin] prepare release webmaigc-0.4.3
2014-03-13 07:49:27 +08:00
yihua.huang
66d4d3c192
Merge branch 'master' into 0.4.x
2014-03-13 07:12:29 +08:00
yihua.huang
af07280176
remove defend code for httpclient 4.3.1 because it is fixed in 4.3.3 #59
2014-03-13 07:11:56 +08:00
yihua.huang
d5a978e00f
update version back to 0.4.3
2014-03-13 06:55:05 +08:00
yihua.huang
55368919df
add attribute 'text' support for CssSelector #66
2014-03-11 13:18:34 +08:00
yihua.huang
88b50d4182
bugfix: cycleTry will not work when spawnUrl is set to false #62
2014-03-04 07:33:07 +08:00
yihua.huang
2768a1cae4
add test for cycleTriedTimes and fix cycleTriedTimes inc error #60
2014-03-01 15:10:38 +08:00
yihua.huang
bbd0d7e600
update httpclient version to 4.3.3 #59
2014-02-28 21:17:02 +08:00
yihua.huang
571061454a
#58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler
2014-02-27 23:54:30 +08:00
yihua.huang
0e98183f74
Change log4j to slf4j #55
2014-02-12 09:35:57 +08:00
yihua.huang
fa33b15843
property loader
2014-02-11 23:07:31 +08:00
yihua.huang
af809c4d55
update version to 0.5.0-snapshot
2014-02-11 22:16:01 +08:00
Almark Ming
2b46b11e55
Update RegexSelector.java
...
Optimize regex format check
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang
b51fb2696b
update ut for cookie
2013-12-06 00:30:01 +08:00
yihua.huang
ff2f588c41
#48 nullpointer exception
2013-12-04 22:11:20 +08:00
yihua.huang
d274310cb2
[maven-release-plugin] prepare for next development iteration
2013-12-03 23:35:06 +08:00
yihua.huang
e8c32a32dc
[maven-release-plugin] prepare release webmagic-0.4.2
2013-12-03 23:34:57 +08:00
yihua.huang
6a828e923c
#46 Downloader thread hang up when timeout
2013-12-03 09:59:54 +08:00
shijinping
9a524aa364
fetch the httpClient again inside the double-check
2013-11-28 14:38:30 +08:00
yihua.huang
e7083dc39d
[maven-release-plugin] prepare for next development iteration
2013-11-28 13:04:32 +08:00
yihua.huang
ae623567b3
[maven-release-plugin] prepare release webmagic-0.4.1
2013-11-28 13:04:22 +08:00
yihua.huang
59ad4cad27
#42 Add jsonpath in annotation mode for json result
2013-11-28 08:25:16 +08:00
yihua.huang
c2d6d495b3
#41 add getThreadAlive(),getStatus,getPageCount() to spider
2013-11-28 07:59:24 +08:00
yihua.huang
cf62d707e0
#36 Spider does not exit when success
2013-11-27 23:33:18 +08:00
yihua.huang
a01312930a
#39 Parsing html after page.getHtml()
2013-11-27 22:01:34 +08:00
yihua.huang
f63d33b457
update some comments
2013-11-27 21:06:53 +08:00
yihua.huang
04fcf3193f
#38 Change algorithm of SmartContentSelector
2013-11-23 13:56:55 +08:00
yihua.huang
296a68920e
fix javadoc and add setPipelines() for spider
2013-11-14 13:23:29 +08:00
yihua.huang
47a0360783
#35 add status code to page
2013-11-12 11:51:34 +08:00
yihua.huang
bc5c30de17
update scripts
2013-11-12 08:20:59 +08:00
yihua.huang
f9daae39cf
[maven-release-plugin] prepare for next development iteration
2013-11-11 14:33:11 +08:00
yihua.huang
fdb9441519
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-11 14:33:01 +08:00
yihua.huang
1d75ae7f5b
rollback version to 0.4.0 because the deploy did not succeed
2013-11-11 11:52:56 +08:00
yihua.huang
df8ca8ad09
add scripts
2013-11-10 22:30:48 +08:00
yihua.huang
775eb9732f
[maven-release-plugin] prepare for next development iteration
2013-11-06 22:17:58 +08:00
yihua.huang
0b4fadc24d
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-06 22:17:47 +08:00
yihua.huang
fe6d9bb2e2
get keep-alive rework
2013-11-06 21:53:39 +08:00
yihua.huang
fd6d2fd6f8
try to keepalive TCP connection
2013-11-06 21:19:14 +08:00
yihua.huang
425df08523
update version to 0.4.0
2013-11-06 12:50:45 +08:00
yihua.huang
e046bb0723
remove useless code
2013-11-06 12:48:14 +08:00
yihua.huang
6e32a19f80
update api for direct download
2013-11-06 12:46:50 +08:00
yihua.huang
807aefe9df
change EntityUtil to IOUtil because of some encoding errors
2013-11-06 07:37:34 +08:00
yihua.huang
00b0a751b4
#33 ignore 'content-encoding' when redirect
2013-11-06 06:57:58 +08:00
yihua.huang
8f774afc84
add direct download
2013-11-06 06:41:04 +08:00
yihua.huang
c18b603399
optimize long compare
2013-11-04 07:09:44 +08:00
yihua.huang
ed3f3583cc
downloader refactor
2013-11-04 01:03:23 +08:00
yihua.huang
a37f40e6e6
add cookie support
2013-11-04 00:59:48 +08:00
yihua.huang
3c6fced48e
update connection client
2013-11-04 00:53:01 +08:00
yihua.huang
09153ff715
#22 http proxy support #32 update httpclient to 4.3.1
2013-11-04 00:47:09 +08:00
yihua.huang
edfc319c45
update httpclient to 4.3.1
2013-11-04 00:06:30 +08:00
yihua.huang
160a149b05
todo bugfix
2013-11-03 23:10:09 +08:00
yihua.huang
583a0eba8c
#29 refactor some method name
2013-11-03 20:24:26 +08:00
yihua.huang
6fa82a418b
#29 seed urls with more information
2013-11-03 20:20:50 +08:00
yihua.huang
1446ada732
some refactor
2013-10-31 22:50:22 +08:00
yihua.huang
84976c81ec
remove useless code
2013-10-31 22:48:18 +08:00