yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
56f033ce8d
set setDuplicateRemover for chain api #118
2014-05-02 20:21:23 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
8e4814bdc5
fix path separator
2014-05-02 17:06:34 +08:00
yihua.huang
e8d4a9be2b
fix remove duplicate error #117
2014-04-29 20:32:06 +08:00
yihua.huang
04ade75606
Merge branch 'stable' of github.com:code4craft/webmagic
Conflicts:
README.md
pom.xml
webmagic-avalon/pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
webmagic-lucene/pom.xml
webmagic-samples/pom.xml
webmagic-saxon/pom.xml
webmagic-scripts/pom.xml
webmagic-selenium/pom.xml
2014-04-27 15:03:15 +08:00
yihua.huang
a08d8cb167
update version
2014-04-27 14:59:48 +08:00
yihua.huang
42a2676e8c
update version
2014-04-27 14:56:21 +08:00
yihua.huang
c25b32f1ca
[maven-release-plugin] prepare for next development iteration
2014-04-27 12:52:27 +08:00
yihua.huang
7ff83bb11a
[maven-release-plugin] prepare release WebMagic-0.5.0
2014-04-27 12:52:12 +08:00
yihua.huang
1104122979
more abstraction in scheduler
2014-04-27 09:30:01 +08:00
yihua.huang
2770811a10
update monitor example
2014-04-26 11:24:22 +08:00
yihua.huang
5ecd909ef2
add timeout for wait/notify #111
2014-04-25 19:41:36 +08:00
yihua.huang
c7afdb516e
remove thread utils #110
2014-04-25 18:44:45 +08:00
yihua.huang
17e95f2a7f
comments
2014-04-25 18:39:01 +08:00
yihua.huang
05eb7831b6
refactor and comments #110
2014-04-25 18:27:40 +08:00
yihua.huang
375e64e845
more monitor status
2014-04-25 18:10:14 +08:00
yihua.huang
018061d2cd
fix error in thread pool
2014-04-25 18:01:02 +08:00
yihua.huang
cdc423f2bf
log
2014-04-25 17:41:41 +08:00
yihua.huang
c6661899fd
new thread pool #110
2014-04-25 17:33:48 +08:00
yihua.huang
179baa7a22
return when page is null
2014-04-25 16:07:41 +08:00
yihua.huang
0336f4cdb4
remove IllegalStateException on download error to reduce error logging
2014-04-25 16:06:29 +08:00
yihua.huang
11ba5beb42
[refactor]move monitor to webmagic-extension #98
2014-04-25 13:17:13 +08:00
yihua.huang
d61f65cef8
update mbean to mxbean #98
2014-04-25 11:31:43 +08:00
yihua.huang
ad6a273b12
update test url
2014-04-25 11:28:35 +08:00
yihua.huang
30af23d003
split monitor to server and client mode #98
2014-04-25 11:25:52 +08:00
yihua.huang
ced79630d3
specify jndi and jmx #98
2014-04-25 11:11:15 +08:00
yihua.huang
95d3802e77
add formdata support for post request #108
2014-04-24 11:48:58 +08:00
yihua.huang
f49bb877c8
clean some code #109
2014-04-24 11:38:13 +08:00
yihua.huang
e1aaf1dd11
fix mistake of guava Table #109
2014-04-24 11:05:49 +08:00
yihua.huang
8ba2da146c
request method #108 and more cookie #109 config
2014-04-24 10:51:37 +08:00
yihua.huang
b06aa489fb
[BugFix]Only one url from sourceRegion can be extracted #107
2014-04-18 17:48:26 +08:00
Bo LIANG
08fa3b01c1
when a download error occurs, throw an exception instead of calling onError and returning peacefully. #105
2014-04-17 17:53:12 +08:00
yihua.huang
27b37e8164
extension point and sample for JMX support #98
2014-04-17 08:12:37 +08:00
yihua.huang
a5db6cf292
some monitor and JMX support #98
2014-04-17 00:35:09 +08:00
yihua.huang
f39aa435cf
add null check #104
2014-04-16 19:46:32 +08:00
yihua.huang
42bbe40a37
[Bugfix]Urls will be lost when calling setScheduler() #104
2014-04-16 19:45:17 +08:00
Bo LIANG
163773af6b
combine two try-catch blocks into one to make it cleaner.
2014-04-16 16:05:08 +08:00
yihua.huang
ec446277b1
some refactor in httpclientdownloader
2014-04-15 15:30:37 +08:00
yihua.huang
a03f6a8431
eclipse project
2014-04-15 07:44:43 +08:00
yihua.huang
4a035e729a
extension point for LocalDuplicatedRemovedScheduler #95
2014-04-13 23:31:13 +08:00
yihua.huang
b249e49748
[Bugfix]loop error when adding TargetRequest #99
2014-04-13 23:04:09 +08:00
Yihua Huang
da2f023c12
Merge pull request #96 from ouyanghuangzheng/master
Revised several comments in Spider and Site
2014-04-13 13:12:12 +08:00
yihua.huang
f7950ebcab
fix tests
2014-04-13 13:00:31 +08:00
愤怒的番茄
32ba1b8889
Fix several comment issues
2014-04-13 12:41:15 +08:00
yihua.huang
84b897f83b
update AngularJSProcessor
2014-04-13 12:20:57 +08:00
yihua.huang
03c251237b
add Json parse support
2014-04-13 10:23:00 +08:00
愤怒的番茄
644e8d1f72
Sync with the official source code
2014-04-12 22:32:22 +08:00
yihua.huang
969ad1766b
change logger style to slf4j for cleaner code
2014-04-06 21:32:20 +08:00
yihua.huang
9b2cb43f47
ConfigurablePageProcessor #91
2014-04-05 23:40:10 +08:00
Bo LIANG
b043ac76d6
change the format string of log messages.
To use slf4j, we should insert {} placeholders into the format string.
2014-04-05 11:31:56 +08:00
yihua.huang
7aaf837e15
change logger to slf4j style for performance #84
2014-04-04 20:10:00 +08:00
yihua.huang
f9b157951d
Merge branch 'master' of github.com:code4craft/webmagic
2014-04-04 20:01:14 +08:00
yihua.huang
22c394e629
[doc]
2014-04-04 20:00:58 +08:00
Bo LIANG
762a3973fd
Modify the log levels of LocalDuplicatedRemovedScheduler.java
The old version prints a debug-level log each time the push method is
called. So when an HTML page has multiple links to the same page, the
debug log appears more than once. It also prints a log when a duplicate
URL is encountered, which is confusing.
I changed that log's level to trace. A debug-level log is now printed only
when a URL is actually pushed into the queue.
2014-04-04 15:53:46 +08:00
yihua.huang
a1c7e826f7
fix dep of slf4j-log4j12
2014-04-03 23:04:31 +08:00
yihua.huang
01848301d4
encode illegal characters in url #80
2014-04-01 22:14:30 +08:00
yihua.huang
2780423e60
enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80
2014-04-01 20:35:11 +08:00
yihua.huang
97b6f46280
Bugfix: break loop in addTargetRequests #81
2014-04-01 20:12:25 +08:00
yihua.huang
8d8194bee4
Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76
2014-03-25 08:23:20 +08:00
yihua.huang
8b35d79569
Do not cache document in Selectable for selected Html element #73
2014-03-19 22:19:06 +08:00
yihua.huang
6201fd6966
add worker as container
2014-03-17 23:01:58 +08:00
yihua.huang
6c11718566
Clean project structure #70
2014-03-14 23:24:38 +08:00
yihua.huang
9606a173cd
fix ZipCodePageProcessor
2014-03-13 22:55:50 +08:00
yihua.huang
4f68368db0
Merge branch 'master' of git.oschina.net:flashsword20/webmagic
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2014-03-13 08:09:37 +08:00
yihua.huang
98e2bba099
Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
README.md
pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
webmagic-scripts/pom.xml
2014-03-13 08:07:33 +08:00
yihua.huang
757cc9b942
[maven-release-plugin] prepare for next development iteration
2014-03-13 07:49:51 +08:00
yihua.huang
63ffb5c792
[maven-release-plugin] prepare release webmaigc-0.4.3
2014-03-13 07:49:27 +08:00
yihua.huang
66d4d3c192
Merge branch 'master' into 0.4.x
2014-03-13 07:12:29 +08:00
yihua.huang
af07280176
remove defensive code for httpclient 4.3.1 because the issue is fixed in 4.3.3 #59
2014-03-13 07:11:56 +08:00
yihua.huang
d5a978e00f
update version back to 0.4.3
2014-03-13 06:55:05 +08:00
yihua.huang
55368919df
add attribute 'text' support for CssSelector #66
2014-03-11 13:18:34 +08:00
yihua.huang
88b50d4182
bugfix: cycleTry does not work when spawnUrl is set to false #62
2014-03-04 07:33:07 +08:00
yihua.huang
2768a1cae4
add test for cycleTriedTimes and fix cycleTriedTimes inc error #60
2014-03-01 15:10:38 +08:00
yihua.huang
bbd0d7e600
update httpclient version to 4.3.3 #59
2014-02-28 21:17:02 +08:00
yihua.huang
571061454a
#58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler
2014-02-27 23:54:30 +08:00
yihua.huang
0e98183f74
Change log4j to slf4j #55
2014-02-12 09:35:57 +08:00
yihua.huang
fa33b15843
property loader
2014-02-11 23:07:31 +08:00
yihua.huang
af809c4d55
update version to 0.5.0-snapshot
2014-02-11 22:16:01 +08:00
Almark Ming
2b46b11e55
Update RegexSelector.java
Optimize regex format check
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang
2a8e1b654d
Merge branch 'master' of git.oschina.net:flashsword20/webmagic into osc
Conflicts:
pom.xml
2013-12-21 07:59:28 +08:00
Almark Ming
91ed66ecac
Update RegexSelector.java
2013-12-17 16:57:22 +08:00
Almark Ming
83926970b2
Check valid left parenthesis
2013-12-17 16:55:53 +08:00
yihua.huang
b51fb2696b
update ut for cookie
2013-12-06 00:30:01 +08:00
yihua.huang
ff2f588c41
#48 nullpointer exception
2013-12-04 22:11:20 +08:00
yihua.huang
fc97cb58c5
update lib and version
2013-12-04 00:04:29 +08:00
yihua.huang
7c41bec92f
Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
README.md
webmagic-samples/pom.xml
webmagic-selenium/pom.xml
2013-12-03 23:50:26 +08:00
yihua.huang
d274310cb2
[maven-release-plugin] prepare for next development iteration
2013-12-03 23:35:06 +08:00
yihua.huang
e8c32a32dc
[maven-release-plugin] prepare release webmagic-0.4.2
2013-12-03 23:34:57 +08:00
yihua.huang
6a828e923c
#46 Downloader thread hang up when timeout
2013-12-03 09:59:54 +08:00
shijinping
9a524aa364
Read httpClient again inside the double-check
2013-11-28 14:38:30 +08:00
yihua.huang
fd23cb6dc0
Merge branch 'master' of github.com:code4craft/webmagic
...
Conflicts:
README.md
pom.xml
webmagic-samples/pom.xml
webmagic-selenium/pom.xml
2013-11-28 13:40:24 +08:00
yihua.huang
e7083dc39d
[maven-release-plugin] prepare for next development iteration
2013-11-28 13:04:32 +08:00
yihua.huang
ae623567b3
[maven-release-plugin] prepare release webmagic-0.4.1
2013-11-28 13:04:22 +08:00
yihua.huang
59ad4cad27
#42 Add jsonpath in annotation mode for json result
2013-11-28 08:25:16 +08:00
yihua.huang
c2d6d495b3
#41 add getThreadAlive(),getStatus,getPageCount() to spider
2013-11-28 07:59:24 +08:00
yihua.huang
cf62d707e0
#36 Spider does not exit when success
2013-11-27 23:33:18 +08:00
yihua.huang
a01312930a
#39 Parsing html after page.getHtml()
2013-11-27 22:01:34 +08:00
yihua.huang
f63d33b457
update some comments
2013-11-27 21:06:53 +08:00
yihua.huang
04fcf3193f
#38 Change algorithm of SmartContentSelector
2013-11-23 13:56:55 +08:00
yihua.huang
296a68920e
fix javadoc and add setPipelines() for spider
2013-11-14 13:23:29 +08:00
yihua.huang
47a0360783
#35 add status code to page
2013-11-12 11:51:34 +08:00
yihua.huang
bc5c30de17
update scripts
2013-11-12 08:20:59 +08:00
yihua.huang
f9daae39cf
[maven-release-plugin] prepare for next development iteration
2013-11-11 14:33:11 +08:00
yihua.huang
fdb9441519
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-11 14:33:01 +08:00
yihua.huang
1d75ae7f5b
roll back version to 0.4.0 because the deploy did not succeed
2013-11-11 11:52:56 +08:00
yihua.huang
df8ca8ad09
add scripts
2013-11-10 22:30:48 +08:00
yihua.huang
e40b48e77b
Merge tag 'webmagic-0.4.0' of github.com:code4craft/webmagic
[maven-release-plugin] copy for tag webmagic-0.4.0
Conflicts:
pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
2013-11-06 22:48:26 +08:00
yihua.huang
775eb9732f
[maven-release-plugin] prepare for next development iteration
2013-11-06 22:17:58 +08:00
yihua.huang
0b4fadc24d
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-06 22:17:47 +08:00
yihua.huang
fe6d9bb2e2
get keep-alive rework
2013-11-06 21:53:39 +08:00
yihua.huang
fd6d2fd6f8
try to keepalive TCP connection
2013-11-06 21:19:14 +08:00
yihua.huang
425df08523
update version to 0.4.0
2013-11-06 12:50:45 +08:00
yihua.huang
e046bb0723
remove useless code
2013-11-06 12:48:14 +08:00
yihua.huang
6e32a19f80
update api for direct download
2013-11-06 12:46:50 +08:00
yihua.huang
807aefe9df
change EntityUtil to IOUtil because of some encoding errors
2013-11-06 07:37:34 +08:00
yihua.huang
00b0a751b4
#33 ignore 'content-encoding' when redirect
2013-11-06 06:57:58 +08:00
yihua.huang
8f774afc84
add direct download
2013-11-06 06:41:04 +08:00
yihua.huang
c18b603399
optimize long compare
2013-11-04 07:09:44 +08:00
yihua.huang
ed3f3583cc
downloader refactor
2013-11-04 01:03:23 +08:00
yihua.huang
a37f40e6e6
add cookie support
2013-11-04 00:59:48 +08:00
yihua.huang
3c6fced48e
update connection client
2013-11-04 00:53:01 +08:00
yihua.huang
09153ff715
#22 http proxy support #32 update httpclient to 4.3.1
2013-11-04 00:47:09 +08:00
yihua.huang
edfc319c45
update httpclient to 4.3.1
2013-11-04 00:06:30 +08:00
yihua.huang
160a149b05
todo bugfix
2013-11-03 23:10:09 +08:00
yihua.huang
583a0eba8c
#29 refactor some method name
2013-11-03 20:24:26 +08:00
yihua.huang
6fa82a418b
#29 seed urls with more information
2013-11-03 20:20:50 +08:00
yihua.huang
1446ada732
some refactor
2013-10-31 22:50:22 +08:00
yihua.huang
84976c81ec
remove useless code
2013-10-31 22:48:18 +08:00
yihua.huang
b4fcf41168
add exit-when-complete option
2013-10-31 22:41:02 +08:00
yihua.huang
352887870c
remove shutdown call
2013-10-31 22:22:14 +08:00
yihua.huang
a3f9ad198f
refactor multi thread code in Spider
2013-10-31 21:52:43 +08:00
yihua.huang
7fb44d2eec
#30 reuse PoolingClientConnectionManager for HttpClientDownloader
2013-10-14 23:22:04 +08:00
yihua.huang
5a226387e0
#27 nullpointer fix
2013-10-11 11:32:44 +08:00
yihua.huang
16e12e3bc9
#27 customize http header for downloader
2013-10-11 08:37:21 +08:00
yihua.huang
1a2c84ea78
#27 add timeout config to site
2013-10-11 07:36:16 +08:00
yihua.huang
372cc0ad06
update jar
2013-09-23 13:21:40 +08:00
yihua.huang
4acbc19cee
[maven-release-plugin] prepare for next development iteration
2013-09-23 13:12:32 +08:00
yihua.huang
cc3b787991
[maven-release-plugin] prepare release webmagic-0.3.2
2013-09-23 13:12:19 +08:00
yihua.huang
b131878123
add example
2013-09-23 13:01:28 +08:00
yihua.huang
95ab4edec3
some bugfix
2013-09-23 08:38:54 +08:00
yihua.huang
fba330872b
fix a thread pool exception
2013-09-22 23:57:15 +08:00
yihua.huang
3c79d031bd
fix thread pool
2013-09-22 22:52:52 +08:00
yihua.huang
a2fba8caa2
update to 0.3.1
2013-09-09 12:48:01 +08:00
yihua.huang
fb693a4ac4
[maven-release-plugin] prepare for next development iteration
2013-09-08 22:25:07 +08:00
yihua.huang
bfaaa042b9
[maven-release-plugin] prepare release webmagic-parent-0.3.1
2013-09-08 22:24:48 +08:00
yihua.huang
c17a31a21d
fix null pointer exception #26
2013-09-08 21:09:49 +08:00
yihua.huang
d2e0f0cd33
#25 use URL api in UrlUtils.canonicalizeUrl()
2013-09-06 21:35:23 +08:00
yihua.huang
ef4cf49fee
add stop method to spider #24
2013-09-06 21:17:36 +08:00
yihua.huang
58150a090d
update jar
2013-09-05 20:56:25 +08:00
yihua.huang
57556ab879
merge
2013-09-05 20:53:15 +08:00
yihua.huang
692de76f86
fix issue #21 charset detect error
2013-09-04 15:27:51 +08:00
yihua.huang
e7bf425df4
[maven-release-plugin] prepare for next development iteration
2013-09-04 10:51:01 +08:00
yihua.huang
77ff252316
[maven-release-plugin] prepare release webmagic-0.3.0
2013-09-04 10:50:50 +08:00
yihua.huang
1fc8e104ab
add cycle retry
2013-09-04 10:32:13 +08:00
yihua.huang
d141541ef3
add retry
2013-09-04 09:57:19 +08:00
yihua.huang
a1ef2523cc
update xsoup version
2013-09-04 09:38:40 +08:00
yihua.huang
aefd0569a5
update version
2013-09-04 09:36:56 +08:00
yihua.huang
194518fd82
add switch
2013-09-04 08:21:34 +08:00
yihua.huang
326b97c65a
update
2013-09-04 00:15:54 +08:00
yihua.huang
2c3574537a
refactor in selectors
2013-09-02 14:14:24 +08:00
yihua.huang
85b7cf1563
complete test
2013-09-02 13:52:41 +08:00
yihua.huang
d7cd9e5747
update pom
2013-09-02 11:56:01 +08:00
yihua.huang
55d4a76ab7
newselectors
2013-09-02 08:21:32 +08:00
yihua.huang
d7abbd0e4b
fix compile error
2013-08-25 16:31:00 +08:00
yihua.huang
5e9e8b2541
add TextContentSelector
2013-08-25 16:30:38 +08:00
yihua.huang
0cc0ccee35
add charset specific for easy call of HttpClientDownloader
2013-08-25 15:41:43 +08:00
yihua.huang
91dcccf7b5
add a sample
2013-08-21 21:55:15 +08:00
yihua.huang
ad66d33f38
[maven-release-plugin] prepare for next development iteration
2013-08-20 23:39:59 +08:00
yihua.huang
9dc6b11954
[maven-release-plugin] prepare release webmagic-parent-0.2.1
2013-08-20 23:37:55 +08:00
yihua.huang
4f62dfc8a4
release
2013-08-20 23:37:20 +08:00
yihua.huang
74c940c758
[maven-release-plugin] prepare for next development iteration
2013-08-20 23:19:58 +08:00
yihua.huang
a4bb4e3429
[maven-release-plugin] prepare release webmagic-parent-0.2.1
2013-08-20 23:19:27 +08:00
yihua.huang
194f16aa75
update
2013-08-20 23:16:43 +08:00
yihua.huang
0f0f1a9bcd
release notes
2013-08-20 22:51:30 +08:00
yihua.huang
c1471718df
extractors
2013-08-20 22:44:53 +08:00
yihua.huang
20705b34ac
add more option to extractors
2013-08-20 22:13:30 +08:00
yihua.huang
c70ed57025
move PriorityScheduler to core
2013-08-20 21:55:58 +08:00
yihua.huang
7003426898
update pom
2013-08-20 21:52:39 +08:00
yihua.huang
606417fdc7
update pom
2013-08-19 09:55:49 +08:00
yihua.huang
d460e136ef
update version
2013-08-19 09:52:15 +08:00
yihua.huang
c79d6ecf09
complete all comments
2013-08-17 23:30:49 +08:00
yihua.huang
90bbe9b951
webmagic-core
2013-08-17 23:24:04 +08:00
yihua.huang
17f8ead28f
update comments for selector
2013-08-17 21:33:54 +08:00
yihua.huang
77e6ca2945
update comments
2013-08-17 21:26:44 +08:00
yihua.huang
5073258237
closable
2013-08-17 21:19:24 +08:00
yihua.huang
d01c0eb8ce
update comments of spider
2013-08-17 21:15:36 +08:00
yihua.huang
5f1f4cbc46
update comments
2013-08-17 20:41:29 +08:00
yihua.huang
1148450ff9
update filecache to be more useful
2013-08-17 18:12:47 +08:00
yihua.huang
3ba7a76f44
add combo extract to replace Extract2 Extract3...
2013-08-17 17:23:11 +08:00
yihua.huang
5cb45af3a4
+doc
2013-08-17 12:10:34 +08:00
yihua.huang
ef673b985e
add a method for httpclientdownloader
2013-08-14 13:32:23 +08:00
yihua.huang
067f3ea0cb
add some null pointer check for httpclientdownloader
2013-08-14 13:30:09 +08:00
yihua.huang
9e82256ce3
update docs
2013-08-12 10:08:20 +08:00
yihua.huang
0a902b441c
update docs
2013-08-12 09:55:17 +08:00
yihua.huang
0f2c5b5723
update redisscheduler
2013-08-11 18:28:12 +08:00
yihua.huang
787b952932
release notes and docs
2013-08-11 10:21:26 +08:00
yihua.huang
8b15f3c63d
add test
2013-08-10 20:33:47 +08:00
yihua.huang
ade5714d50
add https support
2013-08-10 18:52:27 +08:00
yihua.huang
21eca688e9
complete docs
2013-08-09 20:56:33 +08:00
yihua.huang
17d2d98cec
remove invalid @date
2013-08-09 20:43:06 +08:00
yihua.huang
268bd8d0c4
move saxon to extension
2013-08-07 23:04:10 +08:00
yihua.huang
cff943f698
fix path format error
2013-08-07 13:05:12 +08:00
yihua.huang
5ef231a768
update version
2013-08-07 12:48:32 +08:00
yihua.huang
570533cce5
update readme
2013-08-07 09:45:38 +08:00
yihua.huang
36494bcfa5
add xpath2.0 api
2013-08-06 23:01:43 +08:00
yihua.huang
5c96407a3d
fix a null domain error
2013-08-06 22:43:31 +08:00
yihua.huang
c7005a0227
json fix
2013-08-06 22:36:37 +08:00
yihua.huang
e5f4b3916f
change file dir
2013-08-06 22:26:39 +08:00
yihua.huang
7d277e84d4
update lucene pipeline
2013-08-06 21:47:44 +08:00
yihua.huang
b40cca1122
move model package to plugin
2013-08-06 20:41:35 +08:00
yihua.huang
4eb3d60083
fix nullpointer exception
2013-08-05 22:06:39 +08:00
yihua.huang
b0af45f4bb
complete redis support
2013-08-05 21:44:29 +08:00
yihua.huang
f3a29d9315
fix pagedmodel bug
2013-08-05 21:03:47 +08:00
yihua.huang
629f8ac2d1
add extractors chain
2013-08-05 20:45:34 +08:00
yihua.huang
27ce3fc176
lazy init
2013-08-05 19:36:49 +08:00
yihua.huang
dc9f574e27
update request
2013-08-05 18:17:52 +08:00
yihua.huang
d56c681be1
add priority to request
2013-08-05 18:08:28 +08:00
yihua.huang
971e7b6ce2
add core
2013-08-05 13:53:13 +08:00
yihua.huang
619a12b303
add paged support
2013-08-04 21:22:15 +08:00
yihua.huang
a5c85c3c8b
add annotation ExtractByRaw
2013-08-04 15:12:06 +08:00
yihua.huang
1a50c64e33
update name
2013-08-04 10:05:03 +08:00
yihua.huang
a3a868f584
rename
2013-08-04 09:55:50 +08:00
yihua.huang
04a7fa037a
update pipeline
2013-08-04 09:53:01 +08:00
yihua.huang
21cae2ff2e
update package
2013-08-04 07:53:28 +08:00
yihua.huang
cfb8990453
update author
2013-08-04 03:04:30 +08:00
yihua.huang
b393e38320
add multi entity extract
2013-08-03 20:42:29 +08:00
yihua.huang
bfadac756a
fix an attribute bug
2013-08-03 18:36:03 +08:00
yihua.huang
145628557d
update afterextract api
2013-08-03 18:01:17 +08:00
yihua.huang
aca165b132
add and or selector
2013-08-03 17:38:36 +08:00
yihua.huang
69245e8c03
fix Class assignability bug
2013-08-03 17:17:59 +08:00
yihua.huang
65518f7672
add list support
2013-08-03 17:01:25 +08:00
yihua.huang
d4de60a562
skip test
2013-08-03 16:35:12 +08:00
yihua.huang
d26cd82d59
rename package
2013-08-03 16:29:50 +08:00
yihua.huang
f84b53514f
complete objectpipeline
2013-08-03 15:55:54 +08:00
yihua.huang
866ab0a056
update email
2013-08-03 14:01:18 +08:00
yihua.huang
7c9e9ce869
xpath2.0
2013-08-03 07:28:46 +08:00
yihua.huang
7f27c28d4c
simplify api
2013-08-02 23:45:13 +08:00
yihua.huang
d7899e94ae
test saxon and introduce XPath2.0 support
2013-08-02 23:39:34 +08:00
yihua.huang
3fe3d8f044
update
2013-08-02 13:51:42 +08:00
yihua.huang
516ff3310d
add failfast
2013-08-02 08:20:55 +08:00
yihua.huang
7a4dbb1f15
introduce notnull
2013-08-02 08:09:37 +08:00
yihua.huang
06a39af0f3
add setter support
2013-08-02 07:32:37 +08:00
yihua.huang
abba3b7bff
add extract by url
2013-08-02 06:59:25 +08:00
yihua.huang
f08ffc34fd
rename
2013-08-02 06:33:48 +08:00
yihua.huang
c5cf05640a
processor
2013-08-01 22:53:44 +08:00
yihua.huang
50edd22ef6
add annotation
2013-08-01 22:40:57 +08:00
yihua.huang
7020b8648d
fix a thread problem
2013-07-30 21:39:43 +08:00
yihua.huang
52fd5cfc1c
fix encoding
2013-07-30 15:24:59 +08:00
yihua.huang
e87aabf8fd
Add a new method to downloader for setting the thread count
2013-07-29 20:01:44 +08:00
yihua.huang
18fefa0c0a
fix a spider init problem
2013-07-29 10:59:23 +08:00
yihua.huang
54904851ea
add list output support
2013-07-26 21:22:57 +08:00
yihua.huang
42508af041
add huaban processor
2013-07-26 16:32:51 +08:00
yihua.huang
fe224cbf66
release resource
2013-07-26 15:27:47 +08:00
yihua.huang
86a20eabd9
fix a httpclient pool size bug
2013-07-26 14:41:30 +08:00
yihua.huang
fed3c0c98a
update readme
2013-07-26 11:55:40 +08:00
yihua.huang
d3e527fd6b
try introducing selenium
2013-07-26 11:52:23 +08:00
yihua.huang
c2142f872b
add iteye sample
2013-07-26 08:24:08 +08:00
yihua.huang
65dc372152
update pipeline api
2013-07-25 13:32:39 +08:00
yihua.huang
cea866520d
update version
2013-07-24 20:45:45 +08:00
yihua.huang
de006333c8
update java docs
2013-07-24 20:38:49 +08:00
yihua.huang
827972d80f
update java docs
2013-07-24 19:49:00 +08:00
yihua.huang
96454fd74c
update java doc
2013-07-24 18:26:54 +08:00
yihua.huang
81e7f7982e
introduce jsoup and cssselector
2013-07-20 08:34:18 +08:00
yihua.huang
c733046045
+sina blog
2013-07-19 12:36:55 +08:00
yihua.huang
2b34dc9d3f
add retry
2013-07-18 17:22:26 +08:00
yihua.huang
5c79550fd9
add offline cache and process
2013-06-24 14:42:49 +08:00
yihua.huang
a7316a1f57
add runasync
2013-06-23 22:16:04 +08:00
yihua.huang
cad2594a08
add multithread support
2013-06-23 21:09:26 +08:00
yihua.huang
5a6a68a318
add gzip support
2013-06-23 18:56:31 +08:00
yihua.huang
adeed3bcaf
add extra
2013-06-23 17:06:43 +08:00
yihua.huang
a0bcfb8567
add extra for page
2013-06-23 17:05:10 +08:00
yihua.huang
7e17c71c3e
add page skip
2013-06-23 16:57:01 +08:00
yihua.huang
9b1ba6e8bc
ignore unstable test
2013-06-20 17:57:31 +08:00
yihua.huang
5cfdb10f81
update api to support jdk 1.6
2013-06-20 17:39:06 +08:00
yihua.huang
e1e25cb5e7
update javadoc
2013-06-20 08:21:48 +08:00
yihua.huang
b1f023ead5
fix spelling error =.=
2013-06-20 07:54:55 +08:00
yihua.huang
7bed01c9f2
update Spider api
2013-06-20 07:53:48 +08:00
yihua.huang
986ae0beaf
update Select api: remove x() s() etc.
2013-06-19 09:57:41 +08:00
yihua.huang
586d23ef63
add package infos
2013-06-19 08:20:21 +08:00
yihua.huang
956d5cb3c8
docs
2013-06-18 22:39:37 +08:00
yihua.huang
fb0797b65c
update docs
2013-06-18 22:13:40 +08:00
yihua.huang
8f954c7997
fix samples
2013-06-18 18:30:45 +08:00
yihua.huang
312e1bce87
fix compile error
2013-06-18 18:02:30 +08:00
yihua.huang
49a4ad66d3
add uuid to spider
2013-06-18 17:42:31 +08:00
yihua.huang
6428e20543
add id
2013-06-18 14:34:09 +08:00
yihua.huang
0ae7adf324
add cookie support & add docs
2013-06-18 08:32:11 +08:00
yihua.huang
8cef8774cb
change author info
2013-06-18 07:24:19 +08:00
yihua.huang
328f174d11
fix pom
2013-06-17 11:27:44 +08:00
yihua.huang
f0fa1dad07
clean some code
2013-06-17 11:12:22 +08:00
yihua.huang
01f49aad3c
fix a pom error
2013-06-16 14:47:02 +08:00
yihua.huang
1c1bf89522
Merge branch 'master' of github.com:code4craft/webmagic
2013-06-10 12:18:04 +08:00
yihua.huang
8774cce7da
files
2013-06-10 12:17:53 +08:00
黄亿华
906e68cbfa
update comment
2013-06-09 22:00:39 +08:00
yihua.huang
ecb61d1385
update pipeline
2013-06-09 12:56:16 +08:00
yihua.huang
755b9aa84e
remove samples in test
2013-06-08 20:59:27 +08:00
yihua.huang
9d04fe3a76
split modules
2013-06-08 20:49:43 +08:00
yihua.huang
6dc88fa111
split modules
2013-06-08 20:48:27 +08:00