yihua.huang
5706bb90af
update xsoup to 0.3.1
2016-01-20 12:59:11 +08:00
yihua.huang
7586e3d75c
add some test for github repo downloader
2016-01-19 08:05:53 +08:00
x1ny
90e14b31b0
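Fix FileCacheQueueScheduler preventing the program from exiting normally and leaving streams unclosed
FileCacheQueueScheduler starts a thread that runs periodically to save data, but the thread was not shut down after the spider finished, so the program could not exit; the IO streams were also never closed.
Solution:
Make FileCacheQueueScheduler implement the Closeable interface, and shut down the thread and close the streams in its close method.
Close the scheduler in Spider's close method.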
2015-11-12 23:10:20 +08:00
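The FileCacheQueueScheduler fix above follows a common Java pattern: a component that owns a background thread and an open stream implements `Closeable`, and its owner closes it on shutdown. The following is a minimal sketch of that pattern, assuming a single-thread `ScheduledExecutorService` drives the periodic save; `PeriodicFlushScheduler` and `MiniSpider` are hypothetical names, not WebMagic classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a file-backed scheduler: a background task
// periodically flushes buffered URLs to a Writer.
class PeriodicFlushScheduler implements Closeable {
    private final Writer writer;
    private final ScheduledExecutorService flusher;

    PeriodicFlushScheduler(Writer writer, long periodMillis) {
        this.writer = writer;
        // The default thread factory creates a non-daemon thread, so the JVM
        // cannot exit until this executor is shut down.
        this.flusher = Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleAtFixedRate(this::flushQuietly, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void push(String url) {
        try {
            writer.write(url);
            writer.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private synchronized void flushQuietly() {
        try {
            writer.flush();
        } catch (IOException e) {
            // A real scheduler would log this; the sketch ignores it.
        }
    }

    boolean isClosed() {
        return flusher.isShutdown();
    }

    @Override
    public void close() throws IOException {
        flusher.shutdown(); // stop the periodic task so its thread can end
        try {
            flusher.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (this) {
            writer.flush(); // persist anything still buffered
            writer.close(); // release the underlying stream
        }
    }
}

// Hypothetical owner: closes its scheduler only if it supports Closeable.
class MiniSpider {
    private final Object scheduler;

    MiniSpider(Object scheduler) {
        this.scheduler = scheduler;
    }

    void close() {
        if (scheduler instanceof Closeable) {
            try {
                ((Closeable) scheduler).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Checking `instanceof Closeable` in the owner keeps schedulers that hold no resources unaffected. Under the default policy of `ScheduledThreadPoolExecutor`, `shutdown()` cancels pending periodic tasks, so `awaitTermination` returns promptly.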
yihua.huang
56e0cd513a
compile error fix
2015-04-15 23:21:06 +08:00
yihua.huang
c5740b1840
change assert #200
2015-04-15 08:32:08 +08:00
yihua.huang
67eb632f4d
test for issue #200
2015-04-15 08:31:45 +08:00
高军
590561a6e4
Fix bug where site.setHttpProxy() had no effect
2015-03-09 15:54:15 +08:00
edwardsbean
19474e4716
add SimpleProxyPool and IProxyPool
2015-02-28 17:50:10 +08:00
edwardsbean
4978665633
add retry sleep time
2015-01-21 13:30:02 +08:00
yihua.huang
8ffc1a7093
add NPE check for POST method
2015-01-13 14:10:00 +08:00
zhugw
bc666e927d
Update Site.java
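The javadoc of setCycleRetryTimes says: "Set cycleRetryTimes times when download fail, 0 by default. Only work in RedisScheduler."
However, reading the source code, there does not seem to be any restriction limiting it to RedisScheduler. So I would like to ask whether this javadoc is outdated.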
2014-09-12 12:42:57 +08:00
yihua.huang
147401ce5e
remove duplicate setPath in ProxyPool
2014-09-09 22:58:44 +08:00
yihua.huang
e7668e01b8
fix SourceRegion error and add some tests on it #144
2014-08-21 14:29:06 +08:00
yihua.huang
4446669c24
fix test
2014-08-18 10:54:24 +08:00
yihua.huang
9866297ec4
Disable jsoup entity escape by default. Set Html.DISABLE_HTML_ENTITY_ESCAPE to false to enable it. #149
2014-08-14 08:04:56 +08:00
yihua.huang
4e6e946dd7
more friendly exception message in PlainText #144
2014-08-13 10:02:16 +08:00
yihua.huang
af9939622b
move thread package out of selector (because it was added there by mistake at the beginning)
2014-06-25 18:19:50 +08:00
yihua.huang
eae37c868b
new sample
2014-06-10 17:38:54 +08:00
yihua.huang
b3a282e58d
some fixes for tests #130
2014-06-10 00:05:30 +08:00
yihua.huang
074d767f45
Merge branch 'proxy' of github.com:yxssfxwzy/webmagic into yxssfxwzy-proxy
2014-06-09 23:51:36 +08:00
zwf
2f89cfc31a
add test and fix bug of proxy module
2014-06-09 13:32:02 +08:00
yihua.huang
eb89d66566
fix test
2014-06-04 22:28:27 +08:00
yihua.huang
5e8ca02ec6
contributor
2014-06-04 22:26:56 +08:00
yihua.huang
8c33be48a6
Merge branch 'stable' of github.com:code4craft/webmagic
2014-06-04 17:37:45 +08:00
yihua.huang
5f8c3fd5c5
update version
2014-06-04 17:33:30 +08:00
yihua.huang
7a64847a3c
Bugfix: selector does not work well in element #113
2014-06-03 20:03:33 +08:00
yihua.huang
8d67fd0357
change back return proxy from spider to httpclientdownloader #128
2014-05-28 08:08:51 +08:00
yihua.huang
40bf8ca58f
change return proxy from spider to httpclientdownloader #128
2014-05-28 07:57:42 +08:00
yihua.huang
1f21d9cc14
spelling mistake fix #128
2014-05-28 07:29:19 +08:00
Yihua Huang
e310139d00
Merge pull request #128 from yxssfxwzy/proxy
Management of multiple proxies
2014-05-28 07:22:08 +08:00
yihua.huang
b165090434
Bugfix:Type convert error in JsonPathSelector #129
2014-05-27 21:19:22 +08:00
yihua.huang
a5d1b56e44
fix ut #113
2014-05-27 18:07:53 +08:00
yihua.huang
3939074a23
Bugfix: nodes() only returns the first element #113
2014-05-27 17:53:06 +08:00
yihua.huang
41c2ea9498
refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang
f9825c214a
refactor selectable for html fragment #113
2014-05-27 16:00:51 +08:00
yihua.huang
03d26c169b
Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. introduce moco as the test server
2014-05-26 17:45:30 +08:00
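The first point of this commit (read the content only once) can be illustrated with a small sketch: buffer the body into a `byte[]`, then detect the charset from the `Content-Type` header or a `<meta>` tag and decode from those same bytes, so the stream is never read a second time. `CharsetSniffer` and its regular expressions are hypothetical and simplified; this is not WebMagic's actual detection logic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the stream is consumed exactly once, and all
// later steps work on the buffered bytes.
class CharsetSniffer {
    // Finds charset=... in a Content-Type header value.
    private static final Pattern HEADER_CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);
    // Finds <meta charset="..."> or charset=... inside a meta http-equiv tag.
    private static final Pattern META_CHARSET =
            Pattern.compile("<meta[^>]*charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    static Charset detect(String contentType, byte[] body, Charset fallback) {
        // 1. Prefer the charset declared in the HTTP header.
        if (contentType != null) {
            Charset fromHeader = find(HEADER_CHARSET, contentType);
            if (fromHeader != null) return fromHeader;
        }
        // 2. Scan the markup. ISO-8859-1 maps every byte to one char, so an
        //    ASCII <meta> tag is readable for any ASCII-compatible encoding.
        Charset fromMeta = find(META_CHARSET, new String(body, StandardCharsets.ISO_8859_1));
        return fromMeta != null ? fromMeta : fallback;
    }

    static String decode(String contentType, InputStream in, Charset fallback) throws IOException {
        byte[] body = readAll(in); // the only read of the stream
        return new String(body, detect(contentType, body, fallback));
    }

    private static Charset find(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) return null;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null; // unknown name: let the caller fall back
        }
    }
}
```

Because detection and decoding share one `byte[]`, no second read of the (already closed) stream is needed.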
zwf
c146e2c7b4
add proxy pool
2014-05-19 15:59:31 +08:00
yihua.huang
21982d3460
remove cpdetector temporarily #126
2014-05-14 23:52:27 +08:00
fengwuze
fcbfb75608
Refactor the code block that automatically detects the charset from the web page, extracting it into a separate method.
2014-05-14 19:14:42 +08:00
fengwuze
95494d3c4d
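Add logic to handle meta tags.
Outstanding:
3. When the page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to solve this.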
2014-05-14 14:53:54 +08:00
yihua.huang
dde2d89bbe
Ignore brackets inside JSON content when removing padding #124
2014-05-08 23:37:18 +08:00
ywooer
259f0a16c5
Update FilePipeline.java
2014-05-06 18:33:00 +08:00
ywooer
26d38851b5
add charset to Writer
2014-05-06 18:28:50 +08:00
yihua.huang
7668731f08
update version to snapshot
2014-05-05 07:03:55 +08:00
yihua.huang
182dd51689
Merge branch 'stable' of github.com:code4craft/webmagic
2014-05-03 06:19:11 +08:00
yihua.huang
81e6e772ac
versions back to 0.5.1
2014-05-03 06:18:57 +08:00
yihua.huang
feb604da87
Merge branch 'stable' of github.com:code4craft/webmagic
2014-05-03 06:14:54 +08:00
yihua.huang
358e906379
[maven-release-plugin] prepare for next development iteration
2014-05-03 00:00:13 +08:00
yihua.huang
470750fc0d
[maven-release-plugin] prepare release WebMagic-0.5.1
2014-05-02 23:59:55 +08:00
yihua.huang
01aec7e1ab
extension point of geturl #118
2014-05-02 23:23:23 +08:00
yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
56f033ce8d
set setDuplicateRemover for chain api #118
2014-05-02 20:21:23 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
8e4814bdc5
fix path separator
2014-05-02 17:06:34 +08:00
yihua.huang
e8d4a9be2b
fix remove duplicate error #117
2014-04-29 20:32:06 +08:00
yihua.huang
04ade75606
Merge branch 'stable' of github.com:code4craft/webmagic
Conflicts:
README.md
pom.xml
webmagic-avalon/pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
webmagic-lucene/pom.xml
webmagic-samples/pom.xml
webmagic-saxon/pom.xml
webmagic-scripts/pom.xml
webmagic-selenium/pom.xml
2014-04-27 15:03:15 +08:00
yihua.huang
a08d8cb167
update version
2014-04-27 14:59:48 +08:00
yihua.huang
42a2676e8c
update version
2014-04-27 14:56:21 +08:00
yihua.huang
c25b32f1ca
[maven-release-plugin] prepare for next development iteration
2014-04-27 12:52:27 +08:00
yihua.huang
7ff83bb11a
[maven-release-plugin] prepare release WebMagic-0.5.0
2014-04-27 12:52:12 +08:00
yihua.huang
1104122979
more abstraction in scheduler
2014-04-27 09:30:01 +08:00
yihua.huang
2770811a10
update monitor example
2014-04-26 11:24:22 +08:00
yihua.huang
5ecd909ef2
add timeout for wait/notify #111
2014-04-25 19:41:36 +08:00
yihua.huang
c7afdb516e
remove thread utils #110
2014-04-25 18:44:45 +08:00
yihua.huang
17e95f2a7f
comments
2014-04-25 18:39:01 +08:00
yihua.huang
05eb7831b6
refactor and comments #110
2014-04-25 18:27:40 +08:00
yihua.huang
375e64e845
more monitor status
2014-04-25 18:10:14 +08:00
yihua.huang
018061d2cd
fix error in thread pool
2014-04-25 18:01:02 +08:00
yihua.huang
cdc423f2bf
log
2014-04-25 17:41:41 +08:00
yihua.huang
c6661899fd
new thread pool #110
2014-04-25 17:33:48 +08:00
yihua.huang
179baa7a22
return when page is null
2014-04-25 16:07:41 +08:00
yihua.huang
0336f4cdb4
remove IllegalStateException on download error to reduce error logs
2014-04-25 16:06:29 +08:00
yihua.huang
11ba5beb42
[refactor]move monitor to webmagic-extension #98
2014-04-25 13:17:13 +08:00
yihua.huang
d61f65cef8
update mbean to mxbean #98
2014-04-25 11:31:43 +08:00
yihua.huang
ad6a273b12
update test url
2014-04-25 11:28:35 +08:00
yihua.huang
30af23d003
split monitor to server and client mode #98
2014-04-25 11:25:52 +08:00
yihua.huang
ced79630d3
specify jndi and jmx #98
2014-04-25 11:11:15 +08:00
yihua.huang
95d3802e77
add formdata support for post request #108
2014-04-24 11:48:58 +08:00
yihua.huang
f49bb877c8
clean some code #109
2014-04-24 11:38:13 +08:00
yihua.huang
e1aaf1dd11
fix mistake of guava Table #109
2014-04-24 11:05:49 +08:00
yihua.huang
8ba2da146c
request method #108 and more cookie #109 config
2014-04-24 10:51:37 +08:00
yihua.huang
b06aa489fb
[BugFix]Only one url from sourceRegion can be extracted #107
2014-04-18 17:48:26 +08:00
Bo LIANG
08fa3b01c1
when download error, throw an exception instead of calling onError and returning peacefully. #105
2014-04-17 17:53:12 +08:00
yihua.huang
27b37e8164
extension point and sample for JMX support #98
2014-04-17 08:12:37 +08:00
yihua.huang
a5db6cf292
some monitor and JMX support #98
2014-04-17 00:35:09 +08:00
yihua.huang
f39aa435cf
add null check #104
2014-04-16 19:46:32 +08:00
yihua.huang
42bbe40a37
[Bugfix]Urls will be lost when call setScheduler() #104
2014-04-16 19:45:17 +08:00
Bo LIANG
163773af6b
combine two try-catch block into one, make it cleaner.
2014-04-16 16:05:08 +08:00
yihua.huang
ec446277b1
some refactor in httpclientdownloader
2014-04-15 15:30:37 +08:00
yihua.huang
a03f6a8431
eclipse project
2014-04-15 07:44:43 +08:00
yihua.huang
4a035e729a
extension point for LocalDuplicatedRemovedScheduler #95
2014-04-13 23:31:13 +08:00
yihua.huang
b249e49748
[Bugfix]loop error when add TargetRequest #99
2014-04-13 23:04:09 +08:00
Yihua Huang
da2f023c12
Merge pull request #96 from ouyanghuangzheng/master
Revised several comments in Spider and site
2014-04-13 13:12:12 +08:00
yihua.huang
f7950ebcab
fix tests
2014-04-13 13:00:31 +08:00
愤怒的番茄
32ba1b8889
Fix several comment issues
2014-04-13 12:41:15 +08:00
yihua.huang
84b897f83b
update AngularJSProcessor
2014-04-13 12:20:57 +08:00
yihua.huang
03c251237b
add Json parse support
2014-04-13 10:23:00 +08:00
愤怒的番茄
644e8d1f72
Sync with the official source code
2014-04-12 22:32:22 +08:00
yihua.huang
969ad1766b
change logger style to slf4j for cleaner code
2014-04-06 21:32:20 +08:00
yihua.huang
9b2cb43f47
ConfigurablePageProcessor #91
2014-04-05 23:40:10 +08:00
Bo LIANG
b043ac76d6
change the formatter of log.
To use slf4j, we should insert {} into the formatter string.
2014-04-05 11:31:56 +08:00
yihua.huang
7aaf837e15
change logger to slf4j style for performance #84
2014-04-04 20:10:00 +08:00
yihua.huang
f9b157951d
Merge branch 'master' of github.com:code4craft/webmagic
2014-04-04 20:01:14 +08:00
yihua.huang
22c394e629
[doc]
2014-04-04 20:00:58 +08:00
Bo LIANG
762a3973fd
Modify the log levels of LocalDuplicatedRemovedScheduler.java
The old version printed a debug-level log each time the push method was
called. So when an HTML page has multiple links to the same page, the
debug log appears more than once. Also, a log was printed even when a
duplicate URL was encountered, which is confusing.
I changed its level to trace. A debug-level log is now printed only when a
URL is actually pushed into the queue.
2014-04-04 15:53:46 +08:00
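The logging policy described in this commit (trace on every push call, debug only when a URL is really enqueued) can be sketched as follows. This is an illustration only: it uses `java.util.logging`, with `FINEST` standing in for SLF4J's trace level and `FINE` for debug, and `DedupQueue` is a hypothetical class, not LocalDuplicatedRemovedScheduler itself.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical duplicate-removing queue illustrating the log-level policy.
class DedupQueue {
    private static final Logger LOG = Logger.getLogger(DedupQueue.class.getName());
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> queue = new ArrayDeque<>();

    // Returns true if the URL was new and enqueued, false if it was a duplicate.
    synchronized boolean push(String url) {
        LOG.finest(() -> "push called for " + url); // every call: trace-like level
        if (!seen.add(url)) {
            return false; // duplicate: nothing logged above trace
        }
        queue.add(url);
        LOG.fine(() -> "queued " + url); // only real enqueues: debug-like level
        return true;
    }

    synchronized String poll() {
        return queue.poll();
    }

    synchronized int size() {
        return queue.size();
    }
}
```

The `Supplier` overloads build the message only when the level is enabled, which serves a similar purpose to SLF4J's `{}` placeholders.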
yihua.huang
a1c7e826f7
fix dep of slf4j-log4j12
2014-04-03 23:04:31 +08:00
yihua.huang
01848301d4
encode illegal characters in url #80
2014-04-01 22:14:30 +08:00
yihua.huang
2780423e60
enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80
2014-04-01 20:35:11 +08:00
yihua.huang
97b6f46280
Bugfix: break loop in addTargetRequests #81
2014-04-01 20:12:25 +08:00
yihua.huang
8d8194bee4
Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76
2014-03-25 08:23:20 +08:00
yihua.huang
8b35d79569
Do not cache document in Selectable for selected Html element #73
2014-03-19 22:19:06 +08:00
yihua.huang
6201fd6966
add worker as container
2014-03-17 23:01:58 +08:00
yihua.huang
6c11718566
Clean project structure #70
2014-03-14 23:24:38 +08:00
yihua.huang
9606a173cd
fix ZipCodePageProcessor
2014-03-13 22:55:50 +08:00
yihua.huang
4f68368db0
Merge branch 'master' of git.oschina.net:flashsword20/webmagic
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2014-03-13 08:09:37 +08:00
yihua.huang
98e2bba099
Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
README.md
pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
webmagic-scripts/pom.xml
2014-03-13 08:07:33 +08:00
yihua.huang
757cc9b942
[maven-release-plugin] prepare for next development iteration
2014-03-13 07:49:51 +08:00
yihua.huang
63ffb5c792
[maven-release-plugin] prepare release webmaigc-0.4.3
2014-03-13 07:49:27 +08:00
yihua.huang
66d4d3c192
Merge branch 'master' into 0.4.x
2014-03-13 07:12:29 +08:00
yihua.huang
af07280176
remove defend code for httpclient 4.3.1 because it is fixed in 4.3.3 #59
2014-03-13 07:11:56 +08:00
yihua.huang
d5a978e00f
update version back to 0.4.3
2014-03-13 06:55:05 +08:00
yihua.huang
55368919df
add attribute 'text' support for CssSelector #66
2014-03-11 13:18:34 +08:00
yihua.huang
88b50d4182
bugfix: cycleTry will not work when spawnUrl is set to false #62
2014-03-04 07:33:07 +08:00
yihua.huang
2768a1cae4
add test for cycleTriedTimes and fix cycleTriedTimes inc error #60
2014-03-01 15:10:38 +08:00
yihua.huang
bbd0d7e600
update httpclient version to 4.3.3 #59
2014-02-28 21:17:02 +08:00
yihua.huang
571061454a
#58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler
2014-02-27 23:54:30 +08:00
yihua.huang
0e98183f74
Change log4j to slf4j #55
2014-02-12 09:35:57 +08:00
yihua.huang
fa33b15843
property loader
2014-02-11 23:07:31 +08:00
yihua.huang
af809c4d55
update version to 0.5.0-snapshot
2014-02-11 22:16:01 +08:00
Almark Ming
2b46b11e55
Update RegexSelector.java
Optimize regex format check
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang
2a8e1b654d
Merge branch 'master' of git.oschina.net:flashsword20/webmagic into osc
Conflicts:
pom.xml
2013-12-21 07:59:28 +08:00
Almark Ming
91ed66ecac
Update RegexSelector.java
2013-12-17 16:57:22 +08:00
Almark Ming
83926970b2
Check valid left parenthesis
2013-12-17 16:55:53 +08:00
yihua.huang
b51fb2696b
update ut for cookie
2013-12-06 00:30:01 +08:00
yihua.huang
ff2f588c41
#48 nullpointer exception
2013-12-04 22:11:20 +08:00
yihua.huang
fc97cb58c5
update lib and version
2013-12-04 00:04:29 +08:00
yihua.huang
7c41bec92f
Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
README.md
webmagic-samples/pom.xml
webmagic-selenium/pom.xml
2013-12-03 23:50:26 +08:00
yihua.huang
d274310cb2
[maven-release-plugin] prepare for next development iteration
2013-12-03 23:35:06 +08:00
yihua.huang
e8c32a32dc
[maven-release-plugin] prepare release webmagic-0.4.2
2013-12-03 23:34:57 +08:00
yihua.huang
6a828e923c
#46 Downloader thread hangs on timeout
2013-12-03 09:59:54 +08:00
shijinping
9a524aa364
Read httpClient again inside the double-check
2013-11-28 14:38:30 +08:00
yihua.huang
fd23cb6dc0
Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
README.md
pom.xml
webmagic-samples/pom.xml
webmagic-selenium/pom.xml
2013-11-28 13:40:24 +08:00
yihua.huang
e7083dc39d
[maven-release-plugin] prepare for next development iteration
2013-11-28 13:04:32 +08:00
yihua.huang
ae623567b3
[maven-release-plugin] prepare release webmagic-0.4.1
2013-11-28 13:04:22 +08:00
yihua.huang
59ad4cad27
#42 Add jsonpath in annotation mode for json result
2013-11-28 08:25:16 +08:00
yihua.huang
c2d6d495b3
#41 add getThreadAlive(),getStatus,getPageCount() to spider
2013-11-28 07:59:24 +08:00
yihua.huang
cf62d707e0
#36 Spider does not exit when success
2013-11-27 23:33:18 +08:00
yihua.huang
a01312930a
#39 Parsing html after page.getHtml()
2013-11-27 22:01:34 +08:00
yihua.huang
f63d33b457
update some comments
2013-11-27 21:06:53 +08:00
yihua.huang
04fcf3193f
#38 Change algorithm of SmartContentSelector
2013-11-23 13:56:55 +08:00
yihua.huang
296a68920e
fix javadoc and add setPipelines() for spider
2013-11-14 13:23:29 +08:00
yihua.huang
47a0360783
#35 add status code to page
2013-11-12 11:51:34 +08:00
yihua.huang
bc5c30de17
update scripts
2013-11-12 08:20:59 +08:00
yihua.huang
f9daae39cf
[maven-release-plugin] prepare for next development iteration
2013-11-11 14:33:11 +08:00
yihua.huang
fdb9441519
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-11 14:33:01 +08:00
yihua.huang
1d75ae7f5b
roll back version to 0.4.0 because deployment did not succeed
2013-11-11 11:52:56 +08:00
yihua.huang
df8ca8ad09
add scripts
2013-11-10 22:30:48 +08:00
yihua.huang
e40b48e77b
Merge tag 'webmagic-0.4.0' of github.com:code4craft/webmagic
[maven-release-plugin] copy for tag webmagic-0.4.0
Conflicts:
pom.xml
webmagic-core/pom.xml
webmagic-extension/pom.xml
2013-11-06 22:48:26 +08:00
yihua.huang
775eb9732f
[maven-release-plugin] prepare for next development iteration
2013-11-06 22:17:58 +08:00
yihua.huang
0b4fadc24d
[maven-release-plugin] prepare release webmagic-0.4.0
2013-11-06 22:17:47 +08:00
yihua.huang
fe6d9bb2e2
get keep-alive rework
2013-11-06 21:53:39 +08:00
yihua.huang
fd6d2fd6f8
try to keepalive TCP connection
2013-11-06 21:19:14 +08:00
yihua.huang
425df08523
update version to 0.4.0
2013-11-06 12:50:45 +08:00
yihua.huang
e046bb0723
remove useless code
2013-11-06 12:48:14 +08:00
yihua.huang
6e32a19f80
update api for direct download
2013-11-06 12:46:50 +08:00
yihua.huang
807aefe9df
change EntityUtil to IOUtil because of some encoding errors
2013-11-06 07:37:34 +08:00
yihua.huang
00b0a751b4
#33 ignore 'content-encoding' when redirect
2013-11-06 06:57:58 +08:00
yihua.huang
8f774afc84
add direct download
2013-11-06 06:41:04 +08:00
yihua.huang
c18b603399
optimize long compare
2013-11-04 07:09:44 +08:00
yihua.huang
ed3f3583cc
downloader refactor
2013-11-04 01:03:23 +08:00
yihua.huang
a37f40e6e6
add cookie support
2013-11-04 00:59:48 +08:00
yihua.huang
3c6fced48e
update connection client
2013-11-04 00:53:01 +08:00
yihua.huang
09153ff715
#22 http proxy support #32 update httpclient to 4.3.1
2013-11-04 00:47:09 +08:00
yihua.huang
edfc319c45
update httpclient to 4.3.1
2013-11-04 00:06:30 +08:00
yihua.huang
160a149b05
todo bugfix
2013-11-03 23:10:09 +08:00
yihua.huang
583a0eba8c
#29 refactor some method name
2013-11-03 20:24:26 +08:00
yihua.huang
6fa82a418b
#29 seed urls with more information
2013-11-03 20:20:50 +08:00
yihua.huang
1446ada732
some refactor
2013-10-31 22:50:22 +08:00
yihua.huang
84976c81ec
remove useless code
2013-10-31 22:48:18 +08:00
yihua.huang
b4fcf41168
add exit-when-complete option
2013-10-31 22:41:02 +08:00
yihua.huang
352887870c
remove shutdown call
2013-10-31 22:22:14 +08:00
yihua.huang
a3f9ad198f
refactor multi thread code in Spider
2013-10-31 21:52:43 +08:00
yihua.huang
7fb44d2eec
#30 reuse PoolingClientConnectionManager for HttpClientDownloader
2013-10-14 23:22:04 +08:00
yihua.huang
5a226387e0
#27 nullpointer fix
2013-10-11 11:32:44 +08:00
yihua.huang
16e12e3bc9
#27 customize http header for downloader
2013-10-11 08:37:21 +08:00
yihua.huang
1a2c84ea78
#27 add timeout config to site
2013-10-11 07:36:16 +08:00
yihua.huang
372cc0ad06
update jar
2013-09-23 13:21:40 +08:00
yihua.huang
4acbc19cee
[maven-release-plugin] prepare for next development iteration
2013-09-23 13:12:32 +08:00
yihua.huang
cc3b787991
[maven-release-plugin] prepare release webmagic-0.3.2
2013-09-23 13:12:19 +08:00
yihua.huang
b131878123
add example
2013-09-23 13:01:28 +08:00
yihua.huang
95ab4edec3
some bugfix
2013-09-23 08:38:54 +08:00
yihua.huang
fba330872b
fix a thread pool exception
2013-09-22 23:57:15 +08:00
yihua.huang
3c79d031bd
fix thread pool
2013-09-22 22:52:52 +08:00
yihua.huang
a2fba8caa2
update to 0.3.1
2013-09-09 12:48:01 +08:00
yihua.huang
fb693a4ac4
[maven-release-plugin] prepare for next development iteration
2013-09-08 22:25:07 +08:00
yihua.huang
bfaaa042b9
[maven-release-plugin] prepare release webmagic-parent-0.3.1
2013-09-08 22:24:48 +08:00
yihua.huang
c17a31a21d
fix null pointer exception #26
2013-09-08 21:09:49 +08:00
yihua.huang
d2e0f0cd33
#25 use URL api in UrlUtils.canonicalizeUrl()
2013-09-06 21:35:23 +08:00
yihua.huang
ef4cf49fee
add stop method to spider #24
2013-09-06 21:17:36 +08:00
yihua.huang
58150a090d
update jar
2013-09-05 20:56:25 +08:00
yihua.huang
57556ab879
merge
2013-09-05 20:53:15 +08:00
yihua.huang
692de76f86
fix issue #21 charset detect error
2013-09-04 15:27:51 +08:00
yihua.huang
e7bf425df4
[maven-release-plugin] prepare for next development iteration
2013-09-04 10:51:01 +08:00
yihua.huang
77ff252316
[maven-release-plugin] prepare release webmagic-0.3.0
2013-09-04 10:50:50 +08:00
yihua.huang
1fc8e104ab
add cycle retry
2013-09-04 10:32:13 +08:00
yihua.huang
d141541ef3
add retry
2013-09-04 09:57:19 +08:00
yihua.huang
a1ef2523cc
update xsoup version
2013-09-04 09:38:40 +08:00
yihua.huang
aefd0569a5
update version
2013-09-04 09:36:56 +08:00
yihua.huang
194518fd82
add switch
2013-09-04 08:21:34 +08:00
yihua.huang
326b97c65a
update
2013-09-04 00:15:54 +08:00
yihua.huang
2c3574537a
refactor in selectors
2013-09-02 14:14:24 +08:00
yihua.huang
85b7cf1563
complete test
2013-09-02 13:52:41 +08:00
yihua.huang
d7cd9e5747
update pom
2013-09-02 11:56:01 +08:00
yihua.huang
55d4a76ab7
new selectors
2013-09-02 08:21:32 +08:00
yihua.huang
d7abbd0e4b
fix compile error
2013-08-25 16:31:00 +08:00
yihua.huang
5e9e8b2541
add TextContentSelector
2013-08-25 16:30:38 +08:00
yihua.huang
0cc0ccee35
allow specifying the charset for easier use of HttpClientDownloader
2013-08-25 15:41:43 +08:00
yihua.huang
91dcccf7b5
add a sample
2013-08-21 21:55:15 +08:00
yihua.huang
ad66d33f38
[maven-release-plugin] prepare for next development iteration
2013-08-20 23:39:59 +08:00
yihua.huang
9dc6b11954
[maven-release-plugin] prepare release webmagic-parent-0.2.1
2013-08-20 23:37:55 +08:00
yihua.huang
4f62dfc8a4
release
2013-08-20 23:37:20 +08:00
yihua.huang
74c940c758
[maven-release-plugin] prepare for next development iteration
2013-08-20 23:19:58 +08:00
yihua.huang
a4bb4e3429
[maven-release-plugin] prepare release webmagic-parent-0.2.1
2013-08-20 23:19:27 +08:00
yihua.huang
194f16aa75
update
2013-08-20 23:16:43 +08:00
yihua.huang
0f0f1a9bcd
release notes
2013-08-20 22:51:30 +08:00
yihua.huang
c1471718df
extractors
2013-08-20 22:44:53 +08:00
yihua.huang
20705b34ac
add more option to extractors
2013-08-20 22:13:30 +08:00
yihua.huang
c70ed57025
move PriorityScheduler to core
2013-08-20 21:55:58 +08:00
yihua.huang
7003426898
update pom
2013-08-20 21:52:39 +08:00
yihua.huang
606417fdc7
update pom
2013-08-19 09:55:49 +08:00
yihua.huang
d460e136ef
update version
2013-08-19 09:52:15 +08:00
yihua.huang
c79d6ecf09
complete all comments
2013-08-17 23:30:49 +08:00
yihua.huang
90bbe9b951
webmagic-core
2013-08-17 23:24:04 +08:00
yihua.huang
17f8ead28f
update comments for selector
2013-08-17 21:33:54 +08:00
yihua.huang
77e6ca2945
update comments
2013-08-17 21:26:44 +08:00
yihua.huang
5073258237
closable
2013-08-17 21:19:24 +08:00
yihua.huang
d01c0eb8ce
update comments of spider
2013-08-17 21:15:36 +08:00
yihua.huang
5f1f4cbc46
update comments
2013-08-17 20:41:29 +08:00
yihua.huang
1148450ff9
make filecache more useful
2013-08-17 18:12:47 +08:00
yihua.huang
3ba7a76f44
add combo extract to replace Extract2 Extract3...
2013-08-17 17:23:11 +08:00
yihua.huang
5cb45af3a4
+doc
2013-08-17 12:10:34 +08:00
yihua.huang
ef673b985e
add a method for httpclientdownloader
2013-08-14 13:32:23 +08:00
yihua.huang
067f3ea0cb
add some null pointer check for httpclientdownloader
2013-08-14 13:30:09 +08:00
yihua.huang
9e82256ce3
update docs
2013-08-12 10:08:20 +08:00
yihua.huang
0a902b441c
update docs
2013-08-12 09:55:17 +08:00
yihua.huang
0f2c5b5723
update redisscheduler
2013-08-11 18:28:12 +08:00
yihua.huang
787b952932
release notes and docs
2013-08-11 10:21:26 +08:00
yihua.huang
8b15f3c63d
add test
2013-08-10 20:33:47 +08:00
yihua.huang
ade5714d50
add https support
2013-08-10 18:52:27 +08:00
yihua.huang
21eca688e9
complete docs
2013-08-09 20:56:33 +08:00
yihua.huang
17d2d98cec
remove invalid @date
2013-08-09 20:43:06 +08:00
yihua.huang
268bd8d0c4
move saxon to extension
2013-08-07 23:04:10 +08:00
yihua.huang
cff943f698
fix path format error
2013-08-07 13:05:12 +08:00
yihua.huang
5ef231a768
update version
2013-08-07 12:48:32 +08:00
yihua.huang
570533cce5
update readme
2013-08-07 09:45:38 +08:00
yihua.huang
36494bcfa5
add xpath2.0 api
2013-08-06 23:01:43 +08:00
yihua.huang
5c96407a3d
fix a null domain error
2013-08-06 22:43:31 +08:00
yihua.huang
c7005a0227
json fix
2013-08-06 22:36:37 +08:00
yihua.huang
e5f4b3916f
change file dir
2013-08-06 22:26:39 +08:00
yihua.huang
7d277e84d4
update lucene pipeline
2013-08-06 21:47:44 +08:00
yihua.huang
b40cca1122
move model package to plugin
2013-08-06 20:41:35 +08:00
yihua.huang
4eb3d60083
fix nullpointer exception
2013-08-05 22:06:39 +08:00
yihua.huang
b0af45f4bb
complete redis support
2013-08-05 21:44:29 +08:00
yihua.huang
f3a29d9315
fix pagedmodel bug
2013-08-05 21:03:47 +08:00
yihua.huang
629f8ac2d1
add extractors chain
2013-08-05 20:45:34 +08:00
yihua.huang
27ce3fc176
lazy init
2013-08-05 19:36:49 +08:00
yihua.huang
dc9f574e27
update request
2013-08-05 18:17:52 +08:00
yihua.huang
d56c681be1
add priority to request
2013-08-05 18:08:28 +08:00
yihua.huang
971e7b6ce2
add core
2013-08-05 13:53:13 +08:00
yihua.huang
619a12b303
add paged support
2013-08-04 21:22:15 +08:00
yihua.huang
a5c85c3c8b
add annotation ExtractByRaw
2013-08-04 15:12:06 +08:00
yihua.huang
1a50c64e33
update name
2013-08-04 10:05:03 +08:00
yihua.huang
a3a868f584
rename
2013-08-04 09:55:50 +08:00
yihua.huang
04a7fa037a
update pipeline
2013-08-04 09:53:01 +08:00
yihua.huang
21cae2ff2e
update package
2013-08-04 07:53:28 +08:00
yihua.huang
cfb8990453
update author
2013-08-04 03:04:30 +08:00
yihua.huang
b393e38320
add multi entity extract
2013-08-03 20:42:29 +08:00
yihua.huang
bfadac756a
fix an attribute bug
2013-08-03 18:36:03 +08:00
yihua.huang
145628557d
update afterextract api
2013-08-03 18:01:17 +08:00
yihua.huang
aca165b132
add and or selector
2013-08-03 17:38:36 +08:00
yihua.huang
69245e8c03
fix Class.assinable bug
2013-08-03 17:17:59 +08:00
yihua.huang
65518f7672
add list support
2013-08-03 17:01:25 +08:00
yihua.huang
d4de60a562
skip test
2013-08-03 16:35:12 +08:00
yihua.huang
d26cd82d59
rename package
2013-08-03 16:29:50 +08:00
yihua.huang
f84b53514f
complete objectpipeline
2013-08-03 15:55:54 +08:00
yihua.huang
866ab0a056
update email
2013-08-03 14:01:18 +08:00
yihua.huang
7c9e9ce869
xpath2.0
2013-08-03 07:28:46 +08:00
yihua.huang
7f27c28d4c
simplify api
2013-08-02 23:45:13 +08:00
yihua.huang
d7899e94ae
test saxon and introduce XPath2.0 support
2013-08-02 23:39:34 +08:00
yihua.huang
3fe3d8f044
update
2013-08-02 13:51:42 +08:00
yihua.huang
516ff3310d
add failfast
2013-08-02 08:20:55 +08:00
yihua.huang
7a4dbb1f15
introduce notnull
2013-08-02 08:09:37 +08:00
yihua.huang
06a39af0f3
add setter support
2013-08-02 07:32:37 +08:00
yihua.huang
abba3b7bff
add extract by url
2013-08-02 06:59:25 +08:00
yihua.huang
f08ffc34fd
rename
2013-08-02 06:33:48 +08:00
yihua.huang
c5cf05640a
processor
2013-08-01 22:53:44 +08:00
yihua.huang
50edd22ef6
add annotation
2013-08-01 22:40:57 +08:00
yihua.huang
7020b8648d
fix a thread problem
2013-07-30 21:39:43 +08:00
yihua.huang
52fd5cfc1c
fix encoding
2013-07-30 15:24:59 +08:00