yihua.huang
2a15bc0289
contributor
2014-06-04 22:27:16 +08:00
yihua.huang
5e8ca02ec6
contributor
2014-06-04 22:26:56 +08:00
yihua.huang
db0195babb
update version in docs
2014-06-04 17:35:31 +08:00
yihua.huang
5f8c3fd5c5
update version
2014-06-04 17:33:30 +08:00
yihua.huang
0e9042eefa
update pom
2014-06-04 17:17:48 +08:00
yihua.huang
03170178c4
update pom
2014-06-04 17:01:37 +08:00
yihua.huang
c83b74f0f4
update pom for deploy
2014-06-04 16:55:34 +08:00
yihua.huang
7a64847a3c
Bugfix: selector does not work well in element #113
2014-06-03 20:03:33 +08:00
yihua.huang
8d67fd0357
change back return proxy from spider to httpclientdownloader #128
2014-05-28 08:08:51 +08:00
yihua.huang
40bf8ca58f
change return proxy from spider to httpclientdownloader #128
2014-05-28 07:57:42 +08:00
yihua.huang
1f21d9cc14
fix spelling mistake #128
2014-05-28 07:29:19 +08:00
Yihua Huang
e310139d00
Merge pull request #128 from yxssfxwzy/proxy
...
Management of multiple proxies
2014-05-28 07:22:08 +08:00
yihua.huang
b165090434
Bugfix: type conversion error in JsonPathSelector #129
2014-05-27 21:19:22 +08:00
yihua.huang
95bdb30296
update xsoup version to release #113
2014-05-27 20:46:48 +08:00
yihua.huang
a5d1b56e44
fix ut #113
2014-05-27 18:07:53 +08:00
yihua.huang
3939074a23
Bugfix: nodes() only returns the first element #113
2014-05-27 17:53:06 +08:00
yihua.huang
41c2ea9498
refactor of selectable, cont'd #113
...
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang
f9825c214a
refactor selectable for html fragment #113
2014-05-27 16:00:51 +08:00
yihua.huang
03d26c169b
Enhance auto charset detect #126
...
1. Only read from content once to fix stream closed exception
2. introduce moco for server tests
2014-05-26 17:45:30 +08:00
zwf
c146e2c7b4
add proxy pool
2014-05-19 15:59:31 +08:00
zwf
07ea04223f
change_gitignore
2014-05-19 15:56:22 +08:00
yihua.huang
21982d3460
remove cpdetector temporarily #126
2014-05-14 23:52:27 +08:00
fengwuze
fcbfb75608
Modify the code block that automatically detects the charset from a web page, extracting it into a separate method.
2014-05-14 19:14:42 +08:00
fengwuze
95494d3c4d
Add logic for handling meta.
...
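Outstanding issues:
3. When a web page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.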
2014-05-14 14:53:54 +08:00
yihua.huang
dde2d89bbe
Ignore brackets in JSON content when removing padding #124
2014-05-08 23:37:18 +08:00
Yihua Huang
2913da4763
Merge pull request #123 from gsh199449/master
...
Update JsonFilePipeline.java #122
2014-05-08 15:20:02 +08:00
yihua.huang
928f98dd93
auto create folder in JsonFilePipeline #122
2014-05-08 15:12:17 +08:00
GaoShen
5883ed93d7
Update JsonFilePipeline.java
...
JsonFilePipeline can automatically create folders that do not yet exist
2014-05-08 15:08:55 +08:00
Yihua Huang
4e65dac249
Merge pull request #121 from ywooer/master
...
Create a Writer with a specified encoding
2014-05-06 20:14:35 +08:00
ywooer
259f0a16c5
Update FilePipeline.java
2014-05-06 18:33:00 +08:00
ywooer
26d38851b5
add charset to Writer
2014-05-06 18:28:50 +08:00
yihua.huang
7fbe18b8c0
implementation of PageMapper #120
2014-05-05 08:01:39 +08:00
yihua.huang
5dc9fe95a9
interface of PageMapper #120
2014-05-05 07:43:32 +08:00
yihua.huang
7668731f08
update version to snapshot
2014-05-05 07:03:55 +08:00
yihua.huang
5f6f489314
deprecate in user manual
2014-05-03 06:29:37 +08:00
yihua.huang
81e6e772ac
versions back to 0.5.1
2014-05-03 06:18:57 +08:00
yihua.huang
dbebcbe44f
docs
2014-05-03 06:14:31 +08:00
yihua.huang
358e906379
[maven-release-plugin] prepare for next development iteration
2014-05-03 00:00:13 +08:00
yihua.huang
470750fc0d
[maven-release-plugin] prepare release WebMagic-0.5.1
2014-05-02 23:59:55 +08:00
yihua.huang
fc3d2906b0
remove avalon from pom temporarily
2014-05-02 23:47:14 +08:00
yihua.huang
01aec7e1ab
extension point of geturl #118
2014-05-02 23:23:23 +08:00
yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
186b90512e
refactor redisscheduler #118
2014-05-02 20:24:15 +08:00
yihua.huang
56f033ce8d
set setDuplicateRemover for chain api #118
2014-05-02 20:21:23 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
64293cba20
samples
2014-05-02 19:12:38 +08:00
yihua.huang
bc1d14fed4
sample
2014-05-02 17:54:21 +08:00
yihua.huang
8e4814bdc5
fix path separator
2014-05-02 17:06:34 +08:00
yihua.huang
e8d4a9be2b
fix remove duplicate error #117
2014-04-29 20:32:06 +08:00