yihua.huang
41c2ea9498
refactor of selectable (cont'd) #113
...
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang
f9825c214a
refactor selectable for html fragment #113
2014-05-27 16:00:51 +08:00
yihua.huang
03d26c169b
Enhance automatic charset detection #126
...
1. Only read from content once to fix stream closed exception
2. introduce moco for server tests
2014-05-26 17:45:30 +08:00
yihua.huang
21982d3460
temporarily remove cpdetector #126
2014-05-14 23:52:27 +08:00
fengwuze
fcbfb75608
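Refactor the code block that automatically obtains the character encoding from the web page, extracting it into a separate method.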
2014-05-14 19:14:42 +08:00
fengwuze
95494d3c4d
Add logic to handle meta.
...
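Remaining issues:
3. When a web page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.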
2014-05-14 14:53:54 +08:00
yihua.huang
dde2d89bbe
Ignore brackets inside JSON content when removing padding #124
2014-05-08 23:37:18 +08:00
Yihua Huang
2913da4763
Merge pull request #123 from gsh199449/master
...
Update JsonFilePipeline.java #122
2014-05-08 15:20:02 +08:00
yihua.huang
928f98dd93
auto create folder in JsonFilePipeline #122
2014-05-08 15:12:17 +08:00
GaoShen
5883ed93d7
Update JsonFilePipeline.java
...
JsonFilePipeline can automatically create folders that do not yet exist
2014-05-08 15:08:55 +08:00
Yihua Huang
4e65dac249
Merge pull request #121 from ywooer/master
...
Create a Writer with the specified encoding
2014-05-06 20:14:35 +08:00
ywooer
259f0a16c5
Update FilePipeline.java
2014-05-06 18:33:00 +08:00
ywooer
26d38851b5
add charset to Writer
2014-05-06 18:28:50 +08:00
yihua.huang
7fbe18b8c0
implementation of PageMapper #120
2014-05-05 08:01:39 +08:00
yihua.huang
5dc9fe95a9
interface of PageMapper #120
2014-05-05 07:43:32 +08:00
yihua.huang
7668731f08
update version to snapshot
2014-05-05 07:03:55 +08:00
yihua.huang
5f6f489314
deprecate in user manual
2014-05-03 06:29:37 +08:00
yihua.huang
81e6e772ac
versions back to 0.5.1
2014-05-03 06:18:57 +08:00
yihua.huang
dbebcbe44f
docs
2014-05-03 06:14:31 +08:00
yihua.huang
358e906379
[maven-release-plugin] prepare for next development iteration
2014-05-03 00:00:13 +08:00
yihua.huang
470750fc0d
[maven-release-plugin] prepare release WebMagic-0.5.1
2014-05-02 23:59:55 +08:00
yihua.huang
fc3d2906b0
temporarily remove avalon from pom
2014-05-02 23:47:14 +08:00
yihua.huang
01aec7e1ab
extension point of geturl #118
2014-05-02 23:23:23 +08:00
yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
186b90512e
refactor redisscheduler #118
2014-05-02 20:24:15 +08:00
yihua.huang
56f033ce8d
set setDuplicateRemover for chain api #118
2014-05-02 20:21:23 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
64293cba20
samples
2014-05-02 19:12:38 +08:00
yihua.huang
bc1d14fed4
sample
2014-05-02 17:54:21 +08:00
yihua.huang
8e4814bdc5
fix path separator
2014-05-02 17:06:34 +08:00
yihua.huang
e8d4a9be2b
fix remove duplicate error #117
2014-04-29 20:32:06 +08:00
yihua.huang
22652c4521
fix dep
2014-04-27 16:27:12 +08:00
yihua.huang
f84a858bce
update version of forge
2014-04-27 15:38:38 +08:00
yihua.huang
5c00e59939
Merge branch 'stable'
2014-04-27 15:14:06 +08:00
yihua.huang
66692b2f74
update forge version
2014-04-27 15:09:26 +08:00
yihua.huang
c07b32cd85
en docs
2014-04-27 15:06:13 +08:00
yihua.huang
3355624035
docs
2014-04-27 15:05:51 +08:00
yihua.huang
a08d8cb167
update version
2014-04-27 14:59:48 +08:00
yihua.huang
42a2676e8c
update version
2014-04-27 14:56:21 +08:00
yihua.huang
c892eadb56
contributor
2014-04-27 14:54:39 +08:00
yihua.huang
028f5e8755
readme
2014-04-27 14:52:44 +08:00
yihua.huang
c25b32f1ca
[maven-release-plugin] prepare for next development iteration
2014-04-27 12:52:27 +08:00
yihua.huang
7ff83bb11a
[maven-release-plugin] prepare release WebMagic-0.5.0
2014-04-27 12:52:12 +08:00
yihua.huang
dc3c175772
docs
2014-04-27 10:50:35 +08:00
yihua.huang
1104122979
more abstraction in scheduler
2014-04-27 09:30:01 +08:00
yihua.huang
b0fb1c3e10
remove copy-dependencies plugin to fix m2e error
2014-04-27 08:22:33 +08:00
yihua.huang
94a67165e1
remove jmx server for simplicity #98
2014-04-26 20:17:52 +08:00
yihua.huang
86a45a6643
change SpiderMonitor to singleton #98
2014-04-26 18:14:25 +08:00
yihua.huang
ab4d36806e
clean code
2014-04-26 11:45:21 +08:00