yihua.huang
a5d1b56e44
fix unit test #113
2014-05-27 18:07:53 +08:00
yihua.huang
3939074a23
Bugfix: nodes() only returns the first element #113
2014-05-27 17:53:06 +08:00
yihua.huang
41c2ea9498
refactor of selectable (continued) #113
...
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang
f9825c214a
refactor selectable for html fragment #113
2014-05-27 16:00:51 +08:00
yihua.huang
03d26c169b
Enhance auto charset detect #126
...
1. Only read from content once to fix stream closed exception
2. introduce moco as a test server
2014-05-26 17:45:30 +08:00
yihua.huang
21982d3460
remove cpdetector temporarily #126
2014-05-14 23:52:27 +08:00
fengwuze
fcbfb75608
Modify the code block that automatically detects the charset from the web page, extracting it into a separate method.
2014-05-14 19:14:42 +08:00
fengwuze
95494d3c4d
Add logic for handling meta.
...
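Remaining issues:
3. When a web page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.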
2014-05-14 14:53:54 +08:00
yihua.huang
dde2d89bbe
Ignore brackets inside JSON content when removing padding #124
2014-05-08 23:37:18 +08:00
ywooer
259f0a16c5
Update FilePipeline.java
2014-05-06 18:33:00 +08:00
ywooer
26d38851b5
add charset to Writer
2014-05-06 18:28:50 +08:00
yihua.huang
7668731f08
update version to snapshot
2014-05-05 07:03:55 +08:00
yihua.huang
81e6e772ac
versions back to 0.5.1
2014-05-03 06:18:57 +08:00
yihua.huang
358e906379
[maven-release-plugin] prepare for next development iteration
2014-05-03 00:00:13 +08:00
yihua.huang
470750fc0d
[maven-release-plugin] prepare release WebMagic-0.5.1
2014-05-02 23:59:55 +08:00
yihua.huang
01aec7e1ab
extension point of geturl #118
2014-05-02 23:23:23 +08:00
yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
56f033ce8d
set setDuplicateRemover for chain api #118
2014-05-02 20:21:23 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
8e4814bdc5
fix path separator
2014-05-02 17:06:34 +08:00
yihua.huang
e8d4a9be2b
fix remove duplicate error #117
2014-04-29 20:32:06 +08:00
yihua.huang
a08d8cb167
update version
2014-04-27 14:59:48 +08:00
yihua.huang
42a2676e8c
update version
2014-04-27 14:56:21 +08:00
yihua.huang
c25b32f1ca
[maven-release-plugin] prepare for next development iteration
2014-04-27 12:52:27 +08:00
yihua.huang
7ff83bb11a
[maven-release-plugin] prepare release WebMagic-0.5.0
2014-04-27 12:52:12 +08:00
yihua.huang
1104122979
more abstraction in scheduler
2014-04-27 09:30:01 +08:00
yihua.huang
2770811a10
update monitor example
2014-04-26 11:24:22 +08:00
yihua.huang
5ecd909ef2
add timeout for wait/notify #111
2014-04-25 19:41:36 +08:00
yihua.huang
c7afdb516e
remove thread utils #110
2014-04-25 18:44:45 +08:00
yihua.huang
17e95f2a7f
comments
2014-04-25 18:39:01 +08:00
yihua.huang
05eb7831b6
refactor and comments #110
2014-04-25 18:27:40 +08:00
yihua.huang
375e64e845
more monitor status
2014-04-25 18:10:14 +08:00
yihua.huang
018061d2cd
fix error in thread pool
2014-04-25 18:01:02 +08:00
yihua.huang
cdc423f2bf
log
2014-04-25 17:41:41 +08:00
yihua.huang
c6661899fd
new thread pool #110
2014-04-25 17:33:48 +08:00
yihua.huang
179baa7a22
return when page is null
2014-04-25 16:07:41 +08:00
yihua.huang
0336f4cdb4
remove IllegalStateException when download error for less error log
2014-04-25 16:06:29 +08:00
yihua.huang
11ba5beb42
[refactor]move monitor to webmagic-extension #98
2014-04-25 13:17:13 +08:00
yihua.huang
d61f65cef8
update mbean to mxbean #98
2014-04-25 11:31:43 +08:00
yihua.huang
ad6a273b12
update test url
2014-04-25 11:28:35 +08:00
yihua.huang
30af23d003
split monitor to server and client mode #98
2014-04-25 11:25:52 +08:00
yihua.huang
ced79630d3
specify jndi and jmx #98
2014-04-25 11:11:15 +08:00
yihua.huang
95d3802e77
add formdata support for post request #108
2014-04-24 11:48:58 +08:00
yihua.huang
f49bb877c8
clean some code #109
2014-04-24 11:38:13 +08:00
yihua.huang
e1aaf1dd11
fix mistake of guava Table #109
2014-04-24 11:05:49 +08:00
yihua.huang
8ba2da146c
request method #108 and more cookie #109 config
2014-04-24 10:51:37 +08:00
yihua.huang
b06aa489fb
[BugFix]Only one url from sourceRegion can be extracted #107
2014-04-18 17:48:26 +08:00
Bo LIANG
08fa3b01c1
when download error, throw an exception instead of calling onError and returning peacefully. #105
2014-04-17 17:53:12 +08:00
yihua.huang
27b37e8164
extension point and sample for JMX support #98
2014-04-17 08:12:37 +08:00