300 Commits (95bdb3029635975b5660fb76df4c51cc3e70a972)

Author SHA1 Message Date
yihua.huang a5d1b56e44 fix ut #113 2014-05-27 18:07:53 +08:00
yihua.huang 3939074a23 Bugfix: nodes() only returns the first element #113 2014-05-27 17:53:06 +08:00
yihua.huang 41c2ea9498 refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang f9825c214a refactor selectable for html fragment #113 2014-05-27 16:00:51 +08:00
yihua.huang 03d26c169b Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. introduce moco as the server for tests
2014-05-26 17:45:30 +08:00
yihua.huang 21982d3460 remove cpdetector temporary #126 2014-05-14 23:52:27 +08:00
fengwuze fcbfb75608 Extract the code block that automatically detects the charset from the web page into a separate method. 2014-05-14 19:14:42 +08:00
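fengwuze 95494d3c4d Add logic for handling meta.
Outstanding:
3. When the web page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.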
2014-05-14 14:53:54 +08:00
yihua.huang dde2d89bbe Ignore brackets in json content when removing padding #124 2014-05-08 23:37:18 +08:00
ywooer 259f0a16c5 Update FilePipeline.java 2014-05-06 18:33:00 +08:00
ywooer 26d38851b5 add charset to Writer 2014-05-06 18:28:50 +08:00
yihua.huang 7668731f08 update version to snapshot 2014-05-05 07:03:55 +08:00
yihua.huang 81e6e772ac versions back to 0.5.1 2014-05-03 06:18:57 +08:00
yihua.huang 358e906379 [maven-release-plugin] prepare for next development iteration 2014-05-03 00:00:13 +08:00
yihua.huang 470750fc0d [maven-release-plugin] prepare release WebMagic-0.5.1 2014-05-02 23:59:55 +08:00
yihua.huang 01aec7e1ab extension point of geturl #118 2014-05-02 23:23:23 +08:00
yihua.huang ec1c2e8cbc test and so on 2014-05-02 23:19:11 +08:00
yihua.huang 4f22f1210e some bug fix #118 2014-05-02 20:38:49 +08:00
yihua.huang 56f033ce8d set setDuplicateRemover for chain api #118 2014-05-02 20:21:23 +08:00
yihua.huang d1140b9e29 add bloom filter for scheduler #118 2014-05-02 20:20:22 +08:00
yihua.huang 8e4814bdc5 fix path separator 2014-05-02 17:06:34 +08:00
yihua.huang e8d4a9be2b fix remove duplicate error #117 2014-04-29 20:32:06 +08:00
yihua.huang a08d8cb167 update version 2014-04-27 14:59:48 +08:00
yihua.huang 42a2676e8c update version 2014-04-27 14:56:21 +08:00
yihua.huang c25b32f1ca [maven-release-plugin] prepare for next development iteration 2014-04-27 12:52:27 +08:00
yihua.huang 7ff83bb11a [maven-release-plugin] prepare release WebMagic-0.5.0 2014-04-27 12:52:12 +08:00
yihua.huang 1104122979 more abstraction in scheduler 2014-04-27 09:30:01 +08:00
yihua.huang 2770811a10 update monitor example 2014-04-26 11:24:22 +08:00
yihua.huang 5ecd909ef2 add timeout for wait/notify #111 2014-04-25 19:41:36 +08:00
yihua.huang c7afdb516e remove thread utils #110 2014-04-25 18:44:45 +08:00
yihua.huang 17e95f2a7f comments 2014-04-25 18:39:01 +08:00
yihua.huang 05eb7831b6 refactor and comments #110 2014-04-25 18:27:40 +08:00
yihua.huang 375e64e845 more monitor status 2014-04-25 18:10:14 +08:00
yihua.huang 018061d2cd fix error in thread pool 2014-04-25 18:01:02 +08:00
yihua.huang cdc423f2bf log 2014-04-25 17:41:41 +08:00
yihua.huang c6661899fd new thread pool #110 2014-04-25 17:33:48 +08:00
yihua.huang 179baa7a22 return when page is null 2014-04-25 16:07:41 +08:00
yihua.huang 0336f4cdb4 remove IllegalStateException when download error for less error log 2014-04-25 16:06:29 +08:00
yihua.huang 11ba5beb42 [refactor]move monitor to webmagic-extension #98 2014-04-25 13:17:13 +08:00
yihua.huang d61f65cef8 update mbean to mxbean #98 2014-04-25 11:31:43 +08:00
yihua.huang ad6a273b12 update test url 2014-04-25 11:28:35 +08:00
yihua.huang 30af23d003 split monitor to server and client mode #98 2014-04-25 11:25:52 +08:00
yihua.huang ced79630d3 specify jndi and jmx #98 2014-04-25 11:11:15 +08:00
yihua.huang 95d3802e77 add formdata support for post request #108 2014-04-24 11:48:58 +08:00
yihua.huang f49bb877c8 clean some code #109 2014-04-24 11:38:13 +08:00
yihua.huang e1aaf1dd11 fix mistake of guava Table #109 2014-04-24 11:05:49 +08:00
yihua.huang 8ba2da146c request method #108 and more cookie #109 config 2014-04-24 10:51:37 +08:00
yihua.huang b06aa489fb [BugFix]Only one url from sourceRegion can be extracted #107 2014-04-18 17:48:26 +08:00
Bo LIANG 08fa3b01c1 when download error, throw an exception instead of calling onError and returning peacefully. #105 2014-04-17 17:53:12 +08:00
yihua.huang 27b37e8164 extension point and sample for JMX support #98 2014-04-17 08:12:37 +08:00