143 Commits (f68795d7dd1ad3202a59ab9d49030065992001b1)

Author SHA1 Message Date
yihua.huang 7586e3d75c add some test for github repo downloader 2016-01-19 08:05:53 +08:00
yihua.huang 56e0cd513a compile error fix 2015-04-15 23:21:06 +08:00
yihua.huang c5740b1840 change assert #200 2015-04-15 08:32:08 +08:00
yihua.huang 67eb632f4d test for issue #200 2015-04-15 08:31:45 +08:00
yihua.huang 4446669c24 fix test 2014-08-18 10:54:24 +08:00
yihua.huang 9866297ec4 Disable jsoup entity escape by Default. Set Html.DISABLE_HTML_ENTITY_ESCAPE to false to enable it. #149 2014-08-14 08:04:56 +08:00
yihua.huang b3a282e58d some fix for tests #130 2014-06-10 00:05:30 +08:00
yihua.huang 074d767f45 Merge branch 'proxy' of github.com:yxssfxwzy/webmagic into yxssfxwzy-proxy 2014-06-09 23:51:36 +08:00
zwf 2f89cfc31a add test and fix bug of proxy module 2014-06-09 13:32:02 +08:00
yihua.huang eb89d66566 fix test 2014-06-04 22:28:27 +08:00
yihua.huang 5e8ca02ec6 contributor 2014-06-04 22:26:56 +08:00
yihua.huang 7a64847a3c Bugfix: selector does not work well in element #113 2014-06-03 20:03:33 +08:00
yihua.huang b165090434 Bugfix: Type convert error in JsonPathSelector #129 2014-05-27 21:19:22 +08:00
yihua.huang a5d1b56e44 fix ut #113 2014-05-27 18:07:53 +08:00
yihua.huang 3939074a23 Bugfix: nodes() only return the first element #113 2014-05-27 17:53:06 +08:00
yihua.huang 41c2ea9498 refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang 03d26c169b Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. invite moco as server test
2014-05-26 17:45:30 +08:00
yihua.huang 21982d3460 remove cpdetector temporary #126 2014-05-14 23:52:27 +08:00
fengwuze fcbfb75608 Modify the code block that automatically gets the charset from the web page, extracting it into a separate method. 2014-05-14 19:14:42 +08:00
yihua.huang dde2d89bbe Ignore content in json when bracket when remove padding #124 2014-05-08 23:37:18 +08:00
ywooer 26d38851b5 add charset to Writer 2014-05-06 18:28:50 +08:00
yihua.huang ec1c2e8cbc test and so on 2014-05-02 23:19:11 +08:00
yihua.huang 4f22f1210e some bug fix #118 2014-05-02 20:38:49 +08:00
yihua.huang d1140b9e29 add bloom filter for scheduler #118 2014-05-02 20:20:22 +08:00
yihua.huang 5ecd909ef2 add timeout for wait/notify #111 2014-04-25 19:41:36 +08:00
yihua.huang 11ba5beb42 [refactor]move monitor to webmagic-extension #98 2014-04-25 13:17:13 +08:00
yihua.huang d61f65cef8 update mbean to mxbean #98 2014-04-25 11:31:43 +08:00
yihua.huang ad6a273b12 update test url 2014-04-25 11:28:35 +08:00
yihua.huang 27b37e8164 extension point and sample for JMX support #98 2014-04-17 08:12:37 +08:00
yihua.huang f7950ebcab fix tests 2014-04-13 13:00:31 +08:00
yihua.huang 84b897f83b update AngularJSProcessor 2014-04-13 12:20:57 +08:00
yihua.huang 03c251237b add Json parse support 2014-04-13 10:23:00 +08:00
yihua.huang 22c394e629 [doc] 2014-04-04 20:00:58 +08:00
yihua.huang 01848301d4 encode illegal characters in url #80 2014-04-01 22:14:30 +08:00
yihua.huang 2780423e60 enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80 2014-04-01 20:35:11 +08:00
yihua.huang 8d8194bee4 Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76 2014-03-25 08:23:20 +08:00
yihua.huang 8b35d79569 Do not cache document in Selectable for selected Html element #73 2014-03-19 22:19:06 +08:00
yihua.huang 6c11718566 Clean project structure #70 2014-03-14 23:24:38 +08:00
yihua.huang 2768a1cae4 add test for cycleTriedTimes and fix cycleTriedTimes inc error #60 2014-03-01 15:10:38 +08:00
Almark Ming 2b46b11e55 Update RegexSelector.java
Optimize regex format check

Conflicts:
	webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang b51fb2696b update ut for cookie 2013-12-06 00:30:01 +08:00
yihua.huang ff2f588c41 #48 nullpointer exception 2013-12-04 22:11:20 +08:00
yihua.huang cf62d707e0 #36 Spider does not exit when success 2013-11-27 23:33:18 +08:00
yihua.huang a3f9ad198f refactor multi thread code in Spider 2013-10-31 21:52:43 +08:00
yihua.huang 5a226387e0 #27 nullpointer fix 2013-10-11 11:32:44 +08:00
yihua.huang fba330872b fix a thread pool exception 2013-09-22 23:57:15 +08:00
yihua.huang d2e0f0cd33 #25 use URL api in UrlUtils.canonicalizeUrl() 2013-09-06 21:35:23 +08:00
yihua.huang ef4cf49fee add stop method to spider #24 2013-09-06 21:17:36 +08:00
yihua.huang 194518fd82 add switch 2013-09-04 08:21:34 +08:00
yihua.huang 2c3574537a refactor in selectors 2013-09-02 14:14:24 +08:00
yihua.huang d7abbd0e4b fix compile error 2013-08-25 16:31:00 +08:00
yihua.huang 5e9e8b2541 add TextContentSelector 2013-08-25 16:30:38 +08:00
yihua.huang c1471718df extractors 2013-08-20 22:44:53 +08:00
yihua.huang c70ed57025 remove PriorityScheduler to core 2013-08-20 21:55:58 +08:00
yihua.huang c79d6ecf09 complete all comments 2013-08-17 23:30:49 +08:00
yihua.huang 268bd8d0c4 remove saxon to extension 2013-08-07 23:04:10 +08:00
yihua.huang b40cca1122 move model package to plugin 2013-08-06 20:41:35 +08:00
yihua.huang 619a12b303 add paged support 2013-08-04 21:22:15 +08:00
yihua.huang a5c85c3c8b add annotation ExtractByRaw 2013-08-04 15:12:06 +08:00
yihua.huang 21cae2ff2e update package 2013-08-04 07:53:28 +08:00
yihua.huang cfb8990453 update author 2013-08-04 03:04:30 +08:00
yihua.huang bfadac756a fix an attribute bug 2013-08-03 18:36:03 +08:00
yihua.huang 145628557d update afterextract api 2013-08-03 18:01:17 +08:00
yihua.huang aca165b132 add and or selector 2013-08-03 17:38:36 +08:00
yihua.huang 69245e8c03 fix Class.assinable bug 2013-08-03 17:17:59 +08:00
yihua.huang 65518f7672 add list support 2013-08-03 17:01:25 +08:00
yihua.huang d4de60a562 skip test 2013-08-03 16:35:12 +08:00
yihua.huang d26cd82d59 rename package 2013-08-03 16:29:50 +08:00
yihua.huang f84b53514f complete objectpipeline 2013-08-03 15:55:54 +08:00
yihua.huang 866ab0a056 update email 2013-08-03 14:01:18 +08:00
yihua.huang 7c9e9ce869 xpath2.0 2013-08-03 07:28:46 +08:00
yihua.huang 7f27c28d4c simplify api 2013-08-02 23:45:13 +08:00
yihua.huang d7899e94ae test saxon and invite XPath2.0 support 2013-08-02 23:39:34 +08:00
yihua.huang 3fe3d8f044 update 2013-08-02 13:51:42 +08:00
yihua.huang abba3b7bff add extract by url 2013-08-02 06:59:25 +08:00
yihua.huang f08ffc34fd rename 2013-08-02 06:33:48 +08:00
yihua.huang c5cf05640a processor 2013-08-01 22:53:44 +08:00
yihua.huang 50edd22ef6 add annotation 2013-08-01 22:40:57 +08:00
yihua.huang 52fd5cfc1c fix encoding 2013-07-30 15:24:59 +08:00
yihua.huang 65dc372152 update pipeline api 2013-07-25 13:32:39 +08:00
yihua.huang 96454fd74c update java doc 2013-07-24 18:26:54 +08:00
yihua.huang 81e7f7982e invite jsoup and cssselector 2013-07-20 08:34:18 +08:00
yihua.huang c733046045 +sina blog 2013-07-19 12:36:55 +08:00
yihua.huang 5c79550fd9 add offline cache and process 2013-06-24 14:42:49 +08:00
yihua.huang 9b1ba6e8bc ignore unstable test 2013-06-20 17:57:31 +08:00
yihua.huang 7bed01c9f2 update Spider api 2013-06-20 07:53:48 +08:00
yihua.huang 986ae0beaf update Select api: remove x() s() etc. 2013-06-19 09:57:41 +08:00
yihua.huang fb0797b65c update docs 2013-06-18 22:13:40 +08:00
yihua.huang 0ae7adf324 add cookie support & add docs 2013-06-18 08:32:11 +08:00
yihua.huang 8cef8774cb change author info 2013-06-18 07:24:19 +08:00
yihua.huang f0fa1dad07 clean some code 2013-06-17 11:12:22 +08:00
yihua.huang 755b9aa84e remove samples in test 2013-06-08 20:59:27 +08:00
yihua.huang 6dc88fa111 split modules 2013-06-08 20:48:27 +08:00