yihua.huang
7586e3d75c
add some tests for github repo downloader
2016-01-19 08:05:53 +08:00
yihua.huang
56e0cd513a
compile error fix
2015-04-15 23:21:06 +08:00
yihua.huang
c5740b1840
change assert #200
2015-04-15 08:32:08 +08:00
yihua.huang
67eb632f4d
test for issue #200
2015-04-15 08:31:45 +08:00
yihua.huang
4446669c24
fix test
2014-08-18 10:54:24 +08:00
yihua.huang
9866297ec4
Disable jsoup entity escape by default. Set Html.DISABLE_HTML_ENTITY_ESCAPE to false to enable it. #149
2014-08-14 08:04:56 +08:00
yihua.huang
b3a282e58d
some fix for tests #130
2014-06-10 00:05:30 +08:00
yihua.huang
074d767f45
Merge branch 'proxy' of github.com:yxssfxwzy/webmagic into yxssfxwzy-proxy
2014-06-09 23:51:36 +08:00
zwf
2f89cfc31a
add test and fix bug of proxy module
2014-06-09 13:32:02 +08:00
yihua.huang
eb89d66566
fix test
2014-06-04 22:28:27 +08:00
yihua.huang
5e8ca02ec6
contributor
2014-06-04 22:26:56 +08:00
yihua.huang
7a64847a3c
Bugfix: selector does not work well in element #113
2014-06-03 20:03:33 +08:00
yihua.huang
b165090434
Bugfix: Type conversion error in JsonPathSelector #129
2014-05-27 21:19:22 +08:00
yihua.huang
a5d1b56e44
fix ut #113
2014-05-27 18:07:53 +08:00
yihua.huang
3939074a23
Bugfix: nodes() only return the first element #113
2014-05-27 17:53:06 +08:00
yihua.huang
41c2ea9498
refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
2014-05-27 17:34:19 +08:00
yihua.huang
03d26c169b
Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. introduce moco for server tests
2014-05-26 17:45:30 +08:00
yihua.huang
21982d3460
remove cpdetector temporarily #126
2014-05-14 23:52:27 +08:00
fengwuze
fcbfb75608
Refactor the code block that automatically detects the charset from the web page, extracting it into a separate method.
2014-05-14 19:14:42 +08:00
yihua.huang
dde2d89bbe
Ignore brackets inside json content when removing padding #124
2014-05-08 23:37:18 +08:00
ywooer
26d38851b5
add charset to Writer
2014-05-06 18:28:50 +08:00
yihua.huang
ec1c2e8cbc
test and so on
2014-05-02 23:19:11 +08:00
yihua.huang
4f22f1210e
some bug fix #118
2014-05-02 20:38:49 +08:00
yihua.huang
d1140b9e29
add bloom filter for scheduler #118
2014-05-02 20:20:22 +08:00
yihua.huang
5ecd909ef2
add timeout for wait/notify #111
2014-04-25 19:41:36 +08:00
yihua.huang
11ba5beb42
[refactor]move monitor to webmagic-extension #98
2014-04-25 13:17:13 +08:00
yihua.huang
d61f65cef8
update mbean to mxbean #98
2014-04-25 11:31:43 +08:00
yihua.huang
ad6a273b12
update test url
2014-04-25 11:28:35 +08:00
yihua.huang
27b37e8164
extension point and sample for JMX support #98
2014-04-17 08:12:37 +08:00
yihua.huang
f7950ebcab
fix tests
2014-04-13 13:00:31 +08:00
yihua.huang
84b897f83b
update AngularJSProcessor
2014-04-13 12:20:57 +08:00
yihua.huang
03c251237b
add Json parse support
2014-04-13 10:23:00 +08:00
yihua.huang
22c394e629
[doc]
2014-04-04 20:00:58 +08:00
yihua.huang
01848301d4
encode illegal characters in url #80
2014-04-01 22:14:30 +08:00
yihua.huang
2780423e60
enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80
2014-04-01 20:35:11 +08:00
yihua.huang
8d8194bee4
Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76
2014-03-25 08:23:20 +08:00
yihua.huang
8b35d79569
Do not cache document in Selectable for selected Html element #73
2014-03-19 22:19:06 +08:00
yihua.huang
6c11718566
Clean project structure #70
2014-03-14 23:24:38 +08:00
yihua.huang
2768a1cae4
add test for cycleTriedTimes and fix cycleTriedTimes inc error #60
2014-03-01 15:10:38 +08:00
Almark Ming
2b46b11e55
Update RegexSelector.java
Optimize regex format check
Conflicts:
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
2013-12-21 08:38:17 +08:00
yihua.huang
b51fb2696b
update ut for cookie
2013-12-06 00:30:01 +08:00
yihua.huang
ff2f588c41
#48 nullpointer exception
2013-12-04 22:11:20 +08:00
yihua.huang
cf62d707e0
#36 Spider does not exit when success
2013-11-27 23:33:18 +08:00
yihua.huang
a3f9ad198f
refactor multi thread code in Spider
2013-10-31 21:52:43 +08:00
yihua.huang
5a226387e0
#27 nullpointer fix
2013-10-11 11:32:44 +08:00
yihua.huang
fba330872b
fix a thread pool exception
2013-09-22 23:57:15 +08:00
yihua.huang
d2e0f0cd33
#25 use URL api in UrlUtils.canonicalizeUrl()
2013-09-06 21:35:23 +08:00
yihua.huang
ef4cf49fee
add stop method to spider #24
2013-09-06 21:17:36 +08:00
yihua.huang
194518fd82
add switch
2013-09-04 08:21:34 +08:00
yihua.huang
2c3574537a
refactor in selectors
2013-09-02 14:14:24 +08:00
yihua.huang
d7abbd0e4b
fix compile error
2013-08-25 16:31:00 +08:00
yihua.huang
5e9e8b2541
add TextContentSelector
2013-08-25 16:30:38 +08:00
yihua.huang
c1471718df
extractors
2013-08-20 22:44:53 +08:00
yihua.huang
c70ed57025
move PriorityScheduler to core
2013-08-20 21:55:58 +08:00
yihua.huang
c79d6ecf09
complete all comments
2013-08-17 23:30:49 +08:00
yihua.huang
268bd8d0c4
move saxon to extension
2013-08-07 23:04:10 +08:00
yihua.huang
b40cca1122
move model package to plugin
2013-08-06 20:41:35 +08:00
yihua.huang
619a12b303
add paged support
2013-08-04 21:22:15 +08:00
yihua.huang
a5c85c3c8b
add annotation ExtractByRaw
2013-08-04 15:12:06 +08:00
yihua.huang
21cae2ff2e
update package
2013-08-04 07:53:28 +08:00
yihua.huang
cfb8990453
update author
2013-08-04 03:04:30 +08:00
yihua.huang
bfadac756a
fix an attribute bug
2013-08-03 18:36:03 +08:00
yihua.huang
145628557d
update afterextract api
2013-08-03 18:01:17 +08:00
yihua.huang
aca165b132
add and or selector
2013-08-03 17:38:36 +08:00
yihua.huang
69245e8c03
fix Class.assignable bug
2013-08-03 17:17:59 +08:00
yihua.huang
65518f7672
add list support
2013-08-03 17:01:25 +08:00
yihua.huang
d4de60a562
skip test
2013-08-03 16:35:12 +08:00
yihua.huang
d26cd82d59
rename package
2013-08-03 16:29:50 +08:00
yihua.huang
f84b53514f
complete objectpipeline
2013-08-03 15:55:54 +08:00
yihua.huang
866ab0a056
update email
2013-08-03 14:01:18 +08:00
yihua.huang
7c9e9ce869
xpath2.0
2013-08-03 07:28:46 +08:00
yihua.huang
7f27c28d4c
simplify api
2013-08-02 23:45:13 +08:00
yihua.huang
d7899e94ae
test saxon and introduce XPath2.0 support
2013-08-02 23:39:34 +08:00
yihua.huang
3fe3d8f044
update
2013-08-02 13:51:42 +08:00
yihua.huang
abba3b7bff
add extract by url
2013-08-02 06:59:25 +08:00
yihua.huang
f08ffc34fd
rename
2013-08-02 06:33:48 +08:00
yihua.huang
c5cf05640a
processor
2013-08-01 22:53:44 +08:00
yihua.huang
50edd22ef6
add annotation
2013-08-01 22:40:57 +08:00
yihua.huang
52fd5cfc1c
fix encoding
2013-07-30 15:24:59 +08:00
yihua.huang
65dc372152
update pipeline api
2013-07-25 13:32:39 +08:00
yihua.huang
96454fd74c
update java doc
2013-07-24 18:26:54 +08:00
yihua.huang
81e7f7982e
introduce jsoup and cssselector
2013-07-20 08:34:18 +08:00
yihua.huang
c733046045
+sina blog
2013-07-19 12:36:55 +08:00
yihua.huang
5c79550fd9
add offline cache and process
2013-06-24 14:42:49 +08:00
yihua.huang
9b1ba6e8bc
ignore unstable test
2013-06-20 17:57:31 +08:00
yihua.huang
7bed01c9f2
update Spider api
2013-06-20 07:53:48 +08:00
yihua.huang
986ae0beaf
update Select api: remove x() s() etc.
2013-06-19 09:57:41 +08:00
yihua.huang
fb0797b65c
update docs
2013-06-18 22:13:40 +08:00
yihua.huang
0ae7adf324
add cookie support & add docs
2013-06-18 08:32:11 +08:00
yihua.huang
8cef8774cb
change author info
2013-06-18 07:24:19 +08:00
yihua.huang
f0fa1dad07
clean some code
2013-06-17 11:12:22 +08:00
yihua.huang
755b9aa84e
remove samples in test
2013-06-08 20:59:27 +08:00
yihua.huang
6dc88fa111
split modules
2013-06-08 20:48:27 +08:00