525 Commits (03c251237b307ef5e5c193f165248fd7221f665d)

Author SHA1 Message Date
yihua.huang 03c251237b add Json parse support 2014-04-13 10:23:00 +08:00
yihua.huang 843e928c2c comments on sinablogprocessor sample 2014-04-12 20:10:24 +08:00
yihua.huang be37d8b216 sinablogprocessor sample 2014-04-12 20:03:44 +08:00
yihua.huang 094f9d1552 rename assets for spell mistake 2014-04-12 13:42:32 +08:00
yihua.huang 2b023c95c2 qqmeishi demo 2014-04-11 11:43:04 +08:00
yihua.huang db65dfafb8 add baidunews sample 2014-04-09 23:32:07 +08:00
yihua.huang 3669e73e4a update News163: use Xsoup 0.2.0 syntax instead of ComboExtract 2014-04-09 16:43:55 +08:00
yihua.huang 02b441ad38 disable NativeObject in Rhino because it is a hotspot internal api and compile error in OpenJDK #93 2014-04-09 15:40:33 +08:00
yihua.huang 9f5a6494a0 add support for JDK6 #93 2014-04-09 10:44:52 +08:00
yihua.huang c6c56ad511 Merge branch 'master' of github.com:code4craft/webmagic 2014-04-09 09:54:13 +08:00
yihua.huang c2873928c8 [prototype] extractrule 2014-04-09 09:54:01 +08:00
Yihua Huang 7cb4e37812 Merge pull request #93 from friddle/master 2014-04-07 23:22:35 +08:00
    update the script
friddle 933800147b update ruby 2014-04-07 23:18:00 +08:00
friddle 37666a7151 update the script 2014-04-07 23:04:24 +08:00
yihua.huang c1e7207869 add FileCacheQueueScheduler support for cycleRetryTimes 2014-04-07 11:00:09 +08:00
yihua.huang 969ad1766b change logger style to slf4j for cleaner code 2014-04-06 21:32:20 +08:00
yihua.huang 9b2cb43f47 ConfigurablePageProcessor #91 2014-04-05 23:40:10 +08:00
Yihua Huang 1090d070d9 Merge pull request #90 from ccliangbo/removeUnusedLines 2014-04-05 22:00:30 +08:00
    Remove unused variable to make the project cleaner.
Bo LIANG 159eeea2f5 Remove unused variable to make the project cleaner. 2014-04-05 18:32:12 +08:00
yihua.huang c143fc662c add SubPageProcessor #86 2014-04-05 18:17:48 +08:00
Yihua Huang 2b2ce9ce13 Merge pull request #89 from ccliangbo/slf4jFormat 2014-04-05 15:11:58 +08:00
    change the formatter of log.
Bo LIANG b043ac76d6 change the formatter of log. 2014-04-05 11:31:56 +08:00
    To use slf4j, we should insert {} into the formatter string.
Yihua Huang 474f785dab Merge pull request #86 from sebastian1118/master 2014-04-04 23:41:27 +08:00
    new feature: PatternProcessor
yihua.huang 8fe967ba8d [BugFix]exclude log4j.xml from maven jar plugin #82 2014-04-04 23:39:32 +08:00
Tian 38a12f8641 new feature: PatternProcessor 2014-04-04 22:02:52 +08:00
yihua.huang dafd0b5875 [BugFix]multi model in one pageprocessor will be skipped #85 2014-04-04 20:36:31 +08:00
yihua.huang 7aaf837e15 change logger to slf4j style for performance #84 2014-04-04 20:10:00 +08:00
yihua.huang f9b157951d Merge branch 'master' of github.com:code4craft/webmagic 2014-04-04 20:01:14 +08:00
yihua.huang 22c394e629 [doc] 2014-04-04 20:00:58 +08:00
Yihua Huang 3efa774191 Merge pull request #84 from ccliangbo/logInScheduler 2014-04-04 17:16:34 +08:00
    Modify the log levels of LocalDuplicatedRemovedScheduler.java
Bo LIANG 762a3973fd Modify the log levels of LocalDuplicatedRemovedScheduler.java 2014-04-04 15:53:46 +08:00
    The old version will print a debug level log each time the push method is
    called. So sometimes, when a html page have multiple links for the same
    page, the debug log will appears more than once. Also, when we meet a
    duplicate URL, it will also print a log, which will be confusing.
    I change the level of it to trace. And each time a URL is really push into
    queue, print a debug level log.
yihua.huang 44293cd894 [doc]add qq group in readme 2014-04-04 10:07:48 +08:00
yihua.huang 9a0a4051ed [doc] ch3 part1 2014-04-04 08:05:34 +08:00
yihua.huang 7ca644cdd9 format readme 2014-04-04 06:47:28 +08:00
yihua.huang a1c7e826f7 fix dep of slf4j-log4j12 2014-04-03 23:04:31 +08:00
yihua.huang a34e92d11a fix huabanprocessor 2014-04-03 22:33:10 +08:00
yihua.huang 50cee4c7bb [doc] complete docs2.0 ch1 2014-04-03 11:06:03 +08:00
yihua.huang 9ec0ca02c6 doc2.0 ch1 2014-04-03 08:18:59 +08:00
yihua.huang 7e0e5b0969 clean ui 2014-04-02 11:47:44 +08:00
yihua.huang 94f97da4dc [Avalon] fix spring config for static and ignore google fonts for better loading speed 2014-04-02 07:36:31 +08:00
yihua.huang 22e8697671 add forger to folder 2014-04-01 23:16:03 +08:00
yihua.huang 05abd566a4 remove submodule 2014-04-01 23:08:28 +08:00
yihua.huang 01848301d4 encode illegal charactors in url #80 2014-04-01 22:14:30 +08:00
yihua.huang 2780423e60 enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80 2014-04-01 20:35:11 +08:00
yihua.huang 97b6f46280 Bugfix: break loop in addTargetRequests #81 2014-04-01 20:12:25 +08:00
yihua.huang d1563da33b add contributor 2014-04-01 08:07:25 +08:00
yihua.huang b13f1da039 reformat 2014-04-01 08:04:43 +08:00
yihua.huang 7038c00a9a reformat 2014-04-01 08:03:47 +08:00
yihua.huang 6252042ed2 add warning of slf4j #78 2014-04-01 08:02:22 +08:00
yihua.huang f3c2503a29 add warning of slf4j #78 2014-04-01 07:42:23 +08:00