238 Commits (03c251237b307ef5e5c193f165248fd7221f665d)

Author SHA1 Message Date
yihua.huang 1d75ae7f5b rollback version to 0.4.0 because not deploy success 2013-11-11 11:52:56 +08:00
yihua.huang df8ca8ad09 add scripts 2013-11-10 22:30:48 +08:00
yihua.huang 775eb9732f [maven-release-plugin] prepare for next development iteration 2013-11-06 22:17:58 +08:00
yihua.huang 0b4fadc24d [maven-release-plugin] prepare release webmagic-0.4.0 2013-11-06 22:17:47 +08:00
yihua.huang fe6d9bb2e2 get keep-alive rework 2013-11-06 21:53:39 +08:00
yihua.huang fd6d2fd6f8 try to keepalive TCP connection 2013-11-06 21:19:14 +08:00
yihua.huang 425df08523 update version to 0.4.0 2013-11-06 12:50:45 +08:00
yihua.huang e046bb0723 remove useless code 2013-11-06 12:48:14 +08:00
yihua.huang 6e32a19f80 update api for direct download 2013-11-06 12:46:50 +08:00
yihua.huang 807aefe9df change EntityUtil to IOUtil because some encoding error 2013-11-06 07:37:34 +08:00
yihua.huang 00b0a751b4 #33 ignore 'content-encoding' when redirect 2013-11-06 06:57:58 +08:00
yihua.huang 8f774afc84 add direct download 2013-11-06 06:41:04 +08:00
yihua.huang c18b603399 optimize long compare 2013-11-04 07:09:44 +08:00
yihua.huang ed3f3583cc downloader refactor 2013-11-04 01:03:23 +08:00
yihua.huang a37f40e6e6 add cookie support 2013-11-04 00:59:48 +08:00
yihua.huang 3c6fced48e update connection client 2013-11-04 00:53:01 +08:00
yihua.huang 09153ff715 #22 http proxy support #32 update httpclient to 4.3.1 2013-11-04 00:47:09 +08:00
yihua.huang edfc319c45 update httpclient to 4.3.1 2013-11-04 00:06:30 +08:00
yihua.huang 160a149b05 todo bugfix 2013-11-03 23:10:09 +08:00
yihua.huang 583a0eba8c #29 refactor some method name 2013-11-03 20:24:26 +08:00
yihua.huang 6fa82a418b #29 seed urls with more information 2013-11-03 20:20:50 +08:00
yihua.huang 1446ada732 some refactor 2013-10-31 22:50:22 +08:00
yihua.huang 84976c81ec remove useless code 2013-10-31 22:48:18 +08:00
yihua.huang b4fcf41168 add exit when complete option 2013-10-31 22:41:02 +08:00
yihua.huang 352887870c remove shutdown call 2013-10-31 22:22:14 +08:00
yihua.huang a3f9ad198f refactor multi thread code in Spider 2013-10-31 21:52:43 +08:00
yihua.huang 7fb44d2eec #30 reuse PoolingClientConnectionManager for HttpClientDownloader 2013-10-14 23:22:04 +08:00
yihua.huang 5a226387e0 #27 nullpointer fix 2013-10-11 11:32:44 +08:00
yihua.huang 16e12e3bc9 #27 customize http header for downloader 2013-10-11 08:37:21 +08:00
yihua.huang 1a2c84ea78 #27 add timeout config to site 2013-10-11 07:36:16 +08:00
yihua.huang 4acbc19cee [maven-release-plugin] prepare for next development iteration 2013-09-23 13:12:32 +08:00
yihua.huang cc3b787991 [maven-release-plugin] prepare release webmagic-0.3.2 2013-09-23 13:12:19 +08:00
yihua.huang b131878123 add example 2013-09-23 13:01:28 +08:00
yihua.huang 95ab4edec3 some bugfix 2013-09-23 08:38:54 +08:00
yihua.huang fba330872b fix a thread pool exception 2013-09-22 23:57:15 +08:00
yihua.huang 3c79d031bd fix thread pool 2013-09-22 22:52:52 +08:00
yihua.huang fb693a4ac4 [maven-release-plugin] prepare for next development iteration 2013-09-08 22:25:07 +08:00
yihua.huang bfaaa042b9 [maven-release-plugin] prepare release webmagic-parent-0.3.1 2013-09-08 22:24:48 +08:00
yihua.huang c17a31a21d fix null pointer exception #26 2013-09-08 21:09:49 +08:00
yihua.huang d2e0f0cd33 #25 use URL api in UrlUtils.canonicalizeUrl() 2013-09-06 21:35:23 +08:00
yihua.huang ef4cf49fee add stop method to spider #24 2013-09-06 21:17:36 +08:00
yihua.huang 692de76f86 fix issue #21 charset detect error 2013-09-04 15:27:51 +08:00
yihua.huang e7bf425df4 [maven-release-plugin] prepare for next development iteration 2013-09-04 10:51:01 +08:00
yihua.huang 77ff252316 [maven-release-plugin] prepare release webmagic-0.3.0 2013-09-04 10:50:50 +08:00
yihua.huang 1fc8e104ab add cycle retry 2013-09-04 10:32:13 +08:00
yihua.huang d141541ef3 add retry 2013-09-04 09:57:19 +08:00
yihua.huang a1ef2523cc update xsoup version 2013-09-04 09:38:40 +08:00
yihua.huang aefd0569a5 update version 2013-09-04 09:36:56 +08:00
yihua.huang 194518fd82 add switch 2013-09-04 08:21:34 +08:00
yihua.huang 326b97c65a update 2013-09-04 00:15:54 +08:00
yihua.huang 2c3574537a refactor in selectors 2013-09-02 14:14:24 +08:00
yihua.huang 85b7cf1563 complete test 2013-09-02 13:52:41 +08:00
yihua.huang d7cd9e5747 update pom 2013-09-02 11:56:01 +08:00
yihua.huang 55d4a76ab7 newselectors 2013-09-02 08:21:32 +08:00
yihua.huang d7abbd0e4b fix compile error 2013-08-25 16:31:00 +08:00
yihua.huang 5e9e8b2541 add TextContentSelector 2013-08-25 16:30:38 +08:00
yihua.huang 0cc0ccee35 add charset specific for easy call of HttpClientDownloader 2013-08-25 15:41:43 +08:00
yihua.huang 91dcccf7b5 add a sample 2013-08-21 21:55:15 +08:00
yihua.huang ad66d33f38 [maven-release-plugin] prepare for next development iteration 2013-08-20 23:39:59 +08:00
yihua.huang 9dc6b11954 [maven-release-plugin] prepare release webmagic-parent-0.2.1 2013-08-20 23:37:55 +08:00
yihua.huang 4f62dfc8a4 release 2013-08-20 23:37:20 +08:00
yihua.huang 74c940c758 [maven-release-plugin] prepare for next development iteration 2013-08-20 23:19:58 +08:00
yihua.huang a4bb4e3429 [maven-release-plugin] prepare release webmagic-parent-0.2.1 2013-08-20 23:19:27 +08:00
yihua.huang 194f16aa75 update 2013-08-20 23:16:43 +08:00
yihua.huang 0f0f1a9bcd release notes 2013-08-20 22:51:30 +08:00
yihua.huang c1471718df extractors 2013-08-20 22:44:53 +08:00
yihua.huang 20705b34ac add more option to extractors 2013-08-20 22:13:30 +08:00
yihua.huang c70ed57025 remove PriorityScheduler to core 2013-08-20 21:55:58 +08:00
yihua.huang 7003426898 update pom 2013-08-20 21:52:39 +08:00
yihua.huang 606417fdc7 update pom 2013-08-19 09:55:49 +08:00
yihua.huang d460e136ef update version 2013-08-19 09:52:15 +08:00
yihua.huang c79d6ecf09 complete all comments 2013-08-17 23:30:49 +08:00
yihua.huang 90bbe9b951 webmagic-core 2013-08-17 23:24:04 +08:00
yihua.huang 17f8ead28f update comments for selector 2013-08-17 21:33:54 +08:00
yihua.huang 77e6ca2945 update comments 2013-08-17 21:26:44 +08:00
yihua.huang 5073258237 closable 2013-08-17 21:19:24 +08:00
yihua.huang d01c0eb8ce update comments of spider 2013-08-17 21:15:36 +08:00
yihua.huang 5f1f4cbc46 update comments 2013-08-17 20:41:29 +08:00
yihua.huang 1148450ff9 update filecache to more useful 2013-08-17 18:12:47 +08:00
yihua.huang 3ba7a76f44 add combo extract to replace Extract2 Extract3... 2013-08-17 17:23:11 +08:00
yihua.huang 5cb45af3a4 +doc 2013-08-17 12:10:34 +08:00
yihua.huang ef673b985e add a method for httpclientdownloader 2013-08-14 13:32:23 +08:00
yihua.huang 067f3ea0cb add some null pointer check for httpclientdownloader 2013-08-14 13:30:09 +08:00
yihua.huang 9e82256ce3 update docs 2013-08-12 10:08:20 +08:00
yihua.huang 0a902b441c update docs 2013-08-12 09:55:17 +08:00
yihua.huang 0f2c5b5723 update redisscheduler 2013-08-11 18:28:12 +08:00
yihua.huang 787b952932 release notes and docs 2013-08-11 10:21:26 +08:00
yihua.huang 8b15f3c63d add test 2013-08-10 20:33:47 +08:00
yihua.huang ade5714d50 add https support 2013-08-10 18:52:27 +08:00
yihua.huang 21eca688e9 complete docs 2013-08-09 20:56:33 +08:00
yihua.huang 17d2d98cec remove invalid @date 2013-08-09 20:43:06 +08:00
yihua.huang 268bd8d0c4 remove saxon to extension 2013-08-07 23:04:10 +08:00
yihua.huang cff943f698 fix path format error 2013-08-07 13:05:12 +08:00
yihua.huang 5ef231a768 update version 2013-08-07 12:48:32 +08:00
yihua.huang 570533cce5 update readme 2013-08-07 09:45:38 +08:00
yihua.huang 36494bcfa5 add xpath2.0 api 2013-08-06 23:01:43 +08:00
yihua.huang 5c96407a3d fix a null domain error 2013-08-06 22:43:31 +08:00
yihua.huang c7005a0227 json fix 2013-08-06 22:36:37 +08:00
yihua.huang e5f4b3916f change file dir 2013-08-06 22:26:39 +08:00
yihua.huang 7d277e84d4 update lucene pipeline 2013-08-06 21:47:44 +08:00