The old version printed a debug-level log each time the push method was called. As a result, when an HTML page contained multiple links to the same page, the debug log appeared more than once. It also printed a log whenever it met a duplicate URL, which was confusing. I changed the level of that log to trace. Now a debug-level log is printed only when a URL is actually pushed into the queue.
src
README.md
pom.xml
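
The logging change described in the commit message above can be illustrated with a minimal sketch. This is not the project's actual scheduler code: the class name `DedupQueueSketch`, the `getLog` helper, and the log message texts are hypothetical, and log lines are collected in an in-memory list instead of going through a real logging framework so that the example stays dependency-free.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for a duplicate-removing URL queue.
class DedupQueueSketch {
    // In-memory log; a real project would call logger.trace(...) / logger.debug(...) here.
    private final List<String> log = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> queue = new LinkedList<>();

    public void push(String url) {
        // Set.add returns false when the URL was already recorded.
        if (!seen.add(url)) {
            // Duplicates are reported at trace level only, so they stay out of debug output.
            log.add("TRACE duplicate url " + url);
            return;
        }
        queue.add(url);
        // The debug line is emitted only when the URL really enters the queue.
        log.add("DEBUG push to queue " + url);
    }

    public String poll() {
        return queue.poll();
    }

    public List<String> getLog() {
        return log;
    }

    public static void main(String[] args) {
        DedupQueueSketch q = new DedupQueueSketch();
        q.push("http://example.com/a");
        q.push("http://example.com/a"); // same link found twice on one page
        q.push("http://example.com/b");
        for (String line : q.getLog()) {
            System.out.println(line);
        }
    }
}
```

With this arrangement, each debug line corresponds to exactly one queued URL, and repeated links show up only when trace logging is enabled.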
webmagic-core
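The core part of webmagic. It contains only the basic crawler modules and the basic extractors. The goal of webmagic-core is to be a textbook implementation of a web crawler.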