- A Page holds the result of the most recent fetch and collects the URLs to be crawled next.
-
- Main methods:
- {@link #getUrl()} gets the URL of the current page
- {@link #getHtml()} gets the HTML content of the current page
- {@link #putField(String, Object)} saves an extracted result
- {@link #getResultItems()} gets the extracted results, for use in {@link us.codecraft.webmagic.pipeline.Pipeline}
- {@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} add URLs to be crawled
-
-
-
- @author code4crafter@gmail.com
-]]>
-