$ cat urls-kazmuzikblog/nutch
http://kazuomik.livejournal.com/
http://kazuomik.livejournal.com/?skip=20
http://kazuomik.livejournal.com/?skip=40
http://kazuomik.livejournal.com/?skip=60
http://kazuomik.livejournal.com/?skip=80
http://kazuomik.livejournal.com/?skip=100
http://kazuomik.livejournal.com/?skip=120
...
http://kazuomik.livejournal.com/?skip=380
http://kazuomik.livejournal.com/?skip=400
$ cat conf/crawl-urlfilter.txt
+^http://kazuomik.livejournal.com/$
+^http://kazuomik.livejournal.com/[?]skip[=][0-9]*$
-.
$ bin/nutch crawl urls-kazmuzikblog -dir crawl-2 -depth 1
...
crawl finished: crawl-2
$ bin/nutch readseg -list crawl-2/segments/20070420134609
NAME            GENERATED  FETCHER START        FETCHER END          FETCHED  PARSED
20070420134609  21         2007-04-20T13:46:15  2007-04-20T13:47:02  21       21
$
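
The seed list above follows a regular pattern: the blog's front page plus `?skip=N` pages. A minimal sketch of generating it rather than typing it by hand is shown here. The function name `gen_seeds` is hypothetical, and the sketch assumes the elided entries continue in steps of 20 up to 400, which would give the 21 URLs reported by `readseg -list`.

```shell
# Hypothetical helper: print the front page plus its ?skip=N pages.
# Assumption: N runs from 20 to 400 in steps of 20, as the visible
# entries of the seed list suggest.
gen_seeds() {
  base="http://kazuomik.livejournal.com/"
  echo "$base"
  skip=20
  while [ "$skip" -le 400 ]; do
    echo "${base}?skip=${skip}"
    skip=$((skip + 20))
  done
}

# Assumed usage: write the list to the seed file read by the crawl.
# gen_seeds > urls-kazmuzikblog/nutch
```

Each generated line should also pass the two `+` patterns in `conf/crawl-urlfilter.txt`, so the filter and the seed list stay in step.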