这是一个创建于 3788 天前的主题,其中的信息可能已经有所发展或是发生改变。
这是我截取的access log. 其中/{xxx}代表的是我网站的某个路径,其他的都是原始的log未做改动。
这个爬虫IP不固定,封了后过一会会有新的IP爬过来。
这个爬虫从大概2年前就开始爬我的站,中间我的站关掉了一年左右,现在重新开,没想到这个爬虫居然还在。不知道什么路数。很有可能我关站的这段时间他还在爬。大家给分析分析
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a0" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:36 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a4" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a9" "Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
89.248.162.170 - - [05/Aug/2014:04:42:37 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:56 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a6" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:56 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a8" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/10.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:57 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a7" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:42:58 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a3" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
94.102.49.31 - - [05/Aug/2014:04:43:00 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a1" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
94.102.49.31 - - [05/Aug/2014:04:43:01 +0000] "GET /{xxx} HTTP/1.1" 301 193 "http://www.google.com/#q=a2" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.16) Gecko/20120427 Firefox/15.0a1"
15 条回复 • 2014-08-05 15:25:55 +08:00
|
|
1
liangdi 2014-08-05 12:48:57 +08:00
log呢?
|
|
|
2
sanp 2014-08-05 12:50:30 +08:00
@ liangdi 刚才按了Enter居然自动发布了。。。我刚编辑了。
|
|
|
3
plprapper 2014-08-05 13:16:23 +08:00
这哪里是爬虫, 简直是癞皮狗。。。。
|
|
|
4
liangdi 2014-08-05 13:26:59 +08:00
是采集器吧 lz什么站?
|
|
|
5
sintrb 2014-08-05 13:28:21 +08:00
这爬虫好可怜。。
|
|
|
8
ChanneW 2014-08-05 14:35:14 +08:00
怎么看出不是真 google 的
|
|
|
11
sanp 2014-08-05 15:22:05 +08:00
@ liangdi 一个工具类的站,查询数据的,对方是遍历抓取的。我奇怪的是我站都关了一年多。重新开了,他居然还在。
|
|
|
12
sanp 2014-08-05 15:22:40 +08:00
|
|
|
13
sanp 2014-08-05 15:24:02 +08:00
@ plprapper 确实,一般的爬虫禁了就行了,这个是禁了吗,过会就有别的IP过来,而且抓取很频繁,基本不停的爬。
|
|
|
14
sanp 2014-08-05 15:25:03 +08:00
@ captainhcg 没有用wordpress。这个爬虫是遍历网站页面,然后就不停的爬。
|