
CentOS: grep, search, count, sort: finding crawler requests in the access log

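Finding crawler requests in the access log
## Count Baiduspider requests by requested path (field 7), static resources included, sorted by count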
grep "Baiduspider" access.log | awk '{print $7}'| sort | uniq -c| sort -rn
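## Count by field 11, sorted by count; this seems to be the specific page (in the standard combined log format, field 11 is the Referer)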
grep "Baiduspider" access.log | awk '{print $11}'| sort | uniq -c| sort -rn
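
To check which field is which, it can help to run both pipelines on a tiny sample first. The sketch below assumes the Apache/Nginx "combined" log format; the sample lines, IP addresses, and paths are made up for illustration, and a custom log format would shift the field numbers.

```shell
# Hypothetical sample log in "combined" format (not taken from a real server).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
192.0.2.1 - - [10/Oct/2023:13:55:40 +0800] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
192.0.2.2 - - [10/Oct/2023:13:56:01 +0800] "GET /css/site.css HTTP/1.1" 200 512 "http://example.com/index.html" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
EOF

# Field 7 is the requested path: count each path, most frequent first.
paths=$(grep "Baiduspider" "$log" | awk '{print $7}' | sort | uniq -c | sort -rn)
echo "$paths"

# Field 11 is the quoted Referer: count each referer, most frequent first.
referers=$(grep "Baiduspider" "$log" | awk '{print $11}' | sort | uniq -c | sort -rn)
echo "$referers"

rm -f "$log"
```

With this sample, the first list starts with /index.html (count 2) and the second starts with "-" (count 2), meaning no referer was sent.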

## 360 (360Spider is the crawler; 360SE is the user-agent token of the 360 browser)
grep "360Spider" access.log | awk '{print $11}'| sort | uniq -c| sort -rn
grep "360SE" access.log | awk '{print $11}'| sort | uniq -c| sort -rn

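## Any user agent containing "Spider" (case-sensitive, so "Baiduspider" is not matched; add -i to grep to ignore case)
grep "Spider" access.log | awk '{print $11}'| sort | uniq -c| sort -rn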
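
Before drilling into individual pages, a per-crawler total can be useful. This is a minimal sketch, not part of the original notes: `count_crawler_hits` is a hypothetical helper name, and it matches case-insensitively so that both "Baiduspider" and "360Spider" style names work.

```shell
# Hypothetical helper: print "<count> <name>" for each user-agent substring,
# highest count first.
count_crawler_hits() {
    # $1 = log file; remaining arguments = user-agent substrings to count
    hits_log="$1"
    shift
    for bot in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when the count is 0, hence the "|| true".
        n=$(grep -ci "$bot" "$hits_log" || true)
        printf '%s %s\n' "$n" "$bot"
    done | sort -rn
}

# Usage (file name as in the commands above):
#   count_crawler_hits access.log Baiduspider 360Spider
```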

To see the complete log lines, run just the grep, for example grep "360Spider" access.log

The 360 results include lines such as:
"GET /robots.txt HTTP/1.1" 404 571
"GET /robots.txt HTTP/1.1" 403 571

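To see how often each status code comes back for /robots.txt, the same count-and-sort pattern can be applied to the status field. A minimal sketch, assuming the combined log format (field 7 = path, field 9 = status); `robots_status_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count the HTTP status codes returned for /robots.txt
# to requests whose log line contains the given user-agent substring.
robots_status_counts() {
    # $1 = user-agent substring, $2 = log file
    grep "$1" "$2" \
        | awk '$7 == "/robots.txt" {print $9}' \
        | sort | uniq -c | sort -rn
}

# Usage:
#   robots_status_counts 360Spider access.log
```

A 404 here means the file does not exist, while a 403 means the server refused to serve it.
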
## Scheduled task (crontab entry, runs daily at 00:00)
0 0 * * * grep "Baiduspider" /path/to/access.log | awk '{print $7}'| sort | uniq -c| sort -rn > /path/to/result.txt
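
The crontab line overwrites the same result file every night. To keep one report per day, the pipeline could be moved into a small script, which also avoids having to escape `%` inside the crontab. This is only a sketch: the `crawler_report` function name and the script location are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /path/to/crawler_report.sh and scheduled with:
#   0 0 * * * /path/to/crawler_report.sh Baiduspider /path/to/access.log /path/to

crawler_report() {
    # $1 = user-agent substring, $2 = access log, $3 = output directory
    report="$3/$1-$(date +%Y-%m-%d).txt"
    # Same pipeline as above: requested path (field 7), counted and sorted.
    grep "$1" "$2" | awk '{print $7}' | sort | uniq -c | sort -rn > "$report"
    # Print the report path so the caller can find the file.
    echo "$report"
}

# When called with three arguments, generate the report right away.
if [ "$#" -eq 3 ]; then
    crawler_report "$1" "$2" "$3"
fi
```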
