Getting Started with Logs, Step One: Forwarding nginx Logs with fluentd - Alibaba Cloud Developer Community

[TOC]

Configuring fluentd for Nginx

This article explains in detail how to forward and store nginx logs with fluentd.

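1. Introduction to fluentd

Fluentd is a fully open-source, free log collection tool that supports collecting log data from more than 125 systems.
At its core, Fluentd can be split into two kinds of modules: client and server. The client is a program installed on the system being collected from; it reads log files and other data and sends them to the Fluentd server. The server is a collector. On the Fluentd server we can configure how the collected data is filtered and processed before it is finally routed to the next hop. The next hop can be a storage database such as MongoDB or Elasticsearch, or another data processing platform such as Hadoop.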

2. Installing and deploying fluentd

2.1 Pre-installation setup

NTP time synchronization (important)

yum install ntp
vim /etc/ntp.conf  # add the time servers to synchronize with
/etc/init.d/ntpd start

Open file limit

Check the current setting

 ulimit -n

To raise the maximum number of open files, edit /etc/security/limits.conf

root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536
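
After logging in again, the new limit can also be verified from a script. The following is a minimal sketch, assuming a Unix host with Python available; `REQUIRED_NOFILE` and `meets_minimum` are illustrative names, not part of fluentd or td-agent.

```python
import resource

# Value written to /etc/security/limits.conf above.
REQUIRED_NOFILE = 65536


def meets_minimum(limit, required=REQUIRED_NOFILE):
    """Return True if an rlimit value is at least `required`.

    An unlimited value (RLIM_INFINITY) always counts as sufficient.
    """
    return limit == resource.RLIM_INFINITY or limit >= required


def current_nofile():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_nofile()
    print(f"soft={soft} hard={hard}")
    print("soft limit ok" if meets_minimum(soft) else "soft limit too low")
```

The soft limit is the one that matters for a running process, which is why the script reports on it rather than on the hard limit.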

Edit `/etc/sysctl.conf` and add the following

net.core.somaxconn = 1024
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535

Apply the settings

sysctl -p

2.2 Installing fluentd

Installing from RPM

Download the latest yum repository and install

/etc/yum.repos.d/td.repo

// official install script
 curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

Alternatively, configure the repo directly

# cat /etc/yum.repos.d/td.repo 
[treasuredata]
name=TreasureData
baseurl=http://packages.treasuredata.com/2/redhat/$releasever/$basearch
gpgcheck=1
gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent

# yum install -y td-agent

Start fluentd

$ /etc/init.d/td-agent start 
Starting td-agent: [  OK  ]
$ /etc/init.d/td-agent status
td-agent (pid  21678) is running...

Check the status

$ /etc/init.d/td-agent status

Configuration file after a yum install

/etc/td-agent/td-agent.conf

Relevant file paths

| Item | Path |
| --- | --- |
| Main configuration file | /etc/td-agent/td-agent.conf |
| Main program | /usr/sbin/td-agent |
| Program log | /var/log/td-agent/td-agent.log |
| Ruby binary | /opt/td-agent/embedded/bin/ruby |
| PID file | /var/run/td-agent/td-agent.pid |

View the default main configuration

# egrep -v "^#|^$" /etc/td-agent/td-agent.conf
<match td.*.*>
  @type tdlog
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  <secondary>
    @type file
    path /var/log/td-agent/failed_records
  </secondary>
</match>
<match debug.**>
  @type stdout
</match>
<source>
  @type forward
</source>
<source>
  @type http
  port 8888
</source>
<source>
  @type debug_agent
  bind 127.0.0.1
  port 24230
</source>

Listening ports under the default configuration

# netstat -utpln |grep ruby 
tcp        0      0 0.0.0.0:8888                0.0.0.0:*                   LISTEN      818/ruby            
tcp        0      0 0.0.0.0:24224               0.0.0.0:*                   LISTEN      823/ruby            
tcp        0      0 127.0.0.1:24230             0.0.0.0:*                   LISTEN      823/ruby            
udp        0      0 0.0.0.0:24224               0.0.0.0:*                               823/ruby

Test

Submit a log record through port 8888

 curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test

Check the log output

# tail -n 1 /var/log/td-agent/td-agent.log
2018-02-09 00:09:06.719633187 +0800 debug.test: {"json":"message"}

This is the minimal demo from the official documentation. It is of little practical use, but it gives a first sense of how fluentd works.
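
The same HTTP submission can be scripted instead of typed as a curl command. The sketch below assumes the default `http` source on `localhost:8888` shown earlier; `build_request` and `post_record` are illustrative helper names, and the body mirrors the `json=...` form field used by the curl example.

```python
import json
import urllib.parse
import urllib.request


def build_request(tag, record, base_url="http://localhost:8888"):
    """Build a POST request whose form field `json` carries the record.

    The URL path is the fluentd tag, as in the curl example (debug.test).
    """
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("ascii")
    return urllib.request.Request(f"{base_url}/{tag}", data=body, method="POST")


def post_record(tag, record, base_url="http://localhost:8888", timeout=5):
    """Send the record and return the HTTP status code.

    Needs a running td-agent with the http source enabled.
    """
    with urllib.request.urlopen(build_request(tag, record, base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `post_record("debug.test", {"json": "message"})` against a running td-agent should produce a line in /var/log/td-agent/td-agent.log like the one shown above, because the `debug.**` match writes to stdout.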

3. Client/server architecture

schema

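Every node processes the data flow in both directions: there is an Input side and an Output side.
Roles:

Log forwarder // the client takes this role
Log aggregator // the server takes this role

Log forwarders are typically installed on every node to receive local logs. Once data arrives, they forward it over the network to the server.

Log aggregators receive the forwarded data, buffer and filter it, and periodically upload it through plugins to other storage systems, local files, and similar destinations.

==The client and server are deployed in exactly the same way; the only difference is the configuration file.==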

4. Configuration explained

4.1 Components of `td-agent.conf`

source directives determine the input sources.
match directives determine the output destinations.
filter directives determine the event processing pipelines.
system directives set system wide configuration.
label directives group the output and filter for internal routing
@include directives include other files.

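source: defines inputs, i.e. where data comes from (the input direction).
match: defines outputs, i.e. where data goes next, such as writing to a file or sending to a specific storage system (the output direction).
filter: defines filtering, i.e. the event processing pipeline. It usually runs between input and output and can either drop fields or enrich records.
system: system-wide settings such as log level and process name.
label: defines a group of operations, enabling reuse and internal routing.
@include: pulls in other files, similar to import in Java or Python.
// source, match, and filter are used most often

For example, a source section:

# Receive events from 24224/tcp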
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# Receive events over HTTP on port 9880
# http://this.host:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

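// @type is a required parameter; specifying the type selects the plugin to use
// http: turns fluentd into an HTTP endpoint that accepts incoming HTTP requests.
// forward: turns fluentd into a TCP endpoint that accepts TCP packets.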

5. Plugins: overview and installation

Fluentd has six types of plugins (or methods):

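Input: reads the incoming data; configured in the source section
// common types: tail, http, forward, tcp, udp, exec

Parser: parsing plugins, often used together with inputs and outputs; usually seen after the `format` field
// common types: ltsv, json, custom, etc.

Output: writes the outgoing data; configured in the match section
// common types: file, forward, copy, stdout, exec

Filter: filtering plugins
// common types: grep, ignore, record_transformer

Buffer: buffering plugins, used to buffer data
// common types: file, memory

Formatter: plugins that format messages for output, letting users extend and reuse custom output formats
// common types: ltsv, json, etc.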

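Note: the out_forward output plugin forwards events to other fluentd nodes. It supports load balancing and automatic failover, with each target node declared in a <server></server> block. For replication, use the out_copy plugin.

The copy output plugin copies events to multiple outputs, each declared in a <store></store> block.

Commands for installing some required plugins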

/opt/td-agent/embedded/bin/gem install woothee fluent-plugin-woothee  fluent-plugin-elasticsearch

Installing a specific plugin version

td-agent-gem  install fluent-plugin-woothee --version=0.2.1
td-agent-gem  install woothee --version=1.4.0

//'fluent-plugin-woothee' is a Fluentd filter plugin to parse UserAgent strings and to filter/drop specified categories of user terminals (like 'pc', 'smartphone' and so on).

For a yum-installed fluentd, td-agent-gem is equivalent to /opt/td-agent/embedded/bin/gem

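6. Sample nginx log configuration

nginx log format `ltsv`

Logs in ltsv format can be described simply as key-value pairs. Because the other logs in our production environment are also tab-separated (\t), ltsv is used for serialization here as well.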

log_format ltsv   "time:$time_local"
                  "\trealip:$remote_addr"
                  "\txffip:$http_x_forwarded_for"
                  "\treq:$request"
                  "\tstatus:$status"
                  "\tsize:$body_bytes_sent"
                  "\treferer:$http_referer"
                  "\tua:$http_user_agent"
                  "\treqtime:$request_time"
                  "\tvhost:$host" ;

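To make the format concrete, here is a minimal sketch of what an LTSV parser does with one line written by the `log_format` above: split on tabs, then split each field on its first colon. This is only an illustration in Python, not fluentd's own parser, and the sample values (IP address, user agent, and so on) are made up.

```python
def parse_ltsv(line):
    """Split one LTSV line into a dict mapping label to value."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the FIRST colon only: values such as the timestamp
        # or a URL contain colons of their own.
        label, sep, value = field.partition(":")
        if sep and label:
            record[label] = value
    return record


# Hypothetical access-log line in the format defined above.
SAMPLE_LINE = "\t".join([
    "time:09/Feb/2018:00:09:06 +0800",
    "realip:203.0.113.7",
    "xffip:-",
    "req:GET /index.html HTTP/1.1",
    "status:200",
    "size:612",
    "referer:-",
    "ua:Mozilla/5.0",
    "reqtime:0.003",
    "vhost:www.example.com",
])

if __name__ == "__main__":
    for label, value in parse_ltsv(SAMPLE_LINE).items():
        print(f"{label} = {value}")
```

Because every field carries its own label, fields can be added or reordered in `log_format` without touching the parser, which is the main appeal of LTSV over a positional format.
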
Client-side configuration

## Collect logs from the PC site
<source>
  @type tail
  format ltsv
  path /log/nginx/www.itouzi.access.log
  pos_file /log/buffer/posfile/mergeua.www.access.log.pos
  tag mergeua.www.access.nginx
  time_key time
  time_format %d/%b/%Y:%H:%M:%S %z
</source>
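
The `time_key` and `time_format` settings tell fluentd to take the event time from the `time` field, which nginx writes as `$time_local`. The directives in that format string happen to mean the same thing in Python's `strptime`, so a quick sketch can check that the format matches a sample value. The timestamp below is illustrative, and the sketch assumes an English locale for the month abbreviation.

```python
from datetime import datetime, timezone

# Same format string as time_format in the source block above.
NGINX_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_time_local(value):
    """Parse an nginx $time_local value into a timezone-aware datetime."""
    return datetime.strptime(value, NGINX_TIME_FORMAT)


if __name__ == "__main__":
    parsed = parse_time_local("09/Feb/2018:00:09:06 +0800")
    print(parsed.isoformat())
    print(parsed.astimezone(timezone.utc).isoformat())
```

If the format string and the log line disagree, `strptime` raises `ValueError`, which makes this a cheap check before deploying a changed `log_format`.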


## Parse and classify the ua (User-Agent) field in the logs
# Merged ua
<match mergeua.**>
  @type woothee
  key_name ua
  add_prefix merged
  merge_agent_info yes
</match>

## Forward the collected data over TCP to port 49875 on multiple servers
## Multiple output
<match merged.mergeua.**>
  @type forward
  <server>
   name es01
   host es01
   port 49875
   weight 60
  </server>
  <server>
   name es02
   host es02
   port 49875
   weight 60
  </server>
</match>
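
The client configuration relies on tag rewriting: the woothee block re-emits each event with the `merged` prefix, so the event leaves `mergeua.**` and is picked up by `merged.mergeua.**`. The sketch below models the two wildcards used in this article, assuming the usual fluentd rules that `*` matches exactly one tag part and `**` matches zero or more parts. It is a simplified illustration and ignores other pattern forms such as `{a,b}`.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a fluentd-style pattern."""
    return _match(pattern.split("."), tag.split("."))


def _match(pat, parts):
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # ** may absorb zero or more of the remaining tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


if __name__ == "__main__":
    original = "mergeua.www.access.nginx"
    rewritten = "merged." + original  # effect of add_prefix merged
    print(tag_matches("mergeua.**", original))
    print(tag_matches("mergeua.**", rewritten))
    print(tag_matches("merged.mergeua.**", rewritten))
```

Under these rules the rewritten tag no longer matches the first block, so the event is not processed by woothee a second time.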

Server-side configuration

## Define the input source and listening port
<source>
  @type forward
  bind 10.10.10.10
  port 49875
</source>

## Convert the IP in the aggregated data to a geographic location and add a new tag prefix
<match merged.mergeua.www.access.nginx>
  @type geoip
  geoip_lookup_key   realip
  geoip_database    /etc/td-agent/GeoLiteCity.dat
  <record>
    location_array       '[${longitude["realip"]},${latitude["realip"]}]'
    country_code3   ${country_code3["realip"]}
    country         ${country_code["realip"]}
    city            ${city["realip"]}
  </record>
  skip_adding_null_record  true
  add_tag_prefix       geoip.
</match>

## Copy the aggregated PC-site data to Elasticsearch and a local file
<match geoip.merged.mergeua.www.access.nginx>
  @type copy
  <store>
    @type elasticsearch
    localtime
    index_name wwwaccess
    hosts es-master-01,es-master-02,es-master-03
    #port 9200
    logstash_format true
    logstash_prefix wwwaccess
    flush_interval 3s
  </store>
  <store>
    @type file
    path /data/fluentd/www/wwwaccess.log
    compress gzip
    flush_interval 86400s
  </store>
</match>

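With the configuration above, nginx logs can be forwarded through fluentd to any destination for storage and querying.
Finally, a few words on why fluentd was chosen over logstash or flume:
1. fluentd currently has many open-source plugins (perhaps fewer than logstash, but enough), so almost no custom development is needed.
2. fluentd is lighter than the other two and uses very little memory. Once the regular expression matches, developers basically never need to change the log format again.
3. Because fluentd recognizes the ltsv format directly, this article needed no regex matching at all, so here is a simple regular expression as a demonstration: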

format /^(?<time>[^ ]* [^ ]*) (?<unixtime>[^ ]*)\t(?<host>\[(.*)\])\t(?<level>\[(.*)\])\t(?<category>\[(.*)\])\t(?<unique_id>\[(.*)\])\t(?<message>(.*))$/

This regular expression turns application logs into key-value form and performs well.
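
To see which fields the expression produces, it can be tried outside fluentd. The sketch below ports it to Python, where named groups are written `(?P<name>...)` instead of Ruby's `(?<name>...)`; the pattern is otherwise unchanged. The sample log line is hypothetical, since the article does not show the application log format itself.

```python
import re

# The format regex from above with Ruby named groups rewritten for Python.
APP_LOG_PATTERN = re.compile(
    r"^(?P<time>[^ ]* [^ ]*) (?P<unixtime>[^ ]*)"
    r"\t(?P<host>\[(.*)\])\t(?P<level>\[(.*)\])"
    r"\t(?P<category>\[(.*)\])\t(?P<unique_id>\[(.*)\])"
    r"\t(?P<message>(.*))$"
)


def parse_app_log(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = APP_LOG_PATTERN.match(line)
    return m.groupdict() if m else None


# Hypothetical line: "date time unixtime", then tab-separated bracketed fields.
SAMPLE = "2018-02-09 00:09:06 1518106146\t[web01]\t[INFO]\t[order]\t[abc123]\tpayment ok"

if __name__ == "__main__":
    print(parse_app_log(SAMPLE))
```

Note that the named groups keep the surrounding square brackets (for example `[web01]`), because the brackets sit inside each named group; only the unnamed inner groups capture the bare value.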
