TCPWebLog is a Free Open Source Software system to collect and aggregate Web-type logs (e.g. Apache, Varnish, PHP, FTP …) from multiple GNU/Linux computers running in a Cloud.

When a user connects to a Website hosted at CatN, the response may come from different Web servers running on different virtual and physical machines. This introduced the problem of aggregating the logs from multiple cluster nodes on a central log server and then splitting them up again by virtual host. Here at CatN, for each virtual host we used to write distinct log files (Apache access log, Apache error log, Varnish, etc.) to a central location. These files were then rotated and processed using other software tools, which required significant administrative overhead.

To overcome this issue, and after analysing several options, I decided to create TCPWebLog, a system composed of a simple “client” program that pipes the Web logs directly to a central server over a TCP connection, and a “server” program that receives the logs and quickly aggregates or splits them as required. To collect data from PHP I also created LogPipe, a PHP module for custom error logging. These projects are available as Free Open Source Software, so anyone can freely use and contribute to them.

The Architecture

TCPWebLog is composed of two main parts: TCPWebLog-Client and TCPWebLog-Server.

TCPWebLog-Client

This part contains the software to be installed on the cluster nodes. It essentially consists of the tcpweblog_client.bin program, which transmits the input log data to a remote server through a TCP connection.
This program can also be replaced by the standard rsyslog daemon, as per the rsyslog example below.

TCPWebLog-Server

The TCPWebLog-Server program listens on a TCP port for incoming log data from multiple TCPWebLog-Client clients and stores the logs on the local filesystem.
Once installed and configured, this system can be easily started and stopped using the provided SysV init script.

Installation and configuration

Before proceeding with the following instructions, please read the guide “How to create TCPWebLog RPM packages for Enterprise Linux”.

TCPWebLog-Client

The TCPWebLog-Client RPM must be installed on each cluster host you wish to monitor.

As root, install the TCPWebLog-Client RPM file (please replace the version number with the correct one):

# rpm -i tcpweblog_client-1.2.0-1.el6.$(uname -m).rpm

The tcpweblog_client.bin installed by this RPM can be used to “pipe” the logs to the server where TCPWebLog-Server is installed.

The 7 parameters required by tcpweblog_client.bin are:

  • remote_ip_address: the IP address of the listening remote log server;
  • remote_port: the TCP port of the listening remote log server;
  • local_cache_file: the local cache file to temporarily store the logs when the TCP connection is not available;
  • logname: the last part of the log file name (e.g. access.log);
  • cluster_number: the cluster number;
  • client_ip: the client (local) IP address;
  • client_hostname: the client (local) hostname.
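
The rsyslog template shown later in this article documents the message layout accepted by TCPWebLog-Server: @@logname, cluster, client IP, client hostname and the raw log line, separated by tabs. Assuming tcpweblog_client.bin builds its messages in the same way from the parameters above (an assumption, since the client source is not shown here), the framing can be sketched in Python as follows. The function name frame_log_line is hypothetical.

```python
def frame_log_line(logname, cluster_number, client_ip, client_hostname, raw_line):
    """Return one message laid out as
    @@logname<TAB>cluster<TAB>clientip<TAB>clienthost<TAB>rawbuf

    Illustrative only: the layout is taken from the rsyslog template in this
    article, not from the tcpweblog_client.bin source code.
    """
    fields = [
        "@@" + logname,
        str(cluster_number),
        client_ip,
        client_hostname,
        raw_line.rstrip("\r\n"),  # drop the original line ending before re-adding one
    ]
    return "\t".join(fields) + "\n"


# Values mirror the per-virtual-host Apache example in this article.
message = frame_log_line("access.log", 1, "10.0.2.15", "xhost", "GET / HTTP/1.1\n")
print(repr(message))
```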

Examples

APACHE (configuration per virtual host)

CustomLog "| /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log access.log 1 10.0.2.15 xhost" combined
ErrorLog "| /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log error.log 1 10.0.2.15 xhost"

APACHE SSL (configuration per virtual host)

CustomLog "| /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log ssl.access.log 1 10.0.2.15 xhost" combined
ErrorLog "| /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log ssl.error.log 1 10.0.2.15 xhost"

APACHE (general CustomLog)

# You must prefix the log format with "%A %V", for example:
LogFormat "%A %V %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" tcpweblog
CustomLog "| /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log access.log 1 - -" tcpweblog

VARNISHNCSA

# You must prefix the log format with "%A %V", for example:
varnishncsa -F "%A %V %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" | /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_cache.log varnish.log 1 - -
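
In the two examples above, client_ip and client_hostname are passed as “-” and every log line starts with “%A %V” (in Apache's log format, the local IP address and the server name). A plausible reading, stated here as an assumption rather than documented behaviour, is that the server takes the IP address and host name from the first two fields of each line. A minimal Python sketch of that split, using the hypothetical helper split_prefix:

```python
def split_prefix(raw_line):
    """Split a line that starts with '%A %V ' into (ip_address, hostname, rest).

    Hypothetical helper: it only illustrates why the prefix is needed when
    client_ip and client_hostname are given as '-'.
    """
    parts = raw_line.split(" ", 2)
    if len(parts) < 3:
        raise ValueError("line does not start with an IP address and a host name")
    ip_address, hostname, rest = parts
    return ip_address, hostname, rest


# Made-up sample line in the "tcpweblog" log format defined above.
line = ('10.0.2.15 www.example.com 192.0.2.7 - - '
        '[10/Oct/2012:13:55:36 +0000] "GET / HTTP/1.1" 200 2326')
ip_address, hostname, rest = split_prefix(line)
print(ip_address, hostname)
```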

USING A NAMED PIPE TO FORWARD LOGS (pure-ftpd example)

# Create a named pipe:

mkfifo /var/log/pureftpd.log -Z system_u:object_r:var_log_t:s0

# Create a /root/ftplogpipe.sh file:

#!/bin/sh
(setsid bash -c '(while cat /var/log/pureftpd.log; do : Nothing; done | /usr/bin/tcpweblog_client.bin 10.0.3.15 9940 /var/log/tcpweblog_ftp_cache.log ftp.log 1 - -) & disown %%') </dev/null >&/dev/null &

# Add the following line to the end of /etc/rc.d/rc.local:

/root/ftplogpipe.sh

# Edit /etc/pure-ftpd/pure-ftpd.conf:
Altlog clf:/var/log/pureftpd.log

USING RSYSLOG AS A CLIENT FOR TCPWebLog-Server (pure-ftpd example)

# Create an /etc/rsyslog.d/ftp.conf file:

# Rsyslog config file to forward pure-ftp logs to TCPWebLog-Server
$ModLoad imfile
# read the input ftp log file
$InputFileName /var/log/pureftpd.log
$InputFileTag :ftplog: # mark rows with a custom tag
$InputFileStateFile stat-ftp
$InputFileSeverity notice
$InputFileFacility ftp
$InputRunFileMonitor
# define a message template compatible with TCPWebLog-Server
# @@logname<TAB>cluster<TAB>clientip<TAB>clienthost<TAB>rawbuf
$template tcpweblog_format,"@@ftp.log	1	-	-	%msg%\n"
# configure TCP connection and local cache
$WorkDirectory /var/lib/rsyslog # where to place spool files
$ActionQueueFileName FTPfwdRule # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g   # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList   # run asynchronously
$ActionResumeRetryCount -1    # infinite retries if host is down
# send data to TCPWebLog-Server via TCP
:syslogtag, isequal, ":ftplog:" @@10.0.3.15:9940;tcpweblog_format


# Restart rsyslog and pure-ftpd:

service rsyslog restart
service pure-ftpd restart

In the above examples we are simply “piping” the log data to our program.
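
As a rough illustration of what such a pipe client does, the Python sketch below tries to deliver a message over TCP and appends it to a local cache file when the connection fails, mirroring the role of the local_cache_file parameter. It is a simplification under stated assumptions: the function names send_or_cache and pipe_lines are hypothetical, a new connection is opened for every message, and replaying the cache once the server is reachable again is left out. The real tcpweblog_client.bin may behave differently.

```python
import socket


def send_or_cache(message, host, port, cache_path, timeout=2.0):
    """Try to send one message (bytes) over TCP; on failure append it to the cache.

    Returns True if the message was sent, False if it was cached instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(message)
        return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        with open(cache_path, "ab") as cache:
            cache.write(message)
        return False


def pipe_lines(lines, host, port, cache_path):
    """Forward every line from an iterable of bytes, e.g. sys.stdin.buffer."""
    for line in lines:
        send_or_cache(line, host, port, cache_path)
```

With the values used throughout this article, a script reading from standard input could call pipe_lines(sys.stdin.buffer, "10.0.3.15", 9940, "/var/log/tcpweblog_cache.log") after importing sys.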

TCPWebLog-Server

The TCPWebLog-Server RPM must be installed on the Log Server (the computer receiving the logs from the clients) only.

As root install the TCPWebLog-Server RPM file (please replace the version number with the correct one):

# rpm -i tcpweblog_server-3.2.0-1.el6.$(uname -m).rpm

Once the RPM is installed you can configure the TCPWebLog-Server by editing the following file:

# nano /etc/tcpweblog_server.conf

The TCPWebLog-Server includes a SysV init script to start/stop/restart the service:

# /etc/init.d/tcpweblog_server start|stop|status|restart|reload|condrestart

The init script starts the tcpweblog_server.bin program that listens for incoming TCP connections from the clients.
The incoming log lines are appended to different log files depending on the log type, IP address and host name.
The file path for each log type is:

  • general log
    [ROOT_DIR]/[CLUSTER_NUMBER]/logs/ip/[IP_ADDRESS]/[HOSTNAME].[logname]
  • FTP Error Log (when the logname contains the word “ftp”)
    [ROOT_DIR]/[CLUSTER_NUMBER]/logs/ident/[USER_ID]/[USER_ID].[logname]

NOTES:

  • The ROOT_DIR is defined in the configuration file.
  • The CLUSTER_NUMBER is always composed of 3 digits with zero padding (e.g. 001).
  • The directories MUST be created in advance because the server program is unable to create them.
  • The particular filesystem structure has been chosen to be backward compatible with existing systems.
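
The path rules above can be expressed as a short Python sketch. The function names and the ROOT_DIR value /var/log/tcpweblog are hypothetical, and how the server derives USER_ID from an FTP log line is not described here, so it is passed in as a parameter. The “ftp” check is assumed to be case sensitive.

```python
import posixpath


def cluster_dir(cluster_number):
    """CLUSTER_NUMBER is always 3 digits with zero padding."""
    return "%03d" % int(cluster_number)


def general_log_path(root_dir, cluster_number, ip_address, hostname, logname):
    """[ROOT_DIR]/[CLUSTER_NUMBER]/logs/ip/[IP_ADDRESS]/[HOSTNAME].[logname]"""
    return posixpath.join(root_dir, cluster_dir(cluster_number), "logs", "ip",
                          ip_address, "%s.%s" % (hostname, logname))


def ftp_log_path(root_dir, cluster_number, user_id, logname):
    """[ROOT_DIR]/[CLUSTER_NUMBER]/logs/ident/[USER_ID]/[USER_ID].[logname]"""
    return posixpath.join(root_dir, cluster_dir(cluster_number), "logs", "ident",
                          user_id, "%s.%s" % (user_id, logname))


def is_ftp_log(logname):
    """The FTP layout applies when the logname contains the word 'ftp'."""
    return "ftp" in logname


# Hypothetical ROOT_DIR; the other values come from the Apache example.
print(general_log_path("/var/log/tcpweblog", 1, "10.0.2.15", "xhost", "access.log"))
```

Because the server cannot create directories itself, the parent directory of each computed path has to exist before the first line arrives.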

To start the service at boot you can use the following command:

# chkconfig tcpweblog_server on

Notes on Performance and Limitations

  • The transmission speed between the client and server is limited by the network bandwidth and quality.
  • The processing capacity of the Log server is limited by the hardware characteristics of the server.

Since this project is still at an experimental stage, I would like to invite you to try it and leave your comments and suggestions here to help develop the project further.

Nicola Asuni, Systems Engineer

Nicola focused on designing, building and integrating the backend for our application platforms, including automatic deployment, monitoring and backups.