ServerUsage is a Free Open Source Software system that collects and processes usage statistics from multiple computers running a GNU/Linux operating system.

Since CatN is a “cluster” hosting company, one of the challenges is to track and analyse customer activity spread across multiple physical hosts for both monitoring and billing purposes. From each physical host we need to collect, at regular intervals, the total disk I/O, the network traffic and the CPU ticks used by each system user, process and IP address. This “raw” log data must then be aggregated and processed at a central point to extract the relevant information.

After spending some time searching for a ready-made solution, I decided to start the ServerUsage project to fit our needs. This project is now available as Free Open Source Software, so anyone can freely use it and contribute to it.

NOTE: ServerUsage works as a complete system, but the small programs it is composed of can also be used independently for other purposes.

The Architecture

ServerUsage is composed of two main sections: ServerUsage-Client and ServerUsage-Server.

Server Usage Diagram

ServerUsage flowchart schema

ServerUsage-Client

This section contains the software to be installed on the computers to be monitored. It is essentially composed of a SystemTap kernel module that collects the usage information, and a program that transmits the data to a remote server through a TCP connection.
SystemTap is a free software infrastructure that simplifies gathering information about a running Linux system; it is roughly equivalent to DTrace on Solaris-based systems.
Once installed and configured, this system can be easily started and stopped using the provided SysV init script.

ServerUsage-Client-MDB

This section contains the software to be installed on the MariaDB database servers to be monitored. MariaDB is an enhanced, drop-in replacement for MySQL. It includes the Google and Percona patches that provide usage statistics. With this module you can measure the CPU time spent by each DB user and the number of bytes sent and received.
Once installed and configured, this system can be easily started and stopped using the provided SysV init script.

ServerUsage-Server

The ServerUsage-Server program listens on a TCP port for incoming log data from multiple ServerUsage-Client clients and stores the logs in a SQLite table. A script (serverusage_dbagg.sh) is executed periodically by a cron job to aggregate the data into another table and delete obsolete data.
Once installed and configured, this system can be easily started and stopped using the provided SysV init script.

The serverusage_api.php script can be used remotely to extract formatted data from the database or to display graphs.
This script accepts several input parameters:

  • from: (integer) starting timestamp in seconds since EPOCH;
  • to: (integer) ending timestamp in seconds since EPOCH;
  • metric: (not available with svg mode) type of info to extract; possible values are: ‘uid’, ‘uidt’, ‘ip’, ‘ipt’, ‘uip’, ‘uipt’, ‘grp’, ‘glb’, ‘all’. The return values for each metric are:
    • uid: user_id, cpu_ticks;
    • uidt: user_id, cpu_ticks, minimum start time, maximum end time;
    • ip: ip, net_in, net_out;
    • ipt: ip, net_in, net_out, minimum start time, maximum end time;
    • uip: user_id, ip, cpu_ticks, io_read, io_write, net_in, net_out;
    • uipt: user_id, ip, cpu_ticks, io_read, io_write, net_in, net_out, minimum start time, maximum end time;
    • grp: start_time, end_time, user_id, ip, cpu_ticks, io_read, io_write, net_in, net_out;
    • glb: (default for SVG mode) lah_start_time, lah_end_time, lah_cpu_ticks, lah_io_read, lah_io_write, lah_netin, lah_netout;
    • all: start_time, end_time, process, user_id, ip, cpu_ticks, io_read, io_write, net_in, net_out.
  • uid: (integer) if set, filter result for the requested user ID;
  • ip: (IP address) if set, filter result for the requested IP address.
  • mode: output format (‘json’ = JSON – JavaScript Object Notation, ‘csv’ = CSV TAB-Separated Values, ‘psa’ = base64 encoded PHP Serialized array, ‘svg’ = SVG – Scalable Vector Graphics).

Additional parameters for SVG mode:

  • width: (integer) optional width for SVG output (default 1024; minimum 50);
  • height: (integer) optional height for SVG output; will be rounded to a multiple of 5 (default 750, minimum 50);
  • scale: linear = vertical linear scale (default), log = vertical logarithmic scale;
  • bgcol: type of background color: ‘dark’ or ‘light’ (default);
  • gtype: sequence of digits representing the graphs to display: 1 = CPU TICKS, 2 = IO READ, 3 = IO WRITE, 4 = NET IN, 5 = NET OUT. Default: 12345.
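
As a small illustration of the gtype parameter, the following Python sketch maps each digit to the graph it selects. The decode_gtype function is hypothetical and not part of ServerUsage; only the digit-to-graph mapping is taken from the list above, and rejecting unknown digits is an assumption about how a client might want to validate its input.

```python
# Digit-to-graph mapping as documented for the gtype parameter.
GRAPHS = {
    "1": "CPU TICKS",
    "2": "IO READ",
    "3": "IO WRITE",
    "4": "NET IN",
    "5": "NET OUT",
}


def decode_gtype(gtype="12345"):
    """Return the graph names selected by a gtype string (hypothetical helper)."""
    unknown = [digit for digit in gtype if digit not in GRAPHS]
    if unknown:
        # Assumption: a client-side check; the real API may behave differently.
        raise ValueError("unsupported graph digits: " + "".join(unknown))
    return [GRAPHS[digit] for digit in gtype]


print(decode_gtype("15"))  # selects the CPU TICKS and NET OUT graphs
```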

Usage Examples:

JSON

serverusage_api.php?from=1332769800&to=1332845100&metric=uid&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=uidt&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=ip&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=ipt&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=uip&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=uipt&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=all&mode=json
serverusage_api.php?from=1332769800&to=1332845100&metric=all&uid=320&mode=json

CSV

serverusage_api.php?from=1332769800&to=1332845100&metric=all&uid=320&mode=csv

BASE64 ENCODED PHP SERIALIZED ARRAY

serverusage_api.php?from=1332769800&to=1332845100&metric=all&uid=320&mode=psa

SVG

serverusage_api.php?from=1332769800&to=1332845100&mode=svg&width=1024&height=750&scale=log
serverusage_api.php?from=1333532663&to=1333627917&mode=svg&scale=log&bgcol=light&gtype=12345
serverusage_api.php?from=1333532663&to=1333627917&mode=svg&scale=linear&bgcol=light&gtype=15
serverusage_api.php?from=1333532663&to=1333627917&mode=svg&scale=log&bgcol=light&gtype=5
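
Query strings like the ones above can also be generated programmatically. The following Python sketch assembles a request URL from the documented parameters. The build_api_url helper and the logserver.example.com host are hypothetical; only the parameter names come from the API description.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the location of your own installation.
BASE_URL = "http://logserver.example.com/serverusage_api.php"


def build_api_url(start, end, mode="json", base_url=BASE_URL, **extra):
    """Build a serverusage_api.php query URL from the documented parameters."""
    params = {"from": start, "to": end}
    # Optional parameters: metric, uid, ip, width, height, scale, bgcol, gtype.
    params.update(extra)
    params["mode"] = mode
    return base_url + "?" + urlencode(params)


print(build_api_url(1332769800, 1332845100, mode="csv", metric="all", uid=320))
```

The timestamps are passed positionally because "from" is a reserved word in Python and cannot be used as a keyword argument.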

Additional notes:

All servers should use standard UTC as their reference time.
The most recent data available in the ServerUsage-Server aggregated table always lags behind the current time by the value of the DB_AGGREGATION_DELAY constant (5 minutes by default).

A time interval can be calculated as follows:

polling_interval = 900; // 15 minutes * 60 seconds; must be equal to or greater than DB_AGGREGATION_DELAY.
delay_time = 600; // 10 minutes * 60 seconds; must be equal to or greater than (2 * DB_AGGREGATION_DELAY).
end_time = (current_time - delay_time);
start_time = (end_time - polling_interval);
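
The same calculation can be written as a runnable Python sketch. The query_window function name is hypothetical; the default values and the two constraints are the ones stated above, with DB_AGGREGATION_DELAY assumed to be left at its default of 5 minutes.

```python
import time

DB_AGGREGATION_DELAY = 300  # seconds; assumes the documented default of 5 minutes


def query_window(current_time, polling_interval=900, delay_time=600):
    """Return (start_time, end_time) in seconds since the epoch (UTC)."""
    if polling_interval < DB_AGGREGATION_DELAY:
        raise ValueError("polling_interval must be >= DB_AGGREGATION_DELAY")
    if delay_time < 2 * DB_AGGREGATION_DELAY:
        raise ValueError("delay_time must be >= 2 * DB_AGGREGATION_DELAY")
    end_time = current_time - delay_time
    start_time = end_time - polling_interval
    return start_time, end_time


# time.time() is already expressed in UTC-based epoch seconds.
start_time, end_time = query_window(int(time.time()))
print(start_time, end_time)
```

The two values can then be passed as the from and to parameters of serverusage_api.php.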

The serverusage_svg.html file provides an example of how to use serverusage_api.php to display statistical graphs.

Screenshot of ServerUsage API Monitoring

Feel free to edit the serverusage_svg.html file and adapt it to your needs.

Installation and configuration

Before proceeding with the following instructions, please read the guide How to create ServerUsage RPM packages for Enterprise Linux.

Install ServerUsage-Client

The ServerUsage-Client RPM must be installed on each client computer you wish to monitor.

As root install the SystemTap-Runtime and ServerUsage-Client RPM files (please replace the version number with the correct one):

# rpm -i systemtap-runtime-1.7-1.el6.$(uname -m).rpm
# rpm -i serverusage_client-6.3.0-1.el6.$(uname -m).rpm

Configure the ServerUsage-Client

# nano /etc/serverusage_client.conf

Set the IP address of the Log server where ServerUsage-Server is installed and make sure that the specified TCP port is open on both client and server.

The ServerUsage-Client includes a SysV init script to start/stop/restart the service:

# /etc/init.d/serverusage_client start|stop|status|restart|reload|condrestart

When the service is started, the serverusage_client.ko SystemTap kernel module is executed via the staprun command and its output is piped to serverusage_tcpsender.bin, which sends it to the Log server over a TCP connection. If the connection is broken or the Log server is not responding, the log data is temporarily stored in the /var/log/serverusage_cache.log file and resent as soon as the TCP connection is restored.
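
The cache-and-resend behaviour described above can be sketched in Python as follows. This is only an illustration of the idea, not the actual logic of serverusage_tcpsender.bin: the send_tcp and deliver functions are hypothetical, and sending the cached data together with the new line in one payload is an assumption.

```python
import os
import socket


def send_tcp(host, port, payload, timeout=5.0):
    """Send raw bytes to the Log server over a short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


def deliver(line, send, cache_path):
    """Send any cached data plus the new line; cache everything on failure."""
    pending = b""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache:
            pending = cache.read()
    pending += line
    try:
        send(pending)
    except OSError:
        # Connection broken or server not responding: keep the data for later.
        with open(cache_path, "wb") as cache:
            cache.write(pending)
        return False
    # Everything was delivered, so the cache file is no longer needed.
    if os.path.exists(cache_path):
        os.remove(cache_path)
    return True
```

A caller would pass a sender bound to its own Log server, for example deliver(data, lambda payload: send_tcp(host, port, payload), cache_path), where host and port are the values set in the configuration file.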

To start the service at boot you can use the following command:

# chkconfig serverusage_client on

Install ServerUsage-Client-MDB

The ServerUsage-Client-MDB RPM must be installed on each MariaDB server you wish to monitor.

As root install the ServerUsage-Client-MDB RPM file (please replace the version number with the correct one):

# rpm -i serverusage_client_mdb-6.3.0-1.el6.$(uname -m).rpm

Configure the ServerUsage-Client-MDB

# nano /etc/serverusage_client_mdb.conf

Set the IP address of the Log server where ServerUsage-Server is installed and make sure that the specified TCP port is open on both client and server.

The ServerUsage-Client-MDB includes a SysV init script to start/stop/restart the service:

# /etc/init.d/serverusage_client_mdb start|stop|status|restart|reload|condrestart

When the service is started, the logs are collected and piped to serverusage_tcpsender.bin, which sends them to the Log server over a TCP connection. If the connection is broken or the Log server is not responding, the log data is temporarily stored in the /var/log/serverusage_cache.log file and resent as soon as the TCP connection is restored.

To start the service at boot you can use the following command:

# chkconfig serverusage_client_mdb on

Install ServerUsage-Server

The ServerUsage-Server RPM must be installed on the Log Server (the computer receiving the logs from the clients) only.

As root install the ServerUsage-Server RPM file (please replace the version number with the correct one):

# rpm -i serverusage_server-6.3.0-1.el6.$(uname -m).rpm

Once the RPM is installed you can configure the ServerUsage-Server by editing the following file:

# nano /etc/serverusage_server.conf

The ServerUsage-Server includes a SysV init script to start/stop/restart the service:

# /etc/init.d/serverusage_server start|stop|status|restart|reload|condrestart

The init script starts the serverusage_tcpreceiver.bin program that listens for incoming TCP connections from the clients, and installs a cron job to aggregate the data every 5 minutes.
The raw data received by serverusage_tcpreceiver.bin is stored in a SQLite 3 database (/var/lib/serverusage/serverusage.db), in a table named log_raw. The table containing the aggregated data is called log_agg_hst. Once aggregated, the raw data is immediately removed from the log_raw table. Data in log_agg_hst older than DB_GARBAGE_TIME seconds is automatically removed.
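
The aggregation cycle described above can be illustrated with the following Python sketch, which uses the sqlite3 module and an in-memory database. It is not the actual serverusage_dbagg.sh script. The table names are the real ones, but the column layout is an assumption borrowed from the fields returned by the ‘all’ and ‘grp’ API metrics, and the grouping key, the cutoff argument and the DB_GARBAGE_TIME value are hypothetical.

```python
import sqlite3

DB_GARBAGE_TIME = 86400  # seconds; hypothetical value, the real one is configurable

# Assumed schema: column names are borrowed from the API metrics.
SCHEMA = """
CREATE TABLE IF NOT EXISTS log_raw (
    start_time INTEGER, end_time INTEGER, process TEXT, user_id INTEGER, ip TEXT,
    cpu_ticks INTEGER, io_read INTEGER, io_write INTEGER, net_in INTEGER, net_out INTEGER);
CREATE TABLE IF NOT EXISTS log_agg_hst (
    start_time INTEGER, end_time INTEGER, user_id INTEGER, ip TEXT,
    cpu_ticks INTEGER, io_read INTEGER, io_write INTEGER, net_in INTEGER, net_out INTEGER);
"""


def aggregate(conn, now, cutoff):
    """Aggregate raw rows ending before cutoff, then purge processed and obsolete rows."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            """INSERT INTO log_agg_hst
               SELECT MIN(start_time), MAX(end_time), user_id, ip,
                      SUM(cpu_ticks), SUM(io_read), SUM(io_write),
                      SUM(net_in), SUM(net_out)
               FROM log_raw WHERE end_time <= ? GROUP BY user_id, ip""",
            (cutoff,),
        )
        # Aggregated raw data is removed immediately.
        conn.execute("DELETE FROM log_raw WHERE end_time <= ?", (cutoff,))
        # Aggregated data older than DB_GARBAGE_TIME is discarded.
        conn.execute("DELETE FROM log_agg_hst WHERE end_time < ?",
                     (now - DB_GARBAGE_TIME,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO log_raw VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(1000, 1060, "httpd", 320, "192.0.2.1", 10, 1, 2, 3, 4),
     (1060, 1120, "php", 320, "192.0.2.1", 5, 1, 1, 1, 1)],
)
aggregate(conn, now=1500, cutoff=1200)
print(conn.execute("SELECT * FROM log_agg_hst").fetchall())
```

In this example the two raw rows for user 320 collapse into a single aggregated row whose counters are the sums of the raw values.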

To start the service at boot you can use the following command:

# chkconfig serverusage_server on

To extract formatted information from the SQLite database you can use the serverusage_api.php script. This file is installed by default in the /var/www/serverusage directory, so you have to configure Apache/PHP accordingly or move the script to another location.

The serverusage_api.php script allows you to extract filtered information in various formats: JSON (JavaScript Object Notation), CSV (tab-separated text values), Base64 encoded PHP serialized array or SVG (Scalable Vector Graphics). In the same directory as serverusage_api.php you can find an example HTML file that uses the PHP API to display an auto-updating graph.

Notes on Performance and Limitations

  • The compilation options of the SystemTap module are set by default to handle a maximum of one million lines per minute. This value is large enough for almost all use cases, but you can change it by setting the MAXACTION and MAXMAPENTRIES parameters of the stap command.
  • The transmission speed between the client and server is limited by the network bandwidth and quality.
  • The processing capacity of the Log server is limited by the hardware characteristics of the server.
  • On a virtual machine running CentOS 6.2 with 2 virtual processors and 4GB of RAM, I successfully sent and processed more than 110,000 lines per second.

Since this project is still at an experimental stage, I invite you to try it and leave your comments and suggestions here to help develop the project further.

Nicola Asuni Systems Engineer

Nicola focused on designing, building and integrating the backend for our application platforms, including automatic deployment, monitoring and backups.