March 2009

Monthly Archives

OpenVZ forced umount of lustre mount problem

Recently we managed to find an answer to a quite worrying Lustre problem that had been bugging us for some time. Every now and then, on servers running OpenVZ containers that make use of a Lustre filesystem, we would see a log entry in /var/log/messages saying:

kernel: Lustre: setting import lustre-server-MDT0000_UUID INACTIVE by administrator request

followed by a number of broken mounts and filesystem errors inside the containers running on the server where the log entry appeared. In effect, all the containers making extensive use of the same Lustre server would stop working properly (for example, Apache serving sites from Lustre mounts would start spawning processes, all of which would try unsuccessfully to read data from the mounts).
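As a rough illustration of how one might watch for this symptom, here is a minimal sketch that scans log lines for the message quoted above and pulls out the name of the affected import. The function name `find_inactive_imports` and the regular expression are our own assumptions for illustration, based only on the single log line shown; other Lustre versions may word the message differently.

```python
import re

# Assumed pattern, derived from the one log line quoted in the post.
# The captured group is the import name, e.g. lustre-server-MDT0000_UUID.
INACTIVE_RE = re.compile(
    r"kernel: Lustre: setting import (\S+) INACTIVE by administrator request"
)


def find_inactive_imports(lines):
    """Return the import names from every matching log line, in order.

    `lines` can be any iterable of strings, such as an open log file.
    """
    found = []
    for line in lines:
        match = INACTIVE_RE.search(line)
        if match:
            found.append(match.group(1))
    return found
```

To try it against a live system, pass it an open file handle for /var/log/messages (reading that file usually requires root) and alert on any non-empty result.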

Read More

Adventures with Lustre

For the last few months we’ve been busy integrating, testing and tuning Lustre for use on our hosting platform. I thought I’d share some notes…

Lustre is most widely used in HPC settings, and there seem to be relatively few operations using it in conjunction with web servers and virtualisation. Reading through the wiki and mailing lists, it soon becomes clear that Lustre has not been designed with small files and high metadata request rates in mind. Read More

Preparing Procurve Switches for Production

We recently took delivery of some new Procurve 5406zl switches for our Cloud platform. As with any new device being attached to the network there are quite a few tasks that must be done before it can be plugged into the production network. Naturally we document this internally but this time the process was quite interesting so I thought I’d blog it as well. Read More

.htaccess revisited

About a year ago Dawid posted about the performance of .htaccess files. We decided to revisit these tests to compare the performance of .htaccess files on local disk and network filesystems. The network filesystem we used was Lustre, chosen partly because we are doing a lot of testing with Lustre at the moment but also because of its known issues with metadata and small file performance.

The test setup is a Dual Xeon 3.2GHz server with 4GB of RAM and a gigabit network connection to the testing and storage vlans. We installed the latest version of our Apache 2.2.11 build with mpm-worker and two vhosts. One vhost has a DocumentRoot on local disk, the other has a DocumentRoot on Lustre storage. Read More

Getting to know Lustre

Lustre is a complex cluster filesystem aimed at supercomputing clusters, offering scalability to many gigabytes per second and petabytes of raw storage. We’ve been testing it for some time as a base filesystem for virtualisation and I wanted to share some notes. Read More

Process list for OpenVZ containers (vztop)

The standard Linux task list shows you each process and the resources (e.g. CPU, memory) that it is consuming. However, if you run top on an OpenVZ host server, it doesn’t show you the container ID of each process.

Fortunately, there is a tool, imaginatively named vztop, which can show you a slightly more useful list of container and host processes, along with their container IDs. Read More

Choosing the right switch for storage

Storage over Ethernet is a hot topic in the web hosting datacentre at the moment. With several excellent storage protocols to choose from and plenty of vendors offering products, there is certainly a lot of choice in the marketplace. But it’s easy to overlook the importance of the Ethernet switches that are going to be carrying your data – after all, you just spent a fortune on ‘redundant’ storage, right? Read More

Memtest Over Network

One rainy day we had a bad feeling that one of our rackable servers had corrupted memory. The server had some intermittent stability issues and hung from time to time. This prompted us to test the memory on all the rackable servers in the rackable rack, especially since they were soon to become part of our hosting infrastructure.

If you have ever tested memory, you will most likely have heard of the Memtest86 program. The procedure is simple: you just burn memtest.iso onto a CD and boot your server from it. Except that none of the rackables actually have CD-ROM drives. Read More

Resolving bind zone transfer issues

Today, a fax machine at the office started complaining that it couldn’t send emails. No useful error messages or anything…

After some digging it turned out the fax machine was getting SERVFAIL from the name server. This name server carries slaved zones for certain domains used for critical infrastructure, and there was a problem getting updates from upstream. Read More