Memtest Over Network

By Mark Sutton Monday, 16th March 2009

One rainy day we had a bad feeling that one of our rackable servers had corrupted memory. The server had some intermittent stability issues and hung from time to time. That prompted us to test the memory on all the rackable servers in the rack, especially since they were soon to become part of our hosting infrastructure.

If you have ever tested memory, you have most likely heard of the Memtest86 program. The procedure is simple: burn memtest.iso onto a CD and boot your server from it. Except that none of the rackables actually has a CD-ROM drive.

We decided to run memtest from our internal PXE server, which we normally use to install operating systems over the network. Below is a quick guide on how to do it.

Before you go any further, you may want to know how to set up a PXE server and carry out a Linux installation over the network, which you can learn by visiting the following two links:

Once you have a working PXE setup, adding Memtest86 to it is quite easy.
First of all you need to download the source from:

www.memtest86.com

and compile it with a single make command. Among the resulting files you should find memtest.bin, which is loaded the same way as a Linux kernel and is the only file you need to boot memtest from your PXE server. To be able to load it, copy the file to your TFTP root directory (in our setup, into the distros subdirectory referenced by the KERNEL line below). Important note: when you copy the file, remember to remove the .bin extension, as the PXE boot loader recognises files based on their extensions and would treat .bin as a “CD boot sector” type of file, causing the boot process to fail.
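The copy step can be sketched as a small shell function. This is only an illustration, not the exact commands we ran: the function name install_memtest is made up for this example, and the TFTP root you pass in depends entirely on your own server. The distros subdirectory simply mirrors the KERNEL path used in this post.

```shell
#!/bin/sh
# Sketch: copy a built memtest.bin into the TFTP tree without its extension.
# install_memtest is a hypothetical helper; both arguments are assumptions
# about your layout, not fixed paths.
install_memtest() {
    src=$1          # path to the memtest.bin produced by make
    tftp_root=$2    # TFTP root directory served by your PXE server
    dest_dir=$tftp_root/distros

    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1

    # basename strips the .bin suffix, so the copy is named just "memtest".
    cp "$src" "$dest_dir/$(basename "$src" .bin)"
}
```

For example, on a server whose TFTP root happens to be /tftpboot, you would call it as: install_memtest ./memtest.bin /tftpboot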

You also need to add the following lines to your PXE configuration file (typically pxelinux.cfg/default):

LABEL memtest
KERNEL distros/memtest
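
If you prefer to script the change rather than edit the file by hand, the two lines above can be appended with something like the following. Treat it as a rough sketch: add_memtest_label is a name invented for this example, and the path to the configuration file is whatever your PXE setup uses.

```shell
#!/bin/sh
# Sketch: append the memtest entry to a PXELINUX config file, once only.
# add_memtest_label is a hypothetical helper name.
add_memtest_label() {
    cfg=$1    # e.g. the pxelinux.cfg/default file under your TFTP root

    # Do nothing if the label is already there, so re-running is harmless.
    if grep -q '^LABEL memtest$' "$cfg" 2>/dev/null; then
        return 0
    fi

    printf 'LABEL memtest\nKERNEL distros/memtest\n' >> "$cfg"
}
```

Calling it twice on the same file leaves a single memtest entry.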

Additionally, you can add a “memtest” option to your menu.msg file so that it shows up on the PXE boot menu. This is all you need to start memtest from your PXE server.

Although we had no trouble booting our rackables into memtest, we did have an issue with the memory tests themselves. None of them seemed to run for longer than a minute before freezing, which made us wonder whether RAM errors were to blame. Luckily, it turned out that the hangs were caused by the USB legacy feature being enabled in the BIOS. As soon as we disabled it, the tests ran just fine. This is something to be aware of: if you get unexpected behaviour during the tests, go to your BIOS setup and make sure the option is disabled.

Memtest86 on PXE certainly makes life easier when it comes to testing a bunch of servers. You can temporarily set memtest as the default boot option and power up all the servers you need to test. You can power them all up at the same time without worrying about the load on your network, since the memtest kernel is only about 100 KB and takes a second or two to load. Once the servers have booted, you can come back to them later just to check the test results.
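
Switching the default boot option back and forth can be sketched like this. It is an illustration under a few assumptions: the helper names are invented for this example, the backup is kept next to the config file with a .bak suffix, and whether the default entry boots unattended still depends on the PROMPT and TIMEOUT settings in your own configuration.

```shell
#!/bin/sh
# Sketch: temporarily make memtest the DEFAULT entry, then put things back.
# set_default_memtest and restore_default are hypothetical helper names.
set_default_memtest() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1

    # Write our DEFAULT line, followed by the old file minus any
    # DEFAULT line it had. grep exits non-zero when nothing is left
    # to print, which is not an error here.
    {
        echo 'DEFAULT memtest'
        grep -iv '^[[:space:]]*DEFAULT[[:space:]]' "$cfg.bak" || true
    } > "$cfg"
}

restore_default() {
    cfg=$1
    # Put the saved copy back once the tests are done.
    mv "$cfg.bak" "$cfg"
}
```

Run set_default_memtest on your configuration file before powering up the servers, and restore_default on the same file once they have all booted into memtest.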

