Floating-point (FP) instructions are among the least used while running Linux. They probably represent less than 0.001% of the instructions executed on an average Linux box, unless one deals with scientific computations. Besides, if you really want to know how well designed the FPU in your processor is, it's easier to look at its data sheet and check how many clock cycles it takes to execute a given FPU instruction. Yet there are more benchmarks that measure FPU performance than anything else. Why?
The original Whetstone benchmark was designed in the 60's by Brian Wichmann at the National Physical Laboratory, in England, as a test for an ALGOL 60 compiler for a hypothetical machine. The compilation system was named after the small town of Whetstone, where it was designed, and the name seems to have stuck to the benchmark itself.
The first practical implementation of the Whetstone benchmark was written by Harold Curnow in FORTRAN in 1972 (Curnow and Wichmann together published a paper on the Whetstone benchmark in 1976 for The Computer Journal). Historically it is the first major synthetic benchmark. It is designed to measure the execution speed of a variety of FP instructions (+, *, sin, cos, atan, sqrt, log, exp) on scalar and vector data, but also contains some integer code. Results are provided in MWIPS (Millions of Whetstone Instructions Per Second). The meaning of the expression "Whetstone Instructions" is not clear, though, at least after close examination of the C source code.
During the late 80's and early 90's it was recognized that Whetstone would not adequately measure the FP performance of parallel multiprocessor supercomputers (e.g. Cray and other mainframes dedicated to scientific computations). This spawned the development of various modern benchmarks, many of them with names like Fhoostone, as a humorous reference to Whetstone. Whetstone, however, is still widely used, because it provides a very reasonable measure of uniprocessor FP performance.
Whetstone has other interesting qualities for Linux users:
The version of the Whetstone benchmark that we are going to use for this example was slightly modified by Al Aburto and can be downloaded from his excellent FTP site dedicated to benchmarks. After downloading the file whets.c, you will have to edit the source slightly: a) uncomment the "#define POSIX1" directive (this enables the Linux compatible timer routine); b) uncomment the "#define DP" directive (since we are only interested in the Double Precision results).
This benchmark is extremely sensitive to compiler optimization options. Here is the line I used to compile it:
cc whets.c -o whets -O2 -fomit-frame-pointer -ffast-math -fforce-addr -fforce-mem -m486 -lm
Note that some optimization options are buggy in some versions of gcc. Most notably, any of -O, -O2, -O3, ... combined with -funroll-loops can cause gcc to emit incorrect code on a Linux box. You can test your gcc with a short test program available at Uwe Mayer's site. Of course, if your compiler is buggy, then any test results are not written in stone, to say the least (pun intended). In short, don't use -funroll-loops to compile this benchmark, and try to stick to the optimization options listed above.
Just execute whets. Whetstone will display its results on standard output and also write a whets.res file if you give it the information it requests. Run it a few times to confirm that variations in the results are very small.
Some motherboards allow you to disable the L1 (internal) or L2 (external) caches through the BIOS configuration menus (take a look at the motherboard's manual; the ASUS P55T2P4 motherboard, for example, allows disabling both caches separately or together). You may want to experiment with these settings and/or main memory (DRAM) timing settings.
You can try to compile whets.c without any special optimization options, just to verify that compiler quality and compiler optimization options do influence benchmark results.
The Whetstone benchmark main loop executes in a few milliseconds on an average modern machine, so its designers decided to provide a calibration procedure. It first executes 1 pass, then 5, then 25 passes, and so on, until the calibration takes more than 2 seconds, and then estimates a number of passes, xtra, that will result in an approximate running time of 100 seconds. It will then execute xtra passes of each one of the 8 sections of the main loop, measure the running time for each (for a total running time very near to 100 seconds) and calculate a rating in MWIPS, the Whetstone metric. This is an interesting variation on the two basic procedures described in Section 1.
The main loop consists of 8 sections each containing a mix of various instructions representative of some type of computational task. Each section is itself a very short, very small loop, and has its own timing calculation. The code that gets looped through for section 8 for example is a single line of C code:
x = sqrt(exp(log(x)/t1));
where x = 0.75 and t1 = 0.50000025, both defined as doubles.
Compiling as specified above with gcc 220.127.116.11, the resulting ELF executable whets is 13 096 bytes long on my system. It calls libc and of course libm for the trigonometric and transcendental math functions, but these should get compiled to very short executable code sequences since all modern CPUs have FPUs with these functions wired-in.
Now that we have an FPU performance figure for our machine, the next step is comparing it to other CPUs. Have you noticed all the data that whets.c asked you for after you had run it for the first time? Well, Al Aburto has collected Whetstone results for your convenience at his site; you may want to download the data file and have a look at it. This kind of benchmarking data repository is very important, because it allows comparisons between different machines. More on this topic in one of my next articles.
Whetstone is not a Linux-specific test, and it's not even an OS-specific test, but it certainly is a good test for the FPU in your Linux box. It also gives an indication of compiler efficiency for specific kinds of applications that involve FP calculations.
I hope this gave you a taste of what benchmarking is all about.