
The Cyrix 6x86 Coma bug


Quick description of the 6x86 Coma bug

Processors concerned: The Cyrix/IBM 6x86, 6x86L, and 6x86MX.

All revisions up to the current 6x86MX Rev. 1.4 are affected. Cyrix has stated that future 6x86MX revisions won't suffer from this bug.

What does the bug do? A simple, short, legal instruction sequence will lock the processor in an infinite loop and will prevent servicing of interrupts. This will obviously crash the machine.
Is this bug specific to GNU/Linux? No. Since it's a hardware bug, it will affect all OS's equally.
Workaround 1: Enable the NO_LOCK bit (0x10) in configuration register CCR1 (0xc1).
Workaround 2: Use the undocumented Cyrix workaround (see the News page, November 12 & 13).
How do I implement workaround 1? Insert a simple command in rc.local or rc.cyrix: set6x86 -p 0xc1 -s 0x10, or use any other utility to set the NO_LOCK flag in CCR1.
How do I test? See the gcc-compatible program below.
How serious is it? Exactly as serious as the new Pentium F0 bug: if you run a multiuser OS on a 6x86 machine, any user who is allowed to run programs can bring down the machine with a small, innocent-looking program. And as with the Pentium F0 bug, it's almost impossible to trace the cause of the crash. So it's a very serious bug.
Does workaround 1 imply a performance hit? No. It might even make your CPU slightly faster.
Won't setting the NO_LOCK bit cause other problems? It shouldn't cause any problems with most OS's: Linux, Windows95, NT, Solaris, BSD Unix variants, NextStep, etc.
What about DOS users? Cyrix has reported problems with the DOS4GW extensions when NO_LOCK is set (no details provided). The DOS4GW extensions are used by DOOM, QUAKE and other DOS packages to access extended memory. However, this incompatibility is irrelevant to the serious problem caused by the Cyrix Coma bug: a security breach in multiuser, multitasking protected-mode OS's.

Conclusion: if you run DOS games or other DOS programs that use the DOS4GW extensions, leave NO_LOCK cleared and ignore the 6x86 Coma bug issue altogether.

Who found the bug? Serguei Shtyliov (Moscow) found the bug, and Alexandr Konosevich (Omsk, West Siberia) investigated it further. Alexandr then contacted editor Uwe Post of c't magazine. First a short note mentioning the bug appeared, and a few days later a full article, co-authored by Alexandr Konosevich and Uwe Post, went online on the magazine's Web site.
Why did the c't article use the name "hidden CLI bug"? Alexandr Konosevich and Uwe Post called it the "hidden CLI bug" based on the fact that the 6x86 processor seems to run in an infinite loop with interrupts disabled (the CLI instruction disables interrupts). However, this explanation may be misleading (read the technical explanation below).
Why did you call it the "Coma" bug? I am calling it the Cyrix 6x86 Coma bug because the CPU goes into an infinite loop, executing instructions but not responding to any external "stimulus"; hence some kind of CPU "coma". This contrasts with the Pentium F0 bug, where the Pentium CPU simply stops ("dies").

Bug origins

I first saw a posting on the Linux-kernel mailing list (the on-going discussion was about the Pentium F0 bug):
> In article <199711080849.VAA02265@karaka.chch.cri.nz> you wrote:
> > Doesn't stop Cyrix 166 MMX's
>
> There seems to be a different sequence that affects Cyrixes though.
>
> See <http://www.heise.de/ct/art_ab97/9713030/>. If you don't
> understand German you will probably want to skip down to the
> assembler code at the bottom of the article.
>
> I don't have a Cyrix, so I can't test it, but it seems to me
> that the 'sti' must be unnecessary and impossible on Linux in
> user mode. Apparently the effect of the loop is to stop the
> handling of interrupts.
>
> --
> Erik Corry erik@arbat.com

I don't understand German, so that made it a little harder to check this information.

First Solution

A few hours later, here is what I posted (some words changed for the sake of clarity):

Here is the *bad* news:
===============
You *can* deadlock a 6x86 in user mode with the exact code sequence or small variations of the assembly code proposed in the article. Here is the source of a small user-space program that will lock your 6x86 in an infinite loop:
-
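/* Cyrix 6x86 Coma bug test (32-bit x86, gcc). WARNING: locks up an affected CPU. */
static unsigned char c[4] = {0x36, 0x78, 0x38, 0x36};

int main(void)
{
    /* Endless loop: xchg with memory (implicitly locked), mov, jmp back. */
    __asm__ __volatile__ (
        "1: xchgl (%%ebx), %%eax\n\t"
        "movl %%eax, %%edx\n\t"
        "jmp 1b\n\t"
        :                          /* no outputs */
        : "b" (c)                  /* ebx = address of c, as in the original */
        : "eax", "edx", "memory");
    return 0;
}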
-
I am sorry if it's not as elegant as Richard Johnson's example. [Note: Richard Johnson's example was for the Pentium F0 bug.]
Compiling the above program and running it as an ordinary user locks my 6x86L box. I didn't try it on the 6x86MX (I will report later on the 6x86MX). [Note: it also locked the 6x86MX, which I tested later.]
-
Explanation (my guess, I may be completely wrong):
==============================
The exchange instruction (xchgl above) on the 6x86 will lock the CPU bus and effectively disable interrupts during its execution. It seems that the combination of speculative execution, register renaming, out-of-order execution and the intelligent pipelines in the 6x86 prevents interrupt servicing during the execution of the movl and jmp instructions. Consequently, interrupt routines never get called and the processor is effectively locked in a loop that runs in its cache line.
-
And here is the good news:
=================
Setting the NO_LOCK bit in CCR1 will prevent the deadlock caused by the above code sequence. Here is a short call to set6x86 that does this:
set6x86 -p 0xc1 -s 0x10
Page table accesses and interrupt acknowledge cycles will still be executed in locked cycles, but the xchgl instruction will *not* generate locked cycles anymore.
I don't know if setting the NO_LOCK bit will cause problems when running Linux. I don't think so...
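The set6x86 call in the post above amounts to a read-modify-write of CCR1. As a rough sketch of what such a utility might do (we have not checked set6x86's own source), here is a C version assuming the usual Cyrix configuration-register access scheme: write the register index to I/O port 0x22, then read or write the value at port 0x23. It uses the x86 Linux/glibc `ioperm()`, `inb()` and `outb()` calls and must run as root; the helper names are hypothetical. Do not call `set_no_lock()` on a non-Cyrix machine, where ports 0x22/0x23 may belong to other hardware. An interrupt arriving between the index write and the data access could also disturb the sequence; kernel code would disable interrupts around it, which user space cannot do.

```c
#include <stdint.h>
#include <sys/io.h>   /* ioperm(), inb(), outb(): x86 Linux/glibc only */

#define CYRIX_INDEX_PORT 0x22  /* assumed: write the register index here */
#define CYRIX_DATA_PORT  0x23  /* assumed: then read/write the register here */
#define CCR1             0xc1  /* configuration control register 1 */
#define CCR1_NO_LOCK     0x10  /* NO_LOCK bit */

/* Pure helper: return a CCR1 value with NO_LOCK set, other bits unchanged. */
uint8_t ccr1_with_no_lock(uint8_t ccr1)
{
    return (uint8_t)(ccr1 | CCR1_NO_LOCK);
}

/* Hypothetical helper: read a Cyrix configuration register (index, then data). */
uint8_t cyrix_read(uint8_t reg)
{
    outb(reg, CYRIX_INDEX_PORT);
    return inb(CYRIX_DATA_PORT);
}

/* Hypothetical helper: write a Cyrix configuration register (index, then data). */
void cyrix_write(uint8_t reg, uint8_t value)
{
    outb(reg, CYRIX_INDEX_PORT);
    outb(value, CYRIX_DATA_PORT);
}

/* Set NO_LOCK in CCR1. Needs root; returns -1 if port access is refused. */
int set_no_lock(void)
{
    if (ioperm(CYRIX_INDEX_PORT, 2, 1) != 0)
        return -1;
    cyrix_write(CCR1, ccr1_with_no_lock(cyrix_read(CCR1)));
    return 0;
}
```

Keeping the bit arithmetic in `ccr1_with_no_lock()` separate from the port I/O means the other CCR1 bits are preserved and the logic can be checked without touching the hardware.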

Testing the bug

You can test this bug on any 6x86 machine. It will crash the machine, so be extra careful not to lose any data during your tests. Copy the above source to your machine and compile it. Careful now: sync your hard disks and unmount all filesystems. Now run the program as an ordinary user. Your 6x86-based computer should lock up; only a hardware reset will bring it out of this state.
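Before running the test, it may be worth confirming that the machine really has a Cyrix CPU. A minimal sketch, assuming GCC or Clang's `<cpuid.h>` and the vendor string `CyrixInstead` that Cyrix parts report through CPUID leaf 0; the function names are hypothetical. We believe CPUID may be disabled by default on the original 6x86, so a 0 result there is not conclusive.

```c
#include <cpuid.h>    /* GCC/Clang: __get_cpuid() */
#include <stdint.h>
#include <string.h>

/* Assemble the 12-byte vendor string from CPUID leaf 0 (order: EBX, EDX, ECX).
   Relies on x86 being little-endian, so bytes land in reading order. */
void cpuid_vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Return 1 if CPUID reports the Cyrix vendor string, 0 otherwise
   (including when CPUID leaf 0 is unavailable). */
int is_cyrix_cpu(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    cpuid_vendor_from_regs(ebx, edx, ecx, vendor);
    return strcmp(vendor, "CyrixInstead") == 0;
}
```

A test script could call `is_cyrix_cpu()` first and refuse to start the lock-up loop when it returns 0.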

Preliminary technical explanation

The program above basically consists of the following loop:

  1. Exchange the contents of the memory location pointed to by ebx with the contents of eax (32 bits): xchg opcode.
  2. Copy the new contents of eax to edx (32 bits): mov opcode.
  3. Unconditionally jump back to step 1: jmp opcode.

This, in itself, is an infinite loop. The only way for the processor to get out of this loop is by getting interrupted. So far so good, since any multitasking OS and even DOS will interrupt the processor. However, as you will see for yourself if you try to run this loop or a variant of it, the processor does not respond to any interrupt once in the loop. This is the symptom that led the c't magazine crew to call it the hidden CLI bug.

However, if one could look at the flags inside the processor while it's in the loop, one would see that interrupts are enabled. It does not respond to interrupts because it is executing back-to-back locked bus cycles.

Locked cycles are a special type of bus cycle that cannot be split: you cannot have an interrupt routine called in the middle of a locked cycle (among other peculiarities). And typically, xchg instructions are implicitly executed in locked cycles: the xchg instruction is the classic example of an "atomic" instruction, used to implement all sorts of semaphores and other software constructs that depend on this "atomicity" (coincidentally, the Pentium F0 bug is caused by an explicitly locked compare-and-exchange instruction).
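To illustrate why xchg has to be atomic, here is a minimal test-and-set spinlock built on it. This is a sketch assuming GCC inline assembly on x86; the type and function names are hypothetical. Because xchg with a memory operand is implicitly locked, no explicit LOCK prefix appears.

```c
#include <stdint.h>

/* Lock word: 0 = free, 1 = held. */
typedef struct { volatile uint32_t locked; } xchg_spinlock;

/* Atomically store new_value in *addr and return the previous value.
   xchg with a memory operand is implicitly locked on x86. */
static inline uint32_t xchg32(volatile uint32_t *addr, uint32_t new_value)
{
    __asm__ __volatile__ ("xchgl %0, %1"
                          : "+r" (new_value), "+m" (*addr)
                          :
                          : "memory");
    return new_value;
}

/* Keep swapping in 1 until the old value was 0, i.e. the lock was free. */
void spin_lock(xchg_spinlock *l)
{
    while (xchg32(&l->locked, 1) != 0)
        ;  /* busy-wait */
}

/* Try once: return 1 if the lock was acquired, 0 if it was already held. */
int spin_trylock(xchg_spinlock *l)
{
    return xchg32(&l->locked, 1) == 0;
}

/* Release by swapping 0 back into the lock word. */
void spin_unlock(xchg_spinlock *l)
{
    xchg32(&l->locked, 0);
}
```

If the read and the write inside xchg could be separated by another bus master, two CPUs could both read 0 and both enter the critical section; that is the guarantee the locked cycle provides.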

But what has become of both the mov and the jmp instructions? Shouldn't the processor recognize the interrupts while executing these two instructions? It should, but it doesn't!

The 6x86 CPU family has some advanced architectural features, among others: dual pipelines, register renaming, data dependency removal, speculative execution and branch prediction.

What is probably happening here is that the jmp instruction's execution is overlapped with the other two instructions through branch prediction. That should still leave the mov instruction as a window in which the interrupt could be serviced, but I assume it too is being overlapped with the xchg instruction; my intuition is that the mov executes in one pipeline while the xchg executes in the other. How is that possible if the eax register is modified by the xchg instruction? That's where register renaming and data dependency removal come into play: the xchg instruction is probably acting on one copy of eax while the mov instruction uses a second copy.

Let's put it this way: the original M1 engineers went one (unlocked) cycle too far in their drive to make this processor as efficient as possible. It's simply incredible, but this bug has propagated to all the 6x86 family members! It would probably have gone unnoticed if Serguei Shtyliov had not detected it while writing an assembly language routine.

Why does setting NO_LOCK solve this? Because it effectively disables locked bus cycles for the xchg instruction. Normal bus cycles can be interrupted, so we can always regain control of the CPU and kill the loop.

All this explanation is just a hypothesis: one can only tell what's going on inside a CPU with 100% certainty using advanced ICE equipment, which is of course far beyond the means of an individual.

Refining the proposed solution

However, setting the NO_LOCK bit could cause other, lesser but still annoying problems. Here is an email I received from Bryan James Philippe about three hours after I had posted my first solution:

...
> Do you know if the S3 Trio 64 PCI-based video card might have this problem? My 2.1.62 system with the 6x86 and no-lock setting froze after about 20 minutes in X, as soon as I tried to start Communicator 4.03 (which worked fine previously). After rebooting, I've been leery to go into X or start Communicator, and the system has been running fine now for about 2 hours, with the same no-lock setting (and I ran an exhaustive benchmark on it during that time, as well).
...

Well, I first suggested to Bryan that setting the Weak Locking bit in RCR7 instead of the NO_LOCK bit in CCR1 might do the job, but it doesn't.

So far the only way to avoid the deadlock is to set the NO_LOCK bit. Bryan, like many GNU/Linux 6x86 users, has patched his kernel instead of installing set6x86. He has two PCI memory regions on his machine that are non-prefetchable, one of which is used by a bus-mastering SCSI controller. However, he has not set the ARRs properly for these regions. His problems could come from a combination of NO_LOCK and not having the ARRs set up correctly on his GNU/Linux box.

More testing is needed right now, and I would welcome any results and comments, especially on OS's other than GNU/Linux.

Right now my two 6x86 GNU/Linux boxes are doing fine with my simple NO_LOCK workaround, and I have reports of at least 3 other users that NO_LOCK is doing fine on their systems.

Anyway, if you are going to use my NO_LOCK workaround, don't forget to properly set up the ARRs on your 6x86 machine. This is relatively simple: read the set6x86 README, and also take a look at the relevant pages in IBM's Application Note 40205. You will also find some hints in my FAQ page. Here is what the 6x86_arr ARR reporting utility included with set6x86 displays on my system:

6x86 Address Region Register dump:
ARR0: address = 0xA0000 , size = 128 KB
RCR = 0x9 : not cached, write gathering,
ARR1: address = 0xC0000 , size = 256 KB
RCR = 0x1 : not cached,
ARR2: disabled
ARR3: address = 0xA8000 , size = 32 KB
RCR = 0x9 : not cached, write gathering,
ARR4: disabled
ARR5: disabled
ARR6: address = 0xE0000000 , size = 2 MB
RCR = 0x9 : not cached, write gathering,
ARR7: address = 0x0 , size = 32 MB
RCR = 0xB : cached, weak write ordering, write gathering,
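The RCR values in the dump can be decoded bit by bit. In this minimal sketch, the cache bit (0x01, meaning cache disabled on ARR0-ARR6 but cache enabled on ARR7), weak write ordering (0x02) and write gathering (0x08) are inferred from the dump itself, while the weak locking bit (0x04) is our assumption from the Cyrix documentation; all other RCR bits are ignored. `describe_rcr()` is a hypothetical name and not part of set6x86.

```c
#include <stdio.h>
#include <string.h>

#define RCR_CACHE 0x01  /* ARR0-ARR6: 1 = not cached; ARR7: 1 = cached */
#define RCR_WWO   0x02  /* weak write ordering */
#define RCR_WL    0x04  /* weak locking (assumed bit position) */
#define RCR_WG    0x08  /* write gathering */

/* Append s to out without overflowing a buffer of len bytes. */
static void append(char *out, size_t len, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < len)
        strncat(out, s, len - used - 1);
}

/* Describe an RCR value in the style of the 6x86_arr output above.
   arr is the ARR index (0-7); ARR7 uses the inverted cache bit. */
void describe_rcr(int arr, unsigned rcr, char *out, size_t len)
{
    int cached = (arr == 7) ? (rcr & RCR_CACHE) != 0
                            : (rcr & RCR_CACHE) == 0;

    snprintf(out, len, "%s,", cached ? "cached" : "not cached");
    if (rcr & RCR_WWO) append(out, len, " weak write ordering,");
    if (rcr & RCR_WL)  append(out, len, " weak locking,");
    if (rcr & RCR_WG)  append(out, len, " write gathering,");
}
```

For example, `describe_rcr(7, 0xB, buf, sizeof buf)` yields the same text as the ARR7 line of the dump, which is a quick way to check the bit assumptions against a known-good listing.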

As you can see, I have set up ARR6 to handle my PCI video card. Although I am running X and Netscape as I write these lines, with a video board similar to Bryan's and with the NO_LOCK bit set as described above, I haven't had a single problem on my 6x86L box. My 6x86MX also has the NO_LOCK bit set, its ARRs are correctly set up, and it too is working without any problems.

And both systems are safe from the Cyrix Coma bug :-)



Last updated on November 21, 1997.

Copyright 1997 Andrew D. Balsa