Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane

From: Matt Wilson <msw <at> linux.com>
Subject: Re: [PATCH] x86/fpu: CR0.TS should be set before trap into PV guest's #NM exception handler
Newsgroups: gmane.comp.emulators.xen.devel
Date: Thursday 27th February 2014 00:04:07 UTC (over 4 years ago)
On Wed, Nov 06, 2013 at 08:51:56AM +0000, Jan Beulich wrote:
> >>> On 06.11.13 at 07:41, Zhu Yanhai  wrote:
> > As we know Intel X86's CR0.TS is a sticky bit, which means once set
> > it remains set until cleared by some software routines, in other words,
> > the exception handler expects the bit is set when it starts to execute.
> 
> Since when would that be the case? CR0.TS is entirely unaffected
> by exception invocations according to all I know. All that is known
> here is that #NM wouldn't have occurred in the first place if CR0.TS
> was clear.
> 
> > However xen doesn't simulate this behavior quite well for PV guests -
> > vcpu_restore_fpu_lazy() clears CR0.TS unconditionally in the very
beginning,
> > so the guest kernel's #NM handler runs with CR0.TS cleared. Generally 
> > speaking
> > it's fine since the linux kernel executes the exception handler with
> > interrupt disabled and a sane #NM handler will clear the bit anyway
> > before it exits, but there's a catch: if it's the first FPU trap for
the 
> > process,
> > the linux kernel must allocate a piece of SLAB memory for it to save
> > the FPU registers, which opens a schedule window as the memory
> > allocation might sleep -- and with CR0.TS keeps clear!
> > 
> > [see the code below in linux kernel,
> 
> You're apparently referring to the pvops kernel.
> 
> > void math_state_restore(void)
> > {
> >     struct task_struct *tsk = current;
> > 
> >     if (!tsk_used_math(tsk)) {
> >         local_irq_enable();
> >         /*
> >          * does a slab alloc which can sleep
> >          */
> >         if (init_fpu(tsk)) {                 <<<< Here it might open a
schedule window
> >             /*
> >              * ran out of memory!
> >              */
> >             do_group_exit(SIGKILL);
> >             return;
> >         }
> >         local_irq_disable();
> >     }
> > 
> >     __thread_fpu_begin(tsk);    <<<< Here the process gets marked as a
'fpu user'
> >                                          after the schedule window
> > 
> >     /*
> >      * Paranoid restore. send a SIGSEGV if we fail to restore the
state.
> >      */
> >     if (unlikely(restore_fpu_checking(tsk))) {
> >         drop_init_fpu(tsk);
> >         force_sig(SIGSEGV, tsk);
> >         return;
> >     }
> > 
> >     tsk->fpu_counter++;
> > }
> > ]
> 
> May I direct your attention to the XenoLinux one:
> 
> asmlinkage void math_state_restore(void)
> {
> 	struct task_struct *me = current;
> 
> 	/* NB. 'clts' is done for us by Xen during virtual trap. */
> 	__get_cpu_var(xen_x86_cr0) &= ~X86_CR0_TS;
> 	if (!used_math())
> 		init_fpu(me);
> 	restore_fpu_checking(&me->thread.i387.fxsave);
> 	task_thread_info(me)->status |= TS_USEDFPU;
> }
> 
> Note the comment close to the beginning - the fact that CR0.TS
> is clear at exception handler entry is actually part of the PV ABI,
> i.e. by altering hypervisor behavior here you break all forward
> ported kernels.
> 
> Nevertheless I agree that there is an issue, but this needs to be
> fixed on the Linux side (hence adding the Linux maintainers to Cc);
> this issue was introduced way back in 2.6.26 (before that there
> was no allocation on that path). It's not clear though whether
> using GFP_ATOMIC for the allocation would be preferable over
> stts() before calling the allocation function (and clts() if it
> succeeded), or whether perhaps to defer the stts() until we
> actually know the task is being switched out. It's going to be an
> ugly, Xen-specific hack in any event.

Was there ever a resolution to this problem? I never saw a comment
from the Linux Xen PV maintainers.

--msw
 
CD: 4ms