x86_64 kernel does not start under qemu

Jan Kiszka jan.kiszka at siemens.com
Fri Mar 22 21:50:57 CET 2019


On 21.03.19 17:07, Jan Kiszka wrote:
> On 21.03.19 12:57, Richard Weinberger wrote:
>> On Thursday, 21 March 2019, 12:02:45 CET, Jan Kiszka wrote:
>>> FWIW, I've just seen this issue as well, with QEMU in KVM mode: I ran into that
>>> lockup when my host was under full load while Xenomai booted in the VM. And it
>>> seems reproducible. Debugging...
>>
>> Oh, good to hear that!
>> I played a little with your config but got badly interrupted with other stuff.
>> Your config seems to work but mostly because things are slower due to debugging stuff
>> you've enabled. Maybe this info helps.
>>
> 
> It's a race, so everything that changes timing also changes
> probabilities. I'm starting to nail it down:
> 
> (gdb) info threads
>    Id   Target Id         Frame
> * 4    Thread 4 (CPU#3 [halted ]) __ipipe_halt_root (use_mwait=-2147483634) at ../arch/x86/kernel/ipipe.c:317
>    3    Thread 3 (CPU#2 [running]) rep_nop () at ../arch/x86/include/asm/processor.h:655
>    2    Thread 2 (CPU#1 [halted ]) __ipipe_halt_root (use_mwait=-2147483634) at ../arch/x86/kernel/ipipe.c:317
>    1    Thread 1 (CPU#0 [halted ]) __ipipe_halt_root (use_mwait=-2147483634) at ../arch/x86/kernel/ipipe.c:317
> (gdb) monitor info lapic
> dumping local APIC state for CPU 3
> 
> LVT0     0x00010700 active-hi edge  masked                      ExtINT (vec 0)
> LVT1     0x00010400 active-hi edge  masked                      NMI
> LVTPC    0x00010400 active-hi edge  masked                      NMI
> LVTERR   0x000000fe active-hi edge                              Fixed  (vec 254)
> LVTTHMR  0x00010000 active-hi edge  masked                      Fixed  (vec 0)
> LVTT     0x000400ef active-hi edge                 tsc-deadline Fixed  (vec 239)
> Timer    DCR=0x0 (divide by 2) initial_count = 0
> SPIV     0x000001ff APIC enabled, focus=off, spurious vec 255
> ICR      0x000008fd logical edge de-assert no-shorthand
> ICR2     0x02000000 mask 00000010 (APIC ID)
> ESR      0x00000000
> ISR      239
> IRR      236 237 238 239
> 
> APR 0x00 TPR 0x00 DFR 0x0f LDR 0x08 PPR 0xe0
> 
> 
> So we are halting although vector 239 (timer) is still in service. And
> that means we re-enabled interrupts while the timer was being processed
> - a bug in I-pipe.
> 
> This is while another CPU tries to run ipipe_critical_enter, never
> reaching CPU 3 this way (via IPI_CRITICAL_VECTOR = 236).
> 
> Jan
> 
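A side note on the LAPIC dump quoted above, since it explains why vector 236 stays pending: with 239 (0xef) still in service, the processor priority is 0xe0, and the local APIC only dispatches vectors whose priority class (bits 7:4) is strictly higher than that of the PPR. The helpers below are a minimal user-space sketch of that rule, not kernel code; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector or priority register: bits 7:4. */
static unsigned int prio_class(uint8_t v)
{
	return v >> 4;
}

/*
 * Processor priority as the local APIC derives it: TPR if its class is
 * at least that of the highest in-service vector, otherwise the class
 * of that vector with the low nibble cleared.
 */
static uint8_t compute_ppr(uint8_t tpr, uint8_t highest_isr)
{
	uint8_t isr_prio = highest_isr & 0xf0;

	return prio_class(tpr) >= prio_class(isr_prio) ? tpr : isr_prio;
}

/* A pending fixed interrupt is dispatched only if its class exceeds PPR's. */
static bool deliverable(uint8_t vector, uint8_t ppr)
{
	return prio_class(vector) > prio_class(ppr);
}
```

With the values from the dump, compute_ppr(0x00, 239) gives 0xe0, matching the reported PPR, and deliverable() is false for all of 236..239. So the critical IPI cannot get through on CPU 3 until the timer interrupt receives its EOI.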

This might be the fix, but I need to sleep on it. Will send a PR next
week.

---8<---

ipipe: Call present timer ack handlers unconditionally

This plugs a race for timers that are per-CPU but share the same
interrupt number. When setting them up, there is a window where the
first CPU has already called ipipe_request_irq, but some other CPU has
not yet run through grab_timer and thus still has ipipe_stolen = 0.

Moreover, it is questionable that non-stolen timers should not call
their ack functions.

Signed-off-by: Jan Kiszka <jan.kiszka at siemens.com>
---
 kernel/ipipe/timer.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/ipipe/timer.c b/kernel/ipipe/timer.c
index 98d1192a2727..2d5f468ce7fb 100644
--- a/kernel/ipipe/timer.c
+++ b/kernel/ipipe/timer.c
@@ -369,13 +369,10 @@ static void __ipipe_ack_hrtimer_irq(struct irq_desc *desc)
 
 	if (desc)
 		desc->ipipe_ack(desc);
-
-	if (timer->host_timer->ipipe_stolen) {
-		if (timer->ack)
-			timer->ack();
-		if (desc)
-			desc->ipipe_end(desc);
-	}
+	if (timer->ack)
+		timer->ack();
+	if (desc && timer->host_timer->ipipe_stolen)
+		desc->ipipe_end(desc);
 }
 
 static int do_set_oneshot(struct clock_event_device *cdev)
-- 
2.16.4
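
To make the behavioral change easier to see, here is a rough user-space model of the handler before and after the patch. It is a sketch only: struct model_timer and both functions are hypothetical, the counters replace the real timer->ack() and desc->ipipe_end() calls, and it assumes desc is non-NULL and an ack handler is present.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU timer state touched by the patch. */
struct model_timer {
	bool stolen;	/* models timer->host_timer->ipipe_stolen */
	int acks;	/* number of times timer->ack() would have run */
	int ends;	/* number of times desc->ipipe_end() would have run */
};

/* Old logic: ack and end only happen once the timer is marked stolen. */
static void ack_old(struct model_timer *t)
{
	if (t->stolen) {
		t->acks++;
		t->ends++;
	}
}

/* Patched logic: ack unconditionally, end only for stolen timers. */
static void ack_new(struct model_timer *t)
{
	t->acks++;
	if (t->stolen)
		t->ends++;
}
```

For a CPU that is still inside the window (stolen == false), ack_old() leaves both counters at zero, while ack_new() records the ack and still skips the end call. For a stolen timer both variants behave the same.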

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
