From: Enrico Scholz <enrico.scholz <at> sigma-chemnitz.de>
Subject: Random failures while loading iwmmxt compiled binaries
Newsgroups: gmane.linux.ports.arm.general
Date: Tuesday 22nd June 2010 16:15:33 UTC

I have the problem that program loading sometimes segfaults, or aborts with

| Inconsistency detected by ld.so: ../elf/dl-sysdep.c: 465:
| _dl_important_hwcaps: Assertion `m == cnt' failed!

Similar reports exist in the glibc[1] and Gentoo[2] bug trackers; the
problem occurs only very rarely during normal usage.

I have analyzed it partially and can reproduce it in <10 minutes on PXA270
and PXA320 platforms, but I have not been able to find a solution yet.
These platforms were running kernel 2.6.34 (PXA270) and 2.6.31 (PXA320).


* it is not a compiler bug

* it is not an eglibc bug

* it can only be the kernel (iwmmxt state not restored properly on a
  task switch?) or a bug in the silicon

Steps to reproduce it:

1. Build (e)glibc and busybox with -march=iwmmxt -mcpu=iwmmxt
   I placed binaries at [3] which are based on OpenEmbedded's gcc 4.3.4 and
   eglibc 2.12.

2. Place the binaries into a tmpfs (that's important for triggering the
   bug in a short time; NFS or MTD takes longer)

    mount -t tmpfs -o size=8m none /tmp
    mkdir /tmp/bin /tmp/lib
    cp /bin/busybox /tmp/bin/
    cp /lib/ld[.-]* /tmp/lib
    cp /lib/lib[cm][.-]* /tmp/lib
    ln -s busybox /tmp/bin/sh
    ln -s busybox /tmp/bin/sed

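3. Create a test script like

    cat << 'EOF' > /tmp/x
    #! /bin/sh
    # 12 chars, minus 7 from the end and 2 from the front, leave "cde"
    while test `echo abcdefghijkl |
          sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \
          sed 's!.$!!' | sed 's!.$!!' | sed 's!.$!!' | \
          sed 's!.$!!' | sed 's!^.!!' | sed 's!^.!!'` = cde; do
        :    # no-op body (assumed; the original loop body is not shown)
    done
    EOF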

4. Enter the /tmp chroot and execute the script

    chroot /tmp
    sh /x

After some time, you will get either the assertion or a segfault.


With 'user_debug=-1' on the kernel command line, the segfault gives

[ 2280.392473] sed: unhandled page fault (11) at 0x40026200, code 0x017
[ 2280.392500] pgd = c7bf0000
[ 2280.395191] [40026200] *pgd=87b57031, *pte=00000000, *ppte=00000000
[ 2280.401529] 
[ 2280.408893] Pid: 8831, comm:                  sed
[ 2280.418649] CPU: 0    Not tainted  ( #1)
[ 2280.424497] PC is at 0x40016a84
[ 2280.427610] LR is at 0x40014b0c
[ 2280.430724] pc : [<40016a84>]    lr : [<40014b0c>]    psr: 20000010
[ 2280.430733] sp : befff8f8  ip : befff910  fp : befff99c
[ 2280.460747] r10: 00000000  r9 : 6474e552  r8 : 00000004
[ 2280.466537] r7 : 00000000  r6 : 00000012  r5 : 00000201  r4 : 40026202
[ 2280.473452] r3 : befff8f8  r2 : 00000009  r1 : 40026200  r0 : 40026202
[ 2280.479937] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
[ 2280.487354] Control: 0400397f  Table: 87bf0018  DAC: 00000015

  Register -> Variable mapping:

    r6,r7:   'masked'    -->  strange: it seems to be always 0x12
    r8:      'm'
    r5:      'n'
    r2:      'masked' >> 'n & 0xff'  (r3 destroyed by code calling
             strlen())

The PC is in strlen(), which was called from _dl_important_hwcaps() (LR):

00016a80 <strlen>:
   16a80:       e3c01003        bic     r1, r0, #3      ; 0x3
   16a84:       e4912004        ldr     r2, [r1], #4

000147ec <_dl_important_hwcaps>:
   14aa4:       e1961007        orrs    r1, r6, r7      <<<< the two 32
                                                        bit words of 'masked'
   14aa8:       0a000023        beq     14b3c <_dl_important_hwcaps+0x350>
   14aac:       e51b4068        ldr     r4, [fp, #-104]
   14ab0:       e51b206c        ldr     r2, [fp, #-108]
   14ab4:       e3a00001        mov     r0, #1  ; 0x1
   14ab8:       e3a01000        mov     r1, #0  ; 0x0
   14abc:       e084e002        add     lr, r4, r2
   14ac0:       e28e4050        add     r4, lr, #80     ; 0x50
   14ac4:       e3a05000        mov     r5, #0  ; 0x0   <<<< this is 'n'
   14ac8:       ec41000a        tmcrr   wr10, r0, r1
   14acc:       ea000003        b       14ae0 <_dl_important_hwcaps+0x2f4>
   14ad0:       e1961007        orrs    r1, r6, r7      <<<< start of loop
   14ad4:       e284400a        add     r4, r4, #10     ; 0xa
   14ad8:       0a000017        beq     14b3c <_dl_important_hwcaps+0x350>
   14adc:       e2855001        add     r5, r5, #1      ; 0x1
   14ae0:       ec476004        tmcrr   wr4, r6, r7     <<<< move 'masked'
                                                        into cp
   14ae4:       ee085110        tmcr    wcgr0, r5
   14ae8:       eee40148        wsrldg  wr0, wr4, wcgr0 <<<< right shift
                                                        of 'masked' for
                                                        n & 0xff bits
   14aec:       ec532000        mra     r2, r3, acc0    <<<< move result to
                                                        r2, r3 (acc0 is wr0)
   14af0:       e2020001        and     r0, r2, #1      ; 0x1
   14af4:       e3500000        cmp     r0, #0  ; 0x0   <<<< that's the 'if'
   14af8:       0afffff4        beq     14ad0 <_dl_important_hwcaps+0x2e4>
   14afc:       e51b3044        ldr     r3, [fp, #-68]
   14b00:       e1a00004        mov     r0, r4
   14b04:       e7834188        str     r4, [r3, r8, lsl #3]
>> 14b08:       eb0007dc        bl      16a80 <strlen>
   14b0c:       e51b1044        ldr     r1, [fp, #-68]
   14b10:       ee095110        tmcr    wcgr1, r5       <<<< move 'n' into
                                                        wcgr1
   14b14:       eeda5149        wslldg  wr5, wr10, wcgr1 <<< left shift of
                                                        1 for n & 0xff bits
   14b18:       e081c188        add     ip, r1, r8, lsl #3
   14b1c:       e58c0004        str     r0, [ip, #4]
   14b20:       ec510005        tmrrc   r0, r1, wr5     <<<< split 64 bit
                                                        value into r0, r1
   14b24:       e0266000        eor     r6, r6, r0      <<<< and do the ^=
   14b28:       e0277001        eor     r7, r7, r1
   14b2c:       e1961007        orrs    r1, r6, r7
   14b30:       e2888001        add     r8, r8, #1      <<<< this is 'm'
   14b34:       e284400a        add     r4, r4, #10
   14b38:       1affffe7        bne     14adc <_dl_important_hwcaps+0x2f0>

The corresponding code is

  uint64_t masked = GLRO(dl_hwcap) & GLRO(dl_hwcap_mask);

  for (n = 0; masked != 0; ++n)
    if ((masked & (1ULL << n)) != 0)
      {
	temp[m].str = _dl_hwcap_string (n);
	temp[m].len = strlen (temp[m].str);
	masked ^= 1ULL << n;
	++m;
      }

The crash happens because 'n' (stored in r5) becomes 0x201, so that the
string pointer passed to strlen() points outside of the mapped memory.
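
As a cross-check of the addresses, here is a small sketch derived only
from the listing and the register dump (the function names are made up
for illustration): r4 is set to lr + 80 before the loop and advanced by 10
each time 'n' is incremented, and strlen() starts by rounding its argument
down to a word boundary.

```c
#include <assert.h>
#include <stdint.h>

/* r4 = lr + 80 before the loop, and r4 += 10 whenever 'n' is incremented,
   so the string pointer sits at this byte offset from the value in lr. */
uint32_t string_offset(uint32_t n)
{
    return 80 + 10 * n;
}

/* strlen() begins with "bic r1, r0, #3": round down to a word boundary. */
uint32_t word_align(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* Compare with the register dump (r5 = 'n', r0 = string pointer). */
void check_against_dump(void)
{
    uint32_t r5 = 0x201, r0 = 0x40026202;

    assert(word_align(r0) == 0x40026200);         /* faulting address (r1) */
    assert(string_offset(r5) == 0x145a);          /* n = 63 gives 0x2c6    */
    assert(r0 - string_offset(r5) == 0x40024da8); /* implied value of lr   */
}
```

For n = 0x201 the offset is 0x145a bytes, far beyond the 0x2c6 reached
with n = 63, and the aligned address equals the faulting address
0x40026200.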

A value of 0x201 means that the loop has missed its end condition twice
(the first time for 0x01 and the second time for 0x101).

The assertion probably triggers when the loop exits in the second round.

The generated assembly code looks sane to me and I do not see how 'n' can
become >64 (or I looked at it too long and missed the obvious).  Within
the loop, only strlen() is called, and this alters r0-r3 only.

As I wrote above, 'masked' always seems to be 0x12, which means that it is
always the same bits that have not been cleared.
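
To illustrate how a 'masked' that stays at 0x12 leads to n = 0x201, here
is a small model of the loop.  It is only a sketch under two assumptions
of mine: the XOR never clears anything (whatever the cause), and a shift
count above 63 yields 0.  That only the low 8 bits of the count are used
is taken from the annotations in the listing.  shr_low8() and simulate()
are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Right shift as annotated in the listing: only the low 8 bits of the
   count are used.  ASSUMPTION: counts above 63 give 0. */
uint64_t shr_low8(uint64_t v, unsigned n)
{
    unsigned count = n & 0xff;

    return count > 63 ? 0 : v >> count;
}

/* Run the loop with a 'masked' that is never cleared (ASSUMPTION) and
   record each 'n' for which the 'if' is taken.  Stops after max_hits
   entries and returns the number of entries recorded (this is 'm'). */
unsigned simulate(uint64_t masked, unsigned *hits, unsigned max_hits)
{
    unsigned m = 0;

    assert(hits != NULL && max_hits > 0);
    for (unsigned n = 0; masked != 0 && m < max_hits; ++n)
        if (shr_low8(masked, n) & 1)
            hits[m++] = n;  /* 'masked ^= 1ULL << n' has no effect here */
    return m;
}
```

simulate(0x12, hits, 5) records n = 0x1, 0x4, 0x101, 0x104 and 0x201; the
fifth hit comes with m == 4, which matches r5 and r8 in the register dump.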

Does anybody have other ideas or a solution?


[1]  http://sourceware.org/bugzilla/show_bug.cgi?id=6729
[2]  http://bugs.gentoo.org/show_bug.cgi?id=194973
[3]  https://www.cvg.de/people/ensc/iwmmxt-ld.tar.gz