From: Elliott Slaughter <elliottslaughter <at> gmail.com>
Subject: Re: Beginning with TLS for SBCL win32 threads
Newsgroups: gmane.lisp.steel-bank.devel
Date: Friday 15th August 2008 19:24:25 UTC
On Fri, Aug 15, 2008 at 2:17 AM, Nikodemus Siivola wrote:
> On Fri, Aug 15, 2008 at 12:15 AM, Elliott Slaughter wrote:
>
> (Sorry for the slightly disjointed reply -- you probably want to read
> it in reverse order...)

Ok, let me see if I can piece this together.

>> Exception Code: 0xc0000005.
>> Faulting IP: 0x412cb0.
>> page status: 0x10000.
>> Was writing: 0, where: 0x3fff8010.
>> fatal error encountered in SBCL pid 2304(tid 3948):
>> Exception too early in cold init, cannot continue.
>> Welcome to LDB, a low-level debugger for the Lisp runtime environment.
>> ldb> unknown command: ``;;;''
>
> After you get to this point using Brian's suggestion of --core
> cold-sbcl.core, things to do:
>
> See if you can get an LDB backtrace. (Command is "ba".) Then, if you
> have a working GDB on Win32, attach it, and see if the C backtrace
> leads you anywhere. If it doesn't, look in sbcl.nm to see if you can
> figure out which C-side function in SBCL the IP could be in (if any).
> Disassembling the area around the faulting IP might give a clue.
> Constructing a backtrace by hand is also an option -- sbcl-internals
> wiki has a short guide.

$ src/runtime/sbcl --core output/cold-sbcl.core
[...]
Exception Code: 0xc0000005.
Faulting IP: 0x412cb0.
page status: 0x10000.
Was writing: 0, where: 0x3fff8010.
fatal error encountered in SBCL pid 2984(tid 2880):
Exception too early in cold init, cannot continue.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> ba
Backtrace:
   0: Foreign fp = 0x22f988, ra = 0x406d03
   1: Foreign fp = 0x22f9a8, ra = 0x40553f
   2: Foreign fp = 0x22f9f8, ra = 0x40d35c
   3: Foreign fp = 0x22fa1c, ra = 0x7c9037bf
   4: Foreign fp = 0x22facc, ra = 0x7c90378b
   5: Foreign fp = 0x22fe48, ra = 0x7c90eafa
   6: Foreign fp = 0x22fe88, ra = 0x40b171
ldb>

[Another window...]
$ gdb src/runtime/sbcl 2984
GNU gdb 6.8
[...]
(gdb) ba
#0  0x7c901231 in ntdll!DbgUiConnectToDbg ()
   from C:\WINDOWS\system32\ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#2  0x00000005 in ?? ()
#3  0x00000004 in ?? ()
#4  0x00000001 in ?? ()
#5  0x00bdffd0 in ?? ()
#6  0xffdff548 in ?? ()
#7  0xffffffff in ?? ()
#8  0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9  0x7c9507c8 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? ()
(gdb)

Hmm... that's not so helpful. Ok, here's my manual attempt using the
values in sbcl.nm:

0: 00406b90 T _ldb_monitor
1: 00405470 T _lose
2: 0040d050 T _handle_exception
3-5: what?
6: 0040b090 T _create_initial_thread

Which would seem to indicate the problem is in
create_initial_thread... But the faulting IP (0x412cb0) itself seems
to be in 00412c40 T _call_into_lisp.
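
For what it's worth, the manual lookup above can be sketched in a few
lines of Python: sort the symbol addresses and take the nearest one at
or below each return address. This is only a sketch under assumptions.
The table holds just the five symbols quoted in this mail (the real
sbcl.nm has many more, so an unlisted function could sit in between),
and the max_offset cutoff is a hypothetical guard of my own so that
addresses far outside the image (frames 3-5) come back as unknown
instead of being pinned on the last symbol.

```python
from bisect import bisect_right

# Symbols copied from the sbcl.nm lines quoted in this mail (not the full file).
SYMBOLS = sorted([
    (0x00405470, "_lose"),
    (0x00406B90, "_ldb_monitor"),
    (0x0040B090, "_create_initial_thread"),
    (0x0040D050, "_handle_exception"),
    (0x00412C40, "_call_into_lisp"),
])
ADDRESSES = [addr for addr, _ in SYMBOLS]


def enclosing_symbol(addr, max_offset=0x10000):
    """Return (name, offset) for the nearest symbol at or below addr, else None.

    max_offset is an assumed cutoff, not something SBCL or nm defines.
    """
    i = bisect_right(ADDRESSES, addr) - 1
    if i < 0:
        return None  # addr is below every known symbol
    base, name = SYMBOLS[i]
    offset = addr - base
    if offset > max_offset:
        return None  # too far past the symbol to be believable
    return name, offset


# Return addresses from the LDB backtrace, plus the faulting IP.
RETURN_ADDRESSES = [0x406D03, 0x40553F, 0x40D35C, 0x7C9037BF,
                    0x7C90378B, 0x7C90EAFA, 0x40B171]
FAULTING_IP = 0x412CB0

if __name__ == "__main__":
    for frame, ra in enumerate(RETURN_ADDRESSES):
        print(frame, hex(ra), enclosing_symbol(ra))
    print("IP", hex(FAULTING_IP), enclosing_symbol(FAULTING_IP))
```

Frames 3-5 come back as None here, which is consistent with the
0x7c90... addresses that gdb attributes to ntdll.dll.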

> The instruction pointer seems to be outside the Lisp heap, so the
> fault occurs either in C code of SBCL, or in some library.
>
> The address where the write (or read -- I don't think so?) faulted, on
> the other hand, is smack in the middle of dynamic space. (See address
> space layout in src/compiler/x86/parms.lisp -- search for the string
> win32 to find the relevant bits.)

#!+win32
(progn

  (def!constant read-only-space-start #x02000000)
  (def!constant read-only-space-end   #x020ff000)

  (def!constant static-space-start    #x02100000)
  (def!constant static-space-end      #x021ff000)

  (def!constant dynamic-space-start   #x09000000)
  (def!constant dynamic-space-end     #x29000000)

  (def!constant linkage-table-space-start #x02200000)
  (def!constant linkage-table-space-end   #x022ff000))

> So, what occurs is that during a C-side call an attempt to write to a
> protected page was made. However (see handle_exception in win32-os.c),
> either is_valid_lisp_address didn't return true for the address for
> some reason, or gencgc_handle_wp_violation() declined to handle it.
> The first option is just weird. The second can occur e.g. if gc_init()
> has not completed setting up the page tables yet.
>
> So, tasks:
>
> * Verify that the faulting address is in the Lisp heap. (I believe so.)

Is it? The dynamic space

  (def!constant dynamic-space-start   #x09000000)
  (def!constant dynamic-space-end     #x29000000)

doesn't include 0x3fff8010.
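
A quick way to double-check this against all four spaces, as a sketch
in Python with the bounds copied from the win32 block of parms.lisp
quoted above. The names SPACES and containing_space are mine, and I'm
treating each end address as exclusive, which is an assumption.

```python
# Bounds copied from the #!+win32 block quoted above.
SPACES = {
    "read-only": (0x02000000, 0x020FF000),
    "static": (0x02100000, 0x021FF000),
    "linkage-table": (0x02200000, 0x022FF000),
    "dynamic": (0x09000000, 0x29000000),
}


def containing_space(addr):
    """Return the name of the space holding addr, or None if it is in none."""
    for name, (start, end) in SPACES.items():
        if start <= addr < end:  # end treated as exclusive (assumption)
            return name
    return None


FAULT_ADDRESS = 0x3FFF8010  # "where:" from the exception report
FAULTING_IP = 0x412CB0      # "Faulting IP:" from the same report

if __name__ == "__main__":
    print(hex(FAULT_ADDRESS), containing_space(FAULT_ADDRESS))
    print(hex(FAULTING_IP), containing_space(FAULTING_IP))
    # How far past dynamic-space-end the fault address lies.
    print(hex(FAULT_ADDRESS - SPACES["dynamic"][1]))
```

Neither address lands in any of the four spaces, and the fault address
sits 0x16ff8010 bytes past dynamic-space-end.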

> * Verify that the IP is not in Lisp space. (I believe so.)

Yes. 00412c40 T _call_into_lisp

> * Find out what causes the write -- by bisection via printf & /show if
> necessary, though a backtrace or examining sbcl.nm is likely to be faster.

I'll do this after lunch...

> * When you know what is being written and where, you should be able to
> figure out which of the following three options is right:
>
> 1. Writing in the wrong address.
> 2. Writing in the right address but too early.
> 3. Writing in the right address, and everything should be set up --
> and it still goes wrong. Then something (e.g. the heap_base pointer) has
> been corrupted earlier on.
>
> (There may be other possibilities as well, but these are the ones that
> spring to mind.)
>
> Cheers,
>
>  -- Nikodemus
>



-- 
Elliott Slaughter

"Any road followed precisely to its end leads precisely nowhere." -
Frank Herbert
