
From: Virgile Bello <virgile.bello <at> gmail.com>
Subject: First class aggregates of small size: split when used in function call
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Wednesday 31st December 2014 07:41:10 UTC

Hello,

In my LLVM frontend (CLR/MSIL), I am currently using first-class aggregates
to represent loaded value types on the "CLR stack".

However, I noticed that when calling an external function that takes those
aggregates by value, they are not passed as I expected:

%COLORREF = type { i8, i8, i8, i8 }

declare i32 @SetLayeredWindowAttributes(i8*, %COLORREF, i8, i32)

I call this function with call x86_stdcallcc (it's a Win32 function, loaded
with GetProcAddress).

However, checking the assembly code, it seems that the %COLORREF gets split
due to the calling convention: the first i8 field goes through %edx, but the
next 3 fields go through the stack.
I would like all of it to go through either a single 32-bit register or a
32-bit stack value (since the whole structure fits in an i32 and it is
already packed in memory that way before the call).

I was thinking that using alloca with sret/byval might help, but I am not
even sure that is enough, since clang also seems to actually use i16 or i32
(and even i32+i16 or i32+i32) to represent such structs <= 8 bytes when
passing them to a function (even if they contain many smaller i8 fields).
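
To make the idea concrete, here is a minimal C sketch of packing such a
4-byte struct into a single 32-bit integer, which is the kind of coercion
clang appears to perform. The names ColorRef, pack_colorref and
unpack_colorref are hypothetical and only for illustration; the field names
are also an assumption, since %COLORREF above only specifies four i8 fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of %COLORREF = type { i8, i8, i8, i8 }. */
typedef struct {
    uint8_t r, g, b, a;
} ColorRef;

/* The packing below only makes sense if the struct is exactly 4 bytes. */
static_assert(sizeof(ColorRef) == sizeof(uint32_t),
              "ColorRef must occupy exactly 32 bits");

/* Reinterpret the struct's in-memory bytes as one 32-bit value.
   memcpy avoids aliasing problems and preserves the memory layout,
   so the result depends on the target's byte order. */
static uint32_t pack_colorref(ColorRef c) {
    uint32_t bits;
    memcpy(&bits, &c, sizeof bits);
    return bits;
}

/* Inverse operation: recover the struct from the 32-bit value. */
static ColorRef unpack_colorref(uint32_t bits) {
    ColorRef c;
    memcpy(&c, &bits, sizeof c);
    return c;
}
```

With this approach the callee would be declared as taking an i32 in place of
%COLORREF, and the frontend would pass the packed value.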

Does anybody know whether alloca with sret/byval alone is enough, or whether
I also need to pack smaller structs into i32 values myself, like clang does,
to be sure they won't be split across registers?
Any other hints or ideas on how I can achieve this?

Also, I was wondering what the current recommendation is (sret/byval with
alloca for every copy vs first-class aggregates) considering the current
state of LLVM and its supported optimizations. Since clang uses sret/byval, I
expect that path to be more optimized/mature, but I might be wrong.

I suppose LLVM would easily understand and optimize all the additional
aggregate alloca/memcpy operations I would end up emitting if I were to
switch to a sret/byval approach?

Thanks,
 