From: Jeff Roberson <jroberson <at> jroberson.net>
Subject: Re: Linux kernel compatibility
Newsgroups: gmane.os.freebsd.architechture
Date: Tuesday 4th January 2011 20:53:30 UTC
On Tue, 4 Jan 2011, Alexander Kabaev wrote:

> On Mon, 3 Jan 2011 19:03:01 -1000 (HST)
> Jeff Roberson  wrote:
>> On Mon, 3 Jan 2011, Alexander Kabaev wrote:
>>> On Mon, 3 Jan 2011 10:31:24 -1000 (HST)
>>> Jeff Roberson  wrote:
>>>> Hello Folks,
>>>> Some of you may have seen my infiniband work proceed in svn.  It is
>>>> coming to a close soon and I will be integrating it into current.
>>>> I have a few patches to the kernel to send for review but I wanted
>>>> to bring up the KPI wrapper itself for discussion.
>>>> The infiniband port has been done by creating a 10,000 line KPI
>>>> compatibility layer.  With this layer the vast majority of the
>>>> driver code runs unmodified.  The exceptions are anything that
>>>> interfaces with skbs and most of the code that deals with network
>>>> interfaces.
>>>> Some examples of things supported by the wrapper:
>>>> atomics, types, bitops, byte order conversion, character devices,
>>>> pci devices, dma, non-device files, idr tables, interrupts,
>>>> ioremap, hashes, kobjects, radix trees, lists, modules, notifier
>>>> blocks, rbtrees, rwlock, rwsem, semaphore, schedule, spinlocks,
>>>> kalloc, wait queues, workqueues, timers, etc.
>>>> Obviously a complete wrapper is impossible and I only implemented
>>>> the features that I needed.  The build is accomplished by pointing
>>>> the linux compatible code at sys/ofed/include/ which has a
>>>> simulated linux kernel include tree.  There are some config(8)
>>>> changes to help this along as well.
>>>> I have seen that some attempt at similar wrappers has been made
>>>> elsewhere. I wonder if instead of making each one tailored to
>>>> individual components, which mostly seem to be filesystems so far,
>>>> should we put this in a central place under compat somewhere?  Is
>>>> this project doomed to be tied to a single consumer by the specific
>>>> nature of it?
>>>> Other comments or concerns?
>>>> Thanks,
>>>> Jeff
>>> This probably will go against popular opinion here, but having a
>>> 10k-line Linux emulation layer that _almost_ works in the tree will
>>> be an unfortunate event and will do more damage to FreeBSD as a
>>> platform than good in the long run. I would rather see this code
>>> never hit the main repository.
>> I would argue that the layer works very well for infiniband.  Much
>> better than almost.  It is only almost complete in that there is no
>> need for me to implement features that we're not using.
>> I am interested in hearing your other concerns however.
>> Thanks,
>> Jeff

Alexander, let me first start out by saying I have a great deal of respect 
for you and I hear your concerns.  I see that this is a somewhat heated 
issue and I can really only address the technical points.  The more 
existential questions about FreeBSD will have to be left to others.

> The considerations are simple enough. First, we do not have many IB
> users of FreeBSD in the wild and those that we have (Isilon) seem to be
> perfectly capable of managing the IB stack out of the tree, without
> dumping the thousands of lines of the code into the base. If they had
> the stack before, but were not willing/capable to provide adequate care
> for it in the past, there is no reason to expect things to change with
> second stack, which now will rot in our tree instead of theirs.

They provided adequate care for it to keep their product running on old 
versions of FreeBSD.  Unfortunately, it is a large stack, and a great 
number of people and organizations are working on improving and advancing 
it on Linux via OFED; having a private stack does not give you the benefit 
of their work.  The motivation for making the wrapper layer was entirely 
to keep pace with this development and make it less likely that what is 
in the tree will rot.

> Second, semi-complete Linux compat layer in kernel will have the
> same effect as linuxulator in userland - we do have some vendors still
> trying to bother with FreeBSD drivers for their hardware now and we
> will have none after we provide the possibility to hack their Linux
> code to run somewhat stably on top of Linux compat layer. Due to
> intentional fluidity of Linux kAPI, our shims will never quite walk and
> quack like their original implementation in Linux kernel and combined
> result will always be less stable than native Linux drivers in
> Linux kernel.

I have heard this argument about the linuxulator, and what we're really 
talking about is slipping FreeBSD market share.  I don't share the view 
that the linuxulator furthered this slip; rather, my view is that it 
allows us to stay relevant in areas where companies cannot justify an 
independent FreeBSD effort.  Adobe is a good example of this.

Let's talk nuts and bolts about what this thing does.  In the vast 
majority of cases it simply shuffles arguments and function names around 
where there is a 1:1 correspondence between the Linux API and the FreeBSD API.  Think 
about things like atomics, callouts, locks, jiffies vs ticks, etc.  In 
these areas the systems are trivially different.  In a very small number 
of areas where this wasn't the case I did a direct port and noted it with 
an #ifdef.

This works specifically in the infiniband case because it is its own 
middle layer.  You can't write a SCSI driver for Linux and use it on BSD 
with this.  You can't even write a network driver.  But if you do bring in 
code from Linux, you don't have to worry about changing every kmalloc to 
malloc and every printk to printf, so diffs can be reduced in trivial 
cases.  I thought that, given your work on XFS for FreeBSD, this would 
make sense to you.
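
To make the "shuffling arguments and function names" point concrete, here 
is a minimal userland sketch of what such a thin shim can look like.  It is 
not the code under sys/ofed/include; the flag value, the hz setting, the 
stand-in ticks variable and the shim_demo() helper are all assumptions made 
up for this illustration, and the real kernel malloc(9) takes a type and 
flags that are left out here.

```c
/*
 * Userland sketch of a thin Linux-to-FreeBSD KPI shim.
 * Illustrative only: NOT the code in sys/ofed/include.  The flag value,
 * the hz setting and shim_demo() are assumptions made for this example.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the native tick counter and tick rate. */
static volatile int ticks;
static int hz = 1000;

/* Linux names mapped directly onto the native ones. */
#define jiffies ticks
#define HZ      hz

typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u         /* hypothetical value, ignored below */

/* kmalloc()/kfree() become plain allocator calls; flags are dropped. */
static inline void *
kmalloc(size_t size, gfp_t flags)
{
	(void)flags;
	return (malloc(size));
}

/* kzalloc() is kmalloc() plus zeroing. */
static inline void *
kzalloc(size_t size, gfp_t flags)
{
	void *p = kmalloc(size, flags);

	if (p != NULL)
		memset(p, 0, size);
	return (p);
}

static inline void
kfree(const void *p)
{
	free((void *)p);
}

/* printk() becomes printf(); the log-level prefix collapses to nothing. */
#define KERN_INFO ""
#define printk(...) printf(__VA_ARGS__)

/* Convert milliseconds to ticks, rounding up. */
static inline unsigned long
msecs_to_jiffies(unsigned int ms)
{
	return (((unsigned long)ms * (unsigned long)HZ + 999UL) / 1000UL);
}

/* "Linux-style" code that compiles unmodified against the shim. */
static int
shim_demo(void)
{
	char *buf = kzalloc(16, GFP_KERNEL);

	if (buf == NULL)
		return (-1);
	assert(buf[0] == '\0');
	ticks = 42;
	printk(KERN_INFO "jiffies=%d, 10ms=%lu ticks\n", jiffies,
	    msecs_to_jiffies(10));
	kfree(buf);
	return (0);
}
```

The body of shim_demo() uses only Linux names, which is the point: the 
imported code stays untouched and the diff against upstream stays small, 
while the header decides what each name means locally.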

Our options are: to leave FreeBSD users without infiniband, which I can 
tell you has cost us more market share, as I know of specific cases we 
have lost due to it; to maintain our own stack independently, which no one 
has the budget for; or to try to integrate with OFED.  Do you see some other 


> -- 
> Alexander Kabaev