From: Robert Kern <robert.kern <at> gmail.com>
Subject: Re: Boolean arrays
Newsgroups: gmane.comp.python.numeric.general
Date: Friday 27th August 2010 20:35:07 UTC
On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern wrote:
>> But in any case, that would be very slow for large arrays since it
>> would invoke a Python function call for every value in ar. Instead,
>> iterate over the valid array, which is much shorter:
>>
>> mask = np.zeros(ar.shape, dtype=bool)
>> for good in valid:
>>    mask |= (ar == good)
>>
>> Wrap that up into a function and you're good to go. That's about as
>> efficient as it gets unless the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small,
> and shorter to boot:
>
> np.in1d(ar, valid)

Not according to my timings:

[~]
|2> def kern_in(x, valid):
..>     mask = np.zeros(x.shape, dtype=bool)
..>     for good in valid:
..>         mask |= (x == good)
..>     return mask
..>

[~]
|6> ar = np.random.randint(100, size=1000000)

[~]
|7> valid = np.arange(0, 100, 5)

[~]
|8> %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop

[~]
|9> %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop


As valid gets larger, in1d() will catch up, but for smallish sizes of
valid, which I suspect is the case here given the "non-numeric" nature
of the OP's (Hi, Brett!) request, kern_in() is usually better.
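
For anyone who wants to rerun the comparison outside of IPython, here is
a minimal standalone sketch. It is not the session above: the compare()
helper, its parameters, and the sizes tried are made up for illustration,
and it calls np.isin(), the newer spelling of the same membership test,
on the assumption that your NumPy is recent enough to have it. Timings
depend on machine and NumPy version, so none are shown.

```python
import timeit

import numpy as np


def kern_in(x, valid):
    """Boolean mask that is True where x equals any value in valid."""
    mask = np.zeros(x.shape, dtype=bool)
    for good in valid:
        # One vectorized comparison per valid value, OR-ed into the mask.
        mask |= (x == good)
    return mask


def compare(n_valid, size=1_000_000, high=100_000, repeat=3):
    """Hypothetical helper: best-of-`repeat` seconds for (kern_in, np.isin)."""
    rng = np.random.default_rng(0)
    ar = rng.integers(high, size=size)
    valid = rng.choice(high, size=n_valid, replace=False)
    # Both approaches must agree before their timings mean anything.
    assert np.array_equal(kern_in(ar, valid), np.isin(ar, valid))
    t_loop = min(timeit.repeat(lambda: kern_in(ar, valid), number=1, repeat=repeat))
    t_isin = min(timeit.repeat(lambda: np.isin(ar, valid), number=1, repeat=repeat))
    return t_loop, t_isin


if __name__ == "__main__":
    # Grow valid to look for the point where the set-based test catches up.
    for n_valid in (5, 20, 100, 500):
        t_loop, t_isin = compare(n_valid)
        print(f"len(valid)={n_valid:4d}  "
              f"kern_in: {t_loop * 1e3:8.1f} ms  isin: {t_isin * 1e3:8.1f} ms")
```

The loop does one full pass over ar per element of valid, so its cost
grows linearly with len(valid); that is why it is only expected to win
while valid stays small.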

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco