On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern wrote:
>> But in any case, that would be very slow for large arrays since it
>> would invoke a Python function call for every value in ar. Instead,
>> iterate over the valid array, which is much shorter:
>>
>> mask = np.zeros(ar.shape, dtype=bool)
>> for good in valid:
>>     mask |= (ar == good)
>>
>> Wrap that up into a function and you're good to go. That's about as
>> efficient as it gets unless the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small,
> and shorter to boot:
>
> np.in1d(ar, valid)
Not according to my timings:
[~]
2> def kern_in(x, valid):
..>     mask = np.zeros(x.shape, dtype=bool)
..>     for good in valid:
..>         mask |= (x == good)
..>     return mask
..>
[~]
6> ar = np.random.randint(100, size=1000000)
[~]
7> valid = np.arange(0, 100, 5)
[~]
8> %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop
[~]
9> %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop
As valid gets larger, in1d() will catch up, but for smallish sizes of
valid, which I suspect is the case here given the "nonnumeric" nature of
the OP's (Hi, Brett!) request, kern_in() is usually better.

Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
