From: Robert Kern <...@gmail.com>
Subject: Re: Boolean arrays
Newsgroups: gmane.comp.python.numeric.general
Date: Friday 27th August 2010 20:35:07 UTC

```
On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern wrote:
>> But in any case, that would be very slow for large arrays since it
>> would invoke a Python function call for every value in ar. Instead,
>> iterate over the valid array, which is much shorter:
>>
>> mask = np.zeros(ar.shape, dtype=bool)
>> for good in valid:
>>    mask |= (ar == good)
>>
>> Wrap that up into a function and you're good to go. That's about as
>> efficient as it gets unless if the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small,
> and shorter to boot:
>
> np.in1d(ar, valid)

Not according to my timings:

[~] |2> def kern_in(x, valid):
    ..>     mask = np.zeros(x.shape, dtype=bool)
    ..>     for good in valid:
    ..>         mask |= (x == good)
    ..>     return mask
    ..>

[~] |6> ar = np.random.randint(100, size=1000000)

[~] |7> valid = np.arange(0, 100, 5)

[~] |8> %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop

[~] |9> %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop

As valid gets larger, in1d() will catch up but for smallish sizes of
valid, which I suspect given the "non-numeric" nature of the OP's (Hi,
Brett!) request, kern_in() is usually better.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
```
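For anyone who wants to rerun the comparison outside an IPython session, here is a minimal standalone sketch. It makes a few assumptions that are not in the message above: it uses `np.isin` (the newer spelling of the membership test, since `np.in1d` is deprecated in recent NumPy releases), the `default_rng` generator with an arbitrary seed, and the standard-library `timeit` module instead of `%timeit`. The numbers it prints will depend on the NumPy version and the machine, so they may not match the 2010 timings or even agree on which approach is faster.

```python
import numpy as np
from timeit import timeit


def kern_in(x, valid):
    """Return a boolean mask that is True where x equals any element of valid."""
    mask = np.zeros(x.shape, dtype=bool)
    # One vectorized comparison per allowed value; cheap while valid is short.
    for good in valid:
        mask |= (x == good)
    return mask


# Same shapes as in the message; the seed is an arbitrary choice for repeatability.
rng = np.random.default_rng(0)
ar = rng.integers(100, size=1_000_000)
valid = np.arange(0, 100, 5)

loop_mask = kern_in(ar, valid)
isin_mask = np.isin(ar, valid)  # assumption: np.isin stands in for np.in1d

# Both approaches should select exactly the same elements.
assert np.array_equal(loop_mask, isin_mask)

# Average wall-clock time per call over a few repetitions.
repeats = 3
t_loop = timeit(lambda: kern_in(ar, valid), number=repeats) / repeats
t_isin = timeit(lambda: np.isin(ar, valid), number=repeats) / repeats
print(f"kern_in: {t_loop * 1e3:.1f} ms per call")
print(f"np.isin: {t_isin * 1e3:.1f} ms per call")
```

Increasing the length of `valid` (for example `np.arange(0, 100000, 5)` with a correspondingly wider range for `ar`) is one way to look for the crossover point mentioned above, where the sort-based membership test catches up with the loop.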