x86, core: Optimize hweight32()
author Akinobu Mita <akinobu.mita@gmail.com>
Tue, 22 Dec 2009 00:20:16 +0000 (16:20 -0800)
committer Ingo Molnar <mingo@elte.hu>
Mon, 28 Dec 2009 09:41:39 +0000 (10:41 +0100)
commit 39d997b514e12d5aff0dca206eb8996b3957927e
tree d63202847a8a421fbd1fc39e9a2433dbc86ce104
parent 6b7b284958d47b77d06745b36bc7f36dab769d9b

Optimize hweight32() by using the same technique as hweight64().

The proof of this technique can be found in the commit log for
f9b4192923fa6e38331e88214b1fe5fc21583fcc ("bitops: hweight()
speedup").
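
For illustration only, a minimal userspace sketch of the two ways to
finish a 32-bit population count. It assumes the technique in question
is the multiply-based fold that the ARCH_HAS_FAST_MULTIPLIER note
refers to. The names popcount32_shift() and popcount32_mul() are
hypothetical and are not the identifiers used in lib/hweight.c.

```c
#include <stdint.h>

/* Shift-and-add variant: sums the four per-byte counts with two shifts. */
static unsigned int popcount32_shift(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);            /* 2-bit sums */

	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u); /* 4-bit sums */
	res = (res + (res >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0xffu;
}

/*
 * Multiply variant: one multiplication by 0x01010101 accumulates all
 * four per-byte counts into the top byte, which is then shifted down.
 */
static unsigned int popcount32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;                      /* 2-bit sums */
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
	w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
	return (w * 0x01010101u) >> 24;
}
```

Both variants share the first three steps. They differ only in how the
four byte-wide partial sums are combined, so the multiply form is
expected to pay off only where multiplication is cheap.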

A userspace benchmark on x86_32 showed a 20% speedup for
bitmap_weight(), which uses hweight32() to count the bits in each
unsigned long on 32-bit architectures.

 int main(void)
 {
 #define SZ (1024 * 1024 * 512)

	static DECLARE_BITMAP(bitmap, SZ) = {
		[0 ... 100] = 1,
	};

	return bitmap_weight(bitmap, SZ);
 }
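
The benchmark above relies on kernel macros, so it does not build as
is outside the kernel tree. As a rough standalone approximation, the
sketch below shows how a bitmap weight can be computed by applying a
32-bit population count to each word. word_weight32() and
bitmap_weight32() are hypothetical names, and the partial last word
handling is an assumption about how a caller would want trailing bits
treated, not a copy of the kernel's bitmap code.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-based 32-bit population count (hypothetical helper). */
static unsigned int word_weight32(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (w * 0x01010101u) >> 24;
}

/* Count the set bits in the first nbits bits of map, word by word. */
static size_t bitmap_weight32(const uint32_t *map, size_t nbits)
{
	size_t full = nbits / 32;	/* number of complete words */
	size_t rem = nbits % 32;	/* valid bits in the last word */
	size_t total = 0;
	size_t i;

	for (i = 0; i < full; i++)
		total += word_weight32(map[i]);

	/* Mask off bits beyond nbits in a partial trailing word. */
	if (rem)
		total += word_weight32(map[full] & ((1u << rem) - 1));

	return total;
}
```

Because the per-word count runs once for every 32 bits of the bitmap,
any saving in it scales with the bitmap size, which is why a large
bitmap makes a convenient benchmark.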

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1258603932-4590-1-git-send-email-akinobu.mita@gmail.com>
[ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
lib/hweight.c