Thursday, May 28, 2009

Swap Ints, Two Variables, No Waiting

Who hasn't had to swap two ints? Everyone who implemented a sort algorithm, for example, had to swap two values. What does this function look like? Probably something like this:

static void SwapInts(ref int first, ref int second)
{
    int aux = first;
    first = second;
    second = aux;
}

And it works too!

static void Main(string[] args)
{
    int a = 15;
    int b = 320;
    Console.WriteLine("a: {0}\tb: {1}", a, b);

    SwapInts(ref a, ref b);

    Console.WriteLine("a: {0}\tb: {1}", a, b);
}
a: 15   b: 320
a: 320  b: 15
Press any key to continue . . .

My parents are so proud!

Anyways, a co-worker challenged me to implement the SwapInts method without using an auxiliary variable. We all know that patience is a virtue, and I know that it is something I should work on. I got frustrated within minutes and demanded the solution to this. It turned out not so complex after all. Check it out:

static void SwapIntsXor(ref int first, ref int second)
{
    first ^= second;
    second ^= first;
    first ^= second;
}

This works because xoring first and second produces a number that when xored with first we get second and when xored with second we get first. Pretty slick! Let's take a deeper look. Remember the xor table?

XOR01
001
110

Now, a pretty cool characteristic about xor is that when you xor twice you get back where you started:

0011
0101
0110

And once again…

0110
0101
0011

Bingo!

Now lets breakdown our new method. Consider A to be the first value, B the second and C the xored value.

first ^= second; -> first is C / second is B
second ^= first; -> first is C / second is A
first ^= second; -> first is B / second is A

This got me thinking a little. Of course this has absolutely no use in real life code, but it seemed as in terms of performance we could have a little improvement. So, no time to waste, I benchmarked them:

static void Main(string[] args)
{
    int a = 15;
    int b = 320;

    // Warm up
    SwapIntsXor (ref a,ref b);
    SwapInts (ref a, ref b);

    Benchmark(100000000, 5);
}

static void Benchmark(int innerIterations, int outterIterations)
{
    int a = 15;
    int b = 320;
    Stopwatch sw = new Stopwatch();

    for (int i = 0; i < outterIterations; i++)
    {
        sw.Start();
        for (int j = 0; j < innerIterations; j++)
        {
            SwapInts(ref a, ref b);
        }
        sw.Stop();
        Console.Write("SwapInts: {0:ss.fff} - ", new DateTime(sw.Elapsed.Ticks));
        sw.Reset();

        sw.Start();
        for (int j = 0; j < innerIterations; j++)
        {
            SwapIntsXor(ref a, ref b);
        }
        sw.Stop();
        Console.WriteLine("SwapIntsXor: {0:ss.fff}", new DateTime(sw.Elapsed.Ticks));
        sw.Reset();
    }
}

Output:

SwapInts: 00.190 - SwapXor: 00.558
SwapInts: 00.185 - SwapXor: 00.548
SwapInts: 00.189 - SwapXor: 00.547
SwapInts: 00.187 - SwapXor: 00.554
SwapInts: 00.187 - SwapXor: 00.546
Press any key to continue . . .

The conventional swap is faster than the new one. I can't stress enough that this was just for experimental purposes and to satisfy my own curiosity. By any means, even if with xor was significantly faster, would I recommend it since the conventional one is well known, straight forward and much, much easier to understand.

Anyways, since I've never really tried to understand any CIL, this seemed like a sweet opportunity to take the first steps and understand what makes it slower. Let's analyze the SwapInts method:

.method private hidebysig static void  SwapInts(int32& first,
                                                int32& second) cil managed
{
  // Code size       11 (0xb)
  .maxstack  2
  .locals init ([0] int32 aux)
  IL_0000:  ldarg.0
  IL_0001:  ldind.i4
  IL_0002:  stloc.0
  IL_0003:  ldarg.0
  IL_0004:  ldarg.1
  IL_0005:  ldind.i4
  IL_0006:  stind.i4
  IL_0007:  ldarg.1
  IL_0008:  ldloc.0
  IL_0009:  stind.i4
  IL_000a:  ret
} // end of method Program::SwapInts
  • IL_0000 Pushes first's address into the stack.
  • IL_0001 Pops the address to fetch and push the actual Int32 value.
  • IL_0002 Pops first's value and stores it into local variable aux.
  • IL_0003 Pushes first's address into the stack.
  • IL_0004 Pushes second's address into the stack.
  • IL_0005 Pops the address to fetch and push the actual Int32 value.
  • IL_0006 Pops second's value and first's address to store the value at the address.
  • IL_0007 Pushes second's address into the stack.
  • IL_0008 Pushes the value at local variable aux.
  • IL_0009 Pops aux's value and second's address to store the value at the address.
  • IL_000a Gets the hell outta here.

Click to display diagram.

To sum up, this method uses only 2 levels of the stack, declares 1 local variable, loads 4 addresses, loads 2 values through these addresses, loads 1 value through a local variable, stores 2 values to addresses, stores 1 value to a local variable and performs no operations.

Note: The stack mentioned above is a conceptual stack called the Virtual Execution System. This is not "the stack" everyone talks about when referring to where the local and parameter value types are stored. That being said lets move on...

Now we should check out the SwapIntsXor method:

.method private hidebysig static void  SwapIntsXor(int32& first,
                                                   int32& second) cil managed
{
  // Code size       22 (0x16)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  dup
  IL_0002:  ldind.i4
  IL_0003:  ldarg.1
  IL_0004:  ldind.i4
  IL_0005:  xor
  IL_0006:  stind.i4
  IL_0007:  ldarg.1
  IL_0008:  dup
  IL_0009:  ldind.i4
  IL_000a:  ldarg.0
  IL_000b:  ldind.i4
  IL_000c:  xor
  IL_000d:  stind.i4
  IL_000e:  ldarg.0
  IL_000f:  dup
  IL_0010:  ldind.i4
  IL_0011:  ldarg.1
  IL_0012:  ldind.i4
  IL_0013:  xor
  IL_0014:  stind.i4
  IL_0015:  ret
} // end of method Program::SwapIntsXor

It's seems heavier already! Let's start:

  • IL_0000 Pushes first's address into the stack.
  • IL_0001 Duplicates the top value of the stack.
  • IL_0002 Pops the address to fetch and push the actual Int32 value.
  • IL_0003 Pushes second's address into the stack.
  • IL_0004 Pops the address to fetch and push the actual Int32 value.
  • IL_0005 Pops second's value and first's value to perform a xor and push the result.
  • IL_0006 Pops the xor's result and first's address to store the value at the address.
  • IL_0007 Pushes second's address into the stack.
  • IL_0008 Duplicates the top value of the stack.
  • IL_0009 Pops the address to fetch and push the actual Int32 value.
  • IL_000a Pushes first's address into the stack.
  • IL_000b Pops the address to fetch and push the actual Int32 value.
  • IL_000c Pops first's value and second's value to perform a xor and push the result.
  • IL_000d Pops the xor's result and second's address to store the value at the address.
  • IL_000e Pushes first's address into the stack.
  • IL_000f Duplicates the top value of the stack.
  • IL_0010 Pops the address to fetch and push the actual Int32 value.
  • IL_0011 Pushes second's address into the stack.
  • IL_0012 Pops the address to fetch and push the actual Int32 value.
  • IL_0013 Pops second's value and first's value to perform a xor and push the result.
  • IL_0014 Pops the xor's result and first's address to store the value at the address.
  • IL_0015 Gets the hell outta here.

Click to display diagram.

To sum up, this method uses 3 levels of stack, declares no local variables, loads 6 addresses, duplicates 3 of these, loads 6 values through these addresses, stores 3 values to addresses and performs 3 operations. This one is definitely heavier than the previous.

I believe is worth noting that CIL doesn't necessarily translate directly or proportionally to machine code, it's just one step closer. Analyzing the machine code is a whole other story; a story I don't plan getting close to anywhere in the near future.

Random thoughts: I wonder why .maxstack is set on 8 for SwapIntXor when 3 would suffice, but for what I've seen this is pretty common. Anyhow, I have no idea if anyone will find all this information useful at all, but it was an extremely helpful exercise for me and a lot of fun.

Edit: Switched the benchmark. It was previously done in debug mode.

Follow up: Who's of First

No comments:

Post a Comment