Java: Local variables inside a loop and performance

Overview

Sometimes a question comes up about how much work allocating a new local variable takes. My feeling has always been that the code is optimised to the point where this cost is static, i.e. paid once, not each time the code is run.

Recently Ishwor Gurung suggested considering moving some local variables outside a loop. I suspected it wouldn’t make a difference but I had never tested to see if this was the case.

The test

This is the test I ran:

public static void main(String... args) {
    for (int i = 0; i < 10; i++) {
        testInsideLoop();
        testOutsideLoop();
    }
}

private static void testInsideLoop() {
    long start = System.nanoTime();
    int[] counters = new int[144];
    int runs = 200 * 1000;
    for (int i = 0; i < runs; i++) {
        int x = i % 12;
        int y = i / 12 % 12;
        int times = x * y;
        counters[times]++;
    }
    long time = System.nanoTime() - start;
    System.out.printf("Inside: Average loop time %.1f ns%n", (double) time / runs);
}

private static void testOutsideLoop() {
    long start = System.nanoTime();
    int[] counters = new int[144];
    int runs = 200 * 1000, x, y, times;
    for (int i = 0; i < runs; i++) {
        x = i % 12;
        y = i / 12 % 12;
        times = x * y;
        counters[times]++;
    }
    long time = System.nanoTime() - start;
    System.out.printf("Outside: Average loop time %.1f ns%n", (double) time / runs);
}

and the output ended with

Inside: Average loop time 3.6 ns
Outside: Average loop time 3.6 ns
Inside: Average loop time 3.6 ns
Outside: Average loop time 3.6 ns

Increasing the run count to 100 million iterations made little difference to the results.

Inside: Average loop time 3.8 ns
Outside: Average loop time 3.8 ns
Inside: Average loop time 3.8 ns
Outside: Average loop time 3.8 ns

I then replaced the modulus and multiplication with >>, & and +:

int x = i & 15;
int y = (i >> 4) & 15;
int times = x + y;

This prints

Inside: Average loop time 1.2 ns
Outside: Average loop time 1.2 ns
Inside: Average loop time 1.2 ns
Outside: Average loop time 1.2 ns
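
Note that the shift-and-mask version is a cheaper workload, not the same calculation as the % 12 version. For non-negative i, i & 15 equals i % 16 and (i >> 4) & 15 equals i / 16 % 16, so it works in base 16 rather than base 12. A minimal sketch to check this follows; the class and method names are made up for illustration and are not part of the original test.

```java
// Sketch: check that the shift/mask expressions match the power-of-two
// modulus forms. ShiftMaskCheck and its methods are hypothetical names.
final class ShiftMaskCheck {

    /** True if the shift/mask forms match % 16 and / 16 % 16 for every i in [0, limit). */
    static boolean matchesPowerOfTwoModulus(int limit) {
        for (int i = 0; i < limit; i++) {
            if ((i & 15) != i % 16) {
                return false;
            }
            if (((i >> 4) & 15) != i / 16 % 16) {
                return false;
            }
        }
        return true;
    }

    public static void main(String... args) {
        System.out.println("Matches for 0..999999: " + matchesPowerOfTwoModulus(1_000_000));
        // For negative values the two forms differ: & always yields 0..15,
        // while % keeps the sign of the dividend.
        System.out.println("-1 & 15 = " + (-1 & 15) + ", -1 % 16 = " + (-1 % 16));
    }
}
```

Since x and y are both in the range 0 to 15 here, x + y is at most 30, so it still fits in the 144-element counters array.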

While modulus is relatively expensive, the test resolves the average time to 0.1 ns, which is less than 1/3 of a clock cycle. Any difference between the two tests larger than that would show up in these results.
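
To put the nanosecond figures in terms of clock cycles, multiply by the clock speed in GHz (1 GHz is one cycle per nanosecond). The sketch below assumes a 3.0 GHz clock, which is a hypothetical figure chosen for illustration; the clock speed of the test machine isn't stated here. At that speed 0.1 ns is 0.3 of a cycle.

```java
// Sketch: convert a time in nanoseconds to CPU clock cycles.
// The 3.0 GHz clock speed is an assumption, not the test machine's actual speed.
final class ClockCycles {

    /** cycles = nanoseconds * GHz, because 1 GHz is one cycle per nanosecond. */
    static double toCycles(double nanos, double clockGHz) {
        return nanos * clockGHz;
    }

    public static void main(String... args) {
        double assumedGHz = 3.0; // hypothetical clock speed
        System.out.printf("0.1 ns = %.2f cycles%n", toCycles(0.1, assumedGHz));
        System.out.printf("3.6 ns = %.1f cycles%n", toCycles(3.6, assumedGHz));
        System.out.printf("1.2 ns = %.1f cycles%n", toCycles(1.2, assumedGHz));
    }
}
```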

Using Caliper

As @maaartinus comments, Caliper is a micro-benchmarking library, so I was interested in how much slower it might be than doing the timing by hand.

public static void main(String... args) {
    Runner.main(LoopBenchmark.class, args);
}

public static class LoopBenchmark extends SimpleBenchmark {
    public void timeInsideLoop(int reps) {
        int[] counters = new int[144];
        for (int i = 0; i < reps; i++) {
            int x = i % 12;
            int y = i / 12 % 12;
            int times = x * y;
            counters[times]++;
        }
    }

    public void timeOutsideLoop(int reps) {
        int[] counters = new int[144];
        int x, y, times;
        for (int i = 0; i < reps; i++) {
            x = i % 12;
            y = i / 12 % 12;
            times = x * y;
            counters[times]++;
        }
    }
}

The first thing to note is that the code is shorter, as it doesn't include the timing and printing boilerplate. Running this on the same machine as the first test, I get

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 4.23 ==============================
OutsideLoop 4.23 =============================

vm: java
trial: 0

Replacing the modulus with shift and AND gives

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 1.27 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 1.27 ns; σ=0.00 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 1.27 =============================
OutsideLoop 1.27 ==============================

vm: java
trial: 0

This is consistent with the first result: the modulus test is only about 0.4 to 0.6 ns slower (about two clock cycles), and there is next to no difference for the shift, and, plus test. This may be due to the way Caliper samples the data, but it doesn't change the outcome.

It is worth noting that when running real programs, you typically get longer times than in a micro-benchmark, as the program will be doing more things, so caching and branch prediction are not as ideal. A small overestimate of the time taken may be closer to what you can expect to see in a real program.

Conclusion

This indicates to me that in this case it made no difference. I still suspect the cost of allocating local variables is paid once, when the code is compiled by the JIT, and there is no per-iteration cost to consider.
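
A timing comparison like this only means something if both variants do the same work. A minimal sanity check, assuming the two loop bodies are lifted into methods that return their counters (the class and method names are hypothetical), might look like this. Returning the array also means the loop's result is actually used by the caller.

```java
import java.util.Arrays;

// Sketch: confirm that declaring the locals inside or outside the loop
// produces identical results. LoopVariants and its methods are hypothetical names.
final class LoopVariants {

    /** Locals declared inside the loop body. */
    static int[] insideLoop(int runs) {
        int[] counters = new int[144];
        for (int i = 0; i < runs; i++) {
            int x = i % 12;
            int y = i / 12 % 12;
            int times = x * y; // at most 11 * 11 = 121, so always in bounds
            counters[times]++;
        }
        return counters;
    }

    /** Locals declared once, before the loop. */
    static int[] outsideLoop(int runs) {
        int[] counters = new int[144];
        int x, y, times;
        for (int i = 0; i < runs; i++) {
            x = i % 12;
            y = i / 12 % 12;
            times = x * y;
            counters[times]++;
        }
        return counters;
    }

    public static void main(String... args) {
        int runs = 200 * 1000;
        boolean same = Arrays.equals(insideLoop(runs), outsideLoop(runs));
        System.out.println("Same counters: " + same);
    }
}
```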

Source: http://vanillajava.blogspot.ru/2012/12/local-variables-inside-loop-and.html