The Hidden Cost of Array Iteration: A Java Performance Deep Dive

This technical analysis explores performance implications of different array iteration strategies in Java, comparing row-order versus column-order traversal in multi-dimensional arrays. Through practical code examples, it demonstrates how cache-friendly access patterns can result in up to 50x performance improvements. The article also examines alternative approaches using single-dimensional arrays and the Arrays.fill() method, providing valuable insights for optimizing array operations in Java applications. Key findings suggest avoiding multi-dimensional arrays when performance is critical and considering cache-friendly iteration patterns for optimal execution speed.

3 Minutes reading time

Behold the masterpiece that AI hallucinated while reading this post:

"The Tale of the Speedy Row and the Sluggish Column: A Story About Array Adventures"

(after I fed it way too many marketing blogs and memes)

Created using DALL-E 3

AI-Generated: The Tale of the Speedy Row and the Sluggish Column: A Story About Array Adventures

Lets check a very simple example: we want to fill a multi-dimensional array with values. What is the fastest way to do so? The following piece of code gives us surprising results:

public class CacheTest {

    final int SIZE = 256;

    void rowOrderTest() {
        int theArray[][][] = new int[SIZE][SIZE][SIZE];
        long theTime = System.currentTimeMillis();
        for (int i = 0; i <SIZE; i++) {
            for (int j = 0; j <SIZE; j++) {
                for (int k = 0; k <SIZE; k++) {
                    theArray[i][j][k] = 1;
                }
            }
        }
        System.out.println("RowOrder : " + (System.currentTimeMillis() - theTime));
    }

    void columnOrderTest() {
        int theArray[][][] = new int[SIZE][SIZE][SIZE];
        long theTime = System.currentTimeMillis();
        for (int k = 0; k <SIZE; k++) {
            for (int j = 0; j <SIZE; j++) {
                for (int i = 0; i <SIZE; i++) {
                    theArray[i][j][k] = 1;
                }
            }
        }
        System.out.println("ColumnOrder : " + (System.currentTimeMillis() - theTime));
    }

    void arrayTest1() {

        long theCounter = 0;

        int theArray[] = new int[SIZE*SIZE*SIZE];
        long theTime = System.currentTimeMillis();
        for (int i = 0; i <SIZE; i++) {
            for (int j = 0; j <SIZE; j++) {
                for (int k = 0; k <SIZE; k++) {
                    theArray[i*SIZE*SIZE+j*SIZE+k] = 1;
                    theCounter++;
                }
            }
        }
        System.out.println("ArrayTest1 : " + (System.currentTimeMillis() - theTime) + " #" + theCounter);
    }

    void arrayTest2() {

        int theArray[] = new int[SIZE*SIZE*SIZE];
        long theTime = System.currentTimeMillis();
        Arrays.fill(theArray, 0);
        System.out.println("ArrayTest2 : " + (System.currentTimeMillis() - theTime));
    }

    void arrayTest3() {

        int theCounter = 0;

        int theArray[] = new int[SIZE*SIZE*SIZE];
        long theTime = System.currentTimeMillis();
        for (int i = 0; i <SIZE; i++) {
            for (int j = 0; j <SIZE; j++) {
                for (int k = 0; k <SIZE; k++) {
                    theArray[theCounter] = 1;
                    theCounter++;
                }
            }
        }
        System.out.println("ArrayTest3 : " + (System.currentTimeMillis() - theTime) + " #" + theCounter);
    }

     public static void main(String[] args) {
        CacheTest theTest = new CacheTest();
        theTest.rowOrderTest();
        theTest.columnOrderTest();
        theTest.arrayTest1();
        theTest.arrayTest2();
        theTest.arrayTest3();
    }
}

The rowOrderTest() Method finishes on my machine within 42ms, the columnOrderTest() Method takes about 2195ms to complete, so it is more than 50 times slower than rowOrderTest(), but it does exactly the same! The columnOrderTest() Method just trashes the VM and CPU read ahead caches, so this explains the deadly slow down. This effect can be reproduced with long and byte as data types also.

Some unexpected results are coming from the arrayTest1(), arrayTest2() and arrayTest3() Methods. Here i do not use a three dimensional array, i use a one dimensional array and emulate the other dimensions by offset multiplication. The arrayTest1() Method includes some little math to compute the array index, but on my machine it takes about 33ms to complete, so almost as fast as the rowOrderTest() Method. The arrayTest2() Method uses the SDK Array.fill Method to complete the task, and it still takes about 33ms to complete. The manual array fill in arrayTest3() takes about 36ms to complete, which is almost as fast as Array.fill(). Under the hood, Arrays.fill() does nothing else, so this is not surprising.

Always check for array indexes and iteration algorithms, there might be room for performance improvements. From a performance point of view the best would be to avoid multi dimensional arrays at all and use other data structures.

Git revision: 2e692ad