Monday, June 3, 2013

Reducing memory usage in Go programs ("golang")

For a particular task, I needed to write a Go program that reads an 800 MB CSV file containing nothing but a table of numbers. The table serves as an index: each number is an integer pointing to a location in an ordered list stored in another file (an efficient way to index large amounts of data with moderate to high duplication).

Loading the table into a [][]int used about 9 GB of memory. It turns out that most of the six fields need only one byte to store their values, not the 8 bytes that the int type occupies on a 64-bit platform. See Go's documentation on integer types; it is important to know their bounds.
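To make the bounds concrete, here is a minimal sketch that prints the size of each signed integer type and picks the narrowest one for a given value range. The helper name smallestIntBytes is my own invention for illustration; it is not part of the original program.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// smallestIntBytes (hypothetical helper) reports the width in bytes of the
// narrowest signed integer type (int8, int16, int32, int64) that can hold
// every value in the inclusive range [lo, hi].
func smallestIntBytes(lo, hi int64) int {
	switch {
	case lo >= math.MinInt8 && hi <= math.MaxInt8:
		return 1
	case lo >= math.MinInt16 && hi <= math.MaxInt16:
		return 2
	case lo >= math.MinInt32 && hi <= math.MaxInt32:
		return 4
	default:
		return 8
	}
}

func main() {
	// Fixed-width types have the same size on every platform.
	fmt.Println("int8  bytes:", unsafe.Sizeof(int8(0)))
	fmt.Println("int16 bytes:", unsafe.Sizeof(int16(0)))
	fmt.Println("int32 bytes:", unsafe.Sizeof(int32(0)))
	fmt.Println("int64 bytes:", unsafe.Sizeof(int64(0)))
	// Plain int is platform dependent: 8 bytes on 64-bit, 4 on 32-bit.
	fmt.Println("int   bytes:", unsafe.Sizeof(int(0)))

	// A column whose values stay between 0 and 100 fits in an int8.
	fmt.Println("bytes needed for 0..100:", smallestIntBytes(0, 100))
}
```

Running a check like this over each column's minimum and maximum is one way to decide which type a field can safely use.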

By defining a struct whose fields use int8, int16 and, in just one case, int32, and loading the CSV file into a []MyStruct instead of a [][]int, my program now uses about 1/3 of the memory (about 3 GB) and is just as fast. Where one row's values used to take about 48 bytes (six 8-byte ints, times 46 million rows, which is a lot), each row now takes only about 16 bytes, which is, that's right, 1/3 the size.
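A rough sketch of the idea follows. The field names and the exact mix of types are assumptions made up for illustration (the real layout is not shown in this post), so the size printed here will not match the 16 bytes quoted above. The helpers rowSize and parseRow are likewise hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// Row is a hypothetical stand-in for MyStruct: six fields, each using the
// narrowest type assumed to be wide enough. Fields are ordered largest
// first so the compiler inserts as little padding as possible.
type Row struct {
	Offset int32
	GroupA int16
	GroupB int16
	FlagA  int8
	FlagB  int8
	FlagC  int8
}

// rowSize returns the in-memory size of one Row, padding included.
func rowSize() uintptr { return unsafe.Sizeof(Row{}) }

// parseRow converts one CSV record into a Row. Passing the bit size to
// strconv.ParseInt makes it return an error when a value does not fit
// the target type, instead of silently truncating it.
func parseRow(fields []string) (Row, error) {
	if len(fields) != 6 {
		return Row{}, fmt.Errorf("expected 6 fields, got %d", len(fields))
	}
	bits := [6]int{32, 16, 16, 8, 8, 8}
	var v [6]int64
	for i, f := range fields {
		n, err := strconv.ParseInt(f, 10, bits[i])
		if err != nil {
			return Row{}, fmt.Errorf("field %d: %w", i, err)
		}
		v[i] = n
	}
	return Row{
		Offset: int32(v[0]),
		GroupA: int16(v[1]),
		GroupB: int16(v[2]),
		FlagA:  int8(v[3]),
		FlagB:  int8(v[4]),
		FlagC:  int8(v[5]),
	}, nil
}

func main() {
	r, err := parseRow([]string{"123456", "1000", "-7", "1", "2", "3"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v takes %d bytes; six plain ints take %d\n",
		r, rowSize(), unsafe.Sizeof([6]int{}))
}
```

In a real loader, each record returned by a CSV reader would go through something like parseRow and be appended to a []Row. Storing the structs directly in one slice also avoids the separate allocation and slice header that every inner slice of a [][]int carries.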