Monday, October 14, 2013

Parse large CSV files with JavaScript efficiently

Wouldn't it be nice if users could use CSV files in your web app without having to upload them to a remote server? If the files just stay on their own computer, the privacy concerns basically disappear.

Local or remote, though, CSV parsing is hard. There's not a strict standard, so each implementation is a little different. And when I say CSV, I don't necessarily mean literal "comma-separated values" -- the delimiter could be any character. It's also common to see tabs, pipes, and semicolons as separators.

I work with a lot of delimited text data, and too often I find CSV files that are malformed. Usually quotes are not escaped, or quotes are missing around a field that has special characters like newlines, delimiters, or another quote.
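
To make "properly escaped" concrete, here is a minimal sketch of how a well-formed field could be written out. It assumes the common convention (the one RFC 4180 describes) of wrapping a field in double quotes and doubling any quote inside it. The names quoteField and toCsvLine are made up for this illustration; they are not part of any library.

```javascript
// Hypothetical helpers for illustration only (not a library API).

// Quote a single field if it contains anything that would confuse a parser.
function quoteField(value, delimiter) {
    var text = String(value);
    // A field needs quotes if it holds the delimiter, a quote, or a line break.
    var needsQuotes = text.indexOf(delimiter) !== -1 ||
                      text.indexOf('"') !== -1 ||
                      /[\r\n]/.test(text);
    if (!needsQuotes) {
        return text;
    }
    // Escape embedded quotes by doubling them, then wrap the whole field.
    return '"' + text.replace(/"/g, '""') + '"';
}

// Join already-quoted fields into one line of delimited text.
function toCsvLine(fields, delimiter) {
    return fields.map(function (field) {
        return quoteField(field, delimiter);
    }).join(delimiter);
}

var line = toCsvLine(['a', 'b,c', 'say "hi"'], ',');
```

Files that skip either step (the wrapping or the doubling) are the malformed ones I keep running into.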

But typical CSV parsers share the blame: many of them choke on input like this instead of recovering from it. We can do better.

So here's a way to easily parse CSV with JavaScript while keeping the process as efficient and error-free as possible, even if you encounter malformed content.

Meet Papa Parse

Papa Parse is the result of many hours of effort from several contributors to bring a full-featured, powerful delimited text parser to the world of JavaScript. Most parsers just split on commas or don't handle large files well. Papa Parse is robust and easy to use.

Get Papa Parse from GitHub or try a demo on papaparse.com.

Here's a simple example of parsing a CSV / delimited text string with some custom settings:

var results = Papa.parse(csvString, {
    delimiter: ",",      // omit this to let Papa Parse guess the delimiter
    header: true,        // the first row holds the field names
    dynamicTyping: true  // numeric values become numbers
});

The second argument (the config) is optional. Papa Parse can automatically guess the delimiter if you don't specify one. Dynamic typing automatically turns numeric values into numbers for you. Here, we've also specified a header row, so each row of data will be keyed by field name.
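
If you're curious what those three options amount to, here is a rough sketch of the ideas. This is not how Papa Parse implements them: guessDelimiter, toTyped, and naiveParse are hypothetical names, and the parser deliberately ignores quoted fields, so it only works on clean input.

```javascript
// Illustrative only: none of these helpers are Papa Parse functions.

// Guess the delimiter by counting each candidate in the first line.
function guessDelimiter(text) {
    var firstLine = text.split(/\r?\n/)[0];
    var candidates = [",", "\t", "|", ";"];
    var best = candidates[0];  // fall back to a comma if nothing is found
    var bestCount = 0;
    candidates.forEach(function (candidate) {
        var count = firstLine.split(candidate).length - 1;
        if (count > bestCount) {
            best = candidate;
            bestCount = count;
        }
    });
    return best;
}

// "Dynamic typing": turn numeric-looking strings into numbers.
function toTyped(value) {
    return /^-?\d+(\.\d+)?$/.test(value) ? parseFloat(value) : value;
}

// Naive parse: assumes no quoted fields, so it is NOT a robust CSV parser.
function naiveParse(text) {
    var delimiter = guessDelimiter(text);
    var lines = text.split(/\r?\n/).filter(function (line) {
        return line !== "";
    });
    var fields = lines[0].split(delimiter);  // header row
    return lines.slice(1).map(function (line) {
        var row = {};
        line.split(delimiter).forEach(function (value, i) {
            row[fields[i]] = toTyped(value);  // key each value by field name
        });
        return row;
    });
}

var rows = naiveParse("Name;Age\nAda;36\nLinus;44");
```

Here rows[0].Age is the number 36 rather than the string "36", and the semicolon was picked up without being specified. A real parser has to do all of this while also coping with quotes and line breaks inside fields, which is where the simple approach falls apart.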

To access a value, say, if you're iterating the resulting rows:

results.data[i]["Field Name"]
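
In a loop, that could look like the sketch below. The results object here is a hand-written stand-in so the snippet runs on its own; only the data array of rows keyed by field name comes from the description above, and the field names and values are invented.

```javascript
// Hand-written stand-in for a parse result (sample values are made up).
var results = {
    data: [
        { "Field Name": "first",  "Amount": 10 },
        { "Field Name": "second", "Amount": 20 }
    ]
};

var names = [];
var total = 0;
for (var i = 0; i < results.data.length; i++) {
    var row = results.data[i];
    names.push(row["Field Name"]);  // bracket syntax copes with spaces in names
    total += row["Amount"];
}
```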

To parse a file:

$('input[type=file]').parse({
    config: {
        complete: function(results) {
            console.log("Parse results:", results);
        }
    }
});

To stream and parse potentially very large CSV files:

$('input[type=file]').parse({
    config: {
        step: function(data, file, inputElem) {
            console.log("Row data:", data.results);
            console.log("Row errors:", data.errors);
        },
        complete: function() {
            console.log("All done!");
        }
    }
});
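
Why does a step callback help with large files? Because each row can be handed off as soon as it is complete, so the whole file never has to sit in memory at once. Here is a minimal sketch of that idea, not Papa Parse's actual internals. createRowStreamer is a hypothetical name, and the sketch assumes "\n" line endings and no quoted fields.

```javascript
// Illustrative sketch only; createRowStreamer is not a Papa Parse function.
// Assumes "\n" line endings and no quoted fields.
function createRowStreamer(delimiter, step, complete) {
    var leftover = "";  // partial line carried over between chunks

    return {
        // Feed the next chunk of text; emits every complete line in it.
        write: function (chunk) {
            var lines = (leftover + chunk).split("\n");
            leftover = lines.pop();  // the last piece may be an unfinished line
            lines.forEach(function (line) {
                step(line.split(delimiter));
            });
        },
        // Signal end of input; flushes the final line if there is one.
        end: function () {
            if (leftover !== "") {
                step(leftover.split(delimiter));
            }
            leftover = "";
            complete();
        }
    };
}

var seen = [];
var done = false;
var streamer = createRowStreamer(",",
    function (row) { seen.push(row); },
    function () { done = true; });

// Chunks can split a row anywhere, even in the middle of a line.
streamer.write("a,b\nc,");
streamer.write("d\ne,f");
streamer.end();
```

Only the unfinished tail of the current chunk is kept between calls, which is what keeps memory use flat no matter how big the input gets.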

Visit PapaParse.com for full documentation and demos.

Papa Parse will always report errors and do its best to handle malformed CSV content.