Tuesday, July 15, 2014

Papa Parse 3.0 is here, and it's boss

After months of rewriting Papa Parse from the ground up, version 3.0 is finally here. In case you don't know, Papa Parse is a CSV library for JavaScript. With it, you can parse CSV files or strings into JSON, and convert JSON back to CSV.

(Nerd alert: Today was also the day of csv,conf in Berlin. Isn't that awesome? I had no idea.)

Papa Parse 3 is a major release. Here's a quick look at what's new.

First, a quick warning: 3.0 is a breaking change. It's not a drop-in replacement for 2.1... you will break your app! The API is slightly different, and the results are structured differently. Read on for more information.

New results structure

Previously, parse results were returned as "results", "errors", and "meta", and "results" contained the parsed data, or if using a header row, "fields" and "rows". This was confusing and led to awkward code that looked like results.results.rows[0][2] to access any data. The new structure of results is much more consistent and intuitive:

{
  data: // array of parse results
  errors: // array of errors
  meta: // object with extra info
}

The "data" property only ever contains the parsed data as an array, where each element in the array represents a row. In the case of a header row, the "fields" have been moved into "meta" where they belong.
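To make the new shape a bit more concrete, here's a rough sketch that reads values out of a results object. The sample object is written by hand (it is not captured Papa Parse output), the cellAt helper is hypothetical, and it assumes that with a header row each element of "data" is an object keyed by field name.

```javascript
// Hypothetical results object, written by hand to mirror the structure above
// for a parse with a header row (assumed: one object per row, keyed by field).
var results = {
  data: [
    { name: "Ada", lang: "JS" },
    { name: "Linus", lang: "C" }
  ],
  errors: [],
  meta: { fields: ["name", "lang"] }
};

// Hypothetical helper: look up a cell by row index and column index.
// For object rows, meta.fields translates the column index into a field name.
function cellAt(results, rowIndex, colIndex) {
  var row = results.data[rowIndex];
  if (row === undefined) return undefined;
  // Without a header row, each row is assumed to be a plain array.
  if (Array.isArray(row)) return row[colIndex];
  return row[results.meta.fields[colIndex]];
}

console.log(cellAt(results, 1, 0)); // Linus
```

Either way, there is only one level to dig through: results.data for the rows, results.meta for everything about them.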

Header and dynamic typing are off by default

In Papa Parse 2.x, the header and dynamic typing were enabled by default. Now, the default is false/off, which is more intuitive. If you want any fanciness, you have to turn it on explicitly.
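If you relied on the old defaults, one way to opt back in is to pass both settings yourself. This is a minimal sketch: the legacyStyleConfig helper is hypothetical, and the option names header and dynamicTyping are assumed to match the documentation.

```javascript
// Hypothetical helper: build a config that switches the 2.x-style behavior
// back on. Option names `header` and `dynamicTyping` are assumed from the docs.
function legacyStyleConfig(overrides) {
  var config = { header: true, dynamicTyping: true };
  for (var key in overrides || {}) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      config[key] = overrides[key];
    }
  }
  return config;
}

// Only call Papa when it is actually loaded, so the snippet runs anywhere.
if (typeof Papa !== "undefined") {
  var parsed = Papa.parse("id,ok\n1,true", legacyStyleConfig());
  console.log(parsed.data, parsed.meta.fields);
}
```

Anything you pass in overrides wins, so you can still turn one of the two back off for a particular call.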

Eliminated the jQuery dependency

Papa Parse is now fully "Papa Parse" and not the "jQuery Parse Plugin" - the files and repository have been renamed to accommodate. Where before you would invoke $.parse(), now you simply call Papa.parse(). Much more elegant.

Technically, Papa Parse is still a jQuery plugin. If jQuery is defined, it still has the familiar $('input[type=file]').parse(...) binding that you may have used to parse local files. This interface has been improved and parsing files has never been easier.

Since Papa has been completely de-coupled from jQuery, it's easier to use in Node and on pages that don't have or want jQuery brought in.

Unparse - convert JSON to CSV

Papa's specialty is parsing CSV into JSON, or JavaScript objects. But now it can export CSV too. It's easy to use:

var csv = Papa.unparse([ ["1-1", "1-2", "1-3"], ["2-1", "2-2", "2-3"] ]);
// 1-1,1-2,1-3
// 2-1,2-2,2-3

Here we passed in an array of arrays, but you could also pass in an array of objects. Even more settings are described in the documentation.
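Passing objects saves you from flattening them yourself. Just to show how the two input shapes relate, here's a sketch of doing that conversion by hand. The objectsToRows helper is hypothetical and not part of Papa Parse, and it assumes you want the keys of the first object as the header row.

```javascript
// Hypothetical helper, not part of Papa Parse: flatten an array of objects
// into the array-of-arrays form, using the first object's keys as the header row.
function objectsToRows(records) {
  if (records.length === 0) return [];
  var fields = Object.keys(records[0]);
  var rows = records.map(function (record) {
    return fields.map(function (field) { return record[field]; });
  });
  return [fields].concat(rows);
}

var rows = objectsToRows([
  { id: 1, name: "Ada" },
  { id: 2, name: "Linus" }
]);
// rows is [["id", "name"], [1, "Ada"], [2, "Linus"]]

// Only call Papa when it is actually loaded, so the snippet runs anywhere.
if (typeof Papa !== "undefined") {
  console.log(Papa.unparse(rows));
}
```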

Run in web worker

Long-running scripts, like parsing large files or strings, can lock up the browser. No bueno. Papa Parse 3 can spawn a worker thread and delegate the heavy lifting away from your page. This means your page will stay responsive to mouse clicks, scrolling, etc., while heavy-duty parsing is taking place.

Web workers are actually kind of a pain in some sense, but Papa makes it easy. Just say worker: true:

Papa.parse(file, {
  worker: true,
  complete: function(results, file) { ... }
});

Download and parse files over the Internet

Papa has been able to parse local files using FileReader for a while. But now it's just as easy to download remote files and parse them. Downloading a file isn't hard even without Papa, but the advantage here is that you can stream it. So if you have a large file, let's say 200 MB, sitting on another machine, you can give Papa the URL and it will download the file in chunks and feed you the results row by row, rather than loading the whole thing into memory. Big win!

Papa.parse("/files/big.csv", {
  download: true,
  step: function(data) { ... },
  complete: function(results) { ... }
});

Those are the most notable new features and changes in version 3.0. There's a bunch of other stuff under the hood, too, that you'll benefit from.

Now go get your feet wet with the demo page, or visit the project on GitHub.