Tuesday, July 15, 2014

Papa Parse 3.0 is here, and it's boss

After months of re-writing Papa Parse from the ground up, version 3.0 is finally here. In case you don't know, Papa Parse is a CSV library for Javascript. With it, you can parse CSV files or strings into JSON, and convert JSON back to CSV.

(Nerd alert: Today was also the day of csv,conf in Berlin. Isn't that awesome? I had no idea.)

Papa Parse 3 is a major release. Here's a quick look at what's new.

First, a quick warning: 3.0 is a breaking change. It's not a drop-in replacement for 2.1... you will break your app! The API is slightly different, and the results are structured differently. Read on for more information.

New results structure

Previously, parse results were returned as "results", "errors", and "meta", and "results" contained the parsed data, or, if using a header row, "fields" and "rows". This was confusing and led to awkward code like results.results.rows[0][2] just to access a value. The new results structure is much more consistent and intuitive:

{
	data:   // array of parse results
	errors: // array of errors
	meta:   // object with extra info
}

The "data" property only ever contains the parsed data as an array, where each element in the array represents a row. In the case of a header row, the "fields" have been moved into "meta" where they belong.

Header and dynamic typing are off by default

In Papa Parse 2.x, the header row and dynamic typing were enabled by default. Now they both default to off, which is more intuitive. If you want any fanciness, you have to turn it on explicitly.
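
For example, to get the 2.x-style behavior back, you opt in like this (a quick sketch; see the docs for all the config options):

var results = Papa.parse("age,name\n25,Bob", {
	header: true,
	dynamicTyping: true
});
// results.data[0]     -> { age: 25, name: "Bob" }
// results.meta.fields -> ["age", "name"]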

Eliminated the jQuery dependency

Papa Parse is now fully "Papa Parse" and not the "jQuery Parse Plugin" - the files and repository have been renamed accordingly. Where before you would invoke $.parse(), now you simply call Papa.parse(). Much more elegant.

Technically, Papa Parse is still a jQuery plugin. If jQuery is defined, it still has the familiar $('input[type=file]').parse(...) binding that you may have used to parse local files. This interface has been improved and parsing files has never been easier.
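
For instance, here's a quick sketch of parsing files chosen by the user (the config object here is my assumption; check the docs for the plugin's exact options):

$('input[type=file]').parse({
	config: {
		complete: function(results, file) {
			console.log("Finished parsing", file.name);
		}
	}
});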

Since Papa has been completely de-coupled from jQuery, it's easier to use in Node and on pages that don't have jQuery or don't want it brought in.

Unparse - convert JSON to CSV

Papa's specialty is parsing CSV into JSON, or Javascript objects. But now it can export CSV too. It's easy to use:

var csv = Papa.unparse([
	["1-1", "1-2", "1-3"],
	["2-1", "2-2", "2-3"]
]);
// 1-1,1-2,1-3
// 2-1,2-2,2-3

Here we passed in an array of arrays, but you could also pass in an array of objects. Even more settings are described in the documentation.
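
For example, here's a sketch with an array of objects; the keys become the header row:

var csv = Papa.unparse([
	{ name: "Ada", age: 30 },
	{ name: "Bob", age: 25 }
]);
// name,age
// Ada,30
// Bob,25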

Run in web worker

Long-running scripts, like parsing large files or strings, can lock up the browser. No bueno. Papa Parse 3 can spawn a worker thread and delegate the heavy lifting away from your page. This means your page will stay responsive to mouse clicks, scrolling, etc., while heavy-duty parsing is taking place.

Web workers are actually kind of a pain in some sense, but Papa makes it easy. Just say worker: true:

Papa.parse(file, {
	worker: true,
	complete: function(results, file) { ... }
});

Download and parse files over the Internet

Papa has been able to parse files locally using FileReader for a while. But now it's easy to download remote files and parse them, too. This isn't hard to do even without Papa, but the advantage here is that Papa can stream the file. So if you have a large file, say 200 MB, sitting on another machine, you can give Papa the URL and it will download the file in chunks and feed you the results row by row, rather than loading the whole thing into memory. Big win!

Papa.parse("/files/big.csv", { download: true,
step: function(data) { ... },
complete: function(results) { ... } });
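
As a rough sketch, the step callback receives each row's results as the file streams in, so you can aggregate without holding the whole file in memory (the exact shape of the argument is described in the docs):

var rowCount = 0;
Papa.parse("/files/big.csv", {
	download: true,
	step: function(results) {
		rowCount += results.data.length; // tally rows as they stream in
	},
	complete: function() {
		console.log("All done. Rows:", rowCount);
	}
});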

Those are the most notable new features and changes in version 3.0. There's a bunch of other stuff under the hood, too, that you'll benefit from.

Now go get your feet wet with the demo page, or visit the project on GitHub.

Sunday, July 13, 2014

An AWS signing library for Go

Go-AWS-Auth is a comprehensive, lightweight AWS signing library for Go. Simply give it your http.Request and it will sign the request with the proper authentication for the service you're accessing.

Other libraries like goamz can be useful and convenient, but they do come with a cost: less flexibility and a larger code base. Though Go-AWS-Auth only does signing, it is a reliable and transparent way to interact with AWS from Go. And it works directly with your http.Request objects for any AWS service.

Now making requests to AWS with Go is extremely easy:
url := "https://iam.amazonaws.com/?Action=ListRoles&Version=2010-05-08"
client := new(http.Client)

req, err := http.NewRequest("GET", url, nil)
if err != nil {
	// handle the error
}

awsauth.Sign(req) // signs the request in-place with the proper authentication

resp, err := client.Do(req)
if err != nil {
	// handle the error
}
defer resp.Body.Close()

The library is thread-safe and supports several AWS authentication mechanisms; see the repository for the full list.

Feel free to use it and contribute if you find ways to improve it!

Monday, May 12, 2014

Code for Javascript Web Workers in same file as main script

Javascript's Web Workers (actually HTML 5's) have a lot of potential, and they're surprisingly safe to use. They're safe because we sacrifice one of the main benefits of threading: shared memory. Values passed between the main thread and worker threads are copied, not referenced, so there's that overhead to deal with. (You send messages between threads rather than sharing memory.)

Anyway, in JS libraries especially, it can be useful to spawn worker threads, but it's sometimes inconvenient to keep those functions in entirely separate files. I'm not 100% sure of the implications yet, but I had some success with the following template:

/*
	Template for a worker script in the same file as the main script;
	adapted from http://stackoverflow.com/a/10136565/1048862
	to make it a little cleaner
*/
(function(global)
{
	"use strict";

	// Worker threads have no document object, so use
	// that to detect which context we're running in
	var isWorker = !global.document;

	if (isWorker)
		WorkerThread();
	else
		MainThread();


	// Entry point for when we're a worker thread
	function WorkerThread()
	{
		global.onmessage = messageReceived;

		// Worker thread execution goes here...
		// This is how to pass messages back to the main/parent script:
		// global.postMessage("From worker!");
		// (messages can also be objects)

		function messageReceived(e)
		{
			console.log("Worker received message from parent:");
			console.log(e.data);
		}
	}


	// Entry point for the main thread
	function MainThread()
	{
		var SCRIPT_PATH = getScriptPath();

		// Main script execution goes here...
		// This is how to start new workers and pass messages:
		// var w = newWorker();
		// w.postMessage("From parent!");


		// Writes an empty, identifiable <script> tag so we can find the
		// currently-executing script element (the new tag's previous
		// sibling) and read this file's own URL from its src
		function getScriptPath()
		{
			var id = "worker" + String(Math.random()).substr(2);
			document.write('<script id="' + id + '"><\/script>');
			return document.getElementById(id).previousSibling.src;
		}

		// Spawns a worker that loads this same file
		function newWorker()
		{
			var w = new global.Worker(SCRIPT_PATH);
			w.onmessage = messageReceived;
			return w;
		}

		function messageReceived(e)
		{
			console.log("Parent received message from worker:", e.data);
		}
	}

})(this);

Anyway, I think I might use this again.

Monday, May 5, 2014

Scribbles from GopherCon 2014

Just a few notes I jotted down when I wasn't too mesmerized by the presentations... they probably won't make sense unless you were there or can see the talks. I'm mostly posting these for my own benefit. Anyway...

High Performance Go

  • Use go tool pprof (linux)
  • Avoid short-lived objects on the heap
  • Use the stack or make long-lived objects
  • Benchmark standard library builtins (strconv)
  • Benchmark builtins (defer, hashmap)
  • Don't use channels in performance-critical paths

CSP (Communicating sequential processes)

Unbuffered channels are nice because there's no ambiguity: you know that both sides are communicating right now, and are synchronizing.

Nil channels block; they can be used in selects to know when multiple channels are closed

Bluetooth Low Energy (BLE) and Embedded Go

If you're worried about GC pauses, think about the number of allocs as much as the size of your allocs.

Sysadmins

You don't need to have a big problem to solve to use Go.

Heka (on streaming data)

Basic pattern: read in data from a stream, split at record boundaries, transcode the records to a common format, route to the appropriate service in its own format.

Bind a single struct to a goroutine (or a single goroutine to a struct).

High-performance database in Go

(From basic OS classes) How to think about performance: optimize the following, from least optimization effort to most: memory access, mutual exclusion, memory allocation, disk I/O, network I/O.

MongoDB and Go

Rethink using Rethink.

But maybe rethink using Mongo, too. Choose the right tool for the job.

database/sql

The DB type represents your database; it is not a connection.

Monday, March 24, 2014

An in-depth review of Google Fiber's free service, with pictures

We had Google Fiber installed a couple months ago (Feb 2014); it was activated on the 25th. Previously, we relied on Veracity, which unfortunately shared one connection with the whole building of condos.

Problems without Google Fiber

There are a few network switches in our basement parking area which feed each apartment with ethernet cables. Those switches fail on occasion.

What's worse, and kind of hilarious, is that if one resident plugged their router in wrong (by inserting the WAN cable into a LAN port on the router), it brought down the Internet for the whole building, because those switches thought the router was the gateway to the WAN.

Fortunately, tracking down the culprit isn't too hard, because the switches still use the default username and password, so I can see which devices are connected -- and their computers often bear their owners' names.

The nice thing is that a dedicated fiber line to each apartment prevents this kind of chaos.

The installation

After signing up, they were very autonomous about installing. I had to do nothing, except when they finished. First, these guys show up at your house:


And they leave you a nice little note:



Then a few days later, a fiber cable appears. It's remarkably tiny (they later put a protective sleeve over it):


Then finally the jack is mounted (read below for more about this):


The router you get is real nice, but has a small fan (read below for more about this, too):


The back is elegant and simple:


The speed tests

Before Google Fiber


With Google Fiber

Since we're poor, our apartment is on the free plan. It's advertised as 5 Mbps down, reliably and consistently, but the upload speed isn't ever really mentioned. Here's why...


Fortunately, our apartment still has both lines active because the HOA hasn't turned off Veracity service. I can use either.

However: when others in my apartment are downloading something, I get lower speeds... I guess that's how it is, but it would be nice if it were 5 Mbps per client, not total.



They also have a special page, also powered by Ookla, for doing a speed test. Their page considers whether you're connected wirelessly or hard-wired:


But again, when I'm the only one using it, the 5 Mbps down is very fluid and reliable. The upload speed is disappointing, however. 1 Mbps seems a bit too throttled, even for free.

The wall jack

They install a wall jack which, kind of oddly, has to be powered by a power outlet. The nice thing is you get a power strip/surge protector for free, but it's kind of strange that a wall jack has to be plugged in for power. I guess they do this to speed up installations in bulk, which is why they can offer this for free (or cheap if your HOA doesn't pay it for you).

The light doubles as kind of a night-light, since it's so bright. Blue indicates normal operation, but it can also be red. I came home one night after everyone had been gone for the day and it was flashing red. I pulled out my phone and loaded a page. After a few seconds, the light turned blue, but I had to reload the page on my phone before it came in successfully. I wonder if the unit hibernates when there's little or no activity.

They also provide a short CAT6 cable to plug into their router.

The router

Google gives you a sleek black router that supports 2.4 and 5 GHz b/g/n WiFi. It has a small fan, so it makes a quiet hum which you can hear if you're close. And of course, this hardware all supports IPv6.

So how do you manage this thing? Let's take a look.

Managing your Fiber network

By default, Google provides very basic management of your network box from anywhere, just by logging into "My Fiber" and clicking on "My Network". There you have the simple options of changing your network name and wireless password, and I think you can turn wireless off, too. That's about it. But it's cool that you can do that from anywhere, by default, without needing to know how to administer a router.

But, of course, the first thing I did was turn on Advanced Mode:


With this mode, you log in using a local IP address and it's very much like a typical router administration interface. Notice at the bottom of that screen they tell you the password so you can log in as admin.

A more advanced look at the admin interface

Their advanced interface isn't quite as pretty, but it's one of the most functional ones I've seen. 

The login is always admin, paired with the good password they give you:

The landing page tells you some useful high-level info about the state of the network and its clients:
I have blurred out a few things in these screenshots in case they are potentially personally-identifiable (I err on the side of caution). And since my roommates share this network, I've blurred most of their info.

You can examine and manage the wireless radio:


You can see and manage how IP addresses are being assigned to devices. Notice there's also a Dynamic DNS option (not pictured), which, after I fiddled with it, seems pretty modern and useful:

Firewall settings include port forwarding (not pictured)...

... and connection details... holy moly, it's an IPv6 heaven:
There's also a summary of the system:
And of the WAN specifically (you'll see that the Fiber link is down; this is because we aren't subscribed to the gigabit plan):

Interesting and proper, I suppose, that the gigabit adapter is a different piece of hardware. It gets its own MAC address.

Conclusion: Yay or Nay?

Politics and business preferences aside, Google Fiber has been a good experience so far. Their support staff is great, their service centers are helpful and educational, their Fiberspace (demo/retail store) is fun, it comes with IPv6 support, and what can I say -- I have free Internet.

Overall, perceived download times are about as fast as they were before, maybe slightly faster. But the connection is definitely more reliable. Yes, it is slower when my roommates use it: multiple users have a noticeably adverse effect on speed. But even when the speed drops, I can usually stream videos without delay, and in HD. (There are exceptions to this regularly, but they're not nearly as severe or common as before.)

So yes, overall I think it is an improvement. Surely the gigabit plan is great, too... though, a note: you have to plug into the router to get that whole gigabit speed, and you have to be using a CAT6 cable, and your computer has to be able to support the throughput. The wireless-N band, if you choose to use that, will still be blazing fast (about 250-350 Mbps), but you won't get true gigabit if you stay wireless.

It was a big deal when Google Fiber came to town. Provo is just eating this up. There are flags and banners on the street lights downtown, and the mayor, city council, and other organizations have been superb in working with them. (Except BYU, but they don't need Google Fiber anyway. On campus I get about a 40 Mbps connection, even with hundreds of students around.)

Sunday, March 23, 2014

Maybe Go without dependency management can be a good thing

Within the Go community, there are at least two frequently discussed issues:
  1. Generics
  2. Dependency Management
My purpose here isn't to advocate either side on either issue, but I do want to suggest one perspective on the lack of an official Go dependency manager.

Decentralized Sources

The go get command goes straight to the source. It's nice that there's practically no single point of failure that could ruin everyone's go get. While a central repository of packages and their dependencies could solve the dependency management problem, it can also become a bottleneck. Remember what happened to npm a few weeks ago? (Check out the comments on that link. Holy moly!)

Then there's all this other security jazz and downtime that you have to worry about.

And no central repo == no agendas; nobody saying what you are and aren't allowed to publish/distribute (aside from the usual licenses and copyrights we're used to).

I realize that a central repo isn't necessarily the only possible solution, but not having one definitely has its perks.

Go projects stay actively updated

Yes, with no dependency manager, it can be a pain if a library your dependency uses suddenly changes and breaks your code, all the way from two or three dependencies upstream. I've actually never been bitten by something like this. But I hear it happens: either from an intentional API change (major version) or an accidental bug.

The fact is, people still need their code to work. Fortunately, open source projects are easy to contribute to. When an upstream dependency has a breaking change or a new bug, all its downstream users are quickly motivated to fix it and stop using the deprecated API. This means that projects -- especially popular ones -- generally stay as up-to-date as their dependencies.

Worst-case scenario, somebody forks the project to keep the old, "working" version. One global search+replace later, you're done.

I find the idea of regularly-updated Go projects very appealing. (I know too many Python, C++, and PHP projects which are dead in the water.)

Careful development and deployment

You have a lot of power. Think twice before you type git push origin master and hit Enter. Those hundreds of projects which depend on your package are at your command and mercy. Your push, or your merge into master, may have far-reaching effects.

This encourages developers to use good source control habits: branch for feature development and bug fixes or testing, and merge carefully into master.

It also means you should plan your projects with solid semantic versioning. Decide what 1.0 looks like and stick to it (timelines are irrelevant here). When you introduce 1.0, don't break it. Sure, add stuff for 1.1 and 1.2, but don't break master until 2.0. And if you have enough users, make sure to tell them about it somehow.

Go is a meticulously engineered, carefully crafted language (mostly), and so should its projects be. Forethought about design and deployment has allowed The Go Authors to stick tightly to their versioning guarantee: Go 1 will have absolutely no breaking changes. And everyone has been happy.

Basically, the Go project is itself a good example of how to develop and deploy your own Go projects.

Just Go with it

Fortunately, this isn't Node or PHP: your production applications are executed from compiled binaries, not straight, interpreted source code. Those binaries don't rely on what go get currently goes and gets. Meaning: your production apps are (or should be) safe from willy-nilly 3rd-party developers who deploy a breaking change upstream. On the occasion that this does happen, you have time. You won't deploy a broken app: code that doesn't compile can't make it into production.

Admittedly, I'm not too passionate about finding a package management solution for Go. I think a few of the unofficial ones are interesting and clever. But for now, I've been totally fine just writing Go code, and when I need a package, I go get it. And I'm a happy little gopher.

Wednesday, March 5, 2014

How to Almost Install Windows 98 natively on a real computer (not a VM)

NOTE: I never got this working all the way. It's almost there, but there's a black screen during booting. It may have something to do with driver malfunctions or incompatibilities. If any of you reading this were successful, please comment and +mention me (since, for some reason, I don't get comment notifications from Blogger, grrr)!

My cousin and her husband have an old Dell Inspiron they've retired, and they want to use it for some retro gaming; (free) virtual machines are still usually no good for that.

This was actually pretty hard to do, since floppies are no more and the Windows 98 SE CD-ROM isn't bootable. You have to solve the incredible bootstrapping problem before this whole thing comes together on a real machine; as easy as a virtual machine installation is these days, real hardware is another story.

With only a couple flash drives, a Windows 98 CD-ROM, and a laptop with its hard drive wiped, here's how I got Windows 98 working on it. (Although, I have no idea how to do this once CD-ROM drives go away entirely, since the use of a USB flash drive instead of a floppy disk proved problematic, as you'll see later.)

I'm doing this from my Macbook Pro. For this operation, you'll need:
  • Mac or Linux computer from which to start working
  • Familiarity with dd
  • Windows 98 CD-ROM (or disc image as an ISO file, written to a CD)
  • Windows 98 Boot Disk Image (a 1.5 MB file)
  • Unetbootin
  • gparted
  • 2 USB flash drives (1-4 GB is fine)

Solving the Incredible Bootstrapping Problem

First, we have to figure out how to run the Windows 98 installation on a totally empty hard drive.

Let's write that Win98 boot disk floppy image to one of the flash drives so that we can boot to it. Unfortunately I couldn't figure this out with dd, so I resorted to Unetbootin. Use it to write the floppy disk image to the flash drive. That part is pretty easy. (Sometimes it doesn't work the first time. If you have problems, try formatting the flash drive with Disk Utility as FAT32 then re-try the Unetbootin stuff.)

I'll call this flash drive the one with "DOS" on it.

Next, we need to write gparted to the other flash drive. You can do this easily, using dd:

$ sudo dd if=/path/to/gparted-live-0.16.1-1-i486.iso of=/dev/rdisk3 bs=1m; sync

Be very careful that you get the if and of options straight, and that your device number is correct (use df -h to find the right device number). You'll also need to unmount, but not eject, the drive before using dd.

With that done, verify that the laptop's BIOS is configured to boot from a USB device before the internal HDD.

Plug the flash drive with DOS into the laptop and boot it. Make sure you get a small menu asking about CD-ROM support. Then you can shut down the computer and remove the flash drive. (We just wanted to make sure it boots.)

Plug the flash drive with gparted into the laptop and start it up. Load gparted using all the defaults (just press Enter when prompted). While it's loading, plug in the other flash drive (the one with DOS on it). When ready, open the terminal and type these commands to copy the DOS flash drive onto the internal HDD (as always, verify that the device names are correct for your case):

$ sudo su
# dd if=/dev/sdb of=/dev/sda bs=1M; sync

The of option is usually /dev/sda (because the first one is usually the internal hard disk drive).

(I experimented with resizing the partition here to fill the entire hard drive, but even though I left it at the same starting sector, it destroyed the boot property. I even tried resizing it using fdisk and format in the later step below, but that wasn't successful. I guess for now, we're just limited to the size of the flash drive that dd copies over.)

Now that the internal hard drive looks exactly like the bootable DOS flash drive (including the bootable flag), we're ready to restart the computer again. (I also added the lba flag using gparted, but I don't think it's required.)

The reason we have to do all this is that we can't bootstrap the machine directly from the flash drive into the Windows installation. Windows will only install to drive C:, but for some reason, the flash drive is seen as a non-removable drive and is considered the C: drive, and the internal HDD gets bumped to D:. Then install starts and you get an error like:
Error SU0013 Setup could not create files on your startup drive and cannot set up Windows. If you have HPFS or Windows NT file system, you must create an MS-DOS boot partition. If you have LANtastic server or SuperStor compression, disable it before running Setup. See SETUP.TXT on Setup Disk 1 or the Windows CD-ROM.
It might have been SU0018 (similar) though; I don't remember exactly. I know I saw 13 somewhere, but it may have been later and on a different attempt.

Anyway, remove the flash drives and insert the Windows CD-ROM. Boot the computer, and it should (cross your fingers) load a rudimentary DOS from the internal disk, just like as if it were from your flash drive.

Installing Windows

Run:

A:\>format c:

When it's finished, give it a label like "WINDOWS". Now don't shut down the computer because we just wiped the hard drive again, and we'd have to start over.

Now run:

A:\>setup

(With some Win98 discs, you can run oemsetup instead; I don't really know the difference.) This will launch the Windows 98 setup. First it'll run SCANDISK and probably find some errors on the hard drive. Go ahead and let it fix any errors it finds so that Windows setup can continue.

Let setup get to the point where it has to restart the computer. Here, I removed the Win98 disc from the CD drive because the machine tried to boot from it (that was a BIOS mistake on my part -- you can avoid it by booting from the HDD before the CD-ROM, but still after the USB drives), and I don't know if that caused some grief later or what.


(TODO): Below are some notes from my foray into booting up Windows... maybe this info will help some random peruser of these things, but I haven't had a chance to organize these memos yet. Sorry!



How did I solve this? Error SU0013 (the same error quoted above).


Add the following line to the [386Enh] section of WINDOWS\SYSTEM.INI:
      MaxPhysPage=30000

msmouse.vxd == hit Esc / No