Humanize

I scratched an itch last week and actually wrote some code (these days my primary IDE is Google Docs). Check out humanize, which replaces any large numbers in text piped into it with shortened versions using SI prefix symbols (K, M, G, T, etc.). So “File size: 972917401236” becomes “File size: 973G”.

This came about because I was debugging an issue with a Parquet file using parquet-tools, whose output looks like:

% parquet-tools meta part-01373-r-01373.gz.parquet | grep "row group"
row group 1:           RC:100 TS:121513011 OFFSET:4 
row group 2:           RC:100 TS:44877183 OFFSET:14293218 
row group 3:           RC:111 TS:28645460 OFFSET:19704281 
row group 4:           RC:168 TS:31794536 OFFSET:23387243 
row group 5:           RC:363 TS:28078938 OFFSET:27531686 
row group 6:           RC:15547 TS:899825873 OFFSET:31242035 
row group 7:           RC:9963 TS:47930700 OFFSET:142353955 
row group 8:           RC:20100 TS:911417774 OFFSET:155302131 
row group 9:           RC:20100 TS:889810662 OFFSET:269749697 
row group 10:          RC:10100 TS:855908709 OFFSET:382090935 
row group 11:          RC:7282 TS:1532744480 OFFSET:486019001 

Tired of squinting and counting digits to see how big some of these numbers were, I figured there’d be some easy bash/sed way to convert these numbers into something more readable. It is pretty easy to do this for isolated numbers using numfmt:

% echo "1234567890" | numfmt --to=si --round=nearest
1.2G

But that doesn’t work on numbers embedded in text:

% echo "This is a big number: 1234567890" | numfmt --to=si --round=nearest
numfmt: invalid number: ‘This’

GNU sed can do a replacement with the result of a function, but it’s rather ugly:

% echo "This is a big number: 1234567890" | sed -re 's/[0-9]+/`numfmt --to=si --round=nearest &`/g;s/.*/echo &/;e'
This is a big number: 1.2G

But I’m doing this work on a Mac, which does not come with GNU sed (or numfmt, for that matter, although you can brew install coreutils to get gnumfmt). I started working on a perl one-liner to do it, but decided this was something that needed to exist on its own, and should be easy for anyone to install.

I chose to write it in Go because it has an easy way for people to install binaries with go get. (Rust has something similar with cargo install, but Go is more common in my workplace.) The problem with go get is that you have to run it again to upgrade, so for my fellow Mac users I added a Homebrew tap.

Long story short:

% parquet-tools meta part-01373-r-01373.gz.parquet | grep "row group" | humanize -binary
row group 1:           RC:100 TS:116Mi OFFSET:4 
row group 2:           RC:100 TS:43Mi OFFSET:14Mi 
row group 3:           RC:111 TS:27Mi OFFSET:19Mi 
row group 4:           RC:168 TS:30Mi OFFSET:22Mi 
row group 5:           RC:363 TS:27Mi OFFSET:26Mi 
row group 6:           RC:15Ki TS:858Mi OFFSET:30Mi 
row group 7:           RC:10Ki TS:46Mi OFFSET:136Mi 
row group 8:           RC:20Ki TS:869Mi OFFSET:148Mi 
row group 9:           RC:20Ki TS:849Mi OFFSET:257Mi 
row group 10:          RC:10Ki TS:816Mi OFFSET:364Mi 
row group 11:          RC:7Ki TS:1Gi OFFSET:464Mi

ReadBot

I wrote a Slack bot, ReadBot, to both scratch an itch and check out the current state of Slack integration. As a bonus, I got to spend some time getting familiar with Glitch, a pretty cool live coding platform.

ReadBot lets you connect your Goodreads and Pocket accounts to Slack, and then adding a bookmark (🔖) reactji to a message with an Amazon book link in it will add that book to your Goodreads “to-read” shelf, or any other URL to your Pocket account.

The Slack API and documentation has improved immensely since I last checked it out a couple years ago. I was clearly not the only one to get confused by profusion of integration options (bot? slash command? webhook?) , so they’ve improved their API landing pages to help guide you. There are also plenty of tutorials to get you started.

One of the fears when diving into an API integration is whether that API—or even integrations in general—are going to be supported in the future. Many developers have been burned by neglected, deprecated or even shutdown APIs. Slack, however, seems to be encouraging integrations whole-heartedly with its app directory.

One challenge of creating a new Slack integration has been Slack’s requirement that it be served over SSL. While Let’s Encrypt has made this a lot easier, setting up an SSL-enabled server is still a significant hurdle to clear just to start developing your integration. This is where Glitch comes in.

Glitch is a collaborative live coding environment made by Fog Creek Software. It’s super easy to get started, especially since you can find existing public projects on the site and “remix” them to make them your own. And every project has its own SSL-enabled hostname, right out of the box.

Glitch apps are Node.js-based (they seem to allow Python apps as well, but support is currently unofficial). The app restarts itself on the fly as you edit (although this can be disabled), so your app is always live. Autocomplete package search makes installing new packages a breeze. You can do pretty much everything you need to through the UI, but if you need, you can drop into a shell and do whatever you like to your instance.

There’s no built-in revision/commit history, but you can export your project to GitHub (as well as import from GitHub). Since your app is just a regular Node app, it’s perfectly portable to any other environment.

One general issue with writing Slack bots is that in order to continue working on your app after it is in production, you need to have separate development apps and endpoints. It would be nice if Slack made this a little easier, perhaps by allowing you to specify separate production and development endpoints, slash commands, etc.

I have no plans on publishing ReadBot to the Slack App Directory because I don’t want to have to maintain another production service. If you’re interested in doing so, go for it.

Rescuing Windows with Hammerspoon

I recently got a new monitor and it came with a problem. For some reason when I attached or detached my laptop, many of my windows would end up off the edge of the screen, and I’d have to drag them back over one by one. It was annoying enough that I decided I’d write an app to “rescue” them.

I figured it wouldn’t be too hard. Get a list of active windows, see which are off-screen, and move them over. But it was hard. My code only half-worked. And for some reason I couldn’t get Chrome windows to move.

I started looking around and found Swindler. It seemed to be just what I needed and confirmed that I wasn’t completely incompetent:

Writing window managers for macOS is hard. There are a lot of systemic challenges, including limited and poorly-documented APIs. All window managers on macOS must use the C-based accessibility APIs, which are difficult to use and are surprisingly buggy themselves.

But then I saw Hammerspoon listed at the bottom of the Swindler README, under Related Projects. Intrigued, I checked it out. It was just what I needed.

Hammerspoon lets you write Lua scripts to interact with the MacOS system, and it’s pretty powerful. The Getting Started Guide gives a good sense of some of the things you can do with it.

I’d never written Lua before, but with plenty of examples to crib from and good API documentation, it didn’t take me long to come up with a little script that did just what I needed:

I have that file saved at ~/.hammerspoon/rescuewindows.lua, and in ~/.hammerspoon/init.lua I have:

local rescueWindows = require "rescuewindows"
hs.hotkey.bind({"cmd", "alt", "ctrl"}, "R", rescueWindows)

So when it hit ⌃⌥⌘R, all my off-screen windows slide over to my main screen. Nice!

Going "Serverless"

When my friend Nelson and I set out to build our location tracking app & accompanying website, Wanderings, we started down the familiar path of choosing our stack. Flask seemed like a reasonable choice. Postgres for the database. SqlAlchemy for the ORM? Sure, I guess. We figured we’d try hosting on GCP, mostly to learn more about the platform. Do we bother with containers? Meh, leave those for the youngsters. And so on.

But the more we thought about hosting our own service, the less enthused we were. The GCP costs were piling up, and we didn’t really want to be responsible for having a copy of our users’ location data. One of the driving ideas behind Wanderings was having a location tracker where you controlled your own data. Why should you trust us with a copy of it?

I should back up and talk a little about what Wanderings is and how it works.

Back in 2011 there was a small uproar when it was revealed that Apple was apparently tracking your every move with your iPhone. Out of that grew a project, OpenPaths, that allowed you to generate your own location data, and have full control over it. The app used very little energy, so you could just start it and forget it.

However, with the release of iOS 11, OpenPaths stopped working, and its accompanying website, openpaths.cc, no longer resolved. It appeared to be no longer maintained. There were some other location tracking apps out there, including Google Timeline, Fog of World, and Strut, but none that had the combination of strong privacy guarantees , simplicity, and low-energy usage that we were looking for. So we decided to make our own.

The Wanderings iPhone app is pretty basic. It uses iOS’ significant location change service to only update your location when you move by a significant amount. It also relies on wifi and cellular for location information rather than power-hungry GPS. It does all this in the background, so you can start it once and forget it.

New location data is uploaded to your private, secured CloudKit database on iCloud, so you are the only one with access to your data.

So back to our server conundrum.

Wanderings Screenshot One of our requirements was a website that would render the full heat map of your travels, let you export that data, and eventually let you do a lot of interesting things with your data. Of course, it doesn’t make sense to re-download all your location data every time you visit (especially since CloudKit only lets you pull 200 records per request) so we needed some kind of cache of your data in the app, as in a database.

But we don’t really want your data. And we really don’t want to deal with maintaining a server somewhere, either.

Fortunately, for a few years now browsers have supported IndexedDB, an embedded transactional database with sizable storage limits. We realized if we used this as our cache, the app could be a static site (something Nelson had explored years ago), which we can host cheaply anywhere. In fact, we ended up just using GitHub Pages to host the site right out of our repository.

Using a browser-embedded database has the downside of having to re-download all your data when you go to a new browser or if you’ve cleared your cache, but we think that’s an acceptable trade-off to have complete control of your data.

So check it out and let us know what you think. Send your bugs/feature requests/love to support@wanderin.gs.

Code Signing Will Kill Me Yet

This is one of those I-spent-long-enough-stumped-on-this-issue-I-should-write-it-up-for-future-generations posts.

I wrote a little app at work (which I’ll talk about in a future post), and I was having a strange code signing issue. The app would sign just fine:

$ spctl --assess -v Orwell.app
Orwell.app: accepted
source=Developer ID

But then if I zipped it up and sent it to someone, when they unzipped it, OS X would tell them that it was damaged and should be thrown away:

Blech

Sure enough, if I zip and unzip the app, and verify the code signing again, I get:

$ spctl --assess -v Orwell.app
Orwell.app: a sealed resource is missing or invalid

After much hair-pulling, I found Technical Note TN2318: Troubleshooting Failed Signature Verification, which says:

The file prefixed with “._” is problematic and a was the result of copying certain Mac OS X files to a non-HFS+ formatted disk. These files are referred to as Dot files, Apple Double files, or resource forks. They are invisible to Finder but can be removed using the dot_clean utility.

Sure enough:

$ find Orwell.app -name "._*"
Orwell.app/Contents/Resources/app/node_modules/applescript/._package.json
Orwell.app/Contents/Resources/app/node_modules/applescript/lib/._applescript.js
Orwell.app/Contents/Resources/app/node_modules/applescript/samples/._execString.js
$ dot_clean Orwell.app
$ find Orwell.app -name "._*"

Running dot_clean before I signed the app fixed the issue.

(Orwell is an Electron app, hence the node_modules dir, in case you were wondering.)