LINQ

I'm not a C#/.NET programmer, but LINQ is too cool. You can basically use SQL-like querying on just about any collection, whether it's an array in memory, a database, or an XML structure. This is one of those "why didn't anyone think of this before?" things. Yes, there have been similar tools for querying objects for a long time, but nothing this elegant. Watch the video for a quick (well, 30-minute) intro, or, if you prefer text, read the overview document.

New Pics

Finally got off my butt and posted some more pics to Flickr.


Spam Filtering Idea

I share a hosted server with a few friends. We host a number of sites (including this one) and our own email. We have SpamAssassin set up and tuned pretty nicely (thanks, Jelo). Unfortunately, it's killing our server. SpamAssassin can suck up a lot of resources. Ideally we'd have a server dedicated to processing mail, but we can't afford that.

So here's my idea: a spam filtering cluster. Have a bunch of [trusted] home-based servers set up with SpamAssassin and dynamic DNS. A mail comes into the main mailserver and, if the sender isn't on a whitelist, it gets shipped out to one of the filter machines. The filter machine runs SA on it and lets the main server know the result (it doesn't need to send the mail back, so no worries about low upload speeds). Emails over a certain size would probably just be processed on the main server to avoid the bandwidth and time required to send them out (large emails are rarely spam anyway). Finally, in order for the Bayesian filtering to function correctly, you'd have to periodically sync the data from the Bayesian learner out to the filter machines.
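
Here's a rough Python sketch of that routing decision, just to make the idea concrete. The names, the three outcomes, and the 256 KB cutoff are all made up for illustration; none of this comes from SpamAssassin itself.

```python
from enum import Enum
from typing import AbstractSet

# Assumed cutoff: anything bigger stays on the main server.
MAX_REMOTE_BYTES = 256 * 1024


class Route(Enum):
    DELIVER = "deliver"          # whitelisted sender, skip filtering
    SCAN_LOCAL = "scan_local"    # too big to ship out, scan on the main server
    SCAN_REMOTE = "scan_remote"  # ship to one of the home filter machines


def route_message(sender: str, size_bytes: int,
                  whitelist: AbstractSet[str],
                  max_remote_bytes: int = MAX_REMOTE_BYTES) -> Route:
    """Decide where an incoming message should be checked."""
    if sender.strip().lower() in whitelist:
        return Route.DELIVER
    if size_bytes > max_remote_bytes:
        return Route.SCAN_LOCAL
    return Route.SCAN_REMOTE
```

The whitelist is assumed to hold lowercase addresses, so the sender is normalized before the lookup.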

I've looked at SpamAssassin a bit and I think it wouldn't be too hard. SA comes with a client and a server, spamc and spamd. Incoming mail gets piped through spamc, which takes care of shipping it off to spamd for processing. Spamd can live on any other host. It can also just report back whether the email was spam or not, rather than returning the entire email. So the only two things left to do are:

  1. Write a small client (spamb?) that maintains a list of hosts running spamd. When a message comes in, pick the next hostname in the list, and pass it on to spamc. Spamc sends it to the appropriate spamd host, and receives the response--either yes/no or in our case, the full SA report, which gets attached to the message headers. If we get the full message back, that means that spamc timed out trying to contact that spamd host. In this case, mark that host as being down, along with a timestamp, and take it out of the rotation for a certain period of time.
  2. A way to distribute the users' Bayesian data files and prefs to the remote systems. Apparently spamd can read user prefs from a SQL database, although I haven't looked into whether the Bayesian learner data can be stored in a database too. If so, that's an easy solution to the problem. Otherwise, you could just write a script that checks for changes in any of the user files and, if it sees any, rsyncs them to each of the hosts.
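
The host rotation in item 1 could look something like the Python sketch below. This is only a guess at the shape of it: `HostPool`, `scan`, the five-minute cooldown, and the `check` callable are all hypothetical. In a real version `check` would presumably shell out to spamc with the chosen host (spamc takes the destination via `-d`) and return `None` when it gets the unprocessed message back.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HostPool:
    """Round-robin list of spamd hosts; hosts that time out sit out a cooldown."""
    hosts: List[str]
    cooldown_seconds: float = 300.0               # assumed value
    clock: Callable[[], float] = time.monotonic   # injectable for testing
    _next: int = 0
    _down_since: Dict[str, float] = field(default_factory=dict)

    def pick(self) -> Optional[str]:
        """Return the next host in rotation that is not cooling down, else None."""
        now = self.clock()
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next]
            self._next = (self._next + 1) % len(self.hosts)
            marked = self._down_since.get(host)
            if marked is None or now - marked >= self.cooldown_seconds:
                self._down_since.pop(host, None)  # cooldown over, back in rotation
                return host
        return None

    def mark_down(self, host: str) -> None:
        """Record the time at which a host failed to respond."""
        self._down_since[host] = self.clock()


def scan(pool: HostPool, message: str,
         check: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Try hosts in rotation until one returns a report.

    `check(host, message)` is a stand-in for the spamc call: it returns the
    SA report, or None if the host could not be reached in time.
    Returns None if every host failed, so the caller can scan locally.
    """
    for _ in range(len(pool.hosts)):
        host = pool.pick()
        if host is None:
            break
        report = check(host, message)
        if report is not None:
            return report
        pool.mark_down(host)
    return None
```

If `scan` comes back with `None`, the main server would just run SA itself, same as it does today.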

I'll let you know what happens if I get around to trying this out.

Ruby on Rails Problem with OS X

I've been playing around with Ruby and Ruby on Rails lately. I'm not that far into it yet, but they both look interesting. I'll write more about my experiences later.

In the meantime, if you're setting this up on a Mac (running Tiger) like I am, do yourself a favor: before you start getting into any of the Rails tutorials (like Rolling With Ruby on Rails), go here first:

Ruby On Rails, Mysql, and OSX Tiger Woes

Do what it says (including the fix it mentions), and you'll save yourself a lot of grief. Note that, as one of the commenters mentions, it works just fine with mysql-ruby-2.7 also.

Back to my tutorial...

What do ants smell like?

Just got an appointment reminder from Kaiser. Must remember not to put on my Parfum de la Fourmi.

[Image: Kaiser appointment reminder]