Sunday, May 10, 2009

Nerd Alert!

The following post is intended for a very technical audience. Consultant supervision is advised.

I wrote a thingee that lets you mount "The Web" as a filesystem. So far it works under Mac OS X, but I intend to port it to Linux. It uses MacFUSE for the filesystem interface, which makes it much easier to write without kernel muckery. It's all written in C, and it's available on GitHub. It's called CREST fs - Cached REST, or Cached Representational State Transfer.

The thinking is that The Web, as a whole, is a set of resources that I should be able to access like files in a filesystem. So instead of doing something like curl http://samplesamp.sa/apage/something and then trying to run the code that lives on that page, you could mount the web under some directory, cd to samplesamp.sa, and execute apage/something directly. The first access to 'something' would require fetching it from across the internet, but with a caching scheme built in, subsequent accesses should come straight from the local hard-drive cache.
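The fetch-once, cache-after idea boils down to something like this (a toy sketch, not CREST fs's actual code - cache_path and is_cached are made-up names for illustration):

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: map a host and resource path to a file
 * inside a local on-disk cache tree. */
static void cache_path(char *buf, size_t len,
                       const char *root, const char *host, const char *path)
{
    snprintf(buf, len, "%s/%s%s", root, host, path);
}

/* Serve from cache if the file already exists; otherwise the filesystem
 * would fetch the resource over HTTP and write it here first. */
static int is_cached(const char *cachefile)
{
    struct stat st;
    return stat(cachefile, &st) == 0;
}
```

On the first access the stat fails, so you go out over the network; every access after that is just a local file read.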

I thought it would be clever to be able to mount it under a folder called /http:/ - so you could say:
ls -al /http:/www.google.com/someurl/something
See? I think that's clever. You could probably put two slashes there and it wouldn't mess anything up. And if you were sitting in the / directory you could skip the first slash. But that's starting to get too clever just for cleverness's sake.
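The path-to-URL translation is basically just string surgery. Something like this (a hedged sketch - mount_path_to_url is my made-up name, not necessarily how CREST fs actually does it):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a path under the /http:/ mountpoint into a URL.
 * "/http:/www.google.com/someurl" -> "http://www.google.com/someurl".
 * Also tolerates the doubled-slash form mentioned above. */
static int mount_path_to_url(char *url, size_t len, const char *path)
{
    const char *prefix = "/http:/";
    if (strncmp(path, prefix, strlen(prefix)) != 0)
        return -1;                /* not under our mountpoint */
    path += strlen(prefix);
    while (*path == '/')          /* collapse "/http://host" to one form */
        path++;
    snprintf(url, len, "http://%s", path);
    return 0;
}
```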

There's stuff like that Out There already, but I wanted to build something that was extremely aggressive about caching, and very primitive (low-level, usable in early boot environments). I expect it to not be remotely coherent, but I do want it to be fast. So far, so good.

Sidenote: it's my first Git project. Git is nice. Hosting it on GitHub is interesting, too, but less interesting than the fact that it's on Git.

Writing code in C is painful. Allocating memory is not fun. Troubleshooting subtle memory leaks is not fun. Doing your own string manipulation by hand is not fun. But there is a certain feeling you get from being this close to the bare metal of the hardware...that's really pretty exciting.

This is the second time I've written this - the first version was lost in the Great Hard Drive crash of '08. Writing all this low-level crap is not that cool, but writing it for the second time is even less cool.

The intent behind all of this is to tie it in with Braydix somehow - to allow you to boot a "minimal" Braydix image from CD or USB key and have it pull the rest from Teh Intarwubs. I have a new client who does a lot of work within Amazon's EC2 environment, so I've had to study how that works. I think this FS might be interesting for that, too. As soon as I can find an excuse to put something up over there, I'll definitely mess around with that. Once I've done that, just think of it - there'll be both Braydix client and server versions.

Friday, May 01, 2009

More Spam

Ugh.

So my clever RSET hack apparently triggers problems in feeble, horrible, nasty mail clients like Eudora - which one of my client's clients actually uses. So I had to back out my change. It was funny to hear someone read my 'garbage' message right back to me, though.

So in the process of poking around, I found that there was already a feature in the qmail chkuser patch that lets you set a limit on the number of bad recipients per connection. So I enabled that. And it did not stem the flood at all, because it simply rejected every subsequent attempt with a 400-series message without ever disconnecting the sender.

So once again, I jumped into the code. And I made it so that it actually disconnects you, instead of just marking every subsequent recipient attempt as automatically failing.
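Stripped of all the qmail and chkuser internals, the logic I ended up with boils down to something like this (a standalone toy sketch - the struct and names are mine, not chkuser's):

```c
#include <stdbool.h>

/* Toy model of the patched behavior: count bad RCPT TOs, and once the
 * limit is hit, drop the connection instead of just deferring forever. */
struct smtp_session {
    int bad_rcpts;
    int rcpt_limit;
    bool connected;
};

/* Returns true if the recipient is accepted, false if rejected.
 * Hitting the limit marks the whole session for disconnection. */
static bool check_rcpt(struct smtp_session *s, bool user_exists)
{
    if (user_exists)
        return true;
    if (++s->bad_rcpts >= s->rcpt_limit)
        s->connected = false;   /* hang up rather than keep deferring */
    return false;
}
```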

This seems to be working. I have 6,500 IPs in my self-written blacklist, and the SMTP server load has dropped to half. It's still there, though, so I'll have to keep an eye on it.

All in all, not a fun day...

Spam

Spammers are nasty little pieces of work.

It's been a constant cat-and-mouse game where we (anti-spammer people!) take a few steps forward, then the spammers hit us back twice as hard.

This time, they're doing some kind of distributed dictionary attack. That means thousands upon thousands of computers across the globe are all trying to send mail to various mailservers (including one I'm responsible for), to addresses like "joe@domain.com, jack@domain.com, jeb@domain.com, jorge@domain.com..." for several domains that we host.

The problem is, they're slamming the servers so hard that they're starting to overpower the DNS blacklists we use to block spammers. And they don't always show up in those blacklists, either.

So my idea was to detect when someone fails to send mail to 5 or 10 accounts in a row, and then add them to a blacklist. I wrote a simple PHP script to do that, and it works... eh, okay. Not stellar. I even added a piece that kill -9's their SMTP process when they get listed, but it doesn't always seem to work right. Maybe they're coming in 20 times at once, or something.
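The actual script is PHP, but the core of it is just a per-IP consecutive-failure counter. In C it would look something like this (a sketch with a made-up fixed-size table, not the real script):

```c
#include <string.h>
#include <stdbool.h>

#define MAX_IPS  1024
#define FAIL_CAP 5      /* consecutive bad recipients before listing */

struct ip_entry {
    char ip[46];        /* big enough for an IPv6 address in text form */
    int consecutive_fails;
    bool blacklisted;
};

static struct ip_entry table[MAX_IPS];
static int n_ips;

static struct ip_entry *find_or_add(const char *ip)
{
    for (int i = 0; i < n_ips; i++)
        if (strcmp(table[i].ip, ip) == 0)
            return &table[i];
    if (n_ips >= MAX_IPS)
        return NULL;    /* table full; a real version would evict or grow */
    struct ip_entry *e = &table[n_ips++];
    strncpy(e->ip, ip, sizeof e->ip - 1);
    return e;
}

/* Record one delivery attempt; returns true if the IP is now blacklisted. */
static bool record_attempt(const char *ip, bool rcpt_ok)
{
    struct ip_entry *e = find_or_add(ip);
    if (!e)
        return false;
    if (rcpt_ok)
        e->consecutive_fails = 0;   /* a good recipient resets the streak */
    else if (++e->consecutive_fails >= FAIL_CAP)
        e->blacklisted = true;
    return e->blacklisted;
}
```

Requiring the failures to be consecutive matters: a legitimate sender with one typo'd address never gets near the cap.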

So I've run my little blacklister script for a while - and as of press time I have about 5,100 IPs in my block list. And it doesn't really seem to be getting any better. So I finally turned on 'record entire SMTP conversation'.

So this is what they're doing -

HELO IMASPAMMER
MAIL FROM:<somelikelyinnocentvictim@somerandomdomain.com>
RCPT TO:<joe@domain.com>
RCPT TO:<jack@domain.com>
RCPT TO:<jeb@domain.com>
RCPT TO:<jorge@domain.com>

To which they get answers like:

451 No such user 'joe@domain.com'
451 No such user 'jack@domain.com'

etc.

So here's the clever bit - then they do:

RSET

Which apparently just 'resets' the SMTP conversation, so they can start over on the next five recipients. Ugh.

So now it's time to dust off the ole C coding, and I've rewritten the 'rset' command so it now says:

502 Just send your mail again, don't pull this RSET garbage.

And disconnects 'em. That seems to have helped a lot - with the spammers having to reconnect, they get a second chance to be looked up in the DNS blacklists, or checked against my own custom blacklist. Load is reduced - though not eliminated. I guess we'll see how well it works.
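Schematically, the change amounts to something like this (a standalone sketch, not the actual qmail-smtpd patch - the real code uses qmail's own I/O routines and drops the connection itself):

```c
/* Schematic stand-in for the patched 'rset' handler: instead of quietly
 * resetting the envelope, refuse, and tell the caller to hang up - which
 * forces the spammer to reconnect and face the blacklist checks again. */
static const char *smtp_rset(int *disconnect)
{
    *disconnect = 1;
    return "502 Just send your mail again, don't pull this RSET garbage.";
}
```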

My next thing will be to augment this username check with a counter, and if the counter goes above 'n' bad lookups, bounce the connection. That could help as well - but I don't think by as much as what I've done so far.