[OH-Dev] Antispam updates
Asheesh Laroia
asheesh at asheesh.org
Sat Nov 17 19:21:10 UTC 2012
Howdy all,
Last night I wrote a small toolkit for despamming the site.
Summary:
* We mass-export user data into email-like files.
* We use an "off the shelf" email antispam tool to learn which users'
content is spam. It takes about a minute to classify all ~10,000 users as
spam or non-spam.
* We then semi-manually pass the spammy usernames to the backend, which emails
the users, archives their data, and then deletes them from the site.
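To make the export step in the first bullet concrete, here is a rough sketch of turning one user record into an email-like message using Python's standard `email` package. This is not the code in oh-antispam. The dict keys (`username`, `email`, `display_name`, `bio`) and the function names are made up for illustration, and the real export may pick different fields and headers.

```python
from email.message import EmailMessage
from email.utils import formataddr


def _one_line(value):
    # Header values must not contain newlines, so collapse all whitespace.
    return " ".join(str(value).split())


def user_to_message(user):
    """Render one user record as an email-like message.

    `user` is a hypothetical dict; the keys used here are illustrative,
    not the actual oh-mainline schema.
    """
    msg = EmailMessage()
    msg["From"] = formataddr((_one_line(user["username"]),
                              _one_line(user["email"])))
    msg["Subject"] = _one_line(user.get("display_name", ""))
    # Free-text profile fields go in the body, which is the part a
    # statistical email filter mostly learns from.
    msg.set_content(user.get("bio", ""))
    return msg


def export_users(users):
    """Return a {username: message text} mapping, one entry per user."""
    return {u["username"]: user_to_message(u).as_string() for u in users}


if __name__ == "__main__":
    sample = [{"username": "alice", "email": "alice@example.com",
               "display_name": "Alice", "bio": "I like Python."}]
    for name, text in export_users(sample).items():
        print(name)
        print(text)
```

Once every user looks like an email, any trainable email spam filter can be pointed at the resulting files without knowing anything about the site.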
Code and full details here: https://github.com/openhatch/oh-antispam
As a side note, I wrote this in about 24 hours. The code quality is not
amazing. But I am pretty proud of the speed of execution (75 seconds to
analyze all ~10,000 users on my unimpressive laptop) and the fact that
it's an automated, statistical approach. I have not written or maintained
any whitelist/blacklist as part of this antispam effort, which I find
thrilling.
Right now, it isn't fully integrated into oh-mainline; it was written more
as a proof of concept. What I'd *love* to see is someone take this and
turn it into a reusable Django app, so that it can live as a dependency of
oh-mainline rather than as a part of it.
If we don't make it a dependency at that level, then within a week I/we/etc.
should set up a cron job that at least runs the code in its current form
and alerts the site admins when a spammy post is made.
-- Asheesh.