
[OH-Dev] A plan for dealing with spam on the "answers" on project pages

Asheesh Laroia asheesh at asheesh.org
Mon Aug 27 19:24:07 UTC 2012


Howdy, all,

I have a plan for spam (and wanted to see if others had other thoughts). 
(Also, if you want to help, that would be awesome.)

Over the past few weeks, on the live site, we've been getting some spammy 
"answers" to questions like, "How do I get involved in a project, without 
coding?"

The spammy "answers" are usually really blatantly spam -- just links to 
irrelevant shopping websites. In fact, I wrote a blacklist of about 15 
terms that has caught every one of them so far, with zero false positives.
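
A minimal sketch of that kind of blacklist check, in plain Python. The terms 
below are made-up placeholders rather than the real list, and the name 
looks_spammy is hypothetical:

```python
# Hypothetical placeholder terms; the real blacklist is not reproduced here.
BLACKLIST = ["cheap handbags", "discount watches", "buy now online"]


def looks_spammy(text, blacklist=BLACKLIST):
    """Return True if any blacklisted term appears in text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in blacklist)
```

A plain substring match like this is only reasonable while the terms are 
specific enough that legitimate answers never contain them.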

What I do right now is periodically run a script I wrote, which lives on 
the deployment. It looks for all the "Answer" objects matching any term 
in the blacklist and prompts me for each one; I say "yes", and afterward 
it makes a list of the ones that are spam.

The problem with this method is that spam stays on the site until I next 
run the script, which I do once a day (usually in the morning).

What I'm thinking of doing is the following: letting the spammy answers be 
"saved", but only showing them to the user who created them. Then, once a 
day, I (or someone else) gets an email with a list of spammy answers, and 
if there are any, takes some action (like deleting their account, or just 
deleting the spammy answers).
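
As a rough sketch of the daily email, assuming hidden answers can be listed 
somehow: the Answer class, its field names, and build_digest below are all 
hypothetical stand-ins, and the code only builds the message (sending it is 
left out):

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Answer:
    # Stand-in for the real Answer model; these field names are assumptions.
    id: int
    author: str
    text: str
    is_hidden: bool = False


def build_digest(answers, to_addr, from_addr):
    """Return an EmailMessage listing hidden answers, or None if there are none."""
    hidden = [a for a in answers if a.is_hidden]
    if not hidden:
        return None  # nothing to review today, so no email
    msg = EmailMessage()
    msg["Subject"] = "%d hidden answer(s) awaiting review" % len(hidden)
    msg["To"] = to_addr
    msg["From"] = from_addr
    # One line per hidden answer, truncating long text to 60 characters.
    lines = ["#%d by %s: %s" % (a.id, a.author, a.text[:60]) for a in hidden]
    msg.set_content("\n".join(lines))
    return msg
```

Returning None when nothing is hidden keeps quiet days from generating mail.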

Here's how I imagine it working technically for creating this "moderation" 
system:

* Modifying the "Answer" model to have a new field called is_hidden 
(boolean)

* On save to an Answer object, we have a pre_save hook that checks if the 
answer looks spammy (based on our blacklist). If so, we set is_hidden to 
True.

* In the templates, if an Answer has is_hidden set to True, then we only 
display it if its owner is the same as the user currently logged in.

* On the front page, in the news feed, we only show Answer objects that 
have is_hidden set to False.
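
The four steps above can be sketched in plain Python. This is not Django 
code: the Answer class here is a simple stand-in for the real model, the 
blacklist terms are placeholders, and the function names are hypothetical. 
It is only meant to show the intended logic:

```python
from dataclasses import dataclass

# Placeholder terms (lowercase); the real blacklist is not reproduced here.
BLACKLIST = ["cheap handbags", "discount watches"]


@dataclass
class Answer:
    # Stand-in for the real model; only is_hidden comes from the plan above.
    author: str
    text: str
    is_hidden: bool = False


def pre_save(answer):
    """Mimic the pre_save hook: hide the answer if it matches the blacklist."""
    lowered = answer.text.lower()
    if any(term in lowered for term in BLACKLIST):
        answer.is_hidden = True
    return answer


def visible_on_project_page(answers, viewer):
    """Template rule: a hidden answer is shown only to its own author."""
    return [a for a in answers if not a.is_hidden or a.author == viewer]


def visible_in_news_feed(answers):
    """Front-page rule: hidden answers never appear in the feed."""
    return [a for a in answers if not a.is_hidden]
```

For example, after running pre_save on each answer, a spammer would still 
see their own hidden answer on the project page, while every other visitor 
and the news feed would see only the unhidden ones.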

For the actual spam removal, I imagine we'll keep the current process of 
someone SSH-ing into the deployment and removing the spam by running a 
little bit of Python code.

Thoughts? Volunteers? (:

(The code for the current tool is sitting here: 
https://openhatch.org/bugs/issue624 )

-- Asheesh.



