
[OH-Dev] Further thoughts about wiki spam (still a problem)

Asheesh Laroia lists at asheesh.org
Mon May 14 17:57:02 UTC 2012


Excerpts from Asheesh Laroia's message of Sat May 12 17:20:02 -0400 2012:
> Hey all,
> 
> Some of you might have noticed that we're still getting wiki spam. A week 
> ago, I thought that it would be enough to just monitor 
> Special:RecentChanges and remove things as they pop up, but when spammers 
> do attack the wiki, I find it tedious and error-prone to actually remove 
> the spam. At the moment, I'm researching some tools that might make it 
> easier to react to the vandalism edits (which luckily are not super 
> common; 10-20 per week or so).
> 
> A question: If I can create a git-based command-line-based workflow for 
> identifying and reverting spam, would some person be interested in 
> volunteering to keep up with that?
> 
> More info:
> 
> A look at https://openhatch.org/wiki/Special:RecentChanges suggests that 
> spam comes in waves (which I personally find quite curious). Some other 
> things that seem true, based on looking at the past week's edits:
> 
> * Jessica (jesstess) has been very good about de-vandalizing and 
> protecting pages that relate to the Boston Python Workshop; thank you for 
> quietly doing this work that you really shouldn't have to.
> 
> * Spammers are creating openhatch.org accounts and using those to log into 
> the wiki. (I find that fairly impressive.)
> 
> * There's a form of spam I hadn't much seen before, which is to abuse 
> "Move" repeatedly on the same page. See 9 May 2012 for examples of this. 
> This strategy creates lots of new pages.
> 
> * Much "spam" doesn't contain external links. In my opinion, this is 
> somewhat bizarre. See e.g. 
> https://openhatch.org/w/index.php?title=User_talk:207.151.36.229&curid=875&diff=9944&oldid=9935&rcid=9954
> 
> * Some spam removes many sections of a page and replaces them with 
> irrelevant text, for example 
> https://openhatch.org/w/index.php?title=Boston_Python_Workshop_5/Friday/OSX_set_up_Python&curid=486&diff=9936&oldid=8472&rcid=9946C
> 
> Interestingly, so much of this activity is not "link spamming" -- it's 
> just automated vandalism. It could be that the particulars of the text 
> being left behind are a method of using our wiki as a decentralized content 
> store for these bots; I can't think of any other purpose.
> 
> Since most of these edits don't add new links (perhaps because we're 
> already blocking those sorts of spam edits effectively), the existing 
> link-oriented tools are a poor match. What we need is either humans or 
> bots to identify vandalism edits and revert them, and preferably to ban 
> the account/IP that caused them. At this moment, I'm investigating 
> tooling that should make it easier to:
> 
> * Review all edits since a given date
> 
> * Revert the ones that are spammy
> 
> * Block users/IPs that are spamming
> 
> For spammers using OpenHatch accounts, we could go the full route of 
> deleting the account across all OpenHatch sites that use our central 
> login, or we could just block the account in the wiki. For now, it's 
> simplest to automate blocking the account in the wiki.
> 
> I'm particularly intrigued by the idea of doing this all from within 
> 'git', via this package: 
> https://github.com/Bibzball/Git-Mediawiki/wiki/User-manual
> 
> I'm doing a 'git clone' of the wiki now. I will follow up to this thread 
> with more information about whether I can make these tools useful for 
> reviewing and reverting vandals' edits. If so, I can document what I've 
> done for others to see.
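
For anyone who wants to try the "review all edits since a given date"
step without git, here is a minimal Python sketch against the standard
MediaWiki list=recentchanges API. The api.php URL is an assumption
(guessed from the /w/index.php links above), and the "shrinks the page
by many bytes" check is a hypothetical heuristic for the section-removal
vandalism described in the quoted message, not something we run today.

```python
from urllib.parse import urlencode

# Assumed endpoint, guessed from the /w/index.php URLs in the quoted mail.
API_URL = "https://openhatch.org/w/api.php"


def recent_changes_url(since, limit=50):
    """Build a list=recentchanges query for edits made at or after `since`.

    `since` is an ISO 8601 timestamp such as "2012-05-07T00:00:00Z".
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,   # where the listing starts
        "rcdir": "newer",   # oldest first, moving forward from rcstart
        "rcprop": "title|user|timestamp|ids|sizes",
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def looks_like_blanking(change, min_removed=500):
    """Hypothetical heuristic: the edit removed at least `min_removed` bytes.

    `oldlen` and `newlen` are the page sizes that rcprop=sizes returns.
    """
    return change.get("oldlen", 0) - change.get("newlen", 0) >= min_removed


def suspicious(changes, min_removed=500):
    """Return the changes a human should look at first."""
    return [c for c in changes if looks_like_blanking(c, min_removed)]
```

Fetching the URL returns JSON whose query.recentchanges list can be
passed to suspicious(). Reverting and blocking would still be done by
hand from there, since this heuristic will miss spam that adds text.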

As an update: partial progress.

Removal of spam from the wiki with git still seems like a very reasonable
idea to me. Roan Kattouw, who was at the sprint yesterday, thinks it's a
passably reasonable idea, and helped add some features to the MediaWiki
API to make it possible to handle page deletion from the GitMediawiki
layer.

Our wiki is now on his branch, and he also improved the GitMediawiki
integration Perl script.

* GitMediawiki changes: https://github.com/catrope/git/tree/mediawiki-deletionsupport

* MediaWiki changes: https://gerrit.wikimedia.org/r/7572

Page deletion doesn't work for me yet on our wiki, but it worked for Roan
during the sprint on a test wiki on his laptop, so I sent Roan some sample
credentials and a demo of the error I was getting.

Soon, we'll all be able to 'git push' to the OpenHatch wiki. After that,
we can write instructions to help any project remove spam from their
wikis the same way. (-:

-- Asheesh.

