Saving Identica and StatusNet?

There is a significant spam problem on Identi.ca, and it looks like some fresh ideas are needed to crush it. Here are mine, and a few ideas that I like from other people.

Identi.ca is the open microblogging site based on the StatusNet software. It's a fantastic service, with features that leave Twitter in the dust. But spammers are not being caught and banned quickly. Users are becoming frustrated.

Marketing accounts, with names like scooterforsale, cheapwidebeltsanders, mastercrafttablesawspowertools, bestbuyfinancialcalculator, ad nauseam, seem to make up more than half of the members of any Identica group. They often don't actually spam the group (though sometimes they do, and eventually get removed), but they make it hard to look through the genuine group members. This is a pity, as the "group" function is one of the features that sets StatusNet apart.

So, this is brainstorm time. Share your ideas in the comments.

Disclaimers: I am not a coder, and it's possible some of these features have been implemented without me finding them.

Anti-spam ideas

If there are more than n group addresses in a notice, prompt a CAPTCHA. Other triggers can be added - e.g. repeated @ notices. (Thanks tamasrepus)
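To make the trigger concrete, here is a minimal sketch in Python. It is only an illustration, not how StatusNet works: the "!name" syntax for group addresses, the thresholds and the function name are all assumptions of mine.

```python
import re

# Assumption: groups are addressed as "!name" and users as "@name".
# The lookbehind skips things like "Wow!great" and e-mail addresses.
GROUP_TAG = re.compile(r"(?<!\w)!(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w+)")

MAX_GROUPS = 3    # hypothetical value for "n"
MAX_MENTIONS = 5  # hypothetical threshold for @ addresses


def needs_captcha(notice: str) -> bool:
    """Return True if a notice addresses more groups or users than allowed."""
    groups = {g.lower() for g in GROUP_TAG.findall(notice)}
    mentions = {m.lower() for m in MENTION.findall(notice)}
    return len(groups) > MAX_GROUPS or len(mentions) > MAX_MENTIONS
```

Repeating the same group in one notice counts once here, so the check looks at how many different groups a notice reaches rather than how many tags it contains.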

Add a "sandbox" feature. Sandboxing would prevent problematic accounts from sending direct messages, having their messages appear in the public timeline, joining groups, or posting to existing groups if they're a member. Any @ messages would not appear in the target's timeline. The only way to view their posts would be to go to their profile.
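As a rough sketch of what such a check could look like (Python; the class and action names are invented for illustration and do not come from StatusNet):

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    POST_TO_PROFILE = auto()            # still visible on the account's own profile
    SEND_DIRECT_MESSAGE = auto()
    APPEAR_IN_PUBLIC_TIMELINE = auto()
    JOIN_GROUP = auto()
    POST_TO_GROUP = auto()
    DELIVER_REPLY = auto()              # an @ message showing up in the target's timeline


# Everything except posting to one's own profile is blocked in the sandbox.
SANDBOX_BLOCKED = frozenset(Action) - {Action.POST_TO_PROFILE}


@dataclass
class Account:
    name: str
    sandboxed: bool = False


def is_allowed(account: Account, action: Action) -> bool:
    """A sandboxed account may only do what is not on the blocked list."""
    return not (account.sandboxed and action in SANDBOX_BLOCKED)
```

Keeping the blocked actions in one set would make it easy to define variants later by using a different set.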

If marketing accounts are considered acceptable as long as they don't send direct messages or post off-topic to groups (which seems to be the case), then at least add a "permanent sandbox" option. This is like regular sandbox, but either:

  • allows only silent membership of groups (they can effectively subscribe to the groups like an RSS feed, but can't post and don't appear in the member list.)
  • (much easier option) removes any existing group memberships and prevents new ones.
  • then again, maybe this is a weaselly option and it's better just to ban them.

Use flagging to automatically sandbox spammers. (Based on the suggestion at the Status.Net wiki):

  • CURRENT SETUP: The UserFlag plugin lets users mark any profile on the site as 'flagged', which puts it on a list for site moderators to check and, if necessary, block or delete.
  • PROBLEM: It's not working well - very slow, doesn't give good feedback to the person reporting (am I wasting my time?) and it's hard to tell if anybody's watching the flag list.
  • SOLUTION: When n users have flagged the profile (n might be 10, 20, 100), the account in question is automatically sandboxed until @support looks at it.
  • POTENTIAL ISSUE: The system might be used to target a genuine user, for whatever malicious reason. Somehow flagging by new accounts, or accounts which have themselves received flags, should not count towards auto-sandboxing.
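The flag-counting rule, including the safeguard against malicious flagging, might be sketched like this in Python. This is not the UserFlag plugin's code; the `Profile` class, the thresholds and the 30-day minimum age are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

FLAG_THRESHOLD = 10                   # hypothetical value for "n"
MIN_REPORTER_AGE = timedelta(days=30) # hypothetical: newer accounts don't count
MAX_FLAGS_ON_REPORTER = 2             # hypothetical: heavily flagged accounts don't count


@dataclass
class Profile:
    name: str
    created: datetime
    flagged_by: set = field(default_factory=set)
    sandboxed: bool = False


def flag_counts(reporter: Profile, now: datetime) -> bool:
    """A flag only counts if the reporter is established and not much flagged."""
    old_enough = now - reporter.created >= MIN_REPORTER_AGE
    return old_enough and len(reporter.flagged_by) <= MAX_FLAGS_ON_REPORTER


def flag(target: Profile, reporter: Profile, profiles: dict, now: datetime) -> bool:
    """Record a flag and sandbox the target once enough valid flags exist."""
    target.flagged_by.add(reporter.name)  # a set, so repeat flags count once
    valid = sum(1 for name in target.flagged_by
                if flag_counts(profiles[name], now))
    if valid >= FLAG_THRESHOLD:
        target.sandboxed = True  # stays sandboxed until a moderator reviews it
    return target.sandboxed
```

With these rules, twenty freshly registered accounts flagging a genuine user have no effect, while ten established users flagging a spammer sandbox it at once.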

Define "trusted users" to be able to perform certain anti-spam actions - maybe their flags would count as x flags, or they could sandbox an account on sight. Obviously, the greater the powers, the more care is needed. Start with a simple algorithm based on number of followers, number of posts, number of posts to groups, length of time they've been registered, and number of times they've been flagged. (Perhaps, number of times they've been flagged by anyone who has been flagged less than n times.)
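A first version of such an algorithm could be as simple as the Python sketch below. The weights, the logarithmic scaling and the threshold are guesses of mine, picked only to show the shape of the calculation; real values would need tuning against actual accounts.

```python
import math
from dataclasses import dataclass


@dataclass
class UserStats:
    followers: int
    posts: int
    group_posts: int
    days_registered: int
    flags_received: int


TRUSTED_THRESHOLD = 10.0  # hypothetical cut-off
FLAG_PENALTY = 2.0        # hypothetical cost of each flag received


def trust_score(u: UserStats) -> float:
    """Combine the five signals; logs stop one huge number from dominating."""
    positive = (1.0 * math.log1p(u.followers)
                + 0.5 * math.log1p(u.posts)
                + 0.5 * math.log1p(u.group_posts)
                + 1.0 * math.log1p(u.days_registered))
    return positive - FLAG_PENALTY * u.flags_received


def is_trusted(u: UserStats) -> bool:
    return trust_score(u) >= TRUSTED_THRESHOLD
```

The logarithms mean that going from 0 to 100 followers matters far more than going from 1,000 to 1,100, which makes it harder to buy trust through sheer volume of posts.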

A spam-checking bot - I'm sure these exist already, but I'm thinking of one with the power to flag (at "trusted user" level) or even sandbox in extreme cases:

  • The bot can rely on keyword and Bayesian filters and/or the "Bad Behaviour" software - it can be fairly strict since it's only adding them to a list for checking (or at most sandboxing them) rather than auto-blocking them.
  • A separate test, or a separate bot, to filter out the almost-certain bots and sandbox them immediately - e.g. repeated posts with multiple @ or multiple group tags, in a short space of time, along with certain spammy keywords, such that a genuine user would find it hard to trigger it by accident.
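The second, stricter test could look something like this Python sketch. Everything in it is an assumption for illustration: the keyword list, the "!name" and "@name" address syntax, the ten-minute window and the class name. It is not based on any existing bot.

```python
import re
from collections import deque

SPAM_WORDS = {"cheap", "sale", "buy"}  # hypothetical keyword list
WINDOW_SECONDS = 600                   # hypothetical "short space of time"
MIN_SUSPICIOUS_POSTS = 5               # hypothetical number of repeats
MIN_ADDRESSES = 3                      # hypothetical count of @ / group tags

# Assumption: "@name" addresses a user and "!name" addresses a group.
ADDRESS = re.compile(r"(?<!\w)[@!]\w+")
WORD = re.compile(r"[a-z]+")


def is_suspicious(text: str) -> bool:
    """Suspicious = several addresses AND at least one spammy keyword."""
    addresses = len(ADDRESS.findall(text))
    words = set(WORD.findall(text.lower()))
    return addresses >= MIN_ADDRESSES and bool(words & SPAM_WORDS)


class BurstDetector:
    def __init__(self):
        self.recent = {}  # account name -> timestamps of suspicious posts

    def observe(self, account: str, text: str, timestamp: float) -> bool:
        """Return True if the account should be sandboxed immediately."""
        times = self.recent.setdefault(account, deque())
        if is_suspicious(text):
            times.append(timestamp)
        # Forget suspicious posts that fell out of the time window.
        while times and timestamp - times[0] > WINDOW_SECONDS:
            times.popleft()
        return len(times) >= MIN_SUSPICIOUS_POSTS
```

Because all three conditions have to hold at once (many addresses, a spammy keyword, and several such posts in a few minutes), a genuine user announcing one sale to a few groups would not trip it.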

A public "maybe-spam" list for all messages that have been flagged at least n times (with n dependent on a simple trust metric), or which have been added by a spam-checking bot:

  • This public list becomes a place that trusted users can go and shoot down spam if they feel like it. With the auto-sandboxing for accounts that get repeatedly flagged, this should allow spammers to be quickly removed from view.
  • Add an "unflag" option for "Trusted users", to remove accounts from the list. (This still has to be transparent, so a list of removed accounts should be viewable.)

There are more suggestions on the wiki, and people like antispam and spammer007 will have more ideas.

I'm not wedded to any of these suggestions, but something needs to be done. I think it's too much to expect more than a couple of these suggestions to be implemented soon, but my request to the StatusNet developers is: at the very least, auto-sandbox accounts after a certain number of flags. That one measure will empower the community, give us a quick reduction in spam, and give a quick boost in goodwill towards StatusNet and Identi.ca.

Licensing Notice

This work may be distributed under the terms of the Creative Commons Attribution-ShareAlike License, version 3.0, with attribution to "Chris Watkins, first published in Free Software Magazine".

Notes on the author

Chris Watkins is the co-founder of Appropedia - wiki for sustainability, development, ICT4D.

Comments

Submitted by warbo on

Firstly I would stop using Identi.ca if it required a captcha to send some messages; not through choice, but because I do all of my interaction via XMPP, which is running on my computer all of the time. I never get in the habit of trawling through Web sites individually when I have an instant messenger, an email client and an RSS reader to give me what I'm after on my terms.

Secondly, to me a lot of your ideas could be summed up in three words: PageRank for users. In other words, rather than special-casing so many things, have a general metric to judge users against each other. The way Google's PageRank works is to treat hyperlinks as an endorsement of the target from the source. Each page's rank is the sum of its endorsements each multiplied by that source's own rank. A similar thing could be used to judge the validity of users: group posts, direct messages, @replies, favourites, redents, age of the account, etc. would all count towards a user's validity, whilst negatives like flags, dubious keywords ("sale", "buy", "cheap", etc.) would detract from the validity. Those kicking off the spammers just need to keep an eye on the bottom of the rankings, since spam accounts would either drop right down quickly and be spotted, or else they're not annoying enough people to make it down there.

The advantages are that 1) the worst offenders are most easily found (right at the bottom), 2) accounts being 'bullied' with flags should have enough positives to not drop all of the way down, 3) since everything's weighted on an account's ranking, generating new accounts to fulfill an agenda (advertising, artificially raising or lowering another account's "spamminess", etc.) would be ineffective, since established users (with a much higher confidence level) can overpower such untrusted accounts.

May be a bit of a slog to code, but it would be generic enough to solve many problems in one fell swoop.
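For readers who want to see what "PageRank for users" might look like, here is a small Python sketch. It is a simplification under my own assumptions: every positive interaction is treated as an equal endorsement, the standard damping factor of 0.85 is used, and flags are handled in a separate step that subtracts a penalty weighted by the flagger's rank. The function names and the penalty value are invented for illustration.

```python
def user_rank(endorsements, users, damping=0.85, iterations=50):
    """Power-iteration PageRank over an endorsement graph.

    endorsements maps a user to the list of users they have endorsed
    (replied to, favourited, redented, and so on).
    """
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in users}
        for source in users:
            targets = endorsements.get(source, [])
            if targets:
                # Split the source's rank evenly among those it endorses.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A user who endorses nobody spreads their rank over everyone.
                share = damping * rank[source] / n
                for u in users:
                    new[u] += share
        rank = new
    return rank


def validity(rank, flags, penalty=0.5):
    """Subtract a penalty for each flag, weighted by the flagger's rank.

    flags maps a user to the list of users who have flagged them.
    """
    return {u: r - penalty * sum(rank[f] for f in flags.get(u, []))
            for u, r in rank.items()}
```

For example, if three established users endorse one another and flag a spammer, while the spammer's two fresh accounts endorse it and flag one of the established users, the spammer still ends up at the bottom: the fresh accounts carry too little rank for their endorsements or flags to matter much.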

Chris Watkins's picture

Excellent point - I completely missed that problem with CAPTCHA. Implementing it in desktop clients would be difficult (e.g. in Pino, Gwibber, Choqok) to impossible (XMPP, I guess).

So, how about sandboxing as a temporary fix, with work on a PageRank for users as a longer term solution?

--
Chris Watkins
Appropedia - wiki for sustainability, development, ICT4D.

Chris Watkins's picture

Thinking again...

CAPTCHA might be useful as a very blunt tool for *very* suspicious posts (e.g. a series of very similar posts, with many tags or many "@", by an account with a low trust metric*). But it would need to have a very low false positive rate, and maybe only be turned on occasionally when spam is out of control. If it's working properly, you as a legitimate user should never see a CAPTCHA.

Anyone posting through XMPP or a desktop client would need to be sent a message asking them either to follow a link to the CAPTCHA or to wait x minutes (giving time for previous posts to be tagged or not, and thus assessed as spam or not).

But if it can be avoided, all the better.

--
Chris Watkins
Appropedia.org - wiki for sustainability, development, ICT4D.

Submitted by joeka on

I've already seen implementations with links to captchas for xmpp MUC group chat rooms.

That was only in addition to the XMPP CAPTCHA Forms described here: http://xmpp.org/extensions/xep-0158.html
and implemented by some clients like gajim.

So, with xmpp this is possible and with the increasing popularity of xmpp, more clients will have to implement this feature to prevent spam in different usage scenarios.

Ryan Cartwright's picture

Some good suggestions there Chris. I think there are two kinds of marketing accounts on Identi.ca - real ones and link-monkeys. The real ones I would have thought would not object to a few extra hurdles if they still reach the right people. The link-monkeys seem only interested in posting as many links as possible - probably in some vain Google PageRank effort.

The suggestions you give are reminiscent of greylisting in e-mail (not the challenge-response type). That allows genuine corporate senders to pass through while telling the fire-and-forget spammers to take a hike.

Congrats on a good first post and welcome to FSM!

Ryan
--
Equitas IT Solutions - fairness, quality, freedom
http://www.equitasit.co.uk

Submitted by srf21c on

Have any of these measures been implemented? Seems like link monkey spam is still a problem.

Author information

Chris Watkins's picture

Biography

Co-founder of Appropedia.org - sharing knowledge for a rich and sustainable world. Supporting efforts in ICT4D (ICT for Development), green computing, and freedom.