August 05, 2010

Listed below are links to weblogs that reference Filtering Out Noise In Twitter Lists:

Comments

Dmitry Shaporenkov

The work on identifying the main topic is, of course, interesting per se. However, I'd challenge the assumption that "an average Twitter list produces stream of tweets most of which are relevant to the main list's topic". It's completely unclear to me why most of a user's tweets should be relevant to the list's main topic. He may well tweet on the topic only occasionally and still be a valuable member of the list.

It seems to me that lists in Twitter are not very well thought through, and this is probably the reason why they haven't been widely adopted. For "topical lists" (like the NoSQL list in your example) I think what is really needed is a combination of a user and the hash tag she uses to identify on-topic tweets. Which in turn raises the question of why we need lists at all, since filtering by hash tag is supported straight out of the box...

Mariagrineva

Agreed, not all Twitter lists are "topical lists"; I didn't want to put this issue into the post, so as not to overload it.
People use Twitter lists differently: some use them to organize their followers. For example, I have a list "zurich" for people I know who live in Zurich. Of course, such a list does not have a main topic.

Still, in most topical lists, most tweets are about the main topic. We set up a simple experiment on 20+ topical lists of different sizes and on different topics: sort the words in the tweets by frequency. The most frequent words of a list always clearly identify the main topic.
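The experiment described above can be illustrated with a minimal sketch. It is not the code used for the experiment: the stop-word list, the tokenization, the `top_words` helper, and the sample tweets are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would need a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "it", "at", "my", "i", "with", "this", "that", "rt"}

def top_words(tweets, n=5):
    """Return the n most frequent non-stop words across all tweets."""
    counts = Counter()
    for text in tweets:
        # Drop URLs and @mentions so they do not pollute the counts.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        for word in re.findall(r"[a-z0-9']+", text):
            # Skip stop words and very short tokens.
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(n)

# Made-up tweets standing in for the stream of a NoSQL list.
sample_tweets = [
    "Benchmarking Cassandra against MongoDB for our NoSQL workload",
    "Great coffee in Zurich this morning",
    "New NoSQL meetup: Riak and Cassandra talks http://example.com/meetup",
    "Watching the football game tonight",
    "Why we moved from MySQL to a NoSQL store",
]

print(top_words(sample_tweets, 2))
```

Even in this toy stream, where two of the five tweets are off topic, the top of the frequency list is occupied by words related to the list's topic.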

Dmitry Shaporenkov

So does your experiment mean that most of the people included in a topical list tweet mostly about the topic? I find it quite surprising, given the lack of any filtering in Twitter lists.

Dmitry Shaporenkov

Okay, it probably doesn't mean that; it would be incorrect to infer that from your experiment's description. Still, I don't see why topical lists should exhibit topic coherence (not that I'm very surprised they often *do*); it shouldn't be very hard to find counterexamples.

Dmitry Shaporenkov

BTW here's one counterexample: an NLP list that I'm a member of: http://twitter.com/zelandiya/nlproc At the time of posting this comment, none of the top 20 posts is NLP-related.

Mariagrineva

The first one :). Anyway, I think that if you fetch back ~100 tweets, the word frequencies would show it is NLP-related.

Mariagrineva

The experiments mean that even if many of the list's members tweet about different topics, the tweets about the main topic add up to a distinguishable topic signature. The other topics are diverse, but the main topic is common to all members, so it dominates the frequencies.

Dmitry Shaporenkov

The latter makes perfect sense. Now it would be interesting to determine automatically whether a list is a topical one, for a suitable definition of topical, such as "most of the time, most of its last N tweets pertain to the topic".
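One way to read that definition is sketched below, under loud assumptions: the topic is given as a set of keywords, a tweet "pertains to the topic" if it contains at least one of them, and "most" means more than half. The names `on_topic_fraction` and `is_topical` and the sample data are hypothetical.

```python
import re

def on_topic_fraction(tweets, topic_words):
    """Fraction of tweets containing at least one topic keyword."""
    if not tweets:
        return 0.0
    hits = sum(
        1 for text in tweets
        if topic_words & set(re.findall(r"[a-z0-9']+", text.lower()))
    )
    return hits / len(tweets)

def is_topical(snapshots, topic_words, threshold=0.5):
    """snapshots: the list's last N tweets, sampled at several points in time.

    The list counts as topical if, in more than `threshold` of the snapshots,
    more than `threshold` of the tweets are on topic.
    """
    if not snapshots:
        return False
    good = sum(
        1 for tweets in snapshots
        if on_topic_fraction(tweets, topic_words) > threshold
    )
    return good / len(snapshots) > threshold

# Made-up keyword set and snapshots (N = 3) for a NoSQL list.
topic = {"nosql", "cassandra", "mongodb", "riak"}
snapshots = [
    ["nosql rocks", "cassandra tips", "lunch"],
    ["nosql", "coffee", "football"],
    ["mongodb release", "nosql talk", "riak"],
]

print(is_topical(snapshots, topic))
```

The keyword set could itself come from the most frequent words of the list, and the 0.5 threshold is only one possible choice for "most".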

uguest22

"Lists" are simply too much work for most.

And, a "positive filter" as suggested (push certain information) rather than a "negative filter" (remove information) is counter-intuitive for most Users.

We know what we don't want, not necessarily what we do want.

If you can just get rid of all Location check-ins (4square, etc.) the reduction in noise with that alone might make Twitter more valuable. Location check-ins populating a stream are the most invasive "noise" yet, and growing (You are one of the only people I haven't unFollowed due to check-ins; but, give it another week).
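Such a "negative filter" can be sketched in a few lines. The pattern is an assumption: it treats a tweet as a check-in if it links to `4sq.com` or starts with "I'm at", which may miss other check-in services or catch ordinary tweets. `CHECKIN_PATTERN` and `drop_checkins` are hypothetical names.

```python
import re

# Assumed signature of a check-in tweet: a 4sq.com link or an "I'm at" prefix.
CHECKIN_PATTERN = re.compile(r"4sq\.com/|^i'm at\b", re.IGNORECASE)

def drop_checkins(tweets):
    """Return the tweets that do not look like location check-ins."""
    return [text for text in tweets if not CHECKIN_PATTERN.search(text)]

# Made-up stream mixing check-ins with regular tweets.
stream = [
    "I'm at Cafe Example (Zurich) http://4sq.com/abc123",
    "New post on filtering noise in Twitter lists",
    "Just became the mayor of Example Bar! http://4sq.com/xyz789",
    "Looking at NoSQL benchmarks today",
]

print(drop_checkins(stream))
```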

Bibek

The dyndns link doesn't work atm.

meizitang

I hope I can get the working link here.

Maria Grineva

I am sorry, but the demo has already been shut down.

The comments to this entry are closed.