May 16, 2009


Elliot Turner

An interesting approach! Amazon Mechanical Turk seems to be increasingly used as a source of machine learning training data, linkages, etc.

Another worthwhile variant of this technique would be to combine Turk-based human annotators with an automated suggestion tool, along the lines of ODDLinker (used to interlink the LinkedMDB project with DBpedia).

Tools such as ODDLinker that leverage tuple data to generate potential linkages can take over much of the human legwork for "obvious linkages", leaving manual disambiguation/lookups for the more ambiguous entries. Combining these sorts of tools with workflow systems such as Amazon Mechanical Turk has the potential to bring the Crunchbase annotation/linking costs down significantly.
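To make the split concrete, here is a rough sketch of that kind of triage in Python. It is not how ODDLinker works; it simply assumes a plain string-similarity score from the standard library's difflib, a hypothetical `triage` helper, an arbitrary 0.95 cutoff, and a non-empty candidate list. Confident matches are linked automatically and everything else is queued for a human annotator.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(records, candidates, auto_threshold=0.95):
    """Split records into automatic links and items needing human review.

    `auto_threshold` is an illustrative cutoff, not a tuned value.
    `candidates` is assumed to be non-empty.
    """
    auto_links, needs_human = [], []
    for name in records:
        # Pick the candidate whose name is closest to the record name.
        best = max(candidates, key=lambda c: similarity(name, c))
        score = similarity(name, best)
        if score >= auto_threshold:
            auto_links.append((name, best))
        else:
            # Ambiguous: hand the best guess and its score to a human.
            needs_human.append((name, best, score))
    return auto_links, needs_human


auto_links, needs_human = triage(
    ["Twitter", "Apple"],
    ["twitter", "Apple Inc.", "Apple Records"],
)
```

In this toy run "Twitter" is linked automatically, while "Apple" falls below the cutoff and would go to the Turk queue along with its best-guess candidate.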

Maria Grineva

Thanks Elliot! It is a reasonable suggestion to handle the more obvious cases automatically in order to bring costs down.

Can you give me a link to the ODDLinker tool?

Enrique Gomez

I wonder if your idea of using human annotation could be applied to the Twitter data through Twitter itself. Consider that a human could add content to a Twitter post, especially when referring to a linked article, through the use of a new kind of Twitter tag. The added content would raise or lower the relevance according to the user. You could then 'sift' through the Twitter data and extract a metric of relevance based on the new tag. The Turk humans are rated for effectiveness and are therefore more reliable, but in the Twitter scenario you would leverage the larger mass of annotators. Hopefully some good data would bubble out of the noise.

I'm still thinking about what the tags would look like and what a simple syntax would be. Something like the RT tag, say RV for ReleVance. One could write RV + or RV - to mark the whole post as relevant or not. Maybe RV +term to add a term you think applies to the post, RV -term to remove one, etc.
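As a rough sketch of how such tags could be picked out of posts, assuming the syntax is exactly "RV", whitespace, a sign, and an optional single-word term (the regular expression and the `parse_rv` and `relevance_score` names are made up for illustration):

```python
import re

# "RV +" / "RV -" rate the whole post; "RV +term" / "RV -term" add or remove a term.
RV_PATTERN = re.compile(r"\bRV\s+([+-])(\w*)")


def parse_rv(post):
    """Return a list of (value, term) pairs; term is None for whole-post votes."""
    votes = []
    for sign, term in RV_PATTERN.findall(post):
        votes.append((1 if sign == "+" else -1, term or None))
    return votes


def relevance_score(posts):
    """Sum the whole-post votes (+1 or -1) across a collection of posts."""
    return sum(
        value
        for post in posts
        for value, term in parse_rv(post)
        if term is None
    )


votes = parse_rv("Nice read http://example.com RV + RV +semantic RV -spam")
score = relevance_score(["good RV +", "meh RV -", "RV + RV +ml"])
```

Here `votes` holds one whole-post upvote plus one added and one removed term, and `score` only counts the whole-post votes, so term votes would need their own tally.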

Maybe I just need more coffee...

Maria Grineva

I didn't understand what you mean by "relevant". Do you mean that if I post something to Twitter, I would rate how relevant the posted link is to me?

