The Web 2.0 era is characterized by large amounts of user-generated content. People started generating and sharing data on Web services such as blogs, social networks, Wikipedia, photo sharing sites and others.
Today, with the emergence of mobile Internet access, the nature of user-generated content has changed. People now contribute more often and with smaller posts, and the life-span of these posts has become shorter. On Twitter, people share short posts about what they are doing or reading right now and discuss breaking news; on services like Foursquare or Facebook Places, they share their current location.
MapReduce/Hadoop has become the state-of-the-art approach for analytical batch processing of user-generated data. But processing data in batches is becoming too slow for time-sensitive data: accumulated data can lose its importance within several hours or even minutes. The real-time Web brings new requirements for analytical systems: they must aggregate values in real time, incrementally, as new data arrives. It follows that workloads become more database-intensive, because aggregate values are not produced all at once, as in batch processing, but are stored in a database and constantly updated.
At the Systems Group at ETH Zurich, we are working on Triggy, a system for real-time analytics. Our system is based on Cassandra, a distributed key-value store. You can find an overview of Cassandra's internals in my presentation embedded below and read about its data model here. We extend Cassandra with push-style procedures and with serialized access to aggregate values. Push-style processing allows us to propagate new data to the analytical computations immediately. Serialization is used to provide lightweight transactions for consistent updates of counters (aggregate values), because Cassandra does not provide any support for transactions out of the box.
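To illustrate why serialized access matters for counters, here is a minimal Python sketch. It is not Triggy's code and does not use Cassandra: the class `SerializedCounters` and its methods are hypothetical names, and an in-process lock per key merely stands in for whatever serialization mechanism the real system uses. The point is only that a read-modify-write on an aggregate value must not interleave with another update to the same key.

```python
import threading


class SerializedCounters:
    """Toy in-memory store (hypothetical): updates to one key run one at a time."""

    def __init__(self):
        self._values = {}                      # key -> current aggregate value
        self._locks = {}                       # key -> lock guarding that key
        self._locks_guard = threading.Lock()   # protects creation of per-key locks

    def _lock_for(self, key):
        # Create the per-key lock on first use; the guard avoids a creation race.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key, fn, initial=0):
        # Read-modify-write as one serialized step for this key.
        with self._lock_for(key):
            new_value = fn(self._values.get(key, initial))
            self._values[key] = new_value
            return new_value

    def get(self, key, initial=0):
        with self._lock_for(key):
            return self._values.get(key, initial)


counters = SerializedCounters()


def worker():
    # Each worker increments the same counter 1000 times.
    for _ in range(1000):
        counters.update("clicks", lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counters.get("clicks"))
```

Because every update to `"clicks"` holds that key's lock, no increment is lost, while updates to different keys can still proceed independently.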
In Triggy, we implemented a programming model similar to MapReduce, but we modified it to support incremental processing.
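As a rough illustration of what an incremental variant of MapReduce can look like, consider the Python sketch below. It is an assumption-laden toy, not Triggy's actual API: `IncrementalJob`, `push`, `map_hashtags` and `reduce_count` are hypothetical names, and a plain dictionary stands in for the database. The map step handles one record at a time as it arrives, and the reduce step folds each emitted value into the aggregate already stored for its key, instead of reducing a complete batch.

```python
from typing import Callable, Dict, Iterator, Tuple


def map_hashtags(post: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (hashtag, 1) for every hashtag in a single post."""
    for word in post.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def reduce_count(current: int, value: int) -> int:
    """Incremental reduce step: fold one new value into the stored aggregate."""
    return current + value


class IncrementalJob:
    """Hypothetical driver: applies map and reduce to each record on arrival."""

    def __init__(self,
                 map_fn: Callable[[str], Iterator[Tuple[str, int]]],
                 reduce_fn: Callable[[int, int], int],
                 initial: int = 0):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.initial = initial
        self.store: Dict[str, int] = {}  # stands in for the database

    def push(self, record: str) -> None:
        # Push-style processing: the record is handled immediately,
        # and every affected aggregate is updated in place.
        for key, value in self.map_fn(record):
            current = self.store.get(key, self.initial)
            self.store[key] = self.reduce_fn(current, value)


job = IncrementalJob(map_hashtags, reduce_count)
job.push("Reading about #Cassandra and #analytics")
job.push("More #cassandra news")
print(job.store)
```

After the two `push` calls, the aggregates are already up to date and can be queried at any time, without waiting for a batch to finish.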
Here is my presentation about Triggy, in which I describe its internals and programming model, compare it to similar systems (Yahoo! S4 and Google Percolator), and discuss applications for Triggy. See the presenter notes on the slides for more information.
Triggy will be demonstrated at VLDB 2011: Max Grinev, Maria Grineva, Martin Hentschel and Donald Kossmann, "Analytics for the Real-Time Web".