Nexmark Paper

Pete Tucker, Kristin Tufte, Vassilis Papadimos, David Maier. Bottom Line Up Front: I rate this 6.2. It is a README.md, but formatted in beautiful, unreadable, dead-tree-sized, extra-hyphenated LaTeX, with a few single-use acronyms (SUAs) to keep it confusing. Summary: Intro. XMark measures XML format. Presenting Niagara Extension to XMark (NEXMark). Adapting to Streaming: eBay scenario. New people registering, new items submitted for auction, bids continuously arriving for items. Static files on disk for category information....

March 14, 2023 · 3 min · 601 words · Amos

Apache Flink Paper

Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, Kostas Tzoumas, presented in the world’s most confusing game of asterisks I’ve seen so far. Bottom Line Up Front: I rate this 3.9. Flink is an academic attempt at replacing Spark. I haven’t figured out why. I guess it is just even higher level, with more optimizations? Or maybe I’m late to the party and most of these have spilled into Spark? Flink programs compute both early, approximate results and delayed, accurate results in the same operation....

March 8, 2023 · 6 min · 1250 words · Amos

Apache Flink 101

Robert Metzger - GOTO 2019. What is Flink? A low-latency, high-throughput, stateful, distributed stream processing framework. Stateful Computations over Data Streams: You can use this for fast batch processing of static or historic data. Or you are processing realtime data, processing a stream of data and updating your model of the world. Or event-driven applications. 3 Use Cases: Streaming ETL. Traditionally, ETL is a periodic job fired off by cron....

March 3, 2023 · 5 min · 903 words · Amos

Bigtable

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Bottom Line Up Front I rate this Summary Introduction Distributed storage system. PB scale. Very applicable, scalable, performant, and available. 60 projects using it. Bigtable is like a database but not relational. Everything is a string, clients need to cope with it. Data Model Sparse, distributed, persistent multi-dimensional sorted map....

March 1, 2023 · 3 min · 543 words · Amos

In Search of an Understandable Consensus Algorithm

Diego Ongaro. Bottom Line Up Front: I rate this 7.7. Raft is easier to understand than Paxos. Here’s how Raft works. Logs, pick a leader, commit once a majority has written to disk. Elections rely on majorities. Write the correct things to disk, use random delays to break ties. Makes me want to read about Chubby. Summary: Consensus algorithms allow a collection of machines to work as a coherent group that can survive some failures....

January 19, 2023 · 4 min · 769 words · Amos

Zookeeper Wait Free Coordination

The paper which describes ZooKeeper, a service for coordinating processes of distributed applications. Bottom Line Up Front: I rate this 7.2. ZooKeeper is hard to kill, clients think it is FIFO, writes are, reads are fast and loose. Sync allows cool tricks for when reads matter. Thoughts before reading: ZooKeeper is a distributed tree for small pieces of data and is intended to be the last thing to die during a cluster-wide train-wreck....

January 18, 2023 · 5 min · 906 words · Amos