<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.igvita.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>igvita.com</title>
	
	<link>http://www.igvita.com</link>
	<description>A goal is a dream with a deadline.</description>
	<pubDate>Mon, 01 Mar 2010 19:21:19 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.igvita.com/igvita" /><feedburner:info uri="igvita" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>igvita</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Schema-Free MySQL vs NoSQL</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/syizv-mDxYU/</link>
		<comments>http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 18:55:02 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Databases]]></category>

		<category><![CDATA[eventmachine]]></category>

		<category><![CDATA[MySQL]]></category>

		<category><![CDATA[nosql]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=1017</guid>
		<description><![CDATA[Amidst the Cambrian explosion of alternative database engines (aka NoSQL), it is almost too easy to lose sight of the fact that the more established solutions, such as relational databases, still have a lot to offer: a stable and proven code base, drivers and tools for every conceivable language, and more features than any DBA cares [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" src="http://www.igvita.com/posts/10/cambrian-db.png" style="margin-right: 1em;">Amidst the Cambrian explosion of alternative database engines (aka NoSQL), it is almost too easy to lose sight of the fact that the more established solutions, such as relational databases, still have a lot to offer: a stable and proven code base, drivers and tools for every conceivable language, and more features than any DBA cares to learn about. Not to mention that, relational or not, they often perform just as well as any other single-instance key-value store when faced with large datasets - which is why <a href="http://riak.basho.com/">Riak</a>, <a href="http://project-voldemort.com/">Voldemort</a> and others can use InnoDB as a data store. Granted, the “feature bloat” is also the reason why a rewrite can be a good idea, but this gray zone is too often overlooked in the NoSQL community - just because you are “NoSQL” does not mean you have to throw away years of work put into relational databases.</p>
<p>Setting aside the fact that we have yet to define what “NoSQL” actually is, some of the attributes that we commonly lump under this label are: document-based, schema-free, distributed and “scalable”. The fact that being distributed and being scalable are not one and the same is a subject for another post; instead, let’s take a closer look at what schema-free and document-based actually mean. In fact, let me jump ahead: I am genuinely surprised that we have yet to see a schema-free engine built on top of MySQL. I know, I know, but suspend your disbelief for a second, because it is not as outrageous as it sounds.</p>
<h4><strong>Document Based: a Double Edged Sword</strong></h4>
<p>The <a href="http://en.wikipedia.org/wiki/Database_normalization#Objectives_of_normalization">original reason</a> for, and the benefit of, the relational model is that by constraining the data schema (read: eliminating structural complexity of the data, or decomposing it into relations), you actually gain power and flexibility in the types of queries you can execute against your database. Said another way, a normalized data design allows us to have a <a href="http://en.wikipedia.org/wiki/Sql">general-purpose query language</a>, which supports queries whose parameters we do not even know at design time, whereas denormalized designs do not. What we lose in flexibility of our data structures, we gain in our ability to interact with the data. Hence, in theory, if you have no way to anticipate the types of queries you will need in the future, a relational model is your best bet. Lose some, win some: choose your poison.</p>
<p><img align="left" src="http://www.igvita.com/posts/10/nested-data.png" style="margin-right: 1em;">At the same time, we all know that “no join is faster than no join”. The inherent disadvantage of decomposing your data is the required assembly. If you are looking for “speed” or “scalability”, then denormalizing your data is usually the first step. The disadvantage? Now you have introduced a number of potential anomalies into your data: updates, inserts, and deletes can cause data inconsistencies unless you keep careful accounting of all duplication. One-to-one and one-to-many relations are usually easy to manage, but many-to-many relations in denormalized schemas are a recipe for disaster. That is, if you care about <a href="http://en.wikipedia.org/wiki/ACID#Consistency">consistency</a>.</p>
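<p>To make the update anomaly concrete, here is a minimal sketch in plain Ruby. The records, field names, and email addresses are made up for illustration; the point is only that a denormalized copy can drift out of sync, while a normalized lookup cannot.</p>

```ruby
# Hypothetical denormalized records: the author's email is copied into every post.
posts = [
  { "id" => 1, "title" => "First",  "author" => "ann", "author_email" => "ann@old.example" },
  { "id" => 2, "title" => "Second", "author" => "ann", "author_email" => "ann@old.example" }
]

# An update that touches only one copy leaves the data inconsistent.
posts[0]["author_email"] = "ann@new.example"
emails = posts.select { |post| post["author"] == "ann" }
              .map { |post| post["author_email"] }
              .uniq

# Normalized alternative: the email lives in one place, posts reference it by key.
authors = { "ann" => { "email" => "ann@old.example" } }
normalized_posts = [
  { "id" => 1, "title" => "First",  "author" => "ann" },
  { "id" => 2, "title" => "Second", "author" => "ann" }
]
authors["ann"]["email"] = "ann@new.example"
normalized_emails = normalized_posts.map { |post| authors[post["author"]]["email"] }.uniq

puts "denormalized: #{emails.inspect}"
puts "normalized:   #{normalized_emails.inspect}"
```

<p>The denormalized version now reports two different addresses for the same author, which is exactly the kind of accounting you take on when you duplicate data.</p>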
<p>Finally, since you lose the power of a general-purpose query language (SQL), you are now at the mercy of the DSL provided by your new database. Mongo, Couch and many others had to introduce their own query language constructs alongside "map-reduce" functionality to address the problem of querying arbitrarily deep records. Now, I am a fan of both, but frankly, none of the ones I have worked with so far are as clean or as easy to understand as SQL (<a href="http://rickosborne.org/download/SQL-to-MongoDB.pdf">case in point</a>) - and they come with the added downside of making me learn yet another query language.</p>
<h4><strong>Schema-free != Document Based</strong></h4>
<p>Document-based and schema-free are often used interchangeably, but there is an important difference: schema-free does not necessarily imply nested data structures. Likewise, just because MySQL is “relational” does not mean that it must be fixed to a predefined schema - at create time, maybe, but not at runtime. Put the two statements together, and there is absolutely no reason why we cannot have a schema-free engine in MySQL:</p>
<p><b>schema-free.sql</b></p>
<div style=" background:white;" id=8997_1>
<pre class="sql">mysql&gt; <span style="color: #993333; font-weight: bold;">USE</span> noschema;
mysql&gt; <span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> widgets;  <span style="color: #808080; font-style: italic;">/* look ma, no schema! */</span>
mysql&gt; <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> widgets <span style="color: #66cc66;">&#40;</span>id, name<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">VALUES</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;a&quot;</span>, <span style="color: #ff0000;">&quot;apple&quot;</span><span style="color: #66cc66;">&#41;</span>;
mysql&gt; <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> widgets <span style="color: #66cc66;">&#40;</span>id, name, type<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">VALUES</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;b&quot;</span>, <span style="color: #ff0000;">&quot;blackberry&quot;</span>, <span style="color: #ff0000;">&quot;phone&quot;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
mysql&gt; <span style="color: #993333; font-weight: bold;">SELECT</span> * <span style="color: #993333; font-weight: bold;">FROM</span> widgets <span style="color: #993333; font-weight: bold;">WHERE</span> id = <span style="color: #ff0000;">&quot;a&quot;</span>;
+<span style="color: #808080; font-style: italic;">---------+---------------+</span>
| id      | name          |
+<span style="color: #808080; font-style: italic;">---------+---------------+</span>
| a       | apple         |
+<span style="color: #808080; font-style: italic;">---------+---------------+</span>
&nbsp;
mysql&gt; <span style="color: #993333; font-weight: bold;">SELECT</span> * <span style="color: #993333; font-weight: bold;">FROM</span> widgets;
+<span style="color: #808080; font-style: italic;">---------+---------------+--------+</span>
| id      | name          | type   |
+<span style="color: #808080; font-style: italic;">---------+---------------+--------+</span>
| a       | apple         | <span style="color: #993333; font-weight: bold;">NULL</span>   |
| b       | blackberry    | phone  |
+<span style="color: #808080; font-style: italic;">---------+---------------+--------+</span>
&nbsp;</pre>
</div>
<p>As long as we avoid nested data structures, there is no reason why we should be limited by the columns defined in our tables, because we can compose and decompose any relation at runtime. Not only would this mean no migrations and no need to store null values, but you could also keep all the tools, drivers, and the SQL query language while adding the full flexibility of being schema-free.</p>
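<p>One way to picture "compose and decompose at runtime" is a small in-memory model in Ruby. This is only a sketch under my own assumptions: the helper names (<code>new_store</code>, <code>insert_row</code>, <code>select_row</code>, <code>select_all</code>) are hypothetical, and plain Hashes stand in for database tables.</p>

```ruby
# Hypothetical in-memory model: one Hash per attribute, keyed by row id.
def new_store
  { ids: [], attributes: Hash.new { |hash, name| hash[name] = {} } }
end

# Decompose a row into one (id, value) pair per attribute.
def insert_row(store, id, row)
  store[:ids] << id unless store[:ids].include?(id)
  row.each { |name, value| store[:attributes][name][id] = value }
end

# Recompose a row from every attribute known at query time.
# Attributes this row never set come back as nil, much like SQL NULL.
def select_row(store, id)
  return nil unless store[:ids].include?(id)
  row = { "id" => id }
  store[:attributes].each { |name, values| row[name] = values[id] }
  row
end

def select_all(store)
  store[:ids].map { |id| select_row(store, id) }
end

store = new_store
insert_row(store, "a", { "name" => "apple" })
first = select_row(store, "a")   # only id and name exist so far
insert_row(store, "b", { "name" => "blackberry", "type" => "phone" })
rows = select_all(store)         # "type" now shows up for every row

p first
p rows
```

<p>Nothing is stored for the attributes a row does not have; the nil only appears when the row is reassembled, which mirrors the console session above.</p>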
<h4><strong>Schema-free DB on top of MySQL</strong></h4>
<p>Not able to find any project that would give me this behavior, I ended up prototyping it myself over the weekend, and believe it or not, it works just fine. In fact, the output above is from a real console session with MySQL. All it took was an <a href="http://github.com/igrigorik/em-proxy">em-proxy server</a> with a little low-level protocol handling and query rewriting, and all of a sudden, my MySQL forgot that it requires a schema. Take it for a test drive yourself (you will need Ruby 1.9):</p>
<blockquote><p>
git clone git://github.com/igrigorik/em-proxy.git && cd em-proxy<br />
ruby examples/schemaless-mysql/mysql_interceptor.rb<br />
<strong>mysql -h localhost -P 3307 --protocol=tcp</strong>
</p></blockquote>
<p><b>schema-free-mysql.rb</b></p>
<div style=" background:white;" id=8997_2>
<pre class="ruby"><span style="color:#008000; font-style:italic;"># snip ... </span>
<span style="color:#008000; font-style:italic;"># build the select statements, hide the tables behind each attribute</span>
join = <span style="color:#996600;">&quot;select #{table}.id as id &quot;</span>
tables.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> |column|
  join += <span style="color:#996600;">&quot; , #{table}_#{column}.value as #{column} &quot;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># add the joins to stitch it all together</span>
join += <span style="color:#996600;">&quot; FROM #{table} &quot;</span>
tables.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> |column|
  join += <span style="color:#996600;">&quot; LEFT OUTER JOIN #{table}_#{column} ON #{table}_#{column}.id = #{table}.id &quot;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
join += <span style="color:#996600;">&quot; WHERE #{table}.id = '#{key}' &quot;</span> <span style="color:#9966CC; font-weight:bold;">if</span> key
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-proxy/blob/master/examples/schemaless-mysql/mysql_interceptor.rb'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/downloads.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-proxy/blob/master/examples/schemaless-mysql/mysql_interceptor.rb'>mysql_interceptor.rb (MySQL Proxy in Ruby)</a>
							</h4>
						</div></p>
<p>Of course, this is nothing but a cute code example, and it does not cover all the different use cases, but let us look at the feature set: driver support for every language (you can point Rails + ActiveRecord, JDBC, etc. at it out of the box, no problem), tool support (GUI and command line), replication that works, data that is very hard to corrupt, transactions, and so on. Not bad for half a day of hacking with a simple data model in the background:</p>
<p align="center"><img src="http://www.igvita.com/posts/10/attr-join.png"></p>
<p>Instead of defining columns on a table, each attribute has its own table (new tables are created on the fly), which means that we can add and remove attributes at will. In turn, performing a select simply means joining all of the tables on that individual key. To the client this is completely transparent, and while the proxy server does the actual work here, this functionality could easily be extracted into a proper MySQL engine - I’m just surprised that no one has done so already. For a closer look, <a href="http://github.com/igrigorik/em-proxy/blob/master/examples/schemaless-mysql/mysql_interceptor.rb">check out the proxy code itself</a>; there are plenty of comments that explain how it is all pieced together.</p>
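<p>As a rough, self-contained sketch of the query rewriting described above, the two helpers below generate the SQL for this attribute-per-table layout. The function names, the <code>VARCHAR(255)</code> column types, and the unescaped string interpolation are my assumptions for illustration and are not taken from the proxy code; anything handling real input would need proper escaping.</p>

```ruby
# Hypothetical helpers; names and column types are illustrative assumptions.

# Build the SELECT that joins one attribute table per column back into rows.
def build_select(table, columns, key = nil)
  fields = ["#{table}.id AS id"] + columns.map { |c| "#{table}_#{c}.value AS #{c}" }
  joins = columns.map do |c|
    "LEFT OUTER JOIN #{table}_#{c} ON #{table}_#{c}.id = #{table}.id"
  end
  sql = "SELECT #{fields.join(', ')} FROM #{table}"
  sql += " #{joins.join(' ')}" unless joins.empty?
  sql += " WHERE #{table}.id = '#{key}'" if key # no escaping: sketch only
  sql
end

# Build the statements that store one row: the key goes into the base table,
# and each attribute goes into its own table, created on first use.
def build_insert(table, id, attributes)
  statements = ["INSERT INTO #{table} (id) VALUES ('#{id}')"]
  attributes.each do |column, value|
    statements << "CREATE TABLE IF NOT EXISTS #{table}_#{column} " \
                  "(id VARCHAR(255) PRIMARY KEY, value VARCHAR(255))"
    statements << "INSERT INTO #{table}_#{column} (id, value) VALUES ('#{id}', '#{value}')"
  end
  statements
end

select_sql = build_select("widgets", ["name"], "a")
insert_sql = build_insert("widgets", "b", { "name" => "blackberry", "type" => "phone" })

puts select_sql
puts insert_sql
```

<p>The LEFT OUTER JOIN is what lets a row without a given attribute still appear in the result, with NULL in that column.</p>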
<h4><strong>The gray zone of SQL vs NoSQL</strong></h4>
<p>So what is the point of all this? Well, I hope someone actually writes such an engine, because I believe there is a market for it. There is a lot to be said for a drop-in, SQL-compatible, schema-free engine, and contrary to what the NoSQL propaganda may say, there is absolutely no reason why we can’t have many of the benefits of “NoSQL” within MySQL itself. There is no one clear winner for a database engine or model, so put some thought into your decision up front. Just because Mongo, TC, or Couch are 'document-oriented' or 'schema-free' does not mean they are necessarily better for your application. In the meantime, don't get me wrong: I am still rooting for all the NoSQL projects, and I have high expectations for Drizzle as well - they are all doing fantastic work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/</feedburner:origLink></item>
		<item>
		<title>Data Serialization + RPC with Avro &amp; Ruby</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/1Qtp5vuyWWQ/</link>
		<comments>http://www.igvita.com/2010/02/16/data-serialization-rpc-with-avro-ruby/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 18:35:03 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[avro]]></category>

		<category><![CDATA[hadoop]]></category>

		<category><![CDATA[rpc]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=999</guid>
		<description><![CDATA[Any programmer or project worth their salt needs to invent their own serialization, and if they are serious, an RPC framework - or, at least, that is what it seems like. Between Protocol Buffers, Thrift, BERT, BSON, or even plain JSON, there is no shortage of choices and architectural decisions packed into each one. [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" src="http://www.igvita.com/posts/10/avro.png" style="margin-right: 1em;">Any programmer or project worth their salt needs to invent their own serialization, and if they are serious, an RPC framework - or, at least, that is what it seems like. Between <a href="http://code.google.com/p/protobuf/">Protocol Buffers</a>, <a href="http://developers.facebook.com/thrift/">Thrift</a>, <a href="http://bert-rpc.org/">BERT</a>, <a href="http://www.mongodb.org/display/DOCS/BSON">BSON</a>, or even plain JSON, there is no shortage of choices and architectural decisions packed into each one. For that reason, when Doug Cutting (one of the lead developers on Hadoop) first proposed <a href="http://markmail.org/message/7cgrwoc4er4mr3bp">Avro in April of 2009</a>, a healthy dose of skepticism was in order. After all, both Thrift and PB already had thriving communities - why reinvent the wheel? Having said that, the proposal passed, and since then Avro has been making good progress.</p>
<p>Reviewing the <a href="http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking">latest benchmarks</a> shows Avro as fully competitive with PB and Thrift in both speed and size of the output data. However, neither speed nor size, while critical, was the motivating reason for Avro. Interestingly enough, Doug Cutting designed Avro with the goal of making it friendlier to dynamic environments (Python, Ruby, Pig, Hive, etc.), where code generation is often an unnecessary and unwanted step. Unlike PB or Thrift, Avro defines its schema in plain JSON and stores it as part of the output file, which makes the schema extremely easy to parse (JSON parsers are simple and abundant) and avoids the need for extra IDL definition stubs and compilers (though if you really want to, Avro can generate code stubs as well).</p>
<h4><strong>Embedding IDL with Avro</strong></h4>
<p>The decision to embed the Avro data schema alongside the binary packed data opens up a number of interesting use cases. First, dynamic frameworks such as Pig, Hive, or any other Hadoop infrastructure (the goal of Avro is to become the standard data exchange and RPC protocol for all of Hadoop) can load and process data on the fly, without looking for or invoking an IDL compiler. Additionally, having the original schema also allows us to do “data projection”: if the reader is only interested in a subset of the data, then it can selectively parse it out of the stream, allowing for faster processing and easy "versioning" support out of the box. Let’s take a look at a simple example:</p>
<p><b>avro-writer.rb</b></p>
<div style=" background:white;" id=2081_1>
<pre class="ruby">SCHEMA = &lt;&lt;-JSON
<span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;record&quot;</span>,
  <span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;User&quot;</span>,
  <span style="color:#996600;">&quot;fields&quot;</span> : <span style="color:#006600; font-weight:bold;">&#91;</span>
    <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;username&quot;</span>, <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;string&quot;</span><span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;age&quot;</span>, <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;int&quot;</span><span style="color:#006600; font-weight:bold;">&#125;</span>,
    <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;verified&quot;</span>, <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;boolean&quot;</span>, <span style="color:#996600;">&quot;default&quot;</span>: <span style="color:#0000FF; font-weight:bold;">false</span><span style="color:#006600; font-weight:bold;">&#125;</span>
  <span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
JSON
&nbsp;
file = <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'data.avr'</span>, <span style="color:#996600;">'wb'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
schema = <span style="color:#6666ff; font-weight:bold;">Avro::Schema</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span>SCHEMA<span style="color:#006600; font-weight:bold;">&#41;</span>
writer = <span style="color:#6666ff; font-weight:bold;">Avro::<span style="color:#CC00FF; font-weight:bold;">IO</span>::DatumWriter</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>schema<span style="color:#006600; font-weight:bold;">&#41;</span>
dw = <span style="color:#6666ff; font-weight:bold;">Avro::DataFile::Writer</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>file, writer, schema<span style="color:#006600; font-weight:bold;">&#41;</span>
dw &lt;&lt; <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;username&quot;</span> =&gt; <span style="color:#996600;">&quot;john&quot;</span>, <span style="color:#996600;">&quot;age&quot;</span> =&gt; <span style="color:#006666;">25</span>, <span style="color:#996600;">&quot;verified&quot;</span> =&gt; <span style="color:#0000FF; font-weight:bold;">true</span><span style="color:#006600; font-weight:bold;">&#125;</span>
dw &lt;&lt; <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;username&quot;</span> =&gt; <span style="color:#996600;">&quot;ryan&quot;</span>, <span style="color:#996600;">&quot;age&quot;</span> =&gt; <span style="color:#006666;">23</span>, <span style="color:#996600;">&quot;verified&quot;</span> =&gt; <span style="color:#0000FF; font-weight:bold;">false</span><span style="color:#006600; font-weight:bold;">&#125;</span>
dw.<span style="color:#9900CC;">close</span>
&nbsp;</pre>
</div>
<p>The <a href="http://hadoop.apache.org/avro/docs/1.2.0/spec.html">Avro specification</a> provides all the primitive types you would expect (string, boolean, double, etc.), and also a number of <a href="http://hadoop.apache.org/avro/docs/1.2.0/spec.html#schema_complex">complex types</a>: records, enums, arrays, maps, unions, and fixed. Default values and sort order can also be specified for record fields. The schema itself is a JSON document, which you can peek at in the header of any serialized Avro file. This means that when it comes to reading the data, we don’t need to know anything about it up front, or alternatively, we can read only a subset of the available fields:</p>
<p><b>avro-reader.rb</b></p>
<div style=" background:white;" id=2081_2>
<pre class="ruby"><span style="color:#008000; font-style:italic;"># read all data from avro file </span>
file = <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'data.avr'</span>, <span style="color:#996600;">'rb'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
dr = <span style="color:#6666ff; font-weight:bold;">Avro::DataFile::Reader</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>file, <span style="color:#6666ff; font-weight:bold;">Avro::<span style="color:#CC00FF; font-weight:bold;">IO</span>::DatumReader</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#41;</span>
dr.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span> |record| <span style="color:#CC0066; font-weight:bold;">p</span> record <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># extract the username only from the avro serialized file</span>
READER_SCHEMA = &lt;&lt;-JSON
<span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;record&quot;</span>,
  <span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;User&quot;</span>,
  <span style="color:#996600;">&quot;fields&quot;</span> : <span style="color:#006600; font-weight:bold;">&#91;</span>
    <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">&quot;name&quot;</span>: <span style="color:#996600;">&quot;username&quot;</span>, <span style="color:#996600;">&quot;type&quot;</span>: <span style="color:#996600;">&quot;string&quot;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
 <span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
JSON
&nbsp;
reader = <span style="color:#6666ff; font-weight:bold;">Avro::<span style="color:#CC00FF; font-weight:bold;">IO</span>::DatumReader</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF; font-weight:bold;">nil</span>, <span style="color:#6666ff; font-weight:bold;">Avro::Schema</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span>READER_SCHEMA<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
dr = <span style="color:#6666ff; font-weight:bold;">Avro::DataFile::Reader</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>file, reader<span style="color:#006600; font-weight:bold;">&#41;</span>
dr.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span> |record| <span style="color:#CC0066; font-weight:bold;">p</span> record <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;</pre>
</div>
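<p>To show what the projection amounts to, here is a deliberately simplified sketch in plain Ruby that works on already-decoded records (Hashes) and needs only the standard <code>json</code> library. The <code>project</code> helper is hypothetical and is not part of the Avro gem; real Avro schema resolution also handles type promotion, unions, nesting, and skipping unwanted fields in the binary stream using the writer's schema.</p>

```ruby
require "json"

# Hypothetical helper: keep only the fields the reader schema asks for,
# falling back to the reader's default when the record lacks a field.
def project(record, reader_schema)
  reader_schema.fetch("fields").each_with_object({}) do |field, result|
    name = field.fetch("name")
    if record.key?(name)
      result[name] = record[name]
    elsif field.key?("default")
      result[name] = field["default"]
    else
      raise ArgumentError, "no value or default for field '#{name}'"
    end
  end
end

# Records as they would look after decoding with the writer's schema.
records = [
  { "username" => "john", "age" => 25, "verified" => true },
  { "username" => "ryan", "age" => 23, "verified" => false }
]

username_only = JSON.parse(
  '{"type": "record", "name": "User", "fields": [{"name": "username", "type": "string"}]}'
)
# A made-up newer reader schema that adds a field with a default value.
with_new_field = JSON.parse(
  '{"type": "record", "name": "User", "fields": [' \
  '{"name": "username", "type": "string"},' \
  '{"name": "country", "type": "string", "default": "unknown"}]}'
)

usernames = records.map { |record| project(record, username_only) }
evolved = records.map { |record| project(record, with_new_field) }

p usernames
p evolved
```

<p>The second reader schema hints at why defaults matter for versioning: a newer reader can still consume older data, as long as every field it adds declares a default.</p>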
<h4><strong>RPC with Ruby and Avro</strong></h4>
<p>The RPC piece of Avro is also pretty straightforward: the protocol is defined as a JSON document, just like an Avro schema, where both the data types and the methods (along with their request and response parameters) are provided inline. In principle, this also means that given the right framework, different clients could easily negotiate different data formats on the fly, without having to worry about versioning or conditional code paths. A simple Mail protocol with Avro:</p>
<p><b>mail-protocol.js</b></p>
<div style=" background:white;" id=2081_3>
<pre class="javascript"><span style="color: #66cc66;">&#123;</span>
  <span style="color: #3366CC;">&quot;namespace&quot;</span>: <span style="color: #3366CC;">&quot;example.proto&quot;</span>,
  <span style="color: #3366CC;">&quot;protocol&quot;</span>: <span style="color: #3366CC;">&quot;Mail&quot;</span>,
&nbsp;
  <span style="color: #3366CC;">&quot;types&quot;</span>: <span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;name&quot;</span>: <span style="color: #3366CC;">&quot;Message&quot;</span>, <span style="color: #3366CC;">&quot;type&quot;</span>: <span style="color: #3366CC;">&quot;record&quot;</span>, <span style="color: #3366CC;">&quot;fields&quot;</span>: <span style="color: #66cc66;">&#91;</span>
        <span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;name&quot;</span>: <span style="color: #3366CC;">&quot;to&quot;</span>, <span style="color: #3366CC;">&quot;type&quot;</span>: <span style="color: #3366CC;">&quot;string&quot;</span><span style="color: #66cc66;">&#125;</span>,
        <span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;name&quot;</span>: <span style="color: #3366CC;">&quot;from&quot;</span>, <span style="color: #3366CC;">&quot;type&quot;</span>: <span style="color: #3366CC;">&quot;string&quot;</span><span style="color: #66cc66;">&#125;</span>,
        <span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;name&quot;</span>: <span style="color: #3366CC;">&quot;body&quot;</span>, <span style="color: #3366CC;">&quot;type&quot;</span>: <span style="color: #3366CC;">&quot;string&quot;</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#93;</span>
       <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#93;</span>,
&nbsp;
   <span style="color: #3366CC;">&quot;messages&quot;</span>: <span style="color: #66cc66;">&#123;</span>
      <span style="color: #3366CC;">&quot;replay&quot;</span>: <span style="color: #66cc66;">&#123;</span> <span style="color: #3366CC;">&quot;response&quot;</span>: <span style="color: #3366CC;">&quot;string&quot;</span>, <span style="color: #3366CC;">&quot;request&quot;</span>: <span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">&#125;</span>,
      <span style="color: #3366CC;">&quot;send&quot;</span>: <span style="color: #66cc66;">&#123;</span> <span style="color: #3366CC;">&quot;response&quot;</span>: <span style="color: #3366CC;">&quot;string&quot;</span>, <span style="color: #3366CC;">&quot;request&quot;</span>: <span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;name&quot;</span>: <span style="color: #3366CC;">&quot;message&quot;</span>, <span style="color: #3366CC;">&quot;type&quot;</span>: <span style="color: #3366CC;">&quot;Message&quot;</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">&#125;</span>
  <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
</div>
<p>Here we defined a "Mail" protocol that declares a record of type “Message”, which in turn includes three strings: to, from, and body. Additionally, we defined two available methods (send and replay), which our server and client can use, as well as their input and output parameters. Check out the full Ruby implementations of the <a href="http://github.com/apache/avro/blob/trunk/lang/ruby/test/sample_ipc_server.rb">server</a> and <a href="http://github.com/apache/avro/blob/trunk/lang/ruby/test/sample_ipc_client.rb">client</a> in the repo.</p>
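<p>Because the protocol is plain JSON, it can be inspected with nothing more than Ruby's standard <code>json</code> library. The sketch below does not use the Avro gem at all; <code>signatures</code> and <code>missing_params</code> are hypothetical helpers written for this example, and they only read the "messages" section of the Mail protocol above.</p>

```ruby
require "json"

# The Mail protocol from above, as a JSON string.
protocol = JSON.parse(<<~PROTOCOL)
  {
    "namespace": "example.proto",
    "protocol": "Mail",
    "types": [{"name": "Message", "type": "record", "fields": [
      {"name": "to", "type": "string"},
      {"name": "from", "type": "string"},
      {"name": "body", "type": "string"}]}],
    "messages": {
      "replay": {"response": "string", "request": []},
      "send": {"response": "string", "request": [{"name": "message", "type": "Message"}]}
    }
  }
PROTOCOL

# Hypothetical helper: describe each message as "name(param: Type) -> response".
def signatures(protocol)
  protocol.fetch("messages").map do |name, message|
    params = message.fetch("request").map { |param| "#{param['name']}: #{param['type']}" }
    "#{name}(#{params.join(', ')}) -> #{message['response']}"
  end
end

# Hypothetical helper: declared request parameters that a call did not supply.
def missing_params(protocol, message_name, arguments)
  declared = protocol.fetch("messages").fetch(message_name).fetch("request")
  declared.map { |param| param["name"] } - arguments.keys
end

sigs = signatures(protocol)
missing = missing_params(protocol, "send", {})

puts sigs
puts "send is missing: #{missing.inspect}"
```

<p>A client or server can do this kind of introspection at runtime, which is what makes the lack of a code generation step practical in a dynamic language.</p>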
<h4><strong>Avro, Ruby and Hadoop</strong></h4>
<p>The <a href="http://github.com/apache/avro/tree/trunk/lang/ruby">Ruby implementation of Avro</a> still needs a lot of love and polish (it currently has a distinct Python smell to it), but given the growing adoption of Hadoop and the rising popularity of all the adjacent frameworks, Avro is definitely here to stay and for good reasons. So, before you invent another serialization framework, grab the source, build the gem, and give it a try.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2010/02/16/data-serialization-rpc-with-avro-ruby/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2010/02/16/data-serialization-rpc-with-avro-ruby/</feedburner:origLink></item>
		<item>
		<title>Cluster Monitoring with Ganglia &amp; Ruby</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/7B43H-gIsz8/</link>
		<comments>http://www.igvita.com/2010/01/28/cluster-monitoring-with-ganglia-ruby/#comments</comments>
		<pubDate>Thu, 28 Jan 2010 17:16:56 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[Monitoring]]></category>

		<category><![CDATA[cloud]]></category>

		<category><![CDATA[ganglia]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=973</guid>
		<description><![CDATA[A good monitoring solution can make or break an entire service - a well-implemented one will enable you to forecast and plan ahead, as well as quickly spot and debug problems when they arise. However, anyone who has worked with a cluster of machines will know that this is also a non-trivial problem. There [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/10/ganglia.png">A good monitoring solution can make or break an entire service - a well-implemented one will enable you to forecast and plan ahead, as well as quickly spot and debug problems when they arise. However, anyone who has worked with a cluster of machines will know that this is also a non-trivial problem. There are a number of options, both open-source and commercial, and they span a variety of use cases: alerting and notification (<a href="http://www.nagios.org/">Nagios</a>), intrusion detection (<a href="http://www.snort.org/">SNORT</a>), performance monitoring (<a href="http://ganglia.sourceforge.net/">Ganglia</a>, <a href="http://www.cacti.net/">Cacti</a>, <a href="http://scoutapp.com/">Scout</a>, etc.), or even customized systems such as <a href="http://www.mysql.com/products/enterprise/monitor.html">MySQL Monitor</a> for in-depth database analysis.</p>
<p>Chances are, you will have to run a mix of all of the above to cover all the cases (e.g., <a href="http://nagios.wikimedia.org/">Wikimedia's Nagios</a> & <a href="http://ganglia.wikimedia.org/">Ganglia</a>) since there is no all-in-one solution and each has its tradeoffs. On that note, for performance monitoring, Ganglia is definitely an option to explore (it seems that I tried everything but Ganglia first, and I wish I had tried it much earlier). Originally developed at the University of California, Berkeley, it is an open-source project designed from the ground up to be a distributed monitoring system for high-performance systems such as clusters and grids - let's see what that means.</p>
<p align="center"><a href="http://ganglia.wikimedia.org"><img style="border:1px solid #ccc; padding: 2px;" src="http://www.igvita.com/posts/10/ganglia-preview.png"></a></p>
<h4><strong>Ganglia's Distributed Architecture</strong></h4>
<p>Ganglia is powered by three independent components: <a href="http://sourceforge.net/apps/trac/ganglia/wiki/Ganglia%203.1.x%20Installation%20and%20Configuration#gmond_configuration">gmond</a>, <a href="http://sourceforge.net/apps/trac/ganglia/wiki/Ganglia%203.1.x%20Installation%20and%20Configuration#gmetad_configuration">gmetad</a> and a <a href="http://sourceforge.net/apps/trac/ganglia/wiki/Ganglia%203.1.x%20Installation%20and%20Configuration#php_web_frontend_configuration">PHP frontend</a>. Due to how the system is architected, all three can either run on the same host or, more likely, be distributed across a number of different nodes. Gmond is the workhorse responsible for gathering user-specified stats and sharing them over the network: a gmond daemon runs on every monitored node. It is designed to be fast and portable, with a low memory footprint, and comes with a number of native monitoring modules (disk, memory, network, etc.). Importantly, to optimize for speed, the gmond daemon never actually persists any data: everything is kept in memory only. But that's not all, because it can also receive data from other gmond's, allowing us to build arbitrary hierarchies of nodes - this is how and why Ganglia is capable of scaling to thousands of nodes.</p>
<p align="center"><img src="http://www.igvita.com/posts/10/ganglia-architecture.png"></p>
<p>Unlike other monitoring solutions such as Nagios or Cacti, all of the Ganglia metrics rely on data push - there is absolutely no polling involved. The gmond daemons are all responsible for periodically gathering and distributing their stats upstream. However, the gmond's do not persist data, and that is where the <em>gmetad</em> daemon comes in. This daemon is responsible for collecting data from an arbitrary number of gmond's, or even other gmetad daemons, persisting the metrics into correct <a href="http://en.wikipedia.org/wiki/RRDtool">RRD</a> (round robin database) files, and then making this data available to the PHP frontend (or any other service that consumes RRD's).</p>
<h4><strong>Distributed Monitoring & Custom Metrics</strong></h4>
<p><a href="http://ganglia.wikimedia.org/"><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/10/wikimedia-ganglia.png"></a> <a href="http://ganglia.wikimedia.org/">Wikimedia's Ganglia</a> setup is a good example to dissect: it consists of two "clusters" (Florida and Kennisnet), each most likely running their own gmetad node, which is then monitored by a central gmetad node which stitches them together. Each cluster, in turn, has its own hierarchy of gmond nodes: squids, apaches, databases, and so on. All together, it is monitoring over 350 nodes without breaking a sweat - not bad. Alas, one gotcha to be aware of upfront: run your <a href="http://www.joyent.com/joyeurblog/2008/04/24/dtrace-mysql-ganglia-and-digging-for-solutions/">gmetad's off a ram-disk to avoid the IO bottleneck</a> associated with updating hundreds of RRD's. </p>
<p>With the default configuration the gmond daemon will automatically monitor over 20 core metrics: load, network in and out, disk, memory and so on. However, it is also easily extensible via two mechanisms: custom monitoring modules, or the <em>gmetric</em> command line client. As of version 3.1.0, Ganglia offers a simple <a href="http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_gmond_python_modules">Python API</a> which allows us to create custom metrics modules which will be integrated directly into the gmond process. Hence the gmond process will call the code, gather the metrics and then distribute the data for us. A great working example is <a href="http://g.raphaelli.com/2009/1/5/ganglia-mysql-metrics">Gilad Raphaelli's MySQL extension</a>, which gathers over 50 metrics about your database, and <a href="http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_gmond_python_modules">creating your own</a> is also pretty straightforward.</p>
<p>However, if writing a Python script is too heavyweight, Ganglia also provides the '<a href="http://ganglia.sourceforge.net/gmetric/">gmetric</a>' executable which you can call right from the command line. Give the metric a name, type and a value, and you are off to the races. In other words, you can use bash, Ruby, or anything that will execute from the command line, in combination with gmetric, to funnel data into Ganglia:</p>
<blockquote><p>
# submit "variable_name" which is a string, with value of "hello", remove this var after 600s<br />
gmetric -n variable_name -t string -v "hello" -d 600</p>
<p># submit "int_var", which is an unsigned 8-bit int, with value of 20, remove this var after 60s<br />
gmetric -n int_var -t uint8 -v 20 -d 60
</p></blockquote>
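<p>If you do go the shell-out route from Ruby, a thin wrapper keeps the flag handling in one place. This is only a sketch: the helper names <em>gmetric_args</em> and <em>submit_metric</em> are hypothetical, it uses just the four flags shown above, and it assumes the gmetric binary is on your PATH.</p>

```ruby
# Build the argument list for the gmetric CLI from a metric description.
# Only the flags shown above are used (-n, -t, -v, -d).
def gmetric_args(name:, type:, value:, dmax: nil)
  args = ['-n', name.to_s, '-t', type.to_s, '-v', value.to_s]
  args += ['-d', dmax.to_s] if dmax
  args
end

# Shell out to gmetric; returns true on success, false on a non-zero exit,
# and nil if the binary could not be executed at all.
def submit_metric(**metric)
  system('gmetric', *gmetric_args(**metric))
end

p gmetric_args(name: 'int_var', type: 'uint8', value: 20, dmax: 60)
```

<p>Passing the arguments to <em>system</em> as a list, rather than as one interpolated string, sidesteps shell quoting problems when a metric value contains spaces or quotes.</p>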
<h4><strong>Connecting Ganglia With Ruby</strong></h4>
<p>One of the key advantages of Ganglia over Cacti or similar services is that there is no per-variable data source setup. Once a metric is pushed to a gmetad node, it automatically gets its own RRD file, and appears on your dashboard - no configuration required, just bring up a new service, push the data and you're rocking! Now, if you want to monitor a Ruby process, you have a couple of alternatives: write a Python module, or shell out to gmetric. Neither is optimal: the first requires more (Python) code and extra polling, while spawning a gmetric process for every metric can be prohibitively expensive in terms of performance. Thankfully, Ganglia uses a simple UDP protocol with <a href="http://en.wikipedia.org/wiki/External_Data_Representation">XDR</a> data formatting which means that with a little reverse-engineering, we can talk to our gmond process directly from Ruby (by pretending that it is gmetric generating the packets):</p>
<p><a href="javascript:showme('9462_1');"> <b>> gmetric.rb</b></a>
<div style=" background:white;" id=9462_1>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'gmetric'</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># generate metric from Ruby and send it over UDP</span>
<span style="color:#6666ff; font-weight:bold;">Ganglia::GMetric</span>.<span style="color:#9900CC;">send</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;127.0.0.1&quot;</span>, <span style="color:#006666;">8670</span>, <span style="color:#006600; font-weight:bold;">&#123;</span>
  <span style="color:#ff3333; font-weight:bold;">:name</span> =&gt; <span style="color:#996600;">'pageviews'</span>,
  <span style="color:#ff3333; font-weight:bold;">:units</span> =&gt; <span style="color:#996600;">'req/min'</span>,
  <span style="color:#ff3333; font-weight:bold;">:type</span> =&gt; <span style="color:#996600;">'uint16'</span>,     <span style="color:#008000; font-style:italic;"># unsigned 16-bit int</span>
  <span style="color:#ff3333; font-weight:bold;">:value</span> =&gt; <span style="color:#006666;">7000</span>,       <span style="color:#008000; font-style:italic;"># value of metric</span>
  <span style="color:#ff3333; font-weight:bold;">:tmax</span> =&gt; <span style="color:#006666;">60</span>,          <span style="color:#008000; font-style:italic;"># maximum time in seconds between gmetric calls</span>
  <span style="color:#ff3333; font-weight:bold;">:dmax</span> =&gt; <span style="color:#006666;">300</span>          <span style="color:#008000; font-style:italic;"># lifetime in seconds of this metric</span>
<span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/gmetric/tree/master/.git'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/github.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/gmetric/tree/master/.git'>gmetric (GMetric Ruby Library)</a>
							</h4>
						</div></p>
<p>Install the <strong>gmetric gem</strong>, specify the hostname and port number of your gmond daemon and fire off your metrics via UDP directly into the monitoring system. This way, you can track latency, request rates, throughput, or any other application metric directly from Ruby and push it into Ganglia without any performance penalties.</p>
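<p>For the curious, the building blocks are small. The sketch below shows the two XDR primitives (as defined in RFC 4506) and a fire-and-forget UDP send using Ruby's standard library. Note the assumption: the field order in <em>encode_metric</em> is invented for illustration and is not the actual Ganglia packet layout, and all of the method names here are hypothetical - use the gem for the real thing.</p>

```ruby
require 'socket'

# XDR primitives (RFC 4506): integers are 4 bytes, big-endian; strings are
# a 4-byte length followed by the bytes, zero-padded to a multiple of 4.
def xdr_uint(n)
  [n].pack('N')
end

def xdr_string(s)
  bytes = s.to_s.b
  padding = (4 - bytes.bytesize % 4) % 4
  xdr_uint(bytes.bytesize) + bytes + ("\0" * padding)
end

# Illustrative payload only: this field order is an assumption made for the
# sketch, not the real Ganglia packet layout.
def encode_metric(name, value, tmax, dmax)
  xdr_string(name) + xdr_string(value) + xdr_uint(tmax) + xdr_uint(dmax)
end

# Fire-and-forget UDP send; returns the number of bytes written.
def send_packet(host, port, payload)
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket.close if socket
end

packet = encode_metric('pageviews', 7000, 60, 300)
puts packet.bytesize
```

<p>Since UDP is connectionless, the sender never waits on the monitoring system, which is what keeps this approach cheap compared to spawning a process per metric.</p>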
<p>Finally, once you have pushed a dozen new metrics, you can then also generate your own cumulative reports, which aggregate data from multiple sources (<a href="http://static.g.raphaelli.com/contrib/code/ganglia/mysql_query_report.php">key database metrics</a>, etc.). Ganglia is an incredibly flexible platform, and the 3.1.x release has done a lot to improve the ability to customize and extend it to fit your applications. If you haven't already, it is definitely a tool to look into for your cloud applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2010/01/28/cluster-monitoring-with-ganglia-ruby/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2010/01/28/cluster-monitoring-with-ganglia-ruby/</feedburner:origLink></item>
		<item>
		<title>Distributed Ruby with the MagLev VM</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/FYnogKYoAgo/</link>
		<comments>http://www.igvita.com/2010/01/15/distributed-ruby-with-the-maglev-vm/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 17:02:37 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[maglev]]></category>

		<category><![CDATA[smalltalk]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=947</guid>
		<description><![CDATA[The GemStone team made a splash with MagLev at RailsConf '08 where they attracted a fair dose of attention from the attendees. Based on an existing GemStone/Smalltalk VM, it promised a lot of inherent advantages: 64-bit, JIT, years of VM optimizations, and built-in persistence and distribution layers. Since then the team has been making steady progress, [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" src="http://www.igvita.com/posts/09/maglev-logo.png" style="margin-right: 1em;"/>The GemStone team made a <a href="http://tech.slashdot.org/article.pl?sid=08/05/31/2316215">splash</a> with <a href="http://github.com/MagLev/maglev">MagLev</a> at RailsConf '08 where they attracted a fair dose of attention from the attendees. Based on an existing GemStone/Smalltalk VM, it promised a lot of inherent advantages: 64-bit, JIT, years of VM optimizations, and built-in persistence and distribution layers. Since then the team has been making steady progress, which recently resulted in the announcement of the first <a href="http://groups.google.com/group/maglev-discussion/browse_thread/thread/1102993e9e21492a">public alpha</a>. In fact, the project appears to be on track for 1.0 status later this year, alongside <a href="http://www.igvita.com/2009/11/20/state-of-ruby-vms-ruby-renaissance/">IronRuby, MacRuby, and Rubinius</a>.</p>
<p>However, while the initial focus centered around the potential speed improvements offered by the VM, it is the persistence and distribution aspects of the runtime which make it stand out - if it happens to be faster, so much the better. Based on the Smalltalk VM, it offers integrated persistence (with <a href="http://en.wikipedia.org/wiki/ACID">ACID semantics</a>) and distribution. In other words, you can treat MagLev as a distributed database that is capable of running Ruby code and storing native Ruby bytecode internally. Now that's a mouthful, so let's see what it actually means.</p>
<h4><strong>MagLev VM: Features & Limitations</strong></h4>
<p>The goal of the GemStone team is to write as much of MagLev as possible in Ruby (the standard libraries, the parser, etc.), which has already resulted in some good collaboration and synergies with the Rubinius project. As of the first public alpha release, the project passes over 27,900 RubySpecs, features a pure Ruby parser (slightly modified fork of <a href="http://parsetree.rubyforge.org/">ruby_parser</a>), and runs RubyGems 1.3.5 out of the box. Popular gems such as rack, sinatra, and minitest all run unmodified, and there is even work on FFI support for C and Smalltalk extensions.</p>
<p><img align="left" src="http://www.igvita.com/posts/09/ruby-logo.png" style="margin-right: 1em;"/>The end goal is full RubySpec compatibility, support for Ruby 1.9, and of course, running Rails - a stripped down version was demoed at RailsConf '09, but more work still needs to be done to make it fully compatible. The VM also ships with a <a href="http://github.com/MagLev/maglev/tree/master/examples/mysql/">MySQL driver</a>, which means that you can use MagLev as any other Ruby runtime to power your applications, or you could leverage the built-in persistence API's. MagLev has a distinctly different VM architecture which allows it to persist and share both code and data between multiple runtimes and execution cycles, all through a straightforward Ruby API! Incidentally, this is also the reason for the lack of support for several ObjectSpace methods (garbage_collect, each_object), as the enumeration could potentially mean retrieving gigabytes of persistent objects.</p>
<p>To get started, install <a href="http://groups.google.com/group/maglev-discussion/browse_thread/thread/71e470419cafbf9d">MagLev via RVM</a>, or follow the simple <a href="http://github.com/MagLev/maglev/blob/master/README.rdoc">instructions on the wiki</a>.</p>
<h4><strong>MagLev VM Architecture</strong></h4>
<p align="center"><img src="http://www.igvita.com/posts/10/maglev-architecture.png"/></p>
<p>The first thing you will notice about working with MagLev is that before you can run the interpreter, you will have to launch the MagLev service itself (<em>maglev start</em>). It turns out that, unlike other Ruby VM's, all of the core Ruby classes, and all other persisted code and data, actually live in a separate "stone" process. The VM's ("gems") connect to the stone and retrieve all of their data from this service. Ruby classes are stored as <a href="http://maglevity.wordpress.com/2009/12/03/vms-repository-maglev/">bytecode in the stone server</a>, which is transported via shared memory for local connections, and via optimized binary protocol for remote connections, to the local interpreter and then compiled down to native machine code. This is how object persistence is made possible in MagLev: the stone server is a standalone process that acts as a database for your Ruby bytecode!</p>
<p>The added advantage is that the stone server supports full ACID semantics, which means that multiple processes can interact with the same repository and share state, objects, and code. A simple example of sharing data between multiple runs:</p>
<p><a href="javascript:showme('938_1');"> <b>> maglev-data.rb</b></a>
<div style=" background:white;" id=938_1>
<pre class="ruby"> <span style="color:#008000; font-style:italic;"># persist a string in the stone server</span>
 <span style="color:#6666ff; font-weight:bold;">Maglev::PERSISTENT_ROOT</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:hello</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#996600;">&quot;world&quot;</span>
 Maglev.<span style="color:#9900CC;">commit_transaction</span>
&nbsp;
 $ maglev-ruby -e <span style="color:#996600;">'p Maglev::PERSISTENT_ROOT[:hello]'</span>
 &gt; <span style="color:#996600;">&quot;world&quot;</span>
&nbsp;</pre>
</div>
<p>That covers a simple key-value example, but MagLev is also capable of transparently persisting entire object graphs without any <a href="http://maglevity.wordpress.com/">data-modeling impedance mismatch</a>:</p>
<p><a href="javascript:showme('938_2');"> <b>> maglev-persist.rb</b></a>
<div style=" background:white;" id=938_2>
<pre class="ruby">graph_node =&lt;&lt;-EOS
  <span style="color:#9966CC; font-weight:bold;">class</span> Graph
    <span style="color:#9966CC; font-weight:bold;">def</span> initialize; <span style="color:#0066ff; font-weight:bold;">@nodes</span> = <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#93;</span>; <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">def</span> push<span style="color:#006600; font-weight:bold;">&#40;</span>node<span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#0066ff; font-weight:bold;">@nodes</span>.<span style="color:#9900CC;">push</span> node; <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">class</span> Node; <span style="color:#9966CC; font-weight:bold;">end</span>
EOS
&nbsp;
<span style="color:#008000; font-style:italic;"># commit Graph class's bytecode into stone server</span>
<span style="color:#008000; font-style:italic;"># - can also load external file: load 'class.rb'</span>
Maglev.<span style="color:#9900CC;">persistent</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#CC0066; font-weight:bold;">eval</span> graph_node <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># build a simple in memory graph</span>
g = Graph.<span style="color:#9900CC;">new</span>
g.<span style="color:#9900CC;">push</span> Node.<span style="color:#9900CC;">new</span>
g.<span style="color:#9900CC;">push</span> Node.<span style="color:#9900CC;">new</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># commit in-memory graph to stone server</span>
<span style="color:#6666ff; font-weight:bold;">Maglev::PERSISTENT_ROOT</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:data</span><span style="color:#006600; font-weight:bold;">&#93;</span> = g
Maglev.<span style="color:#9900CC;">commit_transaction</span>
&nbsp;
<span style="color:#008000; font-style:italic;">############################</span>
<span style="color:#008000; font-style:italic;"># in different process / VM:</span>
graph = <span style="color:#6666ff; font-weight:bold;">Maglev::PERSISTENT_ROOT</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:data</span><span style="color:#006600; font-weight:bold;">&#93;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> graph.<span style="color:#9900CC;">inspect</span>
<span style="color:#008000; font-style:italic;"># &gt; #&lt;Graph:0xa205f01 @nodes=[#&lt;Node:0xa202d01&gt;, #&lt;Node:0xa202c01&gt;]&gt;</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/MagLev/maglev/tree/master/.git'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/github.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/MagLev/maglev/tree/master/.git'>maglev (GemStone Maglev Ruby Repository)</a>
							</h4>
						</div></p>
<p>Instead of using an ORM to map Ruby classes to rows or documents in a database, you can simply store the objects directly in the stone server and interact with them through multiple processes, all without any extra conversions or additional infrastructure. The only caveat is that you would have to build your own indexing structures to power search and lookups beyond the key-value semantics. The <a href="http://maglevity.wordpress.com/2009/12/17/kd-trees-and-maglev/">KD-Tree example</a> is a great showcase of the power and flexibility this can enable.</p>
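<p>To give a feel for what "build your own indexing structures" means in practice, here is a rough sketch of a hand-maintained secondary index. A plain Hash stands in for the persistent root so that it runs on any Ruby; the <em>User</em> struct, the <em>:users_by_city</em> key and the helper methods are all hypothetical names, and under MagLev you would commit the same structures to the stone instead.</p>

```ruby
# Plain Hash standing in for the persistent root so the sketch runs on any
# Ruby; under MagLev the same structures would be committed to the stone.
ROOT = {}

User = Struct.new(:name, :city)

# Store the object and maintain a secondary index (city => users) by hand.
def add_user(root, user)
  (root[:users] ||= []).push(user)
  index = (root[:users_by_city] ||= {})
  (index[user.city] ||= []).push(user)
  user
end

# Look up by the indexed attribute without scanning every stored object.
def users_in(root, city)
  root.fetch(:users_by_city, {}).fetch(city, [])
end

add_user(ROOT, User.new('alice', 'Waterloo'))
add_user(ROOT, User.new('bob', 'Toronto'))
add_user(ROOT, User.new('carol', 'Waterloo'))

p users_in(ROOT, 'Waterloo').map(&:name)
```

<p>The index holds references to the very same objects as the main collection, so there is no copy to keep in sync beyond remembering to update both on every write.</p>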
<h4><strong>MagLev at Scale & In-Production</strong></h4>
<p><a href="http://boldr.net/blockcampparis/"><img align="left" src="http://www.igvita.com/posts/10/smalltalk-ruby.png" style="margin-right: 1em;"/></a>While the "stone" server persists all the core Ruby classes and any additional data, the VM's ("gems") are not free. According to the <a href="http://maglev.gemstone.com/docs/GS64-SysAdminGuide-2.4.pdf">documentation</a>, each VM takes ~30 MB of memory at boot time and starts growing from there. On the other hand, the shared memory communication is extremely efficient, which means that hundreds of VM's can be run in parallel on a single box. GemStone claims production deployments of their Smalltalk VM on 64-128 core machines with up to 512GB RAM, running hundreds of concurrent VM's, and achieving over 10K transactions per second (TPS) on their "stone" servers - impressive numbers!</p>
<p>With the new Smalltalk VM (3.0) on the horizon and years of production optimization and research, MagLev is definitely a project to watch. The GemStone team has recently <a href="http://maglevity.wordpress.com/">started a blog</a>, opened up a <a href="http://groups.google.com/group/maglev-discussion">Google group</a>, and is starting to produce some <a href="http://maglevity.wordpress.com/2010/01/12/gemstone-internals-videos/">great content</a> to help Rubyists leverage their platform. What is missing now are the deployments, case studies, and new frameworks that can leverage all of these features - though, I'm sure, that will come.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2010/01/15/distributed-ruby-with-the-maglev-vm/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2010/01/15/distributed-ruby-with-the-maglev-vm/</feedburner:origLink></item>
		<item>
		<title>Flow Analysis &amp; Time-based Bloom Filters</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/gZpJBys6UNU/</link>
		<comments>http://www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 17:45:21 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[bigdata]]></category>

		<category><![CDATA[bloomfilter]]></category>

		<category><![CDATA[flow]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=925</guid>
		<description><![CDATA[Working with large streams of data is becoming increasingly widespread, be it for log, user behavior, or raw firehose analysis of user generated content. There is some very interesting academic literature on this type of data crunching, although much of it is focused on query or network packet analysis and is often not directly applicable [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" src="http://www.igvita.com/posts/10/data-graph.png" style="margin-right: 1em;"/>Working with large streams of data is becoming increasingly widespread, be it for log, user behavior, or raw firehose analysis of user generated content. There is some very interesting academic literature on this type of data crunching, although much of it is focused on query or network packet analysis and is often not directly applicable to the type of data we have to deal with in the social web. For example, if you were tasked to build (a better) "<a href="http://search.twitter.com/">Trending Topics</a>" algorithm for Twitter, how would you do it?</p>
<p>Of course, the challenge is that it has to be practical - it needs to be "real-time" and be able to react to emerging trends in under a minute, all the while using a reasonable amount of CPU and memory. Now, we don't know how the actual system is implemented at Twitter, nor will we look at any specific solutions - I have some ideas, but I am more curious to hear how you would approach it. Instead, I want to revisit the concept of Bloom Filters, because as I am making my way through the literature, it is surprising how sparsely they are employed for these types of tasks. Specifically, a concept I have been thinking of prototyping for some time now: <strong>time-based, counting bloom filters</strong>!</p>
<h4><strong>Bloom Filters: What & Why</strong></h4>
<p>A <a href="http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/">Bloom Filter is a probabilistic data structure</a> which can tell if an element is a member of a set. What makes it interesting is that it accomplishes this task with an incredibly efficient use of memory: instead of storing a full hash map, it is simply a bit vector. The tradeoff is that it may return a small fraction of false positives (the filter will report that a key is in the bloom filter when it is really not), but it will never report a false negative. File systems and web caches frequently use bloom filters as the first query to avoid otherwise costly database or file system lookups. There is some math involved in determining the right parameters for your bloom filter, which you can read about in an <a href="http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/">earlier post</a>.</p>
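<p>As a refresher, the core of the data structure fits in a couple dozen lines. This is a minimal sketch, not the bloomfilter gem's implementation: the class name is made up, the hash positions are derived from salted MD5 digests purely for convenience, and an array of booleans stands in for a packed bit vector.</p>

```ruby
require 'digest/md5'

# Minimal Bloom filter: `size` slots and `hashes` hash functions, each
# derived from a salted MD5 digest of the key.
class SimpleBloomFilter
  def initialize(size, hashes)
    @size = size
    @hashes = hashes
    # Array of booleans for readability; a real filter would pack bits.
    @bits = Array.new(size, false)
  end

  def insert(key)
    positions(key).each { |i| @bits[i] = true }
    self
  end

  # true means "possibly present"; false means "definitely absent".
  def include?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  # One slot position per salted digest of the key.
  def positions(key)
    (0...@hashes).map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{key}").to_i(16) % @size
    end
  end
end

filter = SimpleBloomFilter.new(1024, 4)
filter.insert('ruby').insert('ganglia')
puts filter.include?('ruby')     # inserted keys are always reported
puts filter.include?('maglev')   # usually false; a false positive is possible
```

<p>Notice that nothing is ever removed: every insert can only flip more slots on, which is exactly why a plain filter saturates on an unbounded stream.</p>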
<p align="center"><img src="http://www.igvita.com/posts/10/tbbf.png" style="padding: 3px;"/></p>
<p>Of course, as is, the Bloom Filter data structure is not very useful for analyzing continuous data streams - eventually we would fill up the filter and it would begin reporting false positives all the time. But what if your bloom filter only remembered seen data for a fixed interval of time? Imagine adding a time-to-live (TTL) timestamp on each record. All of a sudden, if you know the approximate number of messages for the interval of time you want to analyze, a bloom filter is once again an incredibly fast and space-efficient (fixed memory footprint) data structure!</p>
<h4><strong>Time-based Bloom Filters</strong></h4>
<p>Arguably the key feature of bloom filters is their compact representation as a bit vector. By associating a timestamp with each record, the size of the filter immediately expands by an order of magnitude, but even with that, depending on the size of the time window you are analyzing, you could store the TTLs in just a few additional bits. Conversely, if counting bits is not mission critical, you could even use a backend such as <a href="http://code.google.com/p/redis/">Redis</a> or <a href="http://memcached.org/">Memcached</a> to drive the filter as well. The direct benefit of such an approach is that the data can be shared by many distributed processes. On that note, I have <a href="http://github.com/igrigorik/bloomfilter/commit/cf3c0661213e8b9432057e54622504976431cde7">added a prototype Redis backend</a> to the bloomfilter gem which implements a time-based, counting Bloom Filter. Let's take a look at a simple example:</p>
<p><a href="javascript:showme('9576_1');"> <b>> chrono-bloom.rb</b></a>
<div style=" background:white;" id=9576_1>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'bloomfilter'</span>
&nbsp;
options = <span style="color:#006600; font-weight:bold;">&#123;</span>
  <span style="color:#ff3333; font-weight:bold;">:size</span>	=&gt; <span style="color:#006666;">100</span>,       <span style="color:#008000; font-style:italic;"># size of bit vector</span>
  <span style="color:#ff3333; font-weight:bold;">:hashes</span> =&gt; <span style="color:#006666;">4</span>,       <span style="color:#008000; font-style:italic;"># number of hash functions</span>
  <span style="color:#ff3333; font-weight:bold;">:seed</span>	=&gt; <span style="color:#CC0066; font-weight:bold;">rand</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">100</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#008000; font-style:italic;"># seed value for the filter</span>
  <span style="color:#ff3333; font-weight:bold;">:bucket</span> =&gt; <span style="color:#006666;">3</span>        <span style="color:#008000; font-style:italic;"># number of bits for the counting filter</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># Regular, in-memory counting bloom filter	</span>
bf = BloomFilter.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>options<span style="color:#006600; font-weight:bold;">&#41;</span>
bf.<span style="color:#9900CC;">insert</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
bf.<span style="color:#9966CC; font-weight:bold;">include</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>  <span style="color:#008000; font-style:italic;"># =&gt; true</span>
bf.<span style="color:#9966CC; font-weight:bold;">include</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey1&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#008000; font-style:italic;"># =&gt; false</span>
&nbsp;
<span style="color:#008000; font-style:italic;">#</span>
<span style="color:#008000; font-style:italic;"># Redis-backed bloom filter, with optional time-based semantics</span>
<span style="color:#008000; font-style:italic;">#</span>
bf = BloomFilter.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>options.<span style="color:#9900CC;">merge</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#123;</span>:type =&gt; <span style="color:#ff3333; font-weight:bold;">:redis</span>, <span style="color:#ff3333; font-weight:bold;">:ttl</span> =&gt; <span style="color:#006666;">2</span>, <span style="color:#ff3333; font-weight:bold;">:server</span> =&gt; <span style="color:#006600; font-weight:bold;">&#123;</span>:host =&gt; <span style="color:#996600;">'localhost'</span><span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
bf.<span style="color:#9900CC;">insert</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
bf.<span style="color:#9966CC; font-weight:bold;">include</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>  <span style="color:#008000; font-style:italic;"># =&gt; true</span>
<span style="color:#CC0066; font-weight:bold;">sleep</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">3</span><span style="color:#006600; font-weight:bold;">&#41;</span>
bf.<span style="color:#9966CC; font-weight:bold;">include</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;mykey&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>  <span style="color:#008000; font-style:italic;"># =&gt; false</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># custom 5s TTL for a key</span>
bf.<span style="color:#9900CC;">insert</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;newkey&quot;</span>, <span style="color:#0000FF; font-weight:bold;">nil</span>, <span style="color:#006666;">5</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/bloomfilter/tree/master/.git'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/github.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/bloomfilter/tree/master/.git'>bloomfilter.git (Ruby+Redis counting Bloom Filter)</a>
							</h4>
						</div></p>
<p>Storing data in Redis or Memcached is roughly an order of magnitude less efficient, but it gives us an easy to use, distributed, and fixed memory filter for analyzing continuous data streams. In other words, a useful tool for applications such as duplicate detection, trends analysis, and many others.</p>
<h4><strong>Mechanics of Time-Based Bloom Filters</strong></h4>
<p><img align="left" src="http://www.igvita.com/posts/10/algorithm-small.png" style="margin-right: 1em;"/>So how does it work? Given the settings above, we create a fixed memory vector of 100 buckets (or bits in a raw C implementation). Then, for each key, we hash it 4 times with different key offsets and increment the counts in those buckets - a non-zero value indicates that one of the hash functions for some key has used that bucket. Then, for a lookup, we reverse the operation: generate the 4 different hash keys and look them up. If all of them are non-zero, then either we have seen this key or there has been a collision (false positive). By optimizing the size of the bit vector we can control the false positive rate - you're always trading the amount of allocated memory vs. collision rate. Finally, we also make use of the <a href="http://code.google.com/p/redis/wiki/ExpireCommand">native expire functionality</a> in Redis to guarantee that keys are only stored for a bounded amount of time.</p>
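<p>The mechanics above can be sketched in plain Ruby without Redis. This is only an illustration under stated assumptions: the <em>TimedCountingFilter</em> class is hypothetical, it uses salted CRC32 as a stand-in for the hash functions, and it keeps a list of expiry times per bucket for clarity, which gives up the compact fixed-size representation that a real implementation would aim for:</p>

```ruby
require 'zlib'

# Minimal in-memory sketch of a time-based counting Bloom filter.
# Hypothetical class; the gem's Redis backend works differently internally.
class TimedCountingFilter
  def initialize(size, hashes, ttl)
    @size, @hashes, @ttl = size, hashes, ttl
    # each bucket holds the expiry times of the inserts that touched it
    @buckets = Array.new(size) { [] }
  end

  def insert(key, now = Time.now.to_f)
    indexes(key).each { |i| @buckets[i].push(now + @ttl) }
  end

  def include?(key, now = Time.now.to_f)
    indexes(key).all? { |i| live_count(i, now) > 0 }
  end

  private

  # k bucket positions, derived by salting the key with the hash number
  def indexes(key)
    (0...@hashes).map { |n| Zlib.crc32("#{n}:#{key}") % @size }
  end

  # drop expired entries, then report how many live inserts remain
  def live_count(i, now)
    @buckets[i].reject! { |expiry| expiry <= now }
    @buckets[i].size
  end
end

bf = TimedCountingFilter.new(100, 4, 2)
bf.insert("mykey", 0.0)
puts bf.include?("mykey", 1.0)   # still inside the 2s window
puts bf.include?("mykey", 3.0)   # the window has passed
```

<p>Passing the clock in as an argument is a choice made for the sketch so that the expiry behavior can be exercised without sleeping.</p>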
<p>Time-based bloom filters have seen a few rogue mentions in the academic literature, but to the best of my knowledge, have not seen wide applications in the real world. However, it is an incredibly powerful data structure, and one that could benefit many modern, big-data applications. Gem install the bloomfilter gem and give it a try, perhaps it will help you build a better trends analysis tool. Speaking of which, what other tools, algorithms, or data structures would you use to build a "Trending Topics" algorithm for a high-velocity stream?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/</feedburner:origLink></item>
		<item>
		<title>Ruby &amp; WebSockets: TCP for the Browser</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/NOxdg9Y0Ejg/</link>
		<comments>http://www.igvita.com/2009/12/22/ruby-websockets-tcp-for-the-browser/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 16:39:09 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[html5]]></category>

		<category><![CDATA[realtime]]></category>

		<category><![CDATA[websocket]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=896</guid>
		<description><![CDATA[WebSockets are one of the most underappreciated innovations in HTML5. Unlike local storage, canvas, web workers, or even video playback, the benefits of the WebSocket API are not immediately apparent to the end user. In fact, over the course of the past decade we have invented a dozen technologies to solve the problem of asynchronous [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/html5.png"/>WebSockets are one of the most underappreciated innovations in HTML5. Unlike local storage, canvas, web workers, or even video playback, the benefits of the <a href="http://dev.w3.org/html5/websockets/">WebSocket API</a> are not immediately apparent to the end user. In fact, over the course of the past decade we have invented a dozen technologies to solve the problem of asynchronous and bi-directional communication between the browser and the server: AJAX, <a href="http://www.igvita.com/2009/10/21/nginx-comet-low-latency-server-push/">Comet & HTTP Streaming</a>, BOSH, <a href="http://www.igvita.com/2009/08/18/smart-clients-reversehttp-websockets/">ReverseHTTP</a>, <a href="http://www.igvita.com/2009/06/29/http-pubsub-webhooks-pubsubhubbub/">WebHooks & PubSubHubbub</a>, and Flash sockets amongst many others. Having said that, it does not take much experience with any of the above to realize that each has a weak spot and none solve the fundamental problem: web-browsers of yesterday were not designed for bi-directional communication.</p>
<p>WebSockets in HTML5 change all of that as they were designed from the ground up to be data agnostic (binary or text) with support for full-duplex communication. <strong>WebSockets are TCP for the web-browser.</strong> Unlike BOSH or equivalents, they require only a single connection, which translates into much better resource utilization for both the server and the client. Likewise, WebSockets are proxy and firewall aware, can operate over SSL and leverage the HTTP channel to accomplish all of the above - your existing load balancers, proxies and routers will work just fine.</p>
<h4><strong>WebSockets in the Browser: Chrome, Firefox & Safari</strong></h4>
<p><img align="left" src="http://www.igvita.com/posts/09/websocket-browsers.png" style="margin-right: 1em;"/>The WebSocket API is still a draft, but the developers of our favorite browsers have already implemented much of the functionality. Chrome’s <a href="http://blog.chromium.org/2009/12/web-sockets-now-available-in-google.html">developer build (4.0.249.0)</a> now officially supports the API and has it enabled by default. WebKit nightly builds also support WebSockets, and Firefox has an <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=472529">outstanding patch</a> under review. In other words, while mainstream adoption is still on the horizon, as developers we can start thinking about the much improved architectures that WebSockets enable. A minimal example with the help of jQuery: </p>
<p><a href="javascript:showme('6755_1');"> <b>> websocket.html</b></a>
<div style=" background:white;" id=6755_1>
<pre class="javascript">&lt;html&gt;
  &lt;head&gt;
    &lt;script src=<span style="color: #3366CC;">'http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js'</span>&gt;&lt;/script&gt;
    &lt;script&gt;
      $<span style="color: #66cc66;">&#40;</span>document<span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">ready</span><span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#123;</span>
        <span style="color: #003366; font-weight: bold;">function</span> debug<span style="color: #66cc66;">&#40;</span>str<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#123;</span> $<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;#debug&quot;</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">append</span><span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;&lt;p&gt;&quot;</span>+str+<span style="color: #3366CC;">&quot;&lt;/p&gt;&quot;</span><span style="color: #66cc66;">&#41;</span>; <span style="color: #66cc66;">&#125;</span>;
&nbsp;
        ws = <span style="color: #003366; font-weight: bold;">new</span> WebSocket<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;ws://yourservice.com/websocket&quot;</span><span style="color: #66cc66;">&#41;</span>;
        ws.<span style="color: #006600;">onmessage</span> = <span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span>evt<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span> $<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;#msg&quot;</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">append</span><span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;&lt;p&gt;&quot;</span>+evt.<span style="color: #006600;">data</span>+<span style="color: #3366CC;">&quot;&lt;/p&gt;&quot;</span><span style="color: #66cc66;">&#41;</span>; <span style="color: #66cc66;">&#125;</span>;
        ws.<span style="color: #006600;">onclose</span> = <span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span> debug<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;socket closed&quot;</span><span style="color: #66cc66;">&#41;</span>; <span style="color: #66cc66;">&#125;</span>;
        ws.<span style="color: #006600;">onopen</span> = <span style="color: #003366; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
          debug<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;connected...&quot;</span><span style="color: #66cc66;">&#41;</span>;
          ws.<span style="color: #006600;">send</span><span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;hello server&quot;</span><span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>;
      <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;
    &lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;div id=<span style="color: #3366CC;">&quot;debug&quot;</span>&gt;&lt;/div&gt;
    &lt;div id=<span style="color: #3366CC;">&quot;msg&quot;</span>&gt;&lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
&nbsp;</pre>
</div>
<p>The above example showcases the bi-directional nature of WebSockets: <em>send</em> pushes data to the server, and the <em>onmessage</em> callback is invoked anytime the server pushes data to the client. No need for long-polling, HTTP header overhead, or juggling multiple connections. In fact, you could even deploy the WebSocket API today without waiting for browser adoption by using a Flash socket as an intermediate step: <a href="http://github.com/gimite/web-socket-js">web-socket-js</a>.</p>
<h4><strong>Streaming Data to WebSocket Clients</strong></h4>
<p>WebSockets are not the same as raw TCP sockets, and for a good reason. While it may seem tempting to be able to open raw TCP connections from within the browser, the security of the browser would be immediately compromised: any website could then access the network on behalf of the user, within the same security context as the user. For example, a website could open a connection to a remote SMTP server and start delivering spam - a scary thought. Instead, WebSockets extend the HTTP protocol by defining a special handshake in order for the browser to establish a connection. In other words, it is an opt-in protocol which requires a standalone server.</p>
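<p>To give a feel for what such an opt-in handshake involves, here is a minimal sketch of the server side. Note the assumption: it follows the handshake as later standardized in RFC 6455, which differs in detail from the draft available at the time of writing. The helper names <em>websocket_accept</em> and <em>handshake_response</em> are hypothetical:</p>

```ruby
require 'digest/sha1'

# GUID fixed by RFC 6455 (the later, standardized WebSocket handshake)
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key:
# SHA-1 of key + GUID, then Base64 ("m0" packs without line breaks)
def websocket_accept(key)
  [Digest::SHA1.digest(key + WS_GUID)].pack("m0")
end

# Build the server's half of the opening handshake
def handshake_response(key)
  [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Accept: #{websocket_accept(key)}",
    "", ""
  ].join("\r\n")
end

puts handshake_response("dGhlIHNhbXBsZSBub25jZQ==")
```

<p>A server that does not understand the upgrade request will never produce the expected accept value, so a browser cannot be tricked into speaking the protocol to an arbitrary TCP service.</p>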
<p align="center"><img src="http://www.igvita.com/posts/09/websocket-chat.png" style="border: 1px solid rgb(204, 204, 204);"/></p>
<p>Nothing stops you from talking to an SMTP, AMQP, or any other server via the raw protocol, but you will have to introduce a WebSocket server in between to mediate the connection. <a href="http://www.kaazing.org/confluence/display/KAAZING/What+is+Kaazing+Open+Gateway">Kaazing Gateway</a> already provides adapters for STOMP and Apache ActiveMQ, and you could also implement your own JavaScript wrappers for others. And if a Java based WebSocket server is not for you, Ruby EventMachine also allows us to build a very simple event-driven WebSocket server in just a few lines of code: </p>
<p><a href="javascript:showme('6755_2');"> <b>> websocket.rb</b></a>
<div style=" background:white;" id=6755_2>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'em-websocket'</span>
&nbsp;
<span style="color:#6666ff; font-weight:bold;">EventMachine::WebSocket</span>.<span style="color:#9900CC;">start</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:host</span> =&gt; <span style="color:#996600;">&quot;0.0.0.0&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:port</span> =&gt; <span style="color:#006666;">8080</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> |ws|
  ws.<span style="color:#9900CC;">onopen</span>    <span style="color:#006600; font-weight:bold;">&#123;</span> ws.<span style="color:#9900CC;">send</span> <span style="color:#996600;">&quot;Hello Client!&quot;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
  ws.<span style="color:#9900CC;">onmessage</span> <span style="color:#006600; font-weight:bold;">&#123;</span> |msg| ws.<span style="color:#9900CC;">send</span> <span style="color:#996600;">&quot;Pong: #{msg}&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
  ws.<span style="color:#9900CC;">onclose</span>   <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;WebSocket closed&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-websocket/tree/master/.git'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/github.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-websocket/tree/master/.git'>em-websocket (Ruby EventMachine WebSocket Server)</a>
							</h4>
						</div></p>
<h4><strong>Consuming WebSocket Services</strong></h4>
<p>Support for WebSockets in Chrome and Safari also means that our mobile devices will soon support bi-directional push, which is both easier on the battery, and much more efficient for bandwidth consumption. However, WebSockets can also be utilized outside of the browser (ex: real-time data firehose), which means that a regular Ruby HTTP client should be able to handle WebSockets as well: </p>
<p><a href="javascript:showme('6755_3');"> <b>> em-http-websocket.rb</b></a>
<div style=" background:white;" id=6755_3>
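<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'eventmachine'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'em-http'</span>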
&nbsp;
EventMachine.<span style="color:#9900CC;">run</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
  http = <span style="color:#6666ff; font-weight:bold;">EventMachine::HttpRequest</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;ws://yourservice.com/websocket&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">get</span> <span style="color:#ff3333; font-weight:bold;">:timeout</span> =&gt; <span style="color:#006666;">0</span>
&nbsp;
  http.<span style="color:#9900CC;">errback</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;oops&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
  http.<span style="color:#9900CC;">callback</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;WebSocket connected!&quot;</span>
    http.<span style="color:#9900CC;">send</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;Hello client&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
  http.<span style="color:#9900CC;">stream</span> <span style="color:#006600; font-weight:bold;">&#123;</span> |msg|
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Received: #{msg}&quot;</span>
    http.<span style="color:#9900CC;">send</span> <span style="color:#996600;">&quot;Pong: #{msg}&quot;</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-http-request/tree/master/.git'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/github.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.github.com/igrigorik/em-http-request/tree/master/.git'>em-http-request (Asynchronous HTTP Client)</a>
							</h4>
						</div></p>
<p>WebSocket support is still an experimental branch within em-http-request, but the aim is to provide a consistent and fully transparent API: simply specify a WebSocket resource and it will do the rest, just as if you were using a streaming HTTP connection! Best of all, HTTP & OAuth authentication, proxies and existing load balancers will all work and play nicely with this new delivery model.</p>
<h4><strong>WebHooks, PubSubHubbub, WebSockets, ...</strong></h4>
<p>Of course, WebSockets are not a panacea for every problem. <a href="http://www.igvita.com/2009/06/29/http-pubsub-webhooks-pubsubhubbub/">WebHooks and PubSubHubbub</a> are great protocols for intermittent push updates where a long-lived TCP connection may prove to be inefficient. Likewise, if you require non-trivial routing then <a href="http://www.igvita.com/2009/10/08/advanced-messaging-routing-with-amqp/">AMQP is a powerful tool</a>, and there is little reason to reinvent the powerful <a href="http://www.igvita.com/2009/11/10/consuming-xmpp-pubsub-in-ruby/">presence model built into XMPP</a>. Right tool for the right job, but WebSockets are without a doubt a much-needed addition to every developer's toolkit.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2009/12/22/ruby-websockets-tcp-for-the-browser/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2009/12/22/ruby-websockets-tcp-for-the-browser/</feedburner:origLink></item>
		<item>
		<title>Future of RDBMS is RAM Clouds &amp; SSD</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/Vfj7WBNM80E/</link>
		<comments>http://www.igvita.com/2009/12/07/future-of-rdbms-is-ram-clouds-ssd/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 16:51:58 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[Databases]]></category>

		<category><![CDATA[database]]></category>

		<category><![CDATA[ramcloud]]></category>

		<category><![CDATA[ssd]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=854</guid>
		<description><![CDATA[Rumors of the demise of relational database systems are greatly exaggerated. The NoSQL movement is increasingly capturing the mindshare of the developers, all the while the academia have been talking about the move away from "RDBMS as one size fits all" for several years. However, while the new storage engines are exciting to see, it [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.igvita.com/posts/09/ramcloud.png" align="left" style="margin-right:1em;" />Rumors of the demise of relational database systems are greatly exaggerated. The NoSQL movement is increasingly capturing the mindshare of developers, all the while academia has been talking about the move away from "RDBMS as one size fits all" for several years. However, while the new storage engines are exciting to see, it is also important to recognize that relational databases still have a bright future ahead - <em>RDBMS systems are headed into main memory, which changes the playing field altogether</em>.</p>
<p>Performance is only one aspect that influences the choice of a database. Tree and graph structures are not easy to model within a relational structure, which in turn leads to complicated schemas and system overhead. For that reason alone, document-stores (<a href="http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/">Tokyo</a>, <a href="http://couchdb.apache.org/">CouchDB</a>, <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB</a>), graph stores (<a href="http://neo4j.org/">Neo4J</a>), and other alternative data structure databases (<a href="http://code.google.com/p/redis/">Redis</a>) are finding fertile ground for adoption. However, the end of "RDBMS as one size fits all" does not mean the end of relational systems altogether. It is too early to bury RDBMS in favor of No (or Less) SQL.  We just need to reset how we think about the RDBMS.</p>
<h4><strong>Disks are the New Tape</strong></h4>
<p>The evolution of disks has been extremely uneven over the last 25 years: disk capacity has increased 1000x and data transfer speeds have increased 50x, while seek and rotational delays have only improved by a factor of 2. Hence, if we only needed to transfer several hundred kilobytes of data in the mid-80s to achieve good disk utilization, then today we need to read at least 10MB of data to amortize the costs of seeking the data - refresh your memory on <a href="http://www.igvita.com/2009/06/23/measuring-optimizing-io-performance/">seek, rotational, and transfer times of our rusty hard drives</a>.</p>
<p><img src="http://www.igvita.com/posts/09/disk-architecture.png" align="left" style="margin-right:1em;" />When the best we can hope for is 100-200 IOPS out of a modern hard drive, the trend towards significantly larger block sizes begins to make a lot more sense. Whereas your local filesystem is likely to use 4 or 8KB blocks, systems such as Google's GFS and Hadoop's HDFS are opting for 64MB+ blocks in order to amortize the cost of seeking for the data - by using much larger blocks, the cost of seeks and access time is once again brought down to single digit percent figures over the transfer time. </p>
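<p>A quick back-of-the-envelope calculation shows why block size matters so much. The figures below (10 ms per seek, 100 MB/s sequential transfer) are assumed round numbers for illustration, not measurements, and <em>seek_overhead</em> is a hypothetical helper:</p>

```ruby
# Fraction of total access time spent seeking rather than transferring.
# Assumed figures: 10 ms per seek and 100 MB/s sequential transfer.
SEEK_SECONDS = 0.010
TRANSFER_BYTES_PER_SECOND = 100 * 1024 * 1024

def seek_overhead(block_bytes)
  transfer = block_bytes.to_f / TRANSFER_BYTES_PER_SECOND
  SEEK_SECONDS / (SEEK_SECONDS + transfer)
end

blocks = {
  "4KB"  => 4 * 1024,
  "10MB" => 10 * 1024 * 1024,
  "64MB" => 64 * 1024 * 1024
}

blocks.each do |label, bytes|
  percent = (seek_overhead(bytes) * 100).round(1)
  puts "#{label}: #{percent}% of the access time is spent seeking"
end
```

<p>Under these assumptions a 4KB read is almost entirely seek time, while the seek cost only drops into single digit percentages once a block reaches roughly 10MB or more.</p>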
<p>Hence, as we generate and store more and more data, the role of the disks must inevitably become more archival. Batch processing systems such as Map-Reduce are well suited for this world and are quickly replacing the old business intelligence (BI) systems for exactly these reasons. In the meantime, the limitations imposed by the random access to disk mean that we need to reconsider the role of disk in our database systems.</p>
<h4><strong>OLTP is Headed Into Main Memory & Flash</strong></h4>
<p>An average random seek will take 5-10ms when hitting the physical disk and hundreds of microseconds for accessing data from cache. Compare that to a fixed cost of 5-10 microseconds for accessing data in RAM and the benefits of a 100-1000x speed difference can be transformative. <strong>Instead of treating memory as a cache, why not treat it as a primary data store?</strong> John Ousterhout and his co-authors outline a compelling argument for "<a href="http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf">RAMCloud</a>". After all, if Facebook keeps over 80% of their data in memcached, and Google stores entire indexes of the web in memory many times over, then your average database-backed application should easily fit and be able to take advantage of the pure memory model also.</p>
<p>The moment all of the data is available in memory, it is an entirely new game: access time and seek times become irrelevant (no disk seeks), the value of optimizing for locality and access patterns is diminished by orders of magnitude, and in fact, entirely new and much richer query models can enable a new class of data-intensive applications. In a world where the developer's time is orders of magnitude more expensive than the hardware (a recent phenomenon), this also means faster iterations and less data-optimization overhead.</p>
<p><img src="http://www.igvita.com/posts/09/ssd.png" align="left" style="margin-right:1em;" />The downside to the RAMCloud is the equivalent order of magnitude increase in costs - RAM prices are dropping, but dollar for dollar, RAMCloud systems are still significantly more expensive. Flash storage is an obvious compromise for both speed and price. Theoretical access time for solid-state devices is on the order of 50 microseconds for reads, and 200 microseconds for writes. However, in reality, wrapping solid-state storage in SATA-like hardware devices brings us back to ~200 microseconds for reads, or ~5000 IOPS. Though, of course, innovation continues and devices such as <a href="http://www.fusionio.com/products/ioxtreme/">FusionIO’s PCI-E flash storage</a> controller bring us back to 80 microsecond reads at a cost of ~$11 per Gigabyte.</p>
<p>However, even the significantly higher hardware price point is often quickly offset once you factor in the saved developer time and adjacent benefits such as guaranteed performance independent of access patterns or data locality. Database servers with 32GB and 64GB of RAM are no longer unusual, and when combined with SSDs, such as the <a href="http://assets.en.oreilly.com/1/event/21/The%20SmugMug%20Tale%20Presentation.pdf">system deployed at SmugMug</a>, often offer a much easier upgrade path than switching your underlying database system to a NoSQL alternative.</p>
<h4><strong>Database Architecture for the RAMCloud</strong></h4>
<p>Migrating your data into RAM or Flash yields significant improvements via pure speedup in hardware. However, the "<a href="http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf">it is time for a complete rewrite</a>" argument still holds: the majority of existing database systems are built with implicit assumptions for disk-backed storage. These architectures optimize for disk-based indexing structures, and have to rely on multithreading and locking-based concurrency to hide latency of the underlying storage. </p>
<p><img src="http://www.igvita.com/posts/09/rethink-drizzle.png" align="left" style="margin-right:1em;" />When access time is measured in microseconds, optimistic and lock-free concurrency is fair game, which leads to much better multi-core performance and allows us to drop thousands of lines of code for multi-threaded data structures (concurrent B-Trees, etc). <a href="http://www.rethinkdb.com/">RethinkDB</a> is a drop-in MySQL engine designed for SSD drives leveraging exactly these trends, and <a href="https://launchpad.net/drizzle">Drizzle</a> is a larger fork of the entire MySQL codebase aimed at optimizing the relational model for "cloud and net applications": massively distributed, lightweight kernel and extensible.</p>
<h4><strong>Migrating Into Main Memory</strong></h4>
<p>Best of all, you can start leveraging the benefits of storing your data in main memory even with the existing MySQL databases - most of them are small enough to make the memory buffers nothing but a leaky abstraction. Enable periodic flush to disk for InnoDB (<a href="http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/">innodb_flush_log_at_trx_commit=2</a>), and create <a href="http://en.wikipedia.org/wiki/Index_%28database%29#Covering_Index">covering indexes</a> for your data (a covering index is an index which itself contains all the required data to answer the query). Issue a couple of warm-up requests to load the data into memory and you are off to the races. </p>
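<p>To see why a covering index helps, here is a toy in-memory model. It is only a sketch: the class, data, and method names are hypothetical, and real InnoDB indexes are B-trees rather than hashes. The point is that the covering index answers the query without a second trip to the row storage:</p>

```ruby
# Toy model of a table with two secondary indexes on `email`.
# Hypothetical data and names; real InnoDB indexes are B-trees, not hashes.
class ToyTable
  attr_reader :row_reads

  def initialize(rows)
    @rows = rows        # primary storage: id maps to the full row
    @row_reads = 0      # counts trips to primary storage
    @plain_index = {}   # email maps to id only
    @covering_index = {} # email maps to [id, name], the column the query needs
    rows.each do |id, row|
      @plain_index[row[:email]] = id
      @covering_index[row[:email]] = [id, row[:name]]
    end
  end

  # SELECT name FROM users WHERE email = ?  (via the plain index)
  def name_via_plain_index(email)
    id = @plain_index[email]
    return nil if id.nil?
    @row_reads += 1     # extra lookup in primary storage
    @rows[id][:name]
  end

  # Same query, answered from the covering index alone.
  def name_via_covering_index(email)
    entry = @covering_index[email]
    entry ? entry.last : nil
  end
end

users = ToyTable.new({
  1 => { name: 'Ada',   email: 'ada@example.com',   bio: 'x' * 1000 },
  2 => { name: 'Linus', email: 'linus@example.com', bio: 'y' * 1000 }
})

puts users.name_via_covering_index('ada@example.com')
puts "row reads after covering lookup: #{users.row_reads}"
puts users.name_via_plain_index('ada@example.com')
puts "row reads after plain lookup: #{users.row_reads}"
```

<p>Both lookups return the same name, but only the plain index has to touch the full row. When the index itself already sits in memory, that is the difference between serving a query from RAM and going back to disk.</p>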
<p>Of course, the above strategy is at best an intermediate solution, so investigate SSDs as a primary storage layer and, if you are adventurous, give RethinkDB a try. Also keep an eye on Drizzle, as the first production release is aimed for summer of 2010. Alternative data storage engines such as Redis, MongoDB and others are also worth looking into, but let us not forget: laws of physics still apply to NoSQL. There is no magic there. Memory is fast, disks are slow. Nothing is stopping relational systems from taking advantage of main memory or SSD storage.</p>
<div class="feedflare">
<a href="http://feeds.igvita.com/~ff/igvita?a=Vfj7WBNM80E:E-hcWdm6Fdc:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/igvita?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=Vfj7WBNM80E:E-hcWdm6Fdc:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/igvita?i=Vfj7WBNM80E:E-hcWdm6Fdc:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=Vfj7WBNM80E:E-hcWdm6Fdc:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/igvita?i=Vfj7WBNM80E:E-hcWdm6Fdc:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=Vfj7WBNM80E:E-hcWdm6Fdc:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/igvita?i=Vfj7WBNM80E:E-hcWdm6Fdc:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=Vfj7WBNM80E:E-hcWdm6Fdc:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/igvita?i=Vfj7WBNM80E:E-hcWdm6Fdc:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/igvita/~4/Vfj7WBNM80E" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2009/12/07/future-of-rdbms-is-ram-clouds-ssd/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2009/12/07/future-of-rdbms-is-ram-clouds-ssd/</feedburner:origLink></item>
		<item>
		<title>State of Ruby VMs: Ruby Renaissance</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/_RLQ_jnJvT8/</link>
		<comments>http://www.igvita.com/2009/11/20/state-of-ruby-vms-ruby-renaissance/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 22:03:41 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[vm]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=836</guid>
		<description><![CDATA[Ruby is commonly associated with the frameworks (Rails, RSpec, and many others) that it enabled, but it is much more than that. The same ideology and design principles that popularized the language at the start are also the reason why it is being currently ported to a variety of alternative platforms: JVM, Objective-C, Smalltalk VM [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/ruby-logo.png"/>Ruby is commonly associated with the frameworks (Rails, RSpec, and many others) that it enabled, but it is much more than that. The same ideology and design principles that popularized the language at the start are also the reason why it is being currently ported to a variety of alternative platforms: JVM, Objective-C, Smalltalk VM and Microsoft’s DLR. Technical details aside, few will disagree that <a href="http://www.artima.com/intv/ruby.html">Matz’s focus</a> on “how we feel while programming” and the objective of “making the programmer happy” has resonated with the larger community.</p>
<p>In a short span of just a couple of years, the Ruby VM space has evolved to more than just a handful of choices: <a href="http://www.ruby-lang.org/en/">MRI</a>, <a href="http://jruby.org/">JRuby</a>, <a href="http://ironruby.net/">IronRuby</a>, <a href="http://www.macruby.org/">MacRuby</a>, <a href="http://rubini.us/">Rubinius</a>, <a href="http://maglev.gemstone.com/">MagLev</a>, <a href="http://www.rubyenterpriseedition.com/">REE</a> and <a href="https://wiki.sdn.sap.com/wiki/display/Research/BlueRuby">BlueRuby</a>. In fact, keeping up with all of the most recent developments within each VM is now easily a full-time job. For that reason, and with RubyConf ‘09 in full swing, let’s take a quick survey of the space and where it’s taking us.</p>
<h4><strong>2010: Year of Ruby Renaissance</strong></h4>
<p>Observing the trends and acceleration of development amongst all the VM’s, it is clear that 2010 is going to be an exciting year for the language. While more than a few developers have proclaimed Ruby (and more often, Rails) as dead within the past year, likely due to it losing the initial novelty angle, in reality the language is on the cusp of becoming available to a much broader community. Within the next year, MacRuby, Rubinius, IronRuby, and MagLev should all hit the 1.0 status, effectively making all the things we love about Ruby available to entirely new communities of programmers. HotCocoa with MacRuby makes writing Mac apps a breeze, IronRuby will bring Ruby scripting to the .NET crowd, Rubinius will become a viable deployment platform, and MagLev will give us the distributed persistence model offered by their Smalltalk VM. All of this without even mentioning the growing adoption of JRuby, which marries the best of JVM with Ruby, or the rising popularity of the REE fork of Matz’s Ruby, which offers significant performance and memory improvements.</p>
<p align="center"><img src="http://www.igvita.com/posts/09/ruby-vms.png"/></p>
<p>In other words, <em>if the “Ruby revolution is over”, then the next year is likely to be the first year of its Renaissance</em>. It won’t happen overnight, but slowly and surely we will see the same idioms, tools, and DSL’s we are all accustomed to in Ruby make their way to adjacent platforms. Many projects and companies are already using RSpec and WebRat to test their non-Ruby code. Likewise, why not use Ruby for DOM manipulation via Silverlight (<a href="http://visitmix.com/labs/gestalt/">Gestalt</a>), or abstract Java or Cocoa API’s into concise DSL’s? It’s an exciting time to be a Rubyist.</p>
<h4><strong>MRI: Matz’s Ruby (1.8.x / 1.9.x)</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/mri-logo.png"/>MRI Ruby, which is the original and the default platform for the vast majority of Ruby users have been making steady progress throughout the past year. First, we saw the <a href="http://www.ruby-lang.org/en/news/2009/01/30/ruby-1-9-1-released/">1.9.1 release</a> at the beginning of the year, which made Ruby 1.9 a viable deployment platform, although the overall pickup has remained relatively low. In mid July, <a href="http://www.ruby-lang.org/en/news/2009/07/20/ruby-1-9-2-preview-1-released/">Ruby 1.9.2 preview 1</a> hit the shelves, and the original schedule planned for the final 1.9.2 release on December 25th. However, <a href="http://twitter.com/yugui">Yuki Sonoda</a> (release manager) recently indicated that the <a href="http://www.ruby-forum.com/topic/195825">schedule will be canceled</a> in favor of making Ruby 1.9.2 compatible with the RubySpec suite prior to the final release - this is great news for everyone and well worth the wait.</p>
<p>Sitting in the audience at RubyKaigi in Tokyo earlier this year, it was clear that the focus of the development team is on Ruby 1.9. Moving forward, there will be one more release within the 1.8.x branch (Ruby 1.8.8), and it will serve as a bridge between 1.8 and 1.9. If you haven’t already, you should investigate migrating your code to Ruby 1.9 - <a href="http://isitruby19.com/">most of the critical gems</a>, and all popular frameworks work out of the box, not to mention the numerous performance improvements.</p>
<p>In the coming year, we <a href="http://twitter.com/yugui/status/5739171454">may even see a Ruby 2.0</a>, and in all likelihood a continual improvement in speed and library support. Matz showed off a number of experimental branches at RubyKaigi, and Koichi Sasada indicated that many of the performance optimizations are yet to be turned on for Ruby 1.9 - to date the focus has been on compatibility and feature completeness. </p>
<h4><strong>JRuby: Ruby on the JVM</strong></h4>
<p>Out of all the “alternative” Ruby VM’s, JRuby is by far the most mature project both in terms of compatibility and community coverage. By combining the best of the JVM platform - generational GC, true concurrency (no GIL), and transparent interop with any Java library - with Ruby syntax, it is no surprise that JRuby has been quietly <a href="http://adtmag.com/articles/2009/11/10/jruby-1.4-released.aspx">gaining market share</a> in the community. It is fast, it runs Rails, and it will soon be compatible with Ruby 1.9.</p>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/jruby-duke.png"/>Within the past year the JRuby team, which consists of 7 active committers (3 of whom <a href="http://arstechnica.com/open-source/news/2009/07/jruby-team-leaves-sun-joins-engine-yard.ars">migrated from Sun to Engine Yard</a> earlier this year) , and dozens of intermittent contributors has fixed more user reported issues than in all previous releases combined. <a href="http://jruby.org/2009/11/02/jruby-1-4-0.html">JRuby 1.4.0</a> is a good indicator of the health of the project:  a large collection of new features, and over 307 bug fixes since JRuby 1.3.1.</p>
<p>Tickets for <a href="http://jrubyconf.com/">JRubyConf</a> were sold out in a matter of hours following the announcement, and in all likelihood serve as a good indicator that JRuby is the platform to watch in the coming year. The combination of the JVM optimizations and its widescale deployment within the enterprise world will definitely make it an appealing Ruby VM. </p>
<h4><strong>MacRuby: Objective-C, LLVM and Ruby</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/macruby-logo.png"/>Want to mix and match Cocoa API’s or access OSX system libraries all within a Ruby VM? Then MacRuby is the answer and the VM is picking up steam fast. On March 9th, <a href="http://www.macruby.org/blog/2009/03/09/macruby-0.4.html">MacRuby 0.4</a> shipped with a threaded GC, full 64-bit support, DTrace probes, and many improvements to the HotCocoa API’s. Since then, the project has switched from <a href="http://en.wikipedia.org/wiki/YARV">YARV</a> to a completely new VM based on LLVM compiler infrastructure (<a href="http://www.macruby.org/blog/2009/10/07/macruby05b1.html">shipped in 0.5 beta 1</a>), and the benefits are numerous: machine code compilation, true concurrency (no GIL), a working JIT, and even ahead of time compilation (AOT)!</p>
<p>In other words, MacRuby is now a true Ruby compiler. You can write a HotCocoa app, leverage native POSIX threads, or even take advantage of Apple’s <a href="http://en.wikipedia.org/wiki/Grand_Central_Dispatch">Grand Central Dispatch</a> (GCD) and then compile your program and distribute it as a binary to any OSX user.</p>
<p>With 7 members on the team and a growing community (<a href="http://rubyonosx.com/">RubyOnOSX</a>), MacRuby is quickly becoming one of the most promising open-source projects for Apple - it would be great to see them officially embrace it in the coming year. With the new VM and numerous performance improvements, MacRuby has the potential to bring Ruby to all the Objective-C developers and open up an entirely new market for Ruby. In theory, we could even see MacRuby on the iPhone, that is, if we overcome a few minor snags, like a <a href="http://merbist.com/2009/05/27/macruby-changing-the-ruby-ecosystem/#comment-659">missing GC</a>. Definitely a project to watch in the coming year as it edges towards the 1.0 status. </p>
<h4><strong>MagLev: Smalltalk VM and Ruby</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/maglev-logo.png"/>The GemStone team has been quietly making steady progress with <a href="http://maglev.gemstone.com/">MagLev</a> over the past year. The development is being done in parallel with work on their upcoming GemStone 3.0 VM and is promising to bring Ruby to their 64-bit Smalltalk VM, which offers a JIT, years of VM optimizations, and most importantly, a <a href="http://pivotallabs.com/talks/28-maglev">built-in persistence and distribution layer</a>. In other words, you could think of Maglev as a distributed database that runs Ruby code internally - thousands of concurrent VM’s can be spread across hundreds of nodes, all accessing the same data with ACID semantics.</p>
<p>At RailsConf this year the team showed off a working Sinatra app, and since then they have gone from passing 5000 to nearly 27,900 RubySpecs. Rack, Sinatra, MiniTest and a few others already run unmodified on the VM and the tentative plan is for the project to hit 1.0 sometime in the upcoming year. At the moment, there is a closed alpha test in progress, but soon it will be opened to the larger public.<em> (Update: grab the <a href="http://groups.google.com/group/maglev-discussion/browse_thread/thread/1102993e9e21492a">public beta here</a>).</em> </p>
<p>The persistence layer offered by the VM is definitely one of the most interesting features, but the team has also indicated that MagLev will support other persistence models as well - you’ll be able to use ActiveRecord with MySQL, etc.</p>
<h4><strong>Ruby Enterprise Edition</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/phusion-logo.png"/>Launched in mid 2008, REE is a fork of Ruby 1.8.7 optimized for server and production deployments of MRI Ruby. Combination of <a href="http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari">MBARI patches</a>, improvements in thread and scheduling overhead by <a href="http://timetobleed.com/fixing-threads-in-ruby-18-a-2-10x-performance-boost/">Joe Damato and Aman Gupta</a>, a copy-on-write (COW) fork model, and a tunable GC all contribute to a measurable difference in the amount of used memory and overall performance. Earlier this year Twitter <a href="http://blog.evanweaver.com/articles/2009/09/24/ree/">switched their infrastructure to REE</a> and reported a 30% improvement in throughput!</p>
<p><a href="http://www.modrails.com/">Phusion Passenger</a>, developed by the same team, has also seen a good pickup in the community within the past year, but it remains to be seen if REE adoption will continue to grow in light of all the progress by alternative VM’s.</p>
<h4><strong>IronRuby: Ruby on .NET</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/ironruby-logo.png"/>IronRuby is a .NET implementation of the Ruby VM which leverages Microsoft’s Dynamic Language Runtime (DLR), and hence enables an entire host of interesting use cases for Ruby. For example, not only is there seamless integration with all the .NET libraries and infrastructure, but running on top of the DLR also means that Ruby can be run within Silverlight  (yep, right in your browser - <a href="http://visitmix.com/labs/gestalt/">check out Gestalt</a>).</p>
<p>The 1.0 release is looming on the horizon and the team has been making great progress. IronRuby now <a href="http://ironruby.info/">passes over 92% of all the RubySpecs</a>, and the adaptive compiler, combined with the optimized DLR, also means much lower VM startup times, as well as significant performance improvements over MRI. <em>(Update: <a href="http://ironruby.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=35312">download IronRuby 1.0 RC1</a>)</em></p>
<p>It remains to be seen if the .NET community will embrace IronRuby, but it <a href="http://www.sapphiresteel.com/Who-Needs-IronRuby">sounds like tool support</a> (Visual Studio, Intellisense, etc) might be the next big hurdle for the project.</p>
<h4><strong>Rubinius: Ruby written in (mostly) Ruby</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/rubinius-logo.png"/>Rubinius is an initiative to implement as much of Ruby as possible via Ruby code itself - turtles all the way down. In the works since 2006, the Rubinius team <a href="http://blog.fallingsnow.net/2008/11/18/a-sad-day/">lost a few developers</a> early this year, but the project is <a href="http://blog.fallingsnow.net/2009/05/28/rumors-of-our-demise-are-greatly-exaggerated/">alive and healthy</a>. The VM has been rewritten in C++, a JIT compiler has been added, and you can now run mongrel, thin, rack, amqp, and a number of other gems all unmodified. The <a href="http://rubini.us/roadmap.html">roadmap shows</a> 1.0RC in the works, and a focus on enabling real-world application deployments (packaging, distribution, etc). </p>
<p>It is too early to talk about performance, but the combination of the JIT and LLVM infrastructure means that Rubinius will have true concurrency (no GIL), and <a href="http://www.engineyard.com/blog/2009/improving-the-rubinius-bytecode-compiler/">plenty of opportunities</a> for introspection and optimization of your code - an inherent benefit of rewriting Ruby in Ruby itself. </p>
<p>Both FFI and RubySpecs, which are now used by virtually every other VM, are a direct result of the project, so all things considered, Rubinius can already be called a resounding success. However, with 1.0 on the horizon, it remains to be seen how and if the community will react to the release. Getting a few production deployments under the belt, as well as building a larger user community, are likely to be the big challenges for the year ahead.</p>
<h4><strong>BlueRuby: Ruby on ABAP VM</strong></h4>
<p><img align="left" style="margin-right: 1em;" src="http://www.igvita.com/posts/09/sap-logo.png"/>An <a href="https://wiki.sdn.sap.com/wiki/display/Research/BlueRuby">exploratory research project</a> form SAP Labs, BlueRuby is an initiative to bring the Ruby runtime to the ABAP VM powering the AP NetWeaver and SAP ERP 6.0 products. It is already passing 75.5% of all the RubySpecs and is being pushed by SAP as a way to <a href="https://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/10ba055e-6856-2c10-b88f-873f208fcdf9">adopt TDD practices</a> within their products. At this point, there is no known release schedule or public roadmap for BlueRuby, so don’t expect Rails apps on top of SAP NetWeaver anytime soon, but it does have the potential to bring Ruby and all the best practices enabled by it to thousands of enterprise developers.</p>
<h4><strong>Rubies everywhere!</strong></h4>
<p>Talking to the developers of all of the alternative VM's, you can't help but feel excited about the future of Ruby. There is a clear pattern emerging: using Ruby to go beyond Ruby, and as a bridge to other communities. Halfway through RubyConf, it is a clear theme here as well - several serialization talks, code generation, and an entire evening of presentations on all the Ruby VM's.</p>
<div class="feedflare">
<a href="http://feeds.igvita.com/~ff/igvita?a=_RLQ_jnJvT8:l3SU41YodM0:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/igvita?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=_RLQ_jnJvT8:l3SU41YodM0:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/igvita?i=_RLQ_jnJvT8:l3SU41YodM0:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=_RLQ_jnJvT8:l3SU41YodM0:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/igvita?i=_RLQ_jnJvT8:l3SU41YodM0:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=_RLQ_jnJvT8:l3SU41YodM0:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/igvita?i=_RLQ_jnJvT8:l3SU41YodM0:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.igvita.com/~ff/igvita?a=_RLQ_jnJvT8:l3SU41YodM0:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/igvita?i=_RLQ_jnJvT8:l3SU41YodM0:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/igvita/~4/_RLQ_jnJvT8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2009/11/20/state-of-ruby-vms-ruby-renaissance/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2009/11/20/state-of-ruby-vms-ruby-renaissance/</feedburner:origLink></item>
		<item>
		<title>Consuming XMPP PubSub in Ruby</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/DYDYvKzwjxM/</link>
		<comments>http://www.igvita.com/2009/11/10/consuming-xmpp-pubsub-in-ruby/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 17:22:39 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[pubsub]]></category>

		<category><![CDATA[realtime]]></category>

		<category><![CDATA[xmpp]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=819</guid>
		<description><![CDATA[XMPP is a very versatile protocol with well over several hundred proposed and working extensions, which has also proven itself in production (ex: Google Talk). Presence, roster management, federated and server to server (S2S) messaging are all examples of features that you get for free, which make it a very appealing platform for messaging applications. [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" src="http://www.igvita.com/posts/09/xmpp-logo.png" style="margin-right: 1em;"/>XMPP is a very versatile protocol with well over several hundred <a href="http://xmpp.org/extensions/">proposed and working extensions</a>, which has also proven itself in production (ex: Google Talk). Presence, roster management, federated and server to server (S2S) messaging are all examples of features that you get for free, which make it a very appealing platform for messaging applications. Combine it with extensions such as <a href="http://xmpp.org/extensions/xep-0060.html">XEP-0060 (PubSub)</a>, and we have all the relevant buzzwords: pubsub, real-time, federated, and presence. </p>
<p>The PubSub specification within XMPP, as defined in XEP-0060, is definitely <a href="http://www.igvita.com/2009/10/08/advanced-messaging-routing-with-amqp/">not as flexible as that of AMQP</a>, but it is often enough to cover the most popular use cases. However, technical merits aside, one of the key missing components, especially in Ruby, has been the historical lack of functioning libraries - xmpp4r claims to support it, but examples are lacking. Thankfully, after test driving the latest batch of gems, it looks like we're finally there. </p>
<h4><strong>Getting off the ground with XMPP</strong></h4>
<p>Without a good toolkit XMPP can be a gnarly protocol to get started with - <a href="http://www.pidgin.im/">Pidgin IM client</a> has some great tools for spying on the exchange, but monitoring pages of XML scroll by can only get you so far. Thankfully, Seth Fitzsimmons has built <a href="http://github.com/mojodna/switchboard">switchboard</a> ("curl for XMPP"), which offers a powerful command line tool to greatly simplify the process. Make sure to read the <a href="http://mojodna.net/2009/07/16/switchboard-curl-for-xmpp.html">full tutorial</a>, or jump right into it by testing it with the <a href="http://en.support.wordpress.com/jabber/">Wordpress XMPP stream</a>:</p>
<blockquote><p>
# list available options, subscribe to a blog, list subscriptions and then open the stream<br />
switchboard disco --target pubsub.im.wordpress.com info<br />
switchboard pubsub --server pubsub.im.wordpress.com --node /blog/icanhazcheesburger.com subscribe<br />
switchboard pubsub --server pubsub.im.wordpress.com subscriptions<br />
switchboard pubsub --server pubsub.im.wordpress.com listen
</p></blockquote>
<p>Based on <a href="http://home.gna.org/xmpp4r/">xmpp4r</a>, switchboard is also a toolkit for assembling your own XMPP clients, which means that it can be easily customized to power a PubSub consumer. From start to finish, and since examples are still hard to come by:</p>
<p><a href="javascript:showme('7694_1');"> <b>> switchboard-pubsub.rb</b></a>
<div style=" background:white;" id=7694_1>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'switchboard'</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">class</span> WordpressJack
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">connect</span><span style="color:#006600; font-weight:bold;">&#40;</span>switchboard, settings<span style="color:#006600; font-weight:bold;">&#41;</span>
    switchboard.<span style="color:#9900CC;">plug</span>!<span style="color:#006600; font-weight:bold;">&#40;</span>PubSubJack<span style="color:#006600; font-weight:bold;">&#41;</span>
    switchboard.<span style="color:#9900CC;">hook</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:post</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
    switchboard.<span style="color:#9900CC;">on_pubsub_event</span> <span style="color:#9966CC; font-weight:bold;">do</span> |event|
      event.<span style="color:#9900CC;">payload</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> |payload|
        payload.<span style="color:#9900CC;">elements</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> |item|
          on<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:post</span>, item<span style="color:#006600; font-weight:bold;">&#41;</span>
        <span style="color:#9966CC; font-weight:bold;">end</span>
      <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
settings = <span style="color:#6666ff; font-weight:bold;">Switchboard::Settings</span>.<span style="color:#9900CC;">new</span>
settings<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'pubsub.server'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#996600;">'pubsub.im.wordpress.com'</span>
settings<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'jid'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#996600;">'user@im.wordpress.com'</span>
settings<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'password'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#996600;">'password'</span>
&nbsp;
switchboard = <span style="color:#6666ff; font-weight:bold;">Switchboard::Client</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>settings<span style="color:#006600; font-weight:bold;">&#41;</span>
switchboard.<span style="color:#9900CC;">plug</span>!<span style="color:#006600; font-weight:bold;">&#40;</span>WordpressJack<span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
switchboard.<span style="color:#9900CC;">on_post</span> <span style="color:#9966CC; font-weight:bold;">do</span> |post|
  <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;A new post was received:&quot;</span>
  <span style="color:#CC0066; font-weight:bold;">puts</span> post.<span style="color:#9900CC;">methods</span>.<span style="color:#9900CC;">sort</span>.<span style="color:#9900CC;">uniq</span>
  <span style="color:#CC0066; font-weight:bold;">exit</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
switchboard.<span style="color:#9900CC;">run</span>!
&nbsp;</pre>
</div>
<h4><strong>XMPP with EventMachine and Nokogiri</strong></h4>
<p>If you have an EventMachine stack, or are looking for a high performance library, Jeff Smick's <a href="http://github.com/sprsquish/blather">blather</a> is definitely a gem to investigate. The combination of the asynchronous nature of EventMachine, the SAX parser within Nokogiri, and a great DSL makes it very fast and a pleasure to work with:</p>
<p><a href="javascript:showme('7694_2');"> <b>> blather-pubsub.rb</b></a>
<div style=" background:white;" id=7694_2>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'blather/client/client'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'blather/client/dsl/pubsub'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'blather'</span>
&nbsp;
EventMachine.<span style="color:#9900CC;">run</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
  host = <span style="color:#996600;">'pubsub.im.wordpress.com'</span>
  node = <span style="color:#996600;">'blog/icanhazcheesburger.com'</span>
  user = <span style="color:#996600;">'user@im.wordpress.com'</span>
  pass = <span style="color:#996600;">'pass'</span>
&nbsp;
  jid = <span style="color:#6666ff; font-weight:bold;">Blather::JID</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>user<span style="color:#006600; font-weight:bold;">&#41;</span>
  client = <span style="color:#6666ff; font-weight:bold;">Blather::Client</span>.<span style="color:#9900CC;">setup</span><span style="color:#006600; font-weight:bold;">&#40;</span>jid, pass<span style="color:#006600; font-weight:bold;">&#41;</span>
  client.<span style="color:#9900CC;">register_handler</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:ready</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Connected. Send messages to #{client.jid.inspect}.&quot;</span>
    pub = <span style="color:#6666ff; font-weight:bold;">Blather::DSL::PubSub</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>client, host<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
  client.<span style="color:#9900CC;">register_handler</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:pubsub_event</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> |event|
    <span style="color:#CC0066; font-weight:bold;">puts</span> event
  <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
  client.<span style="color:#9900CC;">connect</span>
<span style="color:#006600; font-weight:bold;">&#125;</span></pre>
</div>
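<p>The <code>:pubsub_event</code> handler above simply prints the raw stanza. As a rough sketch of what you might do with it next, the snippet below pulls the individual items out of a PubSub notification. To keep it self-contained it uses REXML from the Ruby distribution rather than Nokogiri, and the sample stanza (node name, item ids, payload) as well as the <code>extract_items</code> helper are made up for illustration; only the event namespace comes from the XMPP PubSub spec (XEP-0060).</p>

```ruby
require 'rexml/document'

# Namespace used by XEP-0060 for PubSub event notifications.
PUBSUB_EVENT_NS = 'http://jabber.org/protocol/pubsub#event'

# Hypothetical notification: node, ids and payload are invented for this sketch.
SAMPLE_EVENT = <<-XML
<message from="pubsub.example.com" to="user@example.com">
  <event xmlns="http://jabber.org/protocol/pubsub#event">
    <items node="blog/example.com">
      <item id="post-1"><title>Hello</title></item>
      <item id="post-2"><title>World</title></item>
    </items>
  </event>
</message>
XML

# Returns an array of [node, item id, text of the first payload element].
def extract_items(xml)
  doc = REXML::Document.new(xml)
  found = []
  # Walk every descendant element and keep the PubSub <item/> elements.
  doc.root.each_recursive do |el|
    next unless el.name == 'item' && el.namespace == PUBSUB_EVENT_NS
    node = el.parent.attributes['node']
    payload = el.elements.to_a.first
    found << [node, el.attributes['id'], payload && payload.text]
  end
  found
end

extract_items(SAMPLE_EVENT).each do |node, id, text|
  puts "#{node} / #{id}: #{text}"
end
```

<p>In a real client you would hand each extracted item to your own application callback instead of printing it.</p>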
<h4><strong>PubSub & Event-Driven Architecture</strong></h4>
<p>Having personally struggled in the past with XMPP PubSub and Ruby, it's been great to revisit the use case and find a new set of fully functional libraries. The <a href="http://www.igvita.com/2009/04/06/henry-ford-event-driven-architecture/">event driven architecture</a> enabled by technologies such as XMPP, <a href="http://www.igvita.com/2009/10/08/advanced-messaging-routing-with-amqp/">AMQP</a>, <a href="http://www.igvita.com/2009/10/21/nginx-comet-low-latency-server-push/">Comet</a>, and <a href="http://www.igvita.com/2009/06/29/http-pubsub-webhooks-pubsubhubbub/">Webhooks and PubSubHubbub</a> is increasingly becoming a staple of many web applications, and for good reason. If you haven't already, grab <a href="http://github.com/mojodna/switchboard">switchboard</a> or <a href="http://github.com/sprsquish/blather">blather</a> and take XMPP for a test drive.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2009/11/10/consuming-xmpp-pubsub-in-ruby/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2009/11/10/consuming-xmpp-pubsub-in-ruby/</feedburner:origLink></item>
		<item>
		<title>Nginx &amp; Comet: Low Latency Server Push</title>
		<link>http://feeds.igvita.com/~r/igvita/~3/21hcnE4IGjM/</link>
		<comments>http://www.igvita.com/2009/10/21/nginx-comet-low-latency-server-push/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 16:06:45 +0000</pubDate>
		<dc:creator>Ilya Grigorik</dc:creator>
		
		<category><![CDATA[Architecture]]></category>

		<category><![CDATA[comet]]></category>

		<category><![CDATA[nginx]]></category>

		<category><![CDATA[push]]></category>

		<guid isPermaLink="false">http://www.igvita.com/?p=793</guid>
		<description><![CDATA[Server push is the most efficient and low latency way to exchange data. If both the publisher and the receiver are publicly visible then a protocol such as PubSubHubbub or a simpler Webhook will do the job. However, if the receiver is hidden behind a firewall, a NAT, or is a web-browser which is designed [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" style="margin: 0pt 1em 0pt 0pt;" src="http://www.igvita.com/blog/posts/09/comet.png"/>Server push is the most efficient and low latency way to exchange data. If both the publisher and the receiver are publicly visible then a protocol such as <a href="http://www.igvita.com/2009/06/29/http-pubsub-webhooks-pubsubhubbub/">PubSubHubbub or a simpler Webhook</a> will do the job. However, if the receiver is hidden behind a firewall, a NAT, or is a web browser, which is designed to generate outbound requests rather than handle incoming traffic, then the implementation gets harder. If you are adventurous, you could set up a <a href="http://www.igvita.com/2009/08/18/smart-clients-reversehttp-websockets/">ReverseHTTP server</a>. If you are patient, you could wait for the <a href="http://www.igvita.com/2009/08/18/smart-clients-reversehttp-websockets/">WebSocket API</a> in HTML5. And if you need an immediate solution, you could compromise: instead of a fully asynchronous push model, you could use <a href="http://en.wikipedia.org/wiki/Comet_%28programming%29">Comet</a>, also known as Reverse Ajax, HTTP Server Push, or HTTP Streaming.</p>
<p>Coined by <a href="http://alex.dojotoolkit.org/2006/03/comet-low-latency-data-for-the-browser/">Alex Russell in early 2006</a>, the term Comet is an umbrella term for technologies which take advantage of persistent connections initiated by the client and kept open until data is available (<a href="http://en.wikipedia.org/wiki/Comet_%28programming%29#Ajax_with_long_polling">long polling</a>), or kept open indefinitely as the data is pushed to the client (<a href="http://en.wikipedia.org/wiki/Comet_%28programming%29#Streaming">streaming</a>) in chunks. The immediate advantage of both techniques is that the client and server can communicate with minimal latency. For this reason, Comet is widely deployed in chat applications (Facebook, Google, Meebo, etc), and is also commonly used as a firehose delivery mechanism. </p>
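<p>To make the difference between the two techniques concrete, here is a toy, in-process sketch: no HTTP is involved, and the <code>ToyCometServer</code> class and both demo helpers are hypothetical names invented for this illustration. A thread-safe queue stands in for the channel. With long polling every delivered message costs the client one request, whereas with streaming a single request stays open for all of them.</p>

```ruby
require 'thread'

# In-process stand-in for a Comet server: the channel is a thread-safe queue.
class ToyCometServer
  def initialize
    @queue = Queue.new
  end

  def publish(message)
    @queue << message
  end

  # Long polling: block until one message is available, then return it.
  # Returning models the server closing the connection.
  def long_poll
    @queue.pop
  end

  # Streaming: keep the "connection" open and yield every message
  # until the :eof sentinel is published.
  def stream
    while (message = @queue.pop) != :eof
      yield message
    end
  end
end

# Returns [received messages, number of requests the client had to make].
def long_poll_demo(messages)
  server = ToyCometServer.new
  producer = Thread.new { messages.each { |m| server.publish(m) } }
  requests = 0
  received = messages.map do
    requests += 1 # the client reconnects for every single message
    server.long_poll
  end
  producer.join
  [received, requests]
end

def stream_demo(messages)
  server = ToyCometServer.new
  producer = Thread.new do
    messages.each { |m| server.publish(m) }
    server.publish(:eof)
  end
  received = []
  server.stream { |m| received << m } # one request for all messages
  producer.join
  [received, 1]
end

p long_poll_demo(%w[a b c])
p stream_demo(%w[a b c])
```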
<h4><strong>Converting Nginx into a Long Polling Comet Server</strong></h4>
<p>A large entry barrier to Comet adoption is the implicit requirement for specialized, event driven web servers capable of efficiently handling large numbers of long polling connections. <a href="http://www.tornadoweb.org/">Friendfeed's Tornado</a> server is a good example of an app level server that meets the criteria. However, thanks to Leo Ponomarev's efforts, you can now also turn your Nginx server into a fully functional Comet server with the <a href="http://github.com/slact/nginx_http_push_module">nginx_http_push_module</a> plugin.  </p>
<p align="center"><img style="border: 1px solid rgb(204, 204, 204); " src="http://www.igvita.com/posts/09/nginx-push.png"/></p>
<p>Instead of using a custom framework, Leo's plugin exposes two endpoints on your Nginx server: one for the subscribers, and one for the publisher. The clients open long-polling connections to a channel on the Nginx server and start waiting for data. Meanwhile, the publisher simply POSTs the data to Nginx, and the plugin does all the heavy lifting for you by distributing the data to the waiting clients. This means that the publisher never actually serves the data directly; it is simply an event generator! It is hard to make it any simpler than that.</p>
<p>Best of all, it only gets better from here. Both the client and the publisher can create arbitrary channels, and the plugin is also capable of message queuing, which means that the Nginx server will store intermediate messages if the client is offline. Queued messages can be expired based on time, size of the waiting stack, or through a memory limit. </p>
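<p>The queuing behaviour is easier to reason about with a small model in hand. The sketch below is not the module's implementation, just a hypothetical <code>ChannelBuffer</code> class that expires messages by age and by queue length (the memory limit is left out for brevity). Timestamps are passed in explicitly so the behaviour is deterministic.</p>

```ruby
# Hypothetical in-memory model of a channel's message queue.
class ChannelBuffer
  Message = Struct.new(:body, :stored_at)

  def initialize(max_length, timeout)
    @max_length = max_length # maximum number of buffered messages
    @timeout = timeout       # maximum message age, in seconds
    @messages = []
  end

  def store(body, now = Time.now)
    expire(now)
    @messages << Message.new(body, now)
    # Drop the oldest messages once the buffer is over capacity.
    @messages.shift while @messages.length > @max_length
  end

  # Bodies of messages stored strictly after +since+ that have not expired.
  def pending(since, now = Time.now)
    expire(now)
    @messages.select { |m| m.stored_at > since }.map(&:body)
  end

  private

  def expire(now)
    @messages.reject! { |m| now - m.stored_at > @timeout }
  end
end

t0 = Time.at(1_000)
buffer = ChannelBuffer.new(2, 60)
buffer.store('a', t0)
buffer.store('b', t0 + 1)
buffer.store('c', t0 + 2)           # 'a' is dropped: only two messages are kept
p buffer.pending(Time.at(0), t0 + 2)
p buffer.pending(Time.at(0), t0 + 62) # 'b' is now older than 60 seconds
```

<p>A client that was offline for a while would ask for everything after the last message it saw, which is what the <code>since</code> argument stands for here.</p>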
<h4><strong>Configuring Nginx & Ruby Demo</strong></h4>
<p>To get started you will have to build Nginx from source. Unpack the <a href="http://wiki.nginx.org/NginxInstall#Source_Releases">source tree</a>, grab the plugin repo from GitHub and then build the server with the push module (<strong>./configure --add-module=/path/to/plugin && make && make install</strong>). Next, consult the <a href="http://github.com/slact/nginx_http_push_module/blob/master/README">readme</a> and the <a href="http://github.com/slact/nginx_http_push_module/blob/master/protocol.txt">protocol</a> files to learn about all the available options. A simple multi client broadcast configuration looks like the following:</p>
<p><a href="javascript:showme('2868_1');"> <b>> nginx-push.conf</b></a>
<div style=" background:white;" id=2868_1>
<pre class="ruby"><span style="color:#008000; font-style:italic;"># internal publish endpoint (keep it private / protected)</span>
location /publish <span style="color:#006600; font-weight:bold;">&#123;</span>
  set <span style="color:#ff6633; font-weight:bold;">$push_channel_id</span> <span style="color:#ff6633; font-weight:bold;">$arg_id</span>;      <span style="color:#008000; font-style:italic;">#/?id=239aff3 or somesuch</span>
  push_publisher;
&nbsp;
  push_store_messages on;            <span style="color:#008000; font-style:italic;"># enable message queueing </span>
  push_message_timeout 2h;           <span style="color:#008000; font-style:italic;"># expire buffered messages after 2 hours</span>
  push_max_message_buffer_length <span style="color:#006666;">10</span>; <span style="color:#008000; font-style:italic;"># store 10 messages</span>
  push_min_message_recipients <span style="color:#006666;">0</span>;     <span style="color:#008000; font-style:italic;"># minimum recipients before purge</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># public long-polling endpoint</span>
location /activity <span style="color:#006600; font-weight:bold;">&#123;</span>
  push_subscriber;
&nbsp;
  <span style="color:#008000; font-style:italic;"># how multiple listener requests to the same channel id are handled</span>
  <span style="color:#008000; font-style:italic;"># - last: only the most recent listener request is kept, 409 for others.</span>
  <span style="color:#008000; font-style:italic;"># - first: only the oldest listener request is kept, 409 for others.</span>
  <span style="color:#008000; font-style:italic;"># - broadcast: any number of listener requests may be long-polling.</span>
  push_subscriber_concurrency broadcast;
  set <span style="color:#ff6633; font-weight:bold;">$push_channel_id</span> <span style="color:#ff6633; font-weight:bold;">$arg_id</span>;
  default_type  text/plain;
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;</pre>
</div>
<p>Once you have the Nginx server up and running, we can set up a simple broadcast scenario with a single publisher and several subscribers to test-drive our new Comet server:</p>
<p><a href="javascript:showme('2868_2');"> <b>> comet-push-consume.rb</b></a>
<div style=" background:white;" id=2868_2>
<pre class="ruby"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'em-http'</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">def</span> subscribe<span style="color:#006600; font-weight:bold;">&#40;</span>opts<span style="color:#006600; font-weight:bold;">&#41;</span>
  listener = <span style="color:#6666ff; font-weight:bold;">EventMachine::HttpRequest</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'http://127.0.0.1/activity?id='</span>+ opts<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:channel</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">get</span> <span style="color:#ff3333; font-weight:bold;">:head</span> =&gt; opts<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:head</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  listener.<span style="color:#9900CC;">callback</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
    <span style="color:#008000; font-style:italic;"># print received message, re-subscribe to channel with</span>
    <span style="color:#008000; font-style:italic;"># the last-modified header to avoid duplicate messages </span>
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Listener received: &quot;</span> + listener.<span style="color:#9900CC;">response</span> + <span style="color:#996600;">&quot;<span style="color:#000099;">\n</span>&quot;</span>
&nbsp;
    modified = listener.<span style="color:#9900CC;">response_header</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'LAST_MODIFIED'</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    subscribe<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#123;</span>:channel =&gt; opts<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#ff3333; font-weight:bold;">:channel</span><span style="color:#006600; font-weight:bold;">&#93;</span>, <span style="color:#ff3333; font-weight:bold;">:head</span> =&gt; <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#996600;">'If-Modified-Since'</span> =&gt; modified<span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
EventMachine.<span style="color:#9900CC;">run</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
  channel = <span style="color:#996600;">&quot;pub&quot;</span>
&nbsp;
  <span style="color:#008000; font-style:italic;"># Publish new message every 5 seconds</span>
  EM.<span style="color:#9900CC;">add_periodic_timer</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">5</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span>
    time = <span style="color:#CC00FF; font-weight:bold;">Time</span>.<span style="color:#9900CC;">now</span>
    publisher = <span style="color:#6666ff; font-weight:bold;">EventMachine::HttpRequest</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'http://127.0.0.1/publish?id='</span>+channel<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">post</span> <span style="color:#ff3333; font-weight:bold;">:body</span> =&gt; <span style="color:#996600;">&quot;Hello @ #{time}&quot;</span>
    publisher.<span style="color:#9900CC;">callback</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
      <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Published message @ #{time}&quot;</span>
      <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Response code: &quot;</span> + publisher.<span style="color:#9900CC;">response_header</span>.<span style="color:#9900CC;">status</span>.<span style="color:#9900CC;">to_s</span>
      <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Headers: &quot;</span> + publisher.<span style="color:#9900CC;">response_header</span>.<span style="color:#9900CC;">inspect</span>
      <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Body: <span style="color:#000099;">\n</span>&quot;</span> + publisher.<span style="color:#9900CC;">response</span>
      <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;<span style="color:#000099;">\n</span>&quot;</span>
    <span style="color:#006600; font-weight:bold;">&#125;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#008000; font-style:italic;"># open two listeners (aka broadcast/pubsub distribution)</span>
  subscribe<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:channel</span> =&gt; channel<span style="color:#006600; font-weight:bold;">&#41;</span>
  subscribe<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:channel</span> =&gt; channel<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;</pre>
</div>
<p><div class='download-link'>
							<a href='http://www.igvita.com/download.php?file=http://www.igvita.com/downloads/nginx-push.zip'><img alt='Download' class='leftalign' src='http://www.igvita.com/wp-content/plugins/dBeautifier/icons/downloads.png' /></a>
							<h4>
								<a href='http://www.igvita.com/download.php?file=http://www.igvita.com/downloads/nginx-push.zip'>nginx-push.zip (Full Nginx Config + Ruby client)</a>
							</h4><p>Downloads: 415 File Size: 2.7 KB </p>
						</div> </p>
<p>In the script above, every five seconds a publisher emits a new event to our Nginx server, which in turn pushes the data to the two subscribers that have long-polling connections open and are waiting for data. Once the message is sent to each subscriber, Nginx closes their connections, and the clients immediately re-establish them to wait for the next available message. The end result: a real-time message push between the publisher and the clients via Nginx!</p>
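<p>The re-subscribe step is the part that is easiest to get wrong, so here is a stripped-down, blocking sketch of just that loop. Everything in it is an assumption made for illustration: <code>poll_channel</code> and <code>fake_server</code> are hypothetical helpers, the "server" is a lambda rather than Nginx, and timestamps are plain integers instead of HTTP dates. The point is only to show how carrying <code>Last-Modified</code> forward as <code>If-Modified-Since</code> prevents the same message from being delivered twice.</p>

```ruby
# Polls a channel up to max_requests times. +fetch+ is any callable that
# takes a headers hash and returns [body, response_headers], or nil when
# there is nothing left to deliver.
def poll_channel(fetch, max_requests)
  headers = {}
  bodies = []
  max_requests.times do
    body, response_headers = fetch.call(headers)
    break if body.nil?
    bodies << body
    # Tell the server what we have already seen before reconnecting.
    headers = { 'If-Modified-Since' => response_headers['Last-Modified'] }
  end
  bodies
end

# Fake server for this sketch: replies with the first stored message that is
# newer than If-Modified-Since (or with the oldest one if the header is absent).
def fake_server(messages)
  lambda do |headers|
    since = headers['If-Modified-Since']
    found = messages.find { |stamp, _body| since.nil? || stamp > since }
    found && [found[1], { 'Last-Modified' => found[0] }]
  end
end

server = fake_server([[1, 'first'], [2, 'second'], [3, 'third']])
p poll_channel(server, 10)
```

<p>If the loop forgot to update the header, the fake server would keep answering with the oldest message, which is the duplicate-delivery problem the original script avoids.</p>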
<h4><strong>Long Polling, Streaming, and Comet in Production</strong></h4>
<p>Leo's module is still very young and is under active development, but it is definitely one to keep an eye on. The upcoming release is focused on bug fixes, but looking ahead there are also plans to add a streaming protocol: instead of closing the connection every time (aka, long polling), Nginx would keep it open and stream the incoming events as chunks of data to the clients in real-time. Having such an option would make it ridiculously easy to deploy your own firehose APIs (e.g., <a href="http://apiwiki.twitter.com/Streaming-API-Documentation">Twitter streaming</a>).</p>
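<p>On the consuming side, a streaming client has to cope with the fact that chunk boundaries rarely line up with event boundaries. The sketch below assumes, purely for illustration, that events are newline-delimited; the module's streaming format is not defined yet, so treat the framing and the <code>EventFramer</code> class as hypothetical.</p>

```ruby
# Buffers arbitrary chunks of data and emits complete, newline-terminated
# events to the block given at construction time.
class EventFramer
  def initialize(&on_event)
    @buffer = String.new
    @on_event = on_event
  end

  def receive(chunk)
    @buffer << chunk
    # Cut complete lines off the front of the buffer, one at a time.
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      @on_event.call(line.chomp)
    end
  end
end

events = []
framer = EventFramer.new { |event| events << event }
framer.receive("Hello @ 1\nHel") # second event is split across chunks
framer.receive("lo @ 2\n")
framer.receive("incomplete")     # stays buffered until its newline arrives
p events
```

<p>You would call <code>receive</code> from whatever callback your HTTP client uses to deliver partial response data.</p>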
<p>Last but not least, don't forget about the growing number of <a href="http://wiki.nginx.org/NginxModules">other available</a> <a href="http://wiki.nginx.org/Nginx3rdPartyModules">modules for Nginx</a>, or if you are so inclined, get a head start on building your own by reading <a href="http://www.evanmiller.org/nginx-modules-guide.html">Evan Miller's great guide</a> on the subject.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.igvita.com/2009/10/21/nginx-comet-low-latency-server-push/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.igvita.com/2009/10/21/nginx-comet-low-latency-server-push/</feedburner:origLink></item>
	</channel>
</rss>
