Thursday, February 24, 2005

kde.rb: Why You Should Use Ruby

I have been having a lot of fun with Ruby lately.

I now have a Ruby-based renderer for KDE Dot News. The great thing is that I've dumped all the articles and comments from Zope onto the filesystem and it works. It works great... so far.

Ruby is the latest technological craze from Japan, created by a guy named matz. It has weird stuff like "objects" in it. You can take the number 1 and, because it is an object, make it quack like a duck.
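A minimal illustration, with a made-up `quack` method: Ruby classes stay open, so you can add a method to the built-in `Integer` class. Duck typing means any object that responds to `quack` will do, whatever its class.

```ruby
# Reopen the built-in Integer class and add a (made-up) method to it.
class Integer
  def quack
    "Quack! says #{self}"
  end
end

# Duck typing: this helper doesn't check the class, only whether #quack exists.
def make_it_quack(thing)
  thing.respond_to?(:quack) ? thing.quack : "#{thing.inspect} can't quack"
end

puts make_it_quack(1)      # 1 is an Integer object, so now it quacks
puts make_it_quack("duck") # Strings were never given #quack
```

Reopening a core class like this is global, so it's best kept to code you control.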

It gets much worse. Did you know that I can connect from anywhere on the Internets, or at the very least within throwing distance of a Unix Domain Socket, to my running dot.rb instance, and inspect and change anything on the fly?

Yes, it's possible and it's working and it doesn't have any significant overhead. I could give KDE Dot News a sickly blue corporate background, post goatse links, clear the cache tables, or call the garbage collector, all on the fly from any strategically located VT100.

All for just a few lines of code. The distributed stuff works because of an anomaly called DRb -- I don't really know why the rest works.
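For the curious, here is a minimal sketch of the DRb trick. It is not dot.rb's actual code; `SiteState` and its methods are hypothetical. The server publishes a live object, and a client gets a proxy whose method calls run against that object. The post mentions a Unix domain socket; DRb also supports `drbunix:` URIs via `require 'drb/unix'`, but this sketch uses a localhost TCP URI to stay portable. DRb does no authentication by default, so whatever it listens on should be locked down.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the live state of a running site process.
class SiteState
  attr_accessor :background_color

  def initialize
    @background_color = "white"
    @page_cache = { "article-1" => "<html>...</html>" }
  end

  def cache_size
    @page_cache.size
  end

  def clear_cache
    @page_cache.clear
    nil # return nothing rather than marshalling the hash back to the client
  end

  def run_gc
    GC.start
    nil
  end
end

state = SiteState.new

# Server side: publish the live object. Port 0 lets the OS pick a free port.
DRb.start_service("druby://localhost:0", state)
uri = DRb.uri

# Client side, normally run from another process (e.g. irb) with the same URI.
remote = DRbObject.new_with_uri(uri)
remote.background_color = "sickly corporate blue"
remote.clear_cache
remote.run_gc

puts state.background_color # the running object itself was changed
puts state.cache_size

DRb.stop_service
```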

If you're a KDE developer, you should really give Korundum a serious look. This stuff could blow .Net and Java away.

So, anyway. About dot.rb.

Management-wise, dot.rb's 100% filesystem-based backend is nothing less than a godsend compared to having to deal with a gigantic opaque database.

It's also pure bliss to be able to write dynamic HTML code using Ruby's heredoc and powerful string interpolation features. Ruby has all sorts of template engines but I didn't have to bother with any of that. Never again.
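A small sketch of the idea, with a made-up `Article` struct and markup. The squiggly heredoc `<<~` needs Ruby 2.3 or later; in 2005 you would have used `<<-` and lived with the indentation.

```ruby
require 'erb'

# Hypothetical article record; dot.rb's real data model isn't shown in the post.
Article = Struct.new(:title, :author, :body_html)

# <<~ strips the common leading indentation; #{} interpolates Ruby expressions.
# Title and author are HTML-escaped; body_html is treated as trusted markup.
def render_article(article)
  h = ->(s) { ERB::Util.html_escape(s) }
  <<~HTML
    <div class="article">
      <h2>#{h.(article.title)}</h2>
      <p class="byline">by #{h.(article.author)}</p>
      #{article.body_html}
    </div>
  HTML
end

puts render_article(Article.new("Tips & Tricks", "navindra", "<p>Sample body.</p>"))
```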

I am so glad to be getting away from the headache that is DTML. Dot's HTML code finally looks tractable, since there is much more sharing and consequently much less code.

dot.rb is fast on ext3, practically instantaneous on my localhost, and even faster on ReiserFS. I've run it on the same machine as the Dot, in parallel to the "production" site, and it was still fast while current Dot (and wiki) crawled.

dot.rb uses a tenth of the memory and half the disk space of the present Dot at its worst (meaning several weeks without packing the DB and a few hours of uninterrupted memory leaking).

Of course, dot.rb hasn't been under anything like the load Zope handles. I've tested dot.rb with the full KDE Dot News db, and at one point I had 5 simultaneous recursive wgets pulling content and it was still fast... Incidentally, this is using Ruby's built-in webserver (WEBrick); I implemented the logic in a few lines.

But that's still not a realistic load and, of course, there are big gaps in the functionality which could easily close the performance divide come judgement day. Hopefully by the time I'm finished we'll have a 10GHz machine waiting and that won't be a concern.

The reason it's fast, of course, is that dot.rb is particularly optimised for the typical usage patterns of the Dot. Right up front I cache the most recent 100 articles and accompanying comments in a nice little forest of objects. I've also got a dynamic cache table that starts out empty and keeps the most frequently accessed article trees around.
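A minimal sketch of that two-tier cache. It assumes a hypothetical `loader` block that reads one article tree from disk, and it assumes a least-frequently-used eviction policy, since the post doesn't say which policy dot.rb uses.

```ruby
# Hypothetical two-tier cache: recent trees loaded up front, plus a dynamic
# table that starts empty. LFU eviction is an assumption for this sketch.
class ArticleCache
  def initialize(recent_ids, capacity:, &loader)
    @loader   = loader                                         # loads one article tree
    @capacity = capacity                                       # size of the dynamic table
    @recent   = recent_ids.to_h { |id| [id, loader.call(id)] } # loaded up front
    @dynamic  = {}                                             # starts out empty
    @hits     = Hash.new(0)                                    # request counts per id
  end

  def fetch(id)
    return @recent[id] if @recent.key?(id)

    @hits[id] += 1
    return @dynamic[id] if @dynamic.key?(id)

    evict_least_used if @dynamic.size >= @capacity
    @dynamic[id] = @loader.call(id)
  end

  def dynamic_ids
    @dynamic.keys
  end

  private

  # Drop the cached tree with the fewest recorded requests.
  def evict_least_used
    @dynamic.delete(@dynamic.keys.min_by { |id| @hits[id] })
  end
end

disk_reads = 0
cache = ArticleCache.new([100, 99], capacity: 2) do |id|
  disk_reads += 1
  "tree-#{id}" # stand-in for parsing an article and its comments
end

cache.fetch(100)           # recent article: served from memory
2.times { cache.fetch(5) } # older article, requested twice
cache.fetch(7)             # older article, requested once
cache.fetch(8)             # table full: evicts 7, the least requested
```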

The HTML pages are completely dynamically generated, including Flat Forty and the All Articles list -- both of the latter tend to kill present Dot. I do however use a two-level string interpolation in Ruby and cache strings at the first level.
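The post doesn't spell out how the two levels work, so here is one plausible way to do it, as a sketch only. The first level uses Ruby's `#{}` interpolation for per-article content and leaves `%{name}` placeholders that `Kernel#format` fills in per request. All names are hypothetical, and a real version would have to invalidate the cache when comments are posted.

```ruby
# Hypothetical two-level rendering. Level one bakes per-article data into a
# string and caches it; level two fills per-request values via format().
FIRST_LEVEL_CACHE = {}

def first_level(article_id, title, body_html)
  FIRST_LEVEL_CACHE[article_id] ||= begin
    # Double any literal % so format() in level two leaves it alone.
    t = title.gsub("%", "%%")
    b = body_html.gsub("%", "%%")
    "<h1>#{t}</h1>\n#{b}\n<p>Threshold: %{threshold} | Logged in as %{user}</p>"
  end
end

def render_page(article_id, title, body_html, threshold:, user:)
  format(first_level(article_id, title, body_html), threshold: threshold, user: user)
end

page = render_page(42, "Ruby is 100% fun", "<p>Body</p>", threshold: 2, user: "guest")
puts page
```

Only the cheap second-level `format` call runs on every request; the article markup is built once per article.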

I lightly process all the site articles up front, compute interesting stuff like previous and next links (present Dot only computes those for the 10 most recent articles, dot.rb does it for all articles) and keep a table of the skeletons around. If you think that would kill startup time, it doesn't really. dot.rb still loads in a fraction of the time that it takes Zope to boot up. If you think it kills memory, nope, doing OK.
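Computing previous/next links for every article is a single pass over the sorted skeletons. A sketch, assuming (hypothetically) that numeric article ids increase with posting date; the `Skeleton` fields are made up.

```ruby
# Hypothetical article skeleton; dot.rb's real fields aren't described.
Skeleton = Struct.new(:id, :title, :prev_id, :next_id)

# Link every article to its chronological neighbours (assuming ids grow
# over time) and return the skeleton table keyed by id.
def link_neighbours(skeletons)
  sorted = skeletons.sort_by(&:id)
  sorted.each_cons(2) do |older, newer|
    older.next_id = newer.id
    newer.prev_id = older.id
  end
  sorted.to_h { |s| [s.id, s] }
end

table = link_neighbours([Skeleton.new(3, "C"), Skeleton.new(1, "A"), Skeleton.new(2, "B")])
```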

To make a long story short, I'm basically making an educated guess about the typical dot reading patterns and optimising for that. This kind of optimisation isn't really feasible in Squishdot since it sits on top of the huge layers of abstraction that make up Zope.

Of course, I haven't solved the problem of searching. Stuff like searching on Authors, Titles and Categories should be quite easy and will already be quite useful since this is a function Google cannot readily do for us. What still bothers me is searching the full article and comment bodies for content. I have some ideas, and no, my hierarchical filesystem isn't likely to pan out for this particular case... I will probably need to build an index of some kind or else I could tap Google search for hints and zone in on the search.
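The post leaves the search design open. One common form for an "index of some kind" is an inverted index mapping each word to the articles that contain it. Here is a minimal in-memory sketch: the class is hypothetical, the tokenizer is naive, and multi-word queries use AND semantics. The same structure would cover authors, titles and categories by indexing those fields instead of bodies.

```ruby
require 'set'

# Hypothetical inverted index: word => set of article ids.
class WordIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new }
  end

  def add(id, text)
    tokenize(text).each { |word| @postings[word] << id }
  end

  # Ids of articles containing every query word (AND semantics).
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    # fetch with a default doesn't trigger the default block, so unknown
    # query words don't add empty entries to the index.
    words.map { |w| @postings.fetch(w, Set.new) }.reduce(:&).to_a.sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

index = WordIndex.new
index.add(1, "Konqueror gets faster")
index.add(2, "KOffice and Konqueror news")
```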

Not that searching works very well on the present Dot anyway. Basic searches like the aforementioned ones work somewhat erratically; others kill the server. Having that Search box in the site-wide footer of the Dot right now is all but meaningless when the thing doesn't work. I guess it's comforting to have it there, and it does help maintain the illusion that we have search.

Phase 1 of dot.rb, which is rendering and viewing, is basically done minus search. Phase 2 will be to implement actual posting, which I anticipate will be fairly easy given all that's already been done. Phase 3 will be to implement some sort of management interface for the editors and will probably be slightly tricky... some of those premature optimisations might just come back and bite me.

Sadly, all of this is going to have to wait. Ruby is way too addictive and I need to spend a month or three away in detox, for my own good. Also, I need to get away from some of those crazy people. Hopefully these issues will be addressed in Ruby 2.0.

Did I mention Korundum?

15 Comments:

Blogger Joao Pedrosa said...

Well written. Congrats Navindra.

8:08 AM  
Anonymous Stephan said...

I admit, Ruby is awesome. Web development can be - I experienced this the first time with ruby.

But the real beauty only shows with Rails. Have you tried it? You should.

http://www.rubyonrails.org/

1:25 PM  
Anonymous Stephan said...

Oops.. I meant "web development can be actually _fun_"..

1:26 PM  
Anonymous Anonymous said...

Of course, I don't know anything about how the dot works internally, but in case you happen to be using MySQL, it has a full-text search function that could be useful.

2:15 PM  
Blogger Navindra Umanee said...

No Rails. I'm just going to end up fighting it to make it work the way I need it to. Just not interested in that.

Also, I've already read reports about people having upgrade headaches from Rails version to version and that's exactly the kind of situation I want to avoid. I already have that with Squishdot/Zope/Python and I'm not really interested in tracking another framework.

As for MySQL, no way. Not interested in dealing with DB corruption or having to figure out why the performance is so bad. There are search implementations for Ruby anyway.

Thanks for the comments. :)

2:03 AM  
Anonymous somekool said...

Rails is definitely worth looking at....

It's great.

As for upgrades, you gotta be careful for sure, but you don't ever really have to upgrade. The thing is, once your app is in production, you lock it to a specific version of Rails; then, even if you upgrade Rails on your server for future apps, your production apps won't break.

I've also already talked with David about caring a little more about upgrades.... it's a lot better now, and Rails is going to hit 1.0 soon.

Anyway, one of the biggest pieces of magic in Rails is ActiveRecord, and if you don't use a relational database, it wouldn't be that useful. The other one is MVC, but... anyway... whatever you are comfortable with is just as good, I'm sure.

But don't think Rails is a problem; it's the solution, just like Ruby ;) hehe.

As for MySQL, I haven't had any data corruption in the last 10 years, and I don't expect it ever to happen.

These days I'm testing SQLite, building my database model with Kexi and my web frontend in Rails....

have fun

3:58 AM  
Blogger Navindra Umanee said...

Yes, Rails provides a lot of capabilities I don't need (relational DB, MVC) and am not interested in. Also, I need the new site to be backwards compatible with the old site. Rails has its own idea of what a URL should look like. So it's simply not going to work out of the box here. It's great for other stuff.

I've had several MySQL corruptions over the years. We use this thing for heavy stuff here. The wiki uses it and is quite slow and process-intensive without caching.

Have fun back at ya!

10:48 AM  
Blogger Alexander said...

> Also, I need the new site to be backwards compatible with the old site. Rails has its own idea of what a URL should look like.

Actually, starting with 0.10, Rails no longer has any idea of what the URL should look like. It no longer relies on Apache's mod_rewrite, and instead provides its own miniframework called Routes. For example:

map.connect ':id',
  :controller => "thread", :action => "view"
map.connect ':id/:action',
  :controller => "thread"

This will map /1234 to the "thread" controller and the "view" action, and /1234/reply will be mapped to the "reply" action in the same controller. Routes even supports regular expressions.

While there's something to be said about learning the ropes by doing everything yourself, there's a lot of overlap between your project and what Rails aims to solve.

A nice feature of Rails is that it doesn't force every feature down your throat. You can easily ditch the database mapper (ActiveRecord) and use the file system to store your data. As for MVC, it's a design pattern, not a specific implementation; even in your homegrown solution you will want to apply a few strokes of MVC.

11:40 AM  
Blogger Navindra Umanee said...

Thanks for the info. It's interesting and informative. Rails is changing every day and getting better and better. I just don't want to track it though. It's a work in progress.

As someone said, I could pick a version and stick to that, but this is exactly the current situation with the Dot and it really isn't pretty after a few years. :-)

Ruby upgrades I can handle.

Also, why would I want Rails/Routes to connect this and that action? I don't need a framework for that. Nice MVC and all, but I simply don't need that level of abstraction when a few lines of code already take care of it.
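To make the "few lines of code" point concrete, here is a hypothetical plain-Ruby dispatcher handling the same two URL shapes as the Routes example earlier in the thread, with no framework. The action names are invented and this isn't dot.rb's code.

```ruby
# Hypothetical hand-rolled URL dispatch: match the path against regexps and
# return the action plus the article id. No framework involved.
def dispatch(path)
  case path
  when %r{\A/(\d+)\z}       then [:view, $1.to_i]
  when %r{\A/(\d+)/reply\z} then [:reply, $1.to_i]
  else                           [:not_found, nil]
  end
end
```

Adding a URL shape means adding one `when` line, and old URLs can be matched exactly as they are.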

I know MVC (Model View Controller) is a design pattern. So when you tell me Rails provides MVC, that's just a buzzword to me. I don't need Rails for MVC. Call me unfair...

That being said, I was already considering Rails for the non-visible, non-public part of the project. I will take a closer look and see if it can be made to fit the needs of the rest of the project, but I have my doubts it's worth the trouble.

Speed and performance are also a major concern. I'm not sure where Rails stands on that or what kind of new performance issues it introduces.

1:27 PM  
Anonymous somekool said...

To add to what Alexander said about Rails Routing....

Alexander, when you say "Rails no longer has any idea of what the URL should look like," that's not entirely true; "no longer" isn't quite right here. Rails never dictated what your URLs should look like. That was managed through .htaccess and mod_rewrite, which aren't compatible with every web server and aren't as nice as Ruby programming. That's why Routing was implemented: not to fill a gap, but to improve existing functionality.

just my 2 cents

...

Navindra, to answer your question about performance... well, you simply want to use FastCGI as your server setup of choice. Additionally, Rails offers really advanced and flexible caching; you gotta look into that. Even if you don't want to use Rails, you can use part of it, require a sub-library, or just use it as an example for your own system. You'll love Rails caching, and it's perfectly appropriate for a site like TheDot.

....

Now, I have two questions for you.
1- You don't use a relational DB; can I ask what you are using? You mentioned something about the filesystem... you made me curious...

2- Can we see the current state of TheDot (Ruby)?

thanks !

8:52 PM  
Anonymous Anonymous said...

Can't Blogger offer a damn RSS feed for this comment thread?

8:54 PM  
Blogger Navindra Umanee said...

Thanks for the additional comments. I get comments mailed to me; unfortunately, I don't know if others can subscribe here or if there is any RSS feed available.

What I meant was that I don't understand the performance implications of Rails itself. That is, what performance hit I would experience in addition, if I use Rails instead of directly coding and optimising my stuff in Ruby.

I am already using quite a lot of caching techniques including a dynamic cache table and most of my templates are partially pre-generated, as I might have mentioned. But I'll take a look at the libs in Rails for sure.

My stuff in Ruby is already working great although I haven't done any fancy benchmarks. I don't use CGI really, just Webrick.

Webrick provides the backend and all my articles are stored hierarchically on the filesystem. I had thought of using YAML for the article format, but that was more complex than necessary and I used a text format closer to the email header/body format instead.
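A rough sketch of reading that kind of file, assuming simple `Key: value` header lines, a blank line, then the body. The field names are made up, and real email-style continuation lines aren't handled.

```ruby
# Hypothetical parser for an email-like article file: "Key: value" header
# lines, a blank line, then the body. Field names here are illustrative.
def parse_article(text)
  header_text, body = text.split(/\r?\n\r?\n/, 2)
  headers = {}
  header_text.to_s.each_line do |line|
    key, value = line.chomp.split(/:\s*/, 2) # split at the first colon only
    headers[key.downcase] = value if value   # skip lines without a colon
  end
  [headers, body.to_s]
end

sample = "Title: Sample Story\nAuthor: someone\nPosted: 2005-02-24 08:08\n\nFirst paragraph.\nSecond line.\n"
headers, body = parse_article(sample)
```

Files in this shape stay readable and greppable with ordinary Unix tools, which is much of the appeal.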

It really is that simple and it works great so far because the dot articles are inherently organised and related to each other hierarchically. I wrote a bit about this in some of my previous blog entries, so you might find some further details there.

I can't tell you how much simpler it makes admin tasks to have all the data accessible from the filesystem with Unix tools and/or Ruby scripting.
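As a small example of that kind of scripting, here is a hypothetical report counting comments per article. It assumes a made-up layout of `<root>/<article-id>/comments/*.txt`; the post doesn't describe dot.rb's actual directory layout.

```ruby
require 'tmpdir'
require 'fileutils'

# Map each article directory name to the number of comment files it holds
# (layout <root>/<article-id>/comments/*.txt is hypothetical).
def comment_counts(root)
  Dir.glob(File.join(root, "*"))
     .select { |path| File.directory?(path) }
     .to_h { |dir| [File.basename(dir), Dir.glob(File.join(dir, "comments", "*.txt")).size] }
end

# Build a throwaway tree to try it on; mktmpdir returns the block's value.
counts = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "1234", "comments"))
  FileUtils.mkdir_p(File.join(root, "1235", "comments"))
  3.times { |i| File.write(File.join(root, "1234", "comments", "#{i}.txt"), "comment #{i}") }
  comment_counts(root)
end
p counts
```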

I did already make a demo of dot.rb available internally. I can't currently risk a public demo for security reasons... I need to do a proper audit and look over the code; the server I was running it on was an important production machine -- it wasn't the dot server, which is behind a firewall that blocks most of the ports.

Right now (and I've made some progress since I last posted this blog entry), everything to do with viewing and browsing is working except search. This includes Thread Threshold configuration (nesting/indexing) and so on.

Unfortunately I don't currently have time or plans to set up a demo or do more work, but it's all coming!

9:22 PM  
Anonymous somekool said...

Awesome to hear.....

I'm really excited...

For the platform, I like WEBrick. I also use CGI for development, depending on whether Apache is available in the setup, but WEBrick is faster.

For production, I think FastCGI is the only real choice. I can't remember what the problem with mod_ruby was, but FastCGI is perfect.

As for the performance hit with Rails:

> What I meant was that I don't understand the
> performance implications of Rails itself. That is,
> what performance hit I would experience in
> addition, if I use Rails instead of directly coding
> and optimising my stuff in Ruby.

Look at it this way: would you ever want to write a KDE application without the KDE libs? What kind of performance hit do you take from the extra stuff you don't really need? Well, you'll agree that the work they put into the common libs is shared for the benefit of all users. As for the optimizations you would put into your system that aren't already in Rails, well, Rails users would like to hear about them from you.

7:44 PM  
Blogger Navindra Umanee said...

somekool, are you saying that webrick is not suitable for production use? That would be a terrible inconvenience for me. Do you have any more details?

As for the rest, well I'm designing a website, it matters little to the user which technology I choose to use -- I'm free to choose my backend technology so that it actually suits me and my needs. It could be Lisp and they wouldn't even know. :-)

The dot's look or interface is mainly dictated by what URLs it exposes and what HTML it outputs. Also, how fast it works. :-)

Similarly, implementing my work as a generic framework so that all Rails users can benefit is much more work than I can afford right now. Maybe in the future.

Note that by not using Rails I also get to avoid fun like this; the alternative is to be stuck to an obsolete version and/or have to figure out how to fix Rails myself. I'm already familiar with the nightmares involved with that sort of thing. :-)

8:01 PM  
Anonymous somekool said...

http://wiki.rubyonrails.com/rails/show/ProductionEnvironments
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/133166

I think Lighttpd or Apache along with FastCGI is the way to go.

1:43 PM  
