Sunday, October 31, 2004

KDE Dot News: FootNotes Goes Drupal

As some of you might know, KDE Dot News initially started off as a clone of the now-defunct GNOME Gnotices site.

In principle, Gnotices was a great idea that worked spectacularly well for the GNOME project -- at least at the beginning. Whereas KDE news had been a painful process with typically only one person taking the initiative to manually update a static webpage, Gnotices demonstrated that project news left in the hands of the "unwashed masses" could handily beat a one-man show, no matter how good the intentions of the latter. The key was evidently adequate infrastructure.

In the end, however, Gnotices suffered from severe editorial neglect, and when the GNOME community turned hostile and grew rebellious, Gnotices took a turn for the worse. The site was harming the GNOME project and the webmasters decided to put their foot down. Severe restrictions were implemented on the site but it never recovered. The restrictions just made matters worse and made GNOME look even more neglected. Gnotices dropped off the net altogether and was eventually replaced by the independently managed FootNotes site, with a less than stellar transition.

In fact, it seems that the GNOME website has gone back to the dark ages. It hardly seems maintained (no news of GNOME 2.8.1?) compared to the old days when, although it looked amateurish, it was vibrant and alive because the community made it what it was. I reloaded it every day. Mind you, I'm not saying the designed-by-committee KDE frontpage doesn't suck in its own right (oh! I can theme it. sometimes. oh CSS! such joy!), but at least the content is there, always fresh, and people like Steve still care for it.

KDE Dot News was based on the same technology as Gnotices and consequently followed some of the same principles, but, in contrast, the dot has always had a stronger editorial and admin presence. The same crowd that attacked Gnotices did sometimes come over to KDE Dot News but attacking us tended to be more of an uphill battle, with diminishing returns, such that matters have tended to stay under control. Furthermore, our growth has been nothing like that of Slashdot, so personal attention remains possible without having to resort to ugly discriminatory hacks such as a mob moderation system.

I was happy to choose Squishdot/Zope at the time because the interface was so simple, so easy, yet so effective. Although it had, and still has, limitations and bugs, it really was "good enough" and for the near future still is. I believe the dot design is in principle very clean and very effective, to the point that some of the limitations grow on you and even start masquerading as features (e.g. lack of formal user accounts). I'd like it to stay this way as much as possible and it will.

Yes, to give credit where credit is due, we all blatantly stole ideas and designs from Slashdot.

Unfortunately, the dot has problems -- problems that grow worse day by day. As simple as Squishdot is, it perhaps owes a lot of that simplicity to Zope. Not surprisingly, however, Zope is no simple beast. Zope is powerful, fantastic for rapidly developing a site and exploring possibilities, but maintaining it or trying to understand it for the purposes of optimisation isn't a trivial matter if you don't have proper resources.

To pick an example, the database grows larger every day and gets slower and slower until it is "packed" -- and then the process just starts over. This is partly due to the implementation of Squishdot but trying to solve the problem always involves having to deal with the complexity of Zope or, worse, attempting to bypass it by using external resources. I may be naive, even proudly so, but something that aspires to the simplicity of the dot shouldn't have to depend on something of the complexity of Zope.

Havoc always used to say you could trivially solve the Zope problem by throwing more RAM/CPU/disk at it, but in truth we depend on the kindness of friends (such as LeVillage.org) and we owe it to our friends not to abuse that kindness.

(Incidentally, I think Havoc's rant on Python is right on the money. I hate it that stuff like the obvious "help(File)" doesn't work in Python and searching for its documentation is ridiculously hard for a newbie because you have to know "File" is indexed under "built-in types". At least, that was the case for the old version of Python I'm using for the dot.)

So I'm looking for solutions. I have big, vaporous ideas that will not be realised within the next four months, not until I'm out of school (and, hopefully, safely at Google), but that I'm determined to see through.

I was therefore quite fascinated to learn that Stro had problems of his own with FootNotes. The main problem seemed to be the beast that is MySQL, which was apparently faltering in a similar way to the Zope DB. Essentially, PHP-Nuke was abusing MySQL almost as badly as Squishdot is abusing Zope. The solution for Stro was to move to Drupal.

I have to admit, I never really liked FootNotes. The site was complex, ugly, buggy and slow (though easily faster than the dot), and the content was often, in my opinion, just as poor as the old Gnotices site, often consisting of minor app announcements and such. Not very elegant at all, and the GNOME project doesn't really give it much visibility or credibility.

The transition does pique my interest though. What was particularly nice was that Stro converted all the old PHP Nuke articles and comments over to the new structure, a thoughtful touch and a sign that he cares and is competent. This time, GNOME content and history didn't drop off the face of the planet as it did with the Gnotices to FootNotes transition.

Drupal looks vastly simpler than PHP Nuke on the surface and FootNotes seems to benefit automatically. The URL structure looks fairly clean and it seems straightforward to maintain link compatibility with PHP Nuke, although Stro hasn't done this yet.

On the other hand, Drupal still has its bugs and weaknesses. KDE Developer Journals, based on Drupal, never remembers logins and the new FootNotes is no different. This is incredibly aggravating but apparently nobody cares and the bug has survived in Drupal/PHP for many months. Playing around further, I found that the thread display controls seem badly broken and ineffective. And I hate that stupid PHPSESSID variable that appears in the URL from time to time.

There are some features on the dot that don't seem to be present in Drupal, so it might require modification. And if I'm going to move the dot to a new CMS, near-total backward compatibility will be a primary requirement and the dot's existing hierarchical structure must be maintained. This might be possible with Drupal but might also be against Drupal's "one SQL query" philosophy. Even if it could be convinced to do otherwise, we'll probably end up with the same MySQL performance problems as PHP Nuke. Speculation.

I have done some digging on MySQL, and drawing from my own admin experience of it (both KDE Wiki and KDE Forums use it), I'm not at all keen on transitioning from Zope to MySQL. And from what I gather, FootNotes/Drupal/MySQL is still quite resource intensive.

It'll definitely be interesting to watch the evolution of FootNotes and learn from Stro's experience.

So, yeah. So there.

Sunday, October 17, 2004

KDE Dot News: Ext3's Miserable Failure

As I mentioned previously, I have been musing a lot about KDE Dot News lately.

After some interesting discussion about the merits of ext3 vs ReiserFS vs MySQL vs Zope, I thought I'd put it to the test. I have zero, absolutely zero, free time to waste on such things, but in the end the lure was just too much.

Zope

I'd been somewhat negligent. I hadn't packed the Zope DB for 4 months. The nasty, horrible consequence of that was that Data.fs was a whopping 2 gigabytes in size. So I took an hour or so to pack the DB, resulting in a more reasonable size of 260M.

Then I spent a couple of hours setting up a parallel Zope server for experimentation, as well as implementing the necessary code for dumping the Zope DB to the filesystem in the aforementioned hierarchical structure.

ext3

Since I like testing code as I'm implementing, I only dumped the directory structure in the first step. Evidently this resulted in a hierarchical directory structure -- nothing but 55000 directories. The results were horrific.

Ext3 takes up a whopping 220M holding nothing but directories. Nothing but directories. Yet it takes up almost as much space as the entire Zope DB, which actually has more stuff in it than just the dot.kde.org data.
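A plausible explanation for that figure is that ext3 allocates at least one full block to every directory, however empty. The sketch below is only a back-of-the-envelope check: the 4096-byte block size is an assumption (the block size of the dot's filesystem isn't stated here), and the helper names are made up for illustration. The second function measures what a tree really occupies, du-style, using st_blocks, which is Unix-only and counted in 512-byte units on Linux.

```python
import os

DIRECTORY_COUNT = 55_000      # number of directories in the dump
ASSUMED_BLOCK_SIZE = 4096     # assumption: a common ext3 block size, not measured


def min_directory_footprint(count, block_size=ASSUMED_BLOCK_SIZE):
    """Lower bound on space used if every directory costs one whole block."""
    return count * block_size


def allocated_bytes(root):
    """Sum the space actually allocated under root, roughly like `du`.

    Relies on st_blocks (Unix-only), counted in 512-byte units on Linux.
    """
    total = os.lstat(root).st_blocks * 512
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_blocks * 512
    return total


estimate = min_directory_footprint(DIRECTORY_COUNT)
print(f"estimated minimum: {estimate} bytes ({estimate / 2**20:.0f} MiB)")
```

Under that assumption the estimate comes out at 225,280,000 bytes, which is in the same ballpark as the 220M observed. Running allocated_bytes() on the real dump would be the way to confirm it.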

Ostensibly unfazed, I implemented the remaining code necessary to dump the full monty, including article headers, bodies, meta-information and any file attachments.

The resulting dump takes up 930M. Even worse, global file operations (e.g. find or du) from the root are extremely slow. Slow as hell. I think ext3 is going to be absolutely hopeless here, although the running time of file operations is not that bad when kept local or lower down in the structure.

tar.gz

Still curious though, I made a tar.gz of the directory structure. Perhaps not surprisingly, that takes up only 700K compared to the ridiculous 220M for ext3. I tried the same thing with the full dump and that takes up 60M compared to ext3's gig. Help me, Jebus.

ReiserFS

Simon Edwards had previously suggested that ReiserFS might be up to the task. KDE Dot News does not have reiserfs. We're running Ark Linux and Bero, one of the dot admins, strongly favours ext3 (to put it politely).

KTown to the rescue! That worthy machine has tons of space and tons of resources, so it made a suitable victim. Don't nobody tell the sysadmins.

I was astounded to find that the entire directory structure took less than 200K on reiserfs. That's stupidly less than even what the tar.gz takes up. Hans, you da man, man.

The bad news is that the full monty still takes up a huge 700M on reiserfs. However, it is fast. Very fast. Very very fast compared to ext3.

tar

As suggested by a reader, I also took a look at the results for the uncompressed tar files. Directories take 30M and the full dump takes 270M. The latter result is very interesting, because it suggests that there is a lot of room for improvement in ReiserFS in terms of space usage.
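The 30M for directories alone is roughly what the tar format would predict: every archive member, even an empty directory, gets a 512-byte header block. Here is a small sketch using Python's standard tarfile module that builds an in-memory archive of empty directory entries and checks its size. The entry names are invented, and the real dump has nested paths, so treat it as an illustration of the per-entry overhead rather than a reproduction of the actual archive.

```python
import io
import tarfile

TAR_HEADER_BYTES = 512    # one header block per archive member
DIRECTORY_COUNT = 55_000  # number of directories in the dump


def tar_size_for_empty_dirs(count):
    """Build an in-memory tar of `count` empty directories; return its size."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i in range(count):
            info = tarfile.TarInfo(name=f"d{i:06d}")  # hypothetical names
            info.type = tarfile.DIRTYPE
            info.mode = 0o755
            tar.addfile(info)
    return len(buf.getvalue())


sample = tar_size_for_empty_dirs(1000)
print(f"1000 empty directories -> {sample} bytes of tar")

estimate = DIRECTORY_COUNT * TAR_HEADER_BYTES
print(f"55000 headers alone -> {estimate} bytes")
```

55000 headers come to 28,160,000 bytes, which is close to the 30M measured. Very long paths need extra header blocks, which could account for some of the difference.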

Summary of Results

KDE Dot News   Zope   tar    tar.gz   ext3   ReiserFS
Directories    -      30M    700K     200M   200K
Full Monty     260M   270M   60M      930M   700M


Speed-wise, I'm not giving any numbers since I tested reiserfs and ext3 on two different systems; although, I did try to make sure both were mounted noatime and so on.

Nonetheless, I can say that it seemed like ext3 took as many minutes as reiserfs took seconds to perform global operations. That is well over an order of magnitude of difference across the systems.

Conclusion?

So I'm not sure what to do. 700M in reiserfs represents 4 entire years of articles and comments on the dot. The information is uncompressed and in plain text (excepting binary file attachments). Given that, perhaps the space usage is reasonable. I had also envisioned using some form of revision control for articles, so that would just take up more space.

Space-wise, Zope is winning here. At least for a few more weeks, until it grows horribly out of control again.

On the other hand, ReiserFS could easily have the speed advantage.

Friday, October 15, 2004

KDE Dot News: MySQL versus ext3

I've been musing a lot about KDE Dot News lately.

The site is currently host to over 55000 articles and comments all stored in Zope's internal DB format. Yet, the structure is completely hierarchical. Every article/comment has a single parent but may have one or more children.

This structure could easily be represented on a filesystem such as ext3 using directories and plain text files holding the information.

My question, for anyone with the experience and know-how, is: How efficient would it be to store all the dot's articles and comments on the filesystem? How would this compare to storing the information in, say, a MySQL database?

Remember that when you access an article/comment, you usually want to access all the descendants as well.
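To make the question concrete, here is a minimal sketch of the layout in mind: one directory per article or comment, children as subdirectories, and the text in a plain file. The names write_post, read_thread and body.txt are made up for illustration and are not anything the dot actually uses. Fetching an article together with all its descendants then becomes a single os.walk over its directory.

```python
import os
import tempfile


def write_post(parent_dir, post_id, body):
    """Store one article/comment as a directory holding a plain text file."""
    post_dir = os.path.join(parent_dir, post_id)
    os.makedirs(post_dir, exist_ok=True)
    with open(os.path.join(post_dir, "body.txt"), "w", encoding="utf-8") as f:
        f.write(body)
    return post_dir


def read_thread(post_dir):
    """Return (relative path, body) for a post and all of its descendants."""
    thread = []
    for dirpath, dirnames, filenames in os.walk(post_dir):
        dirnames.sort()  # make the traversal order deterministic
        if "body.txt" in filenames:
            with open(os.path.join(dirpath, "body.txt"), encoding="utf-8") as f:
                thread.append((os.path.relpath(dirpath, post_dir), f.read()))
    return thread


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    article = write_post(root, "article-1", "Article text")
    first = write_post(article, "comment-1", "First comment")
    write_post(first, "reply-1", "Reply to first comment")
    write_post(article, "comment-2", "Second comment")
    thread = read_thread(article)

for path, body in thread:
    print(path, "->", body)
```

The walk visits a parent before its children, so the result comes back in threaded order. What this would cost with 55000 posts on a real filesystem is exactly the open question.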

Monday, October 11, 2004

I Killed The Dot

While investigating a resource leak in AMP (Apache/MySQL/PHP) on the dot machine, I had the brilliant idea of strace'ing an out-of-control Apache process.

Result: The Dot Is Dead.

We're waiting for a reboot, sorry guys.

Saturday, October 09, 2004

Google's Billboard Challenge

I have posted my take on the now so-very-old Google billboard challenge.

It was more of an excuse to brush up on certain skills, but this last week I've just been kicking myself and agonising over the fact that I decided to use the classical factorial formula to calculate e. That formula is evidently not suitable for what I was trying to do: it seems inherently bounded, which makes it a poor fit for a stream approach.

There are rumours of the existence of a digit-extraction algorithm for e, but I have yet to see it or, if I have seen it, to gr0k it. Is it too much to ask for a simple damn formula or explanation for calculating the value of any digit of e without having to read through pages and pages of theory?
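For reference, the factorial formula is e = 1/0! + 1/1! + 1/2! + ... Below is a minimal sketch of it in fixed-point integer arithmetic. It is not a digit-extraction algorithm and not necessarily how the posted solution works. It also shows the bounded problem directly: the number of digits has to be chosen up front, so it cannot simply keep streaming. The function name and the ten guard digits are arbitrary choices for illustration.

```python
def e_digits(n, guard=10):
    """Return e truncated to n decimal places, e.g. '2.718...'.

    Sums the series 1/0! + 1/1! + 1/2! + ... using integers scaled by
    10**(n + guard). The guard digits absorb the rounding lost to integer
    division; they are a heuristic safety margin, not a proof.
    """
    scale = 10 ** (n + guard)
    total = 0
    term = scale          # 1/0!, scaled
    k = 0
    while term > 0:
        total += term
        k += 1
        term //= k        # next term: 1/k!, scaled
    digits = str(total // 10 ** guard)
    return digits[0] + "." + digits[1:]


print(e_digits(10))
print(e_digits(20))
```

To search for something further along in the digits, you have to guess a big enough n and start over if the guess turns out too small.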

Evidently my math skills need much work.

Dijkstra's Computing Science

I'm surprised to see Richard and Chris speaking of "Computer Science" in the UK. I suppose nobody calls it Computing Science anymore.

Although I'm doing CS in Canada, when I was an undergrad, they refrained from teaching us specific stuff like C, C++, Unix or DOS -- except for the bare minimum necessary to bootstrap ourselves. They taught us programming and how to think. We learned syntax and the joys of semi-colons on our own during the weekend. They made a point of telling us that.

A university course on HTML would have been especially inconceivable!

Things are certainly different nowadays and, for better or worse, the university has moved forward with the market demand.