dark [filesystem] days at CBC.ca

It’s been a rough 48 hours for those of us at CBC.ca Operations. As you can see from the posted notice, a failure of the primary storage device hosting most of the site’s content has knocked out most of the website. Tod Maffin has posted a reasonably complete explanation on Inside The CBC, and I’m thankful for that. At the time of writing, the main volume containing the site content is still offline and fscking with no known ETA.

Originally I wasn’t going to say anything more about the outage, because Tod’s given an adequate update in his entry, and any speculation about the root cause of the outage (whether technical or managerial) and how soon we might restore service is just that: speculation. But some of the comments that have been posted on the above entry are just astounding and prompted me to write. While I am happy that many technical geeks are amongst our most enthusiastic audience members, I find the glib attitude many of them take toward operating the site very disturbing and upsetting.

searching for a 64-bit future

This month in ACM Queue there’s an interesting and lengthy article entitled The Long Road to 64 Bits, which addresses why, fifteen years after the 64-bit MIPS R4000 was announced, most systems are still not fully 64-bit clean. By “not clean” I mean that most systems do not run entirely in 64-bit mode: many run 32-bit operating systems on 64-bit processors, or, even when a 64-bit operating system is in use, run many 32-bit programs in compatibility mode.

The article is a fascinating account of how technological decisions that were made all the way back in the 1970s, both with respect to hardware and compilers, still impose limitations today. Although many of the hardware compatibility challenges have now gone away (for example, system designers now know enough to trap address bits that they are not using for addresses rather than letting "clever" programmers get away with using them for data), the assumptions that programmers made back in the days of 16- and 32-bit machines about the size of C data types continue to hinder the porting of programs from 32-bit to 64-bit. One can’t just run make and hope one’s pointers all work. As Mashey puts it, some programmers got sloppy and assumed things like

sizeof(int) == sizeof(long) == sizeof(ptr) == 32
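
To make that assumption concrete, here is a minimal sketch in Python (not something from Mashey's article) that uses the standard ctypes module to report how big these C types are on whatever machine runs it. The function names are my own, and note that sizeof counts bytes, so "32" in the line above corresponds to 4 here.

```python
import ctypes


def c_type_sizes():
    """Return the sizes, in bytes, of int, long, and a pointer in this build's C ABI."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


def sloppy_assumption_holds(sizes):
    """True only if int, long, and pointers are all 4 bytes (32 bits) wide."""
    return sizes["int"] == sizes["long"] == sizes["pointer"] == 4


if __name__ == "__main__":
    sizes = c_type_sizes()
    print(sizes)
    print("sloppy assumption holds:", sloppy_assumption_holds(sizes))
```

On a typical 64-bit Unix system (the LP64 model) int stays at 4 bytes while long and pointers grow to 8, so the check fails, and any code that stuffs a pointer into an int is in trouble.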

All this may sound really abstract to readers who don’t have a hardware design background (admittedly, mine is minimal, but I understand enough of the general concepts) so let’s talk about how this impacts us end-users. We run into problems like “there is no 64-bit web browser that can execute Java and Flash” because the Java and Flash plugins haven’t been ported to 64-bit clean versions. In some ways, this is an example of shocking neglect on the part of software vendors like Sun and Macromedia (pardon me, Adobe). Bug Number 4802695, entitled “Support Java Plug-in on 64-bit AMD Opteron”, has been open with Sun since January 14, 2003, and after three years there is still no resolution in sight. This should be embarrassing for Sun, which is a vendor of 64-bit Opteron and UltraSparc IIIi workstations.


supervised disconnect on PSTN lines

It’s funny how sometimes you have a technical problem that you think is only attributable to your own (possibly crappy telephony) setup, but later on you discover that the problem is quite common. For example, today on the TAUG mailing list, someone complained about their Zap (analog) channel never being hung up by Asterisk, particularly when the user at the other end hangs up right away. I have also had this problem on my home Asterisk setup, where Asterisk will busy-out the Zap channel for hours, preventing people from making incoming calls.

At first I attributed it to my crappy clone FXO card, but now I’m wondering if it’s a problem with Asterisk. Others seem to think so and feel that Asterisk doesn’t properly handle the supervised disconnect in situations when the channel hasn’t been Answer()ed. I haven’t done enough tests to nail down when in the dialplan it happens exactly.

Later in the day, someone claimed that one way to fix the root cause is to teach Asterisk how to handle the standard off-hook warning tone, but I don’t think it’ll be very simple to implement, since the warning tones are probably different for each country.

For now, I’m happy to ssh into the server and run soft hangup Zap/1 at the Asterisk console (as someone suggested), but I wish there were a solution that didn’t require such manual intervention, so if any telephony enthusiasts are reading this and have any better ideas, feel free to comment.
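
In the meantime, the manual step itself can at least be scripted. The following is only a sketch, assuming the asterisk binary is on the PATH of the server and that the script runs as a user allowed to talk to the running instance; the function names are hypothetical. It does not detect a stuck channel, it just issues the same soft hangup command non-interactively via asterisk -rx.

```python
import subprocess


def soft_hangup_command(channel="Zap/1"):
    """Build the argv that asks a running Asterisk to soft-hangup one channel."""
    return ["asterisk", "-rx", f"soft hangup {channel}"]


def soft_hangup(channel="Zap/1"):
    """Run the command and return the CompletedProcess (assumes asterisk is installed)."""
    return subprocess.run(
        soft_hangup_command(channel),
        capture_output=True,
        text=True,
        check=False,  # leave it to the caller to inspect returncode and output
    )


if __name__ == "__main__":
    result = soft_hangup("Zap/1")
    print(result.returncode, result.stdout.strip())
```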

static vs. dynamic content: a footnote

My colleague Blake recently wrote an article on the occasion of the decommissioning of NewsDelivery, a dynamic content display engine that until recently ran all the news stories on CBC.ca. I can’t speak for any of our alumni, but I think all of us at CBC.ca have learned one lesson:

Large websites should never, ever render content dynamically.

It’s amazing how many content management systems still do not grasp this principle. On a busy site, especially one that is liable to be Slashdotted or visited heavily (say, on 11 September 2001), you do not want to be executing Java/ASP/Smalltalk/FORTRAN/whatever code every time someone visits a story. In short, you do not want CPU usage to rise proportionally to the number of visitors you have.

What you do want is to make the content rendering "system" as simple as possible; in the ideal case, you can barely call it a system. For content rendering, CBC.ca now mostly uses bare Apache instances with server-side includes, meaning that aside from the core Apache engine, no other code needs to be executed every time you view a story.
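
To illustrate the idea (this is a rough sketch, not CBC.ca's actual publishing code), the work of turning a story into HTML can happen once, at publish time, leaving a flat file for Apache to serve. The include paths, the .shtml naming, and the publish_story function are all assumptions made up for the example; only the `<!--#include virtual="..." -->` directive is standard Apache SSI syntax.

```python
import html
from pathlib import Path

# Hypothetical shared fragments that Apache stitches in via server-side includes.
SSI_HEADER = '<!--#include virtual="/includes/header.html" -->'
SSI_FOOTER = '<!--#include virtual="/includes/footer.html" -->'


def publish_story(out_dir, slug, headline, body):
    """Render a story to a static .shtml file once, at publish time."""
    page = "\n".join([
        SSI_HEADER,
        f"<h1>{html.escape(headline)}</h1>",
        f"<p>{html.escape(body)}</p>",
        SSI_FOOTER,
        "",
    ])
    path = Path(out_dir) / f"{slug}.shtml"
    path.write_text(page, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(publish_story(".", "example-story", "Example headline", "Example body text."))
```

The cost of running this is paid per story published, not per visitor, which is the whole point.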

This seems like a very simple principle, but many other news sites are still not grasping it. I can almost guarantee that if there is another event on the scale of 9/11, sites such as The Globe and Mail and The Toronto Star that use a servlet-based dynamic execution system will fall over under heavy load far sooner than CBC.ca. But I don’t really blame those organizations for choosing, for example, Fatwire Content Server (as in the Star’s case), because a news organization’s primary need is to create content. Displaying it is a separate problem entirely, and the shame should be on the vendor for tying the two so closely together.

send CBC.ca news alerts to your cell phone with procmail and awk

CBC.ca has a News Alerts mailing list where you can subscribe to get breaking news delivered to your e-mail inbox. Unfortunately, the news alert e-mail has a lot of extra gunge before and after the body of the actual alert, in our case, a header saying “Breaking news from CBC” and then some trailing text about how to unsubscribe from our mailing lists, etc. I use this procmail recipe to strip out the extra stuff and just send the body of the message to my cell phone. You might find it useful too!
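
For readers who don't speak procmail or awk, the stripping step looks roughly like this in Python. This is a stand-in sketch, not the recipe itself: the header marker comes from the description above, while the assumption that the trailer starts at the first line mentioning "unsubscribe" is mine, as is the function name.

```python
def extract_alert_body(text,
                       header_marker="Breaking news from CBC",
                       trailer_marker="unsubscribe"):
    """Return only the alert text, dropping the header and the trailing boilerplate."""
    lines = text.splitlines()

    # Start just after the header line, if one is present.
    start = 0
    for i, line in enumerate(lines):
        if header_marker.lower() in line.lower():
            start = i + 1
            break

    # Stop at the first line of the trailer (assumed to mention "unsubscribe").
    end = len(lines)
    for i in range(start, len(lines)):
        if trailer_marker.lower() in lines[i].lower():
            end = i
            break

    # Collapse what is left into one line, which suits a short text message.
    return " ".join(line.strip() for line in lines[start:end] if line.strip())


if __name__ == "__main__":
    import sys
    print(extract_alert_body(sys.stdin.read()))
```

A filter like this reads the message body on standard input and prints the trimmed alert, so it could sit in a mail pipeline in the same spot a small awk script would.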


SuSE Linux round two

I decided to format my CBC-issued desktop and install SuSE Linux 10.1. You will recall that back in January I tried to install SuSE 10.0 on a Thinkpad T42, with very poor results. So why did I decide to try again? There are a number of reasons:

  1. CBC is a Novell shop internally; they use GroupWise for email and make extensive use of NDS, ZENworks Desktop Management, iPrint, and many other Novell technologies. SuSE, as you probably know, is a division of Novell.
  2. I despise GroupWise but I am hoping at some point in the future to be able to use Ximian Evolution’s GroupWise connector to talk to our GroupWise servers. This probably won’t happen until corporate IT upgrades to GroupWise 7, though (or at least until they turn on the web services interface in GroupWise).
  3. CBC.ca runs SuSE Linux Enterprise Server 9 on all of its production web and Java servers, so having a similar environment on my desktop for development purposes makes sense.

After a few weeks of using SuSE Linux 10.1, I’m generally happy with it. Part of that is due to the fact that I’m not using it on a laptop, so my original beef about WPA being broken is a non-issue in my use case. (That doesn’t mean that the problems have been solved, though; 10.1 still doesn’t support WPA properly.) I still have a couple of complaints:

  1. The successor to Red Carpet, Zen Updater, is horribly broken out of the box. In order to “fix” it I still needed to run YaST to upgrade to the latest versions of libzypp and all that jazz. Still, YaST and Zen Updater are both far, far slower than any other package management tool I’ve ever used.
  2. SaX2, SuSE’s X Server configuration tool, is still extremely buggy. This would be only slightly annoying if you could hack the xorg.conf yourself, but SaX puts all kinds of proprietary directives in there (such as Option "SaXDualHead") and prefaces the file with a warning to not hand-edit it. So what are you supposed to do when SaX fails you, e.g. it refuses to properly configure my Matrox G450 in dual-head mode?

There are some positive aspects to SuSE Linux though, namely its integration with NDS (I guess they call this "eDirectory" now). I was able to successfully install the Novell Client for Linux 1.2 and log onto eDirectory. There’s a nice fancy Qt GUI and GNOME tray icon for managing Novell connections and for the most part it works flawlessly, just like Novell Client for Windows. This is a huge improvement over the awful NovelClient (sic) that I used during my first term at CBC.

As I said, I’m generally happy with SuSE Linux. My one remaining complaint is this: why does Novell have so many confusing names for its Linux products, and why do they seem to change them every six months? We have SuSE Linux Enterprise Server (SLES), Novell Linux Desktop (NLD), OpenSuSE, SuSE Linux Enterprise Desktop (SLED), SuSE Linux Enterprise OpenExchange Server (SLEOS? SLOS???), Open Enterprise Server (which I gather isn’t even really Linux but some form of NetWare?), and so on. Even SuSE Linux 10.1 has two versions: a so-called “retail” version that comes with support, and a downloadable “community” edition (the one I’m using) with no support and missing a bunch of non-GPL packages like RealPlayer, Flash, Adobe Reader, etc., which you can install later. Worse still, Novell refers to SuSE Linux 10.1 as “created by the openSuSE project”, but the next version of SuSE Linux, 10.2, is going to be called openSuSE 10.2 …?!

All of this SLES, SLOS, and SLED nonsense makes my head spin — makes me want to give the Novell marketing monkeys a SLAP.

Sphinx talk at TAUG this month

As a follow-up to my last post, Simon Ditner is going to be giving a talk about speech-to-text integration in Asterisk using Sphinx2 and Sphinx4 at this month’s TAUG meeting. All the meeting details can be found here. You’ll also get to hear a demonstration of Simon’s own use of this engine, which is a hilarious Zork-over-IP implementation in Asterisk.

By the way, you may have noticed that some of my previous posts allude to me returning VoIP equipment to Devlin. That’s because I’ve returned to the Platform Administration team at CBC.ca. This time around, I’ll be working on a number of exciting capital projects (I know, it doesn’t sound exciting when you call them capital projects) that are mostly aimed at fixing the basics with CBC.ca’s infrastructure. I’m not sure I can disclose many details at the moment beyond that, but rest assured that I will be discussing the relevant technical challenges and their solutions at an appropriate time. I hope to be able to informally follow in the footsteps of my colleague Blake’s contributions to the Inside The CBC weblog in his Under The Hood column, but going into far greater technical detail than Blake is able to, for reasons of audience accessibility.

Sphinx: An Open Source Speech-to-Text Engine

I attended the Toronto Asterisk Users’ Group meeting tonight and one of the hot topics discussed over dinner was speech-to-text (i.e. speech recognition). Text-to-speech (TTS) in Asterisk is already well-handled by Festival and the corresponding Asterisk application, but I think you’ll agree that speech recognition is a far more interesting topic. (Unless you hate Emily, Bell Canada’s vocal equivalent of the stupid Microsoft paperclip.)

Carnegie Mellon University has long had a group working on a recognition engine called Sphinx, funded by a DARPA grant. I’m told that Sphinx-II, the original C version, is available as an application for Asterisk, but later versions of Sphinx have much higher accuracy. Sphinx-3 is also written in C, and Sphinx-4 is written entirely in Java. Sphinx is different from many other speech recognition systems in that it does not require training, which makes it ideal for use in telephony applications. Instead, you supply it with a dictionary of known waveforms (the bigger the dictionary, the more RAM is used). Mike Ashton of QualityTrack claims over 96% accuracy with Sphinx, which he uses to strip sensitive information out of recorded phone calls in a call centre monitoring application.

This is really fascinating technology, and the best part about it is that despite having been developed under a DARPA grant, it’s open source! Apparently this was one of the stipulations of the CMU researchers when they first agreed to accept the grant, and the community is the better for it. According to the site, it’s rather difficult to install and set up, particularly for those of us with no knowledge of speech patterns and the like, but perhaps one day I’ll be able to have a system that I can dial and say “Please reboot programGuide” and Asterisk will be able to do the right thing.

Super Mario meets the SPA-941

The default ring tones that come with the Sipura SPA 941 IP phone are a bit lame, and actually the "Classic" tone is very jarring. I’ve been playing with my Super Nintendo (yes, that 16-bit dinosaur) again recently, and I must say that the sound effects and music in Super Mario Brothers 2 are very catchy. I decided to see if I could make a ringtone out of some key tunes.
