a quick reflection upon DemoCampToronto7

This evening I went to DemoCampToronto #7, a project of BarCamp Toronto. As BarCamp’s website says,

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. It is an intense event with discussions, demos, and interaction from attendees.

DemoCamp consists of a set of presentations on up-and-coming software projects, each lasting no more than 15 minutes (including questions). It’s basically the same as a Work-in-Progress (WiP) session at any USENIX conference.

I don’t have enough time to summarize all of the presentations, but I’m sure others will (and I’ll try to link to some of the better summaries here). I just wanted to step back for a moment and reflect: a room full of 150 passionate, articulate coders, in Toronto no less, makes me think that we’re having a renaissance in the software development and IT industry. These are not coders who are merely buzzword- and Web 2.0-compliant; I sense that these folks are making real, productive use of technologies like Ruby on Rails, AJAX, DHTML, Flash, and all the other gadgets that are revolutionizing the Internet by providing a true challenge to the classic thick-client application.

This renaissance is borne out by the increasing proliferation of jobs. Tucows just held a job fair, after which they hired a number of individuals fresh out of Computer Science at U of T (I know because two of them were sitting at my table). Exciting companies like Nurun and Critical Mass are hiring and expanding. I’ve personally been courted by one or two companies, unsolicited. Contrast this with the state of affairs five years ago, which is when I graduated from U of T. Jobs were scarce and I was lucky to land a position programming PHP for a firm that hadn’t blown its money in the dot-com crash.

It seems to be a great time to be in IT. The buzz is in the air again, and I have but one word of warning for the many IT firms that have barely stayed afloat for the last few years: you’d better do something to hang onto your technical staff (i.e. give them interesting, challenging work, and respect their talents), or you will lose them to companies that are willing to offer those things.

getting VLANs working between Cisco & HP gear

Ever since I started at Devlin, I’ve had one nagging problem with the network gear: the VLANs from the Cisco equipment (a triad of Catalyst 3550-24 switches) won’t propagate to the other gear we have (an HP ProCurve 2424M and a Linksys SRW2024). I read all I could about VLANs and tagging, but no matter what I did I couldn’t get the non-default VLANs to show up on anything but the Cisco gear. I figured I was missing some key information, particularly about when to tag and not tag VLAN traffic, that was preventing me from getting this working.

I finally did a search on Google about Cisco interoperability, and found this page which indirectly made everything clear. It turns out that the tagging on the HP (or any other switch being connected to the topology) needs to be done as follows:

  • set traffic on the trunk port to be tagged for every VLAN you want to propagate
  • allow access to the VLAN on the non-trunk ports but set them to be untagged
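For anyone else puzzling over when a frame gets a tag, the rule the HP needed can be modelled in a few lines. This is a minimal Python sketch of generic 802.1Q port behaviour, not HP or Cisco firmware; `PortConfig`, `ingress` and `egress` are hypothetical names and the port numbers are made up. It assumes the Catalyst trunk uses the default native VLAN 1 (which Cisco sends untagged on an 802.1Q trunk) and that VLAN 2 is the VLAN being propagated.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class PortConfig:
    """Hypothetical VLAN membership for one switch port."""
    name: str
    tagged: Set[int] = field(default_factory=set)  # VLANs carried with an 802.1Q tag
    untagged: Optional[int] = None                 # at most one VLAN carried without a tag


def ingress(port: PortConfig, tag: Optional[int]) -> Optional[int]:
    """Return the VLAN a frame arriving on `port` belongs to, or None if it is dropped."""
    if tag is None:
        return port.untagged  # untagged frames join the port's untagged VLAN
    return tag if tag in port.tagged else None  # tagged frames need tagged membership


def egress(port: PortConfig, vlan: int) -> Optional[Tuple[str, int]]:
    """Return how a frame on `vlan` leaves `port`, or None if the port is not a member."""
    if vlan in port.tagged:
        return ("tagged", vlan)
    if port.untagged == vlan:
        return ("untagged", vlan)
    return None


if __name__ == "__main__":
    uplink = PortConfig("24", tagged={2}, untagged=1)  # trunk to the Catalyst
    phone = PortConfig("5", untagged=2)                # access port on the voice VLAN
    print(ingress(uplink, 2))  # 2: tagged voice frame accepted on the trunk
    print(egress(phone, 2))    # ('untagged', 2): delivered to the access port without a tag
```

The model also shows one way this setup goes wrong: if the uplink is not a tagged member of VLAN 2, `ingress` drops the Catalyst’s tagged frames before they reach any access port.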

I was originally a little worried because the VLAN I’m interested in propagating is the voice VLAN (for our IP telephony setup) and I feared that the Catalyst would do something really weird with it (seeing as how you specify switchport voice vlan 2), but it seems to be just another VLAN. I assume the foregoing IOS directive is just for QoS or something on the Catalyst.

By the way, doesn’t the University of Wales’ IT department have an awesome name? I know it’s Welsh, but I should start calling my department Gwasanaethau Gwybodaeth too. That would certainly cut down on the help tickets: I could start saying “please e-mail help-gwasanaethau@devlin.ca to open a ticket” 🙂

NetworkManager starts getting some docs

Looks like someone has started putting together some informal documentation for NetworkManager.

On a completely unrelated note, upgrading my Fedora Core 5 ThinkPad T42 to kernel 2.6.17-1.2139 has broken wireless (again). Any attempt to use NetworkManager makes the message “ipw2200: Firmware error detected. Restarting.” appear in the dmesg output. However, if I run wpa_supplicant manually and then dhclient, it works.
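For reference, the manual workaround amounts to two commands run as root. Below is a minimal Python sketch that builds and runs them; the interface name `eth1`, the config path, and the `-D wext` driver choice are assumptions for a typical ipw2200 setup of this vintage, and `manual_wireless_commands`/`bring_up` are hypothetical helpers, not part of any Fedora tool.

```python
import subprocess
from typing import List

# Assumed defaults for illustration only: adjust to your own system.
IFACE = "eth1"
WPA_CONF = "/etc/wpa_supplicant/wpa_supplicant.conf"


def manual_wireless_commands(iface: str = IFACE, conf: str = WPA_CONF) -> List[List[str]]:
    """Build the two commands used instead of NetworkManager."""
    return [
        # -B: run in the background; -D wext: generic Linux wireless extensions driver
        ["wpa_supplicant", "-B", "-D", "wext", "-i", iface, "-c", conf],
        # Request an address via DHCP; dhclient keeps retrying if association is still in progress
        ["dhclient", iface],
    ]


def bring_up(iface: str = IFACE, conf: str = WPA_CONF) -> None:
    """Run the commands in order, raising CalledProcessError on the first failure (needs root)."""
    for cmd in manual_wireless_commands(iface, conf):
        subprocess.run(cmd, check=True)
```

Calling `bring_up()` as root, with NetworkManager stopped, should reproduce the manual steps; `check=True` makes it stop if `wpa_supplicant` fails to start.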

I’m really looking forward to the day when all this is fixed, although I suspect wireless is such a bleeding edge problem space that the day won’t be coming soon.

who’s AFRAID of real hardware RAID?

Recently we bought a low-end IBM xSeries 306m server to handle generic IT utility tasks, such as hosting an installation of Request Tracker, Cacti and, in the near future, Nagios. The server came with a pair of 160GB SATA disks attached to a ServeRAID-8e HostRAID controller. I quickly discovered that HostRAID is an awful hack; it’s not real hardware RAID, but software-emulated RAID, utilizing the host system’s SATA controller to do the actual I/O to the disks, but with the RAID processing done in software using a proprietary driver, in my case, a driver called adpahci. In other words, it’s "A Fake RAID", which some pundits have noted collapses into the fitting acronym AFRAID.

Several admins have criticized HostRAID for a number of reasons:

  • Performance is terrible because the AFRAID controller must do programmed I/O (PIO) through the CPU
  • The drivers are, by nature, proprietary, since the RAID logic is licensed from a third party
  • Limited sophistication in array rebuilds, since the controller has a minimal BIOS and online rebuilds are not possible
  • Disks in an AFRAID array are probably unusable outside of the array, given that the driver is chipset-specific

Although I don’t really care about performance for such a low-end utility box, I have been seriously bitten by the second point. We use Red Hat Enterprise Linux 4 Update 3 on all production servers like my utility box, but IBM only provides binary HostRAID drivers up to RHEL 4 Update 2. You can allegedly rebuild the drivers using a SHIM from Adaptec, but it doesn’t work: although the SHIM package contains C driver source for all the Adaptec HostRAID controllers (aar81xx, adp94xx, adpahci, adpsata, etc.), the only binary blob you can obtain is the one for the aar81xx. Ergo, I am S.O.L. I’m stuck with a RHEL 4 Update 3 userland on a RHEL 4 Update 2 kernel.

I guess the appropriate solution if you’re going to buy this model of server (with SATA) is to ditch the on-board AFRAID and buy a ServeRAID-7t SATA controller, which has a real 80302 processor and 64MB of cache memory, or any of the other ServeRAID products which fit in the server.

On a final note, what the heck is with IBM’s insane naming schemes for all of its ServeRAID products? I can’t keep the 6i+, 7t, 7k, 7e, 8e, 8i, 6M, etc. straight — can you? Have a look at this driver matrix and your eyes will glaze over. Why don’t they name the controllers something meaningful?

memories of Farallon PhoneNet

My 10-year high school reunion is happening over the August long weekend this year, and the event got me thinking about some of the technology we used during those years.

Every Ontario elementary school and high school student of a certain vintage will remember the ubiquitous Unisys ICON terminals, a topic I will leave for a later entry (we had a lot of fun with those ICONs, especially upon discovering that one could write a C program to fork() from any PID on the system, including /sbin/init, with extremely useful results). However, I started thinking about Farallon PhoneNet, a fabulous networking technology for Macintoshes back in the day, and I thought I should record for posterity what kind of equipment it took to produce the Mackenzie High Times.


new computer woes

So my trusty six-year-old desktop, jupiter, died after a power outage a couple of weeks ago. I suspect the motherboard got fried: trying to power on the system did nothing, although a monitor plugged into the back of the PSU could still power up.

I’d been thinking of getting a new computer for some time, because the Pentium III 800 MHz processor and 768 MB of PC133 RAM weren’t really cutting it for running VMware Workstation, so this failure pushed me into action. I decided to purchase a standard "template" system from Canada Computers with the following specifications:

  • ASUS P5LD2-VM motherboard
  • Pentium 4 3.2 GHz CPU
  • 512 MB of DDR400 PC3200 RAM
  • 250 GB Western Digital SATA hard disk
  • LG 16x dual-layer DVD writer

The P5LD2-VM has onboard sound, video (using an Intel 945G chipset) and Intel Gigabit Ethernet, so I decided to just use those.

To this system I added another 512 MB of RAM, a second 250GB SATA hard disk (for a software RAID-1 mirror), and an APC Back-UPS CS 500 uninterruptible power supply.

I picked up the system on Saturday, took it home and powered it up. Immediately I saw a problem: one of the SATA hard disks wasn’t being properly detected. After fiddling around with the connections on the motherboard, I was able to get both disks to show up, but only if I used SATA ports 1 and 3, rather than 1 and 2. Plugging any device into ports 2 and 4 caused it not to show up in the BIOS.

I resolved to take the system back to have Canada Computers’ technicians diagnose the issue (eventually they reset the BIOS and everything was fine) but in the meantime I could still install Fedora Core 5 on it. Or so I thought.

I started by installing the i386 version of Fedora, which succeeded, but then I realized that the Pentium 4 is an EM64T CPU, so I should install x86_64 Fedora. Trying to do so, however, caused the installer to lock up right before the first boot and left a corrupted system; for example, /etc/inittab would be missing. I observed other weird behaviour, like the fact that the primary software RAID partition, /dev/md2, would be in a rebuilding state immediately after the install, even though the installer had finished and told me to reboot the system.
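For anyone who wants to check what state an md array is in after an install, the kernel reports it in /proc/mdstat. Here is a minimal Python sketch that flags arrays with a resync or recovery in progress; `md_sync_status` is a hypothetical helper, and the sample text is illustrative, not captured from this machine.

```python
import re
from typing import Dict, Optional

# Illustrative /proc/mdstat contents (not from the machine described above).
SAMPLE = """Personalities : [raid1]
md2 : active raid1 sdb3[1] sda3[0]
      243826112 blocks [2/2] [UU]
      [=>...................]  resync =  5.3% (12937728/243826112) finish=60.3min speed=63772K/sec

md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""


def md_sync_status(mdstat: str) -> Dict[str, str]:
    """Map each md device to 'clean' or the background operation it reports."""
    status: Dict[str, str] = {}
    current: Optional[str] = None
    for line in mdstat.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # start of a new device stanza
            status[current] = "clean"
            continue
        if current is None:
            continue
        if not line.strip():
            current = None  # a blank line ends the device's stanza
            continue
        for state in ("resync", "recovery", "check", "reshape"):
            if re.search(rf"\b{state}\s*=", line):
                status[current] = state
    return status
```

On a live system, `md_sync_status(open("/proc/mdstat").read())` would give the same kind of mapping for the real arrays.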

I subsequently tried to install the Fedora Unity Re-Spin of x86_64 Fedora Core 5, with similar results; at least I was able to get through the installer and onto first boot, but when starting up X the system would lock up hard. SUSE Linux 10.1, which I tried just to see if it would behave differently, had the same issues.

I came to the conclusion that the on-board Intel 945G video chipset is no good, at least with the 64-bit drivers in X. So I ran out to Canada Computers again and bought the cheapest PCI Express video card I could find: an ASUS Extreme AX300SE (basically an ATI Radeon X300SE). Then I tried to reinstall Fedora Core 5, and it worked perfectly! So I would advise everyone to stay away from the Intel 945G chipset for on-board video.

By the way, the UPS was broken too; I opened a support case with APC and they are planning to courier me a replacement. I guess I just have bad luck with computer equipment.

why has CORBA failed?

There’s a great article in this month’s ACM Queue entitled The Rise and Fall of CORBA. Since it’s authored by Michi Henning, who worked on CORBA as part of the OMG’s architecture board, and subsequently became an ORB implementer, consultant, and author of a book on CORBA Programming with C++, I had to take notice. The article itself isn’t available online, so I’m sorry I can’t suggest that you read it — instead you’ll just have to put up with my opinions, peppered with some quotes from the article.


cbc.ca now XHTML compliant

I don’t work there anymore, but I still feel some affinity for the site, particularly since many of my friends are still on staff. So I must congratulate my long-time colleague David Raso (we’ve worked together both at cbc.ca and at VerticalScope) on making CBC.ca completely XHTML 1.0 compliant! For those of you who know the complexity of the CBC bureaucracy, this is quite an achievement. Kudos, Dave, on your tenacity. Sadly, I think the long road and uphill battle to XHTML will be lost on certain senior management types who “don’t understand what those programmers even do” (the source of this comment will remain anonymous).

Google for system logs

I’ve been playing around with Splunk recently, which I’d describe as “Google for your system logs.” It’s much more than just a simple search engine, but that’s the simplest way to describe what it does: it aggregates log data from multiple sources and allows you to search it, correlate data in time, and also post (anonymized) snippets from your log data on Splunk Base for others to see.

For our little shop, Splunk is probably overkill; I have about 30 servers (physical and virtual) to manage, and I have not found myself needing the functionality, per se. But it’s still a neat tool. I wish we’d had something like this at my previous job, in particular to index log4j entries from misbehaving Java applications. Trying to sift through data from six Java servers and six web servers in real time to find out why the site was tanking was nearly impossible, and it often led to live hacks on production to disable dumb ideas that were taking the site down.
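What “correlate data in time” means can be illustrated with a toy k-way merge: given several per-host logs, each already in time order, interleave them into one timeline and filter on a keyword. This is a minimal Python sketch, not how Splunk works internally; it assumes each line begins with an ISO-8601 timestamp followed by a space (real log4j layouts are configurable), and `parse`/`correlate` are hypothetical names.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

# (timestamp, source host, message)
Event = Tuple[datetime, str, str]


def parse(source: str, lines: Iterable[str]) -> Iterator[Event]:
    """Split each line into (timestamp, source host, message); assumes an ISO-8601 prefix."""
    for line in lines:
        stamp, _, message = line.partition(" ")
        yield datetime.fromisoformat(stamp), source, message


def correlate(sources: Dict[str, Iterable[str]], keyword: str) -> List[Event]:
    """Merge time-ordered per-host logs into one timeline, keeping lines containing keyword."""
    streams = [parse(name, lines) for name, lines in sources.items()]
    merged = heapq.merge(*streams, key=lambda event: event[0])  # k-way merge by timestamp
    return [event for event in merged if keyword in event[2]]
```

Using `heapq.merge` keeps memory use small while merging, since it only holds one pending line per source rather than sorting everything at once; passing open file objects as the per-host iterables keeps the parsing lazy too.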

Now that I’ve posted all those HREFs, I wonder if Google will take down the site when it next indexes my journal. 🙂

MTBF for Sun drives: 4 months or less

Boy, I’m glad I wrote down directions for replacing a drive in an LVM mirror, because c0t1d0 just died on me. That’s right, the drive that I didn’t replace last time.

Keep in mind that I purchased this Sun server less than 4 months ago. I wonder if the assembly line workers at Seagate were smoking pot 6 months ago when they put this batch together?