bad ideas in usability

At my new company I unfortunately have to deal with Active Directory. I understand that AD is supposed to be the authoritative source for any information about users, groups, computers, and so on, but does the interface have to be so crammed with junk?

This has got to be the worst interface I’ve ever seen (Lotus Notes aside, but I’ve never had to administer Notes). It’s not clear where to find anything! Not only is the interface kludgy (multiple rows of tabs?) but the tab labels are totally non-intuitive. Why are there at least four tabs pertaining to e-mail (Microsoft Exchange)? What the heck is the Member Of tab for, and how does that differ from what I might find under Account?

I can’t imagine trying to administer hundreds of users with this kludgy tool. Thank God our company has fewer than 50 people.

home router replaced!

I finally decided to replace my home router, a Sun Ultra 10 running FreeBSD. There were a few reasons for this:

  1. I was running FreeBSD 5.x, which meant that the keyboard wouldn’t work; I could only control the system remotely over SSH or through a serial console. This was fixed in later versions of FreeBSD 5.x, but I didn’t want to bother upgrading, since the box isn’t the fastest machine.
  2. Using a desktop workstation for routing and running ppp consumes more power than it’s worth, and makes a fair amount of noise
  3. Using a 400 MHz UltraSPARC IIi-based workstation with 512 MB of ECC RAM for a simple firewall and router seemed like a bit of overkill 🙂
  4. I want to free up the Ultra 10 for testing out Solaris 10 and possibly upgrading my Solaris 9 SCSA designation.
  5. I want to (finally!) equip my home with wireless… yes, I’m a little late getting on the bandwagon.


The Design and Implementation of the NetBSD rc.d System

This is a moderately old paper, but I think it’s worth reading if you want to understand the rationale behind the NetBSD rc.d startup system. I think this is what is referred to on FreeBSD (which has adopted a similar mechanism) as rcNG.

The Design and Implementation of the NetBSD rc.d system

There are many things to like in this design, which is far better than the organic (to put it politely) way in which the system startup sequence of a given Linux box has evolved. Among its advantages (outlined in the paper, but I’ll detail them here in case you don’t want to read it):

  • Independence from lexicographical ordering of filenames (no S90foo running before S91foo), which always struck me as having a sort of BASIC-style limitation (i.e. back in the day having to number your code lines in multiples of ten in case you wanted to insert code in between)
  • Use of dynamic dependency ordering (via a special header and the rcorder script)
  • No reliance upon a special platform-specific "function" library, as is the case in many Linuxes
  • Centralized system configuration via /etc/rc.conf — no bloated /etc/sysconfig nonsense as on many Linuxes (but this is a topic for another day)
  • Avoidance of mandatory runlevels, which I can never remember on a given Linux or Solaris machine. ("What is runlevel 5 again?")
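
To make the dependency-ordering point concrete, here is a rough Python sketch of the idea behind rcorder. It is not the real implementation: it only reads the PROVIDE, REQUIRE and BEFORE header lines and topologically sorts the result, and the script names (firewall, netup, logger, webapp) are made up for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Header lines look like "# PROVIDE: foo" or "# REQUIRE: bar baz".
HEADER = re.compile(r"^#\s*(PROVIDE|REQUIRE|BEFORE):\s*(.*)$")


def parse_headers(text):
    """Collect the PROVIDE/REQUIRE/BEFORE conditions declared in one script."""
    found = {"PROVIDE": [], "REQUIRE": [], "BEFORE": []}
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            found[match.group(1)].extend(match.group(2).split())
    return found


def rc_order(scripts):
    """scripts maps filename -> script text; returns filenames in start order."""
    headers = {name: parse_headers(body) for name, body in scripts.items()}

    # Map each provided condition to the script that provides it.
    providers = {}
    for name, header in headers.items():
        for condition in header["PROVIDE"]:
            providers[condition] = name

    sorter = TopologicalSorter()
    for name, header in headers.items():
        sorter.add(name)
        for condition in header["REQUIRE"]:
            provider = providers.get(condition)
            if provider and provider != name:
                sorter.add(name, provider)   # provider must start first
        for condition in header["BEFORE"]:
            provider = providers.get(condition)
            if provider and provider != name:
                sorter.add(provider, name)   # this script must start first
    return list(sorter.static_order())


# Hypothetical scripts: note that filename sort order would start
# "logger" before "netup", which is exactly what the headers forbid.
EXAMPLE_SCRIPTS = {
    "webapp": "#!/bin/sh\n# PROVIDE: webapp\n# REQUIRE: logger netup\n",
    "logger": "#!/bin/sh\n# PROVIDE: logger\n# REQUIRE: netup\n",
    "netup": "#!/bin/sh\n# PROVIDE: netup\n",
    "firewall": "#!/bin/sh\n# PROVIDE: firewall\n# BEFORE: netup\n",
}

if __name__ == "__main__":
    print(rc_order(EXAMPLE_SCRIPTS))
```

The point of the sketch is that the order comes out of what each script declares, not out of how the files happen to be named, so inserting a new script never requires renumbering anything.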

I could go on, but I urge you to read the paper instead, where Luke demonstrates a solid design methodology and rationale and then executes on the same. This is more than can be said for Linux.

Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments

The title of this post is also the title of a fabulous paper published in the Journal of Personality and Social Psychology of the American Psychological Association (PDF). I mention this in the context of technology because the paper was first mentioned as a response to this post on The Daily WTF, a site exposing bad programming in a daily blog format.

First, with regard to the post: I can vouch for the fact that there is some really bad code out there, and much of that, I’m sure, comes from programmers with overinflated egos who don’t realize their own incompetence because

people who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it.

Still, part of the problem is a lack of proper management oversight, whether functional or technical. Indeed, in many cases bad programmers’ incompetence is rewarded: their products are seen as "business successes" because they allegedly meet the functional requirements, never mind that the applications consume far too many system resources, crash all the time, and impose a huge maintenance burden on the operations staff. I could provide many examples, but I’m sure I would get in trouble 🙂

I considered printing out the APA paper and anonymously stuffing it in people’s mailboxes: not only the mailboxes of those programmers who I feel are totally incompetent, but also those of their managers who still think they perform(ed) well. I decided against it, not because I think I would get in trouble (they’d have no way of detecting who the culprit was) but, again, because their own incompetence would prevent them from detecting that the paper is targeted at them.

As Kruger and Dunning point out, the only way to resolve this dilemma is to remove the incompetence — train the bad programmers to be better programmers, and to recognize their own shortcomings. That can’t happen if bad management is preventing even the open discussion of the poor code quality.


Today’s my last day at CBC.ca. I’m moving on to a pure systems administration position with a much smaller e-business company in Toronto called Devlin e-Business Architects. I decided that working on content delivery projects like the Torino Olympics website is really not where I want to be strategically with my career, and I don’t think I ever fit into the big company mindset very well. I’ll be writing more about that once I’m not formally under the employ of said big company 🙂

In the meantime, I wish you all very happy holidays and a happy new year!

Java Virtual Machine Tuning under JVM 1.4.2

Here’s an article I wrote about tuning Sun Java JRE 1.4.2 some time ago. I’m only posting it now to save it from loss when I leave CBC.ca.

This page is intended to document some proposals and empirical data gathered while attempting to tune the JVM used for running web applications on CBC.ca’s Java servers.

Topics to be covered:

  • Impact of using different garbage collectors
  • Impact of tuning garbage collectors
  • Maximum and minimum heap size settings
  • [potentially] Impact of using JVMs other than the Sun JVM, or of compiling Java code into native OS code using gcj, among other options.


Broadcom NetXtreme issues part 2

Here’s a follow-up to my previous post about the Broadcom BCM570x Gig-E adapters on HP-DL380 servers. HP pointed us to the following advisory:

Advisory: Primary Port of Integrated NC7782 Gigabit Server Adapters with NFS protocol with Certain Firmware Versions Stops Transmitting under Linux, Resulting in Lost Network Connectivity

However, the advisory indicates that the problem afflicts only the primary port of the Ethernet adapter. We’ve been seeing problems on the secondary port, as well as on an add-on card.

This has been raised with HP, so we’ll see what they say.

Broadcom NetXtreme Gigabit Ethernet adapter problems

Recently we’ve been seeing a lot of error messages while using the Broadcom BCM570x series of Gigabit Ethernet adapters under SUSE Linux Enterprise Server 9. The symptoms are that the interface will simply hang under high traffic and refuse to pass more packets, eventually giving the error:


Dec 1 01:17:46 dev03 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Dec 1 01:17:46 dev03 kernel: tg3: eth2: transmit timed out, resetting
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
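
If you want to know how often each interface is hitting this, a quick tally of the reset messages does the job. The following is only a sketch in Python: the regular expression is modelled on the log lines above, and the function name is my own invention.

```python
import re
from collections import Counter

# Matches the tg3 reset line shown above and captures the interface name.
TG3_RESET = re.compile(r"kernel: tg3: (eth\d+): transmit timed out, resetting")


def count_tg3_resets(lines):
    """Return a Counter mapping interface name -> number of tg3 resets seen."""
    counts = Counter()
    for line in lines:
        match = TG3_RESET.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Dec 1 01:17:46 dev03 kernel: NETDEV WATCHDOG: eth2: transmit timed out",
    "Dec 1 01:17:46 dev03 kernel: tg3: eth2: transmit timed out, resetting",
    "Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2",
]

if __name__ == "__main__":
    for interface, resets in count_tg3_resets(SAMPLE).items():
        print(interface, resets)
```

To run it against a real log, pass an open file object instead of SAMPLE; the path to the syslog file depends on how your system is configured.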

It’s become a very serious issue for us because we have Broadcom BCM570x controllers on board all of our HP-DL380 servers. The problem seems to occur more frequently now that we’ve upgraded to an SP2 (or later) SLES9 kernel, although we have had problems dating back several months with older kernels.

Doing some research on the Internet, I’ve found that this is a very common problem out in the field. In a summary document I prepared for management, I wrote the following:

Other customers in the field have reported the same problems running RedHat Enterprise Server 3, Debian GNU Linux, FreeBSD/NetBSD and even Novell Netware (internal communication with Novell PSE). In many of the reported incidents, customers were running identical server hardware (HP/Compaq Proliant DL-3x0 series) to CBC.ca. [HP IT Resource Centre thread #898761 where customers have reported issues with a variety of HP hardware and operating systems.]

There are a number of root causes to the problem, including Linux driver instability (the Tigon3 (tg3) driver was created by reverse-engineering the Broadcom bcm5700 driver due to the low quality of the latter) and manufacturing defects (defects in some Broadcom 5704 chips afflicted Sun’s initial customer shipment of Sun Fire V210 and V240 servers in 2003, leading to Sun Alert #55620. The impact of such defects beyond Sun is unclear because Broadcom refused to provide further details.)

Right now, we’re awaiting feedback from HP and Novell on how they plan to resolve this issue. In the meantime, we’re going to stockpile some Intel Gigabit Ethernet cards.

Toothpaste for Dinner

Toothpaste For Dinner is perhaps one of my favourite online cartoons… there’s not a lot of (read: no) complexity in the cartoonist’s art, so the value’s all in the raw humour. I just about died of laughter when I saw this t-shirt.

Even very basic jokes, for some reason, have hilarious humour value when paired with an odd character face.

Finally, nothing beats that element of humorous surprise in Basic Electronics Symbols.

a few changes…

I’ve decided to give up my consulting practice. It’s too hard to do consulting and hold down a full-time job, so for now the full-time job trumps consulting. OpenTrend will continue to exist, as my business partner Rob has bought me out, and I think he’s bringing additional people on board. So I would still recommend OpenTrend for your open-source consulting needs. My last day with OpenTrend is October 31st, 2005.

The latest project I’ve been working on at $DAYJOB is rolling out an implementation of SUSE Linux Enterprise Server 9, all managed with ZENworks Linux Management (ZLM). Originally, we were sold ZLM 6.6.1, which is partly cobbled together from some stuff that SUSE AG did, and also partly cobbled together from Ximian’s Red Carpet. Sadly, this shows; there are a few things that don’t work quite right, bundled with some serious show-stopper bugs.

We’re a Novell shop, though, so I’m currently evaluating ZENworks Linux Management 7, which is a complete rewrite. I’ll keep you all posted on how that pans out!