operating systems that hold your hand too much…

I’m all in favour of making an operating system like Linux easy to use. Linux’s popularity means that for many users it is the only exposure to a UNIX-like operating system they are likely to get, which is why it’s important to give them the best possible first impression of UNIX so that they’re not turned off by it. This includes being standards-compliant and introducing as few distribution-specific hacks as possible.

I bring this up in the context of shell aliases. Today I was alarmed to see the following set by default for all users on a SUSE Linux Enterprise Server 9 system:


alias +='pushd .'
alias -='popd'
alias ..='cd ..'
alias ...='cd ../..'
alias beep='echo -en "\007"'
alias dir='ls -l'
alias l='ls -alF'
alias la='ls -la'
alias ll='ls -l'
alias ls='/bin/ls $LS_OPTIONS'
alias ls-l='ls -l'
alias md='mkdir -p'
alias o='less'
alias rd='rmdir'
alias rehash='hash -r'
alias unmount='echo "Error: Try the command: umount" 1>&2; false'
alias which='type -p'
alias you='yast2 online_update'

I get very alarmed when I see default behaviour set like this. There are a number of issues here:

  1. It misleads new users by making them believe the behaviour of “ls” and other commands is different from the actual default behaviour.
  2. It introduces commands to the user (e.g. “rehash”) that don’t really exist in the shell, leading to confusion when the user moves to another UNIX machine without these aliases.
  3. It misleads users into believing that some DOS commands also exist in the Bash shell (e.g. “rd” or “md”). Rather than encouraging them to learn the correct commands, these aliases provide a crutch that users are unlikely to discard. They may then pass this incorrect information along when describing procedures to other users, which would be particularly disastrous in an interview-type situation (e.g. “Q: What is the correct command to make a directory under UNIX?”).

All of these aliases are unnecessary and imply that the personal shell alias preferences of SUSE developers are being imposed upon all users.
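If you’re stuck with defaults like these, Bash itself provides ways to see through and bypass them; these are standard builtins, nothing distribution-specific:

type -a ls       # reveals the alias and the real /bin/ls behind it
command ls       # runs the real ls, ignoring any alias
unalias -a       # removes all aliases from the current shell session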

I would like this to serve as a call to all distribution vendors, SUSE particularly, to not ship Linux with unnecessary customizations that only serve to confuse users and introduce disparity between Linux distributions where none originally existed.

publishing free/busy information in Evolution

Ximian Evolution can publish Free/Busy information by using WebDAV, but this doesn’t seem to be documented anywhere I could find. Here’s what I did to set it up:

  • Set up a WebDAV-compliant webserver. I installed mod_dav for Apache 1.3.x.
  • Configure DAV properly, and make sure that the directory you are enabling DAV for is writable by the webserver user (a sample configuration follows this list).
  • Configure Evolution. Select Tools > Settings, then Free/Busy Publishing. Click Add URL and in Publishing Location type in http://your-server-name/your-location/. Don’t forget to supply the username and password you set up for DAV.
  • You’ll get no diagnostics from Evolution when the publishing occurs, so you’ll have to check the webserver logs to see if it succeeded or failed.
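For reference, here is roughly what the webserver side of this looks like. Treat it as a sketch only: the module path, directory, and password file locations are assumptions that vary by installation.

LoadModule dav_module libexec/libdav.so
DAVLockDB /var/lock/apache/DAVLock
Alias /your-location/ /var/www/freebusy/
<Location /your-location/>
DAV On
AuthType Basic
AuthName "Free/Busy publishing"
AuthUserFile /etc/httpd/dav.passwd
<LimitExcept GET HEAD OPTIONS>
require valid-user
</LimitExcept>
</Location>

You can also verify the DAV setup independently of Evolution with a manual PUT (e.g. curl -u username:password -T test.ifb http://your-server-name/your-location/test.ifb) and watch the access log for a 201 response.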

BSDCan 2005 Remarks

I’ve just returned from the 2nd annual BSDCan conference in Ottawa, Ontario, organized by the very capable Dan Langille. In addition to being a super nice guy, Dan is the founder of the FreeBSD Diary, FreshPorts, and FreshSource, and is involved with the BSD Certification project. Let nobody say that Dan hasn’t given enough to the BSD community!

In the spirit and style of the USENIX conference summaries published in ;login: magazine, I’m going to summarize (and pollute with my own personal remarks) the sessions that I attended.

a crazy little chart about the Bay Area

Recently on sage-members folks have been having a discussion about just how expensive it is to work in the Bay Area. I think Dustin Puryear started the thread as he’s thinking of moving from Louisiana. Anyway, someone posted the following link:

Bill Manning’s Blog >> A crazy little chart about the Bay Area

The numbers in there seem astonishing. When I told my girlfriend about it (she’s lived in the Bay Area, working for Sun), she wondered how it compares with Toronto. So, after some research at the Toronto Urban Affairs Library, I give you numbers for Toronto. (Here’s a PDF illustrating this in a similar manner to Bill’s original post.)

First, some metadata about wages:

Gross Minimum Wage   $7.45 per hour
Federal Income Tax   $1.19 per hour
Ontario Income Tax   $0.45 per hour
EI Deduction         $0.15 per hour
CPP Deduction        $0.13 per hour
Net Minimum Wage     $5.54 per hour

Now for some rental rates from CMHC’s research report:

Private Apartment Average Rental Rates for 2004

Zone                 Avg. Rent/Month   Avg. Rent/Week   Hours Needed @ Minimum Wage
Toronto (Old City)   $950.00           $237.50          42.91
Etobicoke (South)    $841.00           $210.25          37.98
York                 $811.00           $202.75          36.63
East York            $844.00           $211.00          38.12
Scarborough          $831.00           $207.75          37.53
North York           $865.00           $216.25          39.07
York Region          $851.00           $212.75          38.43
Mississauga City     $890.00           $222.50          40.20
Brampton City        $887.00           $221.75          40.06
Oakville             $918.00           $229.50          41.46

So, assuming you were to spend your after-tax income on nothing but accommodations, you could work a regular week in Toronto and be able to rent an apartment. This is about 1/4 the effort it would require in the Bay Area. Pretty shocking!
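(If you want to check the last column: it’s just the weekly rent divided by the net minimum wage, with penny-level differences from rounding the hourly deductions. For the Old City of Toronto:)

echo 'scale=2; (950.00 / 4) / 5.54' | bc    # $237.50/week at $5.54/hour = roughly 42.87 hours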

tape hardware, part two

While on the topic of tape hardware and backups… never mind my little DLT7000 drive at home. How do you back up a 4TB Titan NAS?

We bought one of these servers at work last year; we’re finally getting around to using it for something. Our current challenge is trying to figure out how to back up a 1TB Interwoven content store (we’ve just bought almost the entire product line from Interwoven) without IT screaming at us for taking up their entire tape rotation schedule. This is on top of having to back up a large MediaBin store as well.

I’ll be happy when the Titan is actually up and running, though. We’ve been having some problems getting the CIFS partitions running, because the Titan really needs an Active Directory server in order to enforce permissions, and all we have is a Windows NT 4 domain controller (think again about hacking it; it’s on an internal network). The problem is that we never originally intended the Titan to be used for Windows shares; the unit was purchased long before we decided to go with Interwoven on Windows entirely.

Interesting technical challenges abound…

upgrading tape hardware to DLT

After my girlfriend’s Powerbook crashed, taking with it several months of her un-backed-up data, I decided enough was enough with my own antiquated backup hardware (ExaByte 8505 8mm tape drive, 5GB/10GB) and I bought a DLT7000 (35GB/70GB) drive off eBay, thus increasing my backup capability by seven times. With eight tapes, the whole adventure cost me approximately CAD$150.

I started thinking about the purpose of hardware compression on tape drives. In principle, it seems like a good idea: offload compression, a CPU-intensive activity, onto the drive. The only problem is that it makes it virtually impossible to estimate whether a backup will fit onto a tape. I want to know, before I even start writing to the tape, whether the backup is going to fit. I don’t want to start writing to tape and then, two hours later, find I just hit End Of Media. It’s not something you can recover from.

I don’t see a technical solution to this problem, so what I do is turn off hardware compression and just gzip the data to a holding disk. This is one of the great features of AMANDA: you can stage the entire backup to a temporary disk, and then write the backup to tape from that disk.
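Concretely, the manual version of this looks something like the following. This is a sketch for Linux’s mt-st tools, with the device name and holding path as assumptions (AMANDA automates the staging for you):

mt -f /dev/nst0 compression 0                    # turn off the drive's hardware compression
tar -cf - /home | gzip > /holding/home.tar.gz    # compress in software onto the holding disk
ls -l /holding/home.tar.gz                       # the exact size is now known before any tape I/O
dd if=/holding/home.tar.gz of=/dev/nst0 bs=64k   # stream the staged image to tape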

So, as far as I can tell, hardware compression is not very useful: solving one technical problem (moving slow compression activities into hardware) creates another (no way to know a priori whether you’ll run out of tape before you start writing the backup).

… and now for a bit of leisure

I’m a little behind on videogames, since I don’t own any console gaming systems (except for an ancient Super Nintendo that was donated to me by its original owner). But I had the opportunity to play this PS2 game called Katamari Damacy over the holidays. Its premise is very simple: You’re the Prince of the Cosmos, and your father got drunk and accidentally destroyed all the stars in the sky. Your job is to roll around with this big sticky ball (the Katamari) and pick up items, growing the Katamari in order to pick up even larger items. At first, you start out by picking up small things (like plants, suitcases, etc.) but eventually you evolve to being able to pick up buildings, land, even clouds and whirlpools.

It sounds kind of silly, but it’s very addictive, and actually a lot of fun. Plus, it’s rated Everyone so that even your kids can play!

connecting Tomcat and Apache

Please bear with me while I engage in the following diatribe: “Why Is It So Darn Difficult to Connect Apache and Tomcat?” Anyone who has worked with mod_jk/mod_jk2 and their ilk knows that connecting Apache and Tomcat over AJP (Apache JServ Protocol) is probably one of the more difficult server configuration tasks out there.

A little history: When Tomcat was still Apache/JServ (way back in the day), there was a mod_jserv that managed the AJP pipe between the front-end HTTP server (i.e. Apache HTTPD 1.x) and the back-end application server (JServ). Eventually, this evolved into mod_jk for the first series of Tomcat application servers.

All well and good, and the configuration is fairly straightforward, right up to the point of actually talking to your web application: the dreaded JkMount syntax. An example directive looks like this:


JkMount /examples/* worker1
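
(For completeness: “worker1” must be defined separately in a workers.properties file; a minimal sketch, with the host name and the standard AJP port as assumptions:)

worker.list=worker1
worker.worker1.type=ajp13
worker.worker1.host=backend-server
worker.worker1.port=8009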

There are a number of problems with this syntax. First, it unnecessarily ties the paths you use to access the web application on the back-end to those you use on the front-end. I have no way to specify, for instance, that I actually want to map “/julians_examples” on the front-end to “/examples” on the back-end. Want to do that? Sorry: time to institute some kind of mod_rewrite hackery. Secondly, the “*” doesn’t mean what you think it means! It’s not a wildcard, so you can’t selectively map things; for instance, I can’t say JkMount /examples/foo* to map all resources starting with “foo” to the application server. That tells AJP to look for a resource literally named “/examples/foo*”, which of course fails, since no resource has an asterisk in its name.

Ok, so along comes mod_jk2, which is supposed to be a refactoring of mod_jk. It has some real improvements, such as the ability to talk over a shared UNIX socket (instead of the network-based AJP transport), and the configuration is simplified again. But the web application mapping problem persists! The syntax to map the front-end to the back-end looks like this:


<Location "/foo">
JkUriSet worker ajp13:backend-server:8009
</Location>

ARGH! Still no way to specify that the front-end /foo should be mapped to some other back-end path!

Why is this so difficult? And why have so many connector projects (like mod_webapp) died? A few years ago, I looked into mod_webapp’s WARP protocol and it seemed to be a breath of fresh air next to this antique AJP13 protocol. What happened to it?

I should mention as a postscript that maybe, maybe, in HTTPd 2.1, the new mod_proxy_ajp will solve my problems. Its syntax looks like this:


<Location /examples/>
ProxyPass ajp://backend-server:8009/examples/
</Location>

Wow! Finally, a way to say that something on the front-end should map to a path that could be different on the back-end.
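
For instance, the remapping I wanted earlier should simply be (untested, but following the syntax above):

<Location /julians_examples/>
ProxyPass ajp://backend-server:8009/examples/
</Location>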

I don’t understand why it’s taken us ten years (and counting) to get to this state. Is it just me that thinks this is totally bonkers?

As a footnote to this, I get the sense that AJP13 is a very poorly documented protocol that is still around simply due to momentum. Read these statements from its own documentation, for example:

"This describes the Apache JServ Protocol version 1.3 (hereafter ajp13 ). There is, apparently, no current documentation of how the protocol works. "
"In general, the C code which Shachor wrote is very clean and comprehensible (if almost totally undocumented)."
"I also don’t know why certain design decisions were made. Where I was able, I’ve offered some possible justifications for certain choices, but those are only my guesses."

Undocumented code? Unjustifiable design decisions? Little current documentation about how the protocol works?

It’s things like this that are killing us in the Open Source community. I find it pretty difficult to pitch Tomcat as a worthy alternative to IBM WebSphere or BEA WebLogic when we have this kind of cruft sitting around, pretending to be an "enterprise-worthy" solution.