operating systems that hold your hand too much…

I’m all in favour of making an operating system like Linux easy to use. Linux’s popularity means that for many users it is the only UNIX-like operating system they are likely to encounter, which is why it’s important to give them a good first impression of UNIX so that they’re not turned off by it. This includes being standards-compliant and introducing as few distribution-specific hacks as possible.

I bring this up in the context of shell aliases. Today I was alarmed to see the following set by default for all users on a SUSE Linux Enterprise Server 9 system:

alias +='pushd .'
alias -='popd'
alias ..='cd ..'
alias ...='cd ../..'
alias beep='echo -en "\007"'
alias dir='ls -l'
alias l='ls -alF'
alias la='ls -la'
alias ll='ls -l'
alias ls='/bin/ls $LS_OPTIONS'
alias ls-l='ls -l'
alias md='mkdir -p'
alias o='less'
alias rd='rmdir'
alias rehash='hash -r'
alias unmount='echo "Error: Try the command: umount" 1>&2; false'
alias which='type -p'
alias you='yast2 online_update'

I get very alarmed when I see default behaviour set like this. There are a number of issues with this:

It misleads new users into believing that the behaviour of “ls” and other commands differs from their actual default behaviour.
It introduces a set of commands to the user (e.g. “rehash”) that don’t really exist in the shell, leading to confusion if the user goes to use another UNIX machine without these aliases.
It misleads users into believing that some DOS commands also exist in the Bash shell (e.g. “rd” or “md”). Rather than encouraging them to learn the correct commands, these aliases give users a crutch that they are unlikely to discard. They may then pass on this incorrect information when describing procedures to other users. This would be particularly disastrous in an interview-type situation (e.g. “Q: What is the correct command to make a directory under UNIX?”)

All of these aliases are unnecessary and imply that the personal shell alias preferences of SUSE developers are being imposed upon all users.
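For anyone unsure whether a command they rely on is real or merely one of these aliases, the shell itself can say. The following is a minimal sketch, not part of the SUSE configuration: the helper name is_alias is made up for illustration, and it relies on the standard behaviour of "command -v", which prints a line beginning with "alias" when the name resolves to an alias.

```shell
# bash only expands (and reports) aliases in scripts when expand_aliases
# is set; other shells have no shopt, so the failure is ignored.
shopt -s expand_aliases 2>/dev/null || true

alias md='mkdir -p'    # one of the aliases from the list above

# is_alias NAME: succeed if NAME currently resolves to a shell alias.
# (Hypothetical helper, named for this example only.)
is_alias() {
    case "$(command -v "$1" 2>/dev/null)" in
        alias\ *) return 0 ;;
        *)        return 1 ;;
    esac
}

is_alias md    && echo "md is only an alias"
is_alias mkdir || echo "mkdir is a real command"

unalias md     # drop the alias for the rest of this session
```

In an interactive shell, prefixing a name with a backslash (for example \ls) also bypasses an alias for a single invocation.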

I would like this to serve as a call to all distribution vendors, SUSE particularly, to not ship Linux with unnecessary customizations that only serve to confuse users and introduce disparity between Linux distributions where none originally existed.

“lp0: on fire” explained

Ever get the above message on your Unix/Linux machine? This awesome explanation shows you from where the error originates.

RWL and maddog’s talk

Last Thursday I ducked out of work to hear maddog give a talk at the Real World Linux trade show — conveniently located across the street from my office. Given that RWL was largely a trade show for PHBs (Pointy-Haired Bosses), I was bracing myself for a PHB-oriented talk, and in many ways, it was. His subject matter was clearly intended to help win over whatever proportion of the audience not already enamoured with Linux. That’s fair, and I applaud him for that. Linux is still suffering slow adoption in large, conservative corporations — financial institutions, for example — and anyone making an effort to loosen the ties of conservative CTOs, on whatever grounds, should be applauded.

I do want to point out the hilarious juxtaposition of some of maddog’s talking points with the circumstances of the show. Let me summarize the central points of maddog’s talk:

Between the 50’s and the 70’s all software development was open source — when you paid for software, you got the source code if you wanted it. (Ignore the historical inaccuracies of this generalization.)
In the late 70’s and early 80’s when a company was developing (closed-source) software, they had, for example, 100 engineers, and 2500 customers. Each customer would generate on average one feature request and one bug fix per year, so per year you would have 5000 requests. No problem; each engineer would handle 50 requests a year.
Once IT became a huge industry, the company in question might now have 200 engineers, but 2.5 million customers, each generating two requests a year. Therefore each engineer would be required (theoretically) to handle 25,000 requests a year, which is clearly untenable.
Therefore, open-source software development is better because even if there are 2.5 million consumers, the number of developers is limitless.
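The arithmetic behind these points is easy to check. Here is a small sketch using only the figures quoted in the talk; the function name requests_per_engineer is invented for this example. Note that five million requests spread over 200 engineers comes to 25,000 each.

```shell
# requests_per_engineer CUSTOMERS REQUESTS_PER_CUSTOMER ENGINEERS
# Prints the yearly number of requests each engineer would have to handle.
# (Hypothetical helper; the inputs are the numbers quoted from the talk.)
requests_per_engineer() {
    echo $(( $1 * $2 / $3 ))
}

requests_per_engineer 2500 2 100       # early-80s company: prints 50
requests_per_engineer 2500000 2 200    # later company: prints 25000
```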

Obviously this is a gross oversimplification, and I’m not trying to criticize maddog on these grounds. As I pointed out above, he’s trying to convince PHBs to use Linux by explaining why its quality as an OS can be better: more eyes are looking at the code.

Maddog went on to talk about how large commercial organizations are unresponsive to customers’ concerns due to this very reason (scope/feature creep), and also used this to justify OSS development as better. Okay, that’s probably a reasonable statement too.

While I was sitting there and listening to maddog outline these truisms about how OSS software development and community support, etc. is better than that of commercial software development and commercial support, not ten feet away we had an entire trade show floor of exactly the same closed-source-type, commercial organizations, pitching their products the same as they would be pitching them at COMDEX or CeBIT! The only difference is that, perhaps, some of the products were built on OSS technology, or they ran on Linux. Nevertheless, when I go up to ACCPAC’s booth at RWL and talk to the sales drone, how is this any different than when I go up to ACCPAC’s booth at COMDEX and talk to the same drone? There’s no difference; ACCPAC is still the same, massive, monolithic commercial company with the same problems regarding creeping featurism that maddog outlined in his talk!

The fundamental problem I now have with Linux is that rather than companies developing software the way OSS developers would develop software (which, if you believe maddog, would be the better way), those same companies are just taking Linux (as they have every right to do, mind you), inserting it into their own corporate framework, and selling it just like any other product that they would sell. It doesn’t matter whether they contribute the code back to the community; the development model is still all wrong. To see this in action, I point you to Novell.

Novell made their money selling a proprietary server operating system called Netware, and now makes some money selling copies of SuSE Linux, Red Carpet, Evolution, and so on. Problem is, they’re selling these things like they used to sell Netware. They haven’t realized there’s a paradigm shift here: all the benefits of OSS development that maddog pointed out in his talk aren’t worth a damn if they have to be funnelled through a vendor who’s just as inflexible (in terms of support) with their distribution of Linux as they are with their own, proprietary, closed-source software!

RedHat is another example of this: in order to meet the demands of their customers, they heavily bastardize the stock Linux kernel with their own patches, written by their own developers. But there’s nothing to say that these patches have to be incorporated back into the kernel: that’s up to Linus’ personal discretion. Eventually RedHat winds up with The RedHat Linux Kernel which is significantly different than the stock kernel, and voilà you lose the benefits of having the greater OSS community available to help you with all those feature requests and bug fixes. We’re back again to the situation where only the vendor’s 200 engineers understand the end product, and the support sucks again because those 200 engineers can’t handle the five million support requests.

How is this any different from non-OSS, commercial software?

In conclusion, what I find most perverse about RWL and the state of Linux in general is that rather than it changing the paradigm of the way software development is done in the world, it is, in fact, being subsumed into the closed-source software development paradigm. To put it another way: rather than leading formerly closed-source companies to open their source in order to reap the benefits of limitless development manpower, Linux is now, by virtue of the vendors, being closed down.

I should note that this closing-down isn’t absolute, and it won’t ever be, so long as we have independent distributions like Debian. But I was still surprised to see maddog get up on stage and trumpet the virtues of the OSS development model, when those benefits are being circumvented by many of the vendors before his very eyes.

messy Linux dmesgs

Season’s greetings, everyone! It’s time for yet another edition of Things In IT That Bug Me. Today’s victim is: overly chatty Linux dmesgs. This may seem a somewhat frivolous complaint. However, since the dmesg is one of the first things one sees when one boots an operating system, a ridiculously chatty and verbose bootup sequence makes Linux look like it’s patched together with no overarching control. Basically, I don’t think 90% of end-users care about seeing:

Memory address space allocation dumps
The compiler used to create the kernel
RCS ID strings, version numbers, names and companies of the authors of various pieces
Debugging information only useful to the developers of a particular piece.

I’m a big fan of the way the BSD kernel messages are structured. With a few exceptions, all one really needs to know when the OS is booting up is what devices were detected. And that’s all.

Just have a look at the following bootup sequence from my work machine. Do you really think an end-user cares, for example, that "Linux NET4.0 for Linux 2.4" is "[b]ased upon Swansea University Computer Society NET3.039" or that the USB UHCI driver, revision 1.275, was compiled on October 11 at 3:36 p.m., or that Richard Gooch (rgooch@atnf.csiro.au) wrote the mtrr driver? I highly doubt someone is going to e-mail Richard Gooch directly based on the contents of the dmesg, but this shows up on every Linux dmesg.

The following dmesg is nearly 140 lines long. Booting FreeBSD on the same machine yields a dmesg that’s around 80 lines. It’s time that Linux got its act together and cleaned up the messy dmesg, or the problem will continue to balloon out of control.
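Anyone who wants to repeat the comparison on their own hardware can do so with a couple of one-liners. This is only a rough sketch: the function names and the file name in the usage note are made up, and the "credits" pattern (an e-mail address or an RCS $Revision keyword) is a crude heuristic of my choosing, not a definition of what counts as noise.

```shell
# count_lines FILE: number of lines in a saved dmesg capture.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# count_credits FILE: number of lines that contain an e-mail address or
# an RCS $Revision keyword. A crude heuristic, assumed for illustration.
count_credits() {
    grep -c -E '[[:alnum:]._-]+@[[:alnum:].-]+|\$Revision' "$1"
}

# Hypothetical usage (reading the kernel log may need root on some systems):
#   dmesg > linux.dmesg
#   count_lines linux.dmesg
#   count_credits linux.dmesg
```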

My dmesg:

Linux version 2.4.20-20.9.XFS1.3.1 (root@naboo.americas.sgi.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Sat Oct 11 15:23:43 CDT 2003
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003ff77000 (usable)
BIOS-e820: 000000003ff77000 - 000000003ff79000 (ACPI NVS)
BIOS-e820: 000000003ff79000 - 0000000040000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 262007
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32631 pages.
Kernel command line: auto BOOT_IMAGE=2.4.20-20.9.XFS ro BOOT_FILE=/boot/vmlinuz-2.4.20-20.9.XFS1.3.1 hdd=ide-scsi root=LABEL=/
ide_setup: hdd=ide-scsi
Initializing CPU#0
Detected 1993.983 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3971.48 BogoMIPS
Memory: 1026556k/1048028k available (1407k kernel code, 17896k reserved, 1072k data, 136k init, 130524k highmem)
kdb version 4.3 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: bfebfbff 00000000 00000000 00000000
CPU: Common caps: bfebfbff 00000000 00000000 00000000
CPU: Intel(R) Pentium(R) 4 CPU 2.00GHz stepping 07
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfbe5e, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
Transparent bridge - Intel Corp. 82801BA/CA/DB PCI Bridge
PCI: Using IRQ router PIIX [8086/2440] at 00:1f.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16)
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
VFS: Disk quotas vdquot_6.5.1
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta3-.2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH2: IDE controller at PCI slot 00:1f.1
ICH2: chipset revision 4
ICH2: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
hda: ST340016A, ATA DISK drive
blk: queue c03ed4e0, I/O limit 4095Mb (mask 0xffffffff)
hdc: Lite-On LTN486S 48x Max, ATAPI CD/DVD-ROM drive
hdd: HL-DT-ST GCE-8481B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=4865/255/63, UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
hda: hda1 hda2 hda3
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 394k freed
VFS: Mounted root (ext2 filesystem).
SGI XFS 1.3.1 with ACLs, no debug enabled
SGI XFS Quota Management subsystem
XFS mounting filesystem ide0(3,2)
Ending clean XFS mount for filesystem: ide0(3,2)
Freeing unused kernel memory: 136k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 15:36:30 Oct 11 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 11 for device 00:1f.2
PCI: Setting latency timer of device 00:1f.2 to 64
usb-uhci.c: USB UHCI at I/O 0xff80, IRQ 11
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Found IRQ 9 for device 00:1f.4
PCI: Setting latency timer of device 00:1f.4 to 64
usb-uhci.c: USB UHCI at I/O 0xff60, IRQ 9
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
hub.c: new USB device 00:1f.2-1, assigned address 2
Adding Swap: 1044216k swap-space (priority -1)
input0: USB HID v1.10 Mouse [Logitech USB Optical Mouse] on usb1:2.0
XFS mounting filesystem ide0(3,1)
Ending clean XFS mount for filesystem: ide0(3,1)
hdc: attached ide-cdrom driver.
hdc: ATAPI 48X CD-ROM drive, 120kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
SCSI subsystem driver Revision: 1.00
hdd: attached ide-scsi driver.
scsi0 : SCSI host adapter emulation for IDE ATAPI devices
Vendor: HL-DT-ST Model: CD-RW GCE-8481B Rev: C102
Type: CD-ROM

When I get a chance, I’ll capture a FreeBSD dmesg on this same box and you can see how much cleaner it is.

Linux is for Bitches

Pardon the slight profanity; I don’t generally like to swear when I’m trying to make a point, but I didn’t invent the name of this site.

The views espoused by the author are obviously not much different from those in this excellent article in USENIX’s own journal, ;login:. (You’ll need to be a member to access that link, by the way) I’ve complained before about the proliferation of poorly-configured, poorly-managed Linux boxes taking over from the Windows boxes. It’s obviously still continuing to happen. Of course, the vendors are partly to blame, too. When the author of linuxforbitches.org writes about /var being an inappropriate place for web content (I wholeheartedly agree) you have many vendors to thank for that.

I lay the blame for the kernelized web-server, though, at the foot of Linus himself. Given that Linus is so militant about accepting patches, idiotic or not, I’m surprised — no, shocked — that he accepted this one. Considering that many kernel hackers are the same folks who probably bitched and whined about insecurity and instability when Windows NT 4.0 moved the drivers from user mode to supervisor mode (or Ring 1 to Ring 0, I don’t remember the exact terminology), the kernelized web server is a completely brain-damaged idea. It should be removed from the kernel at once, if it hasn’t already been so excised.

You know, despite all the claims about Linux’s stability, it still has a long way to go before it achieves the stability level of the BSDs. Under heavy workload, Linux still doesn’t cut the mustard. Andrew Hume from AT&T Research presented a paper at HotOS-iX entitled Operating Systems: Shouldn’t They Be Better? True, he takes Solaris 2.6 to task in this paper as well, but the Linux flaws he describes are pretty shocking (these are from David Oppenheimer’s summary notes in August’s ;login:):

Hume described eight problems the Gecko [his billing system] implementers experienced with Linux (versions 4.18 through 4.20), including Linux’s forcing all I/O through a file-system buffer cache with highly unpredictable performance scaling (30MB/sec. to write to one file system at a time, 2MB/sec. to write to two at a time), general I/O flakiness (1-5% of the time corrupting data read into gzip), TCP/IP networking that was slow and that behaved poorly under overload, lack of a good file system, nodes that didn’t survive two reboots, and slow operation of some I/O utilities such as df. In general, Hume said that he has concluded that "Linux is good if you want to run Apache or compile the kernel. Every other application is suspect."

The problem with many people measuring "stability" of Linux is that they think it’s a relative measurement: as long as it’s more stable than Windows, then it’s good. This is obviously a stupid way to look at it. Just because my Kia[1] doesn’t have exploding tires, doesn’t mean that it’s a particularly safe car.

People working on performance and stability in the Linux kernel are far outnumbered by the people trying to get their little pet project into the tree; witness the kernelized web server. Admittedly, performance and stability aren’t the most exciting research areas, but making Linux as stable as the BSDs is critical to its long-term success. I mean, who cares if Linux can run on a zSeries or S/390 if the thing goes down like a ton of bricks when you throw a heavy workload at it?

Ultimately, as a system administrator, I care much more about stability, and failing that, predictable, recoverable failure, than about "feature-niftiness". When I have 1000 user accounts to manage and I get DDoSed, I want an OS that is feature-conservative but rock solid.

And that, in a convoluted way, is why I don’t run Linux on my servers.

[1] I don’t, for the record, own a Kia. 🙂

Proliferation of Poorly-Configured Linux Boxes

Someone in ;login: magazine a few issues back talked about the proliferation of poorly-configured Linux boxes, and how the volume of these will eventually outstrip the quantity of poorly-configured Windows boxes as Linux increases in popularity. The notion that Linux is more secure than Windows falls apart when you have clueless users who willfully follow directions like those listed on Ximian’s website to install Ximian Desktop 2.0:

There is nothing to download first, just follow the instructions below.

<snip>

Open a terminal window.

Using the su command, become superuser (root).

Type the following command or cut and paste it into your terminal: wget -q -O - http://go.ximian.com |sh

Great job, Ximian. Encourage people to download a shell script, as root, and blindly execute it, with no MD5 sanity check, nothing. I mean, it makes me want to compromise go.ximian.com and replace the index page with a text file containing “rm -rf /”. It’s also fabulous that they advocate using the -q (quiet) switch with wget: if I hacked the httpd.conf to send a redirect to my own website, which could serve a text file containing “rm -rf /”, the 302 Moved Temporarily response would NEVER be seen by the user.

What is wrong with these people? Isn’t it blazingly obvious that this is a stupid thing to do?