sysadmin

Release Party Ubuntu Mirror

Our Ubuntu LoCo release parties always end up being part-install-fest. Even when we used to meet at pubs in the early days, people would pull out laptops to burn ISOs for each other and get assistance with upgrades.

As the maintainer of a local university mirror, I took along a mini-mirror to our Lucid release party and will be doing it again for the Maverick party tomorrow. If anyone wants to do this at future events, it's really not that hard to organise; you just need the bandwidth to create the mirror. Disk space requirements (very rough, per architecture, per release): package mirror 50GiB, Ubuntu/Kubuntu CDs 5GiB, Xubuntu/Mythbuntu/UbuntuStudio CDs 5GiB, Ubuntu/Kubuntu/Edubuntu DVDs 10GiB.
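
Those rough figures multiply up quickly. As a back-of-the-envelope sketch (the function name is made up for illustration, and the sizes are just the estimates above), the total for a given number of releases and architectures works out as:

```shell
# Rough disk estimate in GiB, using the per-architecture, per-release
# figures quoted above: 50 (packages) + 5 + 5 (CDs) + 10 (DVDs) = 70.
# estimate_mirror_gib is a hypothetical helper, not part of any tool.
estimate_mirror_gib() {
    releases="$1"
    arches="$2"
    per_unit=$((50 + 5 + 5 + 10))
    echo $((per_unit * releases * arches))
}
```

For two releases on two architectures, `estimate_mirror_gib 2 2` prints 280, so budget roughly 280GiB if you carry packages, CDs and DVDs for all of them.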

I took the full contents of our ubuntu-archive mirror, but you can probably get away with only i386 and amd64 for the new release people are installing and any old ones they might be upgrading from. You can easily create a partial (only selected architectures and releases) Ubuntu mirror using apt-mirror or debmirror. It takes a while on the first run, but once you have a mirror, updating it is quite efficient.

The CD and DVD repos can easily be mirrored with rsync. Something like --include '*10.04*.iso' --exclude '*.iso' will give you a quick and dirty partial mirror.
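
Spelled out as a full command, that might look like the sketch below. The function name and the RSYNC override variable are hypothetical conveniences, not anything rsync provides. The include has to come before the exclude, because rsync applies the first filter rule that matches a file.

```shell
# Mirror only the ISOs for one release version, skipping all other ISOs.
# mirror_isos and $RSYNC are made-up names for this sketch.
mirror_isos() {
    src="$1"        # e.g. rsync://ftp.leg.uct.ac.za/pub/linux/ubuntu-releases/
    dest="$2"       # e.g. /media/external/ubu/ubuntu-releases/
    version="$3"    # e.g. 10.04
    "${RSYNC:-rsync}" -aHv --no-motd \
        --include "*${version}*.iso" \
        --exclude '*.iso' \
        "$src" "$dest"
}
```

Files that are not ISOs (checksums, signatures and so on) match neither rule, so they are still copied.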

As for the network, I took a 24-port switch and a pile of flyleads. A laptop with a 1TB external hard drive ran the mirror. At this point, you pick between providing Internet access as well (which may result in some poorly-configured machines upgrading over the Internet) or doing it all offline (which makes sense in bandwidth-starved South Africa). For Lucid, I chose to run this on a private network, with a separate WiFi network for Internet access. This slightly complicates upgrades because update-manager only shows the upgrade button when it can connect to changelogs.ubuntu.com, but that's easily worked around by mirroring the meta-release files:

cd /var/www
for file in meta-release meta-release-development meta-release-lts meta-release-lts-development meta-release-lts-proposed meta-release-proposed meta-release-unit-testing; do
    wget -nv -N "http://changelogs.ubuntu.com/$file"
done

Instead of getting people to reconfigure their APT sources (and having to modify those meta-release files), we set our DNS server (dnsmasq) to point all the mirrors that people might be using to itself. In /etc/hosts:

10.0.0.1        mirror za.archive.ubuntu.com archive.ubuntu.com security.ubuntu.com mirror.is.co.za ubuntu.saix.net ftp.leg.uct.ac.za ftp.sun.ac.za ubuntu.mirror.ac.za changelogs.ubuntu.com
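
It's easy to miss a hostname in a line that long. As a quick sanity check, this sketch (the helper name is made up) lists every name the hosts file maps to the mirror's address, so the list can be compared with whatever clients have in their sources.list:

```shell
# Print each hostname that a hosts file maps to the given IP, one per line.
# hosts_names_for is a hypothetical helper for illustration.
hosts_names_for() {
    ip="$1"
    file="${2:-/etc/hosts}"
    awk -v ip="$ip" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break    # rest of the line is a comment
                print $i
            }
        }' "$file"
}
```

For example, `hosts_names_for 10.0.0.1 | grep -x security.ubuntu.com` prints the name only if it is being redirected.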

Dnsmasq must also be told to provide DHCP leases. In /etc/dnsmasq.conf:

domain=ubuntu-za.lan
dhcp-range=10.0.0.2,10.0.0.255,12h
dhcp-authoritative

Then we ran an Apache (all on the default virtualhost) serving the Ubuntu archive mirror as /ubuntu, the meta-release files in the root, and CDs / DVDs in /ubuntu-releases, /ubuntu-cdimage. There were a couple of other useful extras thrown in.
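
One low-effort way to get that URL layout is to symlink the mirror trees into the docroot. This is only a sketch of that approach, not necessarily how it was set up on the night: it assumes Apache is allowed to follow symlinks in the docroot, and link_mirror_trees is a made-up name.

```shell
# Expose the mirror trees under the web root as /ubuntu, /ubuntu-releases
# and /ubuntu-cdimage by symlinking them into place.
# link_mirror_trees is a hypothetical helper for illustration.
link_mirror_trees() {
    media="$1"      # e.g. /media/external/ubu
    docroot="$2"    # e.g. /var/www
    for tree in ubuntu ubuntu-releases ubuntu-cdimage; do
        ln -sfn "$media/$tree" "$docroot/$tree"
    done
}
```

With the meta-release files already downloaded into the docroot, `link_mirror_trees /media/external/ubu /var/www` would complete the layout.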

We could have also run ftp and rsync servers, and provided a netboot environment. But there was a party to be had :)

For the maverick party, I used this script to prepare the mirror. It's obviously very specific to my local mirror. Tweak to taste:

#!/bin/bash
set -e
set -u

MIRROR=ftp.leg.uct.ac.za
MEDIA=/media/external/ubu

export GNUPGHOME="$MEDIA/gnupg"
gpg --no-default-keyring --keyring trustedkeys.gpg --keyserver keyserver.ubuntu.com --recv-keys 40976EAF437D05B5 2EBC26B60C5A2783

debmirror --host $MIRROR --root ubuntu --method rsync \
    --rsync-options="-aIL --partial --no-motd" -p --i18n --getcontents \
    --section main,universe,multiverse,restricted \
    --dist $(echo {lucid,maverick}{,-security,-updates,-backports,-proposed} | tr ' ' ,) \
    --arch i386,amd64 \
    $MEDIA/ubuntu/

debmirror --host $MIRROR --root medibuntu --method rsync \
    --rsync-options="-aL --partial --no-motd" -p --i18n --getcontents \
    --section free,non-free \
    --dist lucid,maverick \
    --arch i386,amd64 \
    $MEDIA/medibuntu/

rsync -aHvP --no-motd --delete \
    rsync://$MIRROR/pub/packages/corefonts/ \
    $MEDIA/corefonts/

rsync -aHvP --no-motd --delete \
    rsync://$MIRROR/pub/linux/ubuntu-changelogs/ \
    $MEDIA/ubuntu-changelogs/

rsync -aHvP --no-motd \
    rsync://$MIRROR/pub/linux/ubuntu-releases/ \
    --include 'ubuntu-10.10*' --include 'ubuntu-10.04*' \
    --exclude '*.iso' --exclude '*.template' --exclude '*.jigdo' --exclude '*.list' --exclude '*.zsync' --exclude '*.img' --exclude '*.manifest' \
    $MEDIA/ubuntu-releases/

rsync -aHvP --no-motd \
    rsync://$MIRROR/pub/linux/ubuntu-dvd/ \
    --exclude 'gutsy' --exclude 'hardy' --exclude 'jaunty' --exclude 'karmic' \
    $MEDIA/ubuntu-cdimage/

We'll see how it works out tomorrow. Looking forward to a good party.

smartmontools

In line with my SysRq Post comes another bit of assumed knowledge, SMART. Let’s begin at the beginning (and stick to the PC world).

Magnetic storage is fault-prone. In the old days, when you formatted a drive, part of the formatting process was to check that each block seemed to be able to store data. All the bad blocks would be listed in a “bad block list” and the file-system would never try to use them. File-system checks would also be able to mark blocks as being bad.

As disks got bigger, this meant that formatting could take hours. Drives had also become fancy enough that they could manage bad blocks themselves, and so a shift occurred. Disks were shipped with some extra spare space that the computer can’t see. Should the drive controller detect that a block went bad (they had parity checks), it could re-allocate a block from the extra space to stand in for the bad block. If it was able to recover the data from the bad block, this could be totally transparent to the file-system; if not, the file-system would see a read error and have to handle it.

This is where we are today. File-systems still support the concept of bad blocks, but in practice they only occur when a disk runs out of spare blocks.

This came with a problem: how would you know if a disk was doing OK or not? Well, a standard called SMART was created. This allows you to talk to the drive controller and (amongst other things) find out the state of the disk. On Linux, we do this via the package smartmontools.

Why is this useful? Well, you can ask the disk to run a variety of tests (including a full bad block scan); these are useful for RMAing a bad drive with minimum hassle. You can also get the drive’s error log, which can give you some indication of its reliability. You can see its temperature, age, and serial number (useful when you have to know which drive to unplug). But, most importantly, you can find out the state of bad sectors: how many sectors the drive thinks are bad, and how many it has reallocated.
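
In smartmontools terms, those questions map onto a handful of smartctl options. A minimal sketch, assuming an ATA disk and root access; smart_report and the SMARTCTL override variable are made-up names, while the smartctl options themselves are standard:

```shell
# Print the basics for one drive: identity, attributes and error log.
# smart_report and $SMARTCTL are hypothetical names for this sketch.
smart_report() {
    dev="$1"
    smartctl="${SMARTCTL:-smartctl}"
    "$smartctl" -i "$dev"          # identity: model, serial number, firmware
    "$smartctl" -A "$dev"          # attributes: temperature, power-on hours, sector counts
    "$smartctl" -l error "$dev"    # the drive's own error log
}

# To start the long (full surface) self-test, and read the result later:
#   smartctl -t long /dev/sda
#   smartctl -l selftest /dev/sda
```

Run it as `smart_report /dev/sda`. The self-test runs inside the drive, so the machine stays usable while it works.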

Why is that useful?

In the event of a bad block, you can manually force a re-allocation. This way it happens under your terms, and you’ll know exactly what got corrupted.

Next, Google published a paper linking non-zero bad sector values to drive failure. Do you really want to be trusting known-untrustworthy drives with critical data?

Finally, there is a nasty RAID situation. If you have a RAID-5 array with, say, 6 drives in it and one fails, either the RAID system will automatically select a spare drive (if it has one), or you’ll have to replace it. The system will then re-build on the new disk, reading every sector on all the other disks, to calculate the sector contents for the new disk. If one of those reads fails (bad sector), you’ll now be up shit-creek without a paddle. The RAID system will kick out the disk with the read failure, and you’ll have a RAID-5 array with two bad disks in it, one more than RAID-5 can handle. There are tricks to get such a RAID-5 array back online, and I’ve done it, but you will have corruption, and it’s risky as hell.

So, before you go replacing RAID-5 member-disks, check the SMART status of all the other disks.
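
That check is easy to script. The sketch below assumes ATA disks whose attribute tables use the common names Reallocated_Sector_Ct and Current_Pending_Sector, with the raw value in the tenth column of `smartctl -A` output. Drives vary, so treat both as assumptions; the function name and the SMARTCTL override are made up.

```shell
# Report any of the given disks that admit to reallocated or pending sectors.
# smart_bad_sectors and $SMARTCTL are hypothetical names for this sketch.
smart_bad_sectors() {
    for dev in "$@"; do
        "${SMARTCTL:-smartctl}" -A "$dev" | awk -v dev="$dev" '
            $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
                # Column 10 is the raw count on typical ATA attribute tables.
                if ($10 + 0 > 0) printf "%s: %s = %s\n", dev, $2, $10
            }'
    done
}
```

Something like `smart_bad_sectors /dev/sd[b-f]` over the surviving members should print nothing before you start the rebuild.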

Personally, I get twitchy when any of my drives have bad sectors. I have smartd monitoring them, and I’ll attempt to RMA them as soon as a sector goes bad.

The joy that is SysRq

I’m constantly surprised when I come across long-time Linux users who don’t know about SysRq. The Linux Magic System Request Key Hacks are a magic set of commands that you can get the Linux kernel to follow no matter what’s going on (unless it has panicked or totally deadlocked).

Why is this useful? Well, there are many situations where you can’t shut a system down properly, but you need to reboot. Examples:

  • You’ve had a kernel OOPS, which is not quite a panic but there could be memory corruption in the kernel, things are getting pretty weird, and quite honestly you don’t want to be running in that condition for any longer than necessary.
  • You have reason to believe it won’t be able to shut down properly.
  • Your system is almost-locked-up (i.e. the above point)
  • Your UPS has about 10 seconds worth of power left
  • Something is on fire (lp0 possibly?)
  • …Insert other esoteric failure modes here…

In any of those situations, grab a console keyboard, and type Alt+SysRq+s (sync), Alt+SysRq+u (unmount), wait for it to have synced, and finally Alt+SysRq+b (reboot NOW!). If you don’t have a handy keyboard attached to said machine, or are on another continent, you can write the same command letters to /proc/sysrq-trigger as root, for example:

# echo u > /proc/sysrq-trigger
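
To do the whole sequence remotely, the three letters can be written one after another. A minimal sketch, assuming you are root: sysrq_sub, its optional path argument and the SYSRQ_DELAY variable are made up for illustration (the path argument only exists so the function can be tried against an ordinary file first).

```shell
# Sync, remount read-only, then reboot via the SysRq trigger file.
# WARNING: against the real trigger file this reboots the machine at once.
# sysrq_sub and $SYSRQ_DELAY are hypothetical names for this sketch.
sysrq_sub() {
    trigger="${1:-/proc/sysrq-trigger}"
    for key in s u b; do
        printf 'sysrq: %s\n' "$key"
        printf '%s\n' "$key" > "$trigger"
        sleep "${SYSRQ_DELAY:-2}"    # give sync and remount a moment to finish
    done
}
```

The pause between keys is a guess at "long enough"; on a box with a lot of dirty data you may want to wait longer before the final b.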

In my books, the useful SysRq commands are:

b    Reboot
f    Call the oom_killer
h    Display SysRq help
l    Print a kernel stacktrace
o    Power Off
r    Take your keyboard out of RAW mode (required after some X breakages)
s    Sync all filesystems
u    Remount all filesystems read-only
0-9  Change console logging level

In fact, read the rest of the SysRq documentation, print it out, and tape it above your bed. Next time you reach for the reset switch on a Linux box, stop yourself, type the S,U,B sequence, and watch your system come up as if nothing untoward has happened.

Update: I previously recommended U,S,B but after a bit of digging, I think S,U,B may be correct.

Some Wiki Updates

I’ve just spent an afternoon and evening on the local wikis I look after: CLUG, Freedom Toaster, and GeekDinner.

They’ve all been upgraded to MediaWiki 1.11.0, with reCAPTCHA on sign-ups, and OpenID support.

If you are a user of one of these wikis, you can go to Special:OpenIDConvert (CLUG, FT, GeekDinner) to add OpenID to your account.

In the past, the CLUG wiki has had minimal wikispam, because we thought up some clever regexes that blocked spammers from editing. However, spammers would still sign up before they tried to edit. This has left the wiki with over a thousand bogus users. Not that that is a problem in itself, but it becomes a bore when you want to guess somebody’s wikiname to give them “Bureaucrat” status, for example.

So jerith talked himself into coding up a quick SQL query to find all these bogus users, and a python script to remove them. Any history they’ve had has been assigned to the “Spammer” user, and they have been wiped from the wiki. If, in our zealousness, we’ve deleted any legitimate users who’ve simply never edited the wiki, we apologise. Maybe if you contribute something, it won’t happen again… :-)

Dovecot shared mailboxes (the correct way)

I’ve just implemented shared mailboxes in dovecot (which rocks, btw). It isn’t difficult, but I don’t think it’s very well documented.

The preferred way to do this is with IMAP namespaces. My natural approach would be to create something like a Maildir tree /srv/mail/shared, and make this the “public” namespace, then set filesystem permissions on subtrees of that to define who can see what. Unfortunately, dovecot uses strict Maildir++, and won’t let you create mailboxes inside each other on the filesystem: /Foo/Bar is stored as a Maildir called .Foo.Bar, so subtrees don’t exist and this isn’t an option. The upcoming dbox format should allow something like this, but it isn’t usable yet.

My solution was to create multiple namespaces, one for each shared mailbox. Users are given permission to use them via file-system permissions (i.e. group membership). For example:

# Default namespace, needed if you add namespaces
namespace private {
    prefix =
    separator = /
    inbox = yes
}
# Office inbox, available to receptionists, office managers, and directors:
namespace public {
   prefix = office/
   separator = /
   location = maildir:/srv/mail/office/.Maildir:CONTROL=~/.Maildir/control/office:INDEX=~/.Maildir/index/office
   hidden = no
}
# Umask for shared folders
umask = 0007

Setting CONTROL and INDEX means that dovecot’s metadata is stored in the user’s personal Maildir, so users who don’t have permission to see the shared mailbox don’t get errors.

The permissions of the mailbox should be done as follows:

# touch /srv/mail/office/dovecot-shared
# chown -R mail:office-mailbox /srv/mail/office
# find /srv/mail/office -type d -print0 | xargs -0 chmod 2770
# find /srv/mail/office -type f -print0 | xargs -0 chmod 660
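
Permissions have a habit of drifting, for instance when someone copies mail in by hand. As a sketch of a quick audit (the function name is made up), this lists anything in the tree that doesn't match the modes set above:

```shell
# List directories that are not mode 2770 and files that are not mode 660.
# list_wrong_perms is a hypothetical helper for illustration.
list_wrong_perms() {
    dir="$1"
    find "$dir" -type d ! -perm 2770 -print
    find "$dir" -type f ! -perm 660 -print
}
```

No output from `list_wrong_perms /srv/mail/office` means every directory is setgid and group-writable, and every file is group-readable and group-writable.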

If you want a common subscription list, you have to manually symlink:

# ln -s /srv/mail/office/subscriptions ~luser/.Maildir/control/office/

Seems to work well (at least with Thunderbird).
