One neat upgrade in Debian's recent 5.0.0 release was Squid 2.7. In this bandwidth-starved corner of the world, a caching proxy is a nice addition to a network, as it should shave at least 10% off your monthly bandwidth usage. However, the recent rise of CDNs has made many objects that should be highly cacheable, uncacheable.
For example, a YouTube video has a static ID. The same piece of video will always have the same ID, it'll never be replaced by anything else (except a "sorry this is no longer available" notice). But it's served from one of many delivery servers. If I watch it once, it may come from
But the next time it may come from v15.cache.googlevideo.com. And that's not all: the signature parameter is unique (to protect against hot-linking), as are other non-static parameters.
Basically, any proxy will probably refuse to cache it (because of all the parameters) and if it did, it'd be a waste of space because the signature would ensure that no one would ever access that cached item again.
I came across a page on the squid wiki that describes a solution to this.
Squid 2.7 introduces the concept of a storeurl_rewrite_program, which gets a chance to rewrite any URL before storing / accessing an item in the cache. Thus we could rewrite our example file to
We've normalised the URL and kept the only two parameters that matter: the video ID and the itag, which specifies the video quality level.
The squid wiki page I mentioned includes a sample perl script to perform this rewrite. They don't include the itag, and my perl isn't good enough to fix that without making a dog's breakfast of it, so I re-wrote it in Python. You can find it at the end of this post. Each line the rewrite program reads contains a concurrency ID, the URL to be rewritten, and some parameters. We output the concurrency ID and the URL to rewrite to.
The concurrency ID is a way to use a single script to process rewrites from different squid threads in parallel. The documentation on this is almost non-existent, but if you specify a non-zero storeurl_rewrite_concurrency, each request and response will be prepended with a numeric ID. The perl script concatenated this directly before the rewritten URL, but I separate them with a space. Both seem to work. (Bad documentation sucks.)
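A minimal sketch of such a rewriter in Python (the regex, hostnames, and normalised URL here are illustrative, not the exact ones from my script):

```python
import re
import sys

# Illustrative pattern and target URL -- not the exact ones from my script.
VIDEO_RE = re.compile(r'^http://[^/]+\.googlevideo\.com/get_video\?(.*)')

def rewrite(url):
    """Normalise a video URL to a single cache key, keeping only the
    parameters that identify the object: video_id and itag."""
    m = VIDEO_RE.match(url)
    if not m:
        return url
    params = dict(p.split('=', 1) for p in m.group(1).split('&') if '=' in p)
    if 'video_id' not in params:
        return url
    return ('http://video-srv.youtube.com.SQUIDINTERNAL/get_video'
            '?video_id=%s&itag=%s'
            % (params['video_id'], params.get('itag', '')))

def main():
    # Each request: <concurrency-ID> <URL> <other parameters...>
    # We respond with the concurrency ID, a space, and the rewritten URL.
    for line in sys.stdin:
        parts = line.split()
        if len(parts) >= 2:
            sys.stdout.write('%s %s\n' % (parts[0], rewrite(parts[1])))
        else:
            sys.stdout.write(line)
        sys.stdout.flush()

if __name__ == '__main__':
    main()
```

URLs that don't match the pattern are passed through untouched, so the script is safe to run on all traffic.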
All that's left is to tell Squid to use this, and to override the caching rules on these URLs.
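Something along these lines in squid.conf (the script path and ACL pattern are examples):

```
# Illustrative squid.conf fragment
storeurl_rewrite_program /usr/local/bin/storeurl-rewrite.py
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10

# Only send video requests through the rewriter
acl store_rewrite_list urlpath_regex \/get_video\?
storeurl_access allow store_rewrite_list
storeurl_access deny all

# Cache the videos aggressively, ignoring client reloads
refresh_pattern get_video\? 10080 90% 43200 ignore-reload
```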
Done. And it seems to be working relatively well. If only I'd set this up last year, when I had pesky house-mates watching YouTube all day ;-)
It should of course be noted that doing this instructs your Squid Proxy to break rules.
ignore-reload violates guarantees that the HTTP standards provide the browser and web-server about their communication with each other.
They are relatively benign changes, but illegal nonetheless.
And it goes without saying that rewriting the URLs of stored objects could cause some major breakage, by treating different objects (with different URLs) as the same. The provided regexes seem sane enough that this shouldn't happen, but YMMV.
My post on split-routing on OpenWRT has been incredibly popular, and led many people to implement split-routing, whether or not they had OpenWRT. While it's fun to leave the port as an exercise for the reader, it meant I had to help lots of newbies through porting that setup to a Debian / Ubuntu environment. To save myself some time, here's how I do it on Debian:
Background, especially for non-South African readers: Bandwidth in South Africa is ridiculously expensive, especially international bandwidth. The point of this exercise is that we can buy "local-only" DSL accounts which only connect to South African networks. E.g. I have an account that gives me 30GB of local traffic / month, for the same cost as a 2.5GB international traffic account. Normally you'd change the username and password on your router to switch accounts when you wanted to do something like a Debian apt upgrade, but that's irritating. There's no reason why you can't have a Linux-based router concurrently connected to both accounts via the same ADSL line.
Firstly, we have a DSL modem. Doesn't matter what it is, it just has to support bridged mode. If it won't work without a DSL account, you can use the Telkom guest account. My recommendation for a modem is to buy a Telkom-branded Billion modem (because Telkom sells everything with really big chunky, well-surge-protected power supplies).
For the sake of this example, we have the modem (IP 10.0.0.2/24) plugged into eth0 on our server, which is running Debian or Ubuntu, doesn't really matter much - personal preference. The modem has DHCP turned off, and we have our PCs on the same ethernet segment as the modem. Obviously this is all trivial to change.
You need these packages installed:
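At minimum, pppd and the iproute tools:

```
apt-get install ppp iproute
```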
You need ppp interfaces for your providers. I created
unit 1 makes a connection always bind to "ppp1". Everything else is pretty standard. Note that only the international connection forces a default route.
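A sketch of what such peers files contain (the file names, account names, and the ipparam labels are examples, not my real ones):

```
# /etc/ppp/peers/dsl-local (illustrative)
plugin rp-pppoe.so eth0
user "local-account@isp"
unit 0
persist
maxfail 0
ipparam local-dsl

# /etc/ppp/peers/dsl-intl (illustrative)
plugin rp-pppoe.so eth0
user "intl-account@isp"
unit 1
defaultroute
persist
maxfail 0
ipparam intl-dsl
```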
In /etc/ppp/pap-secrets I added my username and password combinations:
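In the usual pap-secrets format (account names and passwords here are placeholders):

```
# client               server  secret
"local-account@isp"    *       "secret1"
"intl-account@isp"     *       "secret2"
```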
You need custom iproute2 routing tables for each interface, for the source routing. This will ensure that incoming connections get responded to out of the correct interface. As your provider only lets you send packets from your assigned IP address, you can't send packets with the international address out of the local interface. We get around that with multiple routing tables. Add these lines to
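The file is /etc/iproute2/rt_tables; the numbers are arbitrary and the names are examples, but they must match whatever your ip-up script uses:

```
100 local-dsl
101 intl-dsl
```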
Now for some magic. I create /etc/ppp/ip-up.d/20routing to set up routes when a connection comes up:
That script loads routes from /etc/network/routes-local-dsl. It also sets up source routing so that incoming connections work as expected.
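A sketch of the shape of that script, assuming the peers files set an ipparam naming each connection (Debian's pppd exports it to ip-up.d scripts as PPP_IPPARAM, along with PPP_IFACE and PPP_LOCAL):

```sh
#!/bin/sh
# /etc/ppp/ip-up.d/20routing -- illustrative sketch
ROUTES=/etc/network/routes-$PPP_IPPARAM   # e.g. /etc/network/routes-local-dsl
TABLE=$PPP_IPPARAM                        # matching table name in rt_tables

[ -f "$ROUTES" ] || exit 0

# Load this provider's routes into the main table
while read network; do
    [ -n "$network" ] && ip route replace "$network" dev "$PPP_IFACE"
done < "$ROUTES"

# Source routing: traffic from this connection's address leaves via it
ip route replace default dev "$PPP_IFACE" table "$TABLE"
ip rule add from "$PPP_LOCAL" table "$TABLE"
```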
Now, we need those route files to exist and contain something useful. Create the script /etc/cron.daily/za-routes (and make it executable):
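The shape of the script (the download URL here is a placeholder, not the real one):

```sh
#!/bin/sh
# /etc/cron.daily/za-routes -- illustrative; substitute the real URL
set -e
wget -q -O /etc/network/routes-local-dsl.new \
    http://example.com/za-routes
# Only replace the file if we fetched something non-empty
[ -s /etc/network/routes-local-dsl.new ] && \
    mv /etc/network/routes-local-dsl.new /etc/network/routes-local-dsl
```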
It downloads the routes file from cocooncrash's site (he gets them from local-route-server.is.co.za, aggregates them, and publishes every 6 hours). Run it now to seed that file.
Now, some international-only routes. I use IS local DSL, so SAIX DNS queries should go through the SAIX connection, even though the servers are local to ZA.
/etc/network/routes-intl-dsl contains SAIX DNS servers and proxies:
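The format is one network per line; for example (placeholder addresses from the documentation range, not the real SAIX servers):

```
198.51.100.10
198.51.100.11
```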
Now we can tell /etc/network/interfaces about our connections, so that they can be brought up automatically on bootup:
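Something like this (the logical interface names must match your peers file names; mine here are examples):

```
# /etc/network/interfaces additions (illustrative)
auto dsl-local
iface dsl-local inet ppp
    provider dsl-local

auto dsl-intl
iface dsl-intl inet ppp
    provider dsl-intl
```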
For DNS, I use dnsmasq, hardcoded to point to IS & SAIX upstreams. My machine's /etc/resolv.conf just points to this dnsmasq. So, something like:
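In /etc/dnsmasq.conf (the addresses are placeholders, not the real IS and SAIX servers):

```
# Illustrative: substitute your ISPs' real resolvers
no-resolv
server=192.0.2.1    # IS DNS
server=192.0.2.2    # SAIX DNS
```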
If you haven't already, you'll need to turn on ip_forward. Add the following to /etc/sysctl.conf and then run sudo sysctl -p:
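That is:

```
net.ipv4.ip_forward=1
```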
Finally, you'll need masquerading set up in your firewall. Here is a trivial example firewall; put it in /etc/network/if-up.d/firewall and make it executable. You should probably change it to suit your needs or use something else, but this should work:
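A minimal sketch of such a script (interface names are examples):

```sh
#!/bin/sh
# /etc/network/if-up.d/firewall -- trivial illustrative firewall
[ "$IFACE" = "ppp0" ] || [ "$IFACE" = "ppp1" ] || exit 0

# Start from a clean NAT table so re-runs don't duplicate rules
iptables -t nat -F POSTROUTING
iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o ppp1 -j MASQUERADE
```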
My post about repositories wasn't just a little attempt to stave off work, it was part of a larger scheme.
I share the ADSL line in my digs with 3 other people. We do split-routing to save money, but we still have to divide the phone bill at the end of the month. Rather than buy a fixed cap, and have a fight over whose fault it was when we got capped, we are running a pay-per-use system (with local use free, subsidised by me). It means you don't have to restrain yourself for the sake of a common cap, but it also means I need to calculate who owes what.
For the first month, I used my old standby, bandwidthd. It uses pcap to count traffic, and gives you totals and graphs. For simplicity of logging, I gave each person a /28 for their machines and configured static DHCP leases. Then bandwidthd totalled up the internet use for each /28.
This was sub-optimal: bandwidthd can either watch the local network, in which case it can't see which packets went out over which link, or it can watch the international link, but then it can't tell which user is responsible.
I could have installed some netflow utilities at this point, but I wanted to roll my own with the correct Linux approach (ulog) rather than any pcapping. ulogd is the easy ulog solution.
Ulogd can pick up packets that you "-j ULOG" from iptables. It receives them over a netlink interface. You can tell iptables how many bytes of each packet to send, and how many to queue up before sending them. E.g.
will log the first 48 bytes of any incoming packet to netlink group 1, tag the packets as "input", and send them in batches of 50. 48 bytes is usually enough to catch any data you could want from the headers. If you only need the packet size, 4 bytes will do; 20 covers the source and destination addresses as well.
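Written out in full, such a rule looks like this (the chain and prefix are examples):

```sh
iptables -A INPUT -j ULOG --ulog-nlgroup 1 --ulog-prefix input \
         --ulog-cprange 48 --ulog-qthreshold 50
```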
Now, we tell ulogd to listen for this stuff and log it. Ulogd has a pluggable architecture: IPv4 decoding is a plugin, and there are various logging plugins for "-j LOG" emulation, text files, pcap files, MySQL, PostgreSQL, and SQLite. For my purposes, I used MySQL, as the router in question already had MySQL on it (for Cacti). Otherwise, I would have opted for SQLite. Be warned that the etch version of ulogd doesn't automatically reconnect to the MySQL server should the connection break for any reason. I backported the lenny version to etch to get around that. (You also need to provide the
Besides the reconnection issue, the SQL implementations are quite nice. They have a set schema, and you just need to create a table with the columns in it that you are interested in. No other configuration (beyond connection details) is necessary.
My MySQL table:
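Ulogd matches columns by name, so a minimal table along these lines works (illustrative, not my exact schema):

```sql
-- Ulogd only fills in the columns that exist in the table
CREATE TABLE ulog (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    oob_time_sec INT UNSIGNED,      -- packet timestamp
    oob_prefix   VARCHAR(32),       -- the --ulog-prefix tag
    ip_saddr     INT UNSIGNED,      -- source address
    ip_daddr     INT UNSIGNED,      -- destination address
    ip_totlen    SMALLINT UNSIGNED  -- packet length in bytes
);
```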
The relevant parts of my firewall rules:
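Illustrative rules in the same vein (the addresses and interfaces are examples):

```sh
# Forwarded international traffic, tagged per person
iptables -A FORWARD -o ppp1 -s 192.168.1.16/28 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix sr --ulog-cprange 20 --ulog-qthreshold 50
iptables -A FORWARD -o ppp1 -s 192.168.1.32/28 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix fb --ulog-cprange 20 --ulog-qthreshold 50
# Traffic the proxy fetches on my behalf
iptables -A OUTPUT -o ppp1 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix sr-p --ulog-cprange 20 --ulog-qthreshold 50
```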
So, traffic for my /28 (sr) will be counted as sr-p when it goes via the proxy, so I can tally up proxy & forwarded traffic separately. (Yes, I can count traffic with squid too, but doing it all in one place is simpler.) fb is random housemate Foo Bar, and gu is guest traffic (unreserved IP addresses).
You can query the usage this month with for example:
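For example, assuming ulogd's standard column names:

```sql
-- Per-prefix byte totals for the current month
SELECT oob_prefix, SUM(ip_totlen) AS bytes
FROM ulog
WHERE FROM_UNIXTIME(oob_time_sec) >= DATE_FORMAT(NOW(), '%Y-%m-01')
GROUP BY oob_prefix;
```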
Your table will fill up fast. We are averaging around 200 000 rows per day. So obviously some aggregation is in order:
And every night, run something like:
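An illustrative version, with a daily summary table (the names are examples):

```sql
-- One-off: a table of daily per-prefix totals
CREATE TABLE ulog_daily (
    day    DATE,
    prefix VARCHAR(32),
    bytes  BIGINT UNSIGNED
);

-- Nightly: roll up yesterday's raw rows, then discard them
INSERT INTO ulog_daily (day, prefix, bytes)
    SELECT DATE(FROM_UNIXTIME(oob_time_sec)), oob_prefix, SUM(ip_totlen)
    FROM ulog
    WHERE oob_time_sec < UNIX_TIMESTAMP(CURDATE())
    GROUP BY DATE(FROM_UNIXTIME(oob_time_sec)), oob_prefix;

DELETE FROM ulog WHERE oob_time_sec < UNIX_TIMESTAMP(CURDATE());
```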
Finally, I have a simple little PHP script that provides reporting and calculates dues. Done.
Up to now, whenever I've needed a backport or a Debian recompile, I've done it locally. But finally last night, instead of studying for this morning's exam, I decided to do it properly.
The tool for producing a Debian archive tree is reprepro. There are a few howtos out there for it, but none of them quite covered everything I needed, so this is mine. But we'll get to that later; first we need some packages to put up.
For building packages, I decided to do it properly and use pbuilder. Just install it:
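On Debian that's just:

```
apt-get install pbuilder
```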
Make the following changes to
The first, to point to your local mirror, and the second to credit you in the packages.
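For the mirror, that's the MIRRORSITE setting in your pbuilderrc (the URL here is just an example):

```
# Point pbuilder at a nearby mirror
MIRRORSITE=http://ftp.za.debian.org/debian
```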
Then, as root:
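That would be something like (building an etch chroot, in my case):

```
pbuilder create --distribution etch
```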
Now we can build a package; let's build the hello package:
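Along these lines (the .dsc URL and version are examples):

```sh
dget http://ftp.debian.org/debian/pool/main/h/hello/hello_2.4-1.dsc
dpkg-source -x hello_2.4-1.dsc   # if dget didn't already unpack it
cd hello-2.4
debchange -n
```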
dget and debchange are neat little utilities from devscripts. You can configure them to know your name, e-mail address, etc. If you work with Debian packages a lot, you'll get to know them well. Future versions of debchange support --bpo for backports, but we use -n, which means new package. You should edit the version number in the top line to be a backport version, i.e. something like 2.4-1~bpo40+1.
Now, let's build it. We are only doing a backport, but if you were making any changes, you'd do them before the next stage, and list them in the changelog you just edited:
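That is, regenerate the source package and hand it to pbuilder (the version is an example):

```sh
dpkg-source -b hello-2.4
sudo pbuilder build hello_2.4-1~bpo40+1.dsc
```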
Assuming no errors, the built package will be sitting in /var/cache/pbuilder/result.
Now, for the repository:
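The repository is described by a conf/distributions file; mine is along these lines (the names, description, and key ID here are examples):

```
# conf/distributions (illustrative)
Origin: repo.example.com
Label: Example backports
Codename: etch-backports
Suite: etch-backports
Version: 4.0
Architectures: i386 amd64 all source
Components: main
Description: Backports for etch
SignWith: DEADBEEF
NotAutomatic: yes
```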
This file defines your repository. The codename will be the distribution you list in your sources.list, and the version should match it. The architectures are the architectures you are going to carry - "all" refers to non-architecture-specific packages, and source to source packages. I added amd64 to mine. SignWith is the ID of the GPG key you are going to use with this repo. I created a new DSA key for the job. NotAutomatic is a good setting for a backports repo: it means that packages won't be installed from here unless explicitly requested (via
Let's start by importing our source package:
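Something like (the base directory and version are examples):

```
reprepro -b /srv/reprepro -S devel -P optional \
    includedsc etch-backports hello_2.4-1~bpo40+1.dsc
```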
(There is currently a known bug in reprepro's command-line handling: the -S and -P options are swapped.)
Now, let's import our binary package:
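In the same style:

```
reprepro -b /srv/reprepro includedeb etch-backports hello_2.4-1~bpo40+1_i386.deb
```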
Reprepro can be automated with its processincoming command, but that's beyond the scope of this howto.
Test your new repository, add it to your
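For example (the hostname and path are placeholders):

```
deb http://repo.example.com/debian etch-backports main
deb-src http://repo.example.com/debian etch-backports main
```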
Enjoy. My backports repository can be found here.
Even before school, my future interests were clear: I tied up the house with wires and made “electrical gadgets” out of old electrical junk. I remember being given my first battery, light bulbs, and wires. From there it was downhill.
My first computer was an HP 9816. It was a year older than me, had a 68000 processor, 128k of RAM, and an (external) pair of single-sided 3.5” floppy drives.
It had a ROM BASIC board, and a set of VisiCalc floppies (with manual shutters), so I spent my time reading its comprehensive manuals, making mazes in VisiCalc (out of #s), writing games in BASIC, and otherwise abusing the poor machine. It had really fun, easy graphics, which drew slowly enough that you could learn a lot. On the whole, a nice machine — I wish I knew what happened to it and its pile of manuals…
From there, I migrated to a 386 with Hercules graphics and DOS (that I shared with a friend). And eventually, Windows. I toyed with programming in BASIC, Visual Basic, and Pascal, but mostly used my computers for gaming (and messing around with things). Most of the software I wrote around this time was in Psion OPL, on my inherited Series 3a.
I was getting just a little peeved with my MS Windows desktop. When one has a 500MiB HDD, fitting Windows 98, Office, and Visual Studio on it and still having a productive machine is difficult. It was obvious that there were big problems with Windows (and Microsoft software in general). I became very Anti-Microsoft, although I knew of no alternatives and hypocritically stuck with the Microsoft way of life.
At the local computer trade show, my friends and I would paste “Microsoft Sucks!” stickers (provided by a nearby labelling store’s demonstration printers ;-) all over the Microsoft stand. We’d also torment the Microsoft demonstrators and shout support when they asked “Who uses Lotus 1-2-3?” — basically, we were their worst nightmare…
Quite soon after my family capitulated to Internet access, I heard about Linux, and started to read about it online. I avidly read anything I could get my hands on, and tried a few shell accounts (BSD presumably), but never got anywhere near installing it myself.
One day, a computer technician was working on the school office PCs (which I considered to be my domain) and we chatted. He asked me if I used Linux, and offered to get me a CD. I’ve still got it — RedHat 5.1.
I installed it, played around with it for a while, and then abandoned it. For the next couple of years, I would try it again every now and again, especially when I could get my hands on a newer version, but never too seriously, because I didn’t have a decent internet connection, didn’t know how to program in C, and didn’t have any real Linux-using friends. And of course, playing XBill only keeps you entertained for so long…
Later, I got involved in building my school’s Computer Room (from a pile of spare parts and dead PCs, plus the insurance payout for 2 stolen [dead] PCs). I knew that this would be a good place to use Linux, because I could share the dial-up internet connection more reliably, and run a local mail server. It would make much better use of our very limited resources.
So, in the holidays I took the fastest machine home, scavenged some more RAM, and taught myself how to configure everything from scratch.
When I came across the sendmail.cf file, I got really frightened and switched to qmail. The same happened when I looked into BIND, and I used djbdns.
After about 6 months of administering this machine (still RH), I hit my first “dependency hell.” At about this point I was getting involved in our LUG, and Tom gave me a copy of Debian woody — I have never looked back!
Of course the next step was to network my home — this taught me almost everything else that I needed to know to be a Linux admin… I still have the same server that I started with (well same Debian install, case, and motherboard - everything else has died along the way).
With the release of Ubuntu Breezy, I decided that it was worth a look, and installed it on my mother’s LTSP server and my laptop.
This wasn’t all bliss; Ubuntu is still a little rough around the edges (although less so than Debian, and in different places). However, I was pretty happy with it. That doesn’t mean that I run it on my main desktop, but I do on my laptops, and I install it on other people’s machines where possible.
To get a project I’m involved in, ibid, into Debian and Ubuntu, I got started on Debian Development. I am a Debian Developer, maintaining a handful of packages, and do some Universe gardening in Ubuntu.
Now I only use Linux (and only Debian and derivatives). I maintain several networks under the guise of Hybrid, and co-maintain our LUG’s servers (mailing lists, ftp/rsync mirror, and a Freedom Toaster).
I’m very happy with my software choices, and look forward to a Linuxy future :-)