One neat upgrade in Debian's recent 5.0.0 release was Squid 2.7. In this bandwidth-starved corner of the world, a caching proxy is a nice addition to a network, as it should shave at least 10% off your monthly bandwidth usage. However, the recent rise of CDNs has made many objects that should be highly cacheable, uncacheable.
For example, a YouTube video has a static ID. The same piece of video will always have the same ID, it'll never be replaced by anything else (except a "sorry this is no longer available" notice). But it's served from one of many delivery servers. If I watch it once, it may come from
But the next time it may come from v15.cache.googlevideo.com. And that's not all: the signature parameter is unique (to protect against hot-linking), as are other non-static parameters.
Basically, any proxy will probably refuse to cache it (because of all the parameters) and if it did, it'd be a waste of space because the signature would ensure that no one would ever access that cached item again.
I came across a page on the squid wiki that describes a solution to this.
Squid 2.7 introduces the concept of a storeurl_rewrite_program, which gets a chance to rewrite any URL before storing / accessing an item in the cache. Thus we could rewrite our example file to
We've normalised the URL and kept the only two parameters that matter: the video ID and the itag, which specifies the video quality level.
The squid wiki page I mentioned includes a sample perl script to perform this rewrite. They don't include the itag, and my perl isn't good enough to fix that without making a dog's breakfast of it, so I re-wrote it in Python. You can find it at the end of this post. Each line the rewrite program reads contains a concurrency ID, the URL to be rewritten, and some parameters. We output the concurrency ID and the URL to rewrite to.
The concurrency ID is a way to use a single script to process rewrites from different squid threads in parallel. The documentation on this is almost non-existent, but if you specify a non-zero storeurl_rewrite_concurrency, each request and response will be prepended with a numeric ID. The perl script concatenated this directly before the rewritten URL, but I separate them with a space. Both seem to work. (Bad documentation sucks.)
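A minimal sketch of such a rewriter in Python (the regex, hostnames, and normalised URL here are illustrative, not the exact ones from my script):

```python
import re
import sys

# Illustrative pattern and target URL -- not the exact ones from my script.
VIDEO_RE = re.compile(r'^http://[^/]+\.googlevideo\.com/get_video\?(.*)')

def rewrite(url):
    """Normalise a video URL to a single cache key, keeping only the
    parameters that identify the object: video_id and itag."""
    m = VIDEO_RE.match(url)
    if not m:
        return url
    params = dict(p.split('=', 1) for p in m.group(1).split('&') if '=' in p)
    if 'video_id' not in params:
        return url
    return ('http://video-srv.youtube.com.SQUIDINTERNAL/get_video'
            '?video_id=%s&itag=%s'
            % (params['video_id'], params.get('itag', '')))

def main():
    # Each request: <concurrency-ID> <URL> <other parameters...>
    # We respond with the concurrency ID, a space, and the rewritten URL.
    for line in sys.stdin:
        parts = line.split()
        if len(parts) >= 2:
            sys.stdout.write('%s %s\n' % (parts[0], rewrite(parts[1])))
        else:
            sys.stdout.write(line)
        sys.stdout.flush()

if __name__ == '__main__':
    main()
```

URLs that don't match the pattern are passed through untouched, so the script is safe to run on all traffic.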
All that's left is to tell Squid to use this, and to override the caching rules on these URLs.
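Something along these lines in squid.conf (the script path and ACL pattern are examples):

```
# Illustrative squid.conf fragment
storeurl_rewrite_program /usr/local/bin/storeurl-rewrite.py
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10

# Only send video requests through the rewriter
acl store_rewrite_list urlpath_regex \/get_video\?
storeurl_access allow store_rewrite_list
storeurl_access deny all

# Cache the videos aggressively, ignoring client reloads
refresh_pattern get_video\? 10080 90% 43200 ignore-reload
```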
Done. And it seems to be working relatively well. If only I'd set this up last year, when I had pesky house-mates watching YouTube all day ;-)
It should of course be noted that doing this instructs your Squid Proxy to break rules.
ignore-reload violates guarantees that the HTTP standards provide the browser and web-server about their communication with each other.
They are relatively benign changes, but illegal nonetheless.
And it goes without saying that rewriting the URLs of stored objects could cause some major breakage, by treating different objects (with different URLs) as the same. The provided regexes seem sane enough that this shouldn't happen, but YMMV.
My post on split-routing on OpenWRT has been incredibly popular, and led many people to implement split-routing, whether or not they had OpenWRT. While it's fun to leave the port as an exercise for the reader, it meant I had to help lots of newbies through porting that setup to a Debian / Ubuntu environment. To save myself some time, here's how I do it on Debian:
Background, especially for non-South African readers: Bandwidth in South Africa is ridiculously expensive, especially international bandwidth. The point of this exercise is that we can buy "local-only" DSL accounts which only connect to South African networks. E.g. I have an account that gives me 30GB of local traffic / month, for the same cost as a 2.5GB international traffic account. Normally you'd change the username and password on your router to switch accounts when you wanted to do something like a Debian apt upgrade, but that's irritating. There's no reason why you can't have a Linux-based router concurrently connected to both accounts via the same ADSL line.
Firstly, we have a DSL modem. Doesn't matter what it is, it just has to support bridged mode. If it won't work without a DSL account, you can use the Telkom guest account. My recommendation for a modem is to buy a Telkom-branded Billion modem (because Telkom sells everything with really big chunky, well-surge-protected power supplies).
For the sake of this example, we have the modem (IP 10.0.0.2/24) plugged into eth0 on our server, which is running Debian or Ubuntu, doesn't really matter much - personal preference. The modem has DHCP turned off, and we have our PCs on the same ethernet segment as the modem. Obviously this is all trivial to change.
You need these packages installed:
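At minimum, pppd and the iproute tools:

```
apt-get install ppp iproute
```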
You need ppp interfaces for your providers. I created
unit 1 makes a connection always bind to "ppp1". Everything else is pretty standard. Note that only the international connection forces a default route.
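A sketch of what such peers files contain (the file names, account names, and the ipparam labels are examples, not my real ones):

```
# /etc/ppp/peers/dsl-local (illustrative)
plugin rp-pppoe.so eth0
user "local-account@isp"
unit 0
persist
maxfail 0
ipparam local-dsl

# /etc/ppp/peers/dsl-intl (illustrative)
plugin rp-pppoe.so eth0
user "intl-account@isp"
unit 1
defaultroute
persist
maxfail 0
ipparam intl-dsl
```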
In /etc/ppp/pap-secrets I added my username and password combinations:
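In the usual pap-secrets format (account names and passwords here are placeholders):

```
# client               server  secret
"local-account@isp"    *       "secret1"
"intl-account@isp"     *       "secret2"
```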
You need custom iproute2 routing tables for each interface, for the source routing. This will ensure that incoming connections get responded to out of the correct interface. As your provider only lets you send packets from your assigned IP address, you can't send packets with the international address out of the local interface. We get around that with multiple routing tables. Add these lines to
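The file is /etc/iproute2/rt_tables; the numbers are arbitrary and the names are examples, but they must match whatever your ip-up script uses:

```
100 local-dsl
101 intl-dsl
```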
Now for some magic. I create /etc/ppp/ip-up.d/20routing to set up routes when a connection comes up:
That script loads routes from /etc/network/routes-local-dsl. It also sets up source routing so that incoming connections work as expected.
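A sketch of the shape of that script, assuming the peers files set an ipparam naming each connection (Debian's pppd exports it to ip-up.d scripts as PPP_IPPARAM, along with PPP_IFACE and PPP_LOCAL):

```sh
#!/bin/sh
# /etc/ppp/ip-up.d/20routing -- illustrative sketch
ROUTES=/etc/network/routes-$PPP_IPPARAM   # e.g. /etc/network/routes-local-dsl
TABLE=$PPP_IPPARAM                        # matching table name in rt_tables

[ -f "$ROUTES" ] || exit 0

# Load this provider's routes into the main table
while read network; do
    [ -n "$network" ] && ip route replace "$network" dev "$PPP_IFACE"
done < "$ROUTES"

# Source routing: traffic from this connection's address leaves via it
ip route replace default dev "$PPP_IFACE" table "$TABLE"
ip rule add from "$PPP_LOCAL" table "$TABLE"
```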
Now, we need those route files to exist and contain something useful. Create the script /etc/cron.daily/za-routes (and make it executable):
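The shape of the script (the download URL here is a placeholder, not the real one):

```sh
#!/bin/sh
# /etc/cron.daily/za-routes -- illustrative; substitute the real URL
set -e
wget -q -O /etc/network/routes-local-dsl.new \
    http://example.com/za-routes
# Only replace the file if we fetched something non-empty
[ -s /etc/network/routes-local-dsl.new ] && \
    mv /etc/network/routes-local-dsl.new /etc/network/routes-local-dsl
```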
It downloads the routes file from cocooncrash's site (he gets them from local-route-server.is.co.za, aggregates them, and publishes every 6 hours). Run it now to seed that file.
Now, some international-only routes. I use IS local DSL, so SAIX DNS queries should go through the SAIX connection, even though the servers are local to ZA.
/etc/network/routes-intl-dsl contains SAIX DNS servers and proxies:
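The format is one network per line; for example (placeholder addresses from the documentation range, not the real SAIX servers):

```
198.51.100.10
198.51.100.11
```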
Now we can tell /etc/network/interfaces about our connections, so that they can be brought up automatically on bootup:
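Something like this (the logical interface names must match your peers file names; mine here are examples):

```
# /etc/network/interfaces additions (illustrative)
auto dsl-local
iface dsl-local inet ppp
    provider dsl-local

auto dsl-intl
iface dsl-intl inet ppp
    provider dsl-intl
```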
For DNS, I use dnsmasq, hardcoded to point to IS & SAIX upstreams. My machine's /etc/resolv.conf just points to this dnsmasq. So, something like:
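In /etc/dnsmasq.conf (the addresses are placeholders, not the real IS and SAIX servers):

```
# Illustrative: substitute your ISPs' real resolvers
no-resolv
server=192.0.2.1    # IS DNS
server=192.0.2.2    # SAIX DNS
```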
If you haven't already, you'll need to turn on ip_forward. Add the following to /etc/sysctl.conf and then run sudo sysctl -p:
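That is:

```
net.ipv4.ip_forward=1
```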
Finally, you'll need masquerading set up in your firewall. Here is a trivial example firewall; put it in /etc/network/if-up.d/firewall and make it executable. You should probably change it to suit your needs or use something else, but this should work:
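A minimal sketch of such a script (interface names are examples):

```sh
#!/bin/sh
# /etc/network/if-up.d/firewall -- trivial illustrative firewall
[ "$IFACE" = "ppp0" ] || [ "$IFACE" = "ppp1" ] || exit 0

# Start from a clean NAT table so re-runs don't duplicate rules
iptables -t nat -F POSTROUTING
iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o ppp1 -j MASQUERADE
```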
My post about repositories wasn't just a little attempt to stave off work, it was part of a larger scheme.
I share the ADSL line in my digs with 3 other people. We do split-routing to save money, but we still have to divide the phone bill at the end of the month. Rather than buy a fixed cap, and have a fight over whose fault it was when we got capped, we are running a pay-per-use system (with local use free, subsidised by me). It means you don't have to restrain yourself for the sake of a common cap, but it also means I need to calculate who owes what.
For the first month, I used my old standby, bandwidthd. It uses pcap to count traffic, and gives you totals and graphs. For simplicity of logging, I gave each person a /28 for their machines and configured static DHCP leases. Then bandwidthd totalled up the internet use for each /28.
This was sub-optimal: bandwidthd can either watch the local network, in which case it can't see which packets went out over which link, or it can watch the international link, but then it can't tell which user is responsible.
I could have installed some netflow utilities at this point, but I wanted to roll my own with the correct Linux approach (ulog) rather than any pcapping. ulogd is the easy ulog solution.
Ulogd can pick up packets that you "-j ULOG" from iptables. It receives them over a netlink interface. You can tell iptables how many bytes of each packet to send, and how many to queue up before sending them. E.g.
will log the first 48 bytes of any incoming packet to netlink group 1, tag the packets as "input", and send them in batches of 50. 48 bytes is usually enough to catch any data you could want from the headers. If you only need the packet size, 4 bytes will do; 20 covers the source and destination addresses as well.
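Written out in full, such a rule looks like this (the chain and prefix are examples):

```sh
iptables -A INPUT -j ULOG --ulog-nlgroup 1 --ulog-prefix input \
         --ulog-cprange 48 --ulog-qthreshold 50
```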
Now, we tell ulogd to listen for this stuff and log it. Ulogd has a pluggable architecture: IPv4 decoding is a plugin, and there are various logging plugins for "-j LOG" emulation, text files, pcap files, MySQL, PostgreSQL, and SQLite. For my purposes, I used MySQL, as the router in question already had MySQL on it (for Cacti). Otherwise, I would have opted for SQLite. Be warned that the etch version of ulogd doesn't automatically reconnect to the MySQL server should the connection break for any reason. I backported the lenny version to etch to get around that. (You also need to provide the
Besides the reconnection issue, the SQL implementations are quite nice. They have a set schema, and you just need to create a table with the columns in it that you are interested in. No other configuration (beyond connection details) is necessary.
My MySQL table:
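Ulogd matches columns by name, so a minimal table along these lines works (illustrative, not my exact schema):

```sql
-- Ulogd only fills in the columns that exist in the table
CREATE TABLE ulog (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    oob_time_sec INT UNSIGNED,      -- packet timestamp
    oob_prefix   VARCHAR(32),       -- the --ulog-prefix tag
    ip_saddr     INT UNSIGNED,      -- source address
    ip_daddr     INT UNSIGNED,      -- destination address
    ip_totlen    SMALLINT UNSIGNED  -- packet length in bytes
);
```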
The relevant parts of my firewall rules:
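Illustrative rules in the same vein (the addresses and interfaces are examples):

```sh
# Forwarded international traffic, tagged per person
iptables -A FORWARD -o ppp1 -s 192.168.1.16/28 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix sr --ulog-cprange 20 --ulog-qthreshold 50
iptables -A FORWARD -o ppp1 -s 192.168.1.32/28 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix fb --ulog-cprange 20 --ulog-qthreshold 50
# Traffic the proxy fetches on my behalf
iptables -A OUTPUT -o ppp1 -j ULOG \
    --ulog-nlgroup 1 --ulog-prefix sr-p --ulog-cprange 20 --ulog-qthreshold 50
```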
So, traffic for my /28 (sr) will be counted as sr-p when it goes via the proxy, so I can tally up proxy & forwarded traffic separately. (Yes, I can count traffic with squid too, but doing it all in one place is simpler.) fb is random housemate Foo Bar, and gu is guest traffic (unreserved IP addresses).
You can query the usage this month with for example:
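For example, assuming ulogd's standard column names:

```sql
-- Per-prefix byte totals for the current month
SELECT oob_prefix, SUM(ip_totlen) AS bytes
FROM ulog
WHERE FROM_UNIXTIME(oob_time_sec) >= DATE_FORMAT(NOW(), '%Y-%m-01')
GROUP BY oob_prefix;
```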
Your table will fill up fast. We are averaging around 200 000 rows per day. So obviously some aggregation is in order:
And every night, run something like:
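An illustrative version, with a daily summary table (the names are examples):

```sql
-- One-off: a table of daily per-prefix totals
CREATE TABLE ulog_daily (
    day    DATE,
    prefix VARCHAR(32),
    bytes  BIGINT UNSIGNED
);

-- Nightly: roll up yesterday's raw rows, then discard them
INSERT INTO ulog_daily (day, prefix, bytes)
    SELECT DATE(FROM_UNIXTIME(oob_time_sec)), oob_prefix, SUM(ip_totlen)
    FROM ulog
    WHERE oob_time_sec < UNIX_TIMESTAMP(CURDATE())
    GROUP BY DATE(FROM_UNIXTIME(oob_time_sec)), oob_prefix;

DELETE FROM ulog WHERE oob_time_sec < UNIX_TIMESTAMP(CURDATE());
```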
Finally, I have a simple little PHP script that provides reporting and calculates dues. Done.
Up to now, whenever I've needed a backport or a Debian recompile, I've done it locally. But finally last night, instead of studying for this morning's exam, I decided to do it properly.
The tool for producing a Debian archive tree is reprepro. There are a few howtos out there for it, but none of them quite covered everything I needed, so this is mine. But we'll get to that later; first we need some packages to put up.
For building packages, I decided to do it properly and use pbuilder. Just install it:
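On Debian that's just:

```
apt-get install pbuilder
```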
Make the following changes to
The first, to point to your local mirror, and the second to credit you in the packages.
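For the mirror, that's the MIRRORSITE setting in your pbuilderrc (the URL here is just an example):

```
# Point pbuilder at a nearby mirror
MIRRORSITE=http://ftp.za.debian.org/debian
```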
Then, as root:
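That would be something like (building an etch chroot, in my case):

```
pbuilder create --distribution etch
```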
Now we can build a package; let's build the hello package:
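Along these lines (the .dsc URL and version are examples):

```sh
dget http://ftp.debian.org/debian/pool/main/h/hello/hello_2.4-1.dsc
dpkg-source -x hello_2.4-1.dsc   # if dget didn't already unpack it
cd hello-2.4
debchange -n
```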
dget and debchange are neat little utilities from devscripts. You can configure them to know your name, e-mail address, etc. If you work with Debian packages a lot, you'll get to know them well. Future versions of debchange support --bpo for backports, but we use -n, which means new package. You should edit the version number in the top line to be a backport version, i.e. something like 2.4-1~bpo40+1.
Now, let's build it. We are only doing a backport, but if you were making any changes, you'd do them before the next stage, and list them in the changelog you just edited:
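That is, regenerate the source package and hand it to pbuilder (the version is an example):

```sh
dpkg-source -b hello-2.4
sudo pbuilder build hello_2.4-1~bpo40+1.dsc
```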
Assuming no errors, the built package will be sitting in /var/cache/pbuilder/result.
Now, for the repository:
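The repository is described by a conf/distributions file; mine is along these lines (the names, description, and key ID here are examples):

```
# conf/distributions (illustrative)
Origin: repo.example.com
Label: Example backports
Codename: etch-backports
Suite: etch-backports
Version: 4.0
Architectures: i386 amd64 all source
Components: main
Description: Backports for etch
SignWith: DEADBEEF
NotAutomatic: yes
```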
This file defines your repository. The codename will be the distribution you list in your sources.list, and the version should match it. The architectures are the architectures you are going to carry - "all" refers to non-architecture-specific packages, and source to source packages. I added amd64 to mine. SignWith is the ID of the GPG key you are going to use with this repo. I created a new DSA key for the job. NotAutomatic is a good setting for a backports repo: it means that packages won't be installed from here unless explicitly requested (via
Let's start by importing our source package:
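Something like (the base directory and version are examples):

```
reprepro -b /srv/reprepro -S devel -P optional \
    includedsc etch-backports hello_2.4-1~bpo40+1.dsc
```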
(There is currently a known bug in reprepro's command-line handling: the -S and -P options are swapped.)
Now, let's import our binary package:
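In the same style:

```
reprepro -b /srv/reprepro includedeb etch-backports hello_2.4-1~bpo40+1_i386.deb
```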
Reprepro can be automated with its processincoming command, but that's beyond the scope of this howto.
Test your new repository, add it to your
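For example (the hostname and path are placeholders):

```
deb http://repo.example.com/debian etch-backports main
deb-src http://repo.example.com/debian etch-backports main
```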
Enjoy. My backports repository can be found here.
Even before school, my future interests were clear: I tied up the house with wires and made “electrical gadgets” out of old electrical junk. I remember being given my first battery, light bulbs, and wires. From there it was downhill.
My first computer was an HP 9816. It was a year older than me, had a 68000 processor, 128k of RAM, and an (external) pair of single-sided 3.5” floppy drives.
It had a ROM BASIC board, and a set of VisiCalc floppies (with manual shutters), so I spent my time reading its comprehensive manuals, making mazes in VisiCalc (out of #s), writing games in BASIC, and otherwise abusing the poor machine. It had really fun, easy graphics, which drew slowly enough that you could learn a lot. On the whole, a nice machine — I wish I knew what happened to it and its pile of manuals…
From there, I migrated to a 386 with Hercules graphics and DOS (that I shared with a friend). And eventually, Windows. I toyed with programming in BASIC, Visual Basic, and Pascal, but mostly used my computers for gaming (and messing around with things). Most of the software I wrote around this time was in Psion OPL, on my inherited Series 3a.
I was getting just a little peeved with my MS Windows desktop. When one has a 500MiB HDD, fitting Windows 98, Office, and Visual Studio on it and still having a productive machine is difficult. It was obvious that there were big problems with Windows (and Microsoft software in general). I became very Anti-Microsoft, although I knew of no alternatives and hypocritically stuck with the Microsoft way of life.
At the local computer trade show, my friends and I would paste “Microsoft Sucks!” stickers (provided by a nearby labelling store’s demonstration printers ;-) all over the Microsoft stand. We’d also torment the Microsoft demonstrators and shout support when they asked “Who uses Lotus 1-2-3?” — basically, we were their worst nightmare…
Quite soon after my family capitulated to Internet access, I heard about Linux, and started to read about it online. I avidly read anything I could get my hands on, and tried a few shell accounts (BSD presumably), but never got anywhere near installing it myself.
One day, a computer technician was working on the school office PCs (which I considered to be my domain) and we chatted. He asked me if I used Linux, and offered to get me a CD. I’ve still got it — RedHat 5.1.
I installed it, played around with it for a while, and then abandoned it. For the next couple of years, I would try it again every now and again, especially when I could get my hands on a newer version, but never too seriously, because I didn’t have a decent internet connection, didn’t know how to program in C, and didn’t have any real Linux-using friends. And of course, playing XBill only keeps you entertained for so long…
Later, I got involved in building my school’s Computer Room (from a pile of spare parts and dead PCs, plus the insurance payout for 2 stolen [dead] PCs). I knew that this would be a good place to use Linux, because I could share the dial-up internet connection more reliably, and run a local mail server. It would make much better use of our very limited resources.
So, in the holidays I took the fastest machine home, scavenged some more RAM, and taught myself how to configure everything from scratch.
When I came across the sendmail.cf file, I got really frightened and switched to qmail. The same happened when I looked into BIND, and I used djbdns.
After about 6 months of administering this machine (still RH), I hit my first “dependency hell.” At about this point I was getting involved in our LUG, and Tom gave me a copy of Debian woody — I have never looked back!
Of course the next step was to network my home — this taught me almost everything else that I needed to know to be a Linux admin… I still have the same server that I started with (well same Debian install, case, and motherboard - everything else has died along the way).
With the release of Ubuntu Breezy, I decided that it was worth a look, and installed it on my mother’s LTSP server and my laptop.
This wasn’t all bliss; Ubuntu is still a little rough around the edges (although less so than Debian, and in different places). However, I was pretty happy with it. That doesn’t mean that I run it on my main desktop, but I do on my laptops, and I install it on other people’s machines where possible.
To get a project I’m involved in, ibid, into Debian and Ubuntu, I got started on Debian Development. I am a Debian Developer, maintaining a handful of packages, and do some Universe gardening in Ubuntu.
Now I only use Linux (and only Debian and derivatives). I maintain several networks under the guise of Hybrid, and co-maintain our LUG’s servers (mailing lists, ftp/rsync mirror, and a Freedom Toaster).
I’m very happy with my software choices, and look forward to a Linuxy future :-)