Archives: January 2008

Aggregator noise and growth

For the bloggers on CLUG Park who don’t deign to follow clug-chat or #clug: there have been recent discussions about creating a separate, filtered park for readers with less free time.

The problem is basically that some people post a lot. Sometimes as much as half of the park is dominated by one poster. While this isn’t a problem per se (some people clearly have more blogging time), it means readers have more to wade through, and can feel swamped by the prolific posters. Many would prefer something with a higher signal-to-noise ratio and lower volume.

As communities grow, the signal-to-noise ratio often suffers, and the higher volume is too much for some readers. Rather than lose the readers, we’d like to provide an alternative, filtered park. It’s currently being prepared here. Personally, I’ll still use the old park, as will many other prolific RSS-feed-followers.

What we need is for all the CLUG Parkers to create a “technical” tag, and tag all relevant posts as such. Then send me the URL of your new tag, and I’ll include it in the “park-tech”. (Or assure me that you don’t post too prolifically, and post only tech-related content, and we’ll carry your entire feed.)

Let’s see if we can make it work.

Easy home transparent proxy

Everyone in South Africa wants to save a little more bandwidth, as low traffic caps are the rule of the day (especially if you are hanging off an expensive 3G connection).

While the "correct" thing to do is to use wpad autodetection, and thus politely request that users use your proxy, this isn't always an option:

  • Firefox doesn't Autodetect Proxies by default
  • Autodetection doesn't behave well for many roaming users (Firefox should talk to network-manager)
  • Many programs simply don't support wpad.
  • Your upstream ISP transparently proxies anyway (the norm in ZA), so it's not like we have any end-to-endness to protect.
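
For reference, a wpad.dat is just a proxy auto-config (PAC) file served from a well-known URL on your domain. A minimal sketch, assuming the squid described below lives at 10.1.1.1:8080:

// wpad.dat -- minimal PAC sketch; the proxy address and netmask are assumptions
function FindProxyForURL(url, host) {
    // keep local traffic direct, send everything else via squid
    if (isPlainHostName(host) || isInNet(host, "10.1.1.0", "255.255.255.0"))
        return "DIRECT";
    return "PROXY 10.1.1.1:8080; DIRECT";
}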

So, here's how you do it:

  1. Let's assume your network is 10.1.1.0/24, and the squid box is 10.1.1.1 on eth0
  2. Install squid (aptitude install squid), configure it to have a reasonably large storage pool, give it some sane ACLs, etc. (a sketch of the relevant squid.conf lines follows these steps)
  3. Add http_port 8080 transparent to squid.conf (or http_port 10.1.1.1:8080 transparent if you are using explicit http_port options)
  4. invoke-rc.d squid reload
  5. Add the following to your iptables script:
iptables -t nat -A PREROUTING -i eth0 -s 10.1.1.0/24 -d ! 10.1.1.1 -p tcp --dport 80 -j REDIRECT --to-ports 8080
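
For step 2, the relevant squid.conf lines look something like this. It's only a sketch: the cache size, path and ACL range are assumptions to adjust for your own setup (and it assumes squid 2.6's transparent keyword, as shipped in Debian/Ubuntu of the time):

# /etc/squid/squid.conf (excerpt, sketch only)
http_port 10.1.1.1:8080 transparent
# ~2GB on-disk storage pool, 16 first-level and 256 second-level directories
cache_dir ufs /var/spool/squid 2000 16 256
# only let the local network use the proxy
acl localnet src 10.1.1.0/24
http_access allow localnet
http_access deny all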

If you run squid on your network's default gateway, then you are done. Otherwise, if you have a separate router, you need to do the following on the router:

  1. Add a new transprox table to /etc/iproute2/rt_tables, e.g. 1 transprox
  2. Pick a new netfilter MARK value, e.g. 0x04
  3. Add the following to the router's iptables script:
# Transparent proxy
iptables -t mangle -F PREROUTING
iptables -t mangle -A PREROUTING -i br-lan -s ! 10.1.1.1 -d ! 10.1.1.0/24 -p tcp --dport 80 -j MARK --set-mark 0x04
ip route flush table transprox
ip route add default via 10.1.1.1 table transprox
ip rule del table transprox
ip rule add fwmark 0x04 pref 10 table transprox
  4. Done: test and tail your squid logs
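
For the testing step, the packet counters and the squid access log tell you quickly whether interception is working (the log path below assumes the Debian default):

# on the router: counters on the MARK rule should climb as clients browse
iptables -t mangle -L PREROUTING -v -n
ip rule show
ip route show table transprox
# on the squid box: hits on the REDIRECT rule, and live requests in the log
iptables -t nat -L PREROUTING -v -n
tail -f /var/log/squid/access.log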

The reason we use iproute2 rules rather than an iptables DNAT is that a DNAT rewrites the destination IP before the packet reaches squid, so squid loses the original destination address (think of it as the envelope of an e-mail).
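
In other words, the tempting shortcut on the router would be a rule like the one below, but by the time the packet arrives at squid its destination has already been rewritten to 10.1.1.1, and squid can no longer tell which web server the client originally asked for:

# don't do this on a separate router -- the original destination IP is lost
iptables -t nat -A PREROUTING -i br-lan -s ! 10.1.1.1 -d ! 10.1.1.0/24 -p tcp --dport 80 -j DNAT --to-destination 10.1.1.1:8080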

An alternative solution is to run tinyproxy on the router (with the transparent option, enabled in Ubuntu's build but not Debian's), use the REDIRECT rule above on the router to redirect to tinyproxy, and have it upstream to the squid. But tinyproxy requires some RAM, and on a WRT54 or the like, you don't have any of that to spare...
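
If you do go that route, the tinyproxy side is only a few lines. A sketch, assuming tinyproxy listens on port 8888 on the router's LAN address (10.1.1.254 here is an assumption), with directive names as in the tinyproxy.conf of that era, so double-check against your version:

# /etc/tinyproxy.conf (excerpt, sketch only)
# the router's LAN address is an assumption; adjust to yours
Port 8888
Listen 10.1.1.254
# hand everything on to the squid box
Upstream 10.1.1.1:8080

The REDIRECT rule from the all-in-one setup then points at port 8888 on the router instead of 8080.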

Should you need to temporarily disable this for any reason:

  • With the all-in-one router: iptables -t nat -F PREROUTING
  • With the separate router: iptables -t mangle -F PREROUTING

On Eskom

Anybody who resides anywhere near the Mother City will know about the horrific load shedding we are suffering at the moment. (Actually, I think the whole country may be affected, but I haven’t read any local news recently.)

This means:

  • Massive traffic jams: All traffic lights turn into 4-way stops, and if that wasn’t slow enough, people crash into each other out of anger.
  • At least 2 hours every day of sitting and twiddling your thumbs, while listening to the screech of unhappy UPSs. (Occasionally this overlaps with lunch time.)
  • Having to shout over the roar (and cough through the stench) of generators when you go out to visit any such-equipped businesses.
  • Peaceful, inky-black skies, and no noise from the neighbours’ TV sets at night (if you are lucky enough to be load-shed at night).
  • Cold supper.
  • A flat laptop, unless you make sure to keep it fully charged against such emergencies.
  • Breakage in various systems, when UPSs don’t transfer cleanly, and routers / switches decide to disagree.
  • Various networks (like UCT) become unreachable, because while they have gensets and UPSs, the Telkom equipment connecting them to the outside world doesn’t.
  • And, generally, a very grumpy tumbleweed.

I’ve been frequenting computer suppliers in the last week, and seen an insane number of UPSs first pile up at the dispatch desks, and then vanish. Now is the time to be in the UPS- and genset-selling business.

To make things worse, this morning I decided that I’d have to dismantle my gate motor to get out of the driveway (because nobody knows where the key for the manual-override lever is). After getting halfway, I worked out that it had a backup battery. Duh!

Gate: 1, Eskom: 1, Geek: 0.

Some Wiki Updates

I’ve just spent an afternoon and evening on the local wikis I look after: CLUG, Freedom Toaster, and GeekDinner.

They’ve all been upgraded to MediaWiki 1.11.0, with reCAPTCHA on sign-ups, and OpenID support.

If you are a user of one of these wikis, you can go to Special:OpenIDConvert (CLUG, FT, GeekDinner) to add OpenID to your account.

In the past, the CLUG wiki has had minimal wikispam, because we thought up some clever regexes that blocked spammers from editing. However, spammers would still sign up before they tried to edit. This has left the wiki with over a thousand bogus users. Not that that is a problem in itself, but it becomes a bore when you want to guess somebody’s wikiname to give them “Bureaucrat” status, for example.

So jerith talked himself into coding up a quick SQL query to find all these bogus users, and a Python script to remove them. Any history they’ve had has been assigned to the “Spammer” user, and they have been wiped from the wiki. If, in our zeal, we’ve deleted any legitimate users who’ve simply never edited the wiki, we apologise. Maybe if you contribute something, it won’t happen again… :-)
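
For the curious, the heart of it is a query along these lines. This is a sketch only, not jerith’s actual script, and it assumes a stock MediaWiki schema with no table prefix and a database called wikidb:

# list accounts that have never made an edit -- the candidate spammers
mysql wikidb -e "SELECT user_name FROM user LEFT JOIN revision ON rev_user = user_id WHERE rev_user IS NULL;"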