


Blogaria

Just a collection of random rants that I can't really place anywhere else.

2014-10-06 Custom Squid Proxy Authentication
2014-08-30 CyclicLog
2014-05-02 Playing with Python
2014-04-08 Kalk 1.38
2014-03-18 Closing C++ lecture
2014-02-26 Spring Cleanup
2014-02-26 Gravity error
2014-02-03 More on the knight tour
2014-01-28 The circular knight tour
2014-01-24 A Perl IBAN Checker
2013-10-16 Go2SEPA is live
2013-07-24 AJAXy Fileuploads with JQuery
2013-05-23 The Ticket
2013-04-24 Who is leeching my Facebook
2013-04-15 Time for a compact Coolpix
2013-04-15 Selling my Peavey amp
2013-03-28 encfs Helper Revisited
2013-03-15 c conf 1.17
2013-02-25 Flitsers op de Nederlandse wegen
2013-01-14 OTG Revisited
2013-01-11 Kalk revisited
2012-12-12 God is coming
2012-11-29 Strangely disturbing
2012-11-27 How to sneeze on a motorbike
2012-11-23 Een aanslag uit de toekomst
2012-11-22 More voluntary layoffs
2012-10-17 Browser fingerprint
2012-10-15 Site layout revamped
2012-10-14 Crossroads performing well
2012-10-08 The saddle seat
2012-09-30 Sudo Exploit
2012-09-26 Technomona is live
2012-09-19 A Perl to HTML Prettyprinter
2012-09-17 Playing with Moose
2012-08-22 How do you export photos from Adobe Photoshop Elements
2012-08-21 Security hole at the Empire State Building
2012-07-15 Matrix Sunglasses
2012-07-12 Two Morgans
2012-07-12 Review of the BMW R1200RT Motorbike
2012-06-13 Passwords: maak het hackers moeilijk
2012-06-13 Mijndomein foutje
2012-06-07 YOLNT
2012-06-06 Human Origins
2012-05-23 Me on Facebook
2012-05-16 Voluntary Layoffs
2012-05-07 RBS Bike Tour
2012-04-27 Room with a view
2012-04-23 Browser cookies and Javascript revisited
2012-02-17 Schrodingers Cat
2012-02-14 KPN heeft problemen
2012-01-12 Memebase forever
2012-01-11 Strange squares
2011-12-22 TVV Ondernemingsportaalnl.com zuigt ezel
2011-12-08 Dilbert vs Skype
2011-11-29 The uncanny resilience of bulshytt
2011-11-23 Another silly Trojan attempt
2011-10-29 ACTA is coming our way
2011-10-28 Burgernet in the Netherlands
2011-10-27 Facepalm art
2011-10-26 Do not drag this image
2011-10-22 Off The Grid Challenge
2011-10-12 PI like a boss
2011-10-07 Once upon a time
2011-07-13 Dutch eticket system for trains
2011-07-12 Is Hell exothermic or endothermic
2011-04-27 Optical Illusions
2011-04-19 Odd lyrics
2011-04-16 Band Revival at MON
2011-03-13 Protests in the Middle East and you
2011-03-10 Mac OSX Hotkey for locking your system
2011-02-12 dnspb 0.06 is out
2011-02-08 Would I buy this fridge
2011-02-06 InstaYouth
2011-02-05 The Thinker is back
2011-01-17 Math challenge
2011-01-11 Zero tolerance and zero intelligence
2011-01-05 My interest income in 1991
2011-01-01 Your horoscope by Eddie
2010-12-22 New York City Tours might be half price for you
2010-12-20 Weather Forecast
2010-12-14 World Economy Collapse explained in 3 minutes
2010-12-13 The Salvation Army and its choice of toys
2010-12-08 Elizabeth thinks highly of me
2010-12-06 Should I trust my government with my data
2010-12-05 Announcing dnspb
2010-12-03 Realistic piechart
2010-11-26 Crossroads 2.71 is out
2010-11-24 8 bit Starwars
2010-11-17 Six to eight black men
2010-11-16 Canada wants backdoors and data and everything
2010-11-11 Autumn storm over the Netherlands
2010-10-08 USA wants backdoors to everything
2010-10-05 Sudoku solver in Perl
2010-10-02 Finally wrote up a Syscheck page
2010-09-28 Neon sign fail
2010-09-27 The Renault Eco Team
2010-09-23 Crossroads 2.68 is out
2010-09-20 How to suppress Flash cookies
2010-09-15 Meanwhile on Facebook
2010-09-09 The Yes Men Fix The World
2010-09-07 ed is not dead
2010-08-26 Installing Perl modules in a non root environment
2010-08-22 Magic self levitation
2010-08-20 Google Chrome does not support offline Gmail
2010-08-19 The number 48
2010-08-12 Welsh trout mini HOWTO
2010-08-04 Fooling a NetCache proxy into fetching forbidden files
2010-07-30 The world will end on May 21, 2011
2010-07-28 Hiding or showing a textbox with image animation using JQuery
2010-07-27 Manipulating browser cookies using Javascript
2010-07-25 Survival of the fittest book
2010-07-23 Pastafarians in Spain
2010-07-22 You have two sheep
2010-07-09 Highway bank fire
2010-07-08 Setting up a remote git repository
2010-07-06 Bye bye trusted old Macbook
2010-06-28 John Cleese on Football
2010-06-23 ABN Amro and the Pathetic Customer Service Dept.
2010-06-22 Wally does not like criticism
2010-06-14 Soccermatch Netherlands vs Denmark
2010-06-13 Lazy Cat
2010-06-08 Reading public Buzz using the Google API
2010-06-07 A Personal Letter from Steve Martin
2010-06-05 Sushi Saturday
2010-06-04 Suppressing the Enter key with Javascript
2010-05-31 Temporal spacial anomaly on the Dutch highway
2010-05-23 Greenhost will not log your traffic
2010-05-10 Jarlsberg Webapp Exploits
2010-05-04 A Thought Experiment
2010-05-03 SafeEdit information updated
2010-05-01 Microproxy now supports ftp
2010-04-30 What could get Data angry
2010-04-29 Lego Mindstorm solving the Rubik Cube
2010-04-28 Crossroads 2.65 is out
2010-04-17 Goggomobil in its natural habitat
2010-04-14 Bacon Time
2010-04-11 104 More friends to connect with
2010-04-10 Bacteria infested radio reporter
2010-04-07 The Kubat STAR
2010-03-30 Homework Essay
2010-03-29 C++ mutexes again
2010-03-20 Weird Eyechart
2010-03-15 Microproxy 1.01
2010-03-05 Microproxy
2010-03-03 Sven Kramer and the wrong lane
2010-02-26 Endearing Babe Magnet
2010-02-17 Speed of light measured using chocolate and a microwave
2010-02-17 Never again expires after 65 years
2010-02-16 encfs on the Mac
2010-02-15 Hyves.nl and sexual predators
2010-02-10 Funny textbook
2010-02-09 DNS failing after sleep wake cycle
2010-02-06 Blast from the past
2010-01-28 Simple and straight Perl HTTP::Proxy
2010-01-15 Avatar the Movie
2010-01-08 Slightly NSFW Linux Ad
2010-01-07 WTF
2010-01-05 Stop Software Patents in the EU
2009-12-05 HammerServer 1.02
2009-11-28 Perls Automagical Autoloading
2009-10-07 Office Poster
2009-10-06 The nr 1 Nerdjoke
2009-10-04 WoW Startscript for my Mac
2009-09-27 HammerServer section is online
2009-09-26 The BING HQ
2009-09-26 Digging a WOW Tunnel
2009-06-29 Wee Todd
2009-06-23 The On Off Switch Revisited
2009-06-22 Meatspace
2009-05-30 My old houses
2009-05-11 LOLcats are funny
2009-05-11 Civic Duty WIN
2009-05-10 Vote for the baby, Sky Radio promo FAIL
2009-05-05 My secure data center
2009-02-15 My Valentine is sending me a dot exe
2009-02-05 MacPorts trash: .mp_123456 savefiles cleaning
2009-02-01 Truecrypt 6 on Linux and the ext3 filesystem
2009-01-28 www versus nl.youtube.com
2009-01-27 Songsmith and The Police
2009-01-25 My own Ministry of Silly Walks
2009-01-09 CoolIris Mini HOWTO
2008-11-04 UDP and DNS balancing
2008-11-02 Life in graphs
2008-11-01 Skeined yet?
2008-10-30 New Crossroads on the horizon
2008-10-28 Thread safe or not
2008-10-15 WOW patch 3 on a case sensitive MacOSX filesystem
2008-10-15 Surprising C++ optimizations
2008-10-14 Weird system message
2008-10-08 Data mining against terrorism does not work
2008-09-16 Crossroads at the top of Freshmeat.net
2008-09-09 Stupid spammers at Computable
2008-09-06 Spam prevention with Postfix and Postgrey
2008-09-03 The Gnomish Flying Machine
2008-08-27 Bank customer data on eBay
2008-08-26 Mutexes in C++ Threads
2008-08-22 4M dataloss in the UK last year
2008-08-21 Dropping spam with Postfix and Spamassassin
2008-08-18 Bayes and the War on Photography
2008-08-13 Good marital advice
2008-08-12 Squid proxy for personal usage
2008-08-11 Posix threads in C++
2008-08-09 Crossroads mailing list
2008-08-08 Crossroads 2.00 is out
2008-08-01 Fail Pics
2008-07-14 The Fish Dance
2008-07-01 Big Bother and Massive Data Storage
2008-06-30 MMV One of omitted Unix tools
2008-06-08 Even anonymous breadcrumbs can give you away
2008-05-29 Crossroads in Argentina
2008-05-20 The Party at the Company Outing
2008-05-19 Crossroads 1.80 is out
2008-05-18 Where does technical innovation really come from
2008-05-16 Corporate bs generator
2008-05-15 Even the Vatican has to adapt
2008-05-12 Big Brother is watching your dog
2008-05-09 666 all over the place
2008-04-17 Security and privacy are incompatible
2008-04-16 The Hallmark E Card
2008-04-15 Crossroads Solaris port is out
2008-04-04 Identity theft can cost you dearly
2008-04-03 Crossroads can already do that
2008-03-31 A dangerous safari
2008-03-28 Why some Java J2EE projects are inefficient
2008-03-26 The Hummingbird
2008-03-25 The Easter delusion
2008-03-18 McAfee detects mass hack of 200.000 webpages
2008-03-17 More predictive statistics
2008-03-10 Backwards conclusions even on Slashdot
2008-02-18 A fractal photograph
2008-02-15 Kaprekar revisited
2008-02-14 Kaprekar numbers
2008-02-12 A tale of the criminal ineptitude
2008-02-10 Irritating Selfregistered users in PHPBB
2008-02-08 B2B Spam in the Netherlands
2008-02-06 Surprising iSight Capture
2008-02-05 Breadcrumbs at WickedLasers.com
2008-01-29 iSight Capture Utility
2008-01-28 The Male Brain
2008-01-26 Searching for the next Uri Geller
2008-01-24 Opt in for b2b spam
2008-01-14 Bokito Revisited
2008-01-13 Top Crossroads User
2008-01-12 World of Warcraft Dancing
2008-01-12 Justice dispensed better late than never
2008-01-11 Jeremy Clarkson and Identity Theft
2008-01-10 Terrorism in the Netherlands
2007-12-07 The mind and bodysnatchers are among us
2007-12-05 Bruce Schneier and Hildo
2007-12-04 Bye bye, good Christian soul
2007-12-03 Confusing mail message
2007-11-30 Medion MD 85276 reviewed
2007-11-29 Recent cases of data exposure


2007-11-20 Bayes bites

As I'm re-reading my notes on privacy and huge data collections, a favorite story springs to mind. It's about Bayesian statistics. The idea for the text below is from John A. Paulos' book A Mathematician Reads the Newspaper (I can highly recommend reading this).

Imagine a city of 10 million people, where a brutal crime is committed by one of the citizens. Fortunately, the killer left their fingerprints at the murder scene, so the detectives have a perfect starting point for their investigation.

Another fortunate thing is that the City has collected the fingerprints of all citizens in a huge database, with the purpose of speeding up and helping the fight against crime. Great! One might expect that the murder will be solved swiftly...

So the detectives fire up their systems and let the computer compare the crime scene prints against the prints of all citizens. The computers start crunching numbers, everyone holds their breath, and... finally... after 3 weeks of humming, the computer finds a match. A suspect is arrested and brought to trial.

The court hears all involved parties, and all subject matter specialists. Just to be sure of the evidence, the judge asks an expert: "How good is that fingerprint matching system?" The expert answers: "Your honor, it's the newest-bestest system one can have, all singing and dancing. The chance of error is very small indeed: it will correctly identify a fingerprint in 99.9999% of the cases! There's a chance of only one in a million that the machine says it's a match while in fact it isn't."

The jury is convinced. Despite the fact that the suspect continues to plead not guilty (but hey, that's to be expected), a guilty verdict is returned. The jurors' motivation is: "The defendant simply must be the one who did it. Why, the machine is almost error-safe! The chance of a bad decision is only one in a million. That's good enough for the court. We, the jury, are convicting the defendant."

The truth here is that the suspect probably didn't do it. If the fingerprint matching machine makes an error once in a million comparisons, then statistically it would spit out about 10 suspects, since there are 10 million citizens. The chance that any randomly chosen suspect who is identified by the machine has committed the crime is therefore about 10%. The chance that the John Doe who was arrested and convicted is actually not guilty is about 90%.

The error that was made by the jurors is a typical one. The "one in a million" error chance suggests that the fingerprinting method is pretty accurate. Which it may or may not be - but in any case, that's not the question here. Bayes' theorem deals with questions of the form: "What are the odds that X happens, given the fact that Y is already observed". The "given that" clause is paramount: the jurors should have asked themselves, "what are the odds that the defendant is not guilty given the fact that his fingerprints matched". The answer to this question is straightforward: we expect about 10 people's fingerprints to match, and only one of them is the perp - so there's a 90% chance that the defendant is not guilty. The given-that clause will typically decrease the size of the reference population by selecting out a group with an already known factor.
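To make the "given that" reasoning concrete, here is a minimal Python sketch of the calculation. The function name and the assumption that the true culprit's prints always match (`true_match_rate = 1`) are mine, not part of the story. Worked out exactly, the answer lands near 9% guilty and 91% not guilty, which is the same ballpark as the rounded 10% and 90% used in the text.

```python
from fractions import Fraction

def posterior_guilty(population, false_match_rate, true_match_rate=Fraction(1)):
    """P(guilty | fingerprints match), via Bayes' theorem.

    Assumes exactly one culprit among `population` citizens and that every
    citizen is equally likely to be the culprit before the match is known.
    """
    prior = Fraction(1, population)
    # Total probability that a randomly picked citizen's prints match.
    p_match = true_match_rate * prior + false_match_rate * (1 - prior)
    return true_match_rate * prior / p_match

p = posterior_guilty(10_000_000, Fraction(1, 1_000_000))
print(f"P(guilty | match)     = {float(p):.4f}")
print(f"P(not guilty | match) = {float(1 - p):.4f}")
```

Using `Fraction` keeps the arithmetic exact, so the tiny probabilities don't get lost in floating-point rounding before the final division.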

I always try to illustrate this for myself as a matrix of categories and (sub)populations. We have two factors here: (a) a person is guilty or not, and (b) a person's fingerprints match or they don't. These categories can be illustrated in a simple 2x2 matrix, with a third row and column for totals:

                            Guilty    Not Guilty        Totals
Fingerprints match               1             9            10
Fingerprints don't match         0     9,999,990     9,999,990
Totals                           1     9,999,999    10,000,000

I think that this representation nicely summarizes all of the above, though admittedly it takes some time to get used to such illustrations. The bottom line of this example is that large data collections (such as a database of fingerprints of all citizens) may not be useful at all in determining culprits when used as the sole source of information. On the contrary: "very precise" methods and large datasets may look good but produce very bad results. Every method, however precise, has a chance of producing false positives (see also the Schiphol entry), and even a small chance of false positives will yield quite a large number of distinct cases when the reference population is large enough.

As a last remark: in case you're wondering how precise fingerprint comparisons are, current methods are approx. 98.5% accurate or better, which is way less than the hypothetical 99.9999% used in this example. That's another reason why I think that Japan's plans to fingerprint all foreigners make no sense.
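To get a feel for what that difference in accuracy means, the following Python sketch compares the expected number of innocent matches in the same city of 10 million. It rests on an assumption of mine: that "98.5% accurate" can be read as a 1.5% chance of a false match per comparison, which is a simplification. The function name is likewise just for illustration.

```python
def expected_false_matches(population, false_match_rate):
    """Expected number of innocent people whose prints match anyway."""
    innocents = population - 1  # everyone except the single culprit
    return innocents * false_match_rate

systems = [
    ("hypothetical 99.9999% system", 1e-6),
    ("98.5% system (assumed 1.5% false-match rate)", 0.015),
]
for label, rate in systems:
    n = expected_false_matches(10_000_000, rate)
    print(f"{label}: about {n:,.0f} innocent matches")
```

Under these assumptions the hypothetical system flags roughly 10 innocent citizens, while the 98.5% system flags roughly 150,000, so a match on its own says next to nothing about guilt.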


2007-11-19 Japan starts fingerprinting foreigners
2007-11-14 Privacy, Yahoo and the Strange World
2007-11-14 Privacy, Fall through algorithms, and Securing data
2007-11-07 European airlines to retain data
2007-11-03 BloggEd
2007-10-30 Wilders and Marktplaats.nl
2007-10-28 The goldplated Mac
2007-10-26 More morons
2007-10-26 Dilbert nails it again
2007-10-23 Rough yet funny
2007-10-05 Another silly Trojan mail
2007-10-01 So ugly it is beautiful
2007-09-28 Here is a nickel kid
2007-09-23 Spy Shredder
2007-08-29 Web svn view 1.08
2007-08-24 Caught in THE Process
2007-08-21 Stupid Trojan attack
2007-08-21 Back in 1994
2007-08-20 A girly iPod
2007-08-17 Crossroads for RDP connections
2007-08-15 Firewall art
2007-08-14 jpeginfo
2007-08-13 Good People
2007-08-07 The Real Crossroads
2007-07-30 BBC Documentaries in the Netherlands
2007-07-12 No problems with Crossroads so far
2007-07-11 Politically correct ad nauseam
2007-07-02 Waka Waka Poem
2007-07-02 Voyage of the rubber ducks
2007-06-28 The On Off Switch
2007-06-27 No free lunch
2007-06-25 Crossroads web interface
2007-06-25 Blinkenlights
2007-06-21 There is no silver bullet
2007-06-18 Motto of the week
2007-06-18 Do not feed the troll
2007-06-17 Which programming language are you
2007-06-13 Crossroads support request
2007-06-12 Bokito glasses
2007-06-07 Apache mod_proxy balancer description
2007-06-05 A ticketnumber is not support
2007-06-05 403 Hammertime
2007-06-04 Playground Fun
2007-05-24 Ascii man
2007-05-07 Cannot find the damn server
2007-05-02 The BFG200
2007-04-27 Crossroads Top User
2007-03-30 Crossroads Usage
2007-03-25 The guy with the dark motorhelmet
2007-03-22 The Process and The Result
2007-03-21 Quotes attributed to Jos
2007-03-20 A really nice comment about Crossroads
2007-03-18 Kubat in the air