2007-11-14 Privacy, Fall through algorithms, and Securing data
Today I was discussing data privacy with my good buddy and
colleague Eddie. We were talking about the idea of hashing
sensitive information and storing the hash, instead of storing the
actual plain-text information - an idea that I suggested in my
previous note on Yahoo and data
privacy. Is storing hashes instead of plain data feasible? When
can it be used, and when not? Our discussion prompted me to elaborate
a bit.
What's a hash value anyway?
The principle works as follows. One can convert text to a value
by chopping the text into chunks and feeding them into some algorithm
which is non-reversible. That means that if you have the text, you can
compute the value - but not the other way 'round.
Imagine the following hash function: If you want to compute
the hash value of a name, then take each letter of that
name, where a=1, b=2 and so on, and add these values. The result is
the hash of that name. This very simple hash function illustrates
the concept: "karel" would become 11+1+18+5+12=47, while "eddie" would
become 27. But knowing only the number 47, it would be impossible to
reconstruct my first name. So the algorithm is truly one-way.
What good is storing a hash instead of the original data?
So based on this algorithm, one could distinguish two names
without actually knowing them, but only knowing the hash. E.g., let's
say that both Eddie and I register at some website. I am mainly
interested in music and build up my profile accordingly, while Eddie
is interested in tech news stories. Now let's say that the website
owner is so privacy-aware that he decides not to store the usernames
in a database, but only the hashes. I can still log on using "karel"
and Eddie can log on using "eddie", but the first thing the website
will do is convert my name to "number 47" and Eddie's to "number
27". Then the website looks up a personal profile based on the hash
number, and shows corresponding data on the site. So profile number 47
will display playlists of music, and profile number 27 will display
the latest tech news. The trick here is that the actual user name only
exists on the login page, where I enter "karel" and Eddie enters
"eddie". Beyond that login page, the user name no longer exists,
only the hash!
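The login flow described above can be sketched in a few lines of Perl. This is only an illustration under assumptions: the %profiles store, the login routine and the profile texts are made up for the example, and the toy letter-sum hash stands in for a real hash function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy letter-sum hash from the text: a=1, b=2, and so on.
sub ihash {
    my ($name) = @_;
    my $h = 0;
    for my $c (split //, lc $name) {
        $h += ord($c) - ord('a') + 1;
    }
    return $h;
}

# Hypothetical profile store, keyed by hash. The user name itself
# is never stored.
my %profiles;
$profiles{ ihash('karel') } = 'music playlists';
$profiles{ ihash('eddie') } = 'tech news';

# The login page hashes the typed name right away; everything
# beyond this point only ever sees the hash.
sub login {
    my ($typed_name) = @_;
    my $id = ihash($typed_name);
    return sprintf('no profile for hash %d', $id)
        unless exists $profiles{$id};
    return sprintf('profile %d: %s', $id, $profiles{$id});
}

print login($_), "\n" for qw(karel eddie);
```

Running it shows profile 47 with the playlists and profile 27 with the tech news, while the names "karel" and "eddie" appear nowhere in the stored data.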
No problems so far. But what is it good for? Well, one important
aspect is that my username is not stored in the site's list of
profiles. So a malevolent sysop there cannot get at my username and
log in using my account! Since only number "47" is known there, the
malevolent sysop has no way of knowing that he must type "karel" to
steal my identity.
But wait! This hash algorithm isn't too safe, is it? Number 47
can also be constructed using four j's (four times 10) and one g
(7). So user name "jjjjg" would also access my profile. And what if my
evil twin Lerak decided to register? He would also hit the same hash!
Accidentally hitting the same hash is called a
'collision'. Obviously a good hash function should not only be
one-way, but should also avoid collisions. There are very good hash
functions out there, such as the SHA family. However, none of these
are guaranteed to avoid collisions.
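To see the collisions of the toy hash in action, here is a small sketch that groups a few names by their letter-sum hash. The list of names is just the one used in the text; the %seen bookkeeping is an assumption made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy letter-sum hash from the text: a=1, b=2, and so on.
sub ihash {
    my ($name) = @_;
    my $h = 0;
    for my $c (split //, lc $name) {
        $h += ord($c) - ord('a') + 1;
    }
    return $h;
}

# Group names by hash value: hash => list of names that produce it.
my %seen;
push @{ $seen{ ihash($_) } }, $_ for qw(karel jjjjg lerak eddie);

# Any hash value with more than one name is a collision.
for my $h (sort { $a <=> $b } keys %seen) {
    my @names = @{ $seen{$h} };
    printf "%d: %s%s\n", $h, join(', ', @names),
        @names > 1 ? '  <- collision' : '';
}
```

The names "karel", "jjjjg" and "lerak" all end up under 47, while "eddie" sits alone under 27.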
Is there a practical application for this?
When I purchase items over the Internet, I often use my last name and
account number for billing. The account number in the Netherlands
consists of nine digits (e.g., 123456789). I enter the name and account
number at the site where I'm purchasing an item, and the site then
connects to my bank to see if my balance is sufficient for the
transaction. Furthermore the site will surely store my last
name and account number, so that I don't have to retype it during a
next purchase. They'll do anything to improve the "user experience"
and to entice me into visiting them again.
Two things are happening here: One, when I'm purchasing something
for ten euros, the site asks my bank: "Is the balance of Kubat, with
account number 123456789, sufficient for a transaction of 10 euros?"
Two, the last name Kubat and the account number 123456789 are stored
in some database of the site.
There are a number of dangers lurking here. First, someone may be
eavesdropping on the chitchat between the site and my bank. They can
find out that the combination Kubat/123456789 is valid for making
purchases, and start making purchases in my name! Second, any
malevolent employee of the site can find this in the site's database
and also misuse my identity for their profit.
Obviously I don't want that to happen. So we can take a number of
measures - e.g., secure transactions between the site and my bank, and
restrict database access for all but a few site employees.
Unfortunately, however well designed the security measures may be,
there will always be ways around them. And there is still the basic
question - do the data need to be there in the first place?
So, why don't we use hashes instead of plain text data?
- The sha512 value of my last name "Kubat" is
UlieNqKVJFYPlgu7ZAMUl0J5SG5ZV7nKtF9AK+fuuV/K3uljp0Glj2CyUFvC3N1GCV7SgxBmxkTOmCM+EeIGeA.
- When I log onto the site, I state my last name just once - on
the login page. Beyond that, my last name just isn't available
anymore, only the above hash.
- The hash of my account number "123456789" is
2eZ2LdHI6vbWGzxhkvxAjU1tXxF20MKRabwk5xw/J0rSf81YEbMT1oH35V7ALXPUmclUVba1u1A6z1dPuo/+hQ. I'd
have to enter my account number just once, when making the first
purchase. After that, the site's database would store that "Person
UlieNqKVJFYPlgu7ZAMUl0J5SG5ZV7nKtF9AK+fuuV/K3uljp0Glj2CyUFvC3N1GCV7SgxBmxkTOmCM+EeIGeA
has account
2eZ2LdHI6vbWGzxhkvxAjU1tXxF20MKRabwk5xw/J0rSf81YEbMT1oH35V7ALXPUmclUVba1u1A6z1dPuo/+hQ".
- When I purchase an item, the site would ask my bank: "Does person
UlieNqKVJFYPlgu7ZAMUl0J5SG5ZV7nKtF9AK+fuuV/K3uljp0Glj2CyUFvC3N1GCV7SgxBmxkTOmCM+EeIGeA
with account
2eZ2LdHI6vbWGzxhkvxAjU1tXxF20MKRabwk5xw/J0rSf81YEbMT1oH35V7ALXPUmclUVba1u1A6z1dPuo/+hQ
have a balance high enough to cover ten euros?"
- My bank cannot reverse the supplied hashes to my name and
account number; no-one can. But that's not necessary! My bank has a
full list of customers and account numbers. They can pre-generate
lists of hashes, and compare the two hashes that the site sends with
their list to find my data.
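The bank's side of the steps above could look roughly like this. It is a sketch under assumptions, not a description of any real banking interface: the customer list, the %by_hash table and the lookup routine are invented for the example, and the second customer is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha512_base64);

# Hypothetical customer list as the bank holds it: [last name, account].
my @customers = (
    [ 'Kubat', '123456789' ],
    [ 'Smith', '987654321' ],
);

# Pre-generate the lookup table: "name hash|account hash" => customer.
# The '|' separator is safe because it is not in the base64 alphabet.
my %by_hash;
for my $c (@customers) {
    my ($name, $acc) = @$c;
    $by_hash{ sha512_base64($name) . '|' . sha512_base64($acc) } = $c;
}

# Resolve a request that carries only the two hashes.
# Returns undef when the combination is unknown.
sub lookup {
    my ($name_hash, $acc_hash) = @_;
    return $by_hash{"$name_hash|$acc_hash"};
}

# What the site would send: hashes only, never the plain values.
my $hit = lookup(sha512_base64('Kubat'), sha512_base64('123456789'));
print $hit
    ? "bank resolved the request to $hit->[0] / $hit->[1]\n"
    : "unknown customer\n";
```

The site never needs the plain name or account number after the first entry; only the bank, which has the data anyway, can map the hashes back to a customer.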
Much better. Since the hashes aren't reversible to values, the
security risks have magically disappeared. But what about collisions?
Fortunately we are quite able to verify whether collisions occur. They
don't - each account number, ranging from 000000000 to 999999999, yields
its own unique hash.
And consider the following real-life situation. Many people
purchase items over the Internet and pay with their credit card. Each
transaction needs to be verified with a credit card company, where a
website might ask: "Is credit card holder A. Smith, with card number
1111222233334444 and verification code 123 and expiry date 02/10, good
for 20 dollars?" Any malevolent person only needs to get hold of these
data; once they have them, they are free to roam Internet sites and
purchase items using Smith's stolen identity.
In contrast, imagine that the identifying data were a hash of the
combined values. In this case we could build up one long identifier,
consisting of "Smith,1111222233334444,123,02/10", compute the hash
(which is incidentally
"svzW3IjYHVr+sFj85FVXyZmMHrtcPMSJdZoTb9BXOjSfoEOdfZYeGYjSlMCbBcaPheYS1yiIMqu7ox+ICxjoIw")
and use that -again- in the following way:
- The hash would be computed just once, when the user registers
at the site. After that, the original data would be discarded, and
only the hash would be stored in the site's database.
- Verification of the transaction for 20 dollars would only
transmit the hash, not the separate data.
All wannabe identity thieves who are looking for a quick buck: good
luck extracting the separate name, credit card number, verification
code and card expiry date from this hash.
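The site's side of this could be sketched as follows. Everything here is an assumption for illustration: the %site_db store, the "handle" under which the site remembers a customer, and the shape of the verification request are all made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha512_base64);

# Hypothetical site database: customer handle => identifier hash.
my %site_db;

# Called once at registration. The separate values are combined into
# one long identifier and hashed; only the hash is kept.
sub register {
    my ($handle, $name, $card, $code, $expiry) = @_;
    $site_db{$handle} = sha512_base64(join ',', $name, $card, $code, $expiry);
    return;
}

# Build a verification request: it carries the hash and the amount,
# never the name, card number, code or expiry date.
sub verification_request {
    my ($handle, $amount) = @_;
    return { id => $site_db{$handle}, amount => $amount };
}

register('customer-1', 'Smith', '1111222233334444', '123', '02/10');
my $req = verification_request('customer-1', 20);
print "ask the card company: is $req->{id} good for $req->{amount} dollars?\n";
```

The credit card company would match the hash against its own pre-generated list, in the same way as the bank in the earlier example.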
So where and how can this approach be used?
The hash approach is suitable for all situations where a requestor
asks some central service a question concerning a person, or indeed,
anything that's identified by some data, such as in the previous
examples. The approach is not suitable in situations where a
requestor asks a generic question without supplying an identifier.
So for example, if websites were allowed to ask a credit card
company: "Give me a list of all your customers whose balance is high
enough to make a $20 purchase", then this approach couldn't be used -
there is no "identifier" in the question to hash. Fortunately, this
isn't a request that's allowed by the credit card companies (one may
hope).
Is this approach in use?
Personally, I haven't heard of it, except for the fact that passwords
are commonly stored in hashed form, the hash routine being a one-way
non-reversible algorithm. That's why you can't phone the helpdesk and
ask, "what is my password again, I lost it". The helpdesk guys don't
know, and they have no way of finding out. All they can do is reset
your password to some value that you can use for your next login. The
password storage methodology is so common that I wonder why the same
concept hasn't been applied to other privacy-sensitive data.
Will the approach ever be used? One may hope so. I don't see any
obstacles, only benefits. Well, there is of course one obstacle -
websites and credit card companies would need to change their systems
to conform to the new way. This won't ever happen unless someone tells
them to. As I stated previously:
storing private data should be prohibited, unless it is absolutely
necessary and there's no other way. That would be a good incentive!
Perl snippets for the playful
If you want to play around with the algorithms, below are a few Perl
snippets that I used when writing this.
Here is the "idiotically simple" hash algorithm.
    #!/usr/bin/perl
    use strict;
    # ihash - idiotic hash
    # --------------------
    # Check the command line.
    die ("Usage: ihash name(s)\n") if ($#ARGV < 0);
    # Show the hash of all arguments.
    for my $n (@ARGV) {
        print ("$n: ", hash($n), "\n");
    }
    # The hash function: sum of the letter values, where a=1, b=2, etc.
    sub hash ($) {
        my $n = shift;
        my $h = 0;
        for my $c (split ('', $n)) {
            $c = lc($c);
            $h += ord($c) - ord('a') + 1;
        }
        return ($h);
    }
Here's a short script to display sha512 hashes. You'll need the Perl
module Digest::SHA to make this run.
    #!/usr/bin/perl
    # sha512 - displays the sha512 hash of all arguments
    use strict;
    use Digest::SHA qw(sha512_base64);
    # Check the command line
    die ("Usage: sha512 strings\n",
         "Displays the sha512 digest of all strings.\n") if ($#ARGV < 0);
    for my $str (@ARGV) {
        print ("$str: ", sha512_base64 ($str), "\n");
    }
I verified that Dutch account numbers between 000000000 and 999999999
do not collide when hashed using sha512. Here's how.
    #!/usr/bin/perl
    # sha-accnr-checker
    # If we sha512-encode Dutch account numbers (which have 9 digits),
    # will we encounter collisions?
    use strict;
    use Digest::SHA qw(sha512_base64);
    # Make the output unbuffered
    $|++;
    # Hash of seen digests: digest => account number that produced it
    # (one entry per account number, so this needs a lot of memory)
    my %used;
    for my $nr (0..999999999) {
        # Pad account number to 9 positions, compute the digest
        my $acc = sprintf ("%9.9d", $nr);
        my $dig = sha512_base64 ($acc);
        # Stop if there's a collision
        die ("\n$acc conflicts with $used{$dig}\n",
             "both yield sha512: $dig\n") if (exists $used{$dig});
        # Store digest since we've now used it
        $used{$dig} = $acc;
        # Show a ticker so we see what's going on
        print ("\r$acc") if (! ($nr % 10000));
    }
    # All done
    print ("\nno collisions detected\n");