a facebook privacy experiment

So yesterday my super-awesome friend John (he’s a published author AND he knows how to make sneaky robots) linked to a blog post on how to see how Facebook’s tracking you:

  • Find a file named ‘hosts’ on your computer. On Mac/Linux systems, it’s under /etc/. On Windows, it used to be under System32 somewhere, but who knows now. Stash a backup copy somewhere.
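  • Add a new line that points www.facebook.com at your own machine, e.g. 127.0.0.1 www.facebook.com (a hosts entry is an IP address followed by a hostname, with no http:// in front).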
  • Configure a web server running on your local machine.
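
For the command-line inclined, here's a rough sketch of the first two steps. It assumes the point of the hosts entry is to send www.facebook.com to your own machine (127.0.0.1, the usual loopback address). The add_hosts_entry helper is a name I made up for illustration, and the demo runs against a scratch file rather than the real /etc/hosts, which you'd need sudo to touch.

```shell
#!/bin/sh
# Hypothetical helper: back up a hosts-style file, then append "IP<TAB>hostname".
add_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    cp "$hosts_file" "$hosts_file.bak"                 # stash a backup copy first
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"   # one entry per line
}

# Demo on a scratch file so nothing on the real system changes.
scratch=$(mktemp)
printf '127.0.0.1\tlocalhost\n' > "$scratch"

# Assumption: 127.0.0.1 is the address the local web server answers on.
add_hosts_entry "$scratch" 127.0.0.1 www.facebook.com

cat "$scratch"
```

To undo it, copy the .bak file back over the edited one.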

He suggested trying this for 24 hours and seeing what happens. So, OK, I did! (I have a Mac, so the file in question is /etc/hosts, and configuring a web server is pretty easy since Macs come with the stuff you need — it’s been so long since I did it I’ve forgotten the details, but I think it was pretty much “find the magic Apache checkbox. check it”).

Sadly I did this in the blind-idiot-following-the-directions way. This is sad because I’m already using my local machine’s address for Gluejar web development stuff, which means I cannot just naïvely count everything in my access logs. I actually have to read them a bit to find a pattern which I believe matches the Facebook stuff and doesn’t match the development stuff. If I were doing it again I would have used some other spurious address for Facebook. Oh well.

So what I ended up with was this:

less /var/log/apache2/access_log | grep -c fbcdn.net

For those of you who don’t speak command lines (and why not? don’t you like phenomenal cosmic powers?) let me translate:

less is a command for showing a file
/var/log/apache2/access_log is the logs for my web server
| (pronounced “pipe”) means “take the output of the first command, and run it through the second”
grep looks through stuff (and is totally one of those Phenomenal Cosmic Power commands)
-c is the “count” flag for the grep command
fbcdn.net is a Facebook address. That’s that unique pattern that I think matches all the Facebook stuff, and doesn’t match any of my stuff. (I could be wrong — it definitely doesn’t match my stuff, but there might be additional Facebook stuff that’s not hitting that address)

So all together, what that’s saying is “spit out my access log, run it through grep, and give me a count of all the lines that contain fbcdn.net”.
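
As an aside, grep can read the file itself, so the pipe isn't strictly needed. Here's a minimal sketch of the same count done that way. The count_matches name and the sample lines are placeholders I invented (they are not real log entries), and the -F flag is my addition: it makes grep treat fbcdn.net as a literal string, so the "." doesn't act as a match-anything wildcard.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a file that contain a literal string.
count_matches() {
    # -c prints a count of matching lines; -F turns off regex interpretation
    grep -cF -- "$1" "$2"
}

# Made-up placeholder lines standing in for an access log.
sample=$(mktemp)
printf '%s\n' \
    'placeholder line mentioning something.fbcdn.net' \
    'placeholder line for local development' \
    'another placeholder line with fbcdn.net in it' \
    'placeholder line with fbcdnXnet, which should not count' > "$sample"

count_matches fbcdn.net "$sample"   # prints 2
```

On the real thing, that would be count_matches fbcdn.net /var/log/apache2/access_log.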

So after a day of doing my normal Internet stuff, trying neither to seek nor avoid potential Facebook stuff — except that I almost never checked facebook.com itself from my computer because that doesn’t work right now — here’s what I got:

less /var/log/apache2/access_log | grep -c fbcdn.net

Well, then.

I think tomorrow I’ll go and do something paranoid about cookies.