a facebook privacy experiment

So yesterday my super-awesome friend John (he’s a published author AND he knows how to make sneaky robots) linked to a blog post on how to see how Facebook’s tracking you:

  • Find a file named ‘hosts’ on your computer. On Mac/Linux systems, it’s under /etc/. On Windows, it’s under System32 (C:\Windows\System32\drivers\etc\hosts on current versions). Stash a backup copy somewhere.
  • Add the following on a new line: 127.0.0.1 www.facebook.com
  • Configure a web server running on your local machine.
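
For the command-line inclined, the first two steps might look roughly like the sketch below. It assumes the entry points www.facebook.com at 127.0.0.1 (the loopback address), and it works on a scratch file by default so nothing on your system changes. HOSTS_FILE is a made-up variable name for this example, not something your system knows about.

```shell
# Practice on a scratch file; set HOSTS_FILE=/etc/hosts (and use sudo) to do it for real.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Stash a backup copy before touching anything.
cp "$HOSTS_FILE" "$HOSTS_FILE.bak"

# A hosts entry is an IP address, whitespace, then a bare hostname (no http://).
# 127.0.0.1 is an assumption here; any address you control would do.
printf '127.0.0.1\twww.facebook.com\n' >> "$HOSTS_FILE"

# Show the line that was just added.
grep -n 'www\.facebook\.com' "$HOSTS_FILE"
```

To undo the experiment, copy the .bak file back over the hosts file.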

He suggested trying this for 24 hours and seeing what happens. So, OK, I did! (I have a Mac, so the file in question is /etc/hosts, and configuring a web server is pretty easy since Macs come with the stuff you need — it’s been so long since I did it I’ve forgotten the details, but I think it was pretty much “find the magic Apache checkbox. check it”).

Sadly I did this in the blind-idiot-following-the-directions way. This is sad because I’m already using 127.0.0.1 for Gluejar web development stuff, which means I cannot just naïvely look through my access logs for anything hitting that address. I actually have to read them a bit to find a pattern which I believe matches the Facebook stuff and doesn’t match the development stuff. If I were doing it again I would have used some other spurious address for Facebook. Oh well.

So what I ended up with was this:

less /var/log/apache2/access_log | grep -c fbcdn.net

For those of you who don’t speak command lines (and why not? don’t you like phenomenal cosmic powers?) let me translate:

less is a command for showing a file
/var/log/apache2/access_log is the logs for my web server
| (pronounced “pipe”) means “take the output of the first command, and run it through the second”
grep searches through stuff for lines that match a pattern (and is totally one of those Phenomenal Cosmic Power commands)
-c is the “count” flag for the grep command
fbcdn.net is a Facebook address. That’s that unique pattern that I think matches all the Facebook stuff, and doesn’t match any of my stuff. (I could be wrong — it definitely doesn’t match my stuff, but there might be additional Facebook stuff that’s not hitting that address)

So all together, what that’s saying is “spit out my access log, run it through grep, and give me a count of all the lines that contain fbcdn.net”.
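
Two footnotes for the picky, shown as a sketch you can run safely. First, grep can read a file on its own, so the pipe is optional. Second, grep treats the pattern as a regular expression, where a dot matches any character; the -F flag makes it match the literal text instead. The log lines below are invented placeholders for illustration, not real Apache output.

```shell
# Build a tiny fake log in a temp file (placeholder lines, not real log data).
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
GET /static.ak.fbcdn.net/example.js
GET /gluejar/index.html
GET /profile.ak.fbcdn.net/example.jpg
GET /fbcdn-net-lookalike/page.html
EOF

# The piped form: dump the file, then count matching lines.
piped_count=$(cat "$LOG" | grep -c fbcdn.net)

# Same count without the pipe: grep reads the file itself.
regex_count=$(grep -c fbcdn.net "$LOG")

# -F treats the pattern as plain text, so "fbcdn-net" no longer matches.
fixed_count=$(grep -cF fbcdn.net "$LOG")

echo "piped: $piped_count, regex: $regex_count, fixed-string: $fixed_count"
```

On real logs the regex and fixed-string counts will usually agree, but -F removes the doubt.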

So after a day of doing my normal Internet stuff, trying neither to seek nor avoid potential Facebook stuff — except that I almost never checked facebook.com itself from my computer because that doesn’t work right now — here’s what I got:

less /var/log/apache2/access_log | grep -c fbcdn.net

Well, then.

I think tomorrow I’ll go and do something paranoid about cookies.

8 thoughts on “a facebook privacy experiment”

  1. Yeah, when I do stuff like this I take something else from the 127.* block. 127.0.0.1 is *the* loopback address, but anything starting with 127 is effectively the same.

    As for what to do about cookies, that’s easy: Disable 3rd party cookies. Almost all browsers have that setting, and I’ve confirmed through judicious use of Wireshark that it turns off transmission of those cookies as well, at least for Firefox.

    Most of those lines in your log are almost certainly due to “like” and “share” and “connect” buttons scattered across the pages. The scary part is that Facebook tracks you through those sites and knows *you* are *you* even if you are logged out! They don’t delete the cookie from your machine, they just turn off the “logged-in” flag. So they know every page that has a “like” button that you visit.


    1. Thanks for commenting. Yeah, if I were doing it again I’d do that. In fact, I’m having so much fun seeing my Apache “Not Found”s pop up across the internet that I’ll edit my /etc/hosts to use a different 127.* address right now. The cookie info is useful, too (I hope for other readers as well as for me!).


    2. Thanks! I can’t seem to find that setting on Firefox for Mac, weirdly. I see it is already checked on Safari, but then there are still a bunch of cookies from sites I don’t remember visiting? Um? Maybe I secretly went to Friendster once and just don’t remember???


  2. Yesterday turned out to be a low-websurfing day for me (which is good, because I had a lot of work to do) but I still wound up with 36 hits in the last 24 hours.

    I didn’t click any Facebook links during that time, nor attempt to go to the page itself. I did, however, stop by a friend’s LiveJournal page, which session seems to have been responsible for a good ten hits all by itself! I’m also seeing a couple webcomics, a few with no referral field, and this page.


    1. Thanks for weighing in. I, uh, should probably remove that Facebook sidebar if I’m going to be like this about Facebook privacy, huh. Even though I love having #buyalib there. But perhaps it’s time to find a more active project to put there, and collate my #buyalib resources on a subpage….


  3. FBCDN = FaceBook Content Distribution Network. It is what they use to distribute their content globally. Think Akamai, etc. 🙂
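
For anyone who wants to break their hits down by referring page, as in the comment above about LiveJournal and webcomics, here is one possible sketch. It assumes the log uses Apache's "combined" format, where the referrer is the second quoted field on each line; if your server logs in a different format, the field number will differ. The sample lines and example.com addresses are invented.

```shell
# Fake combined-format log lines (made up for illustration).
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
127.0.0.1 - - [01/Jan/2011:10:00:00 -0500] "GET /a.js HTTP/1.1" 404 209 "http://blog.example.com/post" "ExampleBrowser/1.0"
127.0.0.1 - - [01/Jan/2011:10:00:05 -0500] "GET /b.js HTTP/1.1" 404 209 "http://blog.example.com/post" "ExampleBrowser/1.0"
127.0.0.1 - - [01/Jan/2011:10:01:00 -0500] "GET /c.js HTTP/1.1" 404 209 "-" "ExampleBrowser/1.0"
127.0.0.1 - - [01/Jan/2011:10:02:00 -0500] "GET /d.js HTTP/1.1" 404 209 "http://comics.example.org/today" "ExampleBrowser/1.0"
EOF

# Split each line on double quotes; field 4 is the referrer.
# Then count how often each referrer appears, most frequent first.
referers=$(awk -F'"' '{print $4}' "$LOG" | sort | uniq -c | sort -rn)
echo "$referers"
```

A "-" in the list means the request arrived with no referrer at all.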

