The ladder of user tracking and privacy

Privacy and behavior tracking by websites and web-connected products have long been a concern of privacy advocates and some in the technical sphere.  Recent events prodded me to compose this post to assemble my thoughts on the matter.  Those events include a decision by the courts that hosted email is protected under the Fourth Amendment; a story on NPR that ebook readers collect and transmit usage data; the emergence of Diaspora, a Facebook alternative, into alpha testing; and concerns over electronic voting.  These thoughts are mine and do not represent my employer in any way.

As active participants in a digital society, we are the key ingredient in online services and connected products, providing the entire revenue stream through our behavior.  However, we otherwise cannot participate in the market created by digital behavior tracking.  We have little awareness of what data is collected and no control over how that data is used.  This is because the company or entity doing the tracking owns that data.  I'd like to propose that since we ourselves create that data, we should be able to participate in the process whereby that data is used.

Currently, privacy and tracking are governed through privacy policies.  Such a policy aims to inform you, the user, about what data is collected and how it might be used.  Privacy policies have several problems.  They are written by lawyers, and as such may be hard for users to understand.  The policy provides the user no leverage aside from deciding not to use the service.  The user cannot determine that the policy is being honored.  The policy may be vague about what is actually collected and how it may actually be used or sold, and there is no avenue for a user to learn more.  If the policy is violated and the user learns about it, avenues for redress are few, expensive, and largely untested.

What is the cost of a privacy breach?  If it results in actual identity theft, then actual damages may be calculable.  However, there may be other breaches and other costs whose damage may be harder to assess.  On the other side, how much is your behavior worth?  Personal information has definite value both in isolation and in aggregate, to the user, to the collector, and to third parties.
 
I propose the following 8-step ladder of user tracking.  At each step of the ladder is a question for you to answer regarding an online service such as Facebook or a connected product such as a Kindle.  At the point of the ladder where the answer is "no" or "I don't know", you stop.

  1. Do you know that the site/company/product tracks your use?
  2. Can you determine when tracking is occurring?
  3. Do you know what activities are tracked?
  4. Do you know how your tracked usage is being used by the site/company/product?
  5. Do you know how your usage is used by others?
  6. Can you obtain your usage data from the site/company/product?
  7. Can you dictate whether or not your usage data is used?
  8. Can you license how your usage data is used?

Something I've specifically not included on the ladder is knowing how your tracked data is stored, and how secure that storage is.  I don't think this should even be on the ladder.  If a company holds my personal information in an insecure environment, it should be fully liable for any damage its negligence causes me.  If a company hews to best practices in information security, and is nonetheless hacked and my personal data stolen, that sounds to me like a risk that companies can insure against.  But secure storage of personal information is too fundamental to be placed on my ladder.
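
For the programmers reading, the stopping rule of the ladder can be sketched in a few lines of Python.  This is only an illustration: the function name, the short rung labels, and the convention of True for "yes", False for "no", and None for "I don't know" are my own assumptions, not part of any real tool.

```python
from typing import Optional, Sequence

# Short labels for the eight rungs (paraphrased, for illustration only).
RUNGS = [
    "know tracking happens",
    "can tell when it happens",
    "know what is tracked",
    "know how the site uses it",
    "know how others use it",
    "can obtain your data",
    "can dictate whether it is used",
    "can license its use",
]


def ladder_level(answers: Sequence[Optional[bool]]) -> int:
    """Return how many rungs were climbed before the first non-"yes".

    True means "yes", False means "no", None means "I don't know".
    Climbing stops at the first answer that is not a clear "yes".
    """
    level = 0
    for answer in answers[: len(RUNGS)]:
        if answer is not True:
            break
        level += 1
    return level


# Hypothetical service: rungs 1-3 are "yes", rung 4 is "I don't know".
example = [True, True, True, None, True, True, True, True]
level = ladder_level(example)
print(f"Reached level {level} of {len(RUNGS)}")
```

Note that later "yes" answers do not count once you have stopped; each rung only means something if every rung below it holds.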

Level 1 is basic, and you might assume the answer is "yes", but rather than assuming, take time to see if you can figure out what's happening.  Your web browser should let you look through the cookies stored in it, so you can see if the site has placed any there.  If so, you should assume at least your presence on that site is being tracked by that site.  For devices such as a Kindle, a good approach is to connect over wifi, and use your home router to watch if it calls home, and how often.

Level 2, "Can you determine when tracking is occurring", might become apparent during investigation of level 1.  For example, if you find a cookie for a site, you should know that this cookie is sent to that host with every HTTP request to that host.  Additionally, once a site is loaded, JavaScript code in the webpage can make HTTP requests behind the scenes, so the cookie may be sent much more often than you click on things.  The geeks among us can enable certain things in the operating system or on our routers to watch these accesses, but most firewall tools aimed towards "regular users" are so intrusive they're hard to leverage for this problem.
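
To make the cookie mechanics concrete, here is a minimal Python sketch using the standard library's http.cookies module.  The cookie name, its value, and the request paths are made up for illustration, and the sketch deliberately ignores the path, domain, and expiry matching that a real browser performs; it only shows that the same identifier rides along on every request, including the ones a script makes without a click.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie: str) -> str:
    """Build the Cookie request header a browser would send back.

    Attributes such as Path and Max-Age stay with the browser; only
    the name=value pair is returned to the server.
    """
    parsed = SimpleCookie()
    parsed.load(set_cookie)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in parsed.items())


# Hypothetical Set-Cookie header a site might send on a first visit.
SET_COOKIE = "visitor_id=abc123; Path=/; Max-Age=31536000"

# One page view followed by two background requests (made-up paths).
paths = ["/article", "/api/heartbeat", "/api/scroll-depth"]
sent = [(path, cookie_header(SET_COOKIE)) for path in paths]

for path, header in sent:
    print(f"GET {path}  Cookie: {header}")
```

Every line printed carries the same visitor_id, which is what lets the site stitch separate requests into one browsing history.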

Level 3 is "Do you know what activities are tracked".  A privacy policy may give some detail on this, but you can't verify that against the actual application; usually the policy is written broadly enough that it doesn't have to be modified for every new application feature.  When a site tries to connect to your Facebook account, it makes a claim to you as to what information that application will use; for example, if I enable Posterous' FB autoposting, Posterous can not only update my status when I post to my blog, it apparently gets permission to access my FB account even at other times.  Again, the claim is not verifiable, but it's nice that FB does this.

Above level 3 we are into territory to which no current online service or connected device can lay claim.  A level 4 service would tell us how our usage and behavior data are being used.  I don't mean in a nebulous "improving the product experience" way, but actually report on when usage data is used or mined and for what purpose.

Stop and imagine that for a moment.  Imagine that you could actually know specifically how your tracked behaviors were being used.  You might decide that you approve of that usage, that you consider the outcome to be to your benefit as a customer.  Or you might not.  Right now, you just don't know, and that's the primary concern of privacy advocates.

Level 5 takes us beyond the site of concern, to other sites that our data is either shared with or sold to.  At present, we don't know if that data is used in aggregate or in a personally-identifiable way.  A privacy policy typically allows the site to share or sell usage data, but when that happens, we now have even less connection to that data.

Level 6 asks if we can obtain our tracked data from the site in question.  This idea is somewhat similar to how Facebook allows you to download your profile, posts, and friend connections, but specifically refers to tracked usage data.  This idea, to me, actually seems the most straightforward notion that might exist in the debate on online privacy -- that you, the user, should at least know what the company or site or product knows about you.

Level 7 asks, "Can you dictate whether or not your usage data is used."  Some might rather place this lower on the ladder, but I think that's counterproductive for all participants.  If you don't know what's being collected and tracked, you can't make an informed choice about whether to participate in that.  Moreover, the current implementation from the user's perspective requires high levels of vigilance and might not be reliable.  However, if we arrive at level 7 after the first 6 levels, then we can have a constructive discussion about whether or not our personal data is used.

Level 8 is where I began thinking about this problem.  Usage data exists in a very active market, but we, the creators of that data, can't participate in that market.  Imagine that you own your behavioral data; you created it, so by some notion along the lines of copyright, you should in some fashion be considered its owner.  "Wait," you say, "we enact those behaviors within a system or with a device, so really it should be joint ownership, right?"  Hey, that would be great.  If I and Facebook own my data in partnership, we now have a whole new framework for discussing how it should be used.  And that framework is licensing.

Perhaps you're an open-source kind of fellow.  You might decide to license your behavior along a Creative Commons-style agreement, which protects your ownership and still requires that you have access through all levels of the ladder to your own data.  Or perhaps like most of us, you'd like to choose who to share your information with.  With licensing, that framework exists.

Me, I'm happy to share my personal information and behavior, provided I have a share in its income.  Think micropayments.  Mechanical Turk, turned on its head.  Come on, Google, let's make a deal.