This is a short blog dealing with aspects of information persistence, and how this should be considered as part of one’s own personal knowledge management behavior.
With the current woes engulfing MySpace and the hints that MySpace may be gobbled up or fold, I thought it a good time to bring up the topic of data persistence, and what that might mean to the unsuspecting user.
So maybe you actually read that long panel of T&Cs when you registered for your MySpace account a few years back, and perhaps you figured that the privacy clauses were adequate.
You have since posted some stuff that only your Friends can see, and deleted other material that you realized the morning after was a bit over the top. Those photos of your anatomy that maybe Mom wouldn’t approve of, or that rant about your boss and the questions you raised as to his parentage, species, and general intelligence.
Maybe you wrote some heartfelt stuff as a 16yr old back in 2003 when MySpace was launched that seems a bit embarrassing now that you are a worldly 23yr old. Those romantic feelings you diarized, the catfights you recorded, and boy, that flaming row you had with your Mom over the weed/empty-bottle/condom packets she found in your school-bag.
All fine though, because it was only visible to yourself and a few friends, and besides, you deleted most of it.
Well, not so fast. There are two problems that could bring this all back to haunt you: who owns the data and who has it, and whether it ever really got deleted.
Ownership and Possession
The first problem is that even if MySpace promised hand on heart to keep your data private, that promise probably does not persist beyond either purchase of the company as an entity or of their assets.
If MySpace are purchased as a corporation, the promise to keep private data out of the public domain or from being sold may not survive. The new owner might have the legal right and intention to use that data as they see fit. They might sell it, they might mine it, or they might just publish the whole lot.
They may even sell books containing all your secret stuff as novellas.
Even worse, the company may fold and its physical assets may be sold at auction. Those servers and hard drives could wind up in anyone’s hands, and the data could be read and used by people who have any number of interests.
Incidentally, this nightmare is all too real in the business world where even the humble photo-copier may have hard-drives that retain sensitive images of corporate documents and private information. These copiers are often replaced at lease-end and then find their way to auctions.
Threaded through the issue of ownership and possession is an even worse nightmare, data persistence, but first let’s discuss arachnids.
Those Creepy Spiders
Every few seconds, Google and other search-engine providers send out a wave of spiders that crawl the web in search of new stuff.
They index and capture material on websites that don’t block them. Generally speaking, the ones that do block them are those that keep stuff behind passwords.
However, anything that was visible for however short a time may be crawled and indexed and captured by a spider.
This data may live only a short time until a new wave of spiders refreshes it, but it may also persist, either as a cached copy held by the search engine (which is why some search hits lead to dead pages) or in a persistent archive like the Wayback Machine.
However, spiders and machines may not be the only place your data resides. It may be copied in other ways.
Friends, People, and Pesky Protocols
You may not have thought about it much, but the Internet is less like a telephone system and more like a relay of electronic telegraphists.
When you sent that posting to your blog, a session was established with a router at your service provider, which duly copied the data into storage, then having figured out that the IP address of your blog was on a server elsewhere, contacted the closest router in that direction and sent it the copy of your data. That router in turn would establish where the next router in that direction was, and would send it a message and your data for it to copy and forward. At each point, the router may have its stored data copied onto permanent storage, either programmatically, or by a person.
Of course this goes on quite invisibly, and there isn’t really much chance that your joke about the teacher will wind up public due to the routers’ habit of copying and forwarding. IT staff are seldom going to look at the torrents of data flying through their routers.
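For the curious, the copy-and-forward relay described above can be sketched as a toy model. This is only an illustration under loud assumptions: the `Router` class, the `send` function, and the three hop names are all made up for this example, and real routers normally hold data only briefly in memory rather than keeping it in a list.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical router that keeps a copy of everything it relays."""
    name: str
    buffer: list = field(default_factory=list)  # copies held by this hop

    def receive(self, packet: bytes) -> bytes:
        # Store a copy first, then hand the data on to the next hop.
        self.buffer.append(packet)
        return packet


def send(packet: bytes, path: list) -> bytes:
    """Relay a packet hop by hop along a fixed path of routers."""
    for router in path:
        packet = router.receive(packet)
    return packet


# Assumed three-hop path from your PC to the blog's server.
path = [Router("isp"), Router("backbone"), Router("blog-host")]
delivered = send(b"joke about the teacher", path)

# Every hop on the way held its own copy of the posting.
copies = sum(len(r.buffer) for r in path)
```

The point of the sketch is simply that delivery involves one copy per hop, so the number of places your words have passed through grows with the length of the path.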
The real problem is people you know, or people who saw something they liked.
Your friends may copy what you said, either intentionally or as cached data on their PCs. They might republish it in ways that you might not have preferred, or quite unintentionally leave it cached on their PC when it gets sold to somebody else or handed in as scrap. I won’t dwell on how deletion actually occurs on your PC, except to say that it doesn’t actually erase the data; it just deletes its entry in the index.
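The point about deletion only removing the index entry can be shown with a small toy model. This is a sketch, not how any particular file system is implemented: `ToyDisk` and its methods are hypothetical names invented for the illustration.

```python
class ToyDisk:
    """A toy disk: raw blocks plus an index mapping file names to blocks."""

    def __init__(self):
        self.blocks = []  # raw storage; delete() never touches it
        self.index = {}   # file name -> position in self.blocks

    def write(self, name: str, data: str) -> None:
        self.blocks.append(data)
        self.index[name] = len(self.blocks) - 1

    def delete(self, name: str) -> None:
        # Only the index entry is removed; the block itself stays put.
        del self.index[name]

    def scan_raw(self) -> list:
        # What a recovery tool could see by reading the blocks directly.
        return list(self.blocks)


disk = ToyDisk()
disk.write("rant.txt", "my boss is a ...")
disk.delete("rant.txt")

visible = "rant.txt" in disk.index  # False: the file looks gone
recovered = disk.scan_raw()         # the text is still in the raw blocks
```

After the delete, the file no longer shows up by name, yet whoever ends up with the disk can still read the raw blocks.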
The real worry though is where it was meant to go, and what happens to it there.
Archival – Data Persistence
So here’s a scenario: at 8PM on the night of a disappointing 16th birthday, you wrote yourself an entry that was a monument to raging hormones and youthful indiscretion.
You tore a strip off the school, you made steaming accusations about various teachers’ sex lives, and you capped it with a few choice ideas worthy of a Stephen King thriller about what you would like to do to a few classmates.
The next afternoon after the cathartic effects had turned you back into the friendlier and more angelic person you usually are, you deleted it all.
What you didn’t know, as most non-IT people might not, is that at around midnight on that evening, the daily incremental backup was done and all that data was meticulously backed up onto physical storage media, and then the tapes/WORM/whatever were duly put in storage, probably at an off-site location that might even be in another country.
So before you deleted it, the data was captured and put in storage, safe from floods, fire, and other threats. It is a well-oiled process that usually goes on invisibly to ensure users’ data is protected from loss.
That medium, tape/CD/DVD/whatever, may be recycled after a time, or may not be. It may still be sitting in a box all these years later, and whoever buys the company or its assets may read that delicious piece of online therapy you wrote way back then.
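The timeline in that scenario can be sketched in a few lines. This is a simplified illustration under stated assumptions: the names `live_posts`, `backup_tapes`, and `nightly_backup` are invented here, and each run takes a full snapshot rather than a true incremental backup, purely to keep the example short.

```python
import copy

live_posts = {}    # what the site shows right now
backup_tapes = []  # off-site copies, one per nightly run


def nightly_backup(day: str) -> None:
    # Copy the current state of the live data onto a new "tape".
    backup_tapes.append((day, copy.deepcopy(live_posts)))


live_posts["birthday-rant"] = "written at 8PM"
nightly_backup("day 1, midnight")  # runs before you have second thoughts

del live_posts["birthday-rant"]    # the next afternoon: deleted
nightly_backup("day 2, midnight")

# Which tapes still hold the entry you deleted?
still_on_tape = [day for day, snap in backup_tapes if "birthday-rant" in snap]
```

The entry is gone from the live site, but the first night's tape still carries it, and nothing you do on the site reaches back to change that tape.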
You may have deleted it, it may be private, and it may be a long time ago, but that data may still be out there and might come back and step out into the full glare of public gaze, and it may do so now, or in ten years, or in fifty.
Data is sometimes a bit like energy itself: it can be transferred and changed, but never destroyed.
It may be a bit late to get rid of that love sonnet that you wrote when you were 16 and which would embarrass the life out of you now were it to wind up public (although deleting it may certainly reduce that risk).
It is also a pity that nobody explained this stuff back then, although a 16yr old you would probably not have listened anyway.
The real take-home is to pay attention right now – don’t put stuff online that could haunt you later, and don’t assume that making it private or deleting it will be a certain safeguard.
In fact, most things are best left in your head – it may be funny, but it probably should stay in your head rather than get captured on a PC or transmitted across a leaky network and stored on a server farm over which you have no control.
In this day of Cloud Computing and SaaS, you don’t even know what country your data will be in!
Matthew Loxton is a Knowledge Management professional and holds a Master’s degree in Knowledge Management from the University of Canberra. Mr. Loxton has extensive international experience and is currently available as a Knowledge Management consultant or as a permanent employee at an organization that wishes to put knowledge to work.