Posts Tagged ‘taxonomy’

Social Network Analysis – cure or curse?

August 31, 2010

In this blog I am going to outline a highly risky yet potentially foundational part of Knowledge Management and especially Intellectual Asset Management – Social Network Analysis (SNA).

SNA is a method of mapping either the connectivity of concepts (like a Twitter feed), or more importantly to us, communication between people (like a kite network).

SNA is a tool of Knowledge Management in general (Tran 2007), but becomes a core aspect of building Communities of Practice (Wenger, McDermott et al. 2002).

The reason that it is highly risky should become self-evident as we proceed, but in case I don’t make it clear – it stands a reasonably good chance of being seen as invasive, manipulative, and intrusive, and of alienating the very knowledge workers and staff that you are trying to unite in purpose.

You could very well unite them – against you!

The basic idea is pretty fundamental to Knowledge Management: You want to figure out who your thought-leaders, subject-matter experts, and influencers are by seeing who communicates with whom – the experts are consulted more than they consult others.
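
As a rough illustration of the "consulted more than they consult" idea, here is a minimal Python sketch. The names and the list of (asker, person asked) pairs are invented for the example, and the simple "times consulted minus times consulting" score is only one assumed way to rank people, not an established SNA measure.

```python
from collections import Counter

# Hypothetical records of who asked whom: (asker, person_asked).
# All names are invented for illustration.
consultations = [
    ("alan", "betty"),
    ("carol", "betty"),
    ("dave", "betty"),
    ("betty", "carol"),
    ("alan", "carol"),
]


def consultation_balance(pairs):
    """Return {person: times consulted minus times consulting}."""
    consulted = Counter(target for _, target in pairs)
    consulting = Counter(source for source, _ in pairs)
    people = set(consulted) | set(consulting)
    return {p: consulted[p] - consulting[p] for p in people}


balance = consultation_balance(consultations)
# Highest balance first; ties broken alphabetically so the order is stable.
ranked = sorted(balance, key=lambda p: (-balance[p], p))
print(ranked)
```

A high positive balance only suggests that staff treat someone as a source; it says nothing yet about why they are being contacted.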

In a sense you are using the “well worn path” and letting the activity of staff show you who your SMEs are (or at least who the staff think they are) and the conduits and paths that information and knowledge take in your organization – both internally and with partners.

Before you go down this path at all, you need to be sure of two things – how you will communicate the program to staff, and what safeguards there will be over the use of the information.

Once you know who is talking to whom, it will be very tempting to use that information for disciplinary actions as well as for knowledge management. This spells disaster, because using it just once to nail the office lovers or a gossip (or even the person leaking company secrets) will easily undermine the trust of staff in a dramatic and probably catastrophic fashion.

So before you start planning SNA, be very sure that you will explain the purpose and the need very carefully, make it very clear whether the content of messages will be sampled, and make it equally clear that no information will be used for any purpose other than knowledge management.

I have split the methods into several categories of ever-increasing accuracy and reliability, but unfortunately also of escalating intrusiveness.

Non-Intrusive Methods

This is the easy one: simply don’t do anything, or just fish it out of your own memory or imagination.
This is how most organizations do it, and it is just marginally better than a Ouija-board or reading tea-leaves.
It is subject to all the normal human cognitive biases – halo effect, concurrency, freshness, proximity, likeness, and so forth.

If you want to be no better than any other firm, then do it this way.

Partially-Intrusive

This is where you just ask and hope you asked clearly and that the answers are accurate and truthful. You can improve the odds with a well crafted questionnaire of the “Who do you ask” variety.
The biggest problem will probably be the limit of your expertise in building a good questionnaire instrument that has high construct-validity and reliability, and the recall of the respondents. People often don’t recall who they get information from when asked to report them on the spot, and they will suffer the same biases you would – they will tend to remember the most recent more than more frequent but remote events, and some events will be more memorable and overshadow others. They will also tend (like you) to over-sample people they like and people who they perceive to be “more like them”.

This is a valid but somewhat spotty measure.

Intrusive Methods

These sample actual activity and communication traffic rather than relying on people’s memories and willingness to report on their own behavior.

Basically you are going to snoop by monitoring the source and destination of messages, and record who talks to whom by using electronic records captured on office systems:

  • Telephone records
  • Email traffic
  • Instant Message activity
  • Newsgroup activity
  • etc.

Other than the fact that people might object to what they feel is spying, two immediate issues raise their heads with any of these methods – your mapping will be neither exclusive nor exhaustive.

  • Exclusivity
    It will pick up the gossips, the lovers, and the experts alike and without very careful and even more intrusive sampling you won’t easily be able to tell them apart. You simply won’t know if the high traffic between two or more people is due to knowledge exchange for business, a hobby, a secret office romance, or just plain office gossip about who is having a romance in the office!
  • Exhaustivity
    There are several modes of knowledge transfer that it won’t pick up, as well as those SMEs who are reclusive and don’t advertise or signal their competencies. In the former case, people may be physically visiting and consulting the SME, using electronic media that you aren’t monitoring, or contacting them outside the premises or business hours and thereby escape detection.

This is the best way in terms of accuracy and immediacy – it harvests more broadly and without human biases in the sampling and reporting, and it can be updated on the fly.
Some of the tools available also automatically produce very readable and attractive network maps that are easily interpreted compared to lists and numbers.

Often a single glance will show nodes, portals, and hubs – that is, people whom others go to often, people who connect different groups, departments, or companies, and people who connect other people.
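
To make the notions of hubs and connectors concrete, here is a small Python sketch that works only from source and destination, with no message content. The message log, the names, and the department lookup are all hypothetical, and counting incoming messages and cross-department contacts is an assumed simplification of what dedicated SNA tools compute.

```python
from collections import Counter, defaultdict

# Hypothetical message log: (sender, recipient). No content is inspected.
messages = [
    ("ann", "bob"),
    ("cy", "bob"),
    ("dee", "bob"),
    ("bob", "eve"),
    ("eve", "fay"),
    ("gus", "fay"),
]

# Hypothetical department lookup for each person.
department = {
    "ann": "ops", "bob": "ops", "cy": "ops", "dee": "ops",
    "eve": "it",
    "fay": "finance", "gus": "finance",
}


def incoming_counts(msgs):
    """Count how many messages each person receives."""
    return Counter(recipient for _, recipient in msgs)


def outside_departments(msgs, dept):
    """Count the distinct other departments each person exchanges messages with."""
    reach = defaultdict(set)
    for sender, recipient in msgs:
        if dept[sender] != dept[recipient]:
            reach[sender].add(dept[recipient])
            reach[recipient].add(dept[sender])
    return {person: len(depts) for person, depts in reach.items()}


incoming = incoming_counts(messages)
hub = incoming.most_common(1)[0][0]          # person others go to most often
reach = outside_departments(messages, department)
connector = max(reach, key=reach.get)        # person linking the most departments
print(hub, connector)
```

In this toy data the hub and the connector are different people, which is the kind of distinction a network map makes visible at a glance.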

Really, Really Intrusive

This is where you pull a [name deleted] and actually sample the content of messages in addition to source and destination. This would allow you to discard a high proportion of private or business-irrelevant messages from the computation and thus tend to exclude the lovers and gossips from the mapping.

It also allows you to automatically build the foundations of a Controlled Vocabulary, and pick up information to build Concept and Topic Maps, and to find both needs and sources for specific topics.

For instance, you might be able to discover that Betty is the expert in Oracle Index tuning, and that there is a popular need for Index tuning, because there is frequent traffic from several sources to Betty with those key words in the messages.
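
The Betty example can be sketched in a few lines of Python. Everything here is hypothetical: the names, the message texts, and the naive substring match standing in for real keyword or topic extraction.

```python
from collections import defaultdict

# Hypothetical sampled messages: (sender, recipient, text).
messages = [
    ("alan", "betty", "Can you help with Oracle index tuning on the orders table?"),
    ("carol", "betty", "Index tuning question: which columns should I index?"),
    ("dave", "betty", "Lunch on Friday?"),
    ("alan", "carol", "Need the index tuning notes from Betty's session"),
]


def topic_demand(msgs, phrase):
    """Map each recipient to the distinct senders whose message mentions the phrase."""
    phrase = phrase.lower()
    askers = defaultdict(set)
    for sender, recipient, text in msgs:
        if phrase in text.lower():  # naive match; a real system would need more care
            askers[recipient].add(sender)
    return dict(askers)


demand = topic_demand(messages, "index tuning")
for person, senders in demand.items():
    print(person, sorted(senders))
```

Note that the lunch invitation drops out of the count, which is exactly the filtering of business-irrelevant traffic that content sampling buys, at the price of reading people's messages.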

Not only could you spot problems and issues and trends as they develop, but also know who needs to be served and who is serving knowledge on the fly. The potential for Just In Time training alone is quite stunning, not to mention the ability to have early detection and rapid response to business problems.

It goes without saying that this level of intrusiveness requires either a Byzantine degree of spying or an extremely high level of trust amongst staff. Whilst it would enable some pretty terrific advances with compound business advantages, it also has the capability to detonate into a big fireball that will rip your organization apart if that trust is ever lost.

Really, Really, Horribly Intrusive

Ok, let’s just not go there – wires dangling from people’s bodies is just too dystopian to contemplate and besides, fMRI machines are darned expensive.

Conclusion

Knowing your Social Network Architecture allows you to know who your respected SMEs are, what the communication conduits look like, and how the knowledge in your organization is interconnected – no small achievement!

However, a proper communication plan and careful presentation and execution are vital because the level of intrusiveness can easily lead to a revolt amongst your knowledge workers.

If you use the information in a disciplinary or punitive fashion, you will do more harm in a single stroke than if you had cut wages and perks and fired the office mascot.

Bibliography

Tran, L. A. (2007). Encyclopedia of communities of practice in information and knowledge management.

Wenger, E., R. A. McDermott, et al. (2002). Cultivating communities of practice: A guide to managing knowledge, Harvard Business Press.


~~~

Matthew Loxton is a Knowledge Management professional and holds a Master’s degree in Knowledge Management from the University of Canberra. Mr. Loxton has extensive international experience and is currently available as a Knowledge Management consultant or as a permanent employee at an organization that wishes to put knowledge to work.


Reflections on Tagging Part II.

April 23, 2010

My first reaction to tagging was surprise, followed shortly by a dose of joy.

Although I am a longtime user of the Internet, IRC, IM, and many other communication tools on the Net, I was surprised not only that tagging could be such a powerful tool, but also that to a great extent I had been unaware of this.

For me, tagging solves two problems. Firstly, my favourites can be stored externally and are thus not dependent on a specific machine – changing machines or losing a hard-drive always seems to go with a loss of links to valued information. There are pictures, articles, and downloads that I no longer have, because I simply cannot remember how I found them. Even worse, I can’t even remember what they were.

Secondly, it solves a classification problem.

I sometimes struggle to decide under what category to save a new link, and this results either in a steadily growing taxonomy that becomes increasingly arcane and impenetrable with time, or in inconsistencies in where I put things.
Does something go under “Research” or “UC” or “KM”, and why did I have this folder called “NS”?

Storing multiple copies of links was a thought, but quite often Intranet links change, and it would become an administrative overhead to root out all the occurrences of a link each time.

Tagging potentially solves this because it is no longer bound to a canonical format on a specific machine, but rather takes on the nature of a relational database index, where a classification can be created dynamically by user-constructed query strings.
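
A minimal Python sketch of that index idea follows. Each link is stored once, and the "folders" are derived on demand from its tags. The URLs and tag names are invented for illustration, and the in-memory dictionary is an assumption standing in for whatever a bookmarking service actually uses.

```python
from collections import defaultdict

# Each link is stored once, with as many tags as apply (hypothetical data).
links = {
    "http://example.org/sna-paper": {"research", "km"},
    "http://example.org/uc-library": {"uc", "research"},
}


def build_tag_index(link_tags):
    """Build an inverted index: tag -> set of URLs carrying that tag."""
    index = defaultdict(set)
    for url, tags in link_tags.items():
        for tag in tags:
            index[tag].add(url)
    return index


index = build_tag_index(links)
print(sorted(index["research"]))  # both links appear under "research"
print(sorted(index["km"]))        # only the first link appears under "km"
```

If an Intranet link changes, it is corrected in one place and every tag-derived view picks up the change, which avoids the overhead of rooting out multiple stored copies.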

Unfortunately, the ability for sites like del.icio.us to handle Boolean search terms is at present very limited, and although sites like Connotea[1] allow a more structured search mechanism, they also do not yet allow a structured query language that is entirely user constructed.

At present I can use del.icio.us search with Boolean terms to logically AND tags and NOT tags, but I cannot deliberately exclude by period, language, origin, or person, for example. To escape the overwhelming abundance and proliferation of the “soup”, I may, for instance, need to exclude postings from a specific poster whom I have identified as prolific but untrustworthy. While this poster may have matched tags that I wish to include, I may still desire to exclude anything they have tagged, and to do this I need a meta-language that would look very much like a structured query language.
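
To show the kind of query meant here, the following Python sketch combines required tags, excluded tags, and excluded posters in a single filter. The `Bookmark` record, the `search` function, and the sample data are all hypothetical; this is not the del.icio.us or Connotea search interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    url: str
    poster: str
    tags: frozenset


def search(bookmarks, all_of=(), none_of=(), exclude_posters=()):
    """Keep bookmarks carrying every tag in all_of, no tag in none_of,
    and not posted by anyone in exclude_posters."""
    all_of, none_of, exclude_posters = set(all_of), set(none_of), set(exclude_posters)
    return [
        b for b in bookmarks
        if all_of <= b.tags
        and not (none_of & b.tags)
        and b.poster not in exclude_posters
    ]


# Hypothetical bookmarks; "prolific_pete" is the prolific but untrusted poster.
bookmarks = [
    Bookmark("http://example.org/a", "trusted_ann", frozenset({"km", "taxonomy"})),
    Bookmark("http://example.org/b", "prolific_pete", frozenset({"km", "taxonomy"})),
    Bookmark("http://example.org/c", "trusted_ann", frozenset({"km", "taxonomy", "spam"})),
    Bookmark("http://example.org/d", "bob", frozenset({"km"})),
]

hits = search(
    bookmarks,
    all_of={"km", "taxonomy"},
    none_of={"spam"},
    exclude_posters={"prolific_pete"},
)
print([b.url for b in hits])
```

Excluding by period, language, or origin would follow the same pattern, given that each bookmark carried those attributes.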

There seems to be some development along these lines with structured query languages such as “Squeal” (Spertus, 2007), WebSQL (Arocena, 2007), and Xcerpt (Furche, 2004).

However, although they all have elements of it, none of them is specifically targeted at tag searching.

It cannot be long before a structured search language specifically encompassing tags and meta-tags becomes available, and this would turn the “tag soup” into an instantly structured information subset. Of course, that creates the dilemma of how and where one saves the search term itself for future use or reference.

Curiously, one of the very reasons that the “invisible” or “deep” web is opaque is because many large databases of information create classifications on the fly through user-constructed query strings of the kind I am suggesting. One hopes that tag searches would be open in a way that more is revealed, rather than driving everything into islands of information that are mostly hidden.

What I also found fascinating was that social media are curiously attractive, enjoyable, and emotionally “warm” in a way that traditional databases like EBSCO, EMERALD, and LexisNexis are not; these are socially “cold”. This “attractiveness” is seemingly unrelated to the level of actual knowledge acquisition or information quantity retrieved.

For example, the human-computing experiment called “ESP Game”[2] has already labeled over 10,000,000 images on the web by getting humans to work collaboratively without any tangible reward. The payoff for the individuals is somewhat in the act of participation in something socially useful – the identification of images in order to make them searchable, but is mainly in the simple pleasure that people get from playing with other people. The sensation of having an anonymous “partner” who is “in tune”, is strangely attractive, if not somewhat addictive.

The parallel between using other people’s discoveries as part of one’s own online heuristic and normal human or even primate behaviour is, to me, very striking.

All primate species appear to be highly motivated to, and to derive pleasure from, learning from others and leaving clues for others to find. In this way, social bookmarking appears to engage with some very ancient and well developed behaviour patterns, and thus fit snugly into the ergonomic requirements we have for information.

Perhaps it reveals a search for mutuality – people who like what I like, are interested in what interests me and are themselves therefore interesting to me.[i]

Rebecca Blood remarked that tagging led her to some surprising self-discovery: by looking post facto at what she had tagged, she came to realise that there were things that she was evidently interested in that she would not have previously said were her top interests. In some sense then, we are what we tag, or at least what we tag demonstrates actual interests as opposed to stated interests, and this self-revelation may be quite emancipatory, or at least informative.

On the negative side, there may be some serious privacy concerns. In a very real sense, you are what you seek – as was evidenced by the release of AOL search terms which enabled quick discovery of individual persons, even though their identities were anonymised.[3]

I am also not sure if it might actually amplify my biases by giving me an ability to screen out discordant information or see only agreeable information.

After having experimented with tagging, I am doubtful that canonical taxonomical systems will be replaced, since these represent in effect a sort of hypothesis test in a truly scientific sense. We can pose what we think may be true, and by cementing this in a canonical classification, we open our claim to natural selection – will nature and reality prove it to be true or false, will it stand or fall? A canonical taxonomy is indeed a falsifiable truth claim, one in which we are able to put our beliefs to the test.

References

  1. “Squeal” (Spertus 2007) http://www9.org/w9cdrom/222/222.html a structured query language for the web
  2. WebSQL (Arocena 2007) http://www.cs.toronto.edu/~websql/www-conf/wsql/PAPER267.html
  3. Xcerpt (Furche 2004) http://www.pms.ifi.lmu.de/rewerse-wgi4/software/Xcerpt “declarative, rule-based query and transformation language for the Web”



[1] Connotea is located at www.connotea.org and is published free of charge by the journal Nature.

[2] www.espgame.org

[3] The release of AOL search strings allowed a researcher to quickly identify a Mrs. Thelma Arnold, even though she was identified only as “searcher #4417749” http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1312776000en=f6f61949c6da4d38ei=5090


[i] Users of the participative labeling game at http://www.espgame.org report a sense of pleasure in finding a compatible partner that in itself serves as a reward.

~~~~~~~~~

Matthew Loxton is the director of Knowledge Management & Change Management at Mincom, and blogs on Knowledge Management. Matthew’s LinkedIn profile is on the web, and he has an aggregation website at www.matthewloxton.com
Opinions are the author’s and not necessarily shared by Mincom, but they should be.

Reflections on Tagging Part I

April 17, 2010

Online tagging is a relatively new form of classification based on user-defined terms to associate online or local electronic texts, objects, or representations. Various authors regard this as a de novo phenomenon that will replace formal or canonical classification systems, but it is more plausible to regard social classification as an adjunct to, rather than a replacement of, classical taxonomical systems.

The term folk taxonomy or folksonomy refers to the user-created taxonomy resulting from the “worn-path” of actual usage.

Discussion

Humans are natural taxonomising machines, selectively acquiring information according to needs and desires, and classifying information and objects into categories that are learned, created, and perhaps even innate in the case of language syntax (Chomsky, 1957).

The advent of the web has enabled “living resources” as part of virtual communities built by mutual interest (Hammond 2005:4). These resources, unlike traditional libraries, allow classification to be independent of the information collection itself.

Most of us (those older than 20), can remember a time when information collections were most typically located in brick-and-mortar libraries, in which books, fiche, and other information objects and artifacts were stored in fixed hierarchies, themselves established in physical and canonically arranged index card systems.

These library classification systems, such as Dewey Decimal[1], MARC[2], and UDC[3], require professional training and expertise to implement and maintain, and librarians form a corpus whose cadres are often represented by official bodies such as the American Library Association, which claims over 65,000 members[4].

These systems, however, often prove unwieldy for the average person who may have only a passing knowledge of any particular classification system, and may also confront an artifact that poorly fits any classification system that they are familiar with.

This tension is due in part to:

  1. Lack of power and specificity in the classification system itself
  2. Multiple possible classification elements in a single information object
  3. Unfamiliarity by the user with the available formal classification systems

 

In the first case, a person may encounter a situation in which the power and range of the systems poorly cover the target object. For example, the Dewey System is manifestly European in design and allows few index ranges for non-Anglophone and non-European subjects. It is thus predictable that some artifacts of different cultures and languages may not easily find a suitable classification (Mansor, 2007).

In the second case, a single informational object may be classifiable under several distinct and potentially dynamic classifications. A book or fiche may be relatively static, but a person as an information object will not be. A person may now be middle-aged, have short hair, be brunette, and like whiskey. They may like cricket, be fit, and have aspirations suitable for a specific age – but these things will not always have been so, and will change again. They may go grey, become old, lose some preferences, and gain or even regain others. So where does a person classify themselves? How do we classify objects that may be fluid, or even metamorphose over time? Even with static objects there can be difficulty in securing classification. In many instances post-modern art and literature have sorely tested the classification power of existing systems, and seemed to delight in producing this exact tension.[5]

In the third instance, even though there may be a large number of librarians, the information-user population greatly exceeds that of the subset of trained taxonomical professionals, to the point where the probability of a user being in a position to effectively classify something correctly according to any of the three systems listed above, is exceedingly small.

This leaves us with the goal of finding “A user-driven approach to organizing content” (Porter, 2005), perhaps through the advent of vast numbers of online users and the enormous power of the web to index specific physical objects through hypertext links, texts, and images. It may thus not be necessary for me to physically describe “Equivalent VIII”, since I can refer to an authoritative reference to it at the Tate gallery itself[6].

I could also make use of “social bookmarking” to draw the reader to it, but more importantly, to other seekers who had related searches.

The power of web browsers for locating shared informational resources, as envisaged by Berners-Lee[7], was unfortunately not mirrored in the ability of most browsers to organize URLs once they had been saved. Bookmarks have traditionally followed a simple canonical file structure inherited from the early disk operating systems of the computers. In this schema, the user can choose to arrange the hierarchy and name the folders, but the folders are ill-structured to deal with objects having multiple possible or actual classifications. They thus still retain the discomfort of point #2 above, and also leave the user at the mercy of having to invent their own classification system without the benefit of a professional librarian to help.

How then to classify Equivalent VIII?

Enter “Mob indexing” (Morville 2005:134) 

What if we made use of human-computing and allowed the sheer mass of users to give a statistically-emergent set of classifications? Would large numbers of users settle on a stable structure without any overt discussion between them?

Thomas Vander Wal refers to a “user-created bottom-up categorical structure development with an emergent thesaurus” as a “Folksonomy” (Morville 2005:136) in which we can use the discoveries made by other humans essentially as a cybernetic resource – by revealing the road-markers of other people who searched for something, one can browse the survivable troves of interconnected information links that other people have created.
By seeing and browsing what they had used to identify online information, we could have ready classifications left by numbers of other users.

We might further view Folksonomies as a “web2.0” phenomenon (O’Reilly, 2005) in which the “Wisdom of Crowds” (O’Reilly, 2005:7) and their massed tagging decisions lead to emergent taxonomical structures, and thus the “Trodden path” reveals ideal informational ergonomics that even expert-designed canonical forms may be unable to predict or represent. In this regard, Shirky posits that Folksonomies are necessary because of difficulties in applying controlled vocabularies at the level of individual and informal users of information (Morville 2005:135).

Thus we can pave the “desire lines” to achieve controlled vocabularies of optimal utility (Merholz, 2004) by using the millions of online users’ tags.

Does this “tag soup” (Hammond 2005:4) lead, however, to a chaotic situation in which users overwhelm meaning and structure by posting millions of ambiguous tags? It is quite possible, after all, that taggers will use the same term for different things, and different terms for the same things.

Golder reports that tag frequencies achieve stability rather than become chaotic, and that relative stasis is achieved at fewer than 100 bookmarks (Golder, undated), thus suggesting that in reality the “soup” becomes more congealed than liquefied.
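
One way to picture what "stability" means here is to track each tag's share of all tags applied to a single URL as bookmarks accumulate, and to watch how much those shares move between snapshots. The Python sketch below does this on an invented stream of bookmarks; the data, the snapshot points, and the "largest change in share" measure are assumptions for illustration and are not taken from Golder's analysis.

```python
from collections import Counter


def tag_shares(bookmarks):
    """Return each tag's share of all tags applied so far."""
    counts = Counter(tag for tags in bookmarks for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def max_shift(earlier, later):
    """Largest absolute change in any tag's share between two snapshots."""
    tags = set(earlier) | set(later)
    return max(abs(later.get(t, 0.0) - earlier.get(t, 0.0)) for t in tags)


# Hypothetical stream of bookmarks for one URL: varied at first, then settling.
stream = [["art"], ["bricks"], ["art", "sculpture"], ["art", "tate"]]
stream += [["art", "sculpture"]] * 16  # 20 bookmarks in total

early_shift = max_shift(tag_shares(stream[:2]), tag_shares(stream[:4]))
late_shift = max_shift(tag_shares(stream[:10]), tag_shares(stream[:20]))
print(round(early_shift, 3), round(late_shift, 3))
```

In this toy stream the shares move far less between the later snapshots than between the early ones, which is the "congealing" pattern described above.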

Sifry posits that folksonomies are successful inter alia because people dislike “rigid taxonomy schemes”, but it is more accurate to say that what people dislike are rigid schemes that poorly match their needs. As studies have shown, people greatly prefer reduced choice, as long as the options are simple, clear, and offer what they actually prefer (Schwartz 2005, Godin 2003, Gilbert 2004, Gladwell 2004).

The key is thus to create formal hierarchies by deriving them from the “well worn path” and “desire lines” of actual unconstrained choices through the use of tagging.

In this sense, “tagging” places the structure of classification outside the location of the data or information itself. Potentially, in the same way that relational databases made a breakthrough in the dynamic organization of data, tags may form the indices of an external, user-built canonical structure, or simply be browsed, explored, and linked to by other users.

We have thus not replaced formal traditional forms of organization of information so much as created better, more ergonomic ways to give rise to them. We retain our ability to use structured hierarchies or canonical structures as testable truth claims, but gain a better fit to the ergonomical requirements of information users.

By this process, we also escape the situation in which the “intended and unintended eventual users of the information are disconnected from the process.” (Mathes 2004:3), since they will have become part of the process of taxonomical creation itself. Users give rise to the eventual structure through their acts of information navigation.

 

Conclusion

While folksonomies are indeed revolutionizing our ability to categorize and classify, particularly in internet-based or online information resources, canonical and traditional taxonomies are unlikely to disappear. The greatest gain from folksonomies is likely to be derivative taxonomies, or classifications resulting from “worn-path” actual behavior of large populations of users with large volumes of transactions. This provides a form of statistical smoothing and actuality-based classification events that will yield the best fit to information classification in its most human-ergonomical representation. As attractive and comfortable as this may be, however, it is unlikely to remove planned or formal taxonomies where these either serve niche functions, or where the ability to make and test truth claims by means of canonical or other formal and hierarchical taxonomies exists. Not only should formal taxonomies exist, but they should be derived from the “well worn paths” of what people actually select when unconstrained but guided in choice.

 

References

  1. Chomsky 1957, “Syntactic Structures”, Chomsky, N. Humanities Press, 1957
  2. Gilbert 2004, “Why are we happy?” TedTalks http://www.ted.com/index.php/talks/view/id/97 Last accessed June 2007
  3. Gladwell 2004, “spaghetti sauce” TedTalks http://www.ted.com/index.php/talks/view/id/20 Last accessed June 2007
  4. Godin 2003, “Sliced bread”, TedTalks http://www.ted.com/index.php/talks/view/id/28 Last accessed June 2007
  5. Golder, “The Structure of Collaborative Tagging Systems”
  6. Hammond 2005, “Social bookmarking tools” A general review, Hammond, T., Hannay, T., Lund, B. and Scott, J. (2005). In D-Lib Magazine. Vol. 11, No. 4.
  7. Mansor 2007, “Library of Congress classification: catalogers’ perceptions of the new Subclass KBP” Mansor, Y. Younis al-Shawabikah, Y. in Library Review 2007 Volume: 56 Issue: 2 Page: 117 – 126
  8. Mathes 2004, “Folksonomies: Cooperative Classification and communication through shared Metadata”. Mathes, A. http://adammathes.com/academic/computer-mediated-communication/folksonomies.pdf last accessed 5 September 2007
  9. Merholz 2004, “Metadata for the Masses”, at http://www.adaptivepath.com/publications/essays/archives/000361.php, last accessed 28 July 2007
  10. Morville 2005, “The Sociosemantic Web”. In Ambient Findability. Ch. 6. (O’Reilly, CA, USA.)
  11. O’Reilly 2005, “What is Web2.0: Design patterns and business models for the next generation of software”, O’Reilly, T at http://oreillynet.com/lpt/a/6228 last accessed Aug 30 2007
  12. Porter 2005, “Folksonomies: A User-Driven Approach to Organising Content. User Interface Engineering” Porter, J. at http://www.uie.com/events/uiconf/2006/articles/folksonomies last accessed September 6 2007
  13. Schwartz 2005, “The paradox of choice”, TedTalks http://www.ted.com/index.php/talks/view/id/93 Last accessed June 2007

 


[1] DDC Home http://www.oclc.org/dewey/ last accessed 14 March 2010

[2] Library of Congress at http://www.loc.gov/marc/ last accessed 14 March 2010

[3] Universal Decimal Classification at http://www.udcc.org/scheme.htm last accessed 6th September 2007

[4] ALA at http://www.ala.org/ last accessed August 13 2007

[5] See for example “Equivalent VIII” 1966 at http://www.tate.org.uk/servlet/ViewWork?workid=508 last accessed September 6th 2007

[6] Equivalent VIII at the Tate Gallery http://www.tate.org.uk/servlet/ViewWork?workid=508 last accessed 14 March 2010

[7] Tim Berners-Lee biography at http://www.w3.org/People/Berners-Lee/ , last accessed September 1st 2007


