Posts Tagged ‘information foraging’

Controlled Vocabulary

August 4, 2010


Language is a powerful thing: it is not only a prime medium of expression, but it in turn shapes concepts and thinking. Terminology frames concepts and makes some ideas more expressible and others less so, emphasizing or diminishing in turn. Some ideas flow naturally from the syntax and terminology of the language in use; others are not even expressible.

In practical terms, an argument or proposal resonates better when it is expressed in the dominant terminology, and seems weaker and off-key when it is not. Through concision effects and psychological set, terminology also enables or limits innovation.

Inconsistent use of jargon and terminology results in higher translation and localization costs, less effective training and education materials, and higher product-support costs.

The Foundational Nature of Language

From an Organizational Psychology point of view, language, in the form of endemic jargon, special terms and terminology, and accepted forms of speech and protocol, is part of the social structure of an organization.

For example, Chao (1994) proposes six dimensions of Organizational Socialization:

  1. History

  2. Language

  3. Politics

  4. People

  5. Organizational Goals and Values

  6. Performance Proficiency

Language deserves special mention, though, because it is through language itself that the other dimensions are expressed and their strength communicated. Historical narratives are elevated or diminished in prominence according to the terminology used to relate them, and organizational politics are likewise detailed and distributed according to the rules and parameters of internal language.

Organizational goals are couched in terms of organizational metaphors, and proficiency itself is measured according to articles of the organizational terminology.

Language thus determines which topics are allowable, both through the "correct" protocols and, at a more fundamental level, through the terminology itself.

In this sense, Single-Loop Learning and Type I homeostatic systems in an organization (Argyris & Schön 1987) are strongly influenced and delimited by the vocabulary that is allowable.

User Experience

A major part of user satisfaction is the confidence users feel in the product (whether that be a transit system or a software suite), and in many cases also the degree to which use requires mental computation. Unwelcome processing or decision-making demands result in low satisfaction.

A major part of this in turn is the continuity of the information architecture – the way terms confirm expectations and make sense, and are used where and when expected. While most suppliers take care over simple things, such as ensuring that a hyperlink's anchor text is immediately visible on the landing page, many do not consider how multiple designers and engineers may use different text for the same meaning in different parts of the product, its documentation, its sales collateral, its training, and in communication related to the product.

Encountering terminology in an unfamiliar context undermines and attenuates information scent, and reduces the user's confidence and overall satisfaction.

OD & L10N/I18N

Cost-effective Internationalization (I18N) and Localization (L10N) depend on source-language usage being tightly controlled, without a significant degree of equivocation and ambiguity. The more a single term is used for multiple meanings, or multiple terms for the same meaning, the higher the complexity of translation, the larger the volume of terms to be translated, and the lower the coherence of the final translated text.
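One practical way to catch this kind of drift before text reaches translators is a controlled-vocabulary checker that maps known variant terms onto their canonical forms. The sketch below is hypothetical: the term list and variants are invented for illustration, not drawn from any real style guide.

```python
# Hypothetical controlled-vocabulary map: variant term -> canonical term.
# A real lexicon would be maintained by content owners and domain experts.
CANONICAL = {
    "sign in": "log in",
    "sign-in": "log in",
    "logon": "log in",
}

def flag_variants(text):
    """Return (variant, canonical) pairs for non-canonical terms in the text."""
    found = []
    lower = text.lower()
    for variant, canonical in CANONICAL.items():
        if variant in lower:
            found.append((variant, canonical))
    return found
```

Run over documentation, UI strings, and training material, such a check surfaces exactly the one-meaning-many-terms problem described above, at a fraction of the cost of fixing it after translation.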

Machine Translation is powerless to fix this, and simply multiplies the variances, requiring lengthy and costly human involvement each time.

Inconsistent terminology equates to duplicated effort and difficulties when it comes to translation of product, documentation, and training materials – greatly increasing the complexity, time, and cost of translation. Creating meaningful Translation Memories when the terminology is overlapping and inconsistent is very difficult, and tends to lead to an even worse degree of inconsistency in all the translated languages.

Likewise, training becomes more costly and less effective when terminology is used with any significant degree of variation in meaning.

Knowledge Management

Most knowledge-bases rely on keyword searches, and the more sophisticated systems also use tagging, which at heart is still a keyword search and, in its best form, gathers tags from a folksonomy.

Unfortunately the power of search engines in this situation yields very high recall but low precision. The result is infoglut and lower search effectiveness: a significant impediment to using knowledge-bases to augment knowledge-workers such as customer-support staff, and reduced re-use of knowledge.

Since a major component of cost-reduction and quality-improvement in customer-support hinges on use of knowledge-bases, terminology control is a significant factor.

Branding and Market Mastery

Part of gaining mastery of a market niche is having a degree of control over the terminology and therefore the expressible concepts. The degree of influence one player has over the terminology translates directly into their freedom of movement within the domain, the effort required to thrive, and the extent to which discourse tends to be channeled in their favor.

At the very least, a clear brand and value proposition rely on message consistency across the many external communications an organization makes, be they deliberate marketing efforts, training materials, or even HR recruiting information. The terminology used by Recruiters should, for example, be consistent with that of Sales and Training materials, and so on. Any one department or group that injects noise will reduce the brand's coherence and effectiveness.

Gaining Control

Influence over terminology is not something one can beg, buy, or steal; it can only be attained through thought leadership – in other words, through good knowledge-management practice around intellectual expression.

It is determined by who is disseminating authoritative information, who provides attractive ideas, and who is leading in thought value – and who gets to saturate the frame of reference and the concept terrain.

An early step in gaining more control over the influence of language is to formalize usage and deliberately construct a lexicon detailing what terms mean and where they are used. This sets the stage for searchable knowledge-bases, single-sourced documentation, and consistent branding.

A low-cost approach is to establish an internal terminology wiki along the lines of Wikipedia, and to build and refine a corporate lexicon in three phases of limited crowdsourcing:

  1. Open invitation to internal staff

  2. Invitation to business partners (and industry luminaries) to contribute

  3. Invitation to customers to contribute

Step 1 requires some preparation to identify people who are influential in terminology as well as obtaining buy-in from content-owners and domain experts.

Steps 2 and 3 are a marketing bonanza that yields many spinoff benefits.

Making the terminology visible in this manner is not just a step in protecting against erosion of meaningful terminology but also forms part of a knowledge-management approach to organizational-learning.


If an organization is inconsistent in its use of terminology and language, if it vacillates on meaning and implication, if terminology is used hesitantly and passively, then the information scent attenuates, and the audience becomes uncertain and less likely to agree with the message or to see the source as trustworthy or authoritative. It also leads to escalating costs and loss of effectiveness in training and development, and to significant barriers to cost-effective translation and localization.

To get in a position where you influence the discourse and the frame of reference in your market niche you must settle on a controlled vocabulary, use it strongly, and use it consistently over every part of your products, documentation, and communications.

The place to start is inside the company – to practice, refine, and then deliver.


Two areas I left out but which deserve mention are the effects on Content Management and Health & Safety.
Inconsistent terminology can be a significant safety risk, a topic that deserves its own paper.

Please contribute to my self-knowledge and take this 1-minute survey that tells me what my blog tells you about me. – Completely anonymous.


Argyris, C. & Schön, D. (1987). "What is an organization that it may learn?"

Chao, G., O'Leary-Kelly, A., Wolf, S., et al. (1994). "Organizational socialization: Its content and consequences." Journal of Applied Psychology 79: 730-749.


Matthew Loxton is a Knowledge Management professional and holds a Master’s degree in Knowledge Management from the University of Canberra. Mr. Loxton has extensive international experience and is currently available as a Knowledge Management consultant or as a permanent employee at an organization that wishes to put knowledge to work.

‘I found it on the Internet’: The use of internet search engines to retrieve information.

May 8, 2010

Search engines have dramatically altered the information landscape over the last two decades, providing information ecosystems for many categories of information users – ecosystems that previously did not exist and which now empower those users with access and range that were previously only theoretical.
WiFi access and the use of handheld devices to reach the web "anywhere, anytime" have made the web a ubiquitous information resource for the layperson, at the same time that the increasing power of web-spidering and search engines makes both precision and power malleable by the user.

However, many of the information resources on the Internet are invisible to the commonly used search engines, which do not spider them. This creates a divide between what is available to the general public and what is reachable by the academic researcher.


We live in an era of unprecedented ubiquity of man-made information sources, as well as an immediacy that has not existed before. In his book "Cosmos", Carl Sagan puts the size of the collection held at the Library of Alexandria at as many as a million scrolls; in comparison, the website "" gives the number of pages indexed on the Internet as 19.86 billion (Saturday, 13 March, 2010)[i].
Clearly we have reached a degree of information availability that beggars previous collections.

However, this has come with challenges: the technology itself prioritizes technical skill over pure literacy, and in a very real sense the Internet can be seen as occupied by a "special club" with a small membership of "geeks" (Morville 2005) who have privileged access to information by virtue of special knowledge, devices, and information techniques.

Access to internet resources has an entry bar set by technology, in terms of computer hardware and software, but also by special heuristic techniques (Effken, Brewer et al. 2003).

The economic force unlocked by the linking of advertising with provision of free-to-use web-based search engines led to the so-called “Search-engine wars” in which vendors apply a range of different tactics to woo the public user whilst competing for subscribers. This drives not only the functionality offered by vendors, but also the range of searchable categories of informational artifacts. It additionally leads to some vendor specialization, such as concept-search like Kartoo[ii] and meta-search engines like Dogpile and those dealing with specific media such as YouTube.

”Apart from standard web search, search engines offer other search services such as image search, news search, mp3 music search and product price search. The current search engine wars will mean that there will be fierce competition between search engines to lure users to use their services, which is good news for the consumer of search, at least in the short term.” (Levene 2006)

This is not to say that the results of search engines cannot be manipulated or "gamed", both by the people or organizations acting as information sources and by third parties who may wish to influence the behavior of search engines. The term "Google-bombing" reflects one aspect of this practice.

The user thus needs to be aware that some participants may “game” the system and manipulate search-engines to artificially raise the search ranking of a specific site or page. (Poremsky 2004).

In order to combat this practice, and to remain as competitive as possible, the vendors constantly refine their engines. The user should bear in mind that the algorithms and techniques used by search-engine vendors are trade secrets and subject to change, and that specific sites may be systematically or even deliberately selected or de-selected based on somewhat inscrutable rules. Web sites may also trigger anti-gaming algorithms designed to detect attempts to manipulate the search engines and be removed from the result set entirely (the so-called "Google death-penalty"), entirely unknown to the user. (Levene 2006).

This thrust and parry relationship between the information suppliers and the search-engine vendors has given rise to an industry of supplying various tricks and techniques to safely influence visibility and palatability of information to search-engines (Kent 2004), as well as spawning guide-books for webmasters (Reynolds 2004) and every imaginable aspect of “Findability” (Morville and Rosenfeld 2006).

Information abounds on topics ranging from "search zones" to the need for, and creation of, thesauri to catch misspellings or alternative and preferred terms (Poremsky 2004).

These have been so successful that they have created a further challenge for the researcher or user, aptly termed by some as "infoglut" (Herhold 2004) – that is, a query result set so overwhelming that what is retrieved is often not a manageable handful of appropriate texts, but a set that becomes simply too large to handle as it approaches several thousand or even several million texts.

As Herhold (2004) puts it:

“The implication for the design of retrieval languages is that disambiguation is a serious and very large problem. It is the homonym problem writ large, writ in the extended sense of including polysemy and contextual meaning, that is the chief cause of precision failures – i.e., infoglut – in retrieval.”

Various stratagems and approaches to infoglut from the information provider’s point of view have been suggested, ranging from clever use of information-mapping (Kim, Suh et al. 2003), to the creation of portals (Firestone and McElroy 2003) in which relevancy is driven by proximity to the user, measured in mouse-clicks[iii].
On the user end of the equation there are also guides for users and researchers including use of subscription-databases and intelligent agents (Foo and Hepworth 2000)

A very large result set obviously challenges the information-processing capacity of the user, but also calls into question the heuristic technique used, bringing to light two distinct elements that bear attention: the precision of a query result, and its recall. (Herhold 2004, Pao 1989)

Precision: the proportion of retrieved documents that are also relevant. Low precision implies that most of the documents retrieved were not relevant, thus info-junk.

Recall: the proportion of all relevant documents that were found and retrieved. Low recall speaks to the effectiveness of the query in finding the universe of all relevant documents, and also to the phenomenon of the "invisible web" that is not targeted by search engines (Smith 2001).
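The two definitions above reduce to simple set arithmetic. A minimal sketch (document identifiers invented for illustration):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a query result set.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, a query that retrieves four documents of which two are relevant, out of three relevant documents in the collection, scores a precision of 0.5 and a recall of about 0.67 – the infoglut scenario is the opposite corner: high recall with precision near zero.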

This invisible or "deep web" is hidden from view primarily because its information sources are not amenable to discovery by the typical search engines that crawl the "surface web", and it thus forms an invisible web (Henninger 2003) often estimated as being orders of magnitude bigger than the total available for search – the "deep web" being 400-550 times larger than the surface (Bergman 2007).

Smith (2001) explains this in terms of linking and permanence:

”Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, the page must be static and linked to other pages. Traditional search engines can not “see” or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.” (Smith 2001)

Part of dealing with these different aspects of information retrieval is to deliberately adopt a technique or heuristic to searching.


The dilemma of needing terms and knowledge to find information, while needing access to usable information in order to know which terms to use, is approached in a discursive browse-search-browse pattern reminiscent of how people search for food. Heuristics are the partially formalized employment of various information stratagems.
According to Spink & Cole, it is likely that human information-seeking behavior is an evolutionary correlate of other, older foraging patterns (Herhold 2004), and thus not just an individualistic behavior but a deeply social one.

Examples of how a user (or provider) of information can approximate these patterns include social-bookmarking (Hammond, Hannay et al. 2005), and tagging.

These drive a social taxonomy that makes searching and finding on the web a more ergonomically human activity through both the social aspect of observing what other people tag and being able to create information-paths through a folk-taxonomy or “folksonomy” (Mathes 2004, Porter 2005)

A similar approach is being adopted by many retailers on the web, where finding an item often produces a list of other items that users who bought the item under view "also bought". E-stores such as Amazon or Barnes & Noble are thus able to guide purchases with collaborative filtering based on the patterns of other users (Anderson 2004). This has ramifications for the business user who might wish to know what their peers are looking at.
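At its simplest, the "also bought" pattern can be approximated by counting co-occurrences across purchase baskets. This is only a sketch of the general technique, with invented item names; production recommenders weight and normalize these counts in far more sophisticated ways.

```python
from collections import Counter

def also_bought(baskets, item, top_n=3):
    """Rank the items most often co-purchased with `item`.

    `baskets` is an iterable of sets of item identifiers,
    one set per transaction.
    """
    co = Counter()
    for basket in baskets:
        if item in basket:
            # Count every other item that shared a basket with `item`.
            co.update(i for i in basket if i != item)
    return [i for i, _ in co.most_common(top_n)]
```

Given a handful of baskets, `also_bought(baskets, "book")` returns the items most frequently bought alongside "book", which is exactly the list a retailer surfaces on the product page.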

Information tools that cater to the social aspects of information-seeking have been made available on the web both by entrepreneurial groups such as Yahoo, with its tagging tool, and by scientifically orthodox publications such as the journal Nature, with its freeware tool and site Connotea.

Folksonomy is therefore an applicable tool for the business researcher as well as the general public.

The ability to identify information quality is a further dimension, since the quality of information involves inter alia “the properties of accuracy, precision, credibility, currency, pertinence, precision, relevance, reliability, simplicity and validity.” (Evernden and Evernden 2003)

Information quality tends to deteriorate over time (Evernden and Evernden 2003) which is problematic in any collection where the architecture does not require the dating of items. It is important for the seeker to use this as a guide as to the trustworthiness of a collection.

A further available heuristic tactic is to use humans as search catalysts in a more direct and old-fashioned manner: many library services provide research assistants who are skilled and studied in taxonomies and search techniques, and who can suggest search strings and databases.[iv]

For the seeker, parts of this invisible web are exposed via academic and research search tools operating on organizational or subscription collections, some of which are accessible through citation-manager software such as EndNote[1] that has search and connection tools.

The Future

Crystal balls have proven notoriously inaccurate with regard to the Internet, and probably the best I can manage is to say that things will get bigger but more user-friendly, and that the social-bookmarking trends will continue. The drive towards "Web 2.0" social networking, "Web 3.0" semantic-web technologies, and contextual search tools will doubtless shape user-interface design, make more (and more kinds of) material available, and continue to bring online texts and artifacts previously available only in hardcopy media.

Information architecture is likely to become increasingly important as collections increase in diversity and size (Morville and Rosenfeld 2006, Batley 2007).

Privacy is also likely to become increasingly important as Internet tools make it easier to identify users purely from their search queries. This was made clear when an AOL user was identified purely through her search terms[2] (Barbaro and Zeller 2006). The user assumption that web activity is anonymous is unwarranted, and has implications for researchers whose subject matter might be politically or socially controversial, or might disclose their business intent. There are thus serious privacy concerns with regard to search engines (Cohen 2005).


  1. Anderson, C. (2004). "The Long Tail." Wired Magazine 12(10).
  2. Barbaro, M. and T. Zeller (2006). "A Face Is Exposed for AOL Searcher No. 4417749." New York Times. New York.
  3. Batley, S. (2007). Information Architecture for Information Professionals. Oxford, Chandos.
  4. Bergman, M. (2007). "The Deep Web: Surfacing Hidden Value." Journal of Electronic Publishing.
  5. Cohen, A. (2005). "What Google Should Roll Out Next: A Privacy Upgrade." New York Times. New York.
  6. du Preez, M. (2002). "Indexing on the Internet." MOUSAION 20(1): 109-122.
  7. Effken, J. A., B. B. Brewer, et al. (2003). "Using computational modeling to transform nursing data into actionable information." Journal of Biomedical Informatics 36(4-5): 351-361.
  8. Evernden, R. and E. Evernden (2003). Information First: Integrating Knowledge and Information Architecture for Business Advantage. Oxford, Butterworth-Heinemann: 1-27.
  9. Firestone, J. M. and M. W. McElroy (2003). Key Issues in the New Knowledge Management. Burlington, MA, Elsevier Science.
  10. Foo, S. and M. Hepworth (2000). The Implementation of an Electronic Survey Tool to Help Determine the Information Needs of a Knowledge-Based Organization.
  11. Hammond, T., T. Hannay, et al. (2005). "Social bookmarking tools (I): A general review." D-Lib Magazine 11(4).
  12. Henninger, M. (2003). "Searching Digital Sources." The Hidden Web: Finding Quality Information on the Net. Sydney, Australia, UNSW Press.
  13. Herhold, K. (2004). "The Philosophy of Information." Library Trends 52(3): 373-665.
  14. Kent, P. (2004). "Surveying the Search Engine Landscape." Search Engine Optimisation for Dummies, Wiley.
  15. Kim, S., E. Suh, et al. (2003). "Building the knowledge map: an industrial case study." Journal of Knowledge Management 7(2): 34-45.
  16. Levene, M. (2006). "Navigating the Web." An Introduction to Search Engines and Web Navigation. London, Addison Wesley: 174-184.
  17. Loxton, M. H. (2003). "Patient Education: The Nurse as Source of Actionable Information." Topics in Advanced Practice Nursing eJournal 3.
  18. Mathes, A. (2004). Folksonomies: Cooperative Classification and Communication Through Shared Metadata.
  19. Morville, P. (2005). "The Sociosemantic Web." Ambient Findability. CA, O'Reilly.
  20. Morville, P. and L. Rosenfeld (2006). Information Architecture for the World Wide Web. California, O'Reilly Media.
  21. Morville, P. and L. Rosenfeld (2006). "Push and Pull." Information Architecture for the World Wide Web. S. St. Laurent. California, O'Reilly Media.
  22. O'Reilly, T. (2005). "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software." Retrieved 28 August, 2007.
  23. Pao, M. (1989). Information Retrieval.
  24. Poremsky, D. (2004). "Search Engines and How They Work." Google and Other Search Engines. Berkeley, CA, Peachpit Press: 3-18.
  25. Porter, J. (2005). "Folksonomies: A User-Driven Approach to Organising Content." User Interface Engineering. Retrieved September 6, 2007.
  26. Reynolds, J. (2004). "Search Engines and Directories." The Complete E-Commerce Book, CMP Books: 233-247.
  27. Smith, B. (2001). "Getting to Know the Invisible Web." LibraryJournal.com.


[1] EndNote is provided by the Thomson-Reuters group. See

[2] The release of AOL search strings allowed a researcher to quickly identify a Mrs. Thelma Arnold, even though she was identified only as "searcher #4417749".

[i] Which is really curious, because on 21st September 2008 it said there were 27.61 billion pages. Did the web shrink, or is the tool a bit buggy?

[ii] Sadly defunct now

[iii] Using “mouse-click” distance as a measure is a very effective way to put information at hand

[iv] Many libraries staff a 24×7 online helpdesk to guide patrons in finding materials. Most of these are staff pooled across many institutions and locations.


Matthew Loxton is the director of Knowledge Management & Change Management at Mincom, and blogs on Knowledge Management. Matthew’s LinkedIn profile is on the web, and has an aggregation website at
Opinions are the author’s and not necessarily shared by Mincom, but they should be.

Knowledge Management: The Disease Model discussed

May 1, 2010



Some readers of my blog on the Disease Model of Knowledge Transfer might justifiably have wondered whether I had been typing after a few beers. Admittedly it was a joy to write, but the back-story is actually quite solid and very interesting (to me, at least).

The issue is one of how we can take models built for one purpose, and apply them productively for a completely unintended purpose – in fact a large proportion of technological and scientific breakthroughs occur in exactly this way. Taking a way of seeing things from one domain to an unrelated domain means that you might impose a degree of artificiality, but still derive benefit from the change in perspective and the new questions that might be productively raised.
Philosophy of Science (PoS, as it is hilariously abbreviated) calls this an "Instrumental Theory" approach, and people like Ernst Mach (he of speed-of-sound fame) proposed that many, if not all, scientific facts and theories were actually just instruments of explanation and not real in any strict sense. Electrons, he held for example, were just a useful concept for furthering investigation, not real little ball-like things.

In this way one can plot the “infection characteristics” of obesity even though nobody is saying it is “really” infectious, and Richard Dawkins could propose that one could look at ideas themselves as infectious replicators.

What Prof. Dawkins was trying to do was instill in his students a better understanding of how evolution works at the gene level, and he emphasized that while genes are teleologically blind and not intentional in any way, variation and selection could nevertheless shape populations of the people carrying them. To understand evolution one needs to look at the world from the perspective of genes under selective pressure, where there are not enough resources for all of them to be replicated. Successful replicators tend slowly to increase in proportion relative to the unsuccessful, simply because it is the victors whose code gets replicated.

To explain this Dawkins proposed a thought experiment in which ideas themselves are seen as a replicator.

Picture a world filled with ideas that aren't entirely stable, that can mutate or join together, and that can replicate from one host mind to the next, sometimes suffering copying errors on the way. There are more potential ideas than minds to run them, and those that don't get run by a mind die out.
Like the DeLorean or Cuban heels.

The idea of "memes" (as he named them) itself went viral, and soon it became evident that it was a highly productive way of looking at ideas. Whether or not memes, or even temes*, are "real" is not terribly important; what matters is the ability the model gives us to do useful things and ask productive questions. (*See Prof. Susan Blackmore's meme/teme TED talk online.)

It allows us to ask why some ideas transfer more readily between people, why some are more stable, why some last longer. It allows us to look at Intellectual Property, Job Aids, and Knowledgebase articles in a new way, and to try new ways of getting ideas to behave in ways that we would prefer.

For example, it asks why gossip and the "corporate grapevine" are so compelling and so fast, and begs us to consider how we could put this to use or gather information from it. In Nonaka's "Ba", a coffee area or watercooler is a place where people gather to exchange information; the question is how to increase the work content of that exchange without turning it sour and putting people off.

A second area in which I find an interesting parallel is the work of the behavioral psychologist Eric Berne. In his Transactional Analysis approach, he proposed that there are somewhat stable "games" that people enact, especially in interpersonal settings. By "games" he didn't mean fun and party-novelty behavior; he meant that on inspection one could make out somewhat persistent "rules", "players", and "roles". It is important to note, however, that the form of Game Theory described earlier by Nash is not what I have in mind at all: that path leads to a dreadfully dehumanizing approach to people and drives highly destructive behavior.

Putting the two together (part of my own research activities) one comes to a perspective in which games and ideas “fight” for space in people’s minds and to get expressed as behavior. Just like genes, some memes work well together and some are mutually exclusive. We even know why (to an extent) some ideas push others out.
For example, if you are thinking of money and especially personal reward, some very specific parts of your brain fire up and they suppress activity in some other parts – you can’t easily run the two sets of circuits at the same time. This is why economic norms suppress social norms and why somebody who was perfectly happy to donate time and effort to do something for a “good cause” might be put off if you pay them to do it. It is also why rewarding people with money is a risky approach and tends to lead to conflict and gaming of the system of rewards.

If you doubt this, try the suggestion of researcher Dan Ariely, and at your next Christmas meal offer your mother-in-law $50 for her trouble. Let me know how that works out for you.

Putting another layer on this, some ideas, like pathogens or genes, have evolved specialized penetration or adhesion mechanisms that are usually very specific to the host they will use – and this is where we can start asking how to make some information easier to use, or stick better, or be easier to locate.

For example, although digital watches and instruments were very hip, they were actually less usable: it takes more processing power to turn a digital readout into what your brain uses than an analogue display does.

You can literally measure the difference in how long it takes to judge whether a specific time is near or still a long way off when viewing an analogue clock-face versus a digital readout. For this reason many time-critical instruments in a cockpit are analogue.

This is also why it is important to decide if information is something we want somebody to remember, or if we will just present it to them at the appropriate time. Getting people to memorize product codes or server paths is not as effective as simply presenting them with the information when the time is ripe.
It is also important in GUI design, and in keeping IT appropriate to the task at hand.

At a higher level, when everybody knows that the “real rules of working here” mean that you aren’t actually allowed to use the eLearning materials or the open-door policy, then they behave according to the game rules of the “real ground rules” not the ones in the employee handbook.

In a future blog I hope to go into some of the practical implications and uses, but for now, this is my story, and I am sticking to it.


Matthew Loxton is the director of Knowledge Management & Change Management at Mincom, and blogs on Knowledge Management. Matthew’s LinkedIn profile is on the web, and has an aggregation website at
Opinions are the author’s and not necessarily shared by Mincom, but they should be.

Reflections on Tagging Part II.

April 23, 2010

My first reaction to tagging was surprise, followed shortly by a dose of joy.

Although I am a longtime user of the Internet, IRC, IM, and many other communication tools on the Net, I was surprised not only that tagging could be such a powerful tool, but also that to a great extent I had been unaware of this.

For me, tagging solves two problems. Firstly, my favourites can be stored externally and are thus not dependent on a specific machine – changing machines or losing a hard-drive always seems to go with a loss of links to valued information. There are pictures, articles, and downloads that I no longer have because I simply cannot remember how I found them. Even worse, I can’t even remember what they were.

Secondly, it solves a classification problem.

I sometimes struggle to decide under what category to save a new link, and this results either in a steadily growing taxonomy that becomes increasingly arcane and impenetrable with time, or in inconsistencies in where I put things.
Does something go under “Research” or “UC” or “KM”, and why did I have this folder called “NS”?

Storing multiple copies of links was a thought, but quite often Intranet links change, and it would become an administrative overhead to root out all the occurrences of a link each time.

Tagging potentially solves this because a link is no longer bound to a canonical format on a specific machine; instead the collection takes on the nature of a relational database index, where a classification can be created dynamically by user-constructed query strings.

Unfortunately, the ability of such sites to handle Boolean search terms is at present very limited, and although sites like Connotea[1] allow a more structured search mechanism, they do not yet allow a structured query language that is entirely user-constructed.

At present I can use search with Boolean terms to logically AND tags and NOT tags, but I cannot deliberately exclude by period, language, origin, or person, for example. To escape the overwhelming abundance and proliferation of the “soup”, I may need to exclude postings from a specific poster whom I have identified as prolific but untrustworthy. While this poster may have used tags that I wish to include, I may still want to exclude anything they have tagged, and to do this I need a meta-language that would look very much like a structured query language.
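To make the desired query mechanics concrete, here is a minimal sketch in Python of filtering a tag collection with AND, NOT, and poster exclusion. The bookmark records and field names are hypothetical illustrations, not any site’s actual API:

```python
# A tag query as a filter over bookmark records: include by tag
# intersection, exclude by tag or by poster - the kind of structured
# query that current tagging sites don't yet expose.
bookmarks = [
    {"url": "http://example.org/a", "tags": {"km", "research"}, "poster": "alice"},
    {"url": "http://example.org/b", "tags": {"km", "spam"},     "poster": "mallory"},
    {"url": "http://example.org/c", "tags": {"km", "research"}, "poster": "mallory"},
]

def query(items, all_tags=(), not_tags=(), not_posters=()):
    """AND the required tags, NOT the excluded tags and posters."""
    return [
        b for b in items
        if set(all_tags) <= b["tags"]            # every required tag present
        and not (set(not_tags) & b["tags"])      # no excluded tag present
        and b["poster"] not in not_posters       # poster not blacklisted
    ]

# Everything tagged both 'km' and 'research', excluding an untrustworthy poster.
results = query(bookmarks, all_tags=["km", "research"], not_posters=["mallory"])
print([b["url"] for b in results])  # -> ['http://example.org/a']
```

A real tag query language would express the same filter as a string, but the semantics – set intersection for AND, set difference for NOT – would look much like this.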

There seems to be some development along these lines with structured query languages such as “Squeal” (Spertus, 2007), WebSQL (Arocena, 2007), and Xcerpt (Furche, 2004).

However, although they all have elements of this, none is specifically targeted at tag searching.

It cannot be long before a structured search language specifically encompassing tags and meta-tags becomes available, and this would turn the “tag soup” into an instantly structured information subset. Of course, that creates the dilemma of how and where one saves the search term itself for future use or reference.

Curiously, one of the very reasons that the “invisible” or “deep” web is opaque is because many large databases of information create classifications on the fly through user-constructed query strings of the kind I am suggesting. One hopes that tag searches would be open in a way that more is revealed, rather than driving everything into islands of information that are mostly hidden.

What I also found fascinating was that social media are curiously attractive, enjoyable, and emotionally “warm” in a way that traditional databases like EBSCO, EMERALD, and LEXIS/NEXIS are socially “cold”. This “attractiveness” is seemingly unrelated to the level of actual knowledge acquisition or the quantity of information retrieved.

For example, the human-computing experiment called the “ESP Game”[2] has already labeled over 10,000,000 images on the web by getting humans to work collaboratively without any tangible reward. The payoff for the individuals lies partly in participating in something socially useful – identifying images in order to make them searchable – but mainly in the simple pleasure that people get from playing with other people. The sensation of having an anonymous “partner” who is “in tune” is strangely attractive, if not somewhat addictive.

The parallel between using other people’s discoveries as part of one’s own online heuristic, and normal human or even primate behaviour is to me, very striking.

All primate species appear to be highly motivated to, and to derive pleasure from, learning from others and leaving clues for others to find. In this way, social bookmarking appears to engage with some very ancient and well developed behaviour patterns, and thus fit snugly into the ergonomic requirements we have for information.

Perhaps it reveals a search for mutuality – people who like what I like, are interested in what interests me and are themselves therefore interesting to me.[i]

Rebecca Blood remarked that tagging led her to some surprising self-discovery: by looking post facto at what she had tagged, she came to realise that there were things she was evidently interested in that she would not previously have named as her top interests. In some sense, then, we are what we tag – or at least what we tag demonstrates actual interests as opposed to stated interests – and this self-revelation may be quite emancipatory, or at least informative.

On the negative side, there may be some serious privacy concerns. In a very real sense, you are what you seek – as was evidenced by the release of AOL search terms which enabled quick discovery of individual persons, even though their identities were anonymised.[3]

I am also not sure if it might actually amplify my biases by giving me an ability to screen out discordant information or see only agreeable information.

After having experimented with tagging, I am doubtful that canonical taxonomical systems will be replaced, since these represent in effect a sort of hypothesis test in a truly scientific sense. We can pose what we think may be true, and by cementing this in a canonical classification we open our claim to natural selection – will nature and reality prove it true or false, will it stand or fall? These are falsifiable claims – a canonical taxonomy is indeed a truth claim by which we are able to put our beliefs to the test.


  1. “Squeal” (Spertus 2007), a structured query language for the web
  2. WebSQL (Arocena 2007)
  3. Xcerpt (Furche 2004), a “declarative, rule-based query and transformation language for the Web”


[1] Connotea is published free of charge by the journal Nature.


[3] The release of AOL search strings allowed a researcher to quickly identify a Mrs. Thelma Arnold, even though she was identified only as “searcher #4417749”.

[i] Users of the participative labeling game report a sense of pleasure in finding a compatible partner that in itself serves as a reward.



Reflections on Tagging Part I

April 17, 2010

Online tagging is a relatively new form of classification based on user-defined terms to associate online or local electronic texts, objects, or representations. Various authors regard this as a de novo phenomenon that will replace formal or canonical classification systems, but it is more plausible to consider social classification as an adjunct to, rather than a replacement of, classical taxonomical systems.

The term folk taxonomy or folksonomy refers to the user-created taxonomy resulting from the “worn-path” of actual usage.


Humans are natural taxonomising machines, selectively acquiring information according to needs and desires, and classifying information and objects into categories that are learned, created, and perhaps even innate in the case of language syntax (Chomsky, 1957).

The advent of the web has enabled “living resources” as part of virtual communities built by mutual interest. (Hammond 2005: 4). These resources, unlike traditional libraries, allow classification to be independent of the information collection itself.

Most of us (those older than 20) can remember a time when information collections were most typically located in brick-and-mortar libraries, in which books, fiche, and other information objects and artifacts were stored in fixed hierarchies, themselves established in physical, canonically arranged index-card systems.

These library classification systems, such as Dewey Decimal[1], MARC[2], and UDC[3], require professional training and expertise to implement and maintain, and librarians form a professional corpus often represented by official bodies such as the American Library Association, which claims over 65,000 members.[4]

These systems, however, often prove unwieldy for the average person, who may have only a passing knowledge of any particular classification system, and may also confront an artifact that poorly fits any classification system they are familiar with.

This tension results in part from:

  1. Lack of power and specificity in the classification system itself
  2. Multiple possible classification elements in a single information object
  3. Unfamiliarity by the user with the available formal classification systems


In the first case, a person may encounter a situation in which the power and range of the system poorly covers the target object. For example, the Dewey system is manifestly European in design and allows few index ranges for non-Anglophone and non-European subjects. It is thus predictable that some artifacts of other cultures and languages may not easily find a suitable classification (Mansor, 2007).

In the second case, a single informational object may be classifiable under several distinct and potentially dynamic classifications. A book or fiche may be relatively static, but a person as an information object will not be. A person may now be middle-aged, have short hair, be brunette, and like whiskey. They may like cricket, be fit, and have aspirations suitable for a specific age – but these things will not always have been so, and will change again. They may go grey, become old, and may lose some preferences, and gain or even regain others. So where does a person classify themselves? How do we classify objects that may be fluid, or even metamorphose over time? Even with static objects there can be difficulty in securing classification. In many instances post-modern art and literature have sorely tested the classification power of existing systems, and seemed to delight in producing exactly this tension.[5]

In the third instance, even though there may be a large number of librarians, the information-user population greatly exceeds the subset of trained taxonomical professionals, to the point where the probability of a given user being able to classify something correctly according to any of the three systems listed above is exceedingly small.

This leaves us with the goal of finding “a user-driven approach to organizing content” (Porter, 2005), perhaps through the advent of vast numbers of online users and the enormous power of the web to index specific physical objects through hypertext links, texts, and images. It may thus not be necessary for me to physically describe “Equivalent VIII”, since I can refer to an authoritative reference to it at the Tate Gallery itself.[6]

I could also make use of “social bookmarking” to draw the reader to it, but more importantly, to other seekers who had related searches.

The power of web browsers for locating shared informational resources, as envisaged by Berners-Lee[7], was unfortunately not mirrored in their ability to store URLs once found; bookmark menus have traditionally followed a simple canonical file structure inherited from the early disk operating systems of the computers. In this schema the user can choose to arrange the hierarchy and name the folders, but such structures are ill-suited to objects with multiple possible or actual classifications. They thus retain the discomfort of point #2 above, and also leave the user having to invent their own classification system without the benefit of a professional librarian to help.

How then to classify Equivalent VIII?

Enter “Mob indexing” (Morville 2005:134).

What if we made use of human-computing and allowed the sheer mass of users to give a statistically-emergent set of classifications? – would large numbers of users settle on a stable structure without any overt discussion between them?

Thomas Vander Wal refers to a “user-created bottom-up categorical structure development with an emergent thesaurus” as a “Folksonomy” (Morville 2005:136), in which we can use the discoveries made by other humans as a cybernetic resource – by revealing the road-markers of other people who searched for something, one can browse the surviving troves of interconnected information links that other people have created.
By seeing and browsing what they used to identify online information, we have ready-made classifications left by numbers of other users.

We might further view Folksonomies as a “Web 2.0” phenomenon (O’Reilly, 2005) in which the “Wisdom of Crowds” (O’Reilly, 2005:7) and their massed tagging decisions lead to emergent taxonomical structures. The “trodden path” thus reveals ideal informational ergonomics that even expert-designed canonical forms may be unable to predict or represent. In this regard, Shirky posits that Folksonomies are necessary because of the difficulty of applying controlled vocabularies at the level of individual and informal users of information (Morville 2005:135).

Thus we can pave the “desire lines” to achieve controlled vocabularies of optimal utility (Merholz, 2004) by using the tags of millions of online users.

Does this “tag soup” (Hammond 2005:4) lead, however, to a chaotic situation in which users overwhelm meaning and structure by posting millions of ambiguous tags? It is quite possible, after all, that taggers will use the same term for different things, and different terms for the same thing.

Golder reports that tag frequencies achieve stability rather than becoming chaotic, and that relative stasis is achieved at fewer than 100 bookmarks (Golder, undated), suggesting that in reality the “soup” congeals rather than liquefies.
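Golder’s stabilisation effect can be illustrated with a toy simulation (the imitation model, vocabulary, and numbers here are my own assumptions, not Golder’s method): if each new tagger partly imitates the tags already in use – a rich-get-richer process – the relative share of each tag settles early and drifts little thereafter.

```python
import random

def simulate_tags(n_bookmarks, vocab=("km", "tagging", "web2.0", "research"), seed=42):
    """Polya-urn tagging: each new bookmark reuses an existing tag with
    probability proportional to its current frequency (imitation), so
    relative proportions stabilise quickly rather than churn."""
    random.seed(seed)
    counts = {t: 1 for t in vocab}          # every tag starts with one use
    proportions = []
    for _ in range(n_bookmarks):
        tags, weights = zip(*counts.items())
        counts[random.choices(tags, weights=weights)[0]] += 1
        total = sum(counts.values())
        proportions.append({t: c / total for t, c in counts.items()})
    return proportions

props = simulate_tags(1000)
early, late = props[99], props[-1]   # shares after 100 vs. after 1000 bookmarks
drift = max(abs(early[t] - late[t]) for t in early)
print(f"max drift in tag share after bookmark 100: {drift:.3f}")
```

With a fixed seed the drift between bookmark 100 and bookmark 1000 is small relative to the shares themselves, which is the qualitative shape of Golder’s finding: the distribution congeals early.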

Sifry posits that folksonomies are successful inter alia because people dislike “rigid taxonomy schemes”, but it is more accurate to say that what people dislike are rigid schemes that poorly match their needs. As studies have shown, people greatly prefer reduced choice, as long as the options are simple, clear, and offer what they actually prefer (Schwartz 2005, Godin 2003, Gilbert 2004, Gladwell 2004).

The key is thus to create formal hierarchies by deriving them from the “well worn path” and “desire lines” of actual unconstrained choices through the use of tagging.

In this sense, “Tagging” places the structure of classification outside the location of the data or information itself. Much as relational databases made a breakthrough in the dynamic organization of data, tags may form the indices of an external user-canonical structure, or simply be browsed, explored, and linked to by other users.
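The relational-index analogy can be sketched with a toy inverted index in Python (the URLs and tags are invented for illustration): the stored objects carry no hierarchy at all, and any “folder” is just a query against the external tag index, computed on demand.

```python
from collections import defaultdict

# Tags as an external index: the objects themselves carry no hierarchy;
# classification lives entirely in the tag -> objects mapping, and any
# grouping is a query over that index, built dynamically.
tagged = {
    "http://example.org/ddc":  {"library", "taxonomy"},
    "http://example.org/marc": {"library", "metadata"},
    "http://example.org/esp":  {"tagging", "games"},
}

index = defaultdict(set)
for url, tags in tagged.items():
    for tag in tags:
        index[tag].add(url)

# A dynamic "classification": everything tagged both 'library' and
# 'taxonomy' - computed at query time, never stored as a folder.
print(sorted(index["library"] & index["taxonomy"]))  # -> ['http://example.org/ddc']
```

This is exactly the sense in which a tag set behaves like a relational index: one physical copy of each object, arbitrarily many logical classifications over it.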

We have thus not replaced formal traditional ways of organizing information so much as created better, more ergonomic ways to give rise to them; we retain our ability to use structured hierarchies or canonical structures as testable truth claims, but with a better fit to the ergonomic requirements of information users.

By this process we also escape the situation in which the “intended and unintended eventual users of the information are disconnected from the process” (Mathes 2004:3), since they become part of the process of taxonomical creation itself – users give rise to the eventual structure by their acts of information navigation.



While folksonomies are indeed revolutionizing our ability to categorize and classify, particularly for internet-based or online information resources, canonical and traditional taxonomies are unlikely to disappear. The greatest gain from folksonomies is likely to be derivative taxonomies: classifications resulting from the “worn-path” actual behavior of large populations of users across large volumes of transactions. This provides a form of statistical smoothing and actuality-based classification that will yield the best fit to information classification in its most ergonomic human representation. As attractive and comfortable as this may be, however, it is unlikely to displace planned or formal taxonomies where these serve niche functions, or where we wish to make and test truth claims by means of canonical or other formal hierarchical taxonomies. Not only should formal taxonomies exist, but they should be derived from the “well worn paths” of what people actually select when unconstrained but guided in choice.



  1. Chomsky, N. (1957). “Syntactic Structures”. Humanities Press.
  2. Gilbert, D. (2004). “Why are we happy?” TED Talks. Last accessed June 2007.
  3. Gladwell, M. (2004). “Spaghetti sauce”. TED Talks. Last accessed June 2007.
  4. Godin, S. (2003). “Sliced bread”. TED Talks. Last accessed June 2007.
  5. Golder, S. (undated). “The Structure of Collaborative Tagging Systems”.
  6. Hammond, T., Hannay, T., Lund, B. and Scott, J. (2005). “Social Bookmarking Tools: A General Review”. D-Lib Magazine, Vol. 11, No. 4.
  7. Mansor, Y. and al-Shawabikah, Y. (2007). “Library of Congress Classification: Catalogers’ Perceptions of the New Subclass KBP”. Library Review, Vol. 56, No. 2, pp. 117–126.
  8. Mathes, A. (2004). “Folksonomies: Cooperative Classification and Communication through Shared Metadata”. Last accessed 5 September 2007.
  9. Merholz, P. (2004). “Metadata for the Masses”. Last accessed 28 July 2007.
  10. Morville, P. (2005). “The Sociosemantic Web”. In Ambient Findability, Ch. 6. O’Reilly, CA, USA.
  11. O’Reilly, T. (2005). “What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software”. Last accessed 30 August 2007.
  12. Porter, J. (2005). “Folksonomies: A User-Driven Approach to Organising Content”. User Interface Engineering. Last accessed 6 September 2007.
  13. Schwartz, B. (2005). “The Paradox of Choice”. TED Talks. Last accessed June 2007.


[1] DDC Home, last accessed 14 March 2010

[2] Library of Congress, last accessed 14 March 2010

[3] Universal Decimal Classification, last accessed 6 September 2007

[4] ALA, last accessed 13 August 2007

[5] See for example “Equivalent VIII” (1966), last accessed 6 September 2007

[6] Equivalent VIII at the Tate Gallery, last accessed 14 March 2010

[7] Tim Berners-Lee biography, last accessed 1 September 2007


