Posts Tagged ‘infoglut’

Controlled Vocabulary

August 4, 2010


Language is a powerful thing. It is not only a prime medium of expression; it also shapes concepts and thinking. Terminology frames concepts, makes some ideas more expressible and others less so, and emphasizes or diminishes them in turn. Some ideas flow naturally from the syntax and terminology of the language in use, while others cannot be expressed at all.

In practical terms, an argument or proposal resonates better if it is expressed in the dominant terminology, and seems weaker and off-key if it is not. Through concision effects and psychological set, the dominant terminology also enables or limits innovation.

Inconsistent use of jargon and terminology raises the cost of translation and localization, makes training and education materials less effective, and increases the cost of product support.

The Foundational Nature of Language

From an Organizational Psychology point of view, language, in the form of endemic jargon, special terms and terminology, and accepted forms of speech and protocol, is part of the social structure of an organization.

For example, Chao et al. (1994) propose six dimensions of Organizational Socialization:

  1. History

  2. Language

  3. Politics

  4. People

  5. Organizational Goals and Values

  6. Performance Proficiency

Language deserves a special mention, though, because it is through language that the other dimensions are expressed, and language determines how strongly they are communicated. Historical narratives are elevated or decreased in prominence according to the terminology used to relate them, and organizational politics are likewise detailed and distributed according to the rules and parameters of internal language.

Organizational goals are couched in terms of organizational metaphors, and proficiency itself is measured according to articles of the organizational terminology.

Language thus helps determine which topics are allowable, both through the “correct” protocols and, at a more fundamental level, through the terminology itself.

In this sense, Single-Loop Learning and Type I homeostatic systems in an organization (Argyris & Schön 1987) are strongly influenced and delimited by the vocabulary that is allowable.

User Experience

A major part of user satisfaction is the confidence users feel in the product (whether that be a transit system or a software suite), and in many cases also the degree to which use requires mental computation. Unwelcome processing or decision-making requirements result in low satisfaction.

A major part of this in turn is the continuity of the information architecture – the way terms confirm expectations and make sense, and are used where and when expected. While most suppliers of products take care about simple things such as a hyperlink anchor text being immediately visible on the landing page, many do not consider how multiple designers and engineers may use different text for the same meaning in different parts of the product, its documentation, its sales collateral, its training, and in communication related to the product.

Encountering terminology in an unfamiliar context attenuates information scent, and reduces the user’s confidence and overall satisfaction.

OD & L10N/I18N

Cost-effective Internationalization (I18N) and Localization (L10N) depend on the source language usage being tightly controlled, without significant equivocation or ambiguity. The more a single term is used for multiple meanings, or multiple terms are used for the same meaning, the higher the complexity of translation, the higher the bulk of terms to be translated, and the lower the coherence of the final translated text.
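The two failure modes described above, one term carrying several meanings and several terms sharing one meaning, can be checked mechanically once usage has been recorded. What follows is only a minimal Python sketch. It assumes a hypothetical list of (term, meaning) pairs collected by hand from product text, and the function name `audit_terminology` and the sample terms are illustrative rather than taken from any real tool.

```python
from collections import defaultdict


def audit_terminology(usages):
    """Find ambiguous terms and duplicated meanings in (term, meaning) pairs."""
    meanings_by_term = defaultdict(set)
    terms_by_meaning = defaultdict(set)
    for term, meaning in usages:
        key = term.strip().lower()  # treat "Ticket" and "ticket" as one term
        meanings_by_term[key].add(meaning)
        terms_by_meaning[meaning].add(key)

    # One term used for several meanings (equivocation).
    ambiguous = {t: sorted(m) for t, m in meanings_by_term.items() if len(m) > 1}
    # Several terms used for one meaning (synonyms to consolidate).
    synonyms = {m: sorted(t) for m, t in terms_by_meaning.items() if len(t) > 1}
    return ambiguous, synonyms


# Hypothetical usages gathered from a product, its manual, and its training deck.
usages = [
    ("ticket", "support request"),
    ("case", "support request"),
    ("Ticket", "transit fare"),
    ("work order", "maintenance job"),
]

ambiguous, synonyms = audit_terminology(usages)
print(ambiguous)  # terms that need disambiguation before translation
print(synonyms)   # meanings that need a single preferred term
```

Each entry in either result is a candidate for cleanup before the text is sent for translation, since every one of them would otherwise be multiplied across the target languages.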

Machine Translation is powerless to fix this, and simply multiplies the variances – requiring lengthy and costly human involvement each time.

Inconsistent terminology equates to duplicated effort and difficulties when it comes to translation of product, documentation, and training materials – greatly increasing the complexity, time, and cost of translation. Creating meaningful Translation Memories when the terminology is overlapping and inconsistent is very difficult, and tends to lead to an even worse degree of inconsistency in all the translated languages.

Likewise, training becomes more costly and less effective when terminology is used with any significant degree of variation in meaning.

Knowledge Management

Most Knowledge-bases rely on keyword searches, and the more sophisticated systems also use tagging, which at heart is still a keyword search and in its best form gathers tags from a Folksonomy.

Unfortunately, the power of search-engines in this situation yields a very high volume of retrieved results but low precision. The result is infoglut and lower search effectiveness, which significantly impedes the use of Knowledge-bases to support knowledge-workers such as customer-support staff, and reduces effective re-use of knowledge.

Since a major component of cost-reduction and quality-improvement in customer-support hinges on use of knowledge-bases, terminology control is a significant factor.

Branding and Market Mastery

Part of gaining mastery or dominating a market niche is having a degree of control over the terminology and therefore the expressible concepts. The degree of influence one player has over the terminology translates directly into their freedom of movement within the domain, the cost incurred in terms of effort to thrive, and the extent to which discourse tends to be channeled in their favor.

At the very least, a clear brand and value proposition relies on message consistency across the many external communications an organization makes, be they deliberate marketing efforts, training materials, or even HR recruiting information. The terminology used by Recruiters should, for example, be consistent with that of Sales and Training Materials, and so on. Any one department or group that injects noise will reduce the brand coherence and effectiveness.

Gaining Control

Influence over terminology is not something one can beg, buy, or steal; it can only be attained by thought leadership. In other words, it comes from good knowledge management practices around intellectual expression.

It is determined by who is disseminating authoritative information, who provides attractive ideas, and who is leading in thought value – and who gets to saturate the frame of reference and the concept terrain.

An early step in gaining more control over the influence of language is to formalize usage and to self-consciously construct a lexicon detailing what terms mean and where they are used. This sets the stage for searchable knowledge-bases, single-sourced documentation, and consistent branding.

A low-cost approach is to establish an internal terminology wiki along the lines of Wikipedia, and to build and refine a corporate lexicon in three phases of limited crowdsourcing:

  1. Open invitation to internal staff

  2. Invitation to business partners (and industry luminaries) to contribute

  3. Invitation to customers to contribute

Step 1 requires some preparation to identify people who are influential in terminology and to obtain buy-in from content-owners and domain experts.

Steps 2 and 3 are a Marketing bonanza that yields many spinoff benefits.

Making the terminology visible in this manner is not just a step in protecting against erosion of meaningful terminology but also forms part of a knowledge-management approach to organizational-learning.


If an organization is inconsistent in its use of terminology and language, if it vacillates on meaning and implication, if terminology is used hesitantly and passively – then the information scent attenuates, and the audience becomes uncertain and less likely to agree with the message or see the source as trustworthy or authoritative. In addition it leads to escalating costs and loss of effectiveness in training & development, and significant barriers to cost-effective translation & localization.

To get in a position where you influence the discourse and the frame of reference in your market niche you must settle on a controlled vocabulary, use it strongly, and use it consistently over every part of your products, documentation, and communications.

The place to start is inside the company – to practice, refine, and then deliver.


Two areas I left out, but which deserve mention, are the effects on Content Management and Health & Safety.
Inconsistent terminology can be a significant safety risk, and this is a topic that deserves its own paper.



Argyris C & Schön D (1987). “What is an organization that it may learn”.

Chao G, O’Leary-Kelly A, Wolf S et al. (1994). “Organizational socialization: its content and consequences”. Journal of Applied Psychology 79: pp. 730-749.


Matthew Loxton is a Knowledge Management professional and holds a Master’s degree in Knowledge Management from the University of Canberra. Mr. Loxton has extensive international experience and is currently available as a Knowledge Management consultant or as a permanent employee at an organization that wishes to put knowledge to work.

‘I found it on the Internet’ : The use of internet search engines to retrieve information.

May 8, 2010

Search engines have dramatically altered the information landscape over the last two decades, and have provided information ecosystems for many categories of information users. These ecosystems did not previously exist, and they now empower users and give them access and range that was previously only theoretical.
WiFi access and the use of handheld devices to access the web “anywhere, anytime” have made the web a ubiquitous information resource for the layperson, at the same time that the increased power of advanced web-spidering and search engines makes both precision and power user-malleable.

However, many of the information resources on the Internet are invisible on the web because they are not spidered by the commonly used search-engines, which creates a divide between what is available to the general public and what is reachable by the academic researcher.


We live in an era of both unprecedented ubiquity of man-made information sources and an immediacy that has not existed before. In his book “Cosmos”, Carl Sagan puts the size of the collection held at the library of Alexandria at as many as a million scrolls. In comparison, the website “” gives the number of pages indexed on the Internet as 19.86 billion (Saturday, 13 March, 2010)[i].
Clearly we have reached a degree of information availability that beggars previous collections.

However, this has come with some challenges, because the technology itself prioritizes technical abilities over purely literacy aspects. In a very real sense the Internet can be seen as occupied by a “special club” with a small membership of “geeks” (Morville 2005) who have privileged access to information by virtue of special knowledge, devices, and information techniques.

Access to internet resources has an entry-bar set by technology in terms of computer hardware and software, but also by special heuristic techniques (Effken, Brewer et al. 2003).

The economic force unlocked by the linking of advertising with provision of free-to-use web-based search engines led to the so-called “Search-engine wars” in which vendors apply a range of different tactics to woo the public user whilst competing for subscribers. This drives not only the functionality offered by vendors, but also the range of searchable categories of informational artifacts. It additionally leads to some vendor specialization, such as concept-search like Kartoo[ii] and meta-search engines like Dogpile and those dealing with specific media such as YouTube.

“Apart from standard web search, search engines offer other search services such as image search, news search, mp3 music search and product price search. The current search engine wars will mean that there will be fierce competition between search engines to lure users to use their services, which is good news for the consumer of search, at least in the short term.” (Levene 2006)

This is not to say that the results of search-engines cannot be manipulated or “gamed”, both by the people or organizations acting as information sources and by third parties who may wish to influence the behavior of search engines. The term “Google-bombing” reflects an aspect of this practice.

The user thus needs to be aware that some participants may “game” the system and manipulate search-engines to artificially raise the search ranking of a specific site or page (Poremsky 2004).

In order to combat this practice, and to stay as competitive as possible, the vendors constantly tune their search-engines. The user should bear in mind that the algorithms and techniques used by search-engine vendors are trade secrets and subject to change, and that specific sites may be systematically or even deliberately selected or de-selected based on somewhat inscrutable rules. Web sites may also trigger anti-gaming algorithms designed to detect attempts to manipulate the search-engines and be removed from the result set entirely (the so-called “Google death-penalty”), after which they would be entirely unknown to the user (Levene 2006).

This thrust-and-parry relationship between the information suppliers and the search-engine vendors has given rise to an industry supplying various tricks and techniques to safely influence the visibility and palatability of information to search-engines (Kent 2004), and has spawned guide-books for webmasters (Reynolds 2004) that cover every imaginable aspect of “Findability” (Morville and Rosenfeld 2006).

Information abounds on topics ranging from “Search-zones” to the need for and creation of thesauri to catch misspellings or alternative and preferred terms (Poremsky 2004).
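The thesaurus idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PREFERRED` table and the `normalize_query` function are hypothetical, and the fuzzy matching uses the standard library's `difflib.get_close_matches` rather than whatever a real search product would use.

```python
import difflib

# Hypothetical thesaurus: each known variant points at its preferred term.
PREFERRED = {
    "folksonomy": "folksonomy",
    "folk taxonomy": "folksonomy",
    "social tagging": "folksonomy",
    "search engine": "search engine",
    "search-engine": "search engine",
}


def normalize_query(query, cutoff=0.8):
    """Map a query to its preferred term, tolerating small misspellings."""
    key = query.strip().lower()
    if key in PREFERRED:
        return PREFERRED[key]
    # Fall back to the most similar known variant, if it is similar enough.
    close = difflib.get_close_matches(key, list(PREFERRED), n=1, cutoff=cutoff)
    return PREFERRED[close[0]] if close else None


print(normalize_query("Folk Taxonomy"))  # alternative term
print(normalize_query("folksonmy"))      # misspelling
print(normalize_query("zebra"))          # unknown term
```

The `cutoff` value is a judgment call: set it too low and unrelated terms collapse together, set it too high and ordinary typos slip through.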

These have been so successful that they have created a further challenge to the researcher or user, one that has been aptly termed “infoglut” (Herhold 2004). That is, the result-set of a query is so overwhelmingly large that a manageable handful of appropriate texts is often not what is retrieved, but rather a result set that is simply too large to handle as it approaches several thousand or even millions of texts.

As Herhold (2004) puts it:

“The implication for the design of retrieval languages is that disambiguation is a serious and very large problem. It is the homonym problem writ large, writ in the extended sense of including polysemy and contextual meaning, that is the chief cause of precision failures – i.e., infoglut – in retrieval.”

Various stratagems and approaches to infoglut from the information provider’s point of view have been suggested, ranging from clever use of information-mapping (Kim, Suh et al. 2003), to the creation of portals (Firestone and McElroy 2003) in which relevancy is driven by proximity to the user, measured in mouse-clicks[iii].
On the user end of the equation there are also guides for users and researchers, including the use of subscription-databases and intelligent agents (Foo and Hepworth 2000).

A very large result-set obviously challenges the information-processing capacity of the user, but also calls into question the heuristic technique used, bringing to light two distinct elements that bear attention, namely the precision of a query result and its recall (Herhold 2004, Pao 1989).

Precision : The proportion of retrieved documents which are also relevant. A low precision implies that most of the documents retrieved were not relevant, and were thus info-junk.

Recall : The proportion of all relevant documents that were found and retrieved. A low recall indicates that the query was ineffective at finding the universe of all documents that are relevant, and also speaks to the phenomenon of the “invisible web” that is not targeted by search-engines (Smith 2001).
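To make the two measures concrete, here is a small worked example in Python. It is a sketch only: the document identifiers are made up, and `precision_recall` is a hypothetical helper that applies the two definitions above to a single query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set overlap."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: eight documents come back, but only four documents
# in the whole collection are actually relevant, and two of those were missed.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = ["d2", "d5", "d9", "d10"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.25 recall=0.50
```

Here three quarters of what was retrieved is noise (low precision), and half of what mattered was never found (middling recall). Documents such as `d9` and `d10` stand in for material that a query cannot reach at all.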

This invisible or “deep web” is hidden from view primarily because its information sources are not amenable to discovery by the typical search-engines that trawl the “surface web” (Henninger 2003). It is often estimated to be orders of magnitude bigger than the total available for search, with the “deep web” being 400-550 times bigger than the surface (Bergman 2007).

Smith (2001) explains this in terms of linking and permanence:

”Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, the page must be static and linked to other pages. Traditional search engines can not “see” or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.” (Smith 2001)

Part of dealing with these different aspects of information retrieval is to deliberately adopt a technique or heuristic for searching.


The dilemma of needing terms and knowledge to find information, while needing access to usable information in order to know which terms to use, is approached in a discursive browse-search-browse pattern reminiscent of how people search for food. Heuristics is the partially formalized approach to the employment of various information-stratagems.
According to Spink & Cole, it is likely that human information-seeking behavior is an evolutionary correlate of other, older foraging patterns (Herhold 2004), and thus not just an individualistic behavior but a deeply social one.

Examples of how a user (or provider) of information can approximate these patterns include social-bookmarking (Hammond, Hannay et al. 2005), and tagging.

These drive a social taxonomy that makes searching and finding on the web a more ergonomically human activity, both through the social aspect of observing what other people tag and through the ability to create information-paths through a folk-taxonomy or “folksonomy” (Mathes 2004, Porter 2005).

A similar approach is being adopted by many retailers on the web, where finding an item often results in a list of other items that users who bought the item under view “also bought”. E-stores such as Amazon or Barnes & Noble are thus able to guide purchases with collaborative filtering using patterns of other users (Anderson 2004). This has ramifications for the business user who might wish to know what their peers are looking at.
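The “also bought” idea can be illustrated with simple co-occurrence counting, which is about the most basic form of collaborative filtering. The Python sketch below is an assumption-laden toy and not how any particular retailer does it: the baskets are invented, and `also_bought` is a hypothetical helper.

```python
from collections import Counter


def also_bought(baskets, item, top_n=3):
    """Rank the items that most often share a basket with `item`."""
    co_counts = Counter()
    for basket in baskets:
        unique = set(basket)  # ignore repeated purchases within one basket
        if item in unique:
            co_counts.update(unique - {item})
    # Sort by count descending, then by name so that ties are stable.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["atlas", "compass", "tent"],
    ["atlas", "compass"],
    ["atlas", "map"],
    ["tent", "map"],
]

print(also_bought(baskets, "atlas"))  # ['compass', 'map', 'tent']
```

The same counting works for search behavior: swap baskets for sessions and items for queries or bookmarked pages, and the output suggests what peers looked at alongside a given item.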

Information tools accessible on the web that cater for the social aspects of information-seeking have been made available both by entrepreneurial groups, such as Yahoo with their tagging tool, and by scientifically orthodox publications, such as the journal Nature with their freeware tool and site Connotea.

Folksonomy is therefore an applicable tool for the business researcher as well as the general public.

The ability to identify information quality is a further dimension, since the quality of information involves inter alia “the properties of accuracy, precision, credibility, currency, pertinence, precision, relevance, reliability, simplicity and validity.” (Evernden and Evernden 2003)

Information quality tends to deteriorate over time (Evernden and Evernden 2003), which is problematic in any collection where the architecture does not require the dating of items. It is important for the seeker to treat the presence or absence of dates as a guide to the trustworthiness of a collection.

A further available heuristic tactic is to use humans as search catalysts in a more direct and old-fashioned manner. Many library services provide library research assistants who are skilled and studied in taxonomies and search techniques, and are able to provide suggestions for search strings and databases.[iv]

For the seeker, parts of this invisible web are exposed via academic and research search tools operating on organizational or subscription collections, some of which are accessible through citation-manager software such as EndNote[1] that has search and connection tools.

The Future

Crystal balls have proven notoriously inaccurate in seeing into the future with regard to the Internet, and probably the best I can manage is to say that things will get bigger but more user-friendly, and that the social-bookmarking trends will continue. The drive towards “web 2.0” Social Networking, “web 3.0” semantic-web technologies, and contextual search tools will doubtless shape user-interface design, make more things and more kinds of things available, and continue to make available texts and artifacts previously only available in hardcopy media.

Information architecture is likely to become increasingly important as collections increase in diversity and size (Morville and Rosenfeld 2006, Batley 2007).

Privacy is also likely to become increasingly important as Internet tools make it easier to identify users purely from the search queries they use. This was made clear when an AOL user was identified purely through her use of search terms[2] (Barbaro and Zeller 2006). The user assumption that web activity is anonymous is unwarranted, and has implications for researchers whose subject-matter might be politically or socially controversial or disclose their business intent. There are thus serious privacy concerns with regard to search-engines (Cohen 2005).


  1. Anderson, C. (2004). “The Long Tail.” Wired Magazine 12(10).
  2. Barbaro, M. and T. Zeller (2006). A Face Is Exposed for AOL Searcher No. 4417749. New York Times. New York.
  3. Batley, S. (2007). Information architecture for information professionals. Oxford, Chandos.
  4. Bergman, M. (2007). “The Deep Web: Surfacing Hidden Value.” Journal of Electronic Publishing.
  5. Cohen, A. (2005). What Google Should Roll Out Next: A Privacy Upgrade. New York Times. New York.
  6. du Preez, M. (2002). “Indexing on the Internet.” MOUSAION 20(1): 109-122.
  7. Effken, J. A., B. B. Brewer, et al. (2003). “Using computational modeling to transform nursing data into actionable information.” Journal of Biomedical Informatics 36(4-5): 351-361.
  8. Evernden, R. and E. Evernden (2003). Information First: Integrating Knowledge and Information Architecture for Business Advantage. Oxford, Butterworth-Heinemann: 1-27.
  9. Firestone, J. M. and M. W. McElroy (2003). Key issues in the new knowledge management. Burlington MA, Elsevier Science.
  10. Foo, S. and M. Hepworth (2000). The implementation of an electronic survey tool to help determine the information needs of a knowledge-based organization.
  11. Hammond, T., T. Hannay, et al. (2005). “Social bookmarking tools (I): A general review.” D-Lib Magazine 11(4).
  12. Henninger, M. (2003). Searching Digital Sources. The Hidden Web: Finding quality information on the net. Sydney, Australia, UNSW Press.
  13. Herhold, K. (2004). “The Philosophy of Information.” Library Trends 52(3): 373-665.
  14. Kent, P. (2004). Surveying the Search Engine Landscape. Search Engine Optimisation for Dummies, Wiley.
  15. Kim, S., E. Suh, et al. (2003). “Building the knowledge map: an industrial case study.” Journal of Knowledge Management 7(2): 34-45.
  16. Levene, M. (2006). Navigating the Web. An Introduction to Search Engines and Web Navigation. London, Addison Wesley: 174-184.
  17. Loxton, M. H. (2003). “Patient Education: The Nurse as Source of Actionable Information.” Topics in Advanced Practice Nursing eJournal 3.
  18. Mathes, A. (2004). Folksonomies: Cooperative Classification and Communication through Shared Metadata.
  19. Morville, P. (2005). The Sociosemantic Web. In Ambient Findability. CA, O’Reilly.
  20. Morville, P. and L. Rosenfeld (2006). Information Architecture for the World Wide Web. California, O’Reilly Media.
  21. Morville, P. and L. Rosenfeld (2006). Push and Pull. Information Architecture for the World Wide Web. S. St.Laurent. California, O’Reilly Media.
  22. O’Reilly, T. (2005). “What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software.” Retrieved 28 August, 2007, from
  23. Pao, M. (1989). Information Retrieval.
  24. Poremsky, D. (2004). Search Engines and How they Work. In Google and Other Search Engines. Berkeley, CA, Peachpit Press: 3-18.
  25. Porter, J. (2005). “Folksonomies: A User-Driven Approach to Organising Content.” User Interface Engineering. Retrieved September 6, 2007, from
  26. Reynolds, J. (2004). Search Engines and Directories. The Complete E-Commerce Book, CMPBooks: 233-247.
  27. Smith, B. (2001). Getting to know the Invisible Web. Library Journal.Com.


[1] EndNote is provided by the Thomson-Reuters group. See

[2] The release of AOL search strings allowed a researcher to quickly identify a Mrs. Thelma Arnold, even though she was identified only as “searcher #4417749”.

[i] Which is really curious, because on 21st September 2008 it said there were 27.61 billion pages. Did the web shrink or is the tool a bit buggy?

[ii] Sadly defunct now

[iii] Using “mouse-click” distance as a measure is a very effective way to put information at hand

[iv] Many libraries staff a 24×7 online helpdesk to guide patrons in finding materials. Most of these are staff pooled across many institutions and locations.


Matthew Loxton is the director of Knowledge Management & Change Management at Mincom, and blogs on Knowledge Management. Matthew’s LinkedIn profile is on the web, and has an aggregation website at
Opinions are the author’s and not necessarily shared by Mincom, but they should be.

Information Overload, or just poorly designed information?

February 4, 2010

 We have all heard the plea about how modernity has deluged us with information, and have read the articles about how today we are bombarded with more information than ever before.
It evokes images of a simpler time, a time where information arrived at a leisurely pace.

I think that the image is dead wrong.

For one thing, it ignores what and who we are, and where we came from – and only counts artificial information as information.
Picture yourself a hundred thousand years ago standing on the African savannah – no TV to be sure, no billboards, no TV Evangelists, no Viagra ads, no neon signs, no spam email, and no junk in your mailbox.
The bush is dead quiet, nothing stirring, all is peaceful – right?

Well no, not even close.

There would be a cacophony of sounds, movement all around, smells in abundance, and sensations flooding in from every square centimetre of your skin. Light and shadow, heat from the sun, the breeze, things buzzing, flying, crawling, hooting, rustling, creeping, galloping, even things landing on you and crawling on you, all mingled in an absolute deluge of sensory information.
A brain the size of an orange would probably process all that with ease, and we have a gigantic brain*

We are kitted out with information processing equipment that makes the largest computer look like a wobbly abacus with a few strings missing – in around 2000 the comparison was that a single human brain had the equivalent processing power of all the world’s computers put together. The most complex thing in the universe, and an organ that eats up the lion’s share of energy in your body.

So what’s the deal with “Infoglut” and “Information Overload” ?

My argument is that the problem is neither the amount of information nor even the rate of change (a person living in the Amazonian jungle gets information change at a far higher rate than a stockbroker), but rather an issue of fit, or Informational Ergonomics.
Information that fits our evolved processing capabilities is handled with consummate ease, but information that poorly matches our innate processing profile is a problem.
The time and effort required to decode and assimilate a poorly designed chunk of information is a problem.

Present us with loads of badly composed artificial information and we quickly saturate and our performance degrades steeply – and we exhibit all the natural responses: irritation, anger, stress, avoidance, etc.
The answer to infoglut isn’t to have less information, it is to have better information, where “better” means “information crafted to fit the hand”. The task is not to reduce information as such, but to limit the amount of information that carries high and unnecessary decoding and processing costs.

Information Ergonomics, not information reduction! – make the information fit the human, not the other way around.

That is my story, and I am sticking to it!


*Our brains are way bigger than they should be given our body size, and there are strong arguments that the cause of the oversize brain is the complexity of social signalling and decoding and tracking social interactions with other humans doing exactly the same thing.
