Reflections on Tagging Part I

April 17, 2010

Online tagging is a relatively new form of classification in which users apply their own terms to online or local electronic texts, objects, or representations. Various authors regard this as a de novo phenomenon that will replace formal or canonical classification systems, but it is more plausible to regard social classification as an adjunct to classical taxonomical systems rather than a replacement for them.

The term folk taxonomy or folksonomy refers to the user-created taxonomy resulting from the “worn-path” of actual usage.


Humans are natural taxonomising machines, selectively acquiring information according to needs and desires, and classifying information and objects into categories that are learned, created, and perhaps even innate in the case of language syntax (Chomsky, 1957).

The advent of the web has enabled “living resources” as part of virtual communities built on mutual interest (Hammond 2005: 4). These resources, unlike traditional libraries, allow classification to be independent of the information collection itself.

Most of us (those older than 20) can remember a time when information collections were most typically located in brick-and-mortar libraries, in which books, fiche, and other information objects and artifacts were stored in fixed hierarchies, themselves established in physical and canonically arranged index card systems.

These library classification systems, such as Dewey Decimal[1], MARC[2], and UDC[3], require professional training and expertise to implement and maintain, and librarians form a corpus whose cadres are often represented by official bodies such as the American Library Association, which claims over 65,000 members.[4]

These systems, however, often prove unwieldy for the average person, who may have only a passing knowledge of any particular classification system and may also confront an artifact that poorly fits any classification system with which they are familiar.

This tension results in part from:

  1. Lack of power and specificity in the classification system itself
  2. Multiple possible classification elements in a single information object
  3. Unfamiliarity by the user with the available formal classification systems


In the first case, a person may encounter a situation in which the power and range of the system poorly cover the target object. For example, the Dewey System is manifestly European in design and allows few index ranges for non-Anglophone and non-European subjects. It is thus predictable that some artifacts of different cultures and languages may not easily find a suitable classification (Mansor, 2007).

In the second case, a single informational object may be classifiable under several distinct and potentially dynamic classifications. A book or fiche may be relatively static, but a person as an information object will not be. A person may now be middle-aged, have short hair, be brunette, and like whiskey. They may like cricket, be fit, and have aspirations suitable for a specific age, but these things will not always have been so, and will change again. They may go grey, become old, lose some preferences, and gain or even regain others. So where does a person classify themselves? How do we classify objects that may be fluid, or that even metamorphose over time? Even with static objects, classification can be difficult to secure. In many instances post-modern art and literature have sorely tested the classification power of existing systems, and seemed to delight in producing this exact tension.[5]

In the third case, even though there may be a large number of librarians, the population of information users greatly exceeds the subset of trained taxonomical professionals, to the point where the probability that a given user can classify something correctly according to any of the three systems listed above is exceedingly small.

This leaves us with the goal of finding “A user-driven approach to organizing content” (Porter, 2005), perhaps through the advent of vast numbers of online users and the enormous power of the web to index specific physical objects through hypertext links, texts, and images. It may thus not be necessary for me to physically describe “Equivalent VIII”, since I can point to an authoritative reference to it at the Tate Gallery itself.[6]

I could also make use of “social bookmarking” to draw the reader to it and, more importantly, to lead other seekers who made related searches to it.

The power of web browsers to locate shared informational resources, as envisaged by Berners-Lee[7], was unfortunately not matched by their ability to organize URLs once they had been saved. Browser bookmarks have traditionally followed a simple canonical folder structure inherited from the early disk operating systems. In this schema the user can arrange the hierarchy and name the folders, but folders are ill-suited to objects that have multiple possible or actual classifications. They thus retain the discomfort of the second point above, and they also leave the user to invent a classification system of their own without the benefit of a professional librarian to help.

How then to classify Equivalent VIII?

Enter “Mob indexing” (Morville 2005:134).

What if we made use of human computing and allowed the sheer mass of users to produce a statistically emergent set of classifications? Would large numbers of users settle on a stable structure without any overt discussion between them?

Thomas Vander Wal refers to a “user-created bottom-up categorical structure development with an emergent thesaurus” as a “Folksonomy” (Morville 2005:136), in which we can use the discoveries made by other humans essentially as a cybernetic resource. By revealing the road-markers left by other people who searched for something, one can browse the surviving troves of interconnected information links that those people created. By seeing and browsing the terms they used to identify online information, we gain ready classifications left by large numbers of other users.

We might further view folksonomies as a “Web 2.0” phenomenon (O’Reilly, 2005) in which the “Wisdom of Crowds” (O’Reilly, 2005:7) and their massed tagging decisions lead to emergent taxonomical structures. The “trodden path” thus reveals ideal informational ergonomics that even expert-designed canonical forms may be unable to predict or represent. In this regard, Shirky posits that folksonomies are necessary because of the difficulty of applying controlled vocabularies at the level of individual and informal users of information (Morville 2005:135).

Thus we can pave the “desire lines” to achieve controlled vocabularies of optimal utility (Merholz, 2004) by using the tags of millions of online users.

Does this “tag soup” (Hammond 2005:4) lead, however, to a chaotic situation in which users overwhelm meaning and structure by posting millions of ambiguous tags? It is quite possible, after all, that taggers will use the same term for different things, and different terms for the same thing.

Golder reports that tag frequencies achieve stability rather than become chaotic, and that relative stasis is achieved at fewer than 100 bookmarks (Golder, undated), suggesting that in reality the “soup” becomes more congealed than liquefied.
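One way to picture what “stability” means here is to track each tag's share of all tag assignments as bookmarks accumulate, and to watch how much those shares move between snapshots. The following is a minimal sketch under stated assumptions, not Golder's method: the function names and the sample bookmarks are hypothetical, and the snapshot interval of two bookmarks is arbitrary.

```python
from collections import Counter


def tag_proportions(bookmarks):
    """Return each tag's share of all tag assignments across the bookmarks."""
    counts = Counter(tag for tags in bookmarks for tag in set(tags))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def proportion_shift(before, after):
    """Largest absolute change in any single tag's share between two snapshots."""
    all_tags = set(before) | set(after)
    return max(
        (abs(after.get(t, 0.0) - before.get(t, 0.0)) for t in all_tags),
        default=0.0,
    )


# Hypothetical bookmarks for one URL, in the order they were saved.
bookmarks = [
    ["art"], ["bricks"], ["art", "sculpture"], ["art"], ["sculpture"],
    ["art", "bricks"], ["art"], ["art", "sculpture"], ["bricks"],
    ["art", "sculpture"],
]

# Compare consecutive snapshots taken every two bookmarks.
for size in range(4, len(bookmarks) + 1, 2):
    earlier = tag_proportions(bookmarks[:size - 2])
    later = tag_proportions(bookmarks[:size])
    print(size, round(proportion_shift(earlier, later), 3))
```

Part of any settling in such a measure is simple arithmetic, since each new bookmark is a smaller fraction of a growing total. The interesting empirical question, which only real tagging data can answer, is whether the shares settle around a consistent ordering of tags.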

Sifry posits that folksonomies are successful inter alia because people dislike “rigid taxonomy schemes”, but it is more accurate to say that what people dislike are rigid schemes that poorly match their needs. As studies have shown, people greatly prefer reduced choice, as long as the options are simple, clear, and offer what they actually prefer (Schwartz 2005, Godin 2003, Gilbert 2004, Gladwell 2004).

The key is thus to create formal hierarchies by deriving them from the “well worn path” and “desire lines” of actual unconstrained choices through the use of tagging.
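As an illustration of how a hierarchy might be derived from unconstrained tags, the sketch below uses a simple co-occurrence heuristic: one tag is treated as broader than another when nearly every bookmark carrying the narrower tag also carries the broader one, and the broader tag is used more often overall. This heuristic is an assumption introduced here for illustration, not something the cited authors propose, and the function name, thresholds, and sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def broader_tags(bookmarks, threshold=0.8, min_support=2):
    """Return sorted (broader, narrower) pairs from tag co-occurrence.

    A pair is kept when at least `threshold` of the bookmarks tagged
    `narrower` also carry `broader`, `broader` is used strictly more
    often, and `narrower` appears on at least `min_support` bookmarks.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in bookmarks:
        unique = set(tags)
        tag_count.update(unique)
        pair_count.update(permutations(unique, 2))

    pairs = []
    for (broad, narrow), together in pair_count.items():
        if tag_count[narrow] < min_support:
            continue  # too rare to say anything reliable
        if tag_count[broad] <= tag_count[narrow]:
            continue  # a broader tag should be the more common one
        if together / tag_count[narrow] >= threshold:
            pairs.append((broad, narrow))
    return sorted(pairs)


# Hypothetical tag sets applied by five users.
sample = [
    {"art", "sculpture", "bricks"},
    {"art", "sculpture"},
    {"art", "painting"},
    {"art", "painting"},
    {"art", "sculpture", "minimalism"},
]

for broad, narrow in broader_tags(sample):
    print(f"{broad} > {narrow}")
```

The `min_support` guard matters in practice: a tag used once will trivially co-occur with everything on its single bookmark, so very rare tags say little about structure until more users have applied them.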

In this sense, “tagging” places the structure of classification outside the location of the data or information itself. Potentially, in the same way that relational databases were a breakthrough in the dynamic organization of data, tags may form the indices of an external, user-defined canonical structure, or may simply be browsed, explored, and linked to by other users.
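The idea of classification living outside the data can be sketched as a small two-way index, in which a resource is only ever referred to by an identifier and may sit under any number of tags at once. This is a minimal illustration, not a description of any particular bookmarking service; the class name, method names, and resource identifiers are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Keeps tag-to-resource links separate from the resources themselves."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._by_resource = defaultdict(set)

    def tag(self, resource, *tags):
        """Attach one or more tags to a resource identifier."""
        for raw in tags:
            key = raw.strip().lower()  # light normalization only
            self._by_tag[key].add(resource)
            self._by_resource[resource].add(key)

    def resources(self, *tags):
        """Return the resources that carry every one of the given tags."""
        sets = [self._by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def tags(self, resource):
        """Return a copy of the tags attached to a resource."""
        return set(self._by_resource.get(resource, set()))


# Hypothetical identifiers; in practice these might be URLs.
index = TagIndex()
index.tag("equivalent-viii", "sculpture", "bricks", "minimalism")
index.tag("fountain", "sculpture", "readymade")

print(index.resources("sculpture"))
print(index.resources("sculpture", "bricks"))
print(index.tags("fountain"))
```

Unlike a folder hierarchy, nothing here forces a single home for “Equivalent VIII”: it is reachable through each of its tags, and through any combination of them.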

We have thus not so much replaced formal, traditional forms of organizing information as created better, more ergonomic ways to give rise to them. We retain our ability to use structured hierarchies or canonical structures as testable truth claims, but gain a better fit to the ergonomic requirements of information users.

By this process, we also escape the situation in which the “intended and unintended eventual users of the information are disconnected from the process.” (Mathes 2004:3), since they will have become part of the process of taxonomical creation itself. Users give rise to the eventual structure through their acts of information navigation.



While folksonomies are indeed revolutionizing our ability to categorize and classify, particularly in internet-based or online information resources, canonical and traditional taxonomies are unlikely to disappear. The greatest gain from folksonomies is likely to be derivative taxonomies, or classifications resulting from “worn-path” actual behavior of large populations of users with large volumes of transactions. This provides a form of statistical smoothing and actuality-based classification events that will yield the best fit to information classification in its most human-ergonomical representation. As attractive and comfortable as this may be, however, it is unlikely to remove planned or formal taxonomies where these either serve niche functions, or where the ability to make and test truth claims by means of canonical or other formal and hierarchical taxonomies exists. Not only should formal taxonomies exist, but they should be derived from the “well worn paths” of what people actually select when unconstrained but guided in choice.



  1. Chomsky 1957. “Syntactic Structures”. Chomsky, N. Humanities Press, 1957.
  2. Gilbert 2004. “Why are we happy?” TED Talks. Last accessed June 2007.
  3. Gladwell 2004. “Spaghetti sauce”. TED Talks. Last accessed June 2007.
  4. Godin 2003. “Sliced bread”. TED Talks. Last accessed June 2007.
  5. Golder, undated. “The Structure of Collaborative Tagging Systems”.
  6. Hammond 2005. “Social bookmarking tools: A general review”. Hammond, T., Hannay, T., Lund, B. and Scott, J. In D-Lib Magazine, Vol. 11, No. 4.
  7. Mansor 2007. “Library of Congress classification: catalogers’ perceptions of the new Subclass KBP”. Mansor, Y. and Younis al-Shawabikah, Y. In Library Review, Volume 56, Issue 2, Pages 117-126.
  8. Mathes 2004. “Folksonomies: Cooperative Classification and Communication Through Shared Metadata”. Mathes, A. Last accessed 5 September 2007.
  9. Merholz 2004. “Metadata for the Masses”. Last accessed 28 July 2007.
  10. Morville 2005. “The Sociosemantic Web”. In Ambient Findability, Ch. 6. O’Reilly, CA, USA.
  11. O’Reilly 2005. “What is Web 2.0: Design patterns and business models for the next generation of software”. O’Reilly, T. Last accessed 30 August 2007.
  12. Porter 2005. “Folksonomies: A User-Driven Approach to Organising Content”. Porter, J. User Interface Engineering. Last accessed 6 September 2007.
  13. Schwartz 2005. “The paradox of choice”. TED Talks. Last accessed June 2007.


[1] DDC Home, last accessed 14 March 2010

[2] Library of Congress, last accessed 14 March 2010

[3] Universal Decimal Classification, last accessed 6 September 2007

[4] ALA, last accessed 13 August 2007

[5] See for example “Equivalent VIII”, 1966, last accessed 6 September 2007

[6] Equivalent VIII at the Tate Gallery, last accessed 14 March 2010

[7] Tim Berners-Lee biography, last accessed 1 September 2007


Matthew Loxton is the director of Knowledge Management & Change Management at Mincom, and blogs on Knowledge Management. Matthew’s LinkedIn profile is on the web, and has an aggregation website at
Opinions are the author’s and not necessarily shared by Mincom, but they should be.
