Pursuant to my last post, Michelene Orteza (@freelancelib) asked on Twitter, does tagging work? I replied, work for whom? For what purpose(s)? The answer to that is a lot longer than 140 characters, so I’m glad she blogged it out. My answer is longer than a comment, so I’m blogging it out too.
First, she’s quite right to call me out on the stuffy vs. hip, generational divide angle. I do think there’s an ethos common to both tagging and my generation, but I can’t extend that to any sort of generational-divide argument, not least because I don’t know enough about other people’s generations.
The rest of my reply gets back to my earlier question of “for whom? for what purposes?” And for this, I’m going to talk about three fallacies.
The first fallacy is that the answer to “for whom” is always “academic researchers, who need comprehensive and unimpeachable answers to every question”. When I was researching tagging & controlled vocabularies for a final project in LIS 415, I got the impression that librarians often act on that assumption. That was the vibe I got from Orteza’s post as well (though she is free to correct me!). And, yes, that’s a valid audience and information need, but it is not the only audience or information need.
We organize information for a variety of audiences: academic researchers, yes, but also the general public, or people in a particular organization or who share an interest, or our social circle, or ourselves. The Flickr tag “me” is not so useful when used to search the entirety of Flickr (…unless, I suppose, you are interested in identity and self-presentation). It’s awfully useful when the set of photos is mine, and the audience is myself.
And we organize, and query, information for a variety of purposes. Sometimes we are doing a comprehensive literature review. Sometimes it’s just totally bugging us that we can’t remember the capital of Libya (seriously, I can never remember that (it’s Tripoli)). Sometimes we just want an entry point, a thing to riff off of. Sometimes any fact will do.
Controlled vocabulary is useful for that sort of academic, comprehensive research. Except, of course, when it’s not — for instance, when I was doing that project on tagging. “Tagging” is too new to be an LCSH heading. Tagging is too granular to be in the titles of most relevant books. But you know where you can find “tagging” in the Simmons library catalog? In its LibraryThing for Libraries plugin. Which aggregates user tags.
This, of course, enabled me to deduce that the most relevant LCSH heading was “metadata”, which was too general to be fully applicable, but at least gave me another route into the problem, a way to find some things I wanted to read that weren’t collocated by the “tagging” tag.
And this brings me to my second fallacy, one I think librarians frequently fall victim to: the idea of the perfect single search. I found that readings in intro cataloging and in reference tended to assume that — if we organize just right, and search just right — we will find all the answers to our query in one search. But this is not true. If we need anything more complex than a quick fact, we iterate. We use a variety of tools. And we use them in crosscutting ways: LCSH headings let me find books with relevant tags, which let me find books with relevant LCSH headings…et cetera. No single tool is likely to give us everything we need in a complex search, because no organizational system can anticipate the variety of queries a document might be used to answer. It is, in the end, up to us, and our own idiosyncratic information needs, and our own judgment.
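That ping-ponging between vocabularies can be sketched in a few lines. This is purely illustrative — the titles, headings, and tags below are made-up stand-ins, not real catalog data:

```python
# A minimal sketch of crosscutting search: headings lead to books, those books
# carry tags, the tags lead to more books, which carry further headings.
# All data here is hypothetical.

books = [
    {"title": "Book A", "headings": {"Metadata"}, "tags": {"tagging"}},
    {"title": "Book B", "headings": {"Information organization"},
     "tags": {"tagging", "folksonomy"}},
    {"title": "Book C", "headings": {"Information organization", "Metadata"},
     "tags": set()},
]

def expand(seed_headings, rounds=2):
    """Alternate between heading-based and tag-based retrieval."""
    headings, tags, found = set(seed_headings), set(), set()
    for _ in range(rounds):
        # Headings -> books -> tags.
        for b in books:
            if b["headings"] & headings:
                found.add(b["title"])
                tags |= b["tags"]
        # Tags -> books -> more headings.
        for b in books:
            if b["tags"] & tags:
                found.add(b["title"])
                headings |= b["headings"]
    return found

print(sorted(expand({"Metadata"})))
```

Starting from the single heading “Metadata”, the tag hop pulls in a book that heading alone would have missed — exactly the kind of iteration no one tool delivers by itself.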
I promised you, however, a third fallacy, and that is this: the illusion of guarantees. I think librarians often assume that, because it is based in rules, controlled vocabulary offers guarantees about metadata completeness or correctness, while tagging cannot. (This is half right.) I think Orteza reflects that assumption as well; e.g.,
So posts about, say, the antics of the author’s cat, might be tagged as “Fluffy,” or “lolcats,” or “cats,” and that’s fine for one person’s blog. But multiply that system to include the immensity of the Internet, and then you can see that searching for one of these terms would necessarily exclude the others, even though the subject matter of all the posts might be the same.
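The quoted point can be made concrete with a toy sketch. The posts, tags, and vocabulary mapping below are invented for illustration; the contrast is between an exact-match tag search, which sees “Fluffy”, “lolcats”, and “cats” as unrelated, and a controlled vocabulary, which collapses them into one heading:

```python
# Hypothetical data: three posts about the same subject, tagged differently.
posts = [
    {"title": "Fluffy knocks over a vase", "tags": {"Fluffy"}},
    {"title": "I can has cheezburger",     "tags": {"lolcats"}},
    {"title": "Why cats purr",             "tags": {"cats"}},
]

def search_by_tag(posts, tag):
    """Exact-match tag search: finds only posts carrying this literal tag."""
    return [p["title"] for p in posts if tag in p["tags"]]

# A made-up controlled vocabulary mapping synonymous tags to one heading.
vocabulary = {"fluffy": "Cats", "lolcats": "Cats", "cats": "Cats"}

def search_by_heading(posts, heading):
    """Vocabulary-mediated search: maps each tag to its heading first."""
    return [p["title"] for p in posts
            if any(vocabulary.get(t.lower()) == heading for t in p["tags"])]

print(search_by_tag(posts, "cats"))      # one post
print(search_by_heading(posts, "Cats"))  # all three posts
```

The catch, of course, is that the `vocabulary` dict has to be built and maintained by someone — which is where the rules, and the next point about what rules do and don’t guarantee, come in.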
Yet: rules do not offer guarantees. Cataloging rules can be inadequately formulated or incorrectly applied. They can end up as poor fits for reality as it changes. The absolutism of rules feels like the absolutism of guarantees, but they are not the same.
Tagging offers no guarantees. It’s up-front about that. Your “Fluffy” can indeed be my “cat”. But I think people in general — not just librarians — underrate the emergent order that can come out of these messy systems. (And the more homogeneous a group is in demography or purpose, the truer this is.) The order may not be guaranteed, but it seems probabilistically likely. I can’t count on it the way I can count on gravity, but, at least in some circumstances, I can count on it enough. Just as with controlled vocabulary, if certain assumptions hold (the topic is adequately mainstream, it’s been around long enough to have a subject heading, it is not too technical to have been properly identified by catalogers, it constitutes a large fraction of the book…), I can count on it enough.
I confess, this isn’t a thorough answer to Orteza’s question. She wanted studies addressing the empirical effectiveness of tagging. Quite rightly — there should be more empirical studies of just about everything, as far as I’m concerned. But I don’t seem to have annotated my bibliography for that project adequately, so I can’t find any good studies. Some really interesting blog posts, but not an empirical study. (Or, rather, the only one I can think of had such terrible methodology I won’t dignify it with a citation.) So I fling myself upon the readership here: can anyone suggest any good empirical studies addressing the effectiveness of tagging and controlled vocabularies?