Module #9 Documents: languages and properties

Representation of documents through uncontrolled vocabularies is a topic of great interest to me, and it is the subject of my major paper in this class. For this assignment, I will focus on hashtags as a means of information representation and review three articles not covered in my paper: one academic and two from general trade publications. The first article provides an overview of hashtags, their origin, benefits and drawbacks; the second takes an academic look at the global use of hashtags on Twitter as compared to other methods of information organization; and the third takes a very different angle, offering best practices for using hashtags to achieve broad reach in social media marketing campaigns.

The first article I read, “Twitter Tips: How And Why To Use Hashtags (#)” by Lynch in CIO Magazine, provided the overview I had been looking for regarding the origins of hashtag use. It referenced a community of early Twitter adopters that proposed creating “channels” for organizing topic discussions; in that discussion, the pound symbol was offered as a way to denote a user-contributed vocabulary category. Since its first use in 2007, the hashtag (or some form of tagging) has become available across social media platforms including Instagram, Pinterest, Facebook, Tumblr, Google+ and other user communities.

The author points out certain deficiencies or cautions in using hashtags for information dissemination and retrieval. First, as we know, hashtags are largely an uncontrolled vocabulary: there are no strict guidelines or hierarchies that must be followed. What I learned from this article is that there are (were?) some sites and communities that attempted to provide some control through “dictionaries” of hashtag definitions, or through registries that provide usage instructions. However, no attempt was made to reconcile these sites and communities with one another. Note that I said “were?” because the sites the author referenced (Tagalus, HashtagNation) are no longer in use!

Second, hashtags can be overused or misused, creating extra noise in search results. The author suggests that a consequence of overuse and misuse on Twitter is that newcomers to the community are made to feel like exactly that: newcomers. This, the author believes, makes recent participants feel less welcome and more likely to abandon Twitter as an information resource. The author does not discuss the impact on relevance, but one can surmise that if salient information is available on Twitter and a user does not feel welcome or comfortable using the site, that information would be missed.

The second article I read, “How People use Twitter in Different Languages,” by Weerkamp, Carter and Tsagkias, discussed how users writing in different languages organize information on Twitter using hashtags, links, mentions, and conversations, and how usage of each tool varies from language to language. The authors analyzed eight languages used on Twitter.

This article really made me “think.” Before reading it, I admit I assumed that methods of organizing information on Twitter would be similar from language to language and culture to culture. However, in analyzing hashtag usage, for example, the authors discovered that while the average number of hashtags used in each language analyzed ranged between one and two, German-language users are far more likely to use hashtags to organize their tweets (one in four tweets) than users of any other language examined. Conversely, Japanese-language users rarely use hashtags, with only one in 25 Japanese tweets containing hashtags.

To demonstrate the nuances uncovered, I will offer the findings on “conversations” as a point of comparison. The authors categorized “conversations” as direct responses to another tweet. Here, German and Japanese usage was reversed: one in four Japanese-language tweets analyzed was part of a conversation, while 14% of German-language tweets could be categorized this way, the second-lowest percentage of the eight languages analyzed.

While the authors did not discuss the impact on information retrieval or relevance, the findings emphasize what we have learned about controlled vocabularies like LCSH – that cultural nuances must be acknowledged and accommodated for users to be able to retrieve all potentially relevant information.  From this article we learn, for example, that a user researching in the Japanese language would likely miss relevant information if over-reliant on hashtags as a vocabulary for discovery.

The final article, “16 Tips for Using #hashtags HINT: You’re doing it wrong 😛 #socialmedia,” written by Laurel Papworth, a noted social media consultant, is targeted to social media marketers who are looking to exploit hashtags as a means of achieving wider information distribution.

The intended audience’s likely objective in reading this article is to learn how to improve social media marketing ROI. As such, the article is written from the angle of “how to best achieve the broadest reach” when using hashtags. However, it does highlight certain interesting and key qualities of hashtags and their use that can impact precision and recall for information seekers, and thus relevance. I will highlight a few of the points here.

First of all, the author points out that hashtag usage is now so prevalent in web content creation that creating content without tagging it severely hampers one’s reach. Untagged content may be consumed only by those to whom it is directly distributed; its chance of being discovered is diminished, because the information seeker must rely on search engine indexing or some other means of classification.

Also, very specific hashtags are transitory: the author states they rarely “make it into a second or third week.” If an information seeker is looking for all discussions on a topic via hashtags, relevant information will likely be omitted from the results, as a tag that was current at one point may no longer be in use.

Conversely, use of general hashtags can inhibit precision, as a multitude of users who choose the same hashtag can bury relevant information for a researcher.  The author offers the tag #AusPol as an example, stating it “has more than 50 tweets per minute, non stop, day and night.”

The final point that I will highlight here is that the author advises careful selection of hashtags to avoid generic words. As an example, the author states that “#blacklist” can mean many things, but “#Blacklist” is an attempt to reference a particular TV show. From an information retrieval standpoint, this means a user’s search could retrieve a broad range of irrelevant content if the vocabulary chosen to categorize relevant information is too vague.
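
To make the precision and recall points above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the five tweets and their relevance labels are invented (they do not come from any of the articles), the function names are hypothetical, and the search is assumed to match hashtags without regard to capitalization, so “#blacklist” and “#Blacklist” land in the same result set.

```python
import re

# Hypothetical toy corpus, invented for illustration only.
# Each entry is (tweet text, whether it is actually about the TV show).
TWEETS = [
    ("Loved last night's episode #Blacklist", True),
    ("Season finale theories? #blacklist", True),
    ("Our spam filter needs a better #blacklist", False),
    ("Vendors on the procurement #blacklist this quarter", False),
    ("Looking forward to the new season of The Blacklist", True),  # relevant but untagged
]

HASHTAG = re.compile(r"#(\w+)")


def search_by_hashtag(tweets, tag):
    """Return the tweets that carry the tag, ignoring capitalization (an assumption)."""
    wanted = tag.lstrip("#").lower()
    return [
        (text, relevant)
        for text, relevant in tweets
        if wanted in {found.lower() for found in HASHTAG.findall(text)}
    ]


def precision_recall(tweets, retrieved):
    """Precision: share of retrieved tweets that are relevant.
    Recall: share of all relevant tweets that were retrieved."""
    relevant_total = sum(1 for _, relevant in tweets if relevant)
    relevant_retrieved = sum(1 for _, relevant in retrieved if relevant)
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    return precision, recall


results = search_by_hashtag(TWEETS, "#blacklist")
precision, recall = precision_recall(TWEETS, results)
print(f"retrieved={len(results)} precision={precision:.2f} recall={recall:.2f}")
# prints: retrieved=4 precision=0.50 recall=0.67
```

In this toy case, the generic tag pulls in two unrelated tweets (hurting precision), while the relevant tweet that was never tagged is missed entirely (hurting recall), which mirrors the two problems the article describes.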

