Introducing Supertags: Advanced Web Taxonomy
Tags are clearly a huge part of the web. Whenever you want to keep track of something, you tag it: URLs, books on Amazon, music on last.fm; the list goes on and on and on. Tag clouds are a common sight on blogs and feed readers. While this kind of taxonomy is useful in and of itself, it's even better when you aggregate items that share tags and find related concepts. These relationships expand your field of inquiry, surface items with tags you wouldn't have thought to look for, and so on.
How can we take this concept further?
I posit that the usual relationship between tags (two tags appearing on the same item) is not the only relationship users want to use when looking for data. I am currently working on a project to tag news stories. Some of them have titles like those on your average Reddit front page:
- D development with Emacs
- Using Amazon S3 with Django
- Why Arc is bad for exploratory programming
The first two items are easy to categorize with plain tags. The first can be tagged d and emacs. If you were trying to find it on del.icio.us, you'd search for d+emacs, and the site would be smart enough to understand you're looking for an item tagged with those two separate tags. The same goes for the second item: amazons3+django would work well for finding it.
The third item, however, is where more robust tagging could come into play. Sure, you could tag it arc+exploratoryprogramming, but that's not really the relationship being expressed. The article isn't about Arc and exploratory programming; it's about how Arc is not good for exploratory programming. Please note I haven't actually read these items, and for the purposes of this article I don't particularly care about the theses therein :) So, in the opinion of the writer, Arc and exploratory programming have a contrasting relationship. In other words: arc!exploratoryprogramming.
See where this is going?
Say you have an article:
- Why Ruby is better than Python
In the opinion of the article writer, ruby>python.
Or:
- Windows 7 coming soon after Vista
Chronologically, the order of the related subjects is windowsvista~windows7.
Already, just by looking at basic examples, we've extracted a vocabulary we can apply to tagging: ! for opposition, > for preference, and ~ for chronological order. I call these new juxtapositions supertags, because I'm an uncreative shell of mediocrity.
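To make the vocabulary concrete, here is a minimal Python sketch of reading a supertag back into its parts. The relation names (opposes, better_than, precedes) and the Supertag and parse_supertag names are hypothetical, chosen for illustration. The sketch assumes a supertag holds exactly one operator character from !, > and ~, so a tag like c++>c still splits cleanly because + is not treated as an operator.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for the three operators used in this post.
OPERATORS = {
    "!": "opposes",      # arc!exploratoryprogramming
    ">": "better_than",  # ruby>python
    "~": "precedes",     # windowsvista~windows7
}


@dataclass(frozen=True)
class Supertag:
    left: str
    relation: str
    right: str


def parse_supertag(tag: str) -> Optional[Supertag]:
    """Split 'left<op>right' at the first operator character.
    Returns None for plain tags (or malformed ones with an empty side)."""
    for i, ch in enumerate(tag):
        if ch in OPERATORS:
            left, right = tag[:i], tag[i + 1:]
            if left and right:
                return Supertag(left, OPERATORS[ch], right)
            return None
    return None
```

Plain tags such as emacs come back as None, so ordinary tags and supertags can share the same tag field.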
So how can we use these new tags? Aggregation. Imagine looking up a tag like modernenglish and seeing, in your browser, a timeline of everything that came before and after it, because somebody tagged articles with greatvowelshift~modernenglish and modernenglish~ebonics. And I bet you didn't even know what the Great Vowel Shift was!
Imagine getting an overview of who on the internet thinks which programming language is better, not through Google search results or Alexa rankings, but by seeing the relative strength of the links c++>c and c>c++. The more often one of them appears, the heavier its weight and the more strongly that idea is reinforced. Clearly it's not an objective measurement, but when you're looking at trends, it's hard to say what is objective :-)
The best part is that supertags open up an effectively unlimited number of ways to categorize concepts. atheism may be opposed to christianity (atheism!christianity), but it's opposed in the same way to rastafarianism, to angels, and to the afterlife, and each of these tags is related to millions of others in millions of different ways. It's said you should know your enemies as well as you know yourself, and being able to see at a glance which concepts are diametrically opposed to yours will help you better understand those who don't share your views.
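The timeline idea amounts to a graph traversal: each a~b supertag is a directed edge from the earlier concept to the later one. The sketch below assumes the ~ pairs have already been extracted as (earlier, later) tuples. It walks the edges in both directions from the tag you looked up, then orders the results with graphlib.TopologicalSorter from the Python standard library (3.9+). The timeline function is a hypothetical name.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def timeline(pairs, focus):
    """Return every concept linked to `focus` through chains of a~b supertags,
    ordered from earliest to latest. `pairs` holds (earlier, later) tuples."""
    predecessors = defaultdict(set)  # later -> {earlier, ...}
    successors = defaultdict(set)    # earlier -> {later, ...}
    for earlier, later in pairs:
        predecessors[later].add(earlier)
        successors[earlier].add(later)

    def reachable(start, edges):
        # Depth-first walk collecting every node reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    related = reachable(focus, predecessors) | reachable(focus, successors) | {focus}
    # graphlib wants node -> predecessors; static_order() yields earlier nodes first
    # and raises graphlib.CycleError if the supertags contradict each other.
    graph = {node: predecessors[node] for node in related}
    return [node for node in TopologicalSorter(graph).static_order() if node in related]
```

If two users disagree (a~b on one article, b~a on another), the graph contains a cycle and static_order() raises graphlib.CycleError, so a real system would need some policy for resolving such conflicts.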
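One hedged way to read "the strength of the links" is simple counting: tally how many items carry a>b versus b>a and report the share for each side. The sketch assumes supertags arrive as plain strings with at most one > per tag; preference_weights and compare are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Optional


def preference_weights(tags: Iterable[str]) -> Counter:
    """Count how often each ordered 'a>b' supertag appears; other tags are ignored."""
    counts = Counter()
    for tag in tags:
        left, sep, right = tag.partition(">")
        if sep and left and right:
            counts[(left, right)] += 1
    return counts


def compare(counts: Counter, a: str, b: str) -> Optional[float]:
    """Share of a-vs-b votes that favour `a` (0.0 to 1.0), or None if nobody compared them."""
    wins, losses = counts[(a, b)], counts[(b, a)]
    total = wins + losses
    return wins / total if total else None
```

With two c++>c tags and one c>c++, compare(counts, "c++", "c") returns 2/3. Counting raw tags treats every tagger equally; weighting by user or by date would be an obvious refinement.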
That’s where I am, conceptually. Good idea? Bad idea?
I’m not sure if “tags” is the right name for this. You’re defining relationships between tags.
The idea of attaching these relationships to individual articles, however, is a pretty cool one. In a way, it’s suggesting a formalized way to summarize articles.
The big problem I see is that you have a limited vocabulary for expressing those relationships (how many special characters are there, after all?). As a result, you will lose the intuitiveness that tags have.
I am defining the relationships between tags, but what I envision is that those relationships, and the tags on either side, comprise one giant chunk of information that can be applied to an item. Some relationships may also implicitly say that both terms would be applied to the item as regular tags: if an article is about Ruby being better than Python, then the supertag ‘ruby>python’ would not only define that relationship, but apply the tags ‘ruby’ and ‘python’ to the item.
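The expansion described in this reply can be sketched in a few lines. Which operators imply their terms is an assumption here, since the reply says only "some relationships" do: it is passed in as the hypothetical implying_ops parameter and defaults to all three operators from the post. tags_for_item is likewise a hypothetical name.

```python
import re
from typing import FrozenSet, Iterable, Set

# One operator character between two non-empty terms, e.g. ruby>python.
SUPERTAG = re.compile(r"^(?P<left>[^!>~]+)(?P<op>[!>~])(?P<right>[^!>~]+)$")


def tags_for_item(applied: Iterable[str],
                  implying_ops: FrozenSet[str] = frozenset("!>~")) -> Set[str]:
    """Keep every applied tag, and add both terms of any supertag whose
    operator is in `implying_ops` as ordinary tags."""
    result = set()
    for tag in applied:
        result.add(tag)
        match = SUPERTAG.match(tag)
        if match and match["op"] in implying_ops:
            result.update((match["left"], match["right"]))
    return result
```

So an item tagged ruby>python also shows up under plain searches for ruby or python, without anyone having to add those tags by hand.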
To me this seems like a narrow subset of concepts related to work being done on the semantic web. An interesting topic with a lot of potential for sure.
Have you considered automatic tagging based on the language of the article or item? To me this seems like the most exciting aspect, once the algorithms can be tweaked enough to accurately reflect the content's meaning. You could start with simple tag suggestions, found by looking for common relationship-building clauses.
The more we can do to help machines understand the underlying meaning of our human-generated content, the better. Keep up the good work.
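The clause-matching suggestion can be sketched with a few regular expressions, one per phrasing. The patterns, the slug helper, and suggest_supertags are all hypothetical and cover only the example titles from the post; a real suggester would need far broader language handling.

```python
import re
from typing import List

# Hypothetical title patterns, each paired with a supertag template.
# {0} and {1} are the captured phrases, in the order they appear in the title.
PATTERNS = [
    (re.compile(r"^why (.+?) is better than (.+)$", re.IGNORECASE), "{0}>{1}"),
    (re.compile(r"^why (.+?) is bad for (.+)$", re.IGNORECASE), "{0}!{1}"),
    (re.compile(r"^(.+?) coming (?:soon )?after (.+)$", re.IGNORECASE), "{1}~{0}"),
]


def slug(phrase: str) -> str:
    """Lowercase and drop everything except letters, digits and '+', so 'C++' stays 'c++'."""
    return re.sub(r"[^a-z0-9+]", "", phrase.lower())


def suggest_supertags(title: str) -> List[str]:
    """Return a supertag suggestion for every pattern the title matches."""
    suggestions = []
    for pattern, template in PATTERNS:
        match = pattern.match(title.strip())
        if match:
            suggestions.append(template.format(*(slug(g) for g in match.groups())))
    return suggestions
```

The Windows title yields vista~windows7 rather than windowsvista~windows7: the sketch only sees the words in the title, so any normalization of tag names has to happen elsewhere.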