A Basic Supertag Implementation

Code Projects, Concepts, Uncategorized | Ardekantur @ 10:52 pm

So, last time I talked about Supertags, but now we’re actually going to see an implementation. This is clearly very rough around the edges, but here we go.

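  class SupertagParser

    # Splits a tag string into relationships and plain tags.
    # Returns [relationships, plain_tags]; each relationship is
    # [tag_a, tag_b, type_id, weight] with the two tags alphabetized.
    def self.parse(s, weight = 1)
      r = []  # relationships
      t = []  # plain tags
      a = s.split(Supertag::Separator)
      a.each do |i|
        # No relationship operator: keep it as a plain tag.
        unless i =~ /[!=~]/
          t << i
          next
        end
        Supertag::Types.each do |k, v|
          # Chronological (~) is skipped here, so a tag like xp~vista is currently dropped.
          if k == :conceptual or k == :oppositional
            # Ignore tags where the operator is missing or is the first character.
            if i.index(v.last) and i.index(v.last) > 0
              tags = i.split(v.last).uniq
              # Sort each ordered pair so a=b and b=a collapse into one relationship.
              tags.perm(2) do |x| r << [x.sort.first, x.sort.last, v.first, weight] end
            end
          end
        end
      end
      return r.uniq, t
    end
  end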
 
  class Supertag
    Types = { :conceptual => [1, "="],
              :oppositional => [2, "!"],
              :chronological => [3, "~"]
              }
    Separator = " "
  end
 
  # Array#perm taken from http://blade.nagaokaut.ac.jp/~sinara/ruby/math/combinatorics/array-perm.rb
  # Author: Shin-ichiro Hara
 
  class Array
    def perm(n = size)
      if size < n or n < 0
      elsif n == 0
        yield([])
      else
        self[1..-1].perm(n - 1) do |x|
          (0...n).each do |i|
            yield(x[0...i] + [first] + x[i..-1])
          end
        end
        self[1..-1].perm(n) do |x|
          yield(x)
        end
      end
    end
  end

Okay, so say we have a hypothetical news item:

Ruby on Rails versus ASP.NET

Someone reads it, and tags it: rubyonrails=asp.net.

>> SupertagParser::parse("rubyonrails=asp.net")
=> [[["asp.net", "rubyonrails", 1, 1]], []]

What we get back is a pair of arrays. The first holds the relationships, one array per relationship: the two tags (alphabetized), the relationship type, and the weight. The second holds any other tags that didn’t receive any kind of relationship.

>> SupertagParser::parse("rubyonrails=asp.net othertag")
=> [[["asp.net", "rubyonrails", 1, 1]], ["othertag"]]

Say you were creating a website in Ruby on Rails that uses supertags. The relationship arrays can just get passed to a Relationship model and saved. If a relationship already exists (notice the parser alphabetizes the two tags to reduce the chance of redundancy), the model can just increase the weight of that relationship in the database.
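Outside of Rails, the save-or-bump-the-weight step might look something like the sketch below. record_relationships is a made-up helper, and a plain Hash stands in for the database table. The tuples are in the shape SupertagParser.parse returns.

```ruby
# Hypothetical stand-in for saving to a Relationship model.
# weights is a Hash keyed by [tag_a, tag_b, type_id]; the value is
# the accumulated weight. A real app would hit the database instead.
def record_relationships(weights, relationships)
  relationships.each do |tag_a, tag_b, type_id, weight|
    # Hash.new(0) makes a missing key start at 0, so a new
    # relationship and an existing one are handled the same way.
    weights[[tag_a, tag_b, type_id]] += weight
  end
  weights
end

weights = Hash.new(0)
record_relationships(weights, [["asp.net", "rubyonrails", 1, 1]])
record_relationships(weights, [["asp.net", "rubyonrails", 1, 1]])
weights[["asp.net", "rubyonrails", 1]]  # the same key was bumped twice
```

Because the parser alphabetizes each pair, rubyonrails=asp.net and asp.net=rubyonrails land on the same key.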

We can handle permutations. Relationship tags don’t have to be limited to two tags. “XFS versus EXT3 versus ZFS” could be tagged xfs=ext3=zfs:

>> SupertagParser::parse("xfs=ext3=zfs")
=> [[["ext3", "xfs", 1, 1], ["xfs", "zfs", 1, 1], ["ext3", "zfs", 1, 1]], []]

We get all the right relationships. This is done by an extension to the Array class I found (Array#perm, credited in the code above).
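For what it’s worth, Ruby 1.8.7 and later ship Array#combination, which hands back each unordered pair exactly once, so the sort-then-uniq step isn’t needed. A sketch of the same pairing, assuming a Ruby that has it (pairs_for is a made-up helper name, not part of the parser above):

```ruby
# Hypothetical helper: every unordered pair of tags, alphabetized,
# using the built-in Array#combination instead of Array#perm.
def pairs_for(tags, type_id, weight = 1)
  tags.uniq.combination(2).map do |pair|
    first, last = pair.sort
    [first, last, type_id, weight]
  end
end

pairs_for("xfs=ext3=zfs".split("="), 1)
# => [["ext3", "xfs", 1, 1], ["xfs", "zfs", 1, 1], ["ext3", "zfs", 1, 1]]
```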

So far I’ve taken a look at conceptual (=), oppositional (!), and chronological (~) relationships, though the parser above only handles the first two. Chronological has to be handled differently because we can’t alphabetize the tags: xp~vista needs to be kept in the correct order.
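One way chronological tags could be handled is to skip the sort and pair each tag with the one that follows it. This is only a sketch: chronological_pairs is a hypothetical helper, and reading a~b~c as the adjacent pairs a~b and b~c is an assumption, not something the parser does yet.

```ruby
# Hypothetical helper for chronological (~) tags.
# Assumption: a~b~c means a came before b, and b before c, so only
# adjacent pairs are emitted and the original order is preserved.
# type_id defaults to 3, the chronological id in Supertag::Types.
def chronological_pairs(tag, type_id = 3, weight = 1)
  pairs = []
  tag.split("~").each_cons(2) do |earlier, later|
    pairs << [earlier, later, type_id, weight]
  end
  pairs
end

chronological_pairs("xp~vista")
# => [["xp", "vista", 3, 1]]
```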

Additionally, I don’t prevent multiple relationship types in a single relationship tag, but the output is meaningless. asp.net!rubyonrails=othertag shouldn’t be allowed until we have some order of precedence, and that’s probably too complicated for the average web visitor anyway.
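Until precedence is sorted out, one option is to reject such tags before they reach the parser. A minimal sketch, assuming the same three operator characters the parser’s regex looks for (mixed_operators? is a made-up name):

```ruby
# Hypothetical check: true when a single tag mixes relationship operators.
def mixed_operators?(tag)
  tag.scan(/[!=~]/).uniq.size > 1
end

mixed_operators?("asp.net!rubyonrails=othertag")  # => true, both ! and = appear
mixed_operators?("xfs=ext3=zfs")                  # => false, only = appears
```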

So that’s the basic idea.

Introducing Supertags: Advanced Web Taxonomy

Concepts, Observations | Ardekantur @ 1:09 pm

Tags are clearly a huge part of the web. If at any time you want to keep track of something, you tag it. URLs, books in Amazon, music in last.fm, the list goes on and on and on. Tag clouds are a common sight on blogs and feedreaders. While taxonomy of this kind is useful in and of itself, it’s even better when you aggregate items that have common tags and find related concepts. These relationships are used to expand your field of inquiry, find items with tags you wouldn’t think to look for, and so on.

How can we take this concept further?

I posit that the common relationships between tags are not the only relationships users want to use when looking for data. I am currently working on a project to tag news stories. Some of them have titles like your average Reddit front page:

  • D development with Emacs
  • Using Amazon S3 with Django
  • Why Arc is bad for exploratory programming

The first two items can be categorized with plain tags like d and emacs. If you were to try to find the first item in del.icio.us, you’d use d+emacs and the site would be smart enough to understand you’re looking for an item tagged with those two separate phrases. Same thing with the second item: amazons3+django would work well for searching for it.

The third item, however, is where more robust tagging could come into play. Sure you could tag it arc+exploratoryprogramming, but that’s not really the relationship being defined. The article isn’t about Arc and exploratory programming, it’s about how Arc is not good for exploratory programming. (Please note I haven’t actually read these items, and don’t particularly care about the theses therein, for the purposes of this article :) So, in the opinion of the writer, Arc and exploratory programming have a contrasting relationship. In other words, arc!exploratoryprogramming.

See where this is going?

Say you have an article:

  • Why Ruby is better than Python

In the opinion of the article writer, ruby>python.

Or:

  • Windows 7 coming soon after Vista

Chronologically, the order of the related subjects is: windowsvista~windows7.

Already, by looking at basic examples, we’ve extracted a vocabulary that we can apply to tagging. I call these new juxtapositions supertags, because I’m an uncreative shell of mediocrity.

So how can we use these new tags? Aggregation. Imagine looking up a tag like modernenglish and seeing a timeline in your browser of all the things that came before and after it, because somebody created tags on articles like greatvowelshift~modernenglish and modernenglish~ebonics. And I bet you didn’t even know what the Great Vowel Shift was!
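As a rough sketch of that lookup, assuming the chronological tags have already been split into [earlier, later] pairs (timeline_for is a hypothetical name, and it only finds direct neighbours, not the whole chain):

```ruby
# Hypothetical aggregation over chronological pairs.
# Each pair is [earlier, later], e.g. from greatvowelshift~modernenglish.
def timeline_for(tag, pairs)
  before = pairs.select { |earlier, later| later == tag }.map { |pair| pair.first }
  after  = pairs.select { |earlier, later| earlier == tag }.map { |pair| pair.last }
  { :before => before.uniq, :after => after.uniq }
end

pairs = [["greatvowelshift", "modernenglish"],
         ["modernenglish", "ebonics"]]
timeline_for("modernenglish", pairs)
# => {:before=>["greatvowelshift"], :after=>["ebonics"]}
```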

Imagine getting an overview of who on the internet thinks what programming language is better, not by search results through Google, or Alexa ranking, but by seeing the strength of the links c++>c and c>c++. The more of one you have, the stronger the weight, the more reinforced that idea is. Clearly it’s not an objective measurement, but when you’re looking at trends, it’s hard to say what is objective :-)
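Counting those links could be as simple as the sketch below. It assumes the comparative tags have been collected into a flat array of strings, and comparison_strength is a made-up helper name.

```ruby
# Hypothetical tally of comparative (>) supertags gathered from many articles.
# Returns how often each direction of the comparison was tagged.
def comparison_strength(tags, a, b)
  { "#{a}>#{b}" => tags.count("#{a}>#{b}"),
    "#{b}>#{a}" => tags.count("#{b}>#{a}") }
end

tags = ["c++>c", "c>c++", "c++>c", "ruby>python"]
comparison_strength(tags, "c++", "c")
# => {"c++>c"=>2, "c>c++"=>1}
```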

The best part is, these supertags open up an enormous number of possibilities to categorize concepts. atheism may be opposed to christianity (atheism!christianity), but it’s also opposed in the same way to rastafarianism, to angels, to afterlife, and each of these tags is related to millions of others in millions of different ways. It’s said you should know your enemies as well as you know yourself, and being able to see at a glance which concepts are diametrically opposed to yours will help you understand those who don’t share your views.

That’s where I am, conceptually. Good idea? Bad idea?

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2008 Ardekantur | powered by WordPress with Barecity