
A Basic Supertag Implementation

Last time I talked about Supertags; this time we’re going to look at an actual implementation. It’s clearly rough around the edges, but here we go.

  class SupertagParser
 
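    # Parses a tag string into [relationships, plain_tags].
    # Each relationship is [tag_a, tag_b, type_id, weight], with the two
    # tags sorted alphabetically so "a=b" and "b=a" produce the same entry.
    def self.parse(s, weight = 1)
      r = []  # relationships found
      t = []  # plain tags with no relationship operator
      a = s.split(Supertag::Separator)
      a.each do |i|
        # No operator in this tag, so keep it as a plain tag.
        unless i =~ /[!=~]/
          t << i
          next
        end
        Supertag::Types.each do |k, v|
          # Chronological (~) is not handled yet: its order must be preserved.
          if k == :conceptual or k == :oppositional
            # The operator must be present and not the first character.
            if i.index(v.last) and i.index(v.last) > 0
              tags = i.split(v.last).uniq
              tags.perm(2) do |x| r << [x.sort.first, x.sort.last, v.first, weight] end
            end
          end
        end
      end
      return r.uniq, t
    end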
  end
 
  class Supertag
    Types = { :conceptual => [1, "="],
              :oppositional => [2, "!"],
              :chronological => [3, "~"]
              }
    Separator = " "
  end
 
  # Array#perm taken from http://blade.nagaokaut.ac.jp/~sinara/ruby/math/combinatorics/array-perm.rb
  # Author: Shin-ichiro Hara
 
  class Array
    def perm(n = size)
      if size < n or n < 0
      elsif n == 0
        yield([])
      else
        self[1..-1].perm(n - 1) do |x|
          (0...n).each do |i|
            yield(x[0...i] + [first] + x[i..-1])
          end
        end
        self[1..-1].perm(n) do |x|
          yield(x)
        end
      end
    end
  end

Okay, so say we have a hypothetical news item:

Ruby on Rails versus ASP.NET

Someone reads it, and tags it: rubyonrails=asp.net.

>> SupertagParser::parse("rubyonrails=asp.net")
=> [[["asp.net", "rubyonrails", 1, 1]], []]

What we get back is an array of arrays and a second, flat array. Each inner array is one relationship: the two tags, the relationship type, and the weight. The flat array holds any other tags that didn’t take part in a relationship.

>> SupertagParser::parse("rubyonrails=asp.net othertag")
=> [[["asp.net", "rubyonrails", 1, 1]], ["othertag"]]

Say you were creating a website in Ruby on Rails that uses supertags. The relationship arrays can be passed straight to a Relationship model and saved. If a relationship already exists (notice that the parser alphabetizes the two tags to reduce the chance of redundant entries), the model can just increase that relationship’s weight in the database.
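
The save-or-increment step might look something like the sketch below. It is not a real Rails model: RelationshipStore, record, and weight are hypothetical names, and a plain Hash stands in for the database table, purely to show how the alphabetized pairs make repeated tags land on the same entry.

```ruby
# Hypothetical in-memory stand-in for a Relationship model; a real Rails
# model would persist to the database instead of a Hash.
class RelationshipStore
  def initialize
    # Key: [tag_a, tag_b, type]; value: accumulated weight.
    @weights = Hash.new(0)
  end

  # Accepts the first array returned by SupertagParser.parse and adds each
  # relationship's weight to whatever is already stored for that key.
  def record(relationships)
    relationships.each do |tag_a, tag_b, type, weight|
      @weights[[tag_a, tag_b, type]] += weight
    end
    self
  end

  # Returns the accumulated weight, or 0 if the relationship is unknown.
  def weight(tag_a, tag_b, type)
    @weights[[tag_a, tag_b, type]]
  end
end

store = RelationshipStore.new
store.record([["asp.net", "rubyonrails", 1, 1]])
store.record([["asp.net", "rubyonrails", 1, 1]])
puts store.weight("asp.net", "rubyonrails", 1)  # prints 2
```

Because the parser always emits the pair in sorted order, rubyonrails=asp.net and asp.net=rubyonrails both hit the same key.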

We can handle permutations. Relationship tags don’t have to be limited to two tags. “XFS versus EXT3 versus ZFS” could be tagged xfs=ext3=zfs:

>> SupertagParser::parse("xfs=ext3=zfs")
=> [[["ext3", "xfs", 1, 1], ["xfs", "zfs", 1, 1], ["ext3", "zfs", 1, 1]], []]

We get all the right relationships. This is done by the Array#perm extension to the Array class that I found, credited in the code above.
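
As an aside, more recent Ruby versions ship a built-in Array#combination, which should yield the same sorted pairs without the sort-then-uniq step. The helper name unordered_pairs below is made up for illustration; this is a sketch of an alternative, not what the parser above does.

```ruby
# Sketch: the same sorted, de-duplicated pairs using the built-in
# Array#combination instead of the Array#perm extension.
def unordered_pairs(tags)
  # combination(2) yields each unordered pair once; sorting each pair
  # keeps the alphabetized form the parser produces.
  tags.uniq.combination(2).map { |pair| pair.sort }
end

p unordered_pairs(["xfs", "ext3", "zfs"])
# => [["ext3", "xfs"], ["xfs", "zfs"], ["ext3", "zfs"]]
```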

This is the basic idea. So far I’ve looked at conceptual (=), oppositional (!), and chronological (~) relationships, though the parser above only builds the first two. Chronological relationships have to be handled differently because we can’t alphabetize them: xp~vista needs to be kept in the correct order.
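
One way the chronological case could be handled is sketched below. chronological_pairs and the two constants are hypothetical and not part of the parser above. The sketch also assumes that a chain like a~b~c means every earlier item precedes every later one, which is a guess at the intended semantics.

```ruby
# Hypothetical helper, not part of the parser above: builds ordered pairs
# for a chronological tag such as "xp~vista" without sorting them.
CHRONOLOGICAL_TYPE = 3   # mirrors Supertag::Types[:chronological].first
CHRONOLOGICAL_MARK = "~" # mirrors Supertag::Types[:chronological].last

def chronological_pairs(tag, weight = 1)
  tags = tag.split(CHRONOLOGICAL_MARK).reject { |t| t.empty? }.uniq
  # Array#combination keeps elements in their original order, so the
  # earlier item always comes first in each pair.
  tags.combination(2).map do |earlier, later|
    [earlier, later, CHRONOLOGICAL_TYPE, weight]
  end
end

p chronological_pairs("xp~vista")
# => [["xp", "vista", 3, 1]]
```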

Additionally, I don’t prevent multiple relationship types in a single relationship tag, but the output is meaningless. asp.net!rubyonrails=othertag shouldn’t be allowed until we can have some kind of order of precedence, but that’s probably too complicated for the average web visitor anyway.
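
A simple guard for this could look like the sketch below. mixed_relationship? and RELATIONSHIP_MARKS are hypothetical names, and the list of operators is assumed to match the three in Supertag::Types. The parser could skip or reject any tag for which it returns true.

```ruby
# Hypothetical guard, assuming the three operators from Supertag::Types.
RELATIONSHIP_MARKS = ["=", "!", "~"]

# True when a single tag uses more than one kind of relationship operator.
def mixed_relationship?(tag)
  RELATIONSHIP_MARKS.count { |mark| tag.include?(mark) } > 1
end

p mixed_relationship?("asp.net!rubyonrails=othertag") # => true
p mixed_relationship?("xfs=ext3=zfs")                 # => false
```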

So that’s the basic idea.
