ulrik@kaizer.se/ log/
tech

relevance.lua

relevance.lua is a port of relevance.py that we use in kupfer, to Lua.

I did the porting fairly quickly and without major obstacles; I have not benchmarked the lua implementation yet, so I don't know how the function-based string indexing in Lua compares to Python's string indexing. How things normally go, Lua should easily be faster than Python anyway. It is also very reassuring to know how easy it is to rewrite Lua modules in C when needed.

There will never be a fair comparison, since in Python we use the relevance functions with unicode strings, and normally these have an internal representation of four bytes per character (UTF-32) which adds some memory and performance overhead.

In Lua, that is without any kind of Unicode support in the core language, we use simple bytestrings (which will be UTF-8 in typical applications). Accuracy won't be impacted, since no character transformations at all are done in the relevance module.

-- relevance.lua
-- Copyright (C) 2009--2010 Ulrik Sverdrup <ulrik.sverdrup@gmail.com>
--               2008  Christian Hergert <chris@dronelabs.com>
--               2007  Chris Halse Rogers, DR Colkitt
--                     David Siegel, James Walker
--                     Jason Smith, Miguel de Icaza
--                     Rick Harding, Thomsen Anders
--                     Volker Braun, Jonathon Anderson
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program.  If not, see <http://www.gnu.org/licenses/>.

-- Ported to Lua 5.1 by Ulrik Sverdrup


-- Take all builtins we need
local sub = string.sub
local find = string.find

-- If being executed as script, return true
local function is_script()
  return arg and arg[0]:find("relevance.lua", 1, true)
end

if not is_script() then
  module(...)
end

--[[
-- score (s, q)
--  A relevancy score for the string ranging from 0 to 1

--  @s: a string to be scored
--  @q: a string query to score against

--  `s' and `query' are interpreted literally, including case and whitespace.

--  Returns: a number between 0 and 1
--]]
local doctest_score = [[
  = score('terminal', 'trml')
  0.73509868421053
  = score('terminal', 'term')
  0.99230263157895
  = score('terminal', 'try')
  0
  = score('terminal', '')
  1
]]

function score(s, q)
  local qlen = #q
  local slen = #s

  if qlen == 0 then
    return 1
  end

  -- we don't lowercase here in the lua version
  local ls = s

  -- Find the shortest possible substring that matches the query
  -- and get the ratio of their lengths for a base score
  first, last = find_best_match(ls, q)
  if first == -1 then
    return 0
  end

  local score = qlen / (last - first + 1)

  -- Now we weight by string length so shorter strings are better
  score = score * (0.7 + qlen/slen * 0.3)
  -- Bonus points if the characters start words
  local good = 0
  local bad = 1
  local firstCount = 0
  for i = first, last do
    local lsi = sub(ls, i, i)
    if lsi == " " or lsi == "-" then
      if find(q, sub(ls, i+1, i+1), 1, true) then
        firstCount = firstCount + 1
      else
        bad = bad + 1
      end
    end
  end
  -- A first character match counts extra
  if sub(q, 1, 1) == sub(ls, 1, 1) then
    firstCount = firstCount + 2
  end

  -- The longer the acronym, the better it scores
  good = good + firstCount * firstCount * 4

  -- Better yet if the match itself started there
  if first == 1 then
    good = good + 2
  end

  -- Super duper bonus if it is a perfect match
  if query == ls then
    good = good + last*2 + 4
  end

  score = (score + 3 * good / (good + bad)) / 4

  -- This fix makes sure that perfect matches always rank higher
  -- than split matches.  Perfect matches get the .9 - 1.0 range
  -- everything else lower

  if last - first + 1 == qlen then
    score = .9 + .1 * score
  else
    score = .9 * score
  end

  return score
end


--[[
--  find_best_match(s, q)
--
--  Finds the shortest substring of @s that contains all characters of @q
--  in order.
--
--  @s: a string to search
--  @q: a string query to search for
--
--  Returns: a two-item tuple containing the start and end indicies of
--           the match.  No match returns (-1,-1).
--]]

local doctest_find_best_match = [[
  = find_best_match('terminal', 'trml')
  1       8
  = find_best_match('total told', 'tl')
  3       5
  = find_best_match('terminal', 'yl')
  -1       -1
  = find_best_match('terminal', 't')
  1       1
]]

function find_best_match(s, q)
  local qlen = #q
  local first, last = -1, -1
  local index = find(s, sub(q, 1, 1), 1, true)
  while index do
    -- See if we can fit the whole query into the tail of s
    -- We already know where the first char of q is in s (@index),
    -- so we start with char number two.
    local cur = index
    for qcur = 2, qlen do
      cur = find(s, sub(q, qcur, qcur), cur + 1, true)
      if not cur then
        return first, last
      end
    end

    -- take match if it is shorter
    -- if perfect match, we are done
    if first == -1 or (cur - index < (last - first)) then
      first, last = index, cur
      if (cur - index + 1) == qlen then
        break
      end
    end
    index = find(s, sub(q, 1, 1), index+1, true)
  end
  return first, last
end


-- Check if we are run as a script
if is_script() then
  require "doctests"
  doctests.test()
end
Posted in the wee hours of Thursday night, April 30th, 2010 Tags: en lua
Kupfer gains Triggers

Right now, in the master branch of Kupfer is a really exciting feature: custom keybindings for any command you can perform in Kupfer; the feature is called Triggers.

What you can do with Triggers is take a command you would normally perform in Kupfer, such as launching an application or opening a document, and bind a global key combination to run this command. As long as Kupfer is running, any application can have the current focus. This resembles the Triggers in Quicksilver very closely.

This is a great use for Composed Commands: the way you use this in Kupfer is that you construct a command as usual, but instead of pressing Return to carry out the command, you press Ctrl+Return to create a composite object, in effect the command in an object. Then use the Add Trigger... action on this command, together with a string for the keybinding, for example <Control><Alt>G. [1]

Especially together with objects that are resolved when the trigger is used, such as Selected Text and Selected File, triggers are already very powerful. Here are some examples of what you can trigger:

Firefox → Launch
Add a keybinding to launch an application (or focus if already running)
Selected Text → Search With → Google
Make a trigger that googles the text selected in the current application. You can do this with any of the search engines you have in Kupfer.
Selected File → Move To → (Downloads Folder)
A trigger to move the currently selected file to a specific folder
Songs → Search Contents
You may even add a trigger to open a specific subcatalog in Kupfer, like Rhythmbox's songs.

There is no preference pane for triggers at the moment, they are configured right in Kupfer itself, where you can list the currently defined triggers by browsing its subcatalog. You can carry out a trigger's command directly to test it, or remove defined triggers.

[1]The interface might change to something more user friendly in the future, for example using a window where the user can simply press the desired keybinding. The master branch was updated after this post's writing to make suggestions for the user, so if the user types F, suggestions of Ctrl+Alt+F, Ctrl+Shift+F etc are made.

Comments to @englabenny on twitter or @kaizer on identi.ca, or use hashtag #kupfer.

Posted Thursday afternoon, December 31st, 2009 Tags: en kupfer
Random reasons Kupfer is awesome

Kupfer is an application that I wrote mostly myself, even though lots of awesome contributors are now coming in to add tons to it; especially testing and more object and application knowledge.

The first features of Kupfer are personal and quite non-random:

[1]Although it might not be obvious if you're not half-swedish.

Now, some random reasons Kupfer is awesome:

It understands challenges in naming items. Kupfer will find the song Außer Dir even if you can only type ausserdir; it will find the Sigur Rós song Suð í eyrum even if you can only type sud! You can both enjoy correctly named icelandic, german, swedish or whatever songs from the music library, and still be able to match them when you type their name. I think many Kupfer users using different languages agree that this is an important thing, seeing the number of accents and other letter decorations used in, for example, Polish and Spanish.

The implementation is general (but has a specialization towards the cases I know about myself and tested myself) and can even handle matching "accent"-less japanese characters to "accented" characters; for example 'ヘ' will match an object with 'ペ' in its name.

I think my approach with Kupfer, going for deep and serious localization, has been very successful. (I write the application in English and localize to Swedish in parallel.) (I am still looking for feedback on behavior with Right-to-left languages.)

Sublevels are explicit. Even though most users probably don't care about sublevels (except in the common case of folders, where the concept is already familiar), Kupfer explicitly marks objects with an arrow to show that they contain more objects, and opens a browse window when you enter into them. Subcatalogs all have the action Search Contents available, which adds to consistency (and provides the mandatory "base case" action for that type of object).

Posted at midnight, December 30th, 2009 Tags: en kupfer
Composed commands

Kupfer version c19 adds a feature called Compose Command.

This is something clever from Quicksilver, that I suspect even many of its avid users don't know about.

What is Compose Command? It is a sort of higher-order use of Kupfer, let me explain: A normal command that you carry out in Kupfer is normally an object and an action, for example the combination (Documents, Open):

Now imagine we want this command to be an object of itself. In Kupfer, we press Control+Return to invoke Compose Command.

There, we created an object from the command -- and we are offered the possibility to run it right away. Which of course, you say, adds nothing. What can come out of this? We can now perform actions with the very command object itself:

The Kupfer action Run after Delay... understands a time interval that you can specify like "10 sec", "1 h 15 min", "6 m" etc.

You can create a composed command like this from any Object and an Action, or any Object, an Action and an Indirect (Secondary) Object.

One useful example of what you can do is to create a quick reminder, by telling Kupfer to show text very large over the screen in just a while:

Please stand by six minutes... Aand we have a message across the screen:

(Goes away with the tap of a key)

This may not seem much, but it is a nice display of what can be done with Kupfer right now, and what can be implemented in the future. And I think people can come up with very useful things they can do with delayed commands. Perhaps you can come up with something completely new in the domain of higher-order use of Kupfer?

It is also just one of the new features of Kupfer, I should try to blog more about this in the future.

Comments to @englabenny on twitter or @kaizer on identi.ca!

Posted at midnight, December 18th, 2009 Tags: en kupfer
relevance.py

The relevance.py module is a python module part of kupfer, providing no less function than assigning the importance of each catalog item for a search or filter query. [1]

The relevance code is taken directly from Gnome-Do and was ported to Python by Christian Hergert

The module has two user-visible functions, score and formatCommonSubstrings. The most important function, is of course score:

def score(s, query):
    """
    A relevancy score for the string ranging from 0 to 1

    @s: a string to be scored
    @query: a string query to score against

    `s' is treated case-insensitively while `query' is interpreted literally,
    including case and whitespace.

    Returns: a float between 0 and 1

    >>> print(score('terminal', 'trml'))
    0.735098684211
    >>> print(score('terminal', 'term'))
    0.992302631579
    """

I laid hand on the module and ported it to using faster Python methods, making Python's unicode.find (or str.find if the input is str) do most of the heavy lifting -- along with an algorithmic change to speed up the string score calculation by quite a factor.

Since the module's functions are pure and fundamental, the module has broad compatibility, it works identically on all python versions I have tested: python2.4, python2.5, python2.6 and python3.1!

Only two important things make the module simultaneously Python 2- and Python 3-compatible:

from __future__ import division

# This module is compatible with both Python 2 and Python 3;
# we need the iterator form of range for either version, stored in range()
try:
    range = xrange
except NameError:
    pass

Neither change is strictly necessary; the first is always a good guard (and solved a bug in the original Python port), and the second is necessary if we want the iterator form of range() in Python 2.

[1]It provides the base score, after which the catalog item may be boosted if it is used a lot.
Posted late Wednesday evening, September 23rd, 2009 Tags: en python

View weblog archive