libEditScript

A C Library for computing edit script in linear space
Download

libEditScript Ranking & Summary

Advertisement

  • Rating:
  • License:
  • LGPL
  • Price:
  • FREE
  • Publisher Name:
  • Vamsi Kundeti
  • Publisher web site:
  • https://launchpad.net/~vamsi-krishnak

libEditScript Tags


libEditScript Description

A C Library for computing edit script in linear space libEditScript is a C library for computing edit script in linear space.libEditScript is a project aimed at build a high performance edit script computation library. Edit script is used immensely in bio-informatics and several other place (e.g UNIX diff). We find that there are several applications which would need to compute the alignment of sequences and most of them are employ a straight forward algorithm to compute the edit script which takes O(n^2) space. However in this project we currently have a non-recursive space efficient algorithm to compute the edit script in O(n) space. The basic idea is based on Hirschberg's algorithm however our implementation is not recursive as in the original algorithm.Given two strings S1 and S2 and three operations (Insert, delete, change) each with different costs, the sequence of operations to convert S1 to S2 is well known as the string editing problem. The minimum cost of transforming S1 to S2 is known as the the 'Edit Distance' between the strings S1 and S2. Computing the edit distance between strings has immense applications, in fact we use edit distance in our day to day life , edit distance is what gets computed when we 'diff' two files. Computing edit script is more general than just computing the edit distance, Hirschberg's algorithm gives a space efficient dynamic programming formulation for computing the edit script, the algorithm is recursive in nature. In this work we implement a non recursive version of the Hirschberg's algorithm. Our context of this problem is to build a highly area efficient VLSI hardware.Computing the Edit script (minimum cost sequence of INSERT, DELETE and CHANGE) between two strings is a fundamental problem and occurs very frequently. Common UNIX utilities such as 'diff' are based on computing the edit script between tow strings. operations to transform string S1 to S2. I have the following idea to build a non-recursive version of the Hirschberg's algorithm, since the algorithm is non-recursive we can build an efficient digital circuit with this idea. We use a simple circular queue and apply DFS (Depth First Search) and we can prove that the capacity of this queue at any stage of the algorithm is θ(log(min(n1,n2)). The proof is simple we choose the geometrically decreasing string which has smaller length from the given strings(min(n1,n2)). Since we do a depth first search and the depth of subproblem tree is θ(log(min(n1,n2)), so we will have atmost θ(log(min(n1,n2)) subproblems in the circular queue at any stage of the algorithm.


libEditScript Related Software