当前位置 主页 > 服务器问题 > Linux/apache问题 >
本文实例讲述了Java滚动数组计算编辑距离操作。分享给大家供大家参考,具体如下:
编辑距离(Edit Distance),也称Levenshtein距离,是指由一个字符串转换为另一个字符串所需的最少编辑次数。
下面的代码摘自org.apache.commons.lang.StringUtils
用法示例:
StringUtils.getLevenshteinDistance(null, *) = IllegalArgumentException StringUtils.getLevenshteinDistance(*, null) = IllegalArgumentException StringUtils.getLevenshteinDistance("","") = 0 StringUtils.getLevenshteinDistance("","a") = 1 StringUtils.getLevenshteinDistance("aaapppp", "") = 7 StringUtils.getLevenshteinDistance("frog", "fog") = 1 StringUtils.getLevenshteinDistance("fly", "ant") = 3 StringUtils.getLevenshteinDistance("elephant", "hippo") = 7 StringUtils.getLevenshteinDistance("hippo", "elephant") = 7 StringUtils.getLevenshteinDistance("hippo", "zzzzzzzz") = 8 StringUtils.getLevenshteinDistance("hello", "hallo") = 1
Java代码:
public static int getLevenshteinDistance(String s, String t) { if (s == null || t == null) { throw new IllegalArgumentException("Strings must not be null"); } int n = s.length(); // length of s int m = t.length(); // length of t if (n == 0) { return m; } else if (m == 0) { return n; } if (n > m) { // swap the input strings to consume less memory String tmp = s; s = t; t = tmp; n = m; m = t.length(); } int p[] = new int[n+1]; //'previous' cost array, horizontally int d[] = new int[n+1]; // cost array, horizontally int _d[]; //placeholder to assist in swapping p and d // indexes into strings s and t int i; // iterates through s int j; // iterates through t char t_j; // jth character of t int cost; // cost for (i = 0; i<=n; i++) { p[i] = i; } for (j = 1; j<=m; j++) { t_j = t.charAt(j-1); d[0] = j; for (i=1; i<=n; i++) { cost = s.charAt(i-1)==t_j ? 0 : 1; // minimum of cell to the left+1, to the top+1, diagonally left and up +cost d[i] = Math.min(Math.min(d[i-1]+1, p[i]+1), p[i-1]+cost); } // copy current distance counts to 'previous row' distance counts _d = p; p = d; d = _d; } // our last action in the above loop was to switch d and p, so p now // actually has the most recent cost counts return p[n]; }
实际上,上述代码的空间复杂度还可以进一步简化,使用一维数组替换滚动数组。
Java代码:
public int minDistance(String s, String t) { if (s == null || t == null) { throw new IllegalArgumentException("Strings must not be null"); } int n = s.length(); // length of s int m = t.length(); // length of t if (n == 0) { return m; } else if (m == 0) { return n; } if (n > m) { // swap the input strings to consume less memory String tmp = s; s = t; t = tmp; n = m; m = t.length(); } int d[] = new int[n+1]; // cost array, horizontally // indexes into strings s and t int i; // iterates through s int j; // iterates through t char t_j; // jth character of t int cost; // cost for (i = 0; i<=n; i++) { d[i] = i; } for (j = 1; j<=m; j++) { t_j = t.charAt(j-1); int pre = d[0]; d[0] = j; for (i=1; i<=n; i++) { int temp = d[i]; cost = s.charAt(i-1)==t_j ? 0 : 1; // minimum of cell to the left+1, to the top+1, diagonally left and up +cost d[i] = Math.min(Math.min(d[i-1]+1, d[i]+1), pre+cost); pre = temp; } } return d[n]; }