Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wonder if this method could extend to character transposition and weight-able costs; e.g., allow "hello"/"ehllo" = distance 1, as logically this is a single typing mistake of whacking the 'h' a bit early.

I've used the perl String::Approx module to great effect, but it takes a lot of resources to blast through a large dictionary.



You can do both supporting variable weights and transposition. The key is understanding the initial construction of the DFA.

It's pretty general and clever in its simplicity. It's really just trying all possible edits, recording the cost and progress until the costs exceed the budget or you've reached the goal.

To handle transposition, you need the state to remember the last character typed if it happens to match the next character, so the state space gets bigger. In your example for hello, from the start state you'll introduce a state for starting with e. This state encodes typed 'last char was e, matched nothing, cost 1'. From here, when you get an h, you move to the 'matched first two characters, used cost of one' state.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: