
What version of grep are you using? Not too long ago the grep in Debian unstable was awful with UTF-8 strings. pg135.txt is Project Gutenberg's Les Misérables. What were your times?

I no longer notice this behavior:

    dfc@motherjones:~$ grep --version 
    grep (GNU grep) 2.9

    dfc@motherjones:~$ time LANG=C grep asdf < pg135.txt > /dev/null 

    real	0m0.017s
    user	0m0.008s
    sys	0m0.004s
    dfc@motherjones:~$ time LANG=UTF8 grep asdf < pg135.txt > /dev/null 

    real	0m0.017s
    user	0m0.012s
    sys	0m0.004s
    dfc@motherjones:~$ time LANG=en_us.UTF8 grep asdf < pg135.txt > /dev/null 

    real	0m0.012s
    user	0m0.004s
    sys	0m0.004s

There is not a lot of info about this in Debian bug 604408:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604408

If memory serves me correctly, upstream fixed this sometime after 2.7.1 or 2.7.3.

Funner fact: GNU grep used to be slow with UTF-8.



  ; grep --version
  GNU grep 2.5.3
  ; time LANG=C grep asdf < lesms10.txt > /dev/null
  real	0m0.025s
  user	0m0.011s
  sys	0m0.014s
  ; time /usr/local/plan9/bin/grep asdf < lesms10.txt > /dev/null
  real	0m0.082s
  user	0m0.043s
  sys	0m0.013s
  ; time LANG=en_US.UTF-8 grep adsf < lesms10.txt > /dev/null
  real	0m1.209s
  user	0m0.818s
  sys	0m0.018s
Those are the only two grep implementations I have handy. GNU grep 2.6.3 takes the same amount of time searching for 'asdf' in both locales, but searching for '.' is still slow. Thanks for pointing that out.
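
For anyone who wants to repeat the comparison, here is a rough sketch of how I would set it up. It is not the exact procedure used above. The helper name count_matches and the tiny sample file are made up for illustration. It uses LC_ALL rather than LANG, since LC_ALL overrides LANG, and it picks a UTF-8 locale from `locale -a` instead of hard-coding one. As far as I know, on glibc systems a locale name that is not installed leaves the program in the C locale, which could hide the slowdown.

```shell
#!/bin/sh
# Sketch only: count_matches and the sample file are illustrative names,
# not part of grep or of the commands posted above.

# Count matching lines under a given locale. LC_ALL is set instead of
# LANG because LC_ALL overrides LANG and every other LC_* variable.
count_matches() {
    # $1 = locale name, $2 = pattern, $3 = input file
    LC_ALL=$1 grep -c -e "$2" < "$3"
}

# Small stand-in for pg135.txt / lesms10.txt with one non-ASCII character
# (the bytes \303\251 encode "e with acute accent" in UTF-8).
sample=$(mktemp)
printf 'Les Mis\303\251rables\nasdf here\nplain line\n' > "$sample"

# Pick a UTF-8 locale that is actually installed, if there is one.
utf8=$(locale -a 2>/dev/null | grep -i -E 'utf-?8$' | head -n 1)

# Print locale, pattern, and match count; the counts should agree
# between locales even when the timings do not.
for loc in C $utf8; do
    for pat in asdf .; do
        printf '%s\t%s\t%s\n' "$loc" "$pat" "$(count_matches "$loc" "$pat" "$sample")"
    done
done

rm -f "$sample"
```

To get timings on a real file, point the same kind of command at the full text and prefix it with `time`, once per locale and once per pattern ('asdf' and '.').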



