
Awk is great and this is a great post. But dang, awk really shoots itself so much with its lack of features that it so desperately needs!

Like: printing all but one column somewhere in the middle. It turns into long, long commands that really pull away from the spirit of fast, throwaway unix experimentation.

jq and sql both have the same problem :)



  $ echo "one two three four five" | awk '{$3="";print}'
  one two  four five


Oh dang, that's good.


As long as you don't mind the extra space in the middle.


Oftentimes I don't! It entirely depends on what I'm doing. The #1 thing off the top of my head is removing That One Column that's a bajillion characters long and makes exploratory analysis difficult.


I wonder if it's possible to encode the backspace character in the replacement string?


No problem, there's \b.

  echo "one two three four five" | awk '{$3="\b"; print}'

Inserts the backspace character (^H), which you can then remove with [global] substitution:

  awk '{$3="\b"; sub(/ \b/, ""); print}'

You can, of course, use an arbitrary sentinel value for the field to be deleted. Should work in gawk and BWK awk.
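Putting the two halves together (a sketch; assumes an awk like gawk or mawk that treats \b as a backspace in both string and regexp constants):

```shell
# Blank out field 3 with a backspace sentinel, then delete the
# "space + backspace" pair the record rebuild leaves behind.
echo "one two three four five" | awk '{$3="\b"; sub(/ \b/, ""); print}'
# prints "one two four five"
```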


if you do:

    sed 's/  / /g'
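One caveat: because s///g resumes scanning after each replacement, `s/  / /g` only halves each run of spaces per pass, so runs of three or more spaces don't fully collapse. Matching the whole run does (plain POSIX sed, no extensions):

```shell
printf 'a  b    c\n' | sed 's/  / /g'    # prints "a b  c" (the 4-space run only halves)
printf 'a  b    c\n' | sed 's/  */ /g'   # prints "a b c" (matches each whole run of spaces)
```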


Or sticking with awk, I have this bash alias to remove excess whitespace that is just:

    awk '{$1=$1};1'


what does the 1 at the end do? make awk print all lines? I'm a bit rusty with my awk.


It’s a condition. 1 is true-ish, so it’s a condition that is always true. The default action in awk is print, so it’s the same as:

  '{$1=$1}1{print}'

Or the same as

  '{$1=$1}{print}'

Since the default condition is true. But 1 is shorter than {print}.
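To see the alias in action: assigning to any field forces awk to rebuild $0 with single spaces (the default OFS), which also drops leading and trailing whitespace:

```shell
# $1=$1 triggers the record rebuild; the bare 1 prints every line
printf '   one   two\tthree   \n' | awk '{$1=$1};1'
# prints "one two three"
```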


> awk really shoots itself so much with its lack of features that it so desperately needs

Whence perl.



I suspect the rationale for Perl is that most Linux systems will probably have it installed already. Installing something you're familiar with is great when you can, but I'm guessing the awk script linked to here was picked more for its ubiquity than elegance.


Kinda, but not really. Of the infrastructures I've worked on, not a single one has been consistent in installing perl on 100% of hosts. The ones that get close are usually like that because one high-up person really, really likes perl. And they send a lot of angry emails about perl not being installed.

Within infrastructures where perl is installed on 95% of hosts, that 5% really bites you in the ass and leads to infrastructure rot very quickly. You're kinda stuck writing and maintaining two separate scripts to do the same thing.


Same with Python. It's mostly available, but sometimes not.

With Perl, I find that a base installation is almost always available, but many packages might not be.


I dunno about that. IME, python is much, much more universally installed on the hosts I've worked on. Sure, usually it's 2.7, but it's there! I've tended to work on rhel and debian hosts, with some fedora in the mix.

(Once had a coworker reject a PR I wrote because I included a bash builtin in a deployment script. He said that python is more likely to be installed than bash, so we should not use bash. These debates are funny sometimes.)


Interesting, in my experience perl ends up pulled in as a dependency for one thing or another most of the time, but I don't have that perception about Python. Maybe there's just something I use that pulls in perl without me realizing and it's biased my experience.


When the Perl module JSON::PP is installed, the json_pp program comes with it.

The following command is handy for grepping the output:

    cat mydata.json | json_pp


Us old UNIX guys would likely go for cut for this sort of task:

     cut -d " " -f1-2,4-5 file.txt

where file.txt is:

     one two three four five

and the return is:

     one two four five
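One catch worth knowing: cut treats every single occurrence of the delimiter as a field boundary, so a doubled space produces an empty field and shifts everything after it. Squeezing repeated delimiters with tr first sidesteps that:

```shell
printf 'one  two three\n' | cut -d ' ' -f2              # prints an empty field
printf 'one  two three\n' | tr -s ' ' | cut -d ' ' -f2  # prints "two"
```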


>awk really shoots itself so much with its lack of features that it so desperately needs!

That's why I use Perl instead (besides some short one-liners in awk, which in some cases are even shorter than the Perl version) and do my JSON parsing in Perl.

This

  diff -rs a/ b/ | awk '/identical/ {print $4}' | xargs rm

is one of my often-used awk one-liners. Unless some filenames contain e.g. whitespace, then it's Perl again.


I've been using perl instead of sed because PCRE is just better, and it's the same regex flavor that PHP uses, which I've been coding in for nearly 20 years. I still don't actually know perl, but apparently Gemini does: it wrote a particularly crazy find-and-replace for me. I never got around to using or learning awk; the only time I see it come up is when you want to parse some tab-delimited output.


This is much safer: xargs -d '\n' rm -f --


Just tried to use "-d" and learned that it's a GNUism that isn't available under macOS, so it's not a portable solution. Neither was it available under BSD 4.3 when I first learned about xargs.
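A sketch of a middle ground: NUL-terminate the names with tr and use xargs -0, which both GNU and BSD/macOS xargs support (though -0 isn't strictly POSIX either). Note this only hardens the xargs stage; awk's own field splitting still breaks on filenames containing spaces, as noted upthread.

```shell
# NUL-delimit instead of relying on GNU xargs' -d flag;
# assumes the a/ and b/ directories from the earlier example
diff -rs a/ b/ | awk '/ are identical$/ {print $4}' | tr '\n' '\0' | xargs -0 rm -f --
```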


Sure, but my example was just that and I actually use /identical$/ as the pattern. Sorry for the typo.

And I use this "historic" one liner only when I know about the contents of both directories. As soon as I need a "safer" solution I use a Perl script and pattern matching, as I said.


In this case the Perl one-liner would be conceptually identical and the same length, but more performant (no calling out to rm):

   diff -rs a/ b/ | perl -ane '/identical$/ && unlink $F[3]'


...And once you get away from the most basic, standard set of features, the several awks in existence have diverging sets of additional features.


Things are already like that, friend! We have mawk, gawk and nawk. But it's fun to think about how we could improve our ideal tooling if we had a time machine.



