
Awk is great and this is a great post. But dang, awk really shoots itself so much with its lack of features that it so desperately needs!

Like: printing all but one column somewhere in the middle. It turns into long, long commands that really pull away from the spirit of fast, throwaway unix experimentation.

jq and sql both have the same problem :)



  $ echo "one two three four five" | awk '{$3="";print}'
  one two  four five


Oh dang, that's good.


As long as you don't mind the extra space in the middle.


Oftentimes I don't! It entirely depends on what I'm doing. The #1 thing off the top of my head is removing That One Column that's a bajillion characters long and makes exploratory analysis difficult.


I wonder if it's possible to encode the backspace character in the replacement string?


No problem, there's \b.

  echo "one two three four five" | awk '{$3="\b"; print}'

Inserts the backspace character (^H), which you can then remove with [global] substitution:

  awk '{$3="\b"; sub(/ \b/, ""); print}'

You can, of course, use an arbitrary sentinel value for the field to be deleted. Should work in gawk and BWK awk.
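Putting the two halves together (a sketch; assumes an awk like gawk or mawk that treats \b as a backspace in both string and regexp constants):

```shell
# Blank out field 3 with a backspace sentinel, then delete the
# "space + backspace" pair the record rebuild leaves behind.
echo "one two three four five" | awk '{$3="\b"; sub(/ \b/, ""); print}'
# prints "one two four five"
```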


if you do:

    sed 's/  / /g'
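One caveat: because s///g resumes scanning after each replacement, `s/  / /g` only halves each run of spaces per pass, so runs of three or more spaces don't fully collapse. Matching the whole run does (plain POSIX sed, no extensions):

```shell
printf 'a  b    c\n' | sed 's/  / /g'    # prints "a b  c" (the 4-space run only halves)
printf 'a  b    c\n' | sed 's/  */ /g'   # prints "a b c" (matches each whole run of spaces)
```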


Or sticking with awk, I have this bash alias to remove excess whitespace that is just:

    awk '{$1=$1};1'


what does the 1 at the end do? make awk print all lines? I'm a bit rusty with my awk.


It’s a condition. 1 is true-ish, so it’s a condition that is always true. The default action in awk is print, so it’s the same as:

  '{$1=$1}1{print}'

Or the same as

  '{$1=$1}{print}'

Since the default condition is true. But 1 is shorter than {print}.
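To see the alias in action: assigning to any field forces awk to rebuild $0 with single spaces (the default OFS), which also drops leading and trailing whitespace:

```shell
# $1=$1 triggers the record rebuild; the bare 1 prints every line
printf '   one   two\tthree   \n' | awk '{$1=$1};1'
# prints "one two three"
```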


> awk really shoots itself so much with its lack of features that it so desperately needs

Whence perl.



I suspect the rationale for Perl is that most Linux systems will probably have it installed already. Installing something you're familiar with is great when you can, but I'm guessing the awk script linked to here was picked more for its ubiquity than elegance.


Kinda, but not really. Of the infrastructures I've worked on, not a single one has been consistent in installing perl on 100% of hosts. The ones that get close are usually like that because one high-up person really, really likes perl. And they send a lot of angry emails about perl not being installed.

Within infrastructures where perl is installed on 95% of hosts, that 5% really bites you in the ass and leads to infrastructure rot very quickly. You're kinda stuck writing and maintaining two separate scripts to do the same thing.


Same with Python. It's mostly available, but sometimes not.

With Perl, I find that a base installation is almost always available, but many packages might not be.


I dunno about that. IME, python is much, much more universally installed on the hosts I've worked on. Sure, usually it's 2.7, but it's there! I've tended to work on rhel and debian hosts, with some fedora in the mix.

(Once had a coworker reject a PR I wrote because I included a bash builtin in a deployment script. He said that python is more likely to be installed than bash, so we should not use bash. These debates are funny sometimes.)


Interesting, in my experience perl ends up pulled in as a dependency for one thing or another most of the time, but I don't have that perception about Python. Maybe there's just something I use that pulls in perl without me realizing and it's biased my experience.


When the Perl module JSON::PP is installed, the json_pp program comes with it.

The following command is handy for grepping the output:

    cat mydata.json | json_pp


Us old UNIX guys would likely go for cut for this sort of task:

     cut -d " " -f1-2,4-5 file.txt

where file.txt is:

     one two three four five

and the return is:

     one two four five
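One catch worth knowing: cut treats every single occurrence of the delimiter as a field boundary, so a doubled space produces an empty field and shifts everything after it. Squeezing repeated delimiters with tr first sidesteps that:

```shell
printf 'one  two three\n' | cut -d ' ' -f2              # prints an empty field
printf 'one  two three\n' | tr -s ' ' | cut -d ' ' -f2  # prints "two"
```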


>awk really shoots itself so much with its lack of features that it so desperately needs!

That's why I use Perl instead (besides some short one-liners in awk, which in some cases are even shorter than the Perl version) and do my JSON parsing in Perl.

This

  diff -rs a/ b/ | awk '/identical/ {print $4}' | xargs rm

is one of my often-used awk one-liners. Unless some filenames contain e.g. whitespace, then it's Perl again.


I've been using perl instead of sed because PCRE is just better, and it's the same regex flavor that PHP uses, which I've been coding in for nearly 20 years. I still don't actually know perl, but apparently Gemini does: it wrote a particularly crazy find-and-replace for me. I never got around to using or learning awk; the only time I see it come up is when you want to parse some tab-delimited output.


This is much safer: xargs -d '\n' rm -f --


Just tried to use "-d" and learned that it's a GNUism that isn't available under macOS, so it's not a portable solution. Neither was it available under BSD 4.3 when I first learned about xargs.
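A sketch of a middle ground: NUL-terminate the names with tr and use xargs -0, which both GNU and BSD/macOS xargs support (though -0 isn't strictly POSIX either). Note this only hardens the xargs stage; awk's own field splitting still breaks on filenames containing spaces, as noted upthread.

```shell
# NUL-delimit instead of relying on GNU xargs' -d flag;
# assumes the a/ and b/ directories from the earlier example
diff -rs a/ b/ | awk '/ are identical$/ {print $4}' | tr '\n' '\0' | xargs -0 rm -f --
```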


Sure, but my example was just that and I actually use /identical$/ as the pattern. Sorry for the typo.

And I use this "historic" one liner only when I know about the contents of both directories. As soon as I need a "safer" solution I use a Perl script and pattern matching, as I said.


In this case the Perl one-liner would be conceptually identical and the same length, but more performant (no calling out to rm):

   diff -rs a/ b/ | perl -ane '/identical$/ && unlink $F[3]'


...And once you get away from the most basic, standard set of features, the several awks in existence have diverging sets of additional features.


Things are already like that, friend! We have mawk, gawk and nawk. But it's fun to think about how we could improve our ideal tooling if we had a time machine.



