Sunday, November 29, 2020

Using sed "in place" (gnu vs bsd)

 I'm not crazy after all!

Well, ok, I guess figuring out a difference in "sed" between gnu and bsd is not a sign of sanity.

(TL;DR this works on both: sed -i.bak -e "s/x/y/" x.txt)

I use sed a fair amount in my shell scripts. Recently, I've been using "-i" a lot to edit files "in-place". The "-i" option takes a value that is interpreted as a file name suffix to save the pre-edited form of the file. You know, in case you mess up your sed commands, you can get back your original file.

But for some scripts, the file being edited is itself generated, so there is no need to save a backup. So, just pass a null string in as the suffix. Right?

[ update: useful page: https://riptutorial.com/sed/topic/9436/bsd-macos-sed-vs--gnu-sed-vs--the-posix-sed-specification ]

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
$ cat x
y
$ ls
x.txt

Looks good. Let's try Linux.

GNU SED (Linux and Cygwin)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
sed: can't read : No such file or directory
$ cat x
y
$ ls
x.txt

Hmm ... that's odd. It "worked", which is to say the file was properly edited. But what's with that "no such file" error? Man page to the rescue:

$ man sed
...
       -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if SUFFIX supplied)

Interesting, you can omit the suffix. And you mustn't supply a space between the "-i" and the suffix; it thinks you've omitted it and treats the empty string as an input file. Here's an example with a non-empty suffix:

$ echo "x" >x.txt
$ sed -i .bak -e "s/x/y/" x.txt
sed: can't read .bak: No such file or directory
$ cat x
y
$ ls
x.txt

See? With the space, it thinks ".bak" is an input file. But we don't want a backup file, so let's try just omitting the suffix, like the man page says.

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x
y
$ ls
x.txt

Works. Let's try it on BSD.

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x
y
$ ls
x
.txt       x.txt-e

Wait, what? Again, the file was edited properly, but what's with that file "x.txt-e"? Oh, BSD sed doesn't support a missing suffix. You can supply an empty one, but you can't just omit it. So sed looked at the above command line and thought "-e" was my desired suffix. And the "-e" option is optional in front of an in-line sed program.

ARGH!

I use both Mac and Linux, and want scripts that work on both!

THE SOLUTION

There is no portable way to tell both seds that you want in-place editing but don't want a backup suffix. So just go ahead and always generate a backup file. And remember, GNU doesn't like a space after the "-i". This works on both:

$ echo "x" >x.txt
$ sed -i.bak -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt           x.txt.bak

Works on Mac and Linux. Just delete the .bak file.

It took a long time to figure all this out, largely because the incorrect usages basically worked. I.e. the intended file did get edited in place, but with undesired side effects. So I didn't notice there was a problem until I really looked at things and saw the error or the "x.txt-e" file.

Corner cases: the bane of programmers everywhere.

Friday, November 27, 2020

Sometimes you need eval

 The Unix shell usually does a good job of doing what you expect it to do. Writing shell scripts is usually pretty straight-forward. Yes, sometimes you can go crazy quoting special characters, but for most simple file maintenance, it's not too bad.

I *think* I've used the "eval" function before today, but I can't remember why. I am confident that I haven't used it more than twice, if that many. But today I was doing something that seemed like it shouldn't be too hard, but I don't think you can do it without "eval".

RSYNC

I want to use "rsync" to synchronize some source files between hosts. But I don't want to transfer object files. So my rsync command looks somewhat like this:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

The double quotes around "*.o" are necessary because you don't want the shell to expand it, you want the actual string *.o to be passed to rsync, and rsync will do the file globbing. The double quotes prevents file glob expansion. And the shell strips the double quotes from the parameter. So what rsync sees is:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

This is what rsync expects, so all is good.

PUT EXCLUDE OPTIONS IN A SYMBOL: FAIL

For various reasons, I wanted to be able to override that exclusion option. So I tried this:

EXCL='--exclude *.o'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

But this doesn't work right. The symbol "EXCL" will contain the string "--exclude *.o", but when the shell substitutes it into the rsync line, it then performs file globbing, and the "*.o" gets expanded to a list of files. For example, rsync might see:

rsync -a --exclude a.o b.o c.o my_src_dir/ orion:my_src_dir

The "--exclude" option only expects a single file specification.

SECOND TRY: FAIL

So maybe I can enclose $EXCL in double quotes:

rsync -a "$EXCL" my_src_dir/ orion:my_src_dir

This passes "--exclude *.o" as a *single* parameter. But rsync expects "--exclude" and the file spec to be two parameters, so it doesn't work either.

THIRD TRY: FAIL

Finally, maybe I can force quotes inside the EXCL symbol:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

This almost works, but what rsync sees is:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

It thinks the double quotes are part of the file name, so it won't exclude the intended files.

EVAL TO THE RESCUE

The solution is to use eval:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
eval "rsync -a $EXCL my_src_dir/ orion:my_src_dir"

The shell does symbol substitution, so this is what eval sees:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

And eval will re-process that string, including stripping the double quotes, so this is what rsync sees:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

which is exactly correct.

P.S. - if anybody knows of a better way to do this, let me know!

EDIT: The great Sahir (one of my favorite engineers) pointed out a shell feature that I didn't know about:;

Did you consider setting noglob? It will prevent the shell from expanding '*'. Something like:

    EXCL='--exclude *.o' # default
    set -o noglob
    rsync -a $EXCL my_src_dir/ orion:my_src_dir
    set +o noglob

I absolutely did not know about noglob! In some ways, I like it better. The goal is to pass the actual star character as a parameter, and symbol substitution is getting in the way. Explicitly setting noglob says, "hey shell, I want to pass a string without you globbing it up." I like code that says exactly what you mean.

In contrast, my "eval" solution works fine, but the code does not make plain what my purpose was. I would need a comment that says, "using eval to prevent  the shell from doing file substitution in the parameter string." And while that's fine, I much prefer code that better documents itself.

One limitation of using noglob is that you might have a command line where you want parts of it not globbed, but other parts globbed. The noglob basically operates on a full line. So you would need to do some additional string building magic to get the right things to be done at the right time. But the same thing would be true if you were using eval. Bottom line: the shell was made powerful and flexible, but powerful flexible things tend to have subtle corner cases that must be handled in non-obvious ways. No matter what, a comment might be nice.

Tuesday, November 17, 2020

Ok, I guess I like Grammarly (grumble, grumble)

Ok, I grudgingly admit that I like Grammarly.


My complaints still hold: [UPDATE: these are all fixed now]

  1. Mac users are second-class citizens. Mac Word integration has the file size limit, and there is no Mac outlook integration. [UPDATE: Mac integration is now good]
  2. Their desktop tool won't edit a locally-stored text file. You have to do cutting and pasting. [UPDATE: it integrates well with TextEdit. But not vim.]
  3. The file size limit is too small for serious work. Yes, you can do cutting and pasting again, but really? In 2020? [UPDATE: it now operates on large files]


The grumpy old man in me really wants to mumble something about snot-nosed little kids and go back to a typewriter and liquid paper.


But ... well ... I do have some bad writing habits.


Mostly I sometimes write unnecessarily complicated sentences, including useless phrases that I must have learned sound intellectual. It's a little humbling to have it pointed out over and over, but the result of more concise writing is worth it.


Mind you, there are many MANY times that I click the trash can because I don't like Grammarly's suggestions. In much of my technical writing, I use passive voice because active is too awkward. I also deviate from the standard practice of including punctuation inside quotes, especially when the quotes are not enclosing an actual quotation, but instead are calling out or highlighting a technical term, like a variable name. If I tell you to enter "ls xyz," and you type the comma, it won't work. You have to enter "ls xyz". I also sometimes include a comma that Grammarly thinks is not needed, but I think it helps separate two ideas.


Also, Grammarly isn't good at large-scale organization of content, which can have a MUCH greater effect on clarity than a few superfluous words.


In other words, *real* editors don't have to worry about being replaced by AIs for quite a while.


And yet ... and yet ... even with its limited ability to understand what I'm trying to say, it is still improving my writing. In small ways, perhaps. But improvement is improvement.


So yeah, I'll keep paying them money (grumble, grumble).