Sunday, November 29, 2020

Using sed "in place" (gnu vs bsd)

 I'm not crazy after all!

Well, ok, I guess figuring out a difference between gnu sed and bsd sed is not a sign of sanity.

I use sed a fair amount in my shell scripts. Recently, I've been using "-i" a lot to edit files "in-place". The "-i" option takes a value which is interpreted as a file name suffix to save the pre-edited form of the file. You know, in case you mess up your sed commands, you can get back your original file.

But for a lot of applications, the file being edited is itself generated, so there is no need to save a backup. So just pass a null string in as the suffix. No problem, right?


[ update: useful page: https://riptutorial.com/sed/topic/9436/bsd-macos-sed-vs--gnu-sed-vs--the-posix-sed-specification ]

 

GNU SED (Linux and Cygwin)

echo "x" >x
sed -i '' -e "s/x/y/" x
sed: can't read : No such file or directory

Hmm ... that's odd. It's trying to interpret that null string as a file name, not the value for the "-i" option. Maybe it doesn't like that space between the option and the value.

echo "x" >x
sed -i'' -e "s/x/y/" x

There. It worked. I'm generally in the habit of using a space between the option and the value, but oh well. Learn something new every day...


BSD SED (FreeBSD and Mac)

echo "x" >x
sed -i'' -e "s/x/y/" x
ls x*
x    x-e

Hey, what's that "x-e" file? Oh, it IGNORED the empty string and interpreted "-e" as the suffix! Put the space back in:

echo "x" >x
sed -i '' -e "s/x/y/" x

Works. No "x-e" file.


ARGH!

I use both Mac and Linux, and want scripts that work on both!

 

THE SOLUTION

Go ahead and always generate a backup file. And don't use a space between the option and the value. This works on both:

echo "x" >x
sed -i.bak -e "s/x/y/" x
rm x.bak

Works on Mac and Linux.

IT TOOK ME A LONG TIME TO FIGURE ALL THIS OUT!!! Part of the reason it took so long is that for the cases that don't work as intended, they tend to basically work. For example, the first Linux case where it tried to interpret '' as a file. It printed an error. But then it went to the actual file and processed it correctly. The command did what it was suppose to do, but it printed an error. For the BSD case, it created a backup file using "-e" as the suffix, but it went ahead and interpreted the sed command string as a command string, and properly processed the file. In both cases, the end goal was accomplished, but with unintended side effects.

Corner cases: the bane of programmers everywhere.

Friday, November 27, 2020

Sometimes you need eval

 The Unix shell usually does a good job of doing what you expect it to do. Writing shell scripts is usually pretty straight-forward. Yes, sometimes you can go crazy quoting special characters, but for most simple file maintenance, it's not too bad.

I *think* I've used the "eval" function before today, but I can't remember why. I am confident that I haven't used it more than twice, if that many. But today I was doing something that seemed like it shouldn't be too hard, but I don't think you can do it without "eval".

RSYNC

I want to use "rsync" to synchronize some source files between hosts. But I don't want to transfer object files. So my rsync command looks somewhat like this:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

The double quotes around "*.o" are necessary because you don't want the shell to expand it, you want the actual string *.o to be passed to rsync, and rsync will do the file globbing. The double quotes prevents file glob expansion. And the shell strips the double quotes from the parameter. So what rsync sees is:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

This is what rsync expects, so all is good.

PUT EXCLUDE OPTIONS IN A SYMBOL: FAIL

For various reasons, I wanted to be able to override that exclusion option. So I tried this:

EXCL='--exclude *.o'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

But this doesn't work right. The symbol "EXCL" will contain the string "--exclude *.o", but when the shell substitutes it into the rsync line, it then performs file globbing, and the "*.o" gets expanded to a list of files. For example, rsync might see:

rsync -a --exclude a.o b.o c.o my_src_dir/ orion:my_src_dir

The "--exclude" option only expects a single file specification.

SECOND TRY: FAIL

So maybe I can enclose $EXCL in double quotes:

rsync -a "$EXCL" my_src_dir/ orion:my_src_dir

This passes "--exclude *.o" as a *single* parameter. But rsync expects "--exclude" and the file spec to be two parameters, so it doesn't work either.

THIRD TRY: FAIL

Finally, maybe I can force quotes inside the EXCL symbol:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

This almost works, but what rsync sees is:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

It thinks the double quotes are part of the file name, so it won't exclude the intended files.

EVAL TO THE RESCUE

The solution is to use eval:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
eval "rsync -a $EXCL my_src_dir/ orion:my_src_dir"

The shell does symbol substitution, so this is what eval sees:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

And eval will re-process that string, including stripping the double quotes, so this is what rsync sees:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

which is exactly correct.

P.S. - if anybody knows of a better way to do this, let me know!

EDIT: The great Sahir (one of my favorite engineers) pointed out a shell feature that I didn't know about:;

Did you consider setting noglob? It will prevent the shell from expanding '*'. Something like:

    EXCL='--exclude *.o' # default
    set -o noglob
    rsync -a $EXCL my_src_dir/ orion:my_src_dir
    set +o noglob

I absolutely did not know about noglob! In some ways, I like it better. The goal is to pass the actual star character as a parameter, and symbol substitution is getting in the way. Explicitly setting noglob says, "hey shell, I want to pass a string without you globbing it up." I like code that says exactly what you mean.

In contrast, my "eval" solution works fine, but the code does not make plain what my purpose was. I would need a comment that says, "using eval to prevent  the shell from doing file substitution in the parameter string." And while that's fine, I much prefer code that better documents itself.

One limitation of using noglob is that you might have a command line where you want parts of it not globbed, but other parts globbed. The noglob basically operates on a full line. So you would need to do some additional string building magic to get the right things to be done at the right time. But the same thing would be true if you were using eval. Bottom line: the shell was made powerful and flexible, but powerful flexible things tend to have subtle corner cases that must be handled in non-obvious ways. No matter what, a comment might be nice.

Tuesday, November 17, 2020

Ok, I guess I like Grammarly (grumble, grumble)

Ok, I grudgingly admit that I like Grammarly.


My complaints still hold: [UPDATE: these are all fixed now]

  1. Mac users are second-class citizens. Mac Word integration has the file size limit, and there is no Mac outlook integration. [UPDATE: Mac integration is now good]
  2. Their desktop tool won't edit a locally-stored text file. You have to do cutting and pasting. [UPDATE: it integrates well with TextEdit. But not vim.]
  3. The file size limit is too small for serious work. Yes, you can do cutting and pasting again, but really? In 2020? [UPDATE: it now operates on large files]


The grumpy old man in me really wants to mumble something about snot-nosed little kids and go back to a typewriter and liquid paper.


But ... well ... I do have some bad writing habits.


Mostly I sometimes write unnecessarily complicated sentences, including useless phrases that I must have learned sound intellectual. It's a little humbling to have it pointed out over and over, but the result of more concise writing is worth it.


Mind you, there are many MANY times that I click the trash can because I don't like Grammarly's suggestions. In much of my technical writing, I use passive voice because active is too awkward. I also deviate from the standard practice of including punctuation inside quotes, especially when the quotes are not enclosing an actual quotation, but instead are calling out or highlighting a technical term, like a variable name. If I tell you to enter "ls xyz," and you type the comma, it won't work. You have to enter "ls xyz". I also sometimes include a comma that Grammarly thinks is not needed, but I think it helps separate two ideas.


Also, Grammarly isn't good at large-scale organization of content, which can have a MUCH greater effect on clarity than a few superfluous words.


In other words, *real* editors don't have to worry about being replaced by AIs for quite a while.


And yet ... and yet ... even with its limited ability to understand what I'm trying to say, it is still improving my writing. In small ways, perhaps. But improvement is improvement.


So yeah, I'll keep paying them money (grumble, grumble).