
Monday, July 1, 2024

Automating tcpdump in Test Scripts

 It's not unusual for me to create a shell script to test networking software, and to have it automatically run tcpdump in the background while the test runs. Generally I do this "just in case something gets weird," so I usually don't pay much attention to the capture.

The other day I was testing my new "raw_send" tool, and my test script consisted of:

INTFC=em1
tcpdump -i $INTFC -w /tmp/raw_send.pcap &
TCPDUMP_PID=$!
sleep 0.2
./raw_send $INTFC \
 01005e000016000f53797050080046c00028000040000102f5770a1d0465e0000016940400002200e78d0000000104000000ef65030b
sleep 0.2
kill $TCPDUMP_PID

Lo and behold, the tcpdump capture did NOT contain my packet! I won't embarrass myself by outlining the DAY AND A HALF I spent figuring out what was going on; I'll just give the answer.

The tcpdump tool wants to be as efficient as possible, so captured packets are buffered and delivered (and written to the output file) in batches. This is important because a few large writes are MUCH more efficient than many small ones. When you kill tcpdump (either with the "kill" command or with control-C), packets still sitting in that buffer can be lost instead of making it to the file. There was a small clue provided in the form of the following output:

tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
1 packets received by filter
0 packets dropped by kernel

I thought it was filtering out my packet for some reason. But no: "1 packets received by filter" means the kernel saw my packet, and "0 packets captured" means it never made it through the buffering to the capture file.

The solution? Add the option "--immediate-mode" to tcpdump:

tcpdump -i $INTFC -w /tmp/raw_send.pcap --immediate-mode &

Works fine.
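
For the record, the revised script looks something like this (same payload as above; the extra "wait" gives tcpdump a chance to exit and close the capture file cleanly before the script moves on):

INTFC=em1
tcpdump -i $INTFC -w /tmp/raw_send.pcap --immediate-mode &
TCPDUMP_PID=$!
sleep 0.2
./raw_send $INTFC \
 01005e000016000f53797050080046c00028000040000102f5770a1d0465e0000016940400002200e78d0000000104000000ef65030b
sleep 0.2
kill $TCPDUMP_PID
wait $TCPDUMP_PID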


Thursday, June 27, 2024

SIGINT(2) vs SIGTERM(15)

This is another of those things that I've always known I should just sit down and learn but never did: what's the difference between SIGINT and SIGTERM? I knew that one of them corresponded to control-c, and the other corresponded to the kill command's default signal, but I always treated them the same, so I never learned which was which.

  • SIGINT (2) - User interrupt signal, typically sent by typing control-C. The receiving program should stop its current operation and return control as quickly as is appropriate. Programs that maintain some kind of persistent state (e.g. data files) should catch SIGINT and do enough cleanup to keep that state consistent. For interactive programs, control-C might not exit the program, but instead return to the program's internal command prompt.

  • SIGTERM (15) - Graceful termination signal. For example, when the OS gracefully shuts down, it sends SIGTERM to all processes. It's also the default signal sent by the "kill" command. It is not considered an emergency and so does not demand the fastest possible exit; rather, a program might allow the current operation to complete before exiting, so long as it doesn't take "too long" (whatever that is). Interactive programs should typically NOT return to their internal command prompt and should instead clean up (if necessary) and exit. A small shell demonstration follows.
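
Here's a minimal shell sketch of the difference (the handlers are hypothetical; a real program would do its state cleanup in the TERM case):

#!/bin/bash
# sigdemo.sh - react differently to SIGINT and SIGTERM.
trap 'echo "SIGINT: interrupt current operation, back to work"' INT
trap 'echo "SIGTERM: clean up and exit"; exit 0' TERM

echo "PID $$: try control-C, or: kill $$"
while :; do
  sleep 1
done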

This differentiation was developed when Unix systems had many users and a system operator. If the operator initiated a shutdown, the expectation was that interactive programs would NOT just return to the command prompt, but instead would respect the convention of cleaning up and exiting.

However, I've seen that convention not followed by "personal computer" Unix systems, like macOS. With a personal computer, you have a single user who is also the operator. If you, the user and operator, initiate a shutdown on a Mac, there can be interactive programs that pause the shutdown and ask whether to save your work. There is still a difference in behavior between SIGINT and SIGTERM - SIGINT returns to normal operation while SIGTERM usually brings up a dialog box warning of possible data loss - but the old expectation of always exiting is no longer universal.


Friday, April 28, 2023

reMarkable 2: Loving It

I recently got a reMarkable 2 tablet, and I'm really liking it.

It's a writing tablet for note-taking. So far, I use it exclusively for hand-written notes.

It uses e-paper (E Ink) technology without a backlight. Advantage: it's easy to read in bright light (high contrast). Disadvantage: you can't use it in low light. Fortunately, I'm not interested in using it in low light.

The writing experience is as close to writing on paper with a pencil as I've ever seen.

(FYI - I paid full price for mine, and this review was not solicited or compensated.)


WHY I WANT IT

I like to take notes with pen and paper. But sometimes there just isn't a pad of paper handy. Or too many pads of paper -- i.e. not the one I had been using earlier. I can't seem to train myself to carry the same pad with me everywhere, so I end up with little scraps of paper with notes written on them that get misplaced or buried.

I tried switching to an electronic writing tablet once before, with an iPad. But I never found a stylus that was very good, and the writing experience was poor (either too much or too little friction). I couldn't write small, and there were always problems with having my palm on the surface as I wrote. (Granted, I never tried a newer iPad with the Apple Pencil. Maybe it's wonderful. But I also don't like the shiny glass surface.)

In spite of those problems, I used it a lot for quite a while before a change in my work duties required less note-taking, and I stopped.

I recently returned to taking lots of notes, but that old iPad is no more. So I was back to lots of little scraps of paper being misplaced. Then I saw an ad by astrophysicist Dr. Becky for the reMarkable writing tablet. Wow, this is so much better than before. The writing experience is excellent, and the information organization is very good.


IS IT WORTH IT?

It is pricey - as of April 2023, it is $300 for the tablet, but then you have to buy the pen for $130 and a case for $130. So $560 all told. And yes, you can save a bit here and there going with cheaper options.

And understand what you're getting: this does not have a web browser. No games. No word processor. No email. This is a writing tablet.

Sure, you can upload PDFs and mark them up - something I may end up doing from time to time - but think of it as an electronic pad of paper for $560. I'm not hurting for money, so I can spend that without pain, but is the available market for well-off people wanting a digital writing tablet really big enough to support a product like this?

(shrugs) Apparently so. For me, it's definitely worth it. Your mileage may vary.

There's also an optional add-on keyboard that I don't want, and a $3/month subscription service that I don't think I need (but I might change my mind later).

Update: I got the more expensive pen with the "eraser" function. Not worth it, at least not for my use case (writing words). Maybe if I was using it for artistic drawing, but for writing, I prefer to select and cut. I would buy the cheaper pen now.


SO THAT'S ALL?

Well, I love the organization. You create folders and notebooks. And you can create tags on notebooks and on individual pages within a notebook. Some tagging systems only let you match a single tag (e.g. Blogger's). ReMarkable lets you "and" together multiple tags to zero in on what you want.

Update: For my usage, I've realized that the advantages of tagging are not worth the extra clicks. Now I just use the "favorites" screen.

Even without the subscription, it has cloud connectivity and desktop/phone apps. It's nice to be out and about and be able to bring up recent notes on my phone.

Another cute thing that I'll probably use sometimes is a whiteboard function. The desktop app can connect to it and show your drawing in real time. You can share it on Teams/Zoom/whatever. I give product training sometimes, and I think it will be useful. (Note that it is not a collaborative whiteboard.)

It also has some kind of handwriting-to-text conversion, but I'm not interested in that, so I don't know how good it is.

Oh, and the pen won't run out of ink, mark up my fingers, stain my shirt pocket, or be borrowed by anybody. :-)

Update: definitely get the "book" style case. See below.


ANY COMPLAINTS?

The battery doesn't last as long as I had hoped. Brand new, it loses about 30% charge after a day of heavy use. I suspect that once the novelty wears off, it will last longer, but batteries also get weaker over time.

And the charge time is painfully slow. Definitely need to charge overnight.

I wish it kept the time/date of last modification, ideally per page but at least per notebook; unfortunately, it doesn't appear to have a clock.

I find it a little hard to hold and write on while standing. I think it might be a little *too* thin. I initially bought the sleeve portfolio, but I've since ordered the book-style cover. I think it will help.

Update: the book-style case solves the "hard to use standing up" problem. Definitely worth the extra cost.

I've seen some complaints that it doesn't integrate with other note-taking systems out there, like Evernote and the like. But I never got into those. If I'm typing, I prefer a Wiki, and I just don't find myself wanting to drag in images and audio and whatever other magic Evernote has.

Some people have complained about its lack of functionality, wishing it were more of a general-purpose tablet with web browser, mail, music player, games, etc, etc. Of course, then they'll complain it doesn't have a color screen and stereo sound. Others say that the lack of those features is a strength, allowing you to focus by removing distractions.

I don't like either position. If you want a general-purpose tablet, get one. reMarkable doesn't misrepresent their product at all. And I'm not sure I buy into the whole "focus through removal of distraction" thing. If I find myself in a hard-to-focus mood and I'm using the tablet, I'll just pull out my phone and be distracted. The reason I like this writing tablet isn't so much that it is *only* a writing tablet, but rather because it is such a *good* writing tablet.


ANY LAST WORDS?

Nope.

Thursday, March 9, 2023

Examining a Running Process Environment

 I don't know why I keep forgetting about this technique. I guess it's because while it is worth its weight in gold when you need it, it just isn't needed very often.

Say you're helping somebody with a networking problem. Their process isn't behaving well.

"Is it running under Onload?"

"I don't know. I think so, but how can we tell for sure?"

$ tr '\0' '\n' </proc/12345/environ | grep LD_PRELOAD
LD_PRELOAD=libonload.so

(You need the "tr" command because Linux separates entries with a null, not a newline.)

"OK cool. What Onload env vars do you set?"

$ tr '\0' '\n' </proc/12345/environ | grep EF_
EF_RXQ_SIZE=4096
EF_UDP_RCVBUF=16000000
EF_POLL_USEC=100

BAM! No need to rely on memory or what the env "should" be. We know for sure.
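
You could wrap this in a tiny helper function (the name "penv" is my own invention):

penv()
{
  # Print the environment of process $1, one entry per line,
  # optionally filtered by pattern $2.
  tr '\0' '\n' </proc/$1/environ | grep "${2:-.}"
}

(Note that you must be the process's owner, or root, to read its environ file.)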

Wednesday, January 5, 2022

Bash Process Substitution

I generally don't like surprises. I'm not a surprise kind of guy. If you decide you don't like me and want to make me feel miserable, just throw me a surprise party.

But there is one kind of surprise that I REALLY like. It's learning something new ... the sort of thing that makes you say, "how did I not learn this years ago???"

Let's say you want the standard output of one command to serve as the input to another command. On day one, a Unix shell beginner might use file redirection:

$ ls >ls_output.tmp
$ grep myfile <ls_output.tmp
$ rm ls_output.tmp

On day two, they will learn about the pipe:

$ ls | grep myfile

This is more concise, doesn't leave garbage, and runs faster.

But what about cases where the second program doesn't take its input from STDIN? For example, let's say you have two directories with very similar lists of files, but you want to know if there are any files in one that aren't in the other.

$ ls -1 dir1 >dir1_output.tmp
$ ls -1 dir2 >dir2_output.tmp
$ diff dir1_output.tmp dir2_output.tmp
$ rm dir[12]_output.tmp

So much for conciseness, garbage, and speed.

But, today I learned about Process Substitution:

$ diff <(ls -1 dir1) <(ls -1 dir2)

This basically creates two pipes, gives them names, and passes the pipe names as command-line parameters of the diff command. I HAVE WANTED THIS FOR DECADES!!!

And just for fun, let's see what those named pipes are named:

$ echo <(ls -1 dir1) <(ls -1 dir2)
/dev/fd/63 /dev/fd/62

COOL!

(Note that echo doesn't actually read the pipes.)


VARIATION 1 - OUTPUT

The "cmda <(cmdb)" construct is for cmda getting its input from the output of cmdb. What about the other way around? I.e., what if cmda wants to write its output, not to STDOUT, but to a named file, and you want that output to be the standard input of cmdb? I'm having trouble thinking here of a useful example, but here's a not-useful example:

cp file1 >(grep xyz)

I say this isn't useful because why use the "cp" command? Why not:

cat file1 | grep xyz

Or better yet:

grep xyz file1

Most shell commands write their primary output to STDOUT. I can think of some examples that don't, like giving an output file to tcpdump, or the object code out of gcc, but I can't imagine wanting to pipe that into another command.

If you can think of a good use case, let me know.
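
One candidate: "tee" writes its output to named files, so process substitution lets you fan a stream out to several commands at once (with "some_cmd" standing in for whatever produces the stream):

some_cmd | tee >(grep ERROR >errors.txt) >(wc -l >count.txt) >/dev/null

Whether that counts as a *good* use case, I leave to you.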


VARIATION 2 - REDIRECTING STANDARD I/O

Here's something that I have occasionally wanted to do. Pipe a command's STDOUT to one command, and STDERR to a different command. Here's a contrived non-pipe example:

process_foo 2>err.tmp | format_foo >foo.txt
alert_operator <err.tmp
rm err.tmp

You could re-write this as:

process_foo > >(format_foo >foo.txt) 2> >(alert_operator)

Note the space between the two ">" characters - this is needed. Without the space, ">>" is treated as the append redirection.

Sorry for the contrived example. I know I've wanted this a few times in the past, but I can't remember why.


And for completeness, you can also redirect STDIN:

cat < <(echo hi)

But this is the same as:

echo hi | cat

I can't think of a good use for the "< <(cmd)" construct. Let me know if you can.
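
One candidate, for the record: each part of a pipeline runs in a subshell, so variables set inside a "cmd | while read" loop vanish when the loop ends. Feeding the loop from "< <(cmd)" keeps it in the current shell:

count=0
ls -1 | while read f; do count=$((count+1)); done
echo $count   # prints 0 - the loop ran in a subshell

count=0
while read f; do count=$((count+1)); done < <(ls -1)
echo $count   # prints the actual file count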


EDIT:

I'm always amused when I learn something new and pretty quickly come up with a good use for it. I had some files containing a mix of latency values and some log messages. I wanted to "paste" the different files into a single file with multiple columns to produce a .CSV. But the log messages were getting in the way.

paste -d "," <(grep "^[0-9]" file1) <(grep "^[0-9]" file2) ... >file.csv

Done! :-)

Wednesday, March 17, 2021

Investment Advice from an Engineer

 I have some financial advice regarding investing in stocks. But be aware that, while I am pretty wealthy, the vast majority of my wealth came from salary from a good-paying career. You will *NOT* get rich off the stock market by following my advice.


THE ADVICE

Put money every month into a U.S. exchange-traded fund that covers a very broad market, like SPDR. (I prefer VTI because it is even broader and has lower fees, but the differences aren't really that big.) The goal is to keep buying and never selling, every working month of your life until you retire. (I don't have retirement advice yet.)

If the market starts going down, do *NOT* stop buying. In fact, if you can afford it, put more in. Every time the market goes down, put more in. The market will go back up, don't worry. 

The same cannot be said for an individual stock -- sometimes a company's stock will dive down and then stay down, basically forever. But the market as a whole won't do that. A dive might take a few days to recover, or might take a few years to recover. But it will recover. DON'T sell a broad fund if the market is going down. BUY it.


AND THIS WILL GET ME RICH?

No. It will give you the highest probability of a good return. Back when I was a kid, I was told to put money into a bank savings account. That was poor advice then, and is terrible advice now with interest rates close to zero. Putting money into a guaranteed bank account is guaranteed to underperform inflation. I.e. it will slowly but surely lose value.

Instead, tie your money to the overall market. The broad market has its ups and downs, but if you stick with it for a while, the overall trend is higher than inflation. 


WHAT IF I WANT TO GET RICH QUICK?

Well, you could buy a lottery ticket. That will get you rich the quickest. Assuming you win. Which you won't. Or you could go to Vegas and put a quarter into a slot machine.

But you're smarter than that. You know the chances of getting rich off of lotteries or gambling are too low. You want something that has some risk, but which will probably make you rich. And you're thinking the stock market.

Your best bet? Find a company currently trading for a few dollars a share that is definitely for sure going to be the next Apple or Microsoft. One tiny problem: if it is definitely for sure going to be the next Apple or Microsoft, the stock price won't be a few dollars a share. It will be priced as if it already IS the next Apple or Microsoft. This is because there are tens of thousands of smart traders out there who are spending a HELL of a lot more time than you are trying to find the next big company. For a "sure thing" company that is already publicly traded, those tens of thousands have already bid up the price.

The only real chance for you to get in on the ground floor is to find a garage shop start-up, and invest in them. I have a rich friend who has made a lot of money doing exactly that. For every ten companies he invests in, nine go bankrupt within 5 years. And one goes off like a rocket.

That's how you do it. And unfortunately, you have to start out pretty rich to make this work. And you need to research the hell out of startups, and have a good crystal ball.

The only other way to do it is to find a company that the tens of thousands of smart traders think will NOT be a big deal, but you believe something they don't. Microsoft traded at a low price for years after going public. Most smart traders back then didn't think home computers would become a thing. And the few people who believed otherwise became rich.

The problem is that belief usually doesn't work very well at predicting success.

Look at BitCoin. I know several people who have made huge amounts of money on BitCoin. They did their initial investments based on a belief. A belief that BitCoin would become a real currency, just like dollars and euros; that people would use BitCoin every day to buy and sell everything from gasoline to chewing gum. They looked at the theories of money and the principles of distributed control, and thought it would definitely for sure replace national currencies.

Those friends of mine were wrong. They made money for a completely different reason: speculators. Speculators buy things that are cheap and that they think will go up in price. If enough speculators find the same thing to buy, the price *does* go up. And more speculators jump in. BitCoin is a big speculative bubble, with almost no intrinsic value. (And yes, I know BitCoin is more complicated than that. But I stand by my 10,000-foot summary.)

Now don't get me wrong. Successful speculators DO become rich. Who am I to argue with success? But getting rich off of speculation is all about timing. You have to find the next big thing before most of the other speculators do, and then jump back out *before* it has run its course. Will BitCoin come crashing back down? Not necessarily. If enough people simply *believe* in it, it will retain its value. My own suspicion is that it will eventually crash but what do I know? I thought it would have crashed by now.

That's also what happened with GameStop. A group of Reddit-based speculators decided to pump up the price of a company. If you were in on it from the start, you probably made a ton of money. But once it hit the news, it was too late for you to get rich off of it. The best you could hope for was to make a little money and then get out FAST. But most people who jumped into GameStop after it had already made the news ended up losing money.

(BTW, "pump-and-dump" is against the law. I will be interested to find out if any of the Reddit-based traders get in trouble.)

Anyway, I know of many people who have taken a chance on a stock, and made some money. But they didn't get rich. And if they keep trying to replicate their early success, they'll end up winning some and losing some. And if they're smart, and work hard at it, they may out-perform the overall market in the long run. But remember - those tens of thousands of smart traders are also trying to out-perform the overall market. For you to do it repeatedly for many years probably requires expertise that those smart traders don't have. And you don't get rich quick this way, you just make some good money.


WHAT IF I JUST WANT TO PLAY THE MARKET?

(shrugs) Everybody needs a hobby. I have a friend who goes to Vegas once a year. Sometimes he comes back negative, sometimes positive. He has absolutely no illusion that he will get rich in Vegas. He assumes he will either make a little or lose a little. And he has fun doing it. There's nothing wrong with spending money to have fun.

If you play the stock market as a game, where you aren't risking your financial future, then more power to you. But I knew one person who had to stop for his own emotional well-being. He started feeling bad every time he lost some money because he should have invested less, but also felt bad when he made money because he should have invested more. Overall he made money, but he had so much anxiety doing it that he decided it wasn't worth it.

Sunday, November 29, 2020

Using sed "in place" (gnu vs bsd)

 I'm not crazy after all!

Well, ok, I guess figuring out a difference in "sed" between gnu and bsd is not a sign of sanity.

(TL;DR this works on both: sed -i.bak -e "s/x/y/" x.txt)

I use sed a fair amount in my shell scripts. Recently, I've been using "-i" a lot to edit files "in-place". The "-i" option takes a value that is interpreted as a file name suffix to save the pre-edited form of the file. You know, in case you mess up your sed commands, you can get back your original file.

But for some scripts, the file being edited is itself generated, so there is no need to save a backup. So, just pass a null string in as the suffix. Right?

[ update: useful page: https://riptutorial.com/sed/topic/9436/bsd-macos-sed-vs--gnu-sed-vs--the-posix-sed-specification ]

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt

Looks good. Let's try Linux.

GNU SED (Linux and Cygwin)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
sed: can't read : No such file or directory
$ cat x.txt
y
$ ls
x.txt

Hmm ... that's odd. It "worked", which is to say the file was properly edited. But what's with that "no such file" error? Man page to the rescue:

$ man sed
...
       -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if SUFFIX supplied)

Interesting: you can omit the suffix. But you mustn't put a space between the "-i" and the suffix; if you do, sed thinks you've omitted the suffix and treats the next argument as an input file. That's what happened above with the empty string. Here's an example with a non-empty suffix:

$ echo "x" >x.txt
$ sed -i .bak -e "s/x/y/" x.txt
sed: can't read .bak: No such file or directory
$ cat x.txt
y
$ ls
x.txt

See? With the space, it thinks ".bak" is an input file. But we don't want a backup file, so let's try just omitting the suffix, like the man page says.

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt

Works. Let's try it on BSD.

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt       x.txt-e

Wait, what? Again, the file was edited properly, but what's with that file "x.txt-e"? Oh, BSD sed doesn't support a missing suffix. You can supply an empty one, but you can't just omit it. So sed looked at the above command line and thought "-e" was my desired suffix. And the "-e" option is optional in front of an in-line sed program.

ARGH!

I use both Mac and Linux, and want scripts that work on both!

THE SOLUTION

There is no portable way to tell both seds that you want in-place editing but don't want a backup suffix. So just go ahead and always generate a backup file. And remember, GNU doesn't like a space after the "-i". This works on both:

$ echo "x" >x.txt
$ sed -i.bak -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt           x.txt.bak

Works on Mac and Linux. Just delete the .bak file.
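
If the backup file offends you, just remove it in the same breath:

$ sed -i.bak -e "s/x/y/" x.txt && rm x.txt.bak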

It took a long time to figure all this out, largely because the incorrect usages basically worked. I.e. the intended file did get edited in place, but with undesired side effects. So I didn't notice there was a problem until I really looked at things and saw the error or the "x.txt-e" file.

Corner cases: the bane of programmers everywhere.

Friday, November 27, 2020

Sometimes you need eval

The Unix shell usually does a good job of doing what you expect it to do. Writing shell scripts is usually pretty straightforward. Yes, sometimes you can go crazy quoting special characters, but for most simple file maintenance, it's not too bad.

I *think* I've used the "eval" function before today, but I can't remember why. I am confident that I haven't used it more than twice, if that many. But today I was doing something that seemed like it shouldn't be too hard, but I don't think you can do it without "eval".

RSYNC

I want to use "rsync" to synchronize some source files between hosts. But I don't want to transfer object files. So my rsync command looks somewhat like this:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

The double quotes around "*.o" are necessary because you don't want the shell to expand it; you want the actual string *.o to be passed to rsync, and rsync will do the file globbing. The double quotes prevent file glob expansion. And the shell strips the double quotes from the parameter. So what rsync sees is:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

This is what rsync expects, so all is good.

PUT EXCLUDE OPTIONS IN A SYMBOL: FAIL

For various reasons, I wanted to be able to override that exclusion option. So I tried this:

EXCL='--exclude *.o'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

But this doesn't work right. The symbol "EXCL" will contain the string "--exclude *.o", but when the shell substitutes it into the rsync line, it then performs file globbing, and the "*.o" gets expanded to a list of files. For example, rsync might see:

rsync -a --exclude a.o b.o c.o my_src_dir/ orion:my_src_dir

The "--exclude" option only expects a single file specification.

SECOND TRY: FAIL

So maybe I can enclose $EXCL in double quotes:

rsync -a "$EXCL" my_src_dir/ orion:my_src_dir

This passes "--exclude *.o" as a *single* parameter. But rsync expects "--exclude" and the file spec to be two parameters, so it doesn't work either.

THIRD TRY: FAIL

Finally, maybe I can force quotes inside the EXCL symbol:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
rsync -a $EXCL my_src_dir/ orion:my_src_dir

This almost works, but what rsync sees is:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

It thinks the double quotes are part of the file name, so it won't exclude the intended files.

EVAL TO THE RESCUE

The solution is to use eval:

EXCL='--exclude "*.o"'  # default
... # code that might change EXCL
eval "rsync -a $EXCL my_src_dir/ orion:my_src_dir"

The shell does symbol substitution, so this is what eval sees:

rsync -a --exclude "*.o" my_src_dir/ orion:my_src_dir

And eval will re-process that string, including stripping the double quotes, so this is what rsync sees:

rsync -a --exclude *.o my_src_dir/ orion:my_src_dir

which is exactly correct.

P.S. - if anybody knows of a better way to do this, let me know!
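
One answer, if you can count on bash: arrays solve this without eval. A quoted "${EXCL[@]}" expansion produces exactly one word per array element, and no globbing is performed on the results:

EXCL=(--exclude "*.o")  # default
... # code that might change EXCL
rsync -a "${EXCL[@]}" my_src_dir/ orion:my_src_dir

rsync sees "--exclude" and "*.o" as two separate parameters, star intact.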

EDIT: The great Sahir (one of my favorite engineers) pointed out a shell feature that I didn't know about:

Did you consider setting noglob? It will prevent the shell from expanding '*'. Something like:

    EXCL='--exclude *.o' # default
    set -o noglob
    rsync -a $EXCL my_src_dir/ orion:my_src_dir
    set +o noglob

I absolutely did not know about noglob! In some ways, I like it better. The goal is to pass the actual star character as a parameter, and symbol substitution is getting in the way. Explicitly setting noglob says, "hey shell, I want to pass a string without you globbing it up." I like code that says exactly what you mean.

One limitation of using noglob is that you might have a command line where you want parts of it not globbed, but other parts globbed. The noglob basically operates on a full line. So you would need to do some additional string building magic to get the right things done at the right time. But the same thing would be true if you were using eval. Bottom line: the shell was made powerful and flexible, but powerful flexible things tend to have subtle corner cases that must be handled in non-obvious ways. No matter what, a comment might be nice.

FULL DISCLOSURE: I tried it and it didn't work as expected. It's probably related to all the crazy quoting. Since my "eval" solution worked, I didn't invest the time to figure out why the "noglob" method didn't work. So I'm still using eval even though noglob is arguably better for this purpose.

Wednesday, October 14, 2020

Strace Buffer Display

 The "strace" tool is powerful and very useful. Recently a user of our software sent us an strace output that included a packet send. Here's an excerpt:

sendmsg(88, {msg_name(16)={sa_family=AF_INET, sin_port=htons(14400), sin_addr=inet_addr("239.84.0.100")}, msg_iov(1)=[{"\2\0a\251C\27c;\0\0\2\322\0\0/\263\0\0\0\0\200\3\0\0", 24}], msg_controllen=0, msg_flags=0}, 0) = 24 <0.000076>

Obviously there are some binary bytes being displayed. I see a "\0a", so it's probably hex. But wait, there's also a \251. Does that mean 0x25 followed by ascii '1'? I decoded it assuming hex, and the packet wasn't valid.

So I did a bit of Googling. Unfortunately, I didn't note where I saw it, but somebody somewhere said that it follows the C string conventions. And C strings come from long ago, when phones had wires connecting them to wall jacks, stack overflow was a bug in a recursive program, and octal ruled the waves when it came to specifying binary data.

So \0a is 0x00 followed by ascii 'a' and \251 is 0xa9. Now the packet parses out correctly. (It's a "Session Message" if you're curious.)
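
The standard "printf" command uses the same octal escape conventions, so you can sanity-check a byte right at the shell:

$ printf '\251' | od -An -tx1
 a9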

So, I guess I'm a little annoyed by that choice as I would prefer hex, but I guess there's nothing all that wrong with it. Hex or octal: either way I need a calculator to convert to decimal. (And yes, I'm sure some Real Programmers can convert in their heads.)

Sunday, July 12, 2020

Perl Diamond Operator

As my previous post indicates, I've done some Perl noodling this past week. (I can't believe that was my first Perl post on this blog! I've been a Perl fan for a loooooooong time.)

Anyway, one thing I like about Perl is it takes a use case that tends to be used a lot and adds language support for it. Case in point: the diamond operator "<>" (also called "null filehandle" or "null angle operator").

See the "Tutorial" section below if you are not familiar with the diamond operator.

Tips

I may expand this as time goes on.

Filename / Linenumber

Inside the loop, you can use "$." as the line number and "$ARGV" as the file name of the currently open file.

*BUT*, see next tip.

Continue / Close

Always code your loop as follows:

while (<>) {
  ...
} continue {
  close ARGV if eof;
}

The continue clause is needed to have "$." refer to the line number within the *current* file. Without it "$." will refer to the total number of lines read so far.

In my opinion, even if what you want is total lines and not line within file, you should still code it like the above and just use your own counter for the total line number. This provides consistency of meaning for "$.". Plus, it's possible that in the future you will want to add functionality that requires line within file, and it's messy to code that with your own counter.

Skip Rest of File

Sometimes you get a little ways into a file and you decide that you're done with the file and would like to skip to the next (if any). Include this inside the loop:

close ARGV;  # skip rest of current file

Positional Parameter

Let's say you're writing a perl version of grep, and you want the first positional parameter (after the options) to be the search pattern.

$ grep.pl "ford" *.txt

Unfortunately, this will try to read a file named "ford" as the first file. What to do?

my $pat = shift;  # Pops off $ARGV[0].
while (<>) {
  ...

This works because "<>" doesn't actually look at the command line. It looks at the @ARGV array. The "shift" function defaults to operating on the @ARGV array.

Security Warning

Because of the way the diamond operator opens files, it is possible for a hostile user to construct a file that can produce very bad results. For example:

$ echo "hello world" >x
$ echo "goodby world" >'rm x|'
$ ls -1
rm x|
x
$ cat *
goodby world
hello world
$ cat x
hello world

So far, so good. "rm x|" is just an unusually-named file with a space in the middle and a pipe ("|") at the end. But now let's use my perl version of grep with a pattern of "." (matches all non-empty lines):

$ grep.pl "." *
Can't open x: No such file or directory at /home/sford/bin/grep.pl line 81.
$ cat x
cat: x: No such file or directory

Yup, grep.pl just deleted the file named "x". The pipe character at the end of the file name "rm x|" invoked Perl's ability to open a filehandle onto a command (the 2-argument open). In other words, just by naming a file in a particular way, you've made grep.pl do something unexpected and potentially dangerous.

This might look like a horrible security hole (what if the name of that rogue file resulted in deleting all your files?), but it can also be a very powerful (albeit rarely used) feature. The moral of the story is don't run *any* tool over a set of files that you aren't familiar with.

You can instead use "<<>>" in place of "<>". But this requires Perl version 5.22 or newer, which rarely seems to be on any system I try to use. It forces each input file to be opened as a file, never as a command.

Unfortunately, it also prevents the special handling of input file named "-" to read standard input. This is a construct that I do use periodically.

Tutorial

Many Unix commands have the following semantics:

cmdname -options [input_file [input_file2 ...] ]

where the command will read from each input file sequentially, or from standard input if no input files are provided. File names can be wildcarded. Most such Unix commands allow you to supply "-" as a file name and the tool will read from standard input.

The diamond operator makes this ridiculously easy. Here's a minimal "cat" command in Perl:

#!/usr/bin/env perl
while (<>) {
  print $_;
}


That's the whole thing. It takes zero or more input files (if none, it reads from standard input) and concatenates them to standard out. Just like "cat".

Specifically what "<>" does is read one line from whatever input file is currently open. If it is at the end of the file, "<>" will automatically open the next file (if any) and read a line from it. As with many Perl built-ins, it leaves the just-read line in the "$_" variable.

You should be ready for the "Tips" section now.

Wednesday, June 3, 2020

Multicast Routing Tutorial

I wanted to direct a colleague to a tutorial on multicast that explained IGMP Snooping and the IGMP Query function. But I couldn't find one. So I'm writing one with an expanded scope.


Multicast is a method of network addressing

Normally, when a publisher sends a packet, that packet has a destination address of a specific receiver. The sending application essentially chooses which host to send to, and the network's job is to transport that packet to that host. With multicast, the destination address is NOT the address of a specific host. Rather, it is a "multicast group", where the word "group" means zero or more destination hosts. Instead of the publisher choosing which receivers should get the packet, the receivers themselves tell the network that they want to receive packets sent to the group.

For example, if I send to multicast group 239.1.2.3, I have no idea which or even how many receivers might get that packet. Maybe none at all. Or maybe hundreds. It depends on the receivers - any receiver that has joined group 239.1.2.3 will get a copy of the packet.


Layer 2 (Ethernet)

For this tutorial, layer 2 means Ethernet, and layer 3 means IP. Yes, there are other possibilities. No, I don't care. :-)

Multicast exists in Ethernet and IP. They are NOT the same thing!

In the old days of Ethernet, you had a coaxial cable with taps on it (hosts connected to the taps). At any given point in time, only one machine could be sending on the cable, and the signal was seen by all machines attached to the same cable. The NIC's job was to look at each and every packet that came by and consume the ones addressed to that NIC, ignoring the rest.

Ethernet multicast was a very simple affair. The NIC could be programmed to consume packets from a small number of application-specified Ethernet multicast groups. Multicast packets, like all other packets, were seen by every NIC on the network, but only NICs programmed to look for a multicast group would consume packets sent to that group.

Ethernet multicast was not routable to other networks.


LAYER 3 (IP)

IP multicast was built on top of Ethernet multicast, but added a few things, like the ability to route multicast to other networks.

When routing multicast, you don't want to just send *all* multicast packets to *all* networks. That would waste too much WAN bandwidth. So the IGMP protocol was created. Among other things, IGMP provides a way for routers to be informed about multicast *interest* in each network. So if a router sees a multicast packet for group X, and it knows that a remote network has one or more hosts joined to that group, it will forward the packet to that network. (That network's router will then use Ethernet multicast to distribute the packet.)

To maintain the router's group interest table, IGMP introduced a "join" command by which a host on a network tells the routers that it is interested in group X. There is also an IGMP "leave" message for when the host loses interest.

But what if the host crashes without sending a "leave" message? Routers have a lifetime for the multicast routing table entries. After three minutes, the router will remove a multicast group from its table unless the entry is refreshed. So every minute, the router sends out an "IGMP Query" command and the hosts will respond according to the groups they are joined to. This keeps the table refreshed for groups that still have active listeners, but lets dead table entries time out.

If the network administrator forgets to configure the routers to run an IGMP Querier, after 3 minutes the router will stop forwarding multicast packets (multicast "deafness" across a WAN).
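
You can watch this machinery in action with tcpdump (substitute your own interface name):

$ tcpdump -i eth0 -n igmp

You should see periodic membership queries from the querier, answered by membership reports from the hosts that are joined to groups.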


IGMP Snooping

As mentioned earlier, in the old days of coaxial Ethernet, every packet is sent to every host, and the host's NIC is responsible for consuming only those packets it needs. This limits the total aggregate capacity of the network to the NIC speed. Modern switches greatly boost network capacity by isolating traffic flows from each other, so that host A's sending to host B doesn't interfere with host C's sending to host D. The total aggregate bandwidth capacity is MUCH greater than the NIC's link speed.

Multicast is different. Layer 2 Ethernet doesn't know which hosts have joined which multicast groups, so by default older switches "flood" multicast to all hosts. So similar to coaxial Ethernet, the total aggregate bandwidth capacity for multicast is essentially limited to NIC link speed.

So they cheated a little bit. Even though IGMP is a layer 3 IP protocol, layer 2 Ethernet switches can be configured to passively observe ("snoop") IGMP packets. And instead of keeping track of which networks are interested in what groups, the switch keeps track of which individual hosts on the network are interested in which groups. When the switch has a multicast packet, it can send it only to those hosts that are interested. So now you can have multiple multicast groups carrying heavy traffic, and so long as a given host doesn't join more groups than its NIC can handle, you can have aggregate multicast traffic far higher than NIC link speeds.

As with routers, the switch's IGMP Snooping table is aged out after 3 minutes. A router's IGMP query accomplishes the same function with the switch; it refreshes the active table entries.

Note that a layer 2 Ethernet switch is supposed to be a passive listener to the IGMP traffic between the hosts and the routers. However, for the special case where a network has no multicast routing, the IGMP Querier function is missing. So switches that support IGMP Snooping also support an IGMP querier. If the network administrator forgets to configure IGMP Snooping to query, the switch will stop delivering multicast packets after 3 minutes (multicast "deafness" on the LAN).

Friday, May 1, 2020

Wireshark: "!=" May have unexpected results

How is it that I haven't sung the praises of Wireshark in this blog? It's one of my favorite tools of all time! I can't get over how powerful it has become.

Occasionally, there are little gotchas. Like when using the "!=" operator, a yellow warning flashes a few times saying:
"!=" may have unexpected results (See the User's Guide)
My first thought: "There's a User's Guide?" My second: "Couldn't you give me a link to it?"

OK, Fine. Here's a direct link to the section "A Common Mistake with !=". The problem is related to "combined expressions", like eth.addr, ip.addr, tcp.port, and udp.port. Those expressions do an implicit "or"ing of the corresponding source and destination fields. E.g. saying tcp.port==12000 is shorthand for:
  (tcp.srcport==12000 || tcp.dstport==12000)

So if you want the opposite of tcp.port==12000, it seems logical to use tcp.port!=12000, thinking that it will eliminate all packets with either source or destination equal to 12000. But you're wrong. It expands to:
  (tcp.srcport!=12000 || tcp.dstport!=12000)
which, doing a little boolean algebra is the same as:
  !(tcp.srcport==12000 && tcp.dstport==12000)
In other words, instead of eliminating packets with either source or destination of 12000, it only eliminates packets with both source and destination of 12000. You'll get way more packets than you wanted.

The solution? Use:
  !(tcp.port==12000)
or better yet, expand it out:
  tcp.srcport!=12000 && tcp.dstport!=12000
I prefer that.

What's So Special About "!="?

It got me to thinking about other forms of inequality, like ">" or "<". Let's say you want to display packets where source or destination (or both) are an ephemeral port:
  tcp.port > 49152
Now let's say you want the opposite - packets where *neither* port is ephemeral. If you do this:
  tcp.port <= 49152
you have the same problem (but this time Wireshark doesn't warn you). This displays all packets where source or destination (or both) are non-ephemeral.

At the root of all this is the concept of "opposite". With:
  a compare b
you think the opposite is:
  a opposite_compare b
But given the existence of combined expressions, it's usually safer to use:
  !(a compare b)

Tuesday, September 11, 2018

Safe sscanf() Usage

The scanf() family of functions (man page) are used to parse ascii input into fields and convert those fields into desired data types.  I have very rarely used them, due to the nature of the kind of software I write (system-level, not application), and I have always viewed the function with suspicion.

Actually, scanf() is not safe.  Neither is fscanf().  But this is only because when an input conversion fails, it is not well-defined where the file pointer is left.  Fortunately, this is pretty easy to deal with - just use fgets() to read a line and then use sscanf().

You can use sscanf() safely, but you need to follow some rules.  For one thing, you should include field widths for pretty much all of the fields, not just strings.  More on the rules later.


Input Validity Checking with sscanf()

I am a great believer in doing a good job of checking validity of inputs (see strtol preferred over atoi).  It's not enough to process your input "safely" (i.e. preventing buffer overflows, etc.), you should also detect data errors and report them.  For example, if I want to input the value "21" and I accidentally type "2`", I want the program to complain and re-prompt, not simply take the value "2" and ignore the "`".

One problem with doing a good job of input validation is that it often leads to lots of checking code.  In strtol preferred over atoi, my preferred code consumes 4 lines, and even that is cheating (combining two statements on one line, putting close brace at end of line).  And that's for a single integer!  What if you have a line with multiple pieces of information on it?

Let's say I have input lines with 3 pipe-separated fields:

  id_num|name|age

where "id_num" and "age" are 32-bit unsigned integers and "name" is alphabetic with a few non-alphas.  We'll further assume that a name must not contain a pipe character.

Bottom line: here is my preferred code:

#define NAME_MAX_LEN 60
. . .
unsigned int id;
char name[NAME_MAX_LEN+1];  /* allow for null */
unsigned int age;
int null_ofs = 0;

(void)sscanf(iline,
    "%9u|"  /* id */
    "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]|"  /* name */
    "%3u\n"  /* age */
    "%n",  /* null_ofs */
    &id, name, &age, &null_ofs);

if (null_ofs == 0 || iline[null_ofs] != '\0') {
    printf("PARSE ERROR!\n");
}


Error Checking

My long-time readers (all one of you!) will take issue with me throwing away the sscanf() return value.  However, it turns out that sscanf()'s return value really isn't as useful as it should be.  It tells you how many successful conversions were done, but doesn't tell you if there was extra garbage at the end of the line.  A better way to detect success is to add the "%n" at the end to get the string offset where conversion stopped, and make sure it is at the end of the string.

If any of the conversions fail, the "%n" will not be assigned, leaving it at 0, which is interpreted as an error.  Or, if all conversions succeed, but there is garbage at the end, "iline[null_ofs]" will not be pointing at the string's null, which is also interpreted as an error.


STR Macro

More interesting is the "STR(NAME_MAX_LEN)" construct.  This is a specialized macro I learned about on StackOverflow:

#define STR2(_s) #_s
#define STR(_s) STR2(_s)

Yes, both macros are needed.  So the construct:
  "%" STR(NAME_MAX_LEN) "[-A-Za-z,. ]"
gets compiled to the string:
  "%60[-A-Za-z,. ]"

So why use that macro?  If I had hard-coded the format string as "%60[-A-Za-z,. ]", it would risk getting out of sync with the actual size of "name" if we find somebody with a 65 character name.  (I would like it even more if I could use "sizeof(name)", but macro expansion and "sizeof()" are handled by different compiler phases.)


Newline Specifier

This one is a little subtle.  The third conversion specifier is "%3u\n".  This tells sscanf() to match a newline at the end of the line.  But guess what!  If I don't include a newline in the string, the sscanf() call still succeeds, including setting null_ofs to the input string's null.  In my opinion, this is a slight imperfection in sscanf(): I said I needed a newline, but sscanf() accepts my input without the newline.  I suspect sscanf() is counting newline as whitespace, which it knows to skip over.

If I *really* want to require the newline, I can do:

if (iline[null_ofs - 1] != '\n') then ERROR...


Integer Field Widths

Finally, I add field widths to the unsigned integer conversions.  Why?  If you leave them out, then it will do a fine job of converting numbers from 0..4294967295.  But try the next number, 4294967296, and what will you get?  Arithmetic overflow, with a result of 0 (although the C standards do not guarantee that value).  And sscanf() will not return any error.  So I restrict the number of digits to prevent overflow.  The disadvantage is that with "%9u" you cannot input the full range of an unsigned integer.  An alternative is to use "%10llu" and supply a long long variable.  Then range check it in your code.  Or you could just use "%8x" and have the user input hexadecimal.  Heck, you could even input it as a 10-character string and use strtol()!

I guess "%9u" was good enough for my needs.  I just wish you could just tell sscanf() to stop converting on arithmetic overflow!


Not Perfect

I already mentioned arithmetic overflow. Any other problems with the above code?  Well, yeah, I guess.  Even though I specify "id" and "age" to be unsigned integers, I notice that sscanf() will allow negative numbers to be input.  Sometimes we engineers consider this a feature, not a bug; we like to use 0xFFFFFFFF as a flag value, and it's easier to type "-1".  But I admit it still bothers me.  If I input an age of -1, we end up with the 4-billion-year-old man (apologies to Mel Brooks).


Tuesday, September 4, 2018

Safe C?

A Slashdot post led me to some good pages related to C safety.
(Update: actually I did go ahead and order the tee shirt.)


Monday, August 27, 2018

Safer Malloc for C

Here's a fragment of code that I recently wrote.  See anything wrong with it?

#define VOL(_var, _type) (*(volatile _type *)&(_var))
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
binding = (lbmbt_rcv_binding_t *)malloc(sizeof(lbmbt_rcv_t));
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature, int) = SIGNATURE_FREE;
free(binding);

No, the problem isn't in that strange VOL() macro; that is explained here.

The problem is that I am trying to allocate an "lbmbt_rcv_binding_t" structure, but because of cut-and-paste programming and being in a hurry, I used the size of a different (and smaller) structure: lbmbt_rcv_t.  Since the "signature" field is the last field in the structure, the assignments to it write past the end of the allocated memory block.  GAH!

But we all knew that C is a dangerous language, with its casting and sizeof just begging to be used wrong.  But could it be at least a *little* safer?


A Safer Malloc

#define VOL(_var, _type) (*(volatile _type *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _type, _n) do {\
  _var = (_type *)malloc((_n) * sizeof(_type));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, lbmbt_rcv_t, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature, int) = SIGNATURE_FREE;
free(binding);

The above code still has the bug -- it passed in the wrong type -- but at least it generates a compile warning.  The cast of malloc doesn't match the type of the variable.  Since I insist on getting rid of compiler warnings, that would have flagged the bug to me.


If you don't need portability, you can even use the gcc-specific extension "typeof()" in BOTH macros:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type. */
#define SAFE_MALLOC(_var, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC(binding, 1);
memset(binding, 0xa5, sizeof(lbmbt_rcv_t));
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

Now the malloc bug is gone ... by design!


Safer Malloc and Memset

Finally, notice that the memset is also wrong.  Since I frequently like to init my malloced segments to a known pattern (0xa5 is my habit), I can define a second macro:

#define VOL(_var) (*(volatile typeof(_var) *)&(_var))

/* Malloc "n" copies of a type and set its contents to a pattern. */
#define SAFE_MALLOC_SET(_var, _pat, _n) do {\
  _var = (typeof(*_var) *)malloc((_n) * sizeof(typeof(*_var)));\
  if (_var == NULL) abort();\
  memset(_var, _pat, (_n) * sizeof(typeof(*_var)));\
} while (0)
. . .
lbmbt_rcv_binding_t *binding = NULL;
. . .
SAFE_MALLOC_SET(binding, 0xa5, 1);
binding->signature = SIGNATURE_OK;
. . .
VOL(binding->signature) = SIGNATURE_FREE;
free(binding);

No more bugs.


P.S.

It took me WAY longer than it should have to track this down (an entire weekend).  Why?  The bug manifested as a segmentation fault.  So I figured I could just valgrind it.  But lo and behold, I didn't have Valgrind on my Mac.  "No Problem," sez I.  I sez, "I'll just install it."

$ brew install valgrind
valgrind: This formula either does not compile or function as expected on macOS
versions newer than Sierra due to an upstream incompatibility.

GAH!  "Still no problem," sez I.  I sez, "I'll just debug it on Linux."

No seg fault.  Program worked fine.  Valgrind made no complaint.  So I spent the weekend doing a divide-and-conquer trackdown of the bug in a large codebase to find the "Mac-specific" bug.  And finally found the bad malloc.

But wait!  That's not Mac-specific!  What's going on here?  After wasting some more time, I finally printed the sizeof() for each structure type.  On Mac, the sizes are different.  On Linux, the structures are the same size.  Of course valgrind on Linux said it was OK -- the malloc size was correct on Linux!

And now I have Valgrind on my Mac, which pinpointed the bug immediately.  (Thanks Güngör Budak!)

Friday, November 18, 2016

Linux network stack tuning

Found a nice blog post that talks about tuning the Linux network stack:

http://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/

I notice that it doesn't talk about pinning interrupts to the NUMA zone that the NIC is physically connected to (or at least "NUMA" doesn't appear in the post), so it doesn't have *everything*.  :-)

And it also doesn't mention kernel bypass libraries, like OpenOnload.

But it has a lot of stuff in it.

Sunday, July 31, 2016

Beginner Shell Script Examples

As I've mentioned, I am the proud father of a C.H.I.P. single-board computer.  I've been playing with it for a while, and have also been participating in the community message board.  I've noticed that there are a lot of beginners there, just learning about Linux.  This collection of techniques assumes you know the basics of shell scripting with BASH.

One of the useful tools I've written is a startup script called "blink.sh".  Basically, this script blinks CHIP's on-board LED, and also monitors a button to initiate a graceful shutdown.  (It does a bit more too.)  I realized that this script demonstrates several techniques that CHIP beginners might like to see.

The "blink.sh" script can be found here: https://github.com/fordsfords/blink/blob/gh-pages/blink.sh.  For instructions on how to install and use blink, see https://github.com/fordsfords/blink/tree/gh-pages.

The code fragments included below are largely extracted from the blink.sh script, with some simplifications.

NOTE: many of the commands shown below require root privilege to work.  It is assumed that the "blink.sh" script is run as root.


1. Systemd service, which automatically starts at boot, and can be manually started and stopped via simple commands.

I'm not an expert in all things Linux, but I've been told that in Debian-derived Linuxes, "systemd" is how all the cool kids implement services and startup scripts.  No more "rc.local", no run levels, etc.

Fortunately, systemd services are easy to implement.  The program itself doesn't need to do anything special, although you might want to implement a kill signal handler to cleanup when the service is stopped.

You do need a definition file which specifies the command line and dependencies.  It is stored in the /etc/systemd/system directory, named "<service_name>.service".  For example, here's blink's definition file:

$ cat /etc/systemd/system/blink.service 
# blink.service -- version 24-Jul-2016
# See https://github.com/fordsfords/blink/tree/gh-pages
[Unit]
Description=start blink after boot
After=default.target

[Service]
Type=simple
ExecStart=/usr/local/bin/blink.sh

[Install]
WantedBy=default.target

When that file is created, you can tell the system to read it with:

sudo systemctl enable /etc/systemd/system/blink.service

Now you can start the service manually with:

sudo service blink start

You can manually stop it with:

sudo service blink stop

Given the way it is defined, it will automatically start at system boot.
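
A few related commands I find handy (a sketch; assumes a systemd-based CHIP OS):

sudo service blink status     # check whether the service is running
sudo systemctl disable blink  # un-register it from starting at boot
journalctl -u blink           # view the service's output, if journald is collecting it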


2. Shell script which catches kill signals to clean itself up, including the signal that is generated when the service is stopped manually.

The blink script wants to do some cleanup when it is stopped (unexport its GPIOs).  It does this by setting a signal trap:

trap "blink_stop" 1 2 3 15

where "blink_stop" is a Bash function:

blink_stop()
{
  blink_cleanup
  echo "blink: stopped" `date` >>/var/log/blink.log
  exit
}

where "blink_cleanup" is another Bash function.

This code snippet works if the script is run interactively and stopped with control-C (SIGINT), if the "kill" command is used (SIGTERM -- but not "kill -9", which cannot be caught), and when the "service blink stop" command is used.
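
A quick way to see it in action (a sketch; uses the log file that blink_stop() writes to):

sudo service blink stop
tail -1 /var/log/blink.log    # should show something like: blink: stopped <date>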


3. Shell script with simple configuration mechanism.

This technique uses the following code in the main script:

export MON_RESET=
export MON_GPIO=
export MON_GPIO_VALUE=0  # if MON_GPIO supplied, default to active-0.
export MON_BATTERY=
export BLINK_STATUS=
export BLINK_GPIO=
export DEBUG=

if [ -f /usr/local/etc/blink.cfg ]; then :
  source /usr/local/etc/blink.cfg
else :
  MON_RESET=1
  BLINK_STATUS=1
fi

The initial export commands define environment variables with default values.  The "source" command causes /usr/local/etc/blink.cfg to be read by the shell, allowing that file to set those variables.  In other words, the config file is just another shell script that gets included by blink.  What does that file contain?  Here are its installed defaults:

MON_RESET=1       # Monitor reset button for short press.
#MON_GPIO=XIO_P7   # Which GPIO to monitor.
#MON_GPIO_VALUE=0  # Indicates which value read from MON_GPIO initiates shutdown.
MON_BATTERY=10    # When battery percentage is below this, shut down.
BLINK_STATUS=1    # Blink CHIP's status LED.
#BLINK_GPIO=XIO_P6 # Blink a GPIO.


4. Shell script that controls CHIP's status LED.

Here's how to turn off CHIP's status LED:

i2cset -f -y 0 0x34 0x93 0

Turn it back on:

i2cset -f -y 0 0x34 0x93 1

This obviously requires that the i2c-tools package is installed:

sudo apt-get install i2c-tools
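
If you want to poke around before writing to registers, here are a couple of sanity checks (a sketch; same bus 0 and chip address 0x34 as above):

sudo i2cdetect -y 0             # the AXP209 should appear at address 0x34 (it may show as "UU" if a kernel driver has claimed it)
sudo i2cget -f -y 0 0x34 0x93   # read the LED register back (0x00 = off, 0x01 = on)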


5. Shell script that controls an external LED connected to a GPIO.

The blink program makes use of the "gpio_sh" package.  Without that package, most programmers refer to GPIO port numbers explicitly.  For example, on CHIP the "CSID0" port is assigned the port number 132.  However, this is fragile because GPIO port numbers can change with new versions of CHIP OS.  In fact, the XIO port numbers DID change between version 4.3 and 4.4, and they may well change again with the next version.

The "gpio_sh" package allows a script to reference GPIO ports symbolically.  So instead of using "132", your script can use "CSID0".  Or, if using an XIO port, use "XIO_P0", which should work for any version of CHIP OS.

Here's how to set up "XIO_P6" as an output and control whatever is connected to it (perhaps an LED):

BLINK_GPIO="XIO_P6"
gpio_export $BLINK_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $BLINK_GPIO"
fi
gpio_direction $BLINK_GPIO out
gpio_output $BLINK_GPIO 1    # turn LED on
gpio_output $BLINK_GPIO 0    # turn LED off
gpio_unexport $BLINK_GPIO    # done with GPIO, clean it up
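
Putting those calls together, a minimal blink loop might look like this (a sketch, assuming the same gpio_sh functions and an exported $BLINK_GPIO):

while true; do
  gpio_output $BLINK_GPIO 1   # LED on
  sleep 0.5
  gpio_output $BLINK_GPIO 0   # LED off
  sleep 0.5
done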


6. Shell script that monitors CHIP's reset button for a "short press" and reacts to it.

The small reset button on CHIP is monitored by the AXP209 power controller.  The AXP209 uses internal hardware timers to determine how long the button is pressed, and reacts differently depending on the duration.  When CHIP is turned on, it differentiates between a "short" press (typically a second or less) and a "long" press (typically more than 8 seconds).  A "long" press triggers a "force off" function, which abruptly cuts power to the rest of CHIP.  A "short" press simply turns on a bit in a status register, which can be monitored from software.

REG4AH=`i2cget -f -y 0 0x34 0x4a`  # Read AXP209 register 4AH
BUTTON=$((REG4AH & 0x02))  # mask off the short press bit
if [ $BUTTON -eq 2 ]; then :
  echo "Button pressed!"
fi
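
In blink.sh, that check runs inside a polling loop; here's a simplified sketch (the one-second poll interval is arbitrary):

while true; do
  REG4AH=`i2cget -f -y 0 0x34 0x4a`  # read AXP209 register 4AH
  BUTTON=$((REG4AH & 0x02))          # mask off the short press bit
  if [ $BUTTON -eq 2 ]; then :
    shutdown -h now                  # react: graceful shutdown
  fi
  sleep 1
done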

Note that I have not figured out how to turn off that bit.  The "blink.sh" program does not need to turn it off since it responds to it by shutting CHIP down gracefully.  But if you want to use it for some other function, you'll have to figure out how to clear it.


7. Shell script that monitors a GPIO line, presumably a button but could be something else, and reacts to it.

MON_GPIO="XIO_P7"
gpio_export $MON_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $MON_GPIO"
fi
gpio_direction $MON_GPIO in
gpio_input $MON_GPIO; VAL=$?
if [ $VAL -eq 0 ]; then :
  echo "GPIO input is grounded (0)"
fi
gpio_unexport $MON_GPIO      # done with GPIO, clean it up


8. Shell script that monitors the battery charge level, and if it drops below a configured threshold, reacts to it.

This is a bit more subtle than it may seem at first.  Checking the percent charge of the battery is easy:

REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
PERC_CHG=$(($REGB9H))  # convert to decimal

But what if no battery is connected?  The register reads 0.  How do you differentiate that from a connected battery which is fully discharged?  I don't know of a way to tell the difference.  Another issue: what if a battery is connected and has a low charge, but it doesn't matter because CHIP is connected to a power supply and is therefore not at risk of losing power?  Basically, "blink.sh" only wants to shut down on low battery charge if the battery is actively being used to power CHIP.  So in addition to reading the charge percentage (above), it also checks the battery discharge current:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))

CHIP draws over 100 mA when running from the battery, so I check the discharge current against 50 mA.  If it is lower than that, then either there is no battery or the battery is not powering CHIP.  Putting it all together:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))
if [ $BAT_DISCHG_MA -gt 50 ]; then :
  REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
  PERC_CHG=$(($REGB9H))  # convert to decimal
  if [ $PERC_CHG -lt 10 ]; then :
    echo "Battery charge level is below 10%"
  fi
fi

Sunday, June 26, 2016

snprintf: bug detector or bug preventer?

Pop quiz time!

When you use snprintf() instead of sprintf(), are you:
   A. Writing code that proactively detects bugs.
   B. Writing code that proactively prevents bugs.

Did you answer "B"?  TRICK QUESTION!  The correct answer is:
  C. Writing code that proactively hides bugs.

Here's a short program that takes a directory name as an argument and prints the first line of the file "tst.c" in that directory:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
  char path[20];
  char iline[4];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]);
  FILE *fp = fopen(path, "r");
  fgets(iline, sizeof(iline), fp);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}
Nice and safe, right?  Both snprintf() and fgets() do a great job of not overflowing their buffers.  Let's run it:

$ ./tst .
iline='#inc'

Hmm ... didn't get the full input line.  I guess my iline array was too small.  But hey, at least it didn't seg fault, like it might have if I had just used scanf() or something dangerous like that!  No seg faults for me.

$ ./tst ././././././././.
Segmentation fault: 11

Um ... oh, silly me.  My path array was too small.  fopen() failed, and I didn't check its return status.

So I could, and should, check fopen()'s return status.  But that just gives me a more user-friendly error message.  It doesn't tell me *why* the file name is wrong.  Imagine the snprintf() being in a completely different area of the code.  Yes, you discover there's a bug by checking fopen(), but the report is nowhere near where the bug actually is.  Same thing, by the way, with the fgets() not reading the entire line.  Who knows how much more code will execute before the program misbehaves because it didn't get the entire line?

And that is my point.  Most of these "safe" functions work the same way: you pass in the size of your buffer, and the functions guarantee that they won't overrun it, but they give you *NO* indication that they truncated.  I.e., they don't tell you when your buffer is too small.  It's not until later that something visibly misbehaves, and you waste time and effort working your way back to the root cause.

Now I'm not suggesting that we throw away snprintf() in favor of sprintf().  I'm suggesting that using snprintf() is only half the job.  How about this:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#define BUF2SMALL(_s) do {\
  assert(strnlen(_s, sizeof(_s)) < sizeof(_s)-1);\
} while (0)

int main(int argc, char **argv)
{
  char path[21];
  char iline[5];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]); BUF2SMALL(path);
  FILE *fp = fopen(path, "r");  assert(fp != NULL);
  fgets(iline, sizeof(iline), fp); BUF2SMALL(iline);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}

Now let's run it:

$ ./tst ./.
Assertion failed: (strnlen(iline, sizeof(iline)) < sizeof(iline)-1), function main, file tst.c, line 15.
Abort trap: 6
$ ./tst ././././././././.
Assertion failed: (strnlen(path, sizeof(path)) < sizeof(path)-1), function main, file tst.c, line 13.
Abort trap: 6

There.  My bugs are reported *much* closer to where they really are.

The essence of the BUF2SMALL() macro is that you should use a buffer which is at least one character larger than the maximum size you think you need.  So if you want an answer string to be able to hold either "yes" or "no", don't make it "char ans[4]", make it at least "char ans[5]".  BUF2SMALL() asserts an error if the string consumes the whole array.

One final warning.  Note that in BUF2SMALL() I use "strnlen()" instead of "strlen()".   I wrote BUF2SMALL() to be a general-purpose error checker after a variety of "safe" functions.  For example, maybe I want to use it after a "strncpy()".  Look at what the man page for "strncpy()" says:
Warning:  If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
If you use "strncpy()" to copy a string, the string might not be null-terminated, and strlen() has a good chance of segfaulting.  So I used strnlen(), which is only "safe" in that it won't segfault.  But it doesn't tell me that the string isn't null-terminated!  So I still need my macro to tell me that the buffer is too small.  The "safe" functions only make the fuse a little longer on the stick of dynamite in your program.