Friday, November 18, 2016

Linux network stack tuning

Found a nice blog post that talks about tuning the Linux network stack:

http://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/

I notice that it doesn't talk about pinning interrupts to the NUMA zone that the NIC is physically connected to (or at least "NUMA" doesn't appear in the post), so it doesn't have *everything*.  :-)

And it also doesn't mention kernel bypass libraries, like OpenOnload.

But it has a lot of stuff in it.

Wednesday, September 21, 2016

Review: Prairie Burn (Jazz)

I realize that this is a technical blog without many followers, but I'm really getting into a new album and wanted to share.  If you're not interested in Jazz music, you may stop reading.

Prairie Burn is a new CD by the Mara Rosenbloom Trio.  It is modern Jazz, so don't expect anything that sounds like Tommy Dorsey, or dixieland.  Unfortunately, I don't have the background or the vocabulary to be able to tell you what it *does* sound like.  In other words, this is the worst Jazz review ever.

But since when has that ever stopped me?  :-)

Prairie Burn is great.  Listening to it takes me on an emotional journey including stops at agitation, surprise, excitement, dreaming, and satisfaction.  This music draws me in effortlessly.

So, why am I flogging it in my blog?  For the same reason I flogged Mad and Grace: I like them and I want them to reach their goals.  Yes, Prairie Burn has an Indiegogo campaign to raise money for a publicist so that Mara can get more of the attention she deserves.

Jazz has a strange following, few in number but passionate in their dedication.  I've read people bemoan the lack of young talent in the genre.  Most of the well-known artists are getting on in years and won't be around forever.  We've got to find new talent and support it.

I've done the finding for you.  Now it's your turn to help with the supporting.  :-)

Not sure you'll like the music?  http://www.mararosenbloom.com/html/listen.php  The two pieces from Prairie Burn are pretty different from each other, but the second (Turbulence) is probably more representative of the album as a whole.

Full disclosure: Mara is my daughter-in-law.  That certainly influenced me in terms of giving the music a try.  I believe it is not influencing my evaluation of its quality.  This is her third album, and while I like them all, this is the one that I feel passionate about.

Sunday, July 31, 2016

Beginner Shell Script Examples

As I've mentioned, I am the proud father of a C.H.I.P. single-board computer.  I've been playing with it for a while, and have also been participating in the community message board.  I've noticed that there are a lot of beginners there, just learning about Linux.  This collection of techniques assumes you know the basics of shell scripting with BASH.

One of the useful tools I've written is a startup script called "blink.sh".  Basically, this script blinks CHIP's on-board LED, and also monitors a button to initiate a graceful shutdown.  (It does a bit more too.)  I realized that this script demonstrates several techniques that CHIP beginners might like to see.

The "blink.sh" script can be found here: https://github.com/fordsfords/blink/blob/gh-pages/blink.sh.  For instructions on how to install and use blink, see https://github.com/fordsfords/blink/tree/gh-pages.

The code fragments included below are largely extracted from the blink.sh script, with some simplifications.

NOTE: many of the commands shown below require root privilege to work.  It is assumed that the "blink.sh" script is run as root.


1. Systemd service, which automatically starts at boot, and can be manually started and stopped via simple commands.

I'm not an expert in all things Linux, but I've been told that in Debian-derived Linuxes, "systemd" is how all the cool kids implement services and startup scripts.  No more "rc.local", no run levels, etc.

Fortunately, systemd services are easy to implement.  The program itself doesn't need to do anything special, although you might want to implement a kill signal handler to clean up when the service is stopped.

You do need a definition file which specifies the command line and dependencies.  It is stored in the /etc/systemd/system directory, named "<service_name>.service".  For example, here's blink's definition file:

$ cat /etc/systemd/system/blink.service 
# blink.service -- version 24-Jul-2016
# See https://github.com/fordsfords/blink/tree/gh-pages
[Unit]
Description=start blink after boot
After=default.target

[Service]
Type=simple
ExecStart=/usr/local/bin/blink.sh

[Install]
WantedBy=default.target

When that file is created, you can tell the system to read it with:

sudo systemctl enable /etc/systemd/system/blink.service

Now you can start the service manually with:

sudo service blink start

You can manually stop it with:

sudo service blink stop

Given the way it is defined, it will automatically start at system boot.


2. Shell script which catches kill signals to clean itself up, including the signal that is generated when the service is stopped manually.

The blink script wants to do some cleanup when it is stopped (unexport GPIOs).

trap "blink_stop" 1 2 3 15

where "blink_stop" is a Bash function:

blink_stop()
{
  blink_cleanup
  echo "blink: stopped" `date` >>/var/log/blink.log
  exit
}

where "blink_cleanup" is another Bash function.

This code snippet works if the script is used interactively and stopped with control-C, and also works if the "kill" command is used (but not "kill -9"), and also works when the "service blink stop" command is used.
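If you want to watch the trap fire without installing anything, here is a small sketch of the same pattern that runs on any Linux box.  The names demo_cleanup, demo_stop, and LOG are made up for the demo (they are not from blink.sh), and the "cleanup" just writes a line to a temporary file where blink.sh would unexport its GPIOs.

```shell
# Sketch of the trap-and-cleanup pattern; not taken from blink.sh.
# demo_cleanup, demo_stop and LOG are hypothetical names for this demo.
LOG=$(mktemp)

demo_cleanup()
{
  echo "cleanup done" >>"$LOG"   # blink.sh would unexport its GPIOs here
}

demo_stop()
{
  demo_cleanup
  echo "demo: stopped" >>"$LOG"
  exit 0
}

# Run a child shell that installs the handler and then loops "forever".
(
  trap "demo_stop" HUP INT QUIT TERM   # same signals as 1 2 3 15
  while true; do sleep 1; done
) &
CHILD=$!

sleep 1              # give the child time to install its trap
kill -TERM $CHILD    # plain "kill" sends TERM; "kill -9" cannot be trapped
wait $CHILD

RESULT=$(cat "$LOG")
rm -f "$LOG"
echo "$RESULT"
```

The child finishes its current "sleep 1" before the handler runs, so expect a short pause.  The output should be the two lines "cleanup done" and "demo: stopped".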


3. Shell script with simple configuration mechanism.

This technique uses the following code in the main script:

export MON_RESET=
export MON_GPIO=
export MON_GPIO_VALUE=0  # if MON_GPIO supplied, default to active-0.
export MON_BATTERY=
export BLINK_STATUS=
export BLINK_GPIO=
export DEBUG=

if [ -f /usr/local/etc/blink.cfg ]; then :
  source /usr/local/etc/blink.cfg
else :
  MON_RESET=1
  BLINK_STATUS=1
fi

The initial export commands define environment variables with default values.  The use of the "source" command causes the /usr/local/etc/blink.cfg to be read by the shell, allowing that file to define shell variables.  In other words, the config file is just another shell script that gets included by blink.  What does that file contain?  Here are its installed defaults:

MON_RESET=1       # Monitor reset button for short press.
#MON_GPIO=XIO_P7   # Which GPIO to monitor.
#MON_GPIO_VALUE=0  # Indicates which value read from MON_GPIO initiates shutdown.
MON_BATTERY=10    # When battery percentage is below this, shut down.
BLINK_STATUS=1    # Blink CHIP's status LED.
#BLINK_GPIO=XIO_P6 # Blink a GPIO.


4. Shell script that controls CHIP's status LED.

Here's how to turn off CHIP's status LED:

i2cset -f -y 0 0x34 0x93 0

Turn it back on:

i2cset -f -y 0 0x34 0x93 1

This obviously requires that the i2c-tools package is installed:

sudo apt-get install i2c-tools


5. Shell script that controls an external LED connected to a GPIO.

The blink program makes use of the "gpio_sh" package.  Without that package, most programmers refer to gpio port numbers explicitly.  For example, on CHIP the "CSID0" port is assigned the port number 132.  However, this is dangerous because GPIO port numbers can change with new versions of CHIP OS.  In fact, the XIO port numbers DID change between version 4.3 and 4.4, and they may well change again with the next version.

The "gpio_sh" package allows a script to reference GPIO ports symbolically.  So instead of using "132", your script can use "CSID0".  Or, if using an XIO port, use "XIO_P0", which should work for any version of CHIP OS.

Here's how to set up "XIO_P6" as an output and control whatever is connected to it (perhaps an LED):

BLINK_GPIO="XIO_P6"
gpio_export $BLINK_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $BLINK_GPIO"
fi
gpio_direction $BLINK_GPIO out
gpio_output $BLINK_GPIO 1    # turn LED on
gpio_output $BLINK_GPIO 0    # turn LED off
gpio_unexport $BLINK_GPIO    # done with GPIO, clean it up


6. Shell script that monitors CHIP's reset button for a "short press" and reacts to it.

The small reset button on CHIP is monitored by the AXP209 power controller.  It uses internal hardware timers to determine how long the button is pressed, and can perform different tasks.  When CHIP is turned on, the AXP differentiates between a "short" press (typically a second or less) vs. a "long" press (typically more than 8 seconds).  A "long" press triggers a "force off" function, which abruptly cuts power to the rest of CHIP.  A "short" press simply turns on a bit in a status register, which can be monitored from software.

REG4AH=`i2cget -f -y 0 0x34 0x4a`  # Read AXP209 register 4AH
BUTTON=$((REG4AH & 0x02))  # mask off the short press bit
if [ $BUTTON -eq 2 ]; then :
  echo "Button pressed!"
fi

Note that I have not figured out how to turn off that bit.  The "blink.sh" program does not need to turn it off since it responds to it by shutting CHIP down gracefully.  But if you want to use it for some other function, you'll have to figure out how to clear it.
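
If the masking step in the fragment above looks like magic, here is a sketch that applies the same mask to a few made-up register values, so it can be tried without an AXP209 on the I2C bus.  The short_press function is a hypothetical helper, not part of blink.sh.

```shell
# Sketch: the 0x02 mask applied to made-up register readings.
# short_press is a hypothetical helper; blink.sh does the test inline.
short_press()
{
  # $1 is a register value in the "0x.." form that i2cget prints
  [ $(( $1 & 0x02 )) -eq 2 ]
}

for REG in 0x00 0x02 0x03 0x41; do
  if short_press $REG; then
    echo "$REG: short press bit set"
  else
    echo "$REG: short press bit clear"
  fi
done
```

Only 0x02 and 0x03 report the bit as set.  The other bits in the register don't matter, which is the whole point of masking.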


7. Shell script that monitors a GPIO line, presumably a button but could be something else, and reacts to it.

MON_GPIO="XIO_P7"
gpio_export $MON_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $MON_GPIO"
fi
gpio_direction $MON_GPIO in
gpio_input $MON_GPIO; VAL=$?
if [ $VAL -eq 0 ]; then :
  echo "GPIO input is grounded (0)"
fi
gpio_unexport $MON_GPIO      # done with GPIO, clean it up


8. Shell script that monitors the battery charge level, and if it drops below a configured threshold, reacts to it.

This is a bit more subtle than it may seem at first.  Checking the percent charge of the battery is easy:

REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
PERC_CHG=$(($REGB9H))  # convert to decimal

But what if no battery is connected?  It reads 0.  How do you differentiate that from having a battery which is discharged?  I don't know of a way to tell the difference.  Another issue is what if a battery is connected and has low charge, but it doesn't matter because CHIP is connected to a power supply and is therefore not at risk of losing power?  Basically, "blink.sh" only wants to shut down on low battery charge if the battery is actively being used to power CHIP.  So in addition to reading the charge percentage (above), it also checks the battery discharge current:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))

CHIP draws over 100 mA from the battery, so I check it against 50 mA.  If it is lower than that, then either there is no battery or the battery is not running CHIP:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))
if [ $BAT_DISCHG_MA -gt 50 ]; then :
  REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
  PERC_CHG=$(($REGB9H))  # convert to decimal
  if [ $PERC_CHG -lt 10 ]; then :
    echo "Battery charge level is below 10%"
  fi
fi
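
To check the arithmetic without a battery (or a CHIP), here is a sketch that feeds made-up register values through the same formulas.  The functions discharge_ma and should_shut_down are hypothetical helpers for the demo.  I am assuming, as the divide-by-2 implies, that the 13-bit reading counts in 0.5 mA steps.

```shell
# Sketch: the battery arithmetic with made-up register values.
# discharge_ma and should_shut_down are hypothetical helpers.
discharge_ma()
{
  # $1 = register 7CH (upper 8 bits), $2 = register 7DH (lower 5 bits)
  # Assumption: the 13-bit result is in 0.5 mA steps, hence the /2.
  echo $(( ( ($1 << 5) | ($2 & 0x1F) ) / 2 ))
}

should_shut_down()
{
  # $1, $2 = discharge registers; $3 = register B9H (percent charge)
  # True only if the battery is really powering CHIP AND is nearly empty.
  [ "$(discharge_ma "$1" "$2")" -gt 50 ] && [ $(( $3 )) -lt 10 ]
}

discharge_ma 0x00 0x00    # no battery, or battery not in use
discharge_ma 0x06 0x08    # (6*32 + 8) / 2
discharge_ma 0x0c 0x10    # (12*32 + 16) / 2

if should_shut_down 0x06 0x08 0x05; then
  echo "Battery charge level is below 10%"
fi
```

The three discharge_ma calls print 0, 100, and 200, and the final test prints the warning because 100 mA is above the 50 mA threshold and 0x05 is 5 percent.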

Sunday, June 26, 2016

snprintf: bug detector or bug preventer?

Pop quiz time!

When you use snprintf() instead of sprintf(), are you:
   A. Writing code that proactively detects bugs.
   B. Writing code that proactively prevents bugs.

Did you answer "B"?  TRICK QUESTION!  The correct answer is:
  C. Writing code that proactively hides bugs.

Here's a short program that takes a directory name as an argument and prints the first line of the file "tst.c" in that directory:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
  char path[20];
  char iline[4];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]);
  FILE *fp = fopen(path, "r");
  fgets(iline, sizeof(iline), fp);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}
Nice and safe, right?  Both snprintf() and fgets() do a great job of not overflowing their buffers.  Let's run it:

$ ./tst .
iline='#inc'

Hmm ... didn't get the full input line.  I guess my iline array was too small.  But hey, at least it didn't seg fault, like it might have if I had just used scanf() or something dangerous like that!  No seg faults for me.

$ ./tst ././././././././.
Segmentation fault: 11

Um ... oh, silly me.  My path array was too small.  fopen() failed, and I didn't check its return status.

So I could, and should, check fopen()'s return status.  But that just gives me a more user-friendly error message.  It doesn't tell me *why* the file name is wrong.  Imagine the snprintf() being in a completely different area of the code.  Yes, you discover there's a bug by checking fopen(), but it's nowhere near where the bug actually is.  Same thing, by the way, with the fgets() not reading the entire line.  Who knows how much more code is going to be executed before the program misbehaves because it didn't get the entire line?

And that is my point.  Most of these "safe" functions work the same way: you pass in the size of your buffer, and the functions guarantee that they won't overrun your buffer, but they do nothing to make you notice that they truncated.  snprintf() does return the length it wanted to write, but only code that compares that value against the buffer size ever finds out, and fgets() gives no direct indication at all.  I.e. nothing stops the program when your buffer is too small.  It's not until later that something visibly misbehaves, and that wastes time and effort working your way back to the root cause.

Now I'm not suggesting that we throw away snprintf() in favor of sprintf().  I'm suggesting that using snprintf() is only half the job.  How about this:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#define BUF2SMALL(_s) do {\
  assert(strnlen(_s, sizeof(_s)) < sizeof(_s)-1);\
} while (0)

int main(int argc, char **argv)
{
  char path[21];
  char iline[5];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]); BUF2SMALL(path);
  FILE *fp = fopen(path, "r");  assert(fp != NULL);
  fgets(iline, sizeof(iline), fp); BUF2SMALL(iline);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}

Now let's run it:

$ ./tst ./.
Assertion failed: (strnlen(iline, sizeof(iline)) < sizeof(iline)-1), function main, file tst.c, line 15.
Abort trap: 6
$ ./tst ././././././././.
Assertion failed: (strnlen(path, sizeof(path)) < sizeof(path)-1), function main, file tst.c, line 13.
Abort trap: 6

There.  My bugs are reported *much* closer to where they really are.

The essence of the BUF2SMALL() macro is that you should use a buffer which is at least one character larger than the maximum size you think you need.  So if you want an answer string to be able to hold either "yes" or "no", don't make it "char ans[4]", make it at least "char ans[5]".  BUF2SMALL() asserts an error if the string consumes the whole array.

One final warning.  Note that in BUF2SMALL() I use "strnlen()" instead of "strlen()".   I wrote BUF2SMALL() to be a general-purpose error checker after a variety of "safe" functions.  For example, maybe I want to use it after a "strncpy()".  Look at what the man page for "strncpy()" says:
Warning:  If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
If you use "strncpy()" to copy a string, the string might not be null-terminated, and  strlen() has a good chance of segfaulting.  So I used strnlen(), which is only "safe" in that it won't segfault.  But it doesn't tell me that the string isn't null-terminated!  So I still need my macro to tell me that the buffer is too small.  The "safe" functions only make the fuse a little longer on the stick of dynamite in your program.

Saturday, June 25, 2016

Of compiler warnings and asserts in a throw-away society

Many people despair at today's "throw away" society.  If you don't want it, just throw it away.

Programmers know this is not a recent phenomenon; they've been throwing stuff away since the dawn of high-level languages.

Actual line from code I'm doing some work on:
    write(fd, str_gpio, len);

The "write" function returns a value, which the programmer threw away.  And I know why without even asking him.  If you were to challenge him, he would probably say, "I don't need the return value, and as for prudent error checking, this program has been running without a glitch for years."

Ugh.  It's never a *good* idea to throw away return values, but I've been known to do it.  But I really REALLY don't like compiler warnings:
warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result]
     write(fd, str_gpio, len);
     ^

Well, I didn't feel like analyzing the code to see how errors *should* be handled, so I just cast "write" to void to get rid of the compile warning:
    (void)write(fd, str_gpio, len);

Hmm ... still same warning.  Apparently over 10 years ago, glibc decided to make a whole lot of functions have an attribute that makes them throw that warning if the return value is ignored, and GCC decided that functions with that attribute will throw the warning *even if cast to void*.  If you like reading flame wars, the Interwebs are chock full of arguments over this.

And you know what?  Even though I'm not sure I agree with that GCC policy, it did cause me to re-visit the code and add some actual error checking.  I figured that if write() returning an error was something that "could never happen", then let's enshrine that fact in the code:
    s = write(fd, str_gpio, len);  assert(s == len);

Hmm ... different warning:
warning: unused variable 's' [-Wunused-variable]
     s = write(fd, str_gpio, len);  assert(s == len);
     ^

Huh?  I'm using it right there!  Back to Google.  Apparently, you can define a preprocessor macro (NDEBUG) to inhibit the assert code.  Some programmers like to have their asserts enabled during testing, but disabled for production for improved efficiency.  The compiler sees that the condition testing code is conditionally compiled, and decides to play it safe and throw the warning that "s" isn't used, even if the condition code is compiled in.  And yes, this also featured in the same flame wars over void casting.  I wasn't the first person to use exactly this technique to try to get rid of warnings.

*sigh*

So I ended up doing what lots of the flame war participants bemoaned having to do: writing my own assert:
#define ASSRT(cond_expr) do {\
  if (!(cond_expr)) {\
    fprintf(stderr, "ASSRT failed at %s:%d (%s)\n", __FILE__, __LINE__, #cond_expr);\
    fflush(stderr);\
    abort();\
} } while (0)
...
    s = write(fd, str_gpio, len);  ASSRT(s == len);

Finally, no warnings!  And better code too (not throwing away the return value).  I just don't like creating my own assert. :-(

Tuesday, May 24, 2016

TCP flow control with non-blocking sends: EAGAIN

So, let's say you're sending data on a non-blocking TCP socket faster than the receiver can unload it. The socket buffers fill up. Then what happens? The send call returns fewer bytes sent than were requested. Everybody knows that. (Interestingly, http://linux.die.net/man/2/send does not mention this behavior, but I see it during testing.)

But what if the previous send exactly filled the buffer so that your next send can't put *any* bytes in? Does send return zero? Apparently not. It returns -1 with an errno of EAGAIN or EWOULDBLOCK (also verified by testing).  If I ever knew this, I forgot it till today.

Finally, here is something I did already know, but rarely include in my code, and I should (from http://linux.die.net/man/2/send):
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and the requested operation would block. POSIX.1-2001 allows either error to be returned for this case, and does not require these constants to have the same value, so a portable application should check for both possibilities.

Sunday, January 10, 2016

Saying goodbye to a bit of personal history

Ever since I was *very* young, I've been interested in science and technology.  At some point in my teens, maybe 40 years ago, I wanted a better VOM (Volt-Ohm-Milliamp meter) than the junky one I had picked up, so I did some research and spent precious funds on a high-impedance FET meter:



It saw pretty heavy use for about 5 years, but as I transitioned from electronics to digital logic, and from that to software, my need for it dropped.  I've probably used it twice in the past 15 years, probably for checking if an electrical outlet is live.

As my previous post indicates, I've just gotten a single-board computer, and I was trying to indirectly measure the value of the pull-up resistor on an open-collector output.  I needed a reasonably accurate, high impedance meter, so I got out my old FET.

Alas, the two small selector switches were frozen.  Not sure why or how -- it's a *switch* for goodness sake -- but I can't use it if I can't turn it on.  I'll take it apart, but I don't have high hopes.

Its passing is a sad event for me, but why?  Is it just nostalgia?  Longing for a simpler time?  Missing my childhood?  I think it's more than that.  There are certain things that have come to represent turning points in my life.  The meter may not have *caused* a significant shift in my life path, but it had come to represent it.  And maybe it's a mortality thing too, like a piece of me died.

Oh well, I'll probably get a cheap DVM.