Sunday, April 30, 2017

Fraudulent spam email claiming to be Netflix

I got a phishing email.  So what?  I get lots of phishing emails.  Why blog about this one?

Well, it's at least a *little* different.

Most of them direct the victim to an existing web site which has been compromised.  I.e. the web site's real owner has no idea that his own site is being used for fraudulent purposes.

In this one, the victim is directed to the domain name "netflix-myaccount.com", which the scammer obtained properly.  Unfortunately, the scammer wasn't stupid enough to include his own contact information in the registry, instead choosing to hide behind privacyprotect.org.

Now there's nothing wrong with using privacyprotect.org to hide one's identity.  If anything, it removed any doubt in my mind (as if there were any) that the page isn't owned by Netflix.  So it reinforced that it is a phishing site.  I sent a complaint email to privacyprotect.org anyway.

Next up, the domain registrar: ilovewww.com.  Never heard of them.  Malaysian.  Sent them a complaint email too, asking them to suspend the registration.

Next, the IP address that netflix-myaccount.com resolves to: 80.82.67.155.  A whois lookup shows the block is owned by Quasi Networks LTD.  Abuse email to it as well.

Now to another nice site: phishcheck.me, a site that evaluates how likely a site is to be fraudulent.  It actually goes to the site and analyzes it.  So I went there and plugged in "http://netflix-myaccount.com", and sure enough, it says that it is probably a phishing site (no surprise there).  But on that phishcheck.me page is a tab named "resources", which shows details of the access to the site ... and well lookie there, "netflix-myaccount.com" redirects to "netflix-secureserver.com".  Which resolves to the same IP as "netflix-myaccount.com", and is registered the same way (ilovewww.com and privacyprotect.org).  So what's the point of that?  Oh well, another set of complaint emails for the new domain name.

Finally, let's see if it is a compromised web site.  I would like to see what other domain names resolve to the same IP address.  Unfortunately, this appears not to be an exact science.  The few sites there are that claim to do this find *no* domains resolving to that IP.  However, a simple google search for "80.82.67.155" (*with* the double quotes) does find the names "netflix-myaccount.com" and a new one: "www.useraccountvalidation-apple.com".

Yep.  Another phishing site, leveraging Apple instead of Netflix.  Let's do the drill, starting with whois.  WHOA!!!  Did we hit paydirt?

Registrant Contact
Name: Jamie Wilson
Organization:
Mailing Address: 22 Madisson Road, London London SE12 8DH GB
Phone: +44.07873394485
Ext:
Fax:
Fax Ext:
Email:uktradergb@gmail.com

Now, don't be too hasty.  The *real* registrant is a scammer.  What are the chances he would list his own real contact info?  The only thing that might be valid is the email address, since I think he needs that to fully set up the domain, and even then it might have been a single-use throwaway.

Hmm ... not totally throw-away.  A google of "uktradergb@gmail.com" has 6 hits, including "netflix-iduser1.com" and "netflix-iduser2.com", both of which have Jamie as the registrant, but neither of which resolve to valid IP addresses.  So not sure there's anything actionable (i.e. complainable) there.

But just in case, I googled the phone number, and found this additional hit: "AppleId1-Cgi.com", which doesn't appear to resolve to a valid IP.

Well, much as I hate to, let's skate over to "domaintools.com", which wants my money in a bad way.  It tells me that uktradergb@gmail.com is associated with ~38 domains, but of course won't tell me what any of them are without paying them $99.  And even though I would love to send complaints regarding all 38, I wouldn't love it $99 worth.

Ok, one more thing.  http://domainbigdata.com/gmail.com/mj/LX7iN6iKwKFIRfkD7CsKXQ says that the owner of that email address is Adam Stormont, and that the email is associated with a few other sites (but not 37), including "hmrc-refundvalidation.com", which doesn't resolve to an IP.  And by the way, a whois of another uktradergb@gmail.com domain, "hni-4.com", says that the registrant is David Hassleman.  So yeah, ignore the Jamie Wilson contact.  He wasn't that stupid.  :-)

And now I've run out of gas.  Maybe those domain names will be disabled in the next few days.  Or maybe I've just wasted a half hour of my life.  (Well, I've learned a few things, so not totally wasted.)

Friday, March 31, 2017

Cisco Eating Multicast Fragments???


UPDATE: after upgrading the IOS on our "MDF" switch, this problem went away.  None of my readers (all 2 of them?) have reported seeing this problem with their switches.  So I think this issue is closed.


I think we've discovered a bug in our Cisco switch related to UDP multicast and IP fragmentation.  Dave Zabel (of Windows corrupting UDP fame) did the initial detective work, and I did most of the analysis.  And I'm not quite ready to declare victory yet, but I'm pretty sure we know roughly what is going on.


BOTTOM LINE:

It appears that Cisco is not paying proper attention to whether a packet is fragmented when checking the UDP destination port for the BFD protocol.  The result is that it eats user packets that it misidentifies as being part of that protocol.


THE SETUP:

We have 4 Catalyst 3560 "LAB" switches (48 port) trunked to a Catalyst 4507 "MDF" switch.  Our lab test machines are distributed across the LAB switches.

Our messaging software multicasts UDP datagrams.  One of our regression tests involves sending messages of varying sizes with randomized data.  We saw that occasionally, one of the messages would be lost.  Doing packet captures showed that the missing datagram is NAKed and retransmitted multiple times, but the subscribing host never saw the datagram, even though it saw all the previous and subsequent datagrams.  (This particular test does not send at a particularly stressful rate.)

Further investigation showed that some hosts always got the message in question, while others never got the message.  Turns out that the hosts that got the message were on the same LAB switch as the sender.  The hosts that didn't get the message were on a different switch.

I narrowed it down to a minimal test datagram of 1476 bytes.  The first 1474 bytes can be any arbitrary values, but the last two bytes must be either "0e c8" or "0e c9".  Any datagram with either of those two problematic byte pairs at that offset will be lost.  Note that the datagram will be split into 2 packets (IP fragments) by the sending host's IP stack.  Strategically placed tcpdumps indicated that the first IP fragment always makes it to the receiver, but the second one seems to be eaten by our "MDF" switch.

There's nothing magic about the size 1476 - it can be larger and the problem still happens.  1476 is just the smallest datagram which demonstrates the problem.


IP FRAGMENTATION:

IP fragmentation happens when UDP hands to IP a datagram that doesn't fit into a single MTU-sized Ethernet packet (1500 bytes).  A UDP datagram consists of an 8-byte header, followed by up to 65,527 bytes of UDP payload.  IP splits a large datagram up into fragments of 1480 bytes each and prepends its own 20-byte IP header to each fragment.  But note that only the first fragment will contain the UDP header.  So IP fragment #1 will hold the 8-byte UDP header and the first 1472 bytes of my datagram.

Since my test datagram is 1476 bytes long, IP fragment #2 will contain a 20-byte IP header followed by the last 4 bytes of my datagram.
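To make that arithmetic concrete, here is a minimal C sketch (my own illustration, not part of the test programs) that computes the fragment layout for a given datagram size, assuming a 1500-byte MTU and a 20-byte IP header with no options:

#include <stdio.h>

int main(void)
{
    int udp_payload = 1476;             /* my test datagram size */
    int ip_payload  = 8 + udp_payload;  /* 8-byte UDP header + data */
    int per_frag    = 1500 - 20;        /* 1480 bytes of IP payload per fragment */
    int offset = 0, frag = 1;

    while (ip_payload > 0) {
        int this_frag = (ip_payload > per_frag) ? per_frag : ip_payload;
        printf("fragment %d: offset %d, %d bytes of IP payload\n",
               frag, offset, this_frag);
        ip_payload -= this_frag;
        offset += this_frag;
        frag++;
    }
    return 0;
}

For 1476 bytes it prints one fragment at offset 0 carrying 1480 bytes (the UDP header plus the first 1472 data bytes) and a second fragment at offset 1480 carrying the last 4 bytes, which matches the tcpdump below.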

I won't show you the first fragment of my test datagram because it's long and boring.  And it is successfully handled by Cisco, so it's also not relevant.

Here's a tcpdump of the second fragment of my test datagram (the last 4 bytes of the test datagram are the "0000 0ec8" in the 0x0020 line).  Note that tcpdump includes a 14-byte Ethernet header in front of the 20-byte IP header, then the last 4 bytes of my test datagram, and finally 22 padding nulls to make up a minimum-size packet (those nulls are not counted as part of the IP payload).

07:56:38.518614 00:1e:c9:4e:a1:92 (oui Unknown) > 01:00:5e:65:03:01 (oui Unknown), ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl   2, id 2132, offset 1480, flags [none], proto: UDP (17), length: 24) 10.29.3.88 > 239.101.3.1: udp
        0x0000:  0100 5e65 0301 001e c94e a192 0800 4500  ..^e.....N....E.
        0x0010:  0018 0854 00b9 0211 afed 0a1d 0358 ef65  ...T.........X.e
        0x0020:  0301 0000 0ec8 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............

This is the packet which is successfully received by hosts on the same switch as the sender, but is never received by hosts on a different switch.  Change the "0e c8" byte pair to, for example, "1e c8" or "0e c7" and everything works fine - the packet is properly forwarded.


A CASE OF MISTAKEN IDENTITY?

In my problematic datagram, the last 4 bytes occupy the same packet position in fragment #2 as the UDP header in a non-fragmented packet.  In particular, the byte pair "0e c8" occupies the same packet position as the UDP destination port in a non-fragmented packet.  Those byte values correspond to port 3784, which is used by the BFD protocol.  BFD is used to quickly detect failures in the path between adjacent forwarding switches and routers, so it is of special interest to our switches.  (The other problematic byte pair "0e c9" corresponds to port 3785, which is also used by BFD.)

So, when a LAB switch sends fragment #2 to the MDF, it looks like MDF is checking the UDP port WITHOUT looking at the IP header's "Fragment Offset" field.  It should only look for a UDP port if the fragment offset is zero.  Here's that packet again; the fragment offset field is the "00b9" in the 0x0010 line:

07:56:38.518614 00:1e:c9:4e:a1:92 (oui Unknown) > 01:00:5e:65:03:01 (oui Unknown), ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl   2, id 2132, offset 1480, flags [none], proto: UDP (17), length: 24) 10.29.3.88 > 239.101.3.1: udp
        0x0000:  0100 5e65 0301 001e c94e a192 0800 4500  ..^e.....N....E.
        0x0010:  0018 0854 00b9 0211 afed 0a1d 0358 ef65  ...T.........X.e
        0x0020:  0301 0000 0ec8 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............

For most (non-fragmented) packets, that field is zero and the UDP header is present, in which case the 0ec8 would be the destination port number.  Here the fragment offset (the low 13 bits of that "00b9") is b9 hex, or 185 decimal.  IP fragment offsets are measured in units of 8-byte blocks, so the actual offset is 8*185=1480, which is what tcpdump shows as "offset".
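For illustration, here's roughly the check I would expect, as a C sketch of my own (header layouts per RFC 791 and RFC 768; this is not Cisco's code, just the logic):

#include <stdint.h>

#define BFD_CTRL_PORT 3784   /* 0x0ec8 */
#define BFD_ECHO_PORT 3785   /* 0x0ec9 */

/* ip_hdr points at the start of the IP header (assumed 20 bytes, no options);
 * ip_payload points at the first byte after it. */
static int looks_like_bfd(const uint8_t *ip_hdr, const uint8_t *ip_payload)
{
    uint16_t flags_frag  = (uint16_t)((ip_hdr[6] << 8) | ip_hdr[7]);
    uint16_t frag_offset = flags_frag & 0x1fff;   /* low 13 bits, in 8-byte units */

    if (ip_hdr[9] != 17)                          /* protocol is not UDP */
        return 0;
    if (frag_offset != 0)                         /* not the first fragment: no UDP header here */
        return 0;

    uint16_t dst_port = (uint16_t)((ip_payload[2] << 8) | ip_payload[3]);
    return (dst_port == BFD_CTRL_PORT || dst_port == BFD_ECHO_PORT);
}

With the frag_offset test in place, the second fragment of my 1476-byte datagram (offset 1480) would never be mistaken for BFD, because the "0e c8" in it is user data, not a port number.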

It also seems strange to me that the switch ignores which multicast group I'm sending to.  I can send to any valid multicast group, and the problematic packet will be eaten by the "MDF" switch.  Shouldn't there be a specific multicast group for BFD?  Maybe I found 2 bugs?

My employer has a support contract with Cisco, and I'm working with the internal network group to get a Cisco ticket opened.  I'll update as I learn more, but it's slow climbing through the various levels of internal and external tech support, each one of whom starts out with, "are you sure it's plugged in?"  It may take weeks to find somebody who even knows what IP fragmentation is.


TRY IT YOURSELF

I would love to hear from others who can try this out on their own networks.  Grab the source files:


To build on Linux do:
gcc -o msend msend.c
gcc -o mdump mdump.c

Note that I've tried other operating systems (Windows and Solaris), with the same test results.  This is not an OS issue.

For this test, the main purpose of mdump is to get the host to join the multicast group.

Choose three hosts: A, B, and C.  Make sure A and B are on the same switch, and C is on a different switch.  In my case, all three hosts are on the same VLAN; I don't know if that is significant.  For this example, let's assume that the three hosts' IP addresses are 10.29.1.1, 10.29.1.2, and 10.29.1.3 respectively, and that all NICs are named "eth0".

Choose a multicast group and UDP port that aren't being used in your network.  I chose 239.101.3.1 and 12000.  I've tried others as well, with the same test results.

Note that the msend and mdump commands require you to put the host's primary IP address as the 3rd command-line parameter.  This is because multicast needs to be told explicitly which interface to use (normal IP routing doesn't know the "right" interface to use).

Open a window to A, and two windows each for B and C.  Enter the following commands:

B1: ./mdump  239.101.3.1 12000 10.29.1.2

B2: tcpdump -i eth0 -s2000 -vvv -XX -e host 239.101.3.1

C1: ./mdump  239.101.3.1 12000 10.29.1.3

C2: tcpdump -i eth0 -s2000 -vvv -XX -e host 239.101.3.1

A: ./msend 239.101.3.1 12000 10.29.1.1

The "msend" command sends two datagrams.  The first one is small and gives the sending host's name.  The second one is the 1476-byte datagram, whose second fragment gets eaten by the Cisco "MDF" switch.

Window B1 should show both datagrams fully received.

B2 should show 3 packets:
1. The short packet with the host name.
2. Fragment #1 of the long packet
3. Fragment #2 of the long packet

C1 should only show the first datagram.

C2 should show 2 packets:
1. The short packet with the host name.
2. Fragment #1 of the long packet.

Fragment #2 is missing from C2, presumably eaten by the "MDF" switch.

Note that the two "tcpdump" windows might show additional packets, which are for the "igmp" protocol, and are unrelated to the test.  If I had more time, I would figure out how to get "tcpdump" to ignore them.

Friday, November 18, 2016

Linux network stack tuning

Found a nice blog post that talks about tuning the Linux network stack:

http://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/

I notice that it doesn't talk about pinning interrupts to the NUMA zone that the NIC is physically connected to (or at least "NUMA" doesn't appear in the post), so it doesn't have *everything*.  :-)

And it also doesn't mention kernel bypass libraries, like OpenOnload.

But it has a lot of stuff in it.

Wednesday, September 21, 2016

Review: Prairie Burn (Jazz)

I realize that this is a technical blog without many followers, but I'm really getting into a new album and wanted to share.  If you're not interested in Jazz music, you may stop reading.

Prairie Burn is a new CD by the Mara Rosenbloom Trio.  It is modern Jazz, so don't expect anything that sounds like Tommy Dorsey, or dixieland.  Unfortunately, I don't have the background or the vocabulary to be able to tell you what it *does* sound like.  In other words, this is the worst Jazz review ever.

But since when has that ever stopped me?  :-)

Prairie Burn is great.  Listening to it takes me on an emotional journey including stops at agitation, surprise, excitement, dreaming, and satisfaction.  This music draws me in effortlessly.

So, why am I flogging it in my blog?  For the same reason I flogged Mad and Grace: I like them and I want them to reach their goals.  Yes, Prairie Burn has an Indiegogo campaign to raise money for a publicist so that Mara can get more of the attention she deserves.

Jazz has a strange following, few in number but passionate in their dedication.  I've read people bemoan the lack of young talent in the genre.  Most of the well-known artists are getting on in years and won't be around forever.  We've got to find new talent and support it.

I've done the finding for you.  Now it's your turn to help with the supporting.  :-)

Not sure you'll like the music?  http://www.mararosenbloom.com/html/listen.php  The two pieces from Prairie Burn are pretty different from each other, but the second (Turbulence) is probably more representative of the album as a whole.

Full disclosure: Mara is my daughter-in-law.  That certainly influenced me in terms of giving the music a try.  I believe it is not influencing my evaluation of its quality.  This is her third album, and while I like them all, this is the one that I feel passionate about.

Sunday, July 31, 2016

Beginner Shell Script Examples

As I've mentioned, I am the proud father of a C.H.I.P. single-board computer.  I've been playing with it for a while, and have also been participating in the community message board.  I've noticed that there are a lot of beginners there, just learning about Linux.  This collection of techniques assumes you know the basics of shell scripting with BASH.

One of the useful tools I've written is a startup script called "blink.sh".  Basically, this script blinks CHIP's on-board LED, and also monitors a button to initiate a graceful shutdown.  (It does a bit more too.)  I realized that this script demonstrates several techniques that CHIP beginners might like to see.

The "blink.sh" script can be found here: https://github.com/fordsfords/blink/blob/gh-pages/blink.sh.  For instructions on how to install and use blink, see https://github.com/fordsfords/blink/tree/gh-pages.

The code fragments included below are largely extracted from the blink.sh script, with some simplifications.

NOTE: many of the commands shown below require root privilege to work.  It is assumed that the "blink.sh" script is run as root.


1. Systemd service, which automatically starts at boot, and can be manually started and stopped via simple commands.

I'm not an expert in all things Linux, but I've been told that in Debian-derived Linuxes, "systemd" is how all the cool kids implement services and startup scripts.  No more "rc.local", no run levels, etc.

Fortunately, systemd services are easy to implement.  The program itself doesn't need to do anything special, although you might want to implement a kill signal handler to cleanup when the service is stopped.

You do need a definition file which specifies the command line and dependencies.  It is stored in the /etc/systemd/system directory, named "<service_name>.service".  For example, here's blink's definition file:

$ cat /etc/systemd/system/blink.service 
# blink.service -- version 24-Jul-2016
# See https://github.com/fordsfords/blink/tree/gh-pages
[Unit]
Description=start blink after boot
After=default.target

[Service]
Type=simple
ExecStart=/usr/local/bin/blink.sh

[Install]
WantedBy=default.target

When that file is created, you can tell the system to read it with:

sudo systemctl enable /etc/systemd/system/blink.service

Now you can start the service manually with:

sudo service blink start

You can manually stop it with:

sudo service blink stop

Given the way it is defined, it will automatically start at system boot.


2. Shell script which catches kill signals to clean itself up, including the signal that is generated when the service is stopped manually.

The blink script wants to do some cleanup when it is stopped (unexport GPIOs).

trap "blink_stop" 1 2 3 15

where "blink_stop" is a Bash function:

blink_stop()
{
  blink_cleanup
  echo "blink: stopped" `date` >>/var/log/blink.log
  exit
}

where "blink_cleanup" is another Bash function.

This code snippet works if the script is used interactively and stopped with control-C, and also works if the "kill" command is used (but not "kill -9"), and also works when the "service blink stop" command is used.


3. Shell script with simple configuration mechanism.

This technique uses the following code in the main script:

export MON_RESET=
export MON_GPIO=
export MON_GPIO_VALUE=0  # if MON_GPIO supplied, default to active-0.
export MON_BATTERY=
export BLINK_STATUS=
export BLINK_GPIO=
export DEBUG=

if [ -f /usr/local/etc/blink.cfg ]; then :
  source /usr/local/etc/blink.cfg
else :
  MON_RESET=1
  BLINK_STATUS=1
fi

The initial export commands define environment variables with default values.  The use of the "source" command causes the /usr/local/etc/blink.cfg to be read by the shell, allowing that file to define shell variables.  In other words, the config file is just another shell script that gets included by blink.  What does that file contain?  Here are its installed defaults:

MON_RESET=1       # Monitor reset button for short press.
#MON_GPIO=XIO_P7   # Which GPIO to monitor.
#MON_GPIO_VALUE=0  # Indicates which value read from MON_GPIO initiates shutdown.
MON_BATTERY=10    # When battery percentage is below this, shut down.
BLINK_STATUS=1    # Blink CHIP's status LED.
#BLINK_GPIO=XIO_P6 # Blink a GPIO.


4. Shell script that controls CHIP's status LED.

Here's how to turn off CHIP's status LED:

i2cset -f -y 0 0x34 0x93 0

Turn it back on:

i2cset -f -y 0 0x34 0x93 1

This obviously requires that the i2c-tools package is installed:

sudo apt-get install i2c-tools


5. Shell script that controls an external LED connected to a GPIO.

The blink program makes use of the "gpio_sh" package.  Without that package, most programmers refer to gpio port numbers explicitly.  For example, on CHIP the "CSID0" port is assigned the port number 132.  However, this is dangerous because GPIO port numbers can change with new versions of CHIP OS.  In fact, the XIO port numbers DID change between version 4.3 and 4.4, and they may well change again with the next version.

The "gpio_sh" package allows a script to reference GPIO ports symbolically.  So instead of using "132", your script can use "CSID0".  Or, if using an XIO port, use "XIO_P0", which should work for any version of CHIP OS.

Here's how to set up "XIO_P6" as an output and control whatever is connected to it (perhaps an LED):

BLINK_GPIO="XIO_P6"
gpio_export $BLINK_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $BLINK_GPIO"
fi
gpio_direction $BLINK_GPIO out
gpio_output $BLINK_GPIO 1    # turn LED on
gpio_output $BLINK_GPIO 0    # turn LED off
gpio_unexport $BLINK_GPIO    # done with GPIO, clean it up


6. Shell script that monitors CHIP's reset button for a "short press" and reacts to it.

The small reset button on CHIP is monitored by the AXP209 power controller.  It uses internal hardware timers to determine how long the button is pressed, and can perform different tasks.  When CHIP is turned on, the AXP differentiates between a "short" press (typically a second or less) v.s. a long press (typically more than 8 seconds).  A "long" press triggers a "force off" function, which abruptly cuts power to the rest of CHIP.  A "short" press simply turns on a bit in a status register, which can be monitored from software.

REG4AH=`i2cget -f -y 0 0x34 0x4a`  # Read AXP209 register 4AH
BUTTON=$((REG4AH & 0x02))  # mask off the short press bit
if [ $BUTTON -eq 2 ]; then :
  echo "Button pressed!"
fi

Note that I have not figured out how to turn off that bit.  The "blink.sh" program does not need to turn it off since it responds to it by shutting CHIP down gracefully.  But if you want to use it for some other function, you'll have to figure out how to clear it.


7. Shell script that monitors a GPIO line, presumably a button but could be something else, and reacts to it.

MON_GPIO="XIO_P7"
gpio_export $MON_GPIO; ST=$?
if [ $ST -ne 0 ]; then :
  echo "blink: cannot export $MON_GPIO"
fi
gpio_direction $MON_GPIO in
gpio_input $MON_GPIO; VAL=$?
if [ $VAL -eq 0 ]; then :
  echo "GPIO input is grounded (0)"
fi
gpio_unexport $MON_GPIO      # done with GPIO, clean it up


8. Shell script that monitors the battery charge level, and if it drops below a configured threshold, reacts to it.

This is a bit more subtle than it may seem at first.  Checking the percent charge of the battery is easy:

REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
PERC_CHG=$(($REGB9H))  # convert to decimal

But what if no battery is connected?  It reads 0.  How do you differentiate that from having a battery which is discharged?  I don't know of a way to tell the difference.  Another issue is what if a battery is connected and has low charge, but it doesn't matter because CHIP is connected to a power supply and is therefore not at risk of losing power?  Basically, "blink.sh" only wants to shut down on low battery charge if the battery is actively being used to power CHIP.  So in addition to reading the charge percentage (above), it also checks the battery discharge current:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))

CHIP draws over 100 mA from the battery, so I check it against 50 mA.  If it is lower than that, then either there is no battery or the battery is not running CHIP:

BAT_IDISCHG_MSB=$(i2cget -y -f 0 0x34 0x7C)
BAT_IDISCHG_LSB=$(i2cget -y -f 0 0x34 0x7D)
BAT_DISCHG_MA=$(( ( ($BAT_IDISCHG_MSB << 5) | ($BAT_IDISCHG_LSB & 0x1F) ) / 2 ))
if [ $BAT_DISCHG_MA -gt 50 ]; then :
  REGB9H=`i2cget -f -y 0 0x34 0xb9`  # Read AXP209 register B9H
  PERC_CHG=$(($REGB9H))  # convert to decimal
  if [ $PERC_CHG -lt 10 ]; then :
    echo "Battery charge level is below 10%"
  fi
fi

Sunday, June 26, 2016

snprintf: bug detector or bug preventer?

Pop quiz time!

When you use snprintf() instead of sprintf(), are you:
   A. Writing code that proactively detects bugs.
   B. Writing code that proactively prevents bugs.

Did you answer "B"?  TRICK QUESTION!  The correct answer is:
  C. Writing code that proactively hides bugs.

Here's a short program that takes a directory name as an argument and prints the first line of the file "tst.c" in that directory:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
  char path[20];
  char iline[4];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]);
  FILE *fp = fopen(path, "r");
  fgets(iline, sizeof(iline), fp);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}
Nice and safe, right?  Both snprintf() and fgets() do a great job of not overflowing their buffers.  Let's run it:

$ ./tst .
iline='#in'

Hmm ... didn't get the full input line.  I guess my iline array was too small.  But hey, at least it didn't seg fault, like it might have if I had just used scanf() or something dangerous like that!  No seg faults for me.

$ ./tst ././././././././.
Segmentation fault: 11

Um ... oh, silly me.  My path array was too small.  fopen() failed, and I didn't check its return status.

So I could, and should, check fopen()'s return status.  But that just gives me a more user-friendly error message.  It doesn't tell me *why* the file name is wrong.  Imagine the snprintf() being in a completely different area of the code.  Yes, you discover there's a bug by checking fopen(), but it's nowhere near where the bug actually is.  Same thing, by the way, with the fgets() not reading the entire line.  Who knows how much more code is going to be executed before the program misbehaves because it didn't get the entire line?

And that is my point.  Most of these "safe" functions work the same way: you pass in the size of your buffer, and the functions guarantee that they won't overrun your buffer, but give you no obvious indication that they truncated.  I.e. they don't tell you when your buffer is too small.  (To be fair, C99's snprintf() does return the length it *would* have written, but only if you remember to check it; fgets() and strncpy() give you nothing.)  It's not until later that something visibly misbehaves, and that wastes time and effort working your way back to the root cause.
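As an aside, that snprintf() return value can catch the truncation right at the call site, if you remember to look at it.  A minimal sketch of my own, reusing the same path-building example (this only covers snprintf(); the macro approach below works for the other "safe" functions too):

#include <stdio.h>

int main(int argc, char **argv)
{
  char path[20];
  /* snprintf returns the length it *would* have written (not counting the NUL). */
  int needed = snprintf(path, sizeof(path), "%s/tst.c", (argc > 1) ? argv[1] : ".");
  if (needed < 0 || (size_t)needed >= sizeof(path)) {
    fprintf(stderr, "path[] too small: need %d bytes\n", needed + 1);
    return 1;
  }
  printf("path='%s'\n", path);
  return 0;
}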

Now I'm not suggesting that we throw away snprintf() in favor of sprintf().  I'm suggesting that using snprintf() is only half the job.  How about this:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#define BUF2SMALL(_s) do {\
  assert(strnlen(_s, sizeof(_s)) < sizeof(_s)-1);\
} while (0)

int main(int argc, char **argv)
{
  char path[21];
  char iline[5];
  snprintf(path, sizeof(path), "%s/tst.c", argv[1]); BUF2SMALL(path);
  FILE *fp = fopen(path, "r");  assert(fp != NULL);
  fgets(iline, sizeof(iline), fp); BUF2SMALL(iline);
  fclose(fp);
  printf("iline='%s'\n", iline);
  return 0;
}

Now let's run it:

$ ./tst ./.
Assertion failed: (strnlen(iline, sizeof(iline)) < sizeof(iline)-1), function main, file tst.c, line 15.
Abort trap: 6
$ ./tst ././././././././.
Assertion failed: (strnlen(path, sizeof(path)) < sizeof(path)-1), function main, file tst.c, line 13.
Abort trap: 6

There.  My bugs are reported *much* closer to where they really are.

The essence of the BUF2SMALL() macro is that you should use a buffer which is at least one character larger than the maximum size you think you need.  So if you want an answer string to be able to hold either "yes" or "no", don't make it "char ans[4]", make it at least "char ans[5]".  BUF2SMALL() asserts an error if the string consumes the whole array.

One final warning.  Note that in BUF2SMALL() I use "strnlen()" instead of "strlen()".   I wrote BUF2SMALL() to be a general-purpose error checker after a variety of "safe" functions.  For example, maybe I want to use it after a "strncpy()".  Look at what the man page for "strncpy()" says:
Warning:  If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
If you use "strncpy()" to copy a string, the string might not be null-terminated, and  strlen() has a good chance of segfaulting.  So I used strnlen(), which is only "safe" in that it won't segfault.  But it doesn't tell me that the string isn't null-terminated!  So I still need my macro to tell me that the buffer is too small.  The "safe" functions only make the fuse a little longer on the stick of dynamite in your program.

Saturday, June 25, 2016

Of compiler warnings and asserts in a throw-away society

Many people despair at today's "throw away" society.  If you don't want it, just throw it away.

Programmers know this is not a recent phenomenon; they've been throwing stuff away since the dawn of high-level languages.

Actual line from code I'm doing some work on:
    write(fd, str_gpio, len);

The "write" function returns a value, which the programmer threw away.  And I know why without even asking him.  If you were to challenge him, he would probably say, "I don't need the return value, and as for prudent error checking, this program has been running without a glitch for years."

Ugh.  It's never a *good* idea to throw away return values, but I've been known to do it.  But I really REALLY don't like compiler warnings:
warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result]
     write(fd, str_gpio, len);
     ^

Well, I didn't feel like analyzing the code to see how errors *should* be handled, so I just cast "write" to void to get rid of the compile warning:
    (void)write(fd, str_gpio, len);

Hmm ... still same warning.  Apparently over 10 years ago, glibc decided to make a whole lot of functions have an attribute that makes them throw that warning if the return value is ignored, and GCC decided that functions with that attribute will throw the warning *even if cast to void*.  If you like reading flame wars, the Interwebs are chock full of arguments over this.

And you know what?  Even though I'm not sure I agree with that GCC policy, it did cause me to re-visit the code and add some actual error checking.  I figured that if write() returning an error was something that "could never happen", then let's enshrine that fact in the code:
    s = write(fd, str_gpio, len);  assert(s == len);

Hmm ... different warning:
warning: unused variable 's' [-Wunused-variable]
     s = write(fd, str_gpio, len);  assert(s == len);
     ^

Huh?  I'm using it right there!  Back to Google.  Apparently, you can define a preprocessor variable (NDEBUG) to inhibit the assert code.  Some programmers like to have their asserts enabled during testing, but disabled for production for improved efficiency.  The compiler sees that the condition testing code is conditionally compiled, and decides to play it safe and throw the warning that "s" isn't used, even if the condition code is compiled in.  And yes, this also featured in the same flame wars over void casting.  I wasn't the first person to use exactly this technique to try to get rid of warnings.
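To make the mechanism concrete, here's a small sketch of my own showing why the variable ends up "unused" (illustrative, not the actual code I was working on):

#include <assert.h>
#include <unistd.h>

static void set_gpio(int fd, const char *str_gpio, size_t len)
{
  ssize_t s;
  s = write(fd, str_gpio, len);
  /* With NDEBUG defined, <assert.h> turns this into ((void)0), so the
   * only read of 's' disappears and the compiler flags it as unused. */
  assert(s == (ssize_t)len);
}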

*sigh*

So I ended up doing what lots of the flame war participants bemoaned having to do: writing my own assert:
#define ASSRT(cond_expr) do {\
  if (!(cond_expr)) {\
    fprintf(stderr, "ASSRT failed at %s:%d (%s)", __FILE__, __LINE__, #cond_expr);\
    fflush(stderr);\
    abort();\
  }\
} while (0)
...
    s = write(fd, str_gpio, len);  ASSRT(s == len);

Finally, no warnings!  And better code too (not throwing away the return value).  I just don't like creating my own assert. :-(

Tuesday, May 24, 2016

TCP flow control with non-blocking sends: EAGAIN

So, let's say you're sending data on a TCP socket faster than the receiver can unload it. The socket buffers fill up. Then what happens? The send call returns fewer bytes sent than were requested. Everybody knows that. (Interestingly, http://linux.die.net/man/2/send does not mention this behavior, but I see it during testing.)

But what if the previous send exactly filled the buffer so that your next send can't put *any* bytes in? Does send return zero? Apparently not. It returns -1 with an errno of EAGAIN or EWOULDBLOCK (also verified by testing).  If I ever knew this, I forgot it till today.

Finally, here is something I did already know, but rarely include in my code, and I should (from http://linux.die.net/man/2/send):
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and the requested operation would block. POSIX.1-2001 allows either error to be returned for this case, and does not require these constants to have the same value, so a portable application should check for both possibilities.
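For my own future reference, here's a minimal sketch of a non-blocking send helper that checks for both (my code, not from the man page):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Returns bytes sent (possibly fewer than len), 0 if the socket buffer is
 * completely full, or -1 on a real error. */
ssize_t try_send(int sock, const char *buf, size_t len)
{
    ssize_t n = send(sock, buf, len, 0);
    if (n >= 0)
        return n;                           /* partial send is possible */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                           /* buffer full; poll for POLLOUT and retry */
    perror("send");
    return -1;
}

The caller keeps track of how much of the buffer actually went out and re-offers the rest once the socket becomes writable again.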

Sunday, January 10, 2016

Saying goodbye to a bit of personal history

Ever since I was *very* young, I've been interested in science and technology.  At some point in my teens, maybe 40 years ago, I wanted a better VOM (Volt-Ohm-Milliamp meter) than the junky one I had picked up, so I did some research and spent precious funds on a high-impedance FET meter:



It saw pretty heavy use for about 5 years, but as I transitioned from electronics to digital logic, and from that to software, my need for it dropped.  I've probably used it twice in the past 15 years, probably for checking if an electrical outlet is live.

As my previous post indicates, I've just gotten a single-board computer, and I was trying to indirectly measure the value of the pull-up resistor on an open-collector output.  I needed a reasonably accurate, high-impedance meter, so I got out my old FET meter.

Alas, the two small selector switches were frozen.  Not sure why or how -- it's a *switch* for goodness sake -- but I can't use it if I can't turn it on.  I'll take it apart, but I don't have high hopes.

Its passing is a sad event for me, but why?  Is it just nostalgia?  Longing for a simpler time?  Missing my childhood?  I think it's more than that.  There are certain things that have come to represent turning points in my life.  The meter may not have *caused* a significant shift in my life path, but it had come to represent it.  And maybe it's a mortality thing too, like a piece of me died.

Oh well, I'll probably get a cheap DVM.

Thursday, December 31, 2015

C.H.I.P. - a small, cheap computer

UPDATE: C.H.I.P. is dead

Or rather, the company is dead.  I will keep my CHIP content laying around, but I won't be doing much with it.  Eventually, I'll probably move to Raspberry Pi.


I just received the C.H.I.P. single-board computer whose kickstart I supported.

I suspect that anybody reading this blog knows what a Raspberry Pi is.  CHIP is like that, with its own spin which I found appealing.  Its goals:
  • Low cost.  The base price is currently $9 (although see the $5 Pi-zero).
  • Ease and speed of getting started.  It's literally ready to surf the net right out of the box.
  • Small size.  It's smaller around than a business card (although obviously it is thicker).
Accessories most people will use to try it out:
  • Power supply.  It has a micro USB connector, so you might already have a charger that will work.  But be careful: even though the doc says it draws 300 ma peak, get one that can handle over an amp.  My 700 ma supply craps out.  My 1.5 amp does fine.
  • Audio/video cable.  If you have a video camera, you might already have the right one (although I've heard some are wired differently).  If not, you can get one from the CHIP store.
  • TV with 3 "RCA" composite audio/video jacks.  Most TVs have them.
  • USB keyboard.
  • USB mouse.
  • Powered USB hub to power the keyboard and mouse.  Maybe you already have one.  But note that this hub will *not* power the CHIP.  You'll still need the micro USB supply.
  • Case.  Completely optional, but at $2, I suggest getting it.
I did go that route for my first bootup.  Being packrats, my wife and I had all the above items, so we were able to get the GUI operational in about 15 minutes.  Not being very familiar with Linux GUIs, it took another 15 minutes to be surfing the web.  Yep, it really is a general-purpose computer.  But being composite video, it has very low resolution.  And drawing a couple of watts, it isn't very fast (compared to modern laptops).  And being Linux, it doesn't have much PC-like software out of the box.  But really, who would want to use one of these as a replacement for your laptop?  ... Oh.  Um.  Never mind.  :-)


If you aren't interested in the GUI, you don't need any of the above accessories.  All you need is a laptop and a micro USB cable (I had 3 laying around).  Again, be careful: we had a couple cables that were wired for power only and didn't pass signals.

Here's a minimal "getting started" procedure:
    http://www.geeky-boy.com/w/CHIP_startup.html

Thursday, October 15, 2015

Windows corrupting UDP datagrams


We just discovered that under a somewhat unlikely set of circumstances, Microsoft's Windows 7 (SP 1) will corrupt outgoing UDP datagrams.  I have a simple demonstration program ("rsend") which reliably reproduces the bug.  (I'll be pointing my Microsoft contact at this blog post.)

This bug was discovered by a customer, and we were able to reproduce it locally.  I wish I could take the credit, but my friend and colleague, Dave Zabel, did most of the detective work.  And amazing detective work it was!  But I'll leave that description for another day.  Let's concentrate on the bug.


CIRCUMSTANCES FOR THE BUG

1. UDP protocol.  (Duh!)
2. Multicast sends.  (Does not happen with unicast UDP.)
3. A process on the same machine must be joined to the same multicast group as being sent.
4. Windows' IP MTU size set to smaller than the default 1500 (I tested 1300).
5. Sending datagrams large enough to require fragmentation by the reduced MTU, but still small enough *not* to require fragmentation with a 1500-byte MTU.

With that mix, you stand a good chance of the outgoing data having two bytes changed.  It seems to be somewhat dependent on the content of the datagram.  For example, a datagram consisting mostly of zeros doesn't seem to get corrupted.  But it's not that hard to find datagram content that *is* consistently corrupted, so my "rsend" demonstration program has one such datagram hard-coded.

Regarding #3, for convenience the rsend program contains code to join the multicast group, but I've also reproduced it without rsend joining, and instead running "mdump" in a different window.

Finally, be aware that I have not done a bunch of sensitivity testing.  I.e. I haven't tried different datagram sizes, different multicast groups, different MTU settings, jumbo frames, etc.  Nor did I try different versions of Windows (only 7), different NICs, etc.  Sorry, I don't have time to experiment.


BUG DEMONSTRATION

This procedure assumes that you have a Windows machine with its MTU at its default of 1500.  (You change it below.)

1. Build "rsend.c" on Windows with VS 2005.  Here's how I build it (from a Visual Studio command prompt):

  cl -D_MT -MD -DWIN32_LEAN_AND_MEAN -I. /Oi -Forsend.obj -c rsend.c

  link /OUT:rsend.exe ws2_32.lib mswsock.lib /MACHINE:I386 /SUBSYSTEM:console /NODEFAULTLIB:LIBCMT rsend.obj

  mt -manifest rsend.exe.manifest -outputresource:rsend.exe;1

2. Run the command, giving it the ip address of the windows machine's interface that you want the multicast to go out of.  For single-homed hosts, just give it the IP address of the machine.  For example:
    rsend 10.1.2.3
To make the tool easy to use, it hard codes the multicast group 239.196.2.128 and destination port 12000.

3. On a separate machine, do a packet capture for that multicast group.  Note that the packet capture utilities I know of (wireshark, tcpdump) do *not* tell the kernel to actually join the multicast group.  I generally deal with this using the "mdump" tool.  Run it in a separate window.  For example:
    mdump 239.196.2.128 12000

In the packet capture, look at the 1278th and 1279th bytes of the UDP datagram data: they should both be zero.  Here they are, with a few bytes preceding them:
    0x75,0x34,0x34,0xa4,0xc5,0xb4,0x00,0x00
NOTE: at this point, the datagram will fit in a single ethernet frame, so no IP fragmentation happens.

4.  While rsend is running, open a command prompt with administrator privilege (right-click on "command prompt" icon and select "run as administrator") and enter:
    netsh interface ipv4 set subinterface "Local Area Connection" mtu=1300 store=persistent

Like magic, bytes 1278 and 1279 of the outgoing UDP datagrams change their values!  Note that with an MTU of 1300, this UDP datagram now needs to be fragmented.  If using wireshark, you'll need to examine the *second* packet to see the entire UDP datagram and get to byte 1278.  I consistently see 0x62,0x27, but that seems to be dependent on datagram content as well.

5. Undo the MTU change:
    netsh interface ipv4 set subinterface "Local Area Connection" mtu=1500 store=persistent

Magically, the bytes go back to their correct values of 0x00,0x00.

Note: if you comment out the setsockopt of IP_ADD_MEMBERSHIP, the corruption will not happen.  The multicast datagrams will still go out, but they will be undamaged when the MTU is reduced.  The obvious suspect is the internal loopback.


SOLUTION

The only solution I know of is to leave the Windows IP MTU at its default of 1500.


WHY SET MTU 1300???

I don't know why our customer set it on one of his systems.  But he said that he would just set it back to 1500, so it must not have been important.

If you google "windows set mtu size" you'll find people asking about it.  In many cases, the user is trying to reach a web site which is across a VPN or some other private WAN link which does not have an MTU of 1500.  The way TCP works is that it tries to send segments (a TCP segment is basically an IP datagram) as large as possible while avoiding IP fragmentation.  So a TCP instance sending data might start with a 1500-byte segment size.  If a network hop in-transit cannot handle a segment that large, it has a choice: either fragment it or reject it.  TCP explicitly sets an option to say, "do not fragment," so the network hop drops the segment.  It is supposed to return an ICMP error, which the sender's TCP instance will use to reduce its segment size.  This algorithm is known as TCP's "path MTU discovery".

But many network components either do not generate ICMP errors, or do not forward them.  This is supposedly done in the name of "security" (don't get me started).  This breaks path MTU discovery.  But the segments are still being dropped, so eventually the TCP sender times out and the web site doesn't work.  Apparently this is fairly rare, but it does happen.  Hence the  "set MTU" questions.  If the user reduces IP's MTU setting, it artificially reduces the maximum segment size used by TCP.  Do a bit of experimenting to find the right value, et Voila!  (French for "finally, I can download porn!")

So, how could Microsoft possibly not find this during their extensive testing?  Well first of all, UDP use is rare compared to TCP.  Multicast UDP is even more rare.  Sending UDP multicast datagrams larger than MTU is getting close to unicorn rare.  And doing all that with the IP MTU set to a non-standard value?  Heck, I consider myself to be a pretty rigorous tester, and I would never have tried that.


UPDATE:

Thanks to Mr. Anonymous for asking the question about NIC offloading.  We had considered the question previously (see my response in the comments), but in composing my response, I got to thinking about the offset of the corruption.

It's always in the second packet of the fragmented datagram, and always at 1278.  But that offset is with respect to the start of the UDP payload.  What is the offset with respect to the start of the second packet?  I didn't look at this before since Wireshark's ability to reassemble fragmented datagrams is so handy.  But I went ahead and clicked the "Frame" tab and saw that the corruption happens at offset 40 from the start of the packet.

Guess where the UDP checksum belongs in an UNfragmented datagram!  Yep, offset 40 (14-byte Ethernet header, plus 20-byte IP header, plus the checksum's offset of 6 within the UDP header).  Something decided to take the second packet of the fragmented datagram and insert a separate UDP checksum where it *should* go if it were not a fragment.
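Here's a tiny C sketch (my own, not from rsend.c) that adds up the header fields to confirm that offset:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct eth_hdr { uint8_t dst[6], src[6]; uint16_t ethertype; };                /* 14 bytes */
struct ip_hdr  { uint8_t ver_ihl, tos; uint16_t tot_len, id, frag_off;
                 uint8_t ttl, proto; uint16_t hdr_cksum; uint32_t src, dst; }; /* 20 bytes */
struct udp_hdr { uint16_t src_port, dst_port, len, cksum; };                   /* 8 bytes */

int main(void)
{
    size_t off = sizeof(struct eth_hdr) + sizeof(struct ip_hdr)
               + offsetof(struct udp_hdr, cksum);
    printf("UDP checksum offset in an unfragmented frame: %zu\n", off);  /* prints 40 */
    return 0;
}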

This still seems like a software bug in Windows.  Sure, maybe the NIC is doing the actual calculation.  Maybe it's not.  But it only happens when IP is configured for a non-standard MTU.  If I have MTU=1500 and I send a fragmented datagram, there is no corruption.


UPDATE 2:

I did some experimenting with datagram size and verified something that I suspected.  When the MTU is set to 1300, the corruption only happens when the datagram size is such that a 1500-byte MTU would *not* fragment but a 1300-byte MTU does.  I.e. there is a size range of 200 bytes (the difference between 1300 and 1500).   This is another reason Microsoft's testers apparently didn't discover this.  Even if they tested fragmentation with non-standard MTUs, would they think to test a size in that specific range?  With the benefit of hindsight, sure, it's "obvious".  But if you're just testing combinations of configurations, you would just pick the "send fragments" combination, which is probably chosen to fragment with MTU 1500.  (FYI: I've updated the original post to refine the conditions of the bug.)

I'm normally not a Microsoft cheerleader, so it feels weird to be defending them on this bug.  :-)


UPDATE 3:

Since we noticed that the corruption always happens at offset 40 in the second packet, I decreased the size of the datagram to only include half of the corrupted pair.  Sure enough, the last byte of the datagram got corrupted.  And the second corrupted byte?  Who knows.  I kind of hoped it would corrupt something in Windows and maybe blue-screen it, but no such luck.  I didn't "see" any misbehavior.

Does that mean there *was* no misbehavior?  NO!  The outgoing datagrams suddenly had bad checksums!  Meaning that the mdump tool stopped receiving them since Linux discards datagrams with bad checksums.  But tcpdump captures the packets *before* UDP discards them, so you can see the bad checksums.

I kept decreasing the size of the datagram till it was 1273 bytes.  That still triggers fragmentation when MTU=1300.  The outgoing datagrams had no visible corruption but had bad checksums.  Reduce one more byte, and the datagram fits in one packet.  Suddenly the checksums are OK.

I tried a few things, like sending packets hard, and varying their sizes, but other than the bad checksums I could not see any obvious Windows misbehavior.

I guess my days as a white-hat hacker are over before they started.  (Did I get the tenses right on that sentence?)

Well, I think I'm done experimenting.  If anybody else reproduces it, please let me know your Windows version.


UPDATE 4:

I heard back from my contact at MS.  He said:

"We've looked into this, and see what is happening.  If the customer needs to pursue this rather than using a work around (e.g. not setting the MTU size on the loopback path to a different size than the non-loopback interface, etc.) they will need to open a support ticket.  Thank you for letting me know about this."
Which I suspect translates to, "We'll fix it in a future version, but not urgently.  If you need urgency, pay for support."  :-)


REDDIT:

Finally, Hi Reddit users!  Thanks for pushing the hits on this post to many times the total hit count for the whole rest of the blog.  :-)  I read the comments and saw that my first update had already been noticed by somebody else.

Also, something a lot of Reddit comments have fixated on is my claim that UDP multicast is rare.  I meant that the number of programs (and programmers) that use it is very small compared to all software, not that multicast is hardly ever used.  As pointed out, there are several areas of network infrastructure which are multicast-based, so it gets used all the time.  My point is that the number of programmer-hours spent *writing and testing* multicast-based software is very small compared to the overall networking software field.  And as such, it tends not to be as burned-in as, say, TCP.

Also, in most multicast software that I have learned the guts of, the programmer makes sure that datagram sizes are kept small so as to avoid fragmentation.  This seems to be due to the commonly-held idea that you should *never* let IP fragment, which I think comes from the fact that, at least historically, router performance is hurt if it has to perform fragmentation while a datagram is in transit.  I'm not sure if this is still true for modern routers, but historically fragmentation needed to be handled by the supervisory processor.  For the odd packet every now and then, no problem.  For high-rate data flows, it can kill a router.

That seems to be the basis on which a lot of multicast software avoids fragmentation, preferring instead to split large messages into multiple datagrams.  But this reasoning is often not applicable.  Our software is intended primarily to be used within a single data center.  When we send a 2K datagram, no router needs to worry about fragmenting it; the sending host's IP stack splits the datagram into packets before they hit the wire.  The intermediate switches and routers all have 1500 MTU, allowing the packets to traverse unmolested.  The final receiving host(s) reassemble and pass the datagram to user space.  This has a noticeable advantage for high-performance applications since the same amount of user data is passed with fewer system calls (the overhead of switching between user and kernel space is significant).

So while I'm sure our software is not alone in sending fragmented multicast datagrams, I stand behind my claim that sending fragmented multicast is relatively rare.

Wednesday, September 30, 2015

Coding Pet Peeves

Ok, these things are not the end of the world.  During a code inspection, I might not even bring them up.  Or maybe I would - it would depend on my mood.


INT OR BOOLEAN

    if (use_tls != 0) {
        /* Code having to do with TLS */
    }

So, what is "use_tls"?  It's obviously somewhat of a flag - if it's non-zero then we need to do TLS stuff - but is it a count?  If it's a count, then it should have been named "use_tls_cnt", or maybe just "tls_cnt".  If it's being used as a boolean, then it should have been tested as a boolean:

    if (use_tls) {
        /* Code having to do with TLS */
    }

And yes, this is actual code that I've worked on.  It was being used as a boolean, and should have been tested as one.  Is this a big deal?  No, like I said, I might not have brought it up in a review.  But I am a believer that the code should communicate information to the reader.  The boolean usage tells the reader more than the numeric version (and is easier on the eyes as well).


MALLOC AND FREE

    state_info = OUR_MALLOC_1(sizeof(state_info_t));
    /* Code using state_info */
    free(state_info);

Again, actual code I've worked on.  We have a code macro for malloc which does some useful things.  But the author couldn't think of anything useful to do with free so he didn't make a free macro.  He should have.  There have been at least two times (that I know of) when it would have been tremendously useful.  One time we just wanted to log a message for each malloc and free because we suspected we were double-freeing.  Another time we thought that a freed structure was being accessed, and we wanted free to write a pattern on the memory before freeing.

"No problem," you say, "just create that macro and name it free."  Nope.  There are other mallocs: OUR_MALLOC_2, OUR_MALLOC_3, etc (no they aren't actually named that).  And some of the code doesn't use any of the macros, it just calls malloc directly!  For that second feature (writing a pattern before freeing), you need special code in the malloc part as well as the free part.  These all should have been done consistently so that OUR_MALLOC_1 only works with OUR_FREE_1, etc.  That would have allowed us to do powerful things, but as it is, we couldn't.


UNNATURAL CONDITIONAL ORDER

I bet there is an official name for this (apparently it's called a "Yoda condition"):

    if ('q' == cmd) break;

I've always felt that code should be written the way you would explain it to somebody.  Would you say, "If q is the command, then exit the loop."  No.  You would say, "If the command is q, then exit the loop."  I've seen this construct a lot where they put the constant in front of the comparison operator, and it is almost always awkward.

Oh, I know there's a good reason for it: it's a form of defensive programming.  Consider:

    if (cmd = 'q') break;

Oops, forgot an equals.  But the compiler will happily assign 'q' to cmd, test it against zero, and always do the break.  Swap the variable and the constant and you get a compile error.

But come on.  Gcc supports the "-Wparentheses" option which warns about the above.  In fact, gcc supports a *lot* of nice optional warnings, which people should be using all the time.  "-Wall" is your friend!


DECLARE CLOSE TO FIRST USE

Ok, I'm more of a recent convert here.  I've written hundreds of functions like this:

    void my_funct(blah blah blah)
    {
        int something = 0;
        int something_else = 1;
        .... and another 10 or 15 variables

In the REALLY old days, you *had* to declare all local variables immediately after the function's open brace.  But modern C compilers let you declare variables anywhere you want.  Variables should be declared and initialized close to their first use.  This has two advantages: as the reader is reading the code and encounters a variable, he can see right away what its type and initial value are without having to scroll up and back down again.  But more importantly, when the reader sees the variable declared, he knows that the variable is not used anywhere above that declaration.  Whereas if it is declared at the top of the function, he has to examine all the function code above the first use to make sure that it isn't secretly being changed from its initial value.
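A contrived sketch of the difference (my own example, not from the code I was working on):

#include <stddef.h>

int sum_declared_up_front(const int *vals, size_t n)
{
    size_t i;
    int sum = 0;     /* declared far from its use; reader must scan everything below */

    /* ... imagine 30 lines of unrelated code here ... */

    for (i = 0; i < n; i++)
        sum += vals[i];
    return sum;
}

int sum_declared_at_first_use(const int *vals, size_t n)
{
    /* ... imagine the same 30 lines here ... */

    int sum = 0;     /* declared and initialized right where the work starts */
    for (size_t i = 0; i < n; i++)
        sum += vals[i];
    return sum;
}

In the second version the reader knows at a glance that nothing above the declaration could have touched sum.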


POINTLESS COMMENTS

    i++;  /* increment i */

Oh, thank goodness somebody taught me what the "++" operator does!  Note to programmer: you don't need to teach me the language.

    fclose(file[i]);  /* close the ith file */

Are you sure?  I thought the "fclose" function *opened* a file.  Note to programmer: you don't have to teach me the standard library functions.  They all have man pages.

    session_cleanup(client[i]);  /* clean up the client session */

OK, session_cleanup() probably doesn't have a man page.  But the comment doesn't contain any more information than the function and variable names!

One might add comments to say what the variable "i" is for and what the files in the "file[]" array are for.  That at least gives information that a reader might not know.  But even those should not be comments, they should be descriptive variable names.

Most comments I've encountered should simply be removed as unnecessary clutter that makes it hard to read the code.  The best comments don't try to explain *what* the code is doing, they explain *why* it is being done.  Usually that is pretty obvious, so only give the "why" explanations when the answer may not be obvious.

    client_num++;

    /* Usually we would close the log file *after* we clean up the
     * session, in case the cleanup code wants to log an error.  But
     * the session_cleanup() function is legacy and is written to
     * open and close the log file itself on error.  */
    fclose(log_file[client_num]);
    session_cleanup(client[client_num]);

That's better.


PETTY PEEVE

"Really Steve?  Brace indention?  What are you, 12?"

(grumble)  I know, I should have grown past this way back when I realized that emacs v.s. vi just doesn't matter.  And really, I don't much care between the various popular styles.  If writing from scratch, I put the open brace at the end of the "if", but I'm also happy to conform to almost any style.

But I just recently had the pleasure of peeking inside the OpenSSL source code and I saw something I hadn't seen for many years:

    if (blah)
        {
        code_to_handle_blah;
        }

It's called "Whitesmiths" and supposedly has advantages.  But those advantages seem more philosophical than practical ("the alignment of the braces with the block that emphasizes the fact that the entire block is conceptually (as well as programmatically) a single compound statement").  Um ... sorry, I prefer actual utility over making a subtle point.  Having the closing brace unindented makes it easier to find the end of the block.  The Jargon File claims that "Surveys have shown the Allman and Whitesmiths styles to be the most common, with about equal mind shares."  Really?  I've been a C programmer for over 20 years at 7 different companies, and none of those places used Whitesmiths.  This is only the second time I've even seen it.  Equal mind share?  Here's a survey which has Allman 17 times as popular as Whitesmiths.  I think the Jargon File got it wrong.

And yeah, I know that if I were immersed in it for a few months, I wouldn't even notice it any more.  Just like emacs vs. vi, it doesn't really matter, so long as a project is internally consistent.  But hey, this is *my* rant, and I don't like it!  So there.

Thursday, September 10, 2015

RS-232 serial ports

Remember RS-232?  It is rare nowadays to find a need for RS-232, but it can come up.  In my case, a couple of machines in our lab have their system consoles tied to dedicated serial lines which expect a "dumb-ish" terminal (I actually owned a real-life VT-100 for a while).  Again, it is rare that one needs the system console, but especially when you have hardware issues, sometimes it is the only way to get in.

RS-232 is an interface specification which was basically invented to do one thing: connect a MODEM to either a computer or a terminal.  If you look at the RS-232 spec, you will see very MODEM-ish signals, like carrier detect, ring indicator, etc.

The RS-232 interface has two different sides, one named "DTE" and the other named "DCE".  The DTE side is for a terminal or computer.  The DCE side is for a MODEM.  RS-232 is a standard which, among other things, defines which pins on a connector are for which purpose.  If you look at an RS-232 pinout, you will see names on the pins.  For example, on a DB-25 connector, pin 2 is labeled "Tx" (on the smaller DB-9 it's pin 3).  It is named from the point of view of the DTE, which is to say that the DTE side of the interface is supposed to output to the Tx pin the serial data it is transmitting to the DCE.  The DCE side is supposed to input from the Tx pin the serial data it is receiving from the DTE.  So the names make sense for the DTE (it transmits on Tx and receives on Rx) and are backwards for the DCE (it receives on Tx and transmits on Rx).  DTE devices normally have male connectors (with pins) and DCE devices normally have female connectors, although back in the old days, I did find the odd device which violated this convention.

These days, we rarely use MODEMs.  Most of the time we deal with RS-232, we are trying to connect two DTE devices together, like connecting a terminal to a computer.  You can't just use a normal cable for that since both the terminal and the computer are DTEs and will try to transmit on Tx (pin 2).  So to get two DTEs to talk, you need a "null modem".  This is usually a specially-wired cable with a female connector on both ends.  For example, one side's Tx pin is wired to the other side's Rx pin, and vice versa.  See https://en.wikipedia.org/wiki/Null_modem for details.

Modern laptops often don't have serial ports any more, so you'll probably need a USB serial adapter.  I have a Belkin which connects to a Windows PC and presents as COM3 (a driver is required).  I use putty, which lets you select "serial" and specify which COM port.  I also have a TRENDnet which connects to my Mac and presents as /dev/tty.usbserial (it also requires a driver).  I use the normal Mac "terminal" application and enter:

    screen /dev/tty.usbserial 9600

Both of my USB devices are male DTE.  To use either one as a serial console, I need to use a female-to-female null modem.
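
For completeness, here's roughly what a terminal program has to do under the hood to put a port into 9600-8N1 raw mode.  This is a minimal termios sketch; the function name and device path are mine, not anything taken from putty or screen:

    #include <fcntl.h>
    #include <termios.h>
    #include <unistd.h>

    /* Open a serial device (e.g. "/dev/tty.usbserial") at 9600 baud,
     * 8 data bits, no parity, raw I/O.  Returns a file descriptor or -1. */
    int open_serial_console(const char *dev)
    {
        int fd = open(dev, O_RDWR | O_NOCTTY);
        if (fd < 0)
            return -1;

        struct termios tio;
        if (tcgetattr(fd, &tio) < 0) {
            close(fd);
            return -1;
        }
        cfmakeraw(&tio);                /* 8 data bits, no parity, no echo */
        cfsetispeed(&tio, B9600);       /* 9600 baud in ... */
        cfsetospeed(&tio, B9600);       /* ... and out */
        if (tcsetattr(fd, TCSANOW, &tio) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }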

Saturday, July 18, 2015

Rsync, Ssh as Root, and Solaris

In my distant past, I dabbled in a bit of Unix system administration.  (How distantly in the past?  SunOS 4.2.  Yeah, I'm old.)

This year, my department lost its system administrator and the decision was made not to replace him.  So I and a co-worker (who has also dabbled) are now responsible for administering a lab with 50+ machines, most of them running Linux, Windows, or Solaris, but with a scattering of other Unixes like AIX, HP-UX, and FreeBSD.

To make our lives easier, I took the probably inadvisable step of using SSH shared keys to allow our primary file server to log into all other Unix machines as root without password.  (Yes, I know - this allows an intruder who gains root access to our main file server to have root access to all other Unix systems in the lab.  However, understand that the main file server is the only system that particularly *needs* to be secure; the other systems are primarily for testing.  And I don't allow root access in the opposite direction, from test system to file server.)

Anyway, one thing I use this for is to back up selected system areas, like /etc, using rsync.  I was successfully using rsync to back up /etc on most of our test systems, but it wasn't working on Solaris.  A bit of detective work revealed the following (run from our main file server):

    # ssh SolarisBox env | grep PATH
    PATH=/usr/sbin:/usr/bin

This, in spite of the fact that root's .profile set PATH to include /usr/local/bin, which is where rsync lives on our Solaris boxes.  Turns out that on Solaris, .profile is only executed for interactive shells.  I needed to edit /etc/default/login and add:

    SUPATH=desired path

Et voilà!  (Like my Unicode there?)  Backups achieved.

Don't worry, I don't imagine that I'm any kind of expert at system administration.  I won't be posting much on the subject.  :-)

Wednesday, July 8, 2015

UTF-8, Unicode, and International Character Sets (Oh My!)

I'm a little embarrassed to admit that in my entire 35-year career, I've never really learned anything about international character sets (Unicode, etc.).  And I still know very little.  But recently I had to learn enough to at least be able to talk somewhat intelligently, so I thought I would speed up somebody else's learning a bit.

A more in-depth page that I like is: http://www.cprogramming.com/tutorial/unicode.html

Unicode - a standard which, among other things, specifies a unique code point (numeric value) to correspond with a printed glyph (character form).  The standard defines a code point space from 0 to 1,114,111 (0 to 0x10FFFF), but as of June 2015 it only assigns 120,737 of those code points to actual glyphs.

UTF-8 - a standard which is basically a means to encode a stream of numeric values in the range 0 to 2,147,483,647 (0 to 0x7FFFFFFF), using a variable number of bytes (1-6) to represent each value, such that numerically smaller values are represented with fewer bytes.  For example, the number 17 requires a single byte to represent, whereas the number 2,000,000,000 requires 6 bytes.  Notice that the largest Unicode code point only requires 4 bytes to represent (indeed, the current UTF-8 RFC restricts the encoding to 4 bytes for exactly that reason).
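
To make the byte layout concrete, here is a sketch of an encoder for the range Unicode actually uses.  The helper is my own, not a library function:

    #include <stddef.h>
    #include <stdint.h>

    /* Encode one code point (0 - 0x10FFFF) as UTF-8.  Returns the number
     * of bytes written (1-4), or 0 if the value is out of range.
     * E.g. utf8_encode(241, out) yields the two bytes 0xC3, 0xB1. */
    size_t utf8_encode(uint32_t cp, unsigned char out[4])
    {
        if (cp <= 0x7F) {                   /* 1 byte:  0xxxxxxx */
            out[0] = cp;
            return 1;
        } else if (cp <= 0x7FF) {           /* 2 bytes: 110xxxxx 10xxxxxx */
            out[0] = 0xC0 | (cp >> 6);
            out[1] = 0x80 | (cp & 0x3F);
            return 2;
        } else if (cp <= 0xFFFF) {          /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            out[0] = 0xE0 | (cp >> 12);
            out[1] = 0x80 | ((cp >> 6) & 0x3F);
            out[2] = 0x80 | (cp & 0x3F);
            return 3;
        } else if (cp <= 0x10FFFF) {        /* 4 bytes: 11110xxx 10xxxxxx ... */
            out[0] = 0xF0 | (cp >> 18);
            out[1] = 0x80 | ((cp >> 12) & 0x3F);
            out[2] = 0x80 | ((cp >> 6) & 0x3F);
            out[3] = 0x80 | (cp & 0x3F);
            return 4;
        }
        return 0;
    }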


So the Unicode standard specifies that a particular numeric value corresponds to a specific printed character form, and UTF-8 specifies a way to encode those numeric values.  Although the UTF-8 encoding scheme could theoretically be used to encode numbers for any purpose, it was designed to encode Unicode characters reasonably efficiently.  The efficiency derives from the fact that Unicode biases the most-frequently used characters to smaller numeric values.

UTF-8 and Unicode were designed to be backward compatible with 7-bit ASCII: the 128 ASCII characters (0-127) have the same numeric values as the first 128 Unicode code points, and UTF-8 represents each of those values with a single byte.  Thus the string "ABC" is represented in ASCII as 3 bytes with values 0x41, 0x42, 0x43, and those same three bytes are the valid UTF-8 encoding of the same string in Unicode.  Thus, an application designed to read UTF-8 input is able to read plain ASCII text.  And an application which only understands 7-bit ASCII can read a UTF-8 file which restricts itself to those first 128 code points.

Another nice thing about UTF-8 is that the bytes of a multi-byte character cannot be confused with normal ASCII characters; every byte of a multi-byte character has the most-significant bit set.  For example, the tilde-n character "ñ" has Unicode code point 241, and the UTF-8 encoding of 241 is the two-byte sequence 0xC3, 0xB1.  It is also easy to differentiate the first byte of a multi-byte character from subsequent bytes.  Thus, if you pick a random byte in a UTF-8 buffer, it is easy to detect whether the byte is part of a multi-byte character, easy to find that character's first byte, and easy to move past it to the next character.
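
That "find the first byte" trick boils down to checking the top two bits of each byte.  A small sketch, with helper names of my own invention:

    /* Continuation bytes look like 10xxxxxx; lead bytes are either plain
     * ASCII (0xxxxxxx) or start with 11. */
    static int is_continuation(unsigned char c)
    {
        return (c & 0xC0) == 0x80;
    }

    /* Given a pointer somewhere inside a UTF-8 buffer, back up to the
     * first byte of the character containing it. */
    const char *utf8_char_start(const char *buf, const char *p)
    {
        while (p > buf && is_continuation((unsigned char)*p))
            p--;
        return p;
    }

    /* Advance to the first byte of the next character (stops at '\0'). */
    const char *utf8_next_char(const char *p)
    {
        p++;
        while (is_continuation((unsigned char)*p))
            p++;
        return p;
    }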

One thing to notice about UTF-8 is that it is trivially easy to contrive input which is illegal and will not parse properly.  For example, the byte sequence 0xC3, 0x41 is illegal in UTF-8 (0xC3 introduces a 2-byte character, and all bytes in a multi-byte character *must* have the most-significant bit set).


Other Encoding Schemes

There are other Unicode encoding schemes, such as UCS-2 and UTF-16, but their usage is declining.  UCS-2 cannot represent all characters of the current Unicode standard, and UTF-16 suffers from problems of ambiguous endianness (byte ordering).  Neither is backward compatible with ASCII text.

Another common non-Unicode-based encoding scheme is ISO-8859-1.  Its advantage is that all characters are represented in a single byte.  Its disadvantage is that it only covers a small fraction of the world's languages.  It is backward compatible with ASCII text, but it is *not* compatible with UTF-8.  For example, the byte sequence 0xC3, 0x41 is a perfectly valid ISO-8859-1 sequence ("ÃA") but is illegal in UTF-8.  According to Wikipedia, ISO-8859-1 usage has been declining since 2006 while UTF-8 usage has been increasing.

There are a bunch of other encoding schemes, most of which are variations on the ISO-8859-1 standard, but they represent a small installed base and are not growing nearly as fast as UTF-8.

Unfortunately, there is not a reliable way to detect the encoding scheme being used simply by examining the input.  The input data either needs to be paired with metadata which identifies the encoding (like a mime header), or the user simply has to know what he has and what his software expects.

Most Unixes have available a program named "iconv" which will convert files of pretty much any encoding scheme to any other.  The user is responsible for telling "iconv" what format the input file has.
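
The same conversions are available programmatically through the iconv(3) API (iconv_open / iconv / iconv_close).  A bare-bones sketch, with most error handling omitted and the encoding names chosen just as an example:

    #include <iconv.h>
    #include <string.h>

    /* Convert an ISO-8859-1 string to UTF-8.  Real code must handle
     * E2BIG, EILSEQ, etc.; this sketch just reports success or failure. */
    int latin1_to_utf8(const char *in, char *out, size_t outsize)
    {
        iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
        if (cd == (iconv_t)-1)
            return -1;

        char *inp = (char *)in;          /* iconv() wants a non-const pointer */
        size_t inleft = strlen(in);
        size_t outleft = outsize - 1;    /* leave room for the '\0' */
        size_t rc = iconv(cd, &inp, &inleft, &out, &outleft);
        iconv_close(cd);
        if (rc == (size_t)-1)
            return -1;
        *out = '\0';                     /* out has advanced past the output */
        return 0;
    }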


Programming with Unicode Data

Java and C# have significant features which allow them to process Unicode strings fairly well, but not perfectly.  The Java "char" type is 16 bits, which, at the time Java was being defined, was adequate to hold any Unicode code point.  But Unicode evolved to cover more writing systems and 16 bits is no longer adequate, so Java now uses UTF-16, which encodes a Unicode character in either 1 or 2 of those 16-bit chars.  Not being much of a Java or C# programmer, I can't say much more about them.

In C, Unicode is not handled natively at all.

A programmer needs to decide on an encoding scheme for in-memory storage of text.  One disadvantage to using something like UTF-8 is that the number of bytes per character varies, making it difficult to randomly access characters by their offset.  If you want the 600th character, you have to start at the beginning and parse your way forward.  Thus, random access is O(n) time instead of the O(1) time that usually accompanies arrays.

One approach that evolved a while ago was the use of a "wide character", with the type "wchar_t".  This lets you declare an array of "wchar_t" and randomly access it in O(1) time.  In earlier days, it was thought that Unicode could live within the range 1-65535, so the original "wchar_t" was 16 bits.  Some compilers still have a 16-bit "wchar_t" (most notably Microsoft Visual Studio).  Other compilers have a 32-bit "wchar_t" (most notably gcc), which makes them a candidate for use with full Unicode.

Most recent advice I've seen tells programmers to avoid "wchar_t" due to its portability problems and to use a fixed 32-bit type like "uint32_t" instead.  Sadly, "uint32_t" did not exist in Windows until Visual Studio 2010, so you still need annoying conditional compiles to make your code truly portable.  Also, an advantage of wchar_t over uint32_t is the availability of wide flavors of many standard C string handling functions (standardized in C99).
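
For what it's worth, here is the sort of thing the wide-character functions buy you.  A trivial sketch (and note that the sizeof line prints different answers under gcc and Visual Studio, which is exactly the portability problem):

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Wide string literal; assumes the compiler understands the
         * source file's encoding. */
        wchar_t name[] = L"Muñoz";

        /* wcslen() counts wchar_t elements, not bytes. */
        printf("%zu characters, %zu bytes per wchar_t\n",
               wcslen(name), sizeof(wchar_t));
        return 0;
    }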

Other opinionated programmers have advised against the use of wide characters altogether, claiming that constant time lookup is vastly overrated since most text processing software spends most of its time stepping through text one character at a time.  The use of UTF-8 allows easy movement across multi-byte characters just by looking at the upper bits of each byte.  Also, the library libiconv provides an API to do the conversions that the "iconv" command does.

And yet, I can understand the attraction of wide (32-bit) characters.  Imagine I have a large code base which does string manipulation.  The hope is that by changing the type from char to wchar_t (or uint32_t), my for loops and comparisons will "just work".  However, I've seen tons of code which assumes that a character is 1 byte (e.g. it mallocs only the number of characters without multiplying by the size of the element type), so the chances seem small of any significant code "just working" after changing char to wchar_t or uint32_t.

Finally, note that UTF-8 is compatible with many standard C string functions because a null byte can safely be used to indicate the end of a string (some other encoding schemes can have null bytes sprinkled throughout the text).  However, note that the function strchr() is *not* generally UTF-8 compatible, since it assumes every character fits in a single char and so cannot search for a multi-byte character.  But the function strstr() *is* compatible with UTF-8 (so long as *both* string parameters are encoded with UTF-8).
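
A quick illustration of that distinction (the strings are just examples):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "España" in UTF-8; the string is split so the trailing 'a'
         * isn't swallowed by the hex escape. */
        const char *text = "Espa\xC3\xB1" "a";

        /* strchr() takes a single int, so there is no way to hand it the
         * two-byte "ñ".  strstr() compares byte sequences, so searching
         * for the UTF-8 encoding of "ñ" works fine. */
        if (strstr(text, "\xC3\xB1") != NULL)
            printf("found the n-with-tilde\n");
        return 0;
    }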


Bottom Line: No Free (or even cheap) Lunch

Unfortunately, there is no easy path.  If I am ever tasked with writing international software, I suspect I will bite the bullet and choose UTF-8 as my internal format.