The Meditative Coder: testing

It's not unusual for me to create a shell script to test networking software, and to have it automatically run tcpdump in the background while the test runs. Generally I do this "just in case something gets weird," so I usually don't pay much attention to the capture.

The other day I was testing my new "raw_send" tool, and my test script consisted of:

INTFC=em1
tcpdump -i $INTFC -w /tmp/raw_send.pcap &
TCPDUMP_PID=$!
sleep 0.2
./raw_send $INTFC \
01005e000016000f53797050080046c00028000040000102f5770a1d0465e0000016940400002200e78d0000000104000000ef65030b
sleep 0.2
kill $TCPDUMP_PID

Lo and behold, the tcpdump did NOT contain my packet! I won't embarrass myself by outlining the DAY AND A HALF I spent figuring out what was going on, so I'll just give the answer.

The tcpdump tool wants to be as efficient as possible, so it buffers the packets being written to the output file. This is important because a few large file writes are MUCH more efficient than many small file writes. When you kill the tcpdump (either with the "kill" command or with control-C), it does NOT flush out the current partial buffer. There was a small clue provided in the form of the following output:

tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
1 packets received by filter
0 packets dropped by kernel

I thought it was filtering out my packet for some reason. But no, the "0 packets captured" means that zero packets were written to the capture file ... because of buffering.

The solution? Add the option "--immediate-mode" to tcpdump:

tcpdump -i $INTFC -w /tmp/raw_send.pcap --immediate-mode &

Works fine.

Oh no, not another little language! (Perhaps better known as a "domain-specific language".)

Yep. I wrote a little scripting language module intended to assist in the design and implementation of a useful network traffic generator. It's called "tgen" and can be found at https://github.com/fordsfords/tgen. I won't write much about it here other than to mention that it implements a simple interpreter. In fact, the simplicity of the parser might be the thing I'm most pleased about it, in terms of bang for buck. See the repo for details.

Little Languages Considered Harmful?

So yeah, I'm reasonably pleased with it. But I am also torn. Because "little languages" have both a good rap (Jon Bentley, 1986) and a bad rap (Olin Shivers, 1996).

In that second paper, Shivers complains that little languages:

Are usually ugly, idiosyncratic, and limited in expressiveness.
Basic linguistic elements such as loops, conditionals, variables, and subroutines must be reinvented and re-implemented. It is not an approach that is likely to produce a high-quality language design.
The designer is more interested in the task-specific aspects of his design, to the detriment of the language itself. For example, the little language often has a half-baked variable scoping discipline, weak procedural facilities, and a limited set of data types.
In practice, it often leads to fragile programs that rely on heuristic, error-prone parsers.

Of course, Shivers doesn't *really* think that little languages are a bad idea. He just thinks that they are usually implemented poorly, and his paper shows the right way to do it (in Scheme, a Lisp variant).

But there are some good arguments against developing little languages at all, even if implemented well. At my first job out of college, I wrote a little language to help in the implementation of a menu system. The menus were tedious and error-prone to write, and the little language improved my productivity. I was proud of it. An older and wiser colleague gently told me that there are some fundamental problems with the idea. His reasoning was as follows:

We already have a programming language that everybody knows and is rich and well-tested.
You've just invented a new programming language. It probably has bugs in the parser and interpreter that you'll have to find and fix. Maybe the time you spend doing that is paid for by the increased productivity in adding menus. Maybe not.
The new language is known by exactly one person on the planet. Someday you'll be on a different project or a different company, and we can't hire somebody who already knows it. There's an automatic learning curve.
Instead of writing an interpreter for a little language, you could have simply designed a good API for the functional elements of your language, and then used the base language to call those functions. Then you have all the base language features at your disposal, while still having a high level of abstraction to deal with the menus.

He was a nice guy, so while his criticism stung, he didn't make it personal, and he was tactful. I was able to see it as a learning experience. And ever since then, I've been skeptical of most little languages.

Then Why Did I Make a Little Language?

Well, first off, I *DID* create an API. So instead of writing scripts in the scripting language, you *can* write them in C/C++. I expect this to be interesting to QA engineers wanting to create automated tests that might need sophisticated usage patterns (like waiting to receive a message before sending a burst of outgoing traffic). I would not want to expand my scripting language enough to support that kind of script. So being able to write those tests in C gives me all the power of C while still giving me the high level of abstraction for sending traffic.

But also, a network traffic generator is a useful thing to be able to run interactively for ad-hoc testing or exploration. It would be annoying to have to recompile the whole tool from source each time you want to change the number or sizes of messages.

Of course, most traffic generation tools take care of that by letting you specify the number and sizes of the messages via command-line options or GUI dialogs. But most of them don't let you have a message rate that changes over time. My colleagues and I deal with bursty data. To properly test the networking software, you should be able to create bursty traffic. Send at this rate for X milliseconds, then at a much higher rate for Y milliseconds, etc. The "tgen" module lets you "shape" your data rate in non-trivial ways.

Did I get it right? My looping construct is ugly and could only be loved by an assembly language programmer. Maybe I should have made it better? Or just omitted it? Dunno. I'm open to discussion.

Anyway, I'm hoping that others will be able to take the tgen module and do something useful with it.

The Meditative Coder

Monday, July 1, 2024

Automating tcpdump in Test Scripts

Tuesday, December 27, 2022

Tgen: Traffic Generator Scripting Language

About Me

Tags (see here)

Blog Archive