Showing posts with label tools. Show all posts
Showing posts with label tools. Show all posts

Friday, December 13, 2024

Claude as Coder's Assistant

 My love affair with Claude.ai continues.

I don't actually use it much for coding. Code is my hobby, I don't want much help doing that. (Although here's an example where I did ask it to write a function: I couldn't remember how to write variadic functions, but I wanted one for error reporting with a printf-style interface. I've done it years ago and couldn't remember how. I didn't feel like spending 20 minutes re-teaching myself.)

I use Claude for:

  • Code Reviews (it finds bugs so I don't have to!).
  • Writing Doc.
  • Remembering API names ("What's that function that's better to use than atoi()?").
  • Bringing me up to speed on tools (I've just started using VSCode, and Claude has saved me much time).
  • Discussing pros and cons of design decisions. Sometimes it comes up with considerations I didn't think of. Sometimes it's just the process of explaining it that clarifies the design in my own mind.
  • Asking questions about the C standard to improve my code's portability. (Claude knows the standard much better than I do.)
  • Brainstorming naming conventions (sometimes I get stuck trying to think of a good name).
  • Help with warnings when I finally turned on super-picky gcc options.

I want to go deeper on a few of those points.

Code Reviews

Overall, Claude-based code reviews are helpful. They've pointed out several cases of cut-and-paste errors that were incompletely made. They've pointed out some inconsistencies that I was glad to fix. And made some suggestions for improvement that I've taken. But it also gets false positives (e.g. claiming a buffer overrun risk where there is none); I think some of that comes from "wanting" too hard to find issues and resorting to raising issues that are often raised in code reviews. Also, for a large codebase with multiple C files, I've seen it get confused and very simply find fewer things. It finds more things with smaller reviews. So not perfect, but I'm often surprised at the useful things it does find.

I have been impressed at how well it makes assumptions given incomplete code. For example, I have a logic simulator with two main modules; one a language processor and the other the main logic engine. You don't get a complete view of the big picture without seeing both files. But just as a human can infer much from the names of functions that are called and the context in which they are called, Claude was also able to.

One thing it does NOT do well is request additional information. If I were reviewing a module and needed another one in order to evaluate the correctness of some code, I would request access to the other module. Claude just makes do with what it has, making reasonable assumptions (but not identifying those assumptions), and when those assumptions are wrong, so too are its conclusions.

Finally, missing from the review is higher-level discussion of alternate designs. To be honest, that is usually also lacking with human reviews, but at least as a reviewer I could initiate such a discussion. With Claude I don't get much traction on that besides some general platitudes about good design patterns.

Bottom line: while there are some benefits from human review that Claude cannot match, there are some things I think Claude does better, like finding cut-and-paste problems and other things that are pattern-based. I think the two forms compliment each other.

Doc

This is an area that Claude kind of blew me away. As an experiment, I took the two main modules of my logic simulator and stripped out all comments. I then asked Claude to reverse-engineer the code and write documentation for the circuit design language I implemented. It did an amazing job; I only made a few minor tweaks to the doc it generated. It was able to infer various intents behind the code with deep understanding. In particular, while one module was primarily focused on the overall language parsing, the other module contained device-dependent interpretation of the I/O terminal identifiers. As an example, I established the convention that normal connections use lower-case, while "not" connections use upper-case. I.e. "q" and "Q" represent "q" and "not q". The only hint for that was a line of code to the effect, "Q = (1 - q);" It generated doc describing the convention.

Not only did it impressed me, it also saved me time. I really was able to take the doc and wholesale insert it. Yes, I made some tweaks as I proofread it, but it converted probably two hours of work into ten minutes of work. And while I don't hate writing documentation, for my hobby I would rather code the document, so it really did increase my enjoyment of my hobby.

Tool Help

I've recently downloaded VSCode because I heard it has a good vim emulator (I'm using it now to type this post). And I'm very happy with it. Finally I'm getting the benefits of a good IDE that can do code refactoring for me. Even just being able to click on an error message and have my cursor popping onto the offending source code line is a time saver. However, VSCode is an advanced tool, and it's not always intuitive how to get things done. Claude to the rescue.

I've asked Claude any number of questions about VSC, and while it doesn't get it right 100% of the time, it's doing better than 80%. For example, it created "tasks" for me to run my compile script and my test script. It also helped me create problem matching patterns so that errors generated by my own program will be recognized as errors and produce clickable file:line links. This is a testament to both VSC and to Claude for quickly showing me how to do it. The alternative would be days worth of Stack Overflow Q&A. I've gotten up to speed on VSC in a fraction of the time I could do on my own. And the help has prevented impatience and frustration from leading me to throw up my hands and go back to command-line vim!

Conclusion

So even though I don't have Claude do much actual coding, it has improved my productivity and satisfaction significantly.

And yes, sometimes I just have conversations with it. I have to laugh every time it claims to have fought some of the same coding battles that I describe (no you haven't!), but I play along since it is emulating how another human would likely respond, and sometimes I'm surprised at how well it does with simple water cooler banter. I've even told it that it's the perfect conversational partner - it doesn't have its own agenda and will follow my conversational lead without friction wherever I lead it. It isn't offended if I ignore its final "engagement" question. And it's always complimenting me on my insights ... so much so that I've created a style to tone it down a bit. (But if I'm feeling low, I'll go back to its normal mode of being overly enthusiastic.)

Claude even found a few typos in this post. Thanks Claude!

Monday, July 1, 2024

Automating tcpdump in Test Scripts

 It's not unusual for me to create a shell script to test networking software, and to have it automatically run tcpdump in the background while the test runs. Generally I do this "just in case something gets weird," so I usually don't pay much attention to the capture.

The other day I was testing my new "raw_send" tool, and my test script consisted of:

INTFC=em1
tcpdump -i $INTFC -w /tmp/raw_send.pcap &
TCPDUMP_PID=$!
sleep 0.2
./raw_send $INTFC \
 01005e000016000f53797050080046c00028000040000102f5770a1d0465e0000016940400002200e78d0000000104000000ef65030b
sleep 0.2
kill $TCPDUMP_PID

Lo and behold, the tcpdump did NOT contain my packet! I won't embarrass myself by outlining the DAY AND A HALF I spent figuring out what was going on, so I'll just give the answer.

The tcpdump tool wants to be as efficient as possible, so it buffers the packets being written to the output file. This is important because a few large file writes are MUCH more efficient than many small file writes. When you kill the tcpdump (either with the "kill" command or with control-C), it does NOT flush out the current partial buffer. There was a small clue provided in the form of the following output:

tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
1 packets received by filter
0 packets dropped by kernel

I thought it was filtering out my packet for some reason. But no, the "0 packets captured" means that zero packets were written to the capture file ... because of buffering.

The solution? Add the option "--immediate-mode" to tcpdump:

tcpdump -i $INTFC -w /tmp/raw_send.pcap --immediate-mode &

Works fine.


Friday, August 25, 2023

Visual Studio Code

 I've been doing more coding than usual lately. As a vi user, I've been missing higher-level IDE-like functionality, like:

  • Shows input parameters to functions (without having to open the .h file and search).
  • Finds definitions of functions, variables, and macros.
  • Finds references to same.
  • Quickly jumping to locations of compile errors. (Most IDEs do syntax checking as you type.)
  • Source-level debugging.
There are other functions as well, like code-refactoring, static analysis, and "lint" capabilities, but the above are the biggies in my book.

Anyway, I've used Visual Studio, Eclipse, and JetBrains, and found those higher-level functions helpful. But I hate GUI-style text editors.

I've gotten good at using emacs and vi during my many years of editing source files. It takes time to get good at a text editor - training your fingers to perform common functions lightning fast, remembering common command sequences, etc. I finally settled on vi because it is already installed and ready to use on every Unix system on the planet. And my brain is not good at using vi one day and emacs the next. So I picked vi and got good at it. (I also mostly avoid advanced features that aren't available in plain vanilla vi, although I do like a few of the advanced regular expressions that VIM offers.)

So how do I get IDE-like functionality in a vi-like editor?

I looked Vim and NeoVIM, both of which claim to have high-quality IDE plugins. And there are lots of dedicated users out there who sing their praises. But I've got a problem with that. I'm looking for a tool, not an ecosystem. If I were a young and hungry pup, I might dive into an ecosystem eagerly and spend months customizing it exactly to my liking. Now I'm a tired old coder who just wants an IDE. I don't want to spend a month just getting the right collection of plugins that work well together.

(BTW, the same thing is true for Emacs. A few years ago, I got into Clojure and temporarily switched back to Emacs. But again, getting the right collection of plugins that work well together was frustratingly elusive. I eventually gave up and switched back to vi.)

Anyway, as a tired old coder, I was about to give up on getting IDE functionality into a vi-like editor, but decided to flip the question around. What about getting vi-like editing into an IDE?

Turns out I'm not the first one to have that idea. Apparently most of the IDEs have vi editing plugins now-a-days. This was NOT the case several years ago when I last used an IDE. I used a vi plugin for Eclipse which ... kind of worked, but had enough problems that it wasn't worth using.

That still leaves the question: which IDE to use. Each one has their fan base and I'm sure each one has some feature that it does MUCH better than the others. Since programming is not my primary job, I certainly won't become a power user. Basically, I suspect it hardly matters which one I pick.

I decided to start with Visual Studio Code for a completely silly reason: it has an easy integration with GitHub Copilot. I say it's silly because I don't plan to use Copilot any time soon! For one thing, I don't code enough to justify the $10/month. And for another, coding is my hobby. The small research I've done into Copilot suggests that to get the most out of it, you shift your activities towards less coding and more editing and reviewing. While that might be a good thing for a software company, it's not what I'm looking for in a hobby. But that's a different topic for a different post.

Anyway, I've only been using Visual Studio Code for about 30 minutes, and I'm already reasonably pleased with the vi plugin (but time will tell). And I was especially pleased that it has a special integration with Windows WSL (I'm not sure other IDEs have that). I was able to get one of my C programs compiled and tested. I even inserted a bug and tried debugging, which was mildly successful.


Monday, July 10, 2023

Markdown TOC Generator

The long wait is finally over! Announcing the most revolutionary innovation since punched cards! A command-line tool that inserts a table of contents into a markdown file!!!

(crickets)

Um ... according to my notes, this is where I'm supposed to wait for the cheering to die down.

(crickets get bored and leave)

Man, what a tough neighborhood.

Yeah, I know. There might be one or two similar tools out there. Like the web-based tool https://luciopaiva.com/markdown-toc/, but I don't like the cut-and-paste. Or the command-line tool at https://github.com/ekalinin/github-markdown-toc, but I don't like the curl dependency or the code (although credit to it for showing me the GitHub rendering API that I used in my test script).

So I wrote my own in Perl: https://github.com/fordsfords/mdtoc

Most of my other tools have either a build script or a test script; I'll probably change most of them to have something like:

# Update doc table of contents (see https://github.com/fordsfords/mdtoc).

if which mdtoc.pl >/dev/null; then mdtoc.pl -b "" README.md;

elif [ -x ../mdtoc/mdtoc.pl ]; then ../mdtoc/mdtoc.pl -b "" README.md;

else echo "FYI: mdtoc.pl not found; see https://github.com/fordsfords/mdtoc"

fi


Thursday, March 9, 2023

Examining a Running Process Environment

 I don't know why I keep forgetting about this technique. I guess it's because while it is worth its weight in gold when you need it, it just isn't needed very often.

Say you're helping somebody with a networking problem. Their process isn't behaving well.

"Is it running under Onload?"

"I don't know. I think so, but how can we tell for sure?"

$ tr '\0' '\n' </proc/12345/environ | grep LD_PRELOAD
LD_PRELOAD=libonload.so

(You need the "tr" command because Linux separates entries with a null, not a newline.)

"OK cool. What Onload env vars do you set?"

$ tr '\0' '\n' </proc/12345/environ | grep EF_
EF_RXQ_SIZE=4096
EF_UDP_RCVBUF=16000000
EF_POLL_USEC=100

BAM! No need to rely on memory or what the env "should" be. We know for sure.

Wednesday, January 25, 2023

mkbin: make binary data

We've all been there. We want a set of bytes containing some specific binary data to feed into some program. In my case, it's often a network socket that I want to push the bytes into, but it could be a data file, an RS-232 port, etc.

I've written this program at least twice before in my career, but that was long before there was a GitHub, so who knows where that code is? Yesterday I wanted to push a specific byte sequence to a network socket, and I just didn't feel like using a hex editor to poke the bytes into a file.

So I wrote it once again: mkbin.pl

It's a "little language" (oh boy) that lets you specify the binary data in hex or decimal, as 8, 16, 32, or 64-bit integers, big or little endian (or a mix of the two), or ASCII. 


EXAMPLE WITH HTTP

For example, let's say I want to get the home page from yahoo using mkbin.pl interactively from a shell prompt:

./mkbin.pl | nc yahoo.com 80
"GET / HTTP/1.1" 0x0d0a
"Host: yahoo.com" 0x0d0a0d0a

HTTP/1.1 301 Moved Permanently
...

(I typed the yellow, yahoo server returned the ... blue? Cyan? I'm not good with colors.) Naturally, yahoo wants me to use https, so it is redirecting me. But this is just an example.

Here's a shell script that does the same thing with some comments added:

#!/bin/sh
./mkbin.pl <<__EOF__ | nc yahoo.com 80
"GET / HTTP/1.1" 0x0d0a       # Get request
"Host: yahoo.com" 0x0d0a0d0a  # double cr/lf ends HTTP request
__EOF__

Sometimes it's just easier to echo the commands into mkbin to create a one-liner:

echo '"GET / HTTP/1.1" 0x0d0a "Host: yahoo.com" 0x0d0a0d0a' |
  ./mkbin.pl | nc yahoo.com 80

(Note the use of single quotes to ensure that the double quotes aren't stripped by the shell; the mkbin.pl program needs the double quotes.)


INTEGERS WITH ENDIAN

So far, we've seen commands for inputting ASCII and arbitrary hex bytes. Here are two 16-bit integers with the value 13, first specified in decimal, then in hex:

$ echo '16d13 16xd' | ./mkbin.pl | od -tx1
0000000 00 0d 00 0d
0000004

As you can see, it defaults to big endian.

Here are two 32-bit integer value 13, first in little endian, then in big endian:

$ echo '!endian=0 32d13 !endian=1 32xd' | ./mkbin.pl | od -tx1
0000000 0d 00 00 00 00 00 00 0d
0000010

You can also do 64-bit integers (64d13 64xd) and even 8-bit integers (8d13 8xd).


ARBITRARY BYTES

The construct I used earlier with 0x0d0a encodes an arbitrary series of bytes of any desired length. Note that it must have an even number of hex digits. I.e. 0xd is not valid, even though 8xd is.


STRINGS WITH ESCAPES

Finally, be aware that the string construct does not have fancy C-like escapes, like "\x0d". The backslash only escapes the next character for inclusion and is only useful for including a double quote or a backslash into the string. For example:

$ echo '"I say, \"Hi\\hello.\"" 0x0a' | ./mkbin.pl | od -tx1
0000000 49 20 73 61 79 2c 20 22 48 69 5c 68 65 6c 6c 6f
0000020 2e 22 0a
0000023
$ echo '"I say, \"Hi\\hello.\"" 0x0a' | ./mkbin.pl
I say, "Hi\hello."



Tuesday, December 27, 2022

Tgen: Traffic Generator Scripting Language

Oh no, not another little language! (Perhaps better known as a "domain-specific language".)

Yep. I wrote a little scripting language module intended to assist in the design and implementation of a useful network traffic generator. It's called "tgen" and can be found at https://github.com/fordsfords/tgen. I won't write much about it here other than to mention that it implements a simple interpreter. In fact, the simplicity of the parser might be the thing I'm most pleased about it, in terms of bang for buck. See the repo for details.

Little Languages Considered Harmful?

So yeah, I'm reasonably pleased with it. But I am also torn. Because "little languages" have both a good rap (Jon Bentley, 1986) and a bad rap (Olin Shivers, 1996).

In that second paper, Shivers complains that little languages:

  • Are usually ugly, idiosyncratic, and limited in expressiveness.
  • Basic linguistic elements such as loops, conditionals, variables, and subroutines must be reinvented and re-implemented. It is not an approach that is likely to produce a high-quality language design.
  • The designer is more interested in the task-specific aspects of his design, to the detriment of the language itself. For example, the little language often has a half-baked variable scoping discipline, weak procedural facilities, and a limited set of data types.
  • In practice, it often leads to fragile programs that rely on heuristic, error-prone parsers.
Of course, Shivers doesn't *really* think that little languages are a bad idea. He just thinks that they are usually implemented poorly, and his paper shows the right way to do it (in Scheme, a Lisp variant).

But there are some good arguments against developing little languages at all, even if implemented well. At my first job out of college, I wrote a little language to help in the implementation of a menu system. The menus were tedious and error-prone to write, and the little language improved my productivity. I was proud of it. An older and wiser colleague gently told me that there are some fundamental problems with the idea. His reasoning was as follows:

  • We already have a programming language that everybody knows and is rich and well-tested.
  • You've just invented a new programming language. It probably has bugs in the parser and interpreter that you'll have to find and fix. Maybe the time you spend doing that is paid for by the increased productivity in adding menus. Maybe not.
  • The new language is known by exactly one person on the planet. Someday you'll be on a different project or a different company, and we can't hire somebody who already knows it. There's an automatic learning curve.
  • Instead of writing an interpreter for a little language, you could have simply designed a good API for the functional elements of your language, and then used the base language to call those functions. Then you have all the base language features at your disposal, while still having a high level of abstraction to deal with the menus.

He was a nice guy, so while his criticism stung, he didn't make it personal, and he was tactful. I was able to see it as a learning experience. And ever since then, I've been skeptical of most little languages.

Then Why Did I Make a Little Language?

Well, first off, I *DID* create an API. So instead of writing scripts in the scripting language, you *can* write them in C/C++. I expect this to be interesting to QA engineers wanting to create automated tests that might need sophisticated usage patterns (like waiting to receive a message before sending a burst of outgoing traffic). I would not want to expand my scripting language enough to support that kind of script. So being able to write those tests in C gives me all the power of C while still giving me the high level of abstraction for sending traffic.

But also, a network traffic generator is a useful thing to be able to run interactively for ad-hoc testing or exploration. It would be annoying to have to recompile the whole tool from source each time you want to change the number or sizes of messages.

Of course, most traffic generation tools take care of that by letting you specify the number and sizes of the messages via command-line options or GUI dialogs. But most of them don't let you have a message rate that changes over time. My colleagues and I deal with bursty data. To properly test the networking software, you should be able to create bursty traffic. Send at this rate for X milliseconds, then at a much higher rate for Y milliseconds, etc. The "tgen" module lets you "shape" your data rate in non-trivial ways.

Did I get it right? My looping construct is ugly and could only be loved by an assembly language programmer. Maybe I should have made it better? Or just omitted it? Dunno. I'm open to discussion.

Anyway, I'm hoping that others will be able to take the tgen module and do something useful with it.

Wednesday, January 5, 2022

Bash Process Substitution

I generally don't like surprises. I'm not a surprise kind of guy. If you decide you don't like me and want to make me feel miserable, just throw me a surprise party.

But there is one kind of surprise that I REALLY like. It's learning something new ... the sort of thing that makes you say, "how did I not learn this years ago???"

Let's say you want the standard output of one command to serve as the input to another command. On day one, a Unix shell beginner might use file redirection:

$ ls >ls_output.tmp
$ grep myfile <ls_output.tmp
$ rm ls_output.tmp

On day two, they will learn about the pipe:

$ ls | grep myfile

This is more concise, doesn't leave garbage, and runs faster.

But what about cases where the second program doesn't take its input from STDIN? For example, let's say you have two directories with very similar lists of files, but you want to know if there are any files in one that aren't in the other.

$ ls -1 dir1 >dir1_output.tmp
$ ls -1 dir2 >dir2_output.tmp
$ diff dir1_ouptut.tmp dir2_output.tmp
$ rm dir[12]_output.tmp

So much for conciseness, garbage, and speed.

But, today I learned about Process Substitution:

$ diff <(ls -1 dir1) <(ls -1 dir2)

This basically creates two pipes, gives them names, and passes the pipe names as command-line parameters of the diff command. I HAVE WANTED THIS FOR DECADES!!!

And just for fun, let's see what those named pipes are named:

$ echo <(ls -l dir1) <(ls -1 dir2)
/dev/fd/63 /dev/fd/62

COOL!

(Note that echo doesn't actually read the pipes.)


VARIATION 1 - OUTPUT

The "cmda <(cmdb)" construct is for cmda getting its input from the output of cmdb. What about the other way around? I.e., what if cmda wants to write its output, not to STDOUT, but to a named file, and you want that output to be the standard input of cmdb? I'm having trouble thinking here of a useful example, but here's a not-useful example:

cp file1 >(grep xyz)

I say this isn't useful because why use the "cp" command? Why not:

cat file1 | grep xyz

Or better yet:

grep xyz file1

Most shell commands write their primary output to STDOUT. I can think of some examples that don't, like giving an output file to tcpdump, or the object code out of gcc, but I can't imagine wanting to pipe that into another command.

If you can think of a good use case, let me know.


VARIATION 2 - REDIRECTING STANDARD I/O

Here's something that I have occasionally wanted to do. Pipe a command's STDOUT to one command, and STDERR to a different command. Here's a contrived non-pipe example:

process_foo 2>err.tmp | format_foo >foo.txt
alert_operator <err.tmp
rm err.tmp

You could re-write this as:

process_foo > >(format_foo >foo.txt) 2> >(alert_operator)

Note the space between the two ">" characters - this is needed. Without the space, ">>" is treated as the append redirection.

Sorry for the contrived example. I know I've wanted this a few times in the past, but I can't remember why.


And for completeness, you can also redirect STDIN:

cat < <(echo hi)

But this is the same as:

echo hi | cat

I can't think of a good use for the "< <(cmd)" construct. Let me know if you can.


EDIT:

I'm always amused when I learn something new and pretty quickly come up with a good use for it. I had some files containing a mix of latency values and some log messages. I wanted to "paste" the different files into a single file with multiple columns to produce a .CSV. But the log messages were getting in the way.

paste -d "," <(grep "^[0-9]" file1) <(grep "^[0-9]" file2) ... >file.csv

Done! :-)

Friday, July 9, 2021

More Perl "grep" performance

In an earlier post, I discovered that a simple Perl program can outperform grep by about double. Today I discovered that some patterns can cause the execution time to balloon tremendously.

I have a new big log file, this time with about 70 million lines. I'm running it on my newly-updated Mac, whose "time" command has slightly different output.

Let's start with this:

time grep 'asdf' cetasfit05.txt
... 39.388 total

time grep.pl 'asdf' cetasfit05.txt
... 21.388 total

About twice as fast.


Now let's change the pattern:

time grep 'XBT|XBM' cetasfit05.txt
... 24.787 total

time grep.pl 'XBT|XBM' cetasfit05.txt
... 18.940 total

Still faster, but nowhere near twice as fast. I don't know why 

Now let's add an anchor:

time grep '^XBT|^XBM' cetasfit05.txt
... 25.580 total

time grep.pl '^XBT|^XBM' cetasfit05.txt
... 3:08.25 total

WHOA! Perl, what happened????? 3 MINUTES???

My only explanation is that Perl tries to implement  a very general regular expression algorithm, and grep implements a subset, and that might cause Perl to be slow in some circumstances. For example, maybe the use of alternation with anchors introduces the need for "backtracking" under some circumstances, and maybe grep doesn't support backtracking. In this simple example, backtracking is probably not necessary, but to be general, Perl might do it "just in case". (Note: I'm not a regular expression expert, and don't really know when "backtracking" is needed; I'm speculating without bothering to learn about it.)

Anyway, let's make a small adjustment:

time grep.pl '^(XBT|XBM)' cetasfit05.txt
... 17.910 total

There, that got back to "normal".

I guess multiple anchors in a pattern is a bad idea.


P.S. - even though this post is about Perl, I tried one more test with grep:

time grep 'ASDF' cetasfit05.txt
... 26.132 total

Whaaa...? I tried multiple times, and lower-case 'asdf' always takes about 40 seconds, and upper-case 'ASDF' always takes about 27 seconds. I DON'T UNDERSTAND COMPUTERS!!! (sob)

Sunday, March 14, 2021

Circuit simulation

 I've been playing with designing simple digital circuits this weekend. Since my breadboards are not with me at the moment, I decided to look for circuit simulators.

Here's a nice comparison of several: https://www.electronics-lab.com/top-ten-online-circuit-simulators/

Before I found that comparison site, I tried out https://www.circuitlab.com/ and even threw them money for a month's worth. And I can say that I've gotten that much worth of enjoyment out of my tinkering this weekend, so money well-spent. But I knew that I didn't want to keep shelling out every month (I don't do digital design that much), and there was no way to export the circuits in a way that I could save them. So I kept looking.

Here's my CircuitLab home: https://www.circuitlab.com/user/fordsfords/

I haven't tried all the choices in the "top ten" list, but I did try the "CircuitJS1" simulator maintained by Paul Falstad and Iain Sharp. See https://github.com/sharpie7/circuitjs1 It isn't quite as nice as CircuitLab, but it's hard to argue with free, especially given my infrequency of use.

CircuitJS1 doesn't host users' designs. In fact, they don't integrate well with any form of storage. You can save your design to a local file, but the simulator doesn't do a good job of remembering file names. It presents you with a link containing a file name of the form "circuit-YYYYMMDD-HHMM.circuitjs.txt". You can save the linked contents with your own file name, but the next time you go to save, it obviously won't remember that name since it was a browser operation. All of this will make it a little inconvenient and perhaps error-prone to manage different projects. If I were doing a lot of hardware work, I would probably choose something else. But for occasional fiddling, this is fine.

Here's a simple state machine that checks even/odd parity of an input bit stream: http://www.falstad.com/circuit/circuitjs.html?startCircuitLink=https://www.geeky-boy.com/test.circuitjs.txt

If I want to make anything public, I'll make them as github projects.

Sunday, November 29, 2020

Using sed "in place" (gnu vs bsd)

 I'm not crazy after all!

Well, ok, I guess figuring out a difference in "sed" between gnu and bsd is not a sign of sanity.

(TL;DR this works on both: sed -i.bak -e "s/x/y/" x.txt)

I use sed a fair amount in my shell scripts. Recently, I've been using "-i" a lot to edit files "in-place". The "-i" option takes a value that is interpreted as a file name suffix to save the pre-edited form of the file. You know, in case you mess up your sed commands, you can get back your original file.

But for some scripts, the file being edited is itself generated, so there is no need to save a backup. So, just pass a null string in as the suffix. Right?

[ update: useful page: https://riptutorial.com/sed/topic/9436/bsd-macos-sed-vs--gnu-sed-vs--the-posix-sed-specification ]

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
$ cat x
y
$ ls
x.txt

Looks good. Let's try Linux.

GNU SED (Linux and Cygwin)

$ echo "x" >x.txt
$ sed -i '' -e "s/x/y/" x.txt
sed: can't read : No such file or directory
$ cat x
y
$ ls
x.txt

Hmm ... that's odd. It "worked", which is to say the file was properly edited. But what's with that "no such file" error? Man page to the rescue:

$ man sed
...
       -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if SUFFIX supplied)

Interesting, you can omit the suffix. And you mustn't supply a space between the "-i" and the suffix; it thinks you've omitted it and treats the empty string as an input file. Here's an example with a non-empty suffix:

$ echo "x" >x.txt
$ sed -i .bak -e "s/x/y/" x.txt
sed: can't read .bak: No such file or directory
$ cat x
y
$ ls
x.txt

See? With the space, it thinks ".bak" is an input file. But we don't want a backup file, so let's try just omitting the suffix, like the man page says.

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x
y
$ ls
x.txt

Works. Let's try it on BSD.

BSD SED (FreeBSD and Mac)

$ echo "x" >x.txt
$ sed -i -e "s/x/y/" x.txt
$ cat x
y
$ ls
x
.txt       x.txt-e

Wait, what? Again, the file was edited properly, but what's with that file "x.txt-e"? Oh, BSD sed doesn't support a missing suffix. You can supply an empty one, but you can't just omit it. So sed looked at the above command line and thought "-e" was my desired suffix. And the "-e" option is optional in front of an in-line sed program.

ARGH!

I use both Mac and Linux, and want scripts that work on both!

THE SOLUTION

There is no portable way to tell both seds that you want in-place editing but don't want a backup suffix. So just go ahead and always generate a backup file. And remember, GNU doesn't like a space after the "-i". This works on both:

$ echo "x" >x.txt
$ sed -i.bak -e "s/x/y/" x.txt
$ cat x.txt
y
$ ls
x.txt           x.txt.bak

Works on Mac and Linux. Just delete the .bak file.

It took a long time to figure all this out, largely because the incorrect usages basically worked. I.e. the intended file did get edited in place, but with undesired side effects. So I didn't notice there was a problem until I really looked at things and saw the error or the "x.txt-e" file.

Corner cases: the bane of programmers everywhere.

Saturday, July 11, 2020

Perl Faster than Grep

So, I've been crawling through a debug log file that is 195 million lines long. I've been using a lot of "grep | wc" to count numbers of various log messages. Here's some timings for my Macbook Pro:

$ time cat dbglog.txt >/dev/null
real 0m35.423s

$ time wc dbglog.txt
195177935 1177117603 28533284864 dbglog.txt
real 1m44.560s

$ time egrep '999999' dbglog.txt
real 7m39.737s

(For this timing, I chose a pattern that would *NOT* be found.)

On the Macbook, the man page for fgrep claims that it is faster than grep. Let's see:

$ time fgrep '999999' dbglog.txt
real 7m11.365s

Well, I guess it's a little faster, but nothing to brag about.

Then I wanted to create a histogram of some findings, so I wrote a perl script to scan the file and create the histogram. Since it performed regular expression matching on every line, I assumed it would be a little slower than grep, since Perl is an interpreted language.

$ time ./count.pl dbglog.txt >count.out
real 3m9.427s

WOW! Less than half the time!

So I created a simple grep replacement: grep.pl. It doesn't do any histogramming, so it should be even faster.

$ time grep.pl '999999' dbglog.txt
real  2m8.341s

Amazing. Perl grep runs in less than a third the time of grep.

For small files, I bet Perl grep is slower starting up. Let's see.

$ time echo "hi" | grep 9999
real        0m0.051s

$ time echo "hi" | grep.pl 9999
real        0m0.113s

Yep. Grep saves you about 60 milliseconds. So if you had thousands of small files to grep, it might be faster to use grep.



UPDATE:

I got another big log file today (70 million lines) and saw something pretty surprising given my initial findings.


Saturday, July 8, 2017

XKCD-style Password Generator

I got to thinking about passwords again today.

I wrote my own program to produce XKCD-style passwords from a list of 2126 common words, and calculated some stats.  I reproduced XKCD's calculation of 44 bits of entropy for 4 randomly-selected words.  And I made a few mildly-interesting discoveries, and one more-interesting realization.


PASSWORD LENGTH: MORE = BETTER?

My average XKCD-style password length is 20.8 letters (over a large sample), which is a lot of typing.  So I decided to limit word size.  By filtering my list of 2129 words to nothing longer than 4 letters, I ended up with 709 words.  That's not many, and 4 of them together only gives 37 bits of entropy.  Not so hot.  But if you string 5 of 709 words together, you get 47 bits of entropy, which is better than XKCD!  And the average password drops to 18.3 characters.

I find that interesting: shorter passwords which produce more bits of entropy than longer passwords.  Seems counter-intuitive, until you realize that opening it up to the full 2129 words increases average word length more than it increases entropy.  (See below for the math.)


THE PASSWORDS

So, what do these passwords look like?  Here's 10 of the XKCD-style: 4 words from the 2126 word set:

password: MostlyRelativeSpinAdvanced
length: 26
password: ForBasicallyThinkingExplain
length: 27
password: CookieArmyMysteryConference
length: 27
password: ExpectConvertQuarterbackPresentation
length: 36
password: EverybodyProductHotDemonstrate
length: 30
password: RockIndexWellFloat
length: 18
password: BehaviorNearlyPromotePercentage
length: 31
password: PocketSurviveFourLab
length: 20
password: MuchWeekWillAnd
length: 15
password: DivideMorePeakSeveral
length: 21


So, how easy are those to remember?  Memorizing an XKCD-style password is about creating a mental picture or story around it.  Use some imagination.  It usually helps to make it amusing.  How about the first one: "MostlyRelativeSpinAdvanced"?  Well, I'm a bit of a science geek, so this one makes perfect sense.  You have a particle stream, and most of the particles are moving at relativistic speeds.  So measuring each particle's spin is a pretty advanced thing to do.  Hmm ... what's the amusing part of that?  Oh maybe that an actual physicist would roll his eyes at my explanation and say that I don't know the first thing about particle physics.  But basically, I was able to imagine a mini-story or mental picture for each of those passwords, so while I might not be able to memorize all 10, I could easily memorize one of them.

What about the shorter passwords consisting of 5 words from the 709 words of 4 letters or less?

password: BuyWingSadRideSeed
length: 18
password: SeedPairTankJailDo
length: 18
password: PanBuryDenyDataOld
length: 18
password: GeneRiceTeaYetSin
length: 17
password: WallJailLabNextTent
length: 19
password: HallSnapCashRichRead
length: 20
password: WarmUsKeepRoseLess
length: 18
password: PortMarkSirYouLeaf
length: 18
password: HiAgoHipAnyBe
length: 13
password: EaseSkyRealTossFate
length: 19


Even though those are shorter (and more secure) passwords, I guess I find them more difficult to remember them.  It's about creating a mental picture or story around those words.  Since the words are random, they don't come out in any conceptually correlated way.  So you stretch your imagination to encompass them.  The more words in the password, the more you have to stretch.

Take the last password up there, "EaseSkyRealTossFate", and drop that last word to make it 4 words: "Ease sky real toss".  My first thought is that "toss" is the children's game "ring toss".  Sky and ease kind of fit since the game is usually pretty easy and you toss things towards the sky.  The word "real" is kind of left out, but I imagine throwing something "real", like a laptop or a dinner plate, instead of a game piece.  So I imagine the ease of tossing a real laptop into the sky.  Yeah, that's stretching the imagination a bit, but maybe not too much.  I could pretty easily remember EaseSkyRealToss.

But now throw "fate" in there and my whole mental picture falls apart.  I guess I could say that when the laptop lands, its fate will be sealed, but ... not sure why ... but I would have much more trouble remembering it.

So I'll be sticking to 4 words and more typing.


THE MATH

Passwords are basically taking a set of N things, and taking L of them out with replacement.  For example, a 4-digit PIN consists of a set of 10 digits (N=10), and you take 4 digits out (L=4) with replacement.  The "with replacement" simply means that you might take the same digit out more than once (e.g. 2338).  So the entropy of a 4-digit PIN is 10**4 *(10 to the power of 4), which is 10,000.  To get that in terms of bits, take the log base 2 of it to get 13.  So 13 bits of entropy.

Another example: 8 randomly-selected letters for a password.  Let's assume lower-case only, and no digits or special characters.  The set of N things is the letters of the alphabet, so N=26.  By taking 8 characters, L=8.  26**8 = 208 billion.  Log base 2 of that is 37 bits of entropy.  Cool.  Now let's do random upper/lower case.  N=52, and 52**8 = 53 trillion, giving 45.6 bits of entropy.  Add in 0-9: N=62, 62**8 = 218 trillion, giving 47.6 bits of entropy.

So, back to my XKCD-style passwords.  My original set of 2126 words, taking 4 at a time, gives 2126**4 = 20 trillion, which is 44.2 bits.  My reduced set of 709 short words, taking 5 at a time gives 709**5 = 179 trillion, which is 47.3 bits.

However, see my next blog post for an observation about random password generators and entropy.


MORE WORDS?

My list of 2126 words actually comes from a list of 3000 words from Education First.  I filtered it to limit word length to 7 or fewer characters, resulting in my 2126 words.  Note to the rigorous: you'll find that I'm 2 words short; it made my code easier to ignore the first and last words.

So how about if I remove that filter and pick 4 words from the entire set of 2998?

2998 ** 4 = 80 trillion, which is 46 bits.  I.e. going from 2126 words to 2998 increases the entropy by 2 bits.  My average password length jumps to 25.6, which is 5 more characters.  I tried a few other word length limits and decided that 7 is best.


THE PROGRAM

See https://github.com/fordsfords/pgen Be sure to use "-r" if generating an actual password you want to use.

Sunday, January 10, 2016

Saying goodbye to a bit of personal history

Ever since I was *very* young, I've been interested in science and technology.  At some point in my teens, maybe 40 years ago, I wanted a better VOM (Volt-Ohm-Milliamp meter) than the junky one I had picked up, so I did some research and spent precious funds on a high-impedance FET meter:



It saw pretty heavy use about 5 years, but as I transitioned from electronics to digital logic, and from that to software, my need for it dropped.  I've probably used it twice in the past 15 years, probably for checking if an electrical outlet is live.

As my previous post indicates, I've just gotten a single-board computer, and I was trying to indirectly measure the value of the pull-up resistor on an open-collector output.  I need a reasonably accurate, high impedance meter, so I got out my old FET.

Alas, the two small selector switches were frozen.  Not sure why or how -- it's a *switch* for goodness sake -- but I can't use it if I can't turn it on.  I'll take it apart, but I don't have high hopes.

It's passing is a sad event for me, but why?  Is it just nostalgia?  Longing for a simpler time?  Missing my childhood?  I think it's more than that.  There are certain things that have come to represent turning points in my life.  The meter may not have *caused* a significant shift in my life path, but it had come to represent it.  And maybe its a mortality thing too, like a piece of me died.

Oh well, I'll probably get a cheap DVM.