Tuesday, July 5, 2022

The Time I Found a Hardware Bug

As I approach age 65, I've been doing some reminiscing. And discovering that my memory is imperfect. (pause for laughter) So I'm writing down a story from earlier days in my career. The story has no real lessons that are applicable today, so don't expect to gain any useful insights. But I remember the story fondly so I don't want it to slip completely from my brain.

WARNING: this story is pretty much self-indulgent bragging. "Wow, that Steve guy sure was smart! I wonder what happened."

I think it was 1987, +/- 2 years (it was definitely prior to Siemens moving to Hoffman Estates in 1989).

The product was "Digitron", an X-ray system based on a Multibus II backplane and an 8086 (or was it 80286?) CPU board running iRMX/86. I think the CPU board was off-the-shelf from Intel, but most of the rest of the boards were custom, designed in-house.

At some point, we discovered there was a problem. We got frequent "spurious interrupts". I *think* these spurious interrupts degraded system performance to the degree that sometimes the CPU couldn't keep up with its work, resulting in a system failure. But I'm not sure -- maybe they just didn't like having the mysterious interrupts. At any rate, I worked on diagnosing it.

The CPU board used an 8259A interrupt controller chip (datasheet here or here) that supported 8 vectored interrupts. There was a specific hardware handshake between the 8259A and the CPU chip that let the 8259A tell the CPU the interrupt vector. The interrupt line is asserted and must be held active while the handshake takes place. At the end of the hardware handshake, the CPU calls the ISR, which interacts with the interrupting hardware. The ISR clears the interrupt (i.e. makes the hardware stop asserting the interrupt line) before returning.

According to the 8259A datasheet, spurious interrupts are the result of an interrupt line being asserted, but then removed, before the 8259A can complete the handshake. Essentially the chip isn't smart enough to remember which interrupt line was asserted if it went away too quickly. So the 8259A declares it "spurious" and defaults to level 7.

I don't remember how I narrowed it down, but I somehow identified the peripheral board that was responsible.

For most of the peripheral boards, there was a single source of interrupt, which used an interrupt line on the Multibus. But there was one custom board (don't remember which one) where they wanted multiple sources of interrupt, so the hardware designer included an 8259A on that board. Ideally, it would have been wired to the CPU board's 8259A in its cascade arrangement, but the Multibus didn't allow for that. So the on-board 8259A simply asserted one of the Multibus interrupt lines and left it to the software to determine the proper interrupt source. The 8259A was put in "polled mode", and the ISR for the board's interrupt would read the status of the peripheral 8259A to determine which of the board's "sub-interrupts" had happened. The ISR would then call the correct handler for that sub-interrupt.

Using an analog storage scope, I was able to prove that the peripheral board's 8259A did something wrong when used in its polled mode. The peripheral board's 8259A asserted the Multibus interrupt level, which led to the CPU board properly decoding the interrupt level and invoking the ISR. The ISR then performed the polling sequence, which consisted of reading the status and then writing something to clear the interrupt. However, the scope showed that during the status read operation, while the multibus read line was asserted, the 8259A released its interrupt output. When the read completed, the 8259A re-asserted its interrupt. This "glitch" informed the CPU board's 8259A that there was another interrupt starting. Then, when the ISR cleared the interrupt, the 8259A again released its interrupt. But from the CPU board's 8259A's point of view, that "second" interrupt was not asserted long enough for it to handshake with the CPU, so it was treated as a spurious interrupt.

(Pedantic aside: although I use the word "glitch" to describe the behavior, that's not the right terminology. A glitch is typically caused by a hardware race condition and would have zero width if all hardware had zero propagation delay. This wasn't a glitch because the release and re-assert of the interrupt line was tied to the bus read line. No race condition. But it resembled a glitch, so I'll keep using that word.)

HARDWARE BUG?

The polling mode of operation of the 8259A was a documented and supported use case. I consider it a bug in the chip design that it would glitch the interrupt output during the status read operation. But I didn't have the contacts within Intel to raise the issue, so I doubt any Intel engineer found out about it.

WORKAROUND

I designed a simple workaround that consisted of a chip - I think it was a triple, 3-input NAND gate, or maybe NOR, possibly open collector - wired to be an AND function. The interrupt line was active low, so by driving it with an AND, it was possible to force it to active (low). I glued the chip upside-down onto the CPU board and wire-wrapped directly to the pins. One NAND gate was used as an inverter to make another NAND gate into an AND circuit. One input to the resulting AND was driven by the interrupt line from the Multibus, and the other input was driven by an output line from a PIO chip that the CPU board came with but wasn't being used. I assume I had to cut at least one trace and solder wire-wrap wire to pads, but I don't remember the details.

The PIO output bit is normally inactive, so that when the peripheral board asserts an interrupt, the interrupt is delivered to the CPU. When the ISR starts executing, the code writes the active value to the PIO bit, which forces the AND output to stay low. Then the 8259A is polled, which glitches the Multibus interrupt line, but the AND gate keeps the interrupt active, masking the glitch. Then the ISR writes the inactive value to the PIO and clears the interrupt, which releases the Multibus interrupt line. No more spurious interrupts.

Kludge? Hell yes! And a hardware engineer assigned to the problem figuratively patted me on the head and said they would devise a "proper" solution to the spurious interrupt problem. After several weeks, that "proper" solution consisted of using a wire-wrap socket with its pins bent upwards so that instead of wire-wrapping directly to the chip's pins, they wire-wrapped to proper posts.

Back in those days, people didn't have a digital camera in their pocket, so I have no copy of the picture I took of the glitch. And I'm not confident that all the details above are remembered correctly. E.g. I kind of remember it was a NOR gate, but that doesn't make logical sense. Unless maybe I used all 3 gates and boolean algebra to make an AND out of NOR gates? I don't remember. But for sure the point was to mask the glitch during the execution of the ISR.

But I remember the feeling of vindication. My hardware training made me more valuable than a pure software engineer would have been.

Sunday, July 3, 2022

Math Nerd?

I just made two posts on recreational math. I'm what you might call a math nerd wannabe. I'm NOT a math nerd - I don't have the flair or the rigor required to make that claim - but I've always wished I were.

I used to read Martin Gardner in Scientific American. And I tried to enjoy it with mixed success. More recently, I subscribed to Numberphile, but finally unsubscribed when I realized I tend to lose focus about halfway through most of the videos. And 3Blue1Brown? The same but more. It's not just that I have trouble following the math (although I often do), I'm just not interested enough to try hard enough. But darn it, I wanna be! :-)

When I was very young, I aspired to be a scientist so I could invent cool things. Never mind that theoretical scientists and inventors tend to be very different kinds of people; in both cases, I don't have the knack. I think I'm more of a hobbyist who discovered that he could be paid well for his hobby. I've never invented a cool algorithm, but I've enjoyed implementing cool algorithms that real scientists have invented. I like tinkering, taking things apart to see what makes them tick, and sometimes even putting them back together.

Not that there's anything wrong with this. I've led, and continue to lead, a happy, productive, and fulfilling life. I'm reasonably well-liked and respected by my peers. I have no complaints about how life has treated me.

But I am sometimes wistful about what might have been ... being a math nerd/scientist/inventor would be pretty cool too.

Anyway, I won't be making regular posts about math ... unless I do. ;-)

Information in the Noise

Wow, a non-math nerd posting twice about math. What's that about?

Derek Muller of Veritasium posted a video about the 100 prisoners puzzle (I like "puzzle" in this context better than "riddle" or "problem"). Unlike my earlier post, I have no complaints about this video. Derek is one of the top-tier educational YouTubers, and he did a fantastic job of explaining it. (As before, I'm not going to explain it here; watch his video. Seriously, just watch it.)

So why do I feel the need to comment? I guess I feel I have a small but interesting (to me) tidbit to add.

Derek et al. describe the puzzle's "linked list" solution (my name) as giving a counter-intuitive result, and I guess I have to agree. The numbers are distributed to the boxes randomly, so how could any strategy give a prisoner a better chance of success than random selection? IT'S RANDOM!!!!!!

AN INTUITIVE UNDERSTANDING

And here's my tidbit: it's not as random as it seems. For this puzzle, the numbers are assigned randomly to boxes, without replacement. I.e., you won't find a given number in more than one box, and no number between 1 and 100 is skipped. This is obvious for the setup of the puzzle, but randomizing without replacement puts constraints on the system. Those constraints add information to the noise.

If prisoner number 13 randomly opens box 52, he knows he has a one in 100 chance of seeing his number in that box. He opens it and sees the number 1. He now knows FOR SURE that no other box has the number 1 in it. So his second random choice will have a one in 99 chance of being his number. Each choice gives some information that affects the probability of the next choice. (I.e., the samples are not independent.)

It is these constraints that lead directly to the cycles that are at the heart of the puzzle. And clever people have calculated the probability of having a cycle greater than 50 to be about 0.688. So the "linked list" strategy has ~= 0.312 probability of the prisoners being set free. That's the point of Derek's video.
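If you'd rather see it than take the clever people's word for it, here's a quick Monte Carlo sketch in Python (my own throwaway code, nothing from Derek's video). It checks how often a random permutation of 100 numbers has no cycle longer than 50, which is exactly the condition for all the prisoners to win with the "linked list" strategy:

```python
import random

def all_cycles_short(n=100, limit=50):
    """One trial: the prisoners all win iff no cycle of a random permutation exceeds `limit`."""
    boxes = list(range(n))
    random.shuffle(boxes)          # boxes[i] is the number hidden in box i
    seen = [False] * n
    for start in range(n):
        length = 0
        i = start
        while not seen[i]:         # walk the cycle containing `start`, marking boxes
            seen[i] = True
            i = boxes[i]
            length += 1
        if length > limit:         # one long cycle dooms every prisoner on it
            return False
    return True

trials = 20_000
rate = sum(all_cycles_short() for _ in range(trials)) / trials
print(rate)  # hovers around 0.31
```

With 20,000 trials, the win rate lands close to the theoretical ~0.312.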

Let's ruin the puzzle for a moment. Let's assign a random number between 1 and 100 to each box with replacement. It's entirely possible, even probable, that you'll have duplicates (the same number in more than one box) and skips (a number that is not in any box). One effect of this change is that the numbers will no longer necessarily be arranged in cycles. You can have many numbers NOT in a cycle. So the "linked list" solution to the puzzle doesn't improve your chances of survival over pure chance. Getting rid of the "without replacement" constraint removes the information from the noise.
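You can see how badly "with replacement" wrecks the setup with a tiny sketch (again, just my illustration):

```python
import random

# Fill 100 boxes WITH replacement: each box gets an independent random number 1-100.
boxes = [random.randint(1, 100) for _ in range(100)]
print(len(set(boxes)))  # typically around 63 distinct numbers: lots of duplicates and skips
```

On average only about 100 * (1 - (99/100)**100) ≈ 63 distinct numbers show up, so most of the time you don't even have a permutation, let alone nice cycles.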

This is how I get an intuitive feeling that you can have a much higher probability of success with the "linked list" solution to the original puzzle - you're taking advantage of the information that's in the noise.

WITH REPLACEMENT

What about my ruined version, where the numbers are assigned to boxes with replacement? To start with, let's calculate the probability that you get a distribution of numbers in boxes that is even possible for the prisoners to win (i.e., every number 1-100 is assigned exactly once). My probability-fu is weak, but I'll try. I think it is (100!)/(100**100) ~= 9.33e-43. Wow, that's a really low probability.

On the off chance that you get a solvable distribution, the probability of success with the linked list solution is ~= 0.312. So the total probability of success for my ruined version, WITH the linked list solution, is ~= 2.9e-43. If instead the prisoners choose their boxes randomly, then it's ~= 7.36e-73.
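For anyone who wants to check my arithmetic, the whole thing fits in a few lines of Python:

```python
from math import factorial

p_valid = factorial(100) / 100**100   # every number 1-100 assigned exactly once
p_linked = p_valid * 0.312            # ...and the linked-list strategy then wins
p_random = p_valid * (1 / 2)**100     # ...and 100 independent random 50-box searches all win
print(p_valid)   # ~9.33e-43
print(p_linked)  # ~2.9e-43
print(p_random)  # ~7.36e-73
```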

The prisoners had better appeal to Amnesty International.

There is no Vase

 I'm not a math nerd, so I probably shouldn't be posting on this subject. But when has a lack of expertise ever stopped me from having an opinion?

I just watched the Up and Atom video: An Infinity Paradox - How Many Balls Are In The Vase? In it, Jade describes the Ross–Littlewood paradox related to infinite pairings. I liked the video but was not satisfied with the conclusion.

I won't give the background; if you're interested in this post, go watch the video and skim the Wikipedia article. Basically, she presents the "Depends on the conditions" solution (as described in the Wikipedia article) without mentioning the "underspecified" and "ill-formed" solutions. And I guess that's an OK choice since the point of her video was to talk about infinities and pairings. But she kept returning to the question, "how many balls are there *actually*?"

Infinity math has many practical applications, especially if the infinity is related to the infinitely small. An integral is frequently described as the sum of the areas of rectangles under a curve as the width of the rectangles becomes infinitesimal - i.e., approaches zero. This gives a mathematically precise calculation of the area. Integrals are a fundamental tool for any number of scientific and engineering fields.

But remember that math is just a way of modeling reality. It is not *really* reality.

There is no such thing as an infinitesimal anything. There is a minimum distance, a minimum time, and the uncertainty principle guarantees that even as you approach the minimum in one measure, your ability to know a different measure decreases. When the numbers become small enough, the math of the infinitesimal stops being an accurate model of reality, at least not in the initially intuitive ways.

But they are still useful for real-world situations. Consider the paradox of Achilles and the tortoise, one of Zeno's paradoxes. (Again, go read it if you don't already know it.) The apparent paradox is that Achilles can never catch up to the tortoise, even though we know through common experience that he will catch up with and pass the tortoise. The power of infinity math is that we can model it and calculate the exact time he passes the tortoise. The model will match reality ... unless an eagle swoops down, grabs the tortoise, and carries it across the finish line. :-)

But models can break down, even without eagles, and a common way for infinity models to break down is if they don't converge. 1/2 plus 1/4 plus 1/8 plus 1/16 ... converges on a value (1). As you add more and more terms, it approaches a value that it will never exceed with a finite number of terms. So we say that the sum of the *infinite* series is *equal* to the limit value, 1 in this case. But what about 1/2 plus 1/3 plus 1/4 plus 1/5, etc.? This infinite series does NOT converge. It grows without bound. And therefore, we cannot claim that it "equals" anything at infinity. We could claim that the sum equals infinity, but this is not well defined since infinity is not a number.
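No computer can add infinitely many terms, but a few partial sums in Python show the two behaviors clearly (my illustration, nothing rigorous):

```python
# The geometric series 1/2 + 1/4 + 1/8 + ...: partial sums close in on 1 and never exceed it.
geometric = sum(1 / 2**k for k in range(1, 51))
print(geometric)  # within 2**-50 of 1

# The series 1/2 + 1/3 + 1/4 + ...: partial sums just keep growing.
for n in (100, 10_000, 1_000_000):
    print(n, sum(1 / k for k in range(2, n + 1)))
```

The first number barely budges as you add terms; the second column grows without bound, just slowly.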

Here's a similar train of thought. What is 1/0? If you draw a graph of 1/X, you will see the value grow larger and larger as X approaches 0. So 1/0 must be infinity. What is 0 * (1/0)? Again, if you graph 0 * (1/X), you will see a horizontal line stuck at zero as X approaches 0. So I guess that 0 * (1/0) equals 0, right? Not so fast. Let's graph X * (1/X). That is a horizontal line stuck at 1. So as X approaches 0, X * (1/X) equals 1. So 0 * 1/0 equals 1. WHICH ONE IS RIGHT???????? What *really* is 0 * (1/0)?
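Here are those three graphs as a quick numeric table (same idea, just numbers instead of pictures):

```python
# Approach x = 0 and watch the three expressions from the paragraph above.
for x in (0.1, 0.01, 0.001, 0.0001):
    print(x, 1 / x, 0 * (1 / x), x * (1 / x))
# 1/x grows without bound, 0*(1/x) is pinned at 0, x*(1/x) is pinned at 1.
# There is no single answer for "0 * (1/0)" -- it depends on how you approach it.
```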

The answer is that the problem is ill-formed. The 1/X term does not converge. The value of 1/0 is not "equal to infinity", it is undefined. My train of thought above is similar to the fallacious "proof" that 1 equals 2. And it seems to me that the "proof" that the number of balls in the vase can be any number you want it to be is another mathematical fallacy.

The only way to model the original vase problems is to draw a graph of the number of balls in the vase over time. Even in the case where you remove the balls sequentially starting at 1, you will see the number of balls growing without bound as time proceeds. Since this function does not converge, you can't say that it "equals" anything at the end. But it tends towards infinity, so claiming that it equals some finite value *at* the end is another example of an invalid application of math to reality.

But I shouldn't complain. Jade used the "paradox" to produce an engaging video teaching about pairing elements in infinite sets. And she did a good job of that.

Wednesday, May 4, 2022

CC0 vs GPL

I've been writing little bits and pieces of my own code for many years now. And I've been releasing it as CC0 ("public domain"; see below). I've received a bit of criticism for it, and I guess I wanted to talk about it.

I like to write software. And I like it when other people benefit from my software. But I don't write end-user software, so the only people who benefit from my code are other programmers. But that's fine, I like a lot of programmers, so it's all good.

There are different ways I could offer my software. Much open-source software is available under a BSD license, an Apache license, or an MIT license. These differ in ways that are probably important to legal types, but for the most part, they mean that you can use the code for pretty much any purpose as long as you give proper attribution to the original source. So if I write a cool program and use some BSD code, I need to state my usage of that code somewhere in my program's documentation.

So maybe I should do that. After all, if I put in the effort to write the code, shouldn't I get the credit?

Yeah, that and a sawbuck will get me a cup of coffee. I don't think those attributions are worth much more than ego-boosting, and I guess my programmer ego doesn't need that boost.

With the exception of the GNU General Public License (GPL), I don't think most open source ego-boosting licenses buy me anything that I particularly want. And they do introduce a barrier to people using my code. I've seen other people's code that I've wanted but decided not to use because of the attribution requirement. I don't want attributions cluttering up my documentation or adding licensing complications for anybody who wants to use my code. (For example, I was using somebody else's getopt module for a while, but realized I wasn't giving proper attribution, so I wrote my own.)

But what about GNU?

The GPL is a different beast. It is intended to be *restrictive*. It puts rules and requirements on the use of the code. It places obligations on the programmers. The stated goal of these restrictions is to promote freedom.

But I don't think that is really the point of GPL. I think the real point of GPL is to let certain programmers feel clean. These are programmers who believe that proprietary software is evil, and by extension, any programmer who supports proprietary software is also evil. So ignoring that I write proprietary software for a living, my CC0 software could provide a small measure of support for other proprietary software companies, making their jobs easier. And that makes me evil. Not Hitler-level evil, but at least a little bit evil.

If I license my code under GPLv3, it will provide the maximum protection possible for my open-source code to not support a proprietary system. And that might let me sleep better at night, knowing that I'm not evil.

Maybe somebody can tell me where I'm wrong on this. Besides letting programmers feel clean, what other benefit does GPL provide that other licenses (including CC0) don't?

I've read through Richard Stallman's "Why Open Source Misses the Point of Free Software" a few times, and he keeps coming back to ethics, the difference between right and wrong. Some quotes:

  • "The free software movement campaigns for freedom for the users of computing; it is a movement for freedom and justice."
  • "These freedoms are vitally important. They are essential, not just for the individual users' sake, but for society as a whole because they promote social solidarity—that is, sharing and cooperation."
  • "For the free software movement, free software is an ethical imperative..."
  • "For the free software movement, however, nonfree software is a social problem..."

I wonder what other things a free software advocate might believe. Is it evil to have secret recipes? Should Coke's secret formula be published? If I take a recipe that somebody puts on youtube and I make an improvement and use the modified recipe to make money, am I evil? What if I give attribution, saying that it was inspired by so-and-so's recipe, but I won't reveal my improvement? Still evil?

How about violin makers that have secret methods to get a good sound? Evil?

I am, by my nature, sympathetic to saying yes to all of those. I want the world to cooperate, not compete. I used to call myself a communist, believing that there should be no private property, and that we should live according to, "From each according to his ability, to each according to his needs". And I guess I still do believe that, in the same way that I believe we should put an end to war, cruelty, apathy, hatred, disease, hunger, and all the other social and cultural evils.

Oh, and entropy. We need to get rid of that too.

But none of them are possible, because ... physics? (That's a different subject for a different day.)

But maybe losing my youthful idealism is nothing to feel good about. Instead of throwing up my hands and saying it's impossible to do all those things, maybe I should pick one of them and do my best to improve the world. Perhaps the free software advocates have done exactly that. They can't take on all the social and cultural ills, so they picked one in which they could make a difference.

But free software? That's the one they decided was worth investing their altruism?

Free software advocates are always quick to point out that they don't mean "free" as in "zero cost". They are referring to freedoms - mostly the freedom to run a modified version of a program, which is a freedom that is meaningless to the vast majority of humanity. I would say that low-cost software is a much more powerful social good. GPL software promotes that, but so do the other popular open-source licenses. (And so does CC0).

So anyway, I guess I'm not a free software advocate (big surprise). I'll stick with CC0 for my code.

What is CC0

The CC0 license attempts to codify the concept of "public domain". The problem with just saying "public domain" is that the term does not have a universally agreed-upon definition, especially legally. So CC0 is designed to approximate what we think of as public domain.

Tuesday, February 15, 2022

Pathological cases

Jacob Kaplan-Moss said something wonderful yesterday:

Designing a human process around pathological cases leads to processes that are themselves pathological.

This really resonated with me.

Not much to add, just wanted to share.

Thursday, February 3, 2022

Nice catch, Grammarly

 I was writing an email and accidentally left out a word. I meant to write, "I've asked the team for blah...". But I accidentally omitted "asked", so it just said, "I've the team for blah...".

Grammarly flagged "I've", suggesting "I have". Since my brain still couldn't see my mistake, I thought it was complaining about "I've asked the team...". I was about to dismiss it, but decided to click the "learn more" link. It said that, except in British English, using the contraction "I've" to express possession sounds unnatural or affected. As in: "Incorrect: I've a new car".

Ah HAH! That triggered me to notice the missing word "asked". I put it in, and Grammarly was happy. I consider this a good catch. Sure, it misdiagnosed the problem, but it knew it was a problem.

Thanks, Grammarly!


Wednesday, January 5, 2022

Bash Process Substitution

I generally don't like surprises. I'm not a surprise kind of guy. If you decide you don't like me and want to make me feel miserable, just throw me a surprise party.

But there is one kind of surprise that I REALLY like. It's learning something new ... the sort of thing that makes you say, "how did I not learn this years ago???"

Let's say you want the standard output of one command to serve as the input to another command. On day one, a Unix shell beginner might use file redirection:

$ ls >ls_output.tmp
$ grep myfile <ls_output.tmp
$ rm ls_output.tmp

On day two, they will learn about the pipe:

$ ls | grep myfile

This is more concise, doesn't leave garbage, and runs faster.

But what about cases where the second program doesn't take its input from STDIN? For example, let's say you have two directories with very similar lists of files, but you want to know if there are any files in one that aren't in the other.

$ ls -1 dir1 >dir1_output.tmp
$ ls -1 dir2 >dir2_output.tmp
$ diff dir1_output.tmp dir2_output.tmp
$ rm dir[12]_output.tmp

So much for conciseness, garbage, and speed.

But, today I learned about Process Substitution:

$ diff <(ls -1 dir1) <(ls -1 dir2)

This basically creates two pipes, gives them names, and passes the pipe names as command-line parameters of the diff command. I HAVE WANTED THIS FOR DECADES!!!

And just for fun, let's see what those named pipes are named:

$ echo <(ls -1 dir1) <(ls -1 dir2)
/dev/fd/63 /dev/fd/62

COOL!

(Note that echo doesn't actually read the pipes.)


VARIATION 1 - OUTPUT

The "cmda <(cmdb)" construct is for cmda getting its input from the output of cmdb. What about the other way around? I.e., what if cmda wants to write its output, not to STDOUT, but to a named file, and you want that output to be the standard input of cmdb? I'm having trouble thinking here of a useful example, but here's a not-useful example:

cp file1 >(grep xyz)

I say this isn't useful because why use the "cp" command? Why not:

cat file1 | grep xyz

Or better yet:

grep xyz file1

Most shell commands write their primary output to STDOUT. I can think of some examples that don't, like giving an output file to tcpdump, or the object code out of gcc, but I can't imagine wanting to pipe that into another command.

If you can think of a good use case, let me know.


VARIATION 2 - REDIRECTING STANDARD I/O

Here's something that I have occasionally wanted to do. Pipe a command's STDOUT to one command, and STDERR to a different command. Here's a contrived non-pipe example:

process_foo 2>err.tmp | format_foo >foo.txt
alert_operator <err.tmp
rm err.tmp

You could re-write this as:

process_foo > >(format_foo >foo.txt) 2> >(alert_operator)

Note the space between the two ">" characters - this is needed. Without the space, ">>" is treated as the append redirection.

Sorry for the contrived example. I know I've wanted this a few times in the past, but I can't remember why.


And for completeness, you can also redirect STDIN:

cat < <(echo hi)

But this is the same as:

echo hi | cat

I can't think of a good use for the "< <(cmd)" construct. Let me know if you can.


EDIT:

I'm always amused when I learn something new and pretty quickly come up with a good use for it. I had some files containing a mix of latency values and some log messages. I wanted to "paste" the different files into a single file with multiple columns to produce a .CSV. But the log messages were getting in the way.

paste -d "," <(grep "^[0-9]" file1) <(grep "^[0-9]" file2) ... >file.csv

Done! :-)

Tuesday, November 9, 2021

Keychron K2 Keyboard

 I mentioned buying a Keychron K2 keyboard almost two years ago, but that post was primarily about a different vendor's keyboard which was a fail.

I just bought a second Keychron K2 keyboard ("blue" switches), but not because of a problem. It's because the keyboard is wonderful, and I want a second one to keep at an alternate worksite.

The laptop keyboard on the 2017-vintage Macbook Pro is almost unusable. Really, even the 2015-vintage Air's laptop keyboard is not that great. I prefer a full-stroke "clicky" keyboard with good tactile feedback.

Enter the Keychron K2. Lots of nice features that I won't bother listing since I don't use most of them and the site explains them fine. I like the noisy "blue" style switches, but you can get quieter ones.

Also, it has all the Mac-specific special keys, right there (I don't like the touch-bar at the top of the Macbooks.) And I also like the compact size.

My only complaint is that the key caps are not dual-injected, which means that the paint can wear off the tops of the frequently-used keys (E and A). But this is a problem for most keyboards I use; I apparently have a heavy typing hand.

Count me as a satisfied customer.

Thursday, August 5, 2021

Timing Short Durations

 I don't have time for a long post (HA!), but I wanted to add a pointer to https://github.com/fordsfords/nstm ("nstm" = "Nano Second Timer"). It's a small repo that provides a nanosecond-precision time stamp portably between MacOS, Linux, and Windows.

Note that I said precision, not resolution. I don't know of an API on Windows that gives nanosecond resolution. The one Microsoft says you should use (QueryPerformanceCounter()) always returns "00" as the last two decimal digits. I.e. it is 100 nanosecond resolution. They warn against using "rdtsc" directly, although I wonder if their arguments are mostly no longer applicable. I would love to hear if anybody knows of a Windows method of getting nanosecond-resolution timestamps that is reliable and efficient.

One way to measure a short duration "thing" is to time doing the "thing" a million times (or whatever) and take an average. One advantage of this approach is that taking a timestamp itself takes time; i.e. making the measurement changes the thing you are measuring. So amortizing that cost over many iterations minimizes its influence.
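nstm itself is C, but the amortizing trick looks the same in any language. Here's the shape of it in Python (the "thing" being timed is a placeholder):

```python
import time

N = 1_000_000
start = time.perf_counter_ns()
for _ in range(N):
    y = 1 + 1               # stand-in for the short "thing" being measured
elapsed = time.perf_counter_ns() - start
print(elapsed / N, "ns per iteration (the timestamp cost is amortized over N)")
```

Only two timestamps are taken for the whole run, so their cost (and jitter) is spread across a million iterations.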

But sometimes, you just need to directly measure short things. Like if you are histogramming them to get the distribution of variations (jitter).

I put some results here: https://github.com/fordsfords/fordsfords.github.io/wiki/Timing-software


Friday, July 9, 2021

More Perl "grep" performance

In an earlier post, I discovered that a simple Perl program can outperform grep by about double. Today I discovered that some patterns can cause the execution time to balloon tremendously.

I have a new big log file, this time with about 70 million lines. I'm running it on my newly-updated Mac, whose "time" command has slightly different output.

Let's start with this:

time grep 'asdf' cetasfit05.txt
... 39.388 total

time grep.pl 'asdf' cetasfit05.txt
... 21.388 total

About twice as fast.


Now let's change the pattern:

time grep 'XBT|XBM' cetasfit05.txt
... 24.787 total

time grep.pl 'XBT|XBM' cetasfit05.txt
... 18.940 total

Still faster, but nowhere near twice as fast. I don't know why.

Now let's add an anchor:

time grep '^XBT|^XBM' cetasfit05.txt
... 25.580 total

time grep.pl '^XBT|^XBM' cetasfit05.txt
... 3:08.25 total

WHOA! Perl, what happened????? 3 MINUTES???

My only explanation is that Perl tries to implement  a very general regular expression algorithm, and grep implements a subset, and that might cause Perl to be slow in some circumstances. For example, maybe the use of alternation with anchors introduces the need for "backtracking" under some circumstances, and maybe grep doesn't support backtracking. In this simple example, backtracking is probably not necessary, but to be general, Perl might do it "just in case". (Note: I'm not a regular expression expert, and don't really know when "backtracking" is needed; I'm speculating without bothering to learn about it.)

Anyway, let's make a small adjustment:

time grep.pl '^(XBT|XBM)' cetasfit05.txt
... 17.910 total

There, that got back to "normal".

I guess multiple anchors in a pattern are a bad idea.
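I don't have Perl's internals in front of me, but Python's `re` engine (also backtracking-based) at least confirms that the two patterns select the same lines, so the rewrite is safe; a quick sketch with made-up sample lines:

```python
import re

lines = ["XBT foo", "XBM bar", "zzz XBT", "XBQ baz"]

# Alternation of two anchored branches vs. one anchor over a group.
multi_anchor = re.compile(r"^XBT|^XBM")
grouped = re.compile(r"^(XBT|XBM)")

matched_multi = [ln for ln in lines if multi_anchor.search(ln)]
matched_grouped = [ln for ln in lines if grouped.search(ln)]

# Both select the same lines; only the work the engine does per line differs.
print(matched_multi == matched_grouped)  # → True
```

Whether `^(XBT|XBM)` is also faster in Python is a separate question; the point here is only that the two forms are equivalent in what they match.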


P.S. - even though this post is about Perl, I tried one more test with grep:

time grep 'ASDF' cetasfit05.txt
... 26.132 total

Whaaa...? I tried multiple times, and lower-case 'asdf' always takes about 40 seconds, and upper-case 'ASDF' always takes about 27 seconds. I DON'T UNDERSTAND COMPUTERS!!! (sob)

Wednesday, March 17, 2021

Investment Advice from an Engineer

I have some financial advice regarding investing in stocks. But be aware that, while I am pretty wealthy, the vast majority of my wealth came from the salary of a good-paying career. You will *NOT* get rich off the stock market by following my advice.


THE ADVICE

Put money every month into a U.S. exchange-traded fund that tracks a very broad market index. Like SPDR. (I prefer VTI because it is even broader and has lower fees, but the differences aren't really that big.) The goal is to keep buying and never sell, every working month of your life until you retire. (I don't have retirement advice yet.)

If the market starts going down, do *NOT* stop buying. In fact, if you can afford it, put more in. Every time the market goes down, put more in. The market will go back up, don't worry. 

The same cannot be said for an individual stock -- sometimes a company's stock will dive down and then stay down, basically forever. But the market as a whole won't do that. A dive might take a few days to recover, or might take a few years to recover. But it will recover. DON'T sell a broad fund if the market is going down. BUY it.


AND THIS WILL GET ME RICH?

No. It will give you the highest probability of a good return. Back when I was a kid, I was told to put money into a bank savings account. That was poor advice then, and is terrible advice now with interest rates close to zero. Putting money into a guaranteed bank account is guaranteed to underperform inflation. I.e. it will slowly but surely lose value.
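To put rough numbers on "slowly but surely lose value" (the 0.5% savings rate and 3% inflation below are illustrative assumptions, not predictions):

```python
# Real (inflation-adjusted) value of $10,000 parked in a savings
# account for 20 years, under assumed rates.
principal = 10_000.0
savings_rate = 0.005   # assumed 0.5% annual interest
inflation = 0.03       # assumed 3% annual inflation
years = 20

nominal = principal * (1 + savings_rate) ** years
real = nominal / (1 + inflation) ** years
print(f"nominal: ${nominal:,.0f}, real: ${real:,.0f}")
```

With those assumed rates, the account balance grows on paper but buys noticeably less than the original $10,000 did.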

Instead, tie your money to the overall market. The broad market has its ups and downs, but if you stick with it for a while, the overall trend is higher than inflation. 


WHAT IF I WANT TO GET RICH QUICK?

Well, you could buy a lottery ticket. That will get you rich the quickest. Assuming you win. Which you won't. Or you could go to Vegas and put a quarter into a slot machine.

But you're smarter than that. You know the chances of getting rich off of lotteries or gambling are too low. You want something that has some risk, but which will probably make you rich. And you're thinking the stock market.

Your best bet? Find a company currently trading for a few dollars a share that is definitely for sure going to be the next Apple or Microsoft. One tiny problem: if it is definitely for sure going to be the next Apple or Microsoft, the stock price won't be a few dollars a share. It will be priced as if it already IS the next Apple or Microsoft. This is because there are tens of thousands of smart traders out there who are spending a HELL of a lot more time than you are trying to find the next big company. For a "sure thing" company that is already publicly traded, those tens of thousands have already bid up the price.

The only real chance for you to get in on the ground floor is to find a garage shop start-up, and invest in them. I have a rich friend who has made a lot of money doing exactly that. For every ten companies he invests in, nine go bankrupt within 5 years. And one goes off like a rocket.

That's how you do it. And unfortunately, you have to start out pretty rich to make this work. And you need to research the hell out of startups, and have a good crystal ball.

The only other way to do it is to find a company that the tens of thousands of smart traders think will NOT be a big deal, but where you believe something they don't. Microsoft traded publicly at a modest price in its early years; most of the smart traders back then didn't think home computers would become a thing. And the few people who believed otherwise became rich.

The problem is that belief usually doesn't work very well at predicting success.

Look at BitCoin. I know several people who have made huge amounts of money on BitCoin. They made their initial investments based on a belief. A belief that BitCoin would become a real currency, just like dollars and euros; that people would use BitCoin every day to buy and sell everything from gasoline to chewing gum. They looked at the theories of money and the principles of distributed control, and thought it would definitely for sure replace national currencies.

Those friends of mine were wrong. They made money for a completely different reason: speculators. Speculators buy things that are cheap and that they think will go up in price. If enough speculators find the same thing to buy, the price *does* go up. And more speculators jump in. BitCoin is a big speculative bubble, with almost no intrinsic value. (And yes, I know BitCoin is more complicated than that. But I stand by my 10,000-foot summary.)

Now don't get me wrong. Successful speculators DO become rich. Who am I to argue with success? But getting rich off of speculation is all about timing. You have to find the next big thing before most of the other speculators do, and then jump back out *before* it has run its course. Will BitCoin come crashing back down? Not necessarily. If enough people simply *believe* in it, it will retain its value. My own suspicion is that it will eventually crash but what do I know? I thought it would have crashed by now.

That's also what happened with GameStop. A group of Reddit-based speculators decided to pump up the price of a company. If you were in on it from the start, you probably made a ton of money. But once it hit the news, it was too late for you to get rich off of it. The best you could hope for was to make a little money and then get out FAST. But most people who jumped into GameStop after it had already made the news ended up losing money.

(BTW, "pump-and-dump" is against the law. I will be interested to find out if any of the Reddit-based traders get in trouble.)

Anyway, I know of many people who have taken a chance on a stock, and made some money. But they didn't get rich. And if they keep trying to replicate their early success, they'll end up winning some and losing some. And if they're smart, and work hard at it, they may out-perform the overall market in the long run. But remember - those tens of thousands of smart traders are also trying to out-perform the overall market. For you to do it repeatedly for many years probably requires expertise that those smart traders don't have. And you don't get rich quick this way, you just make some good money.


WHAT IF I JUST WANT TO PLAY THE MARKET

(shrugs) Everybody needs a hobby. I have a friend who goes to Vegas once a year. Sometimes he comes back negative, sometimes positive. He has absolutely no illusion that he will get rich in Vegas. He assumes he will either make a little or lose a little. And he has fun doing it. There's nothing wrong with spending money to have fun.

If you play the stock market as a game, where you aren't risking your financial future, then more power to you. But I knew one person who had to stop for his own emotional well-being. He started feeling bad every time he lost some money because he should have invested less, but also felt bad when he made money because he should have invested more. Overall he made money, but he had so much anxiety doing it that he decided it wasn't worth it.

Sunday, March 14, 2021

Circuit simulation

I've been playing with designing simple digital circuits this weekend. Since my breadboards are not with me at the moment, I decided to look for circuit simulators.

Here's a nice comparison of several: https://www.electronics-lab.com/top-ten-online-circuit-simulators/

Before I found that comparison site, I tried out https://www.circuitlab.com/ and even threw them money for a month's worth. And I can say that I've gotten that much worth of enjoyment out of my tinkering this weekend, so money well-spent. But I knew that I didn't want to keep shelling out every month (I don't do digital design that much), and there was no way to export the circuits in a way that I could save them. So I kept looking.

Here's my CircuitLab home: https://www.circuitlab.com/user/fordsfords/

I haven't tried all the choices in the "top ten" list, but I did try the "CircuitJS1" simulator maintained by Paul Falstad and Iain Sharp. See https://github.com/sharpie7/circuitjs1 It isn't quite as nice as CircuitLab, but it's hard to argue with free, especially given my infrequency of use.

CircuitJS1 doesn't host users' designs. In fact, they don't integrate well with any form of storage. You can save your design to a local file, but the simulator doesn't do a good job of remembering file names. It presents you with a link containing a file name of the form "circuit-YYYYMMDD-HHMM.circuitjs.txt". You can save the linked contents with your own file name, but the next time you go to save, it obviously won't remember that name since it was a browser operation. All of this will make it a little inconvenient and perhaps error-prone to manage different projects. If I were doing a lot of hardware work, I would probably choose something else. But for occasional fiddling, this is fine.

Here's a simple state machine that checks even/odd parity of an input bit stream: http://www.falstad.com/circuit/circuitjs.html?startCircuitLink=https://www.geeky-boy.com/test.circuitjs.txt
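In software, the same even/odd parity state machine reduces to a one-bit toggle. This is a hypothetical sketch of the behavior, not the circuit itself:

```python
def parity(bits):
    """Even/odd parity state machine.

    The single bit of state flips on every 1 in the input stream.
    Returns 0 for even parity, 1 for odd parity.
    """
    state = 0
    for b in bits:
        state ^= b  # toggle state when the input bit is 1
    return state

print(parity([1, 0, 1, 1]))  # → 1 (odd number of 1s)
print(parity([1, 1, 0, 0]))  # → 0 (even number of 1s)
```

The hardware version is the same idea: one flip-flop whose next state is the XOR of its current state and the input bit.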

If I want to make anything public, I'll make them as github projects.

Friday, January 8, 2021

Racism in America

 As my readers have no doubt noticed (all 2 of you!), I keep this blog pretty technical, without a lot of politics. And I intend to keep it that way ... for the most part. But occasionally I will let my politics peek out.

Yeah, you're expecting me to talk about the events in Washington DC in January, 2021. I might post about that some day, but I'm nowhere near ready yet.

No, I'm going to talk about a class I took last fall. See https://www.rootstorevolution.org/courses

These are left-leaning classes that not only teach history but also encourage and facilitate activism. Their focus is on racism, but they touch on other "isms" as well. I learned a heck of a lot of history that wasn't covered very well back when I went to high school. The material is well-researched and well-sourced. I consider myself a better person for having participated.

The bottom line is that it isn't enough to be "not racist". We have to be "anti-racist".

The classes are not cheap. As of this writing, they are $200 a pop. (And worth it, in my humble opinion.) That said, the class organizers don't want cost to be a barrier to participation, and are willing to make adjustments. Plus, I am willing to kick in $100 for anybody who comes to them from my recommendation. Tell 'em fordsfords sent ya. (-:

Anybody who wants more "informal" information on the classes, send me an email.

Steve

Sunday, November 29, 2020

Using sed "in place" (gnu vs bsd)

 I'm not crazy after all!

Well, ok, I guess figuring out a difference between gnu sed and bsd sed is not a sign of sanity.

I use sed a fair amount in my shell scripts. Recently, I've been using "-i" a lot to edit files "in-place". The "-i" option takes a value which is interpreted as a file name suffix to save the pre-edited form of the file. You know, in case you mess up your sed commands, you can get back your original file.

But for a lot of applications, the file being edited is itself generated, so there is no need to save a backup. So just pass a null string in as the suffix. No problem, right?


[ update: useful page: https://riptutorial.com/sed/topic/9436/bsd-macos-sed-vs--gnu-sed-vs--the-posix-sed-specification ]

 

GNU SED (Linux and Cygwin)

echo "x" >x
sed -i '' -e "s/x/y/" x
sed: can't read : No such file or directory

Hmm ... that's odd. It's trying to interpret that null string as a file name, not the value for the "-i" option. Maybe it doesn't like that space between the option and the value.

echo "x" >x
sed -i'' -e "s/x/y/" x

There. It worked. I'm generally in the habit of using a space between the option and the value, but oh well. Learn something new every day...


BSD SED (FreeBSD and Mac)

echo "x" >x
sed -i'' -e "s/x/y/" x
ls x*
x    x-e

Hey, what's that "x-e" file? Oh, it IGNORED the empty string and interpreted "-e" as the suffix! Put the space back in:

echo "x" >x
sed -i '' -e "s/x/y/" x

Works. No "x-e" file.


ARGH!

I use both Mac and Linux, and want scripts that work on both!

 

THE SOLUTION

Go ahead and always generate a backup file. And don't use a space between the option and the value. This works on both:

echo "x" >x
sed -i.bak -e "s/x/y/" x
rm x.bak

Works on Mac and Linux.

IT TOOK ME A LONG TIME TO FIGURE ALL THIS OUT!!! Part of the reason it took so long is that the cases that don't work as intended tend to basically work anyway. In the first Linux case, sed tried to interpret '' as a file name and printed an error, but then went on to the actual file and processed it correctly. The command did what it was supposed to do, but printed an error. In the BSD case, sed created a backup file using "-e" as the suffix, but still interpreted the sed command string as a command string and properly processed the file. In both cases, the end goal was accomplished, but with unintended side effects.

Corner cases: the bane of programmers everywhere.