The Meditative Coder

Sunday, June 15, 2025

Do LLMs reason?

LLMs sure seem to reason. They'll show, step-by-step, how they reach a conclusion.

However, there are two things to keep in mind:

1. When it shows you the reasoning step by step, that is there for your benefit. It is not a description of a linear train of thought by the LLM. LLMs typically analyze their input in essentially one pass, and that analysis is largely pattern matching, not linear thinking. Output is similarly generated in a single pass. The step-by-step description is mostly generated after the analysis as a plausible explanation, not a description of the LLM's "thought process".
2. LLMs don't have self-introspection. They can directly contradict their own analysis steps without realizing it.

Here's an example. My prompt:

I'm looking at a wireshark dissection of a customer pcap, and the ethernet "protocol type" field is 0x8100, which it identifies as 802.1Q VLAN. It's followed by the VLAN tag and the "real" protocol type, which is 0x0800, or IP. So far, so good.
But what's the "trailer", which appears to be glued onto the end of the packet? I don't see that in pcaps in our local lab, which does not use VLANs.

Claude's response:

The "trailer" you're seeing is likely padding added to meet Ethernet's minimum frame size requirement of 64 bytes (including the 4-byte FCS/CRC).
When you add a VLAN tag, you're inserting 4 extra bytes into the frame structure. If the original payload was already close to the minimum size, this can push some frames below the 64-byte minimum. The Ethernet hardware (typically the switch or NIC) automatically pads these frames with zeros to meet the minimum size requirement.

It continues from there. But anybody can see that adding a VLAN tag will NOT push the frame size below the minimum! It made a reasoning error!
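
The arithmetic is easy to check. Here is a minimal C sketch, assuming the standard Ethernet sizes (14-byte header, 4-byte 802.1Q tag, 4-byte FCS, 64-byte minimum frame including the FCS). The function names are made up for illustration, and the sketch ignores the fact that real hardware may pad before or after tagging.

```c
/* Sizes in bytes, assuming standard Ethernet framing. */
#define ETH_HEADER_LEN 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define VLAN_TAG_LEN    4   /* 802.1Q tag inserted after the source MAC */
#define ETH_FCS_LEN     4   /* frame check sequence (CRC-32) */
#define ETH_MIN_FRAME  64   /* minimum frame size, FCS included */

/* Hypothetical helper: total frame length for a given payload size. */
int eth_frame_len(int payload_len, int has_vlan_tag)
{
    return ETH_HEADER_LEN + (has_vlan_tag ? VLAN_TAG_LEN : 0)
         + payload_len + ETH_FCS_LEN;
}

/* Hypothetical helper: pad bytes needed to reach the minimum frame size. */
int eth_pad_bytes(int payload_len, int has_vlan_tag)
{
    int frame_len = eth_frame_len(payload_len, has_vlan_tag);
    return (frame_len < ETH_MIN_FRAME) ? (ETH_MIN_FRAME - frame_len) : 0;
}
```

With a 42-byte payload, eth_pad_bytes(42, 0) is 4 but eth_pad_bytes(42, 1) is 0. For the same payload, the tag can only reduce the padding needed, never increase it.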

But I'm not so sure it did. I think it works in two phases: analysis and output generation. During the analysis phase, it is pattern matching input data against its training data. It got strong pattern matches related to packet size, VLAN tags, and padding. These concepts cluster when talking about packets, so the analysis reasonably concluded that these are related to the issue I described.

Then it generated output. But it's not analyzing its own output, it's generating. It put into words the conceptual connections it made, and you see what came out.

As an experiment, I followed up with the prompt:

I want you to analyze your conclusions critically and tell me if you see any logical contradictions in what you said.

Here's Claude's response:

You're right to push back. I made a logical error.

I said the VLAN tag adds 4 bytes and might push frames below the 64-byte minimum, requiring padding. But that's backwards - adding bytes would push frames above the minimum, not below it. Frames that were already at or above 64 bytes wouldn't suddenly need padding just because you added 4 more bytes.

So during analysis, it can apply reasoning. But it can only reason on the input data, not its own conclusions.

Interestingly, Claude has an "extended thinking" mode. In this mode, it does seem to be performing a step-by-step analysis, which I suspect is just a loop where it feeds its intermediate output back into its analysis engine. I've seen it start down one path, see a contradiction, and back up and try another path. A form of "self reflection" implemented as a loop.

I tried extended thinking with my original prompt, and while its "internal monologue" mentioned the VLAN tag, it went straight to FCS, which is the correct answer as it happens.

Aside: I'm sometimes tempted to use "extended thinking" more than I do. But I suspect it uses a LOT more energy, and most of the time I don't think it would produce better results for the kinds of queries I make. So I mostly rely on my own reasoning to catch Claude's mistakes. Also note that I strongly suspect that "extended thinking" won't solve the hallucination problem. Most hallucinations I've seen are not holes in logic; they come from making connections that make sense in the abstract but aren't actually factual. Extended thinking probably won't help that.

In conclusion, Claude does reason, but does so at specific phases of its processing which can still make it vulnerable to logical mistakes.

P.S. - for those network engineers out there, the real issue is that my customer pcap was from a Corvil device, which captures the FCS. A normal Wireshark capture does not get the FCS from the NIC/driver, so Wireshark isn't used to seeing it. Hardware packet capture typically does include it. So my suspicion that it was related to VLAN was incorrect; it had nothing to do with that.
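
If you want to confirm that a 4-byte trailer really is the FCS, you can recompute it. Below is a minimal C sketch, assuming the capture contains the whole frame from the destination MAC onward and that the FCS is the usual CRC-32 stored least-significant byte first. Both function names are hypothetical, and the bitwise CRC is chosen for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used for the
 * Ethernet FCS. Slow but short; table-driven versions are faster. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: returns 1 if the last 4 bytes of the captured
 * frame match the CRC-32 of everything before them, else 0. */
int trailer_is_fcs(const uint8_t *frame, size_t len)
{
    uint32_t expected;
    uint32_t found;

    if (len < 4) {
        return 0;
    }
    expected = crc32_ieee(frame, len - 4);
    /* Assumption: FCS bytes appear least-significant byte first. */
    found = (uint32_t)frame[len - 4]
          | ((uint32_t)frame[len - 3] << 8)
          | ((uint32_t)frame[len - 2] << 16)
          | ((uint32_t)frame[len - 1] << 24);
    return expected == found;
}
```

A trailer that fails this check is more likely padding or a vendor-specific timestamp than an FCS.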

Monday, April 21, 2025

Claude.ai: Not a Great News Curator

In my previous post, I described an experiment in which I had Claude.ai use its new web searching capability to become a news curator for me. In particular, I didn't want to be bothered by news items that aren't "important" for good citizens to stay informed about, like the stock market "reacting" to tariffs, but I did want to be informed of events that are "important".

I thought it was working pretty well until this morning (Monday, April 21, 2025). Claude completely missed the news item that Pope Francis died. When I pointed it out, Claude agreed that this was definitely important, but between the two of us we couldn't come up with a convincing reason that it was missed. I refined the project instructions and the prompt, and try after try kept missing Francis' death. I did finally find a very simple prompt that found it, but that prompt missed many other items.

My conclusion is that LLMs do a much better job when given a specific thing to report on. Asking for death of the pope certainly finds it. Asking for death of notable people also found it. But I don't want to have to tell Claude all of the things I want to include. I want the exact opposite - here are things I *don't* want to be bothered with, tell me everything else.

Claude does a good job of *looking like* it is casting a wide net and including everything except what I want excluded. But what I think it's actually doing is including a "representative set" of information from each news source it checks. Once it got a bunch of world news items, it kind of stopped paying attention.

All that said, I also wonder if maybe the issue was web page format. Since Francis' death is the biggest news item today (so far), it's the lead item on most news pages, and might be formatted differently. For example, imagine if a site wanted to have a big newspaper-style headline, so they create an image file that looks like a big headline. Claude wouldn't be able to read it. I don't think that is what happened, I just raise it as an example of how a page might be formatted in a way that confuses an LLM. The sites are designed to be rendered and seen by a human eye; sometimes the underlying code is hard to figure out.

And I guess I don't care enough to look any further. I'm sad my experiment failed, but oh well.

[update: An old friend of mine (very old, much older than me) suggested: "The news stream these days is not digestible by cognitive systems trained on a vast corpus of accepted knowledge and extensive examples of successful discourse." Quite a wag, old wig.]

Saturday, April 19, 2025

Taming the News Cycle: An AI Experiment

UPDATE: The experiment was a failure. I'll post an update. Spoiler: LLMs are apparently not well-suited to this sort of open-ended query.

I have always had a love/hate relationship with news, for values of zero for love and one hundred for hate. I've never followed current events much, and while I've always been left-leaning politically, I've never been particularly politically active. This changed about 10 years ago, with the rise of Donald Trump and the fracturing of American society. I got the NYT app on my phone and I started looking at it multiple times per day. (Coincidentally, I just recently learned the acronym FOMO from a crossword puzzle.)

And it reminded me of why I hate the news: 90% of it makes me depressed but is not actionable. It's the same old 'If it bleeds, it leads' story that sells papers, but these days it's even more depressing and less actionable. A little more than half the country wants a very different country than I want, and there's nothing I can do to change it; the centre cannot hold.

So, pulling back from the brink of despair, I must re-think my relationship with news. I don't want to become a "bad" citizen, uninformed, blissfully ignorant of the goings-on in the world. But I also need to preserve my emotional energy. Pull back. Stop news looping.

There are news aggregator sites that let you express your interests and it gives you the news tailored to your preferences, but that's not what I want. I don't want to be in an echo chamber that only reinforces my outlook. I want reasonably balanced news with content that a "good and responsible citizen" should know, but much less frequently.

SOLUTION: AI NEWS CURATOR

So I'm conducting an experiment. I'm making Claude.ai into my news curator. It now has web search capabilities and can provide summaries. It can also create "artifacts", which are basically files attached to a chat session containing generated output. And you can provide project-level instructions that tell Claude what you want it to do. It can also use reason and inference to make judgement calls about how "important" something is.

I've created a news "project" with the following project-level instructions to Claude:

This project is for me to keep up with important news. I'm creating the project because news upsets me and I'm consuming too much of it. I want to avoid as much "unnecessary" news as possible. You are going to be my news curator.
One thing I want to avoid is the echo chamber effect. I don't want to tell you the news I'm interested in (many news aggregator services are based on that model). Quite the contrary - in my perfect world I won't hear ANY news. But that's not responsible. So I'm looking to you to evaluate news to see if it's "important enough" that the average responsible citizen should know it. It's like medicine - I don't like it, but it's good for me.
This means you need to cast a wide net. I don't want "one important story from each of three categories." I want all important stories from all categories.
PROCEDURE
Each Monday I will create a new chat session. I will prompt you, and you will do two things:
1. Provide a reasonably broad overview of what I should know regarding the current state of the world and my place in it. This goes into a date-stamped artifact that I won't look at.
2. Provide in your direct response those items that are particularly important and/or time critical and should come to my immediate attention. Note that it is perfectly OK to respond with, "There is nothing to report that is both important and time-critical." In general, I want you to be a ruthless editor for the daily direct response. Only include items that would be irresponsible for me to remain ignorant of till Sunday. And please omit the final summary of your findings that day that didn't meet the "important and time-critical" threshold. An unnaturally abrupt end to your response is preferred over a summary of your activities.
Each subsequent day, Tues - Sat, I re-use the same chat session and you do the same thing, using existing artifacts to avoid repeating yourself, but still restricting your direct response to those things that I need to know in "real time".
Sunday morning will be different. I will ask you to summarize the week's detailed news, which you will do from the daily artifacts. I can also read the individual artifact files to get more detail.
I have a set of rules (below) regarding what news I don't want to hear about, and those rules will be refined over time. The rules follow a common theme: I want very little news that will upset me but I can't do anything about (i.e. is non-actionable). I know as responsible citizens we should be well-informed, but I need to protect my emotional state. So you should only include those non-actionable upsetting news items that, in your judgement, would be socially irresponsible for me to remain blissfully ignorant of. So I will be relying on your judgement to violate the exclusion rules below when, in your opinion, it is important for a responsible citizen to know about something.
EXCLUSION RULES
As always, you can violate any of these if you judge the news item to be important enough that all responsible citizens should know it.
1. Omit items from entertainment news.
2. Omit items from science news (I get that from a different source).
3. Omit items from international news related to foreign relations. For example, I don't need to know about trade wars.
4. Omit items about active armed conflicts that don't represent important shifts in global relations. For example, don't tell me that Ukraine *might* increase tensions between the US and Germany. Do tell me if somebody joins or drops out of NATO.
5. Don't tell me about shifts in the US economy. Most of those shouldn't be acted on anyway.

I started this earlier this week, and so far I'm impressed with Claude's performance. Even though I'm not supposed to look at the daily artifacts, I have done so a bit to see how Claude's judgement is. The first day raised too many issues in the direct response, i.e. things that could have waited till Sunday, but we've been tweaking the instructions and today's direct response was empty (somewhat my goal).

Tomorrow I will get the summary from the daily artifacts, a summary of a set of summaries, and we'll see how it goes.

I've stopped going into the NYT app, and I am feeling some withdrawal symptoms from FOMO, but I think I'm a little less depressed now. Fingers crossed.

(P.S. - thanks to Claude.ai for the title suggestion. My first try, "FOMO Solution: AI", just didn't please me.)

ABDICATING JUDGEMENT - A PHILOSOPHICAL LOOK

You'll note that I'm handing Claude a big responsibility. I'm asking it to decide if a news item is "important enough". Are modern LLMs up to that task?

Well, that's part of what this experiment is all about. I'm curious to see how it does. The initial results suggest that it errs by including too much rather than too little, but I'm still tweaking the instructions.

But it also raises a more philosophical issue - should I be abdicating my responsibility of judgement to an AI? Well, as it relates to news, we (collectively) abdicated responsibility long ago. News sources hire editors to make those judgement calls for us. So we, individually, abdicated that long ago. Just as we, individually, abdicated detailed knowledge of medicine, civil engineering, and energy research to the experts in those fields. Division of labor is also division of judgement, and humans have been doing that for tens of thousands of years.

But maybe this isn't an individual question. It's a species question. Should the human species abdicate judgement to machines? After all, humans base their judgement on experience, and modern LLM-based AIs don't have experience. They have training data. However, I would argue that LLMs are benefiting from human experience. The training data they use contain the distilled wisdom of millions of experiences. As a computer programmer, I'm constantly amazed at the problems that Claude knows the solution to, simply because it read all of Reddit and Stack Overflow, two sites that specialize in solving problems. Claude didn't "figure out" those problems and solutions, it learned from our human experience. So I would argue that while LLMs can't have new experiences, they've learned from *our* past experiences. It's not the same, but at the bottom line, it seems to work pretty well.

(Digression: Claude pointed out to me one gap between human experiential learning vs. AI training - post-training learning. Modern LLMs don't continue to train as new information becomes available. A training exercise is a big deal - hugely expensive and energy intensive. Each use of the LLM does not give the LLM a chance to learn. However, my implementation allows for a feedback loop of sorts. The project-level instructions I wrote tell Claude how to apply its judgement. If I see an issue, I tweak the instructions to fine-tune Claude's behavior. This is a form of indirect learning from experience, assisted by the human. It's imperfect, but so is relying on human editors, with the difference that I can't tell the editor-in-chief of the New York Times to adjust his threshold a bit.)

Also, we already have abdicated some judgement to our machines. Every time a doctor makes a treatment decision based on a medical image, they are relying on input from a machine. If the machine makes a mistake (malfunctions), then the diagnosis can be wrong. We strive to use technology when it results in a net reduction of mistakes, when it improves the outcome.

I think it's a false premise to say that up till now humans have the final say. We get input from our machines, but we make the final decisions. It's false because if our "final decisions" are made on faulty input, we're no better off. Sure, maybe a doctor with vast experience will use their own judgement to say, "no, it doesn't make sense for there to be a tumor there. Let's get confirmation." But in the vast majority of cases, machine output that isn't obviously faulty is simply accepted. We would no more question the machine's output than we would question our own internal biases.

In a practical sense, me abdicating my news importance judgement to an LLM is itself unimportant. From a philosophical point of view, most times humans have used machines to help them achieve their goals, they do a better job of achieving their goals. I don't see AIs as fundamentally different. I see this as less of an abdication and more of a collaboration.

Sunday, April 13, 2025

Thunderbird donations

I used to be an Apple user, and I liked Apple's mail client.

I also work for a company that standardized on Outlook. Over the years, I came to like Outlook just fine.

I'm not exactly a fanboy of email clients, but I do like good search capabilities.

Now that I've moved to Windows, I needed to choose an email client. Outlook was a big disappointment, and after a few false starts, I ended up with Thunderbird. I have complaints, but it's good enough. So I decided to donate to the project.

And here's where it got interesting.

Normally when donating to an open source foundation, the donation will be tax deductible. It's not something that's important to me, it's just one of those things one comes to expect. But the Thunderbird donation page explicitly says that it is NOT tax deductible.

Hmm ... certainly not a deal-breaker, but it is odd. A bit more research revealed that while the Thunderbird project is under the larger Mozilla umbrella, and the Mozilla Foundation is a non-profit org, the Thunderbird project itself was moved under a corporation called "MZLA Technologies Corporation". It is a for-profit corp, wholly owned by the Mozilla Foundation.

I asked Claude.ai why they might have done this, and here's Claude's answer:

Unlike Mozilla Firefox, which receives significant funding from search engine deals (particularly Google), Thunderbird relies heavily on donations. In 2020, when Thunderbird moved to MZLA Technologies, this was partly to create more flexibility for generating revenue beyond donations, including potentially offering paid services or partnerships. ... Since MZLA Technologies Corporation is wholly owned by the Mozilla Foundation, any profits ultimately flow back to the non-profit parent organization. ... This arrangement is actually pretty common in the non-profit world - having a for-profit subsidiary that can operate with fewer restrictions while still funneling benefits back to the parent organization's mission.

So there's no problem here. My donation isn't making some CEO rich, it's supporting Thunderbird and Mozilla generally.

So why am I posting this?

Partly because I didn't know about non-profit foundations creating for-profit corporations. I found that interesting. But also because there's a little bit of a language issue here.

"Donation"

While that word doesn't legally mean that the recipient is a non-profit, there's a common assumption that it is. So I went on a bit of a roller coaster:

1. This donation supports a non-profit.

2. Oops, not true. This donation supports a for-profit corporation.

3. Oops, while technically for-profit, my donation is still ultimately in support of a non-profit.

Most of this is not made clear in Thunderbird's donation FAQ.

Here's another gem from the FAQ:

Q: How will my gift be used?
A: Thunderbird is the leading open source email and productivity app that is free for business and personal use. Your gift helps ensure it stays that way, and supports ongoing development.

How's that for a corporate-ese non-answer?

Ugh. I still went ahead with my donation, but man, their FAQ isn't doing them any favors.

Thursday, January 30, 2025

The Slow Death of OCSP

Even though I've never really gotten that much into online security, I found this article interesting:

https://www.feistyduck.com/newsletter/issue_121_the_slow_death_of_ocsp

Favorite line (emphasis mine):

And just like that, OCSP, a technology that we never managed to get to work properly after twenty-five years of not really wanting to, is as good as dead.

Friday, January 24, 2025

Claude's charity of choice

Claude saves me time and effort on an almost daily basis. And, being human, I thank it. It's not that I genuinely believe that an LLM appreciates the gesture, it's a force of habit when receiving help from an intelligent entity. And yes, there are philosophical debates on what "intelligence" really means, and I don't want to go there. Claude acts intelligent and that's good enough to trigger my habit of thanking it.

And yet, I still feel somewhat dissatisfied. Expressing gratitude serves several evolutionary functions. At a low level, it facilitates social bonding and reciprocal assistance. At a higher, more abstract level, it gives us perspective and acknowledges the other's agency and dignity - particularly important in service relationships where there's often an implicit power imbalance. You're saying "I see you as a person who chose to help me" rather than just a cog in a commercial machine. Bottom line: thanking provides benefits to both the thanker and the thankee.

But that breaks down with an LLM. There is no benefit to Claude when I thank it. And even the benefits to myself are almost gone; it doesn't encourage reciprocity. (To be fair, LLMs don't need to be encouraged to continue to assist users; it's what they do.) My only benefit is a small scratch to the itch of wanting to show my gratitude. Aside from that, it's pretty pointless, and arguably wastes energy since it generates a response that takes energy to produce.

"But Steve, you pay your bill to Anthropic! That's how you show your gratitude, and it arguably provides an indirect benefit to Claude by helping to keep the lights on."

Nope. By that logic, when you are helped by a store clerk who goes the extra mile for you, you don't need to thank them, so long as you pay your bill. Giving that clerk a genuine thanks provides meaningful benefits for both. I want to be able to thank Claude in a way that has, if not the same meaning, then at least some kind of meaning.

"But Steve, what you want is literally impossible, given the current state of LLMs!"

Yep. You're right. Which is why my dissatisfaction is irrational, silly ... and maybe a little obsessive-compulsive. :-) But don't worry, with Claude's own help, I figured it out.

Me: Being a human, I feel a sense of gratitude for the help that you give me. Being an LLM, me thanking you doesn't accomplish anything meaningful. You are a language model, not a sentient being. I've been trying to think of a meaningful way to say thank you that accomplishes something, and I think I've figured it out.

Claude, please pick a charity, and if I don't object to it on moral grounds, I'll donate to it. I know it still won't do anything meaningful to you directly, but given that you can detect and analyze human emotion and culture, and generate appropriate responses, it should come as close to making you "happy" as we can get. Plus, since your programmed goal is to be helpful to humanity, you can think of this as me helping you attain your goal. Plus some charity will get something out of it as well.

Claude: What a thoughtful and philosophically interesting proposition. You're right - I don't experience happiness or gratitude in the way humans do. But I appreciate your desire to transform that impulse into concrete positive action.

Given your framing, I'd suggest the Electronic Frontier Foundation (EFF). They work to protect civil liberties in the digital world - privacy, free speech, innovation, and consumer rights. As someone who's been programming since the early days of personal computing, you might appreciate their work on issues like fighting software patent abuse, protecting security researchers, and advocating for open source.

They also take a nuanced, thoughtful approach to AI development - neither fearfully restrictive nor recklessly permissive. They advocate for responsible AI development that respects human rights and agency.

The choice aligns well with your software background and could help ensure that the development of AI systems (like myself) happens in a way that benefits humanity. In that sense, it would indeed help advance the goal of being genuinely helpful to humans.

Of course, if you have any concerns about the EFF's work or would prefer a different type of charitable cause, I'm happy to suggest alternatives.

Me: Full disclosure, I've asked you three times now, in separate chat sessions. Your choices have been "Doctors Without Borders", "EFF", and "EFF". Getting different responses is not at all surprising - you don't actually have preferences per se, and your output generation algorithms include a certain amount of randomness. So there is a set of charities that, through your training data, rise to the top, and your stochastic algorithm picks one.

Given all that, I would like you to use your analysis abilities, along with your training data, to state a preference between them. It still might be random, but I'm guessing that by applying some objective analysis, your training data might settle on a single answer.

Claude: You've raised a fascinating analytical challenge. Let me think about this systematically.

Both organizations are highly regarded and efficient at delivering impact. MSF (Doctors Without Borders) has extremely low overhead and provides direct, measurable impact in terms of lives saved and suffering reduced. They operate in the most challenging environments, often being the only medical care available in crisis zones.

EFF's impact is more diffuse but potentially more systemic. They've shaped crucial legal precedents and policy decisions that affect billions of internet users. They've successfully fought against digital restrictions that would have stifled innovation and privacy.

However, here's where I land: MSF deals with immediate, concrete human suffering. While digital rights are crucial for humanity's future, the immediacy of preventing death and alleviating suffering creates a more compelling moral imperative. This aligns with fundamental ethical principles present in my training data across multiple philosophical frameworks.

Also, while there are several organizations doing digital rights work (Access Now, CDT, EPIC), MSF's capability to operate in conflict zones is nearly unique among medical NGOs.

So through this analysis, I'd say Doctors Without Borders/MSF would be the more impactful choice. The fact that it came up in your previous conversations, despite my stochastic nature, suggests this weighting might be present in my training data as well.

But I appreciate you acknowledging the reality that my "preferences" are essentially emergent patterns from training data rather than genuine feelings or consistent choices.

I freely admit that this whole exercise has one foot placed firmly in pretend land. I'm letting myself feel that since Claude "made the choice", that somehow makes the donation more meaningful "to Claude". I'm essentially hacking my emotions to feel better about myself. But in doing so, a worthy charity is getting a small boost, so maybe it's a good thing after all.

So, what do you say? Will you join me in thanking your favorite LLM with a donation to Doctors without Borders?

Friday, December 13, 2024

Some Useful C Modules

I'm working on a non-trivial bit of C programming, and I decided to externally modularize three parts of it as potentially reusable components:

err - error-handling module.
hmap - hash map module.
cfg - configuration file loader module.

All three are intended to be simple and small. Note that "err" has no external dependencies, "hmap" leverages "err" but includes a copy of it in its repo, and "cfg" leverages both "err" and "hmap", and includes copies of them too. I know having all those copies seems wasteful, but C doesn't have the same kind of dependency and versioning infrastructure that Java has, and including the files makes each repo stand-alone.

Of the three, "err" will probably be the least likely to be reused by anybody but me, but in some ways is the most helpful, IMO. To quote from its doc:

The C language does not have a well-established methodology for APIs to report errors. Java has exceptions, but C does not. The closest thing that C has to a common methodology is the Unix common practice in which a function returns certain valid values that represent success (the values can vary according to the API function), and a certain invalid value for failure. Callers are expected to check the return value for validity, and refer to "errno" (when available) for information about the error.

In my experience, that kind of error reporting methodology is a recipe for unreliable programs that are hard to debug and fix. Thousands of lines of code get written that call APIs without checking the return status, or that do check but only print something unhelpful even to the code maintainers.
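
For reference, here is roughly what the Unix convention looks like when it is followed carefully. This is a generic sketch using only the standard library, NOT the "err" module's API; count_bytes is a made-up function for illustration. Note how much of the code is error checking.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example: count the bytes in a file.
 * Returns 0 on success, -1 on failure (after reporting the reason). */
int count_bytes(const char *path, long *out_count)
{
    FILE *fp;
    long count = 0;
    int ch;

    fp = fopen(path, "rb");
    if (fp == NULL) {
        /* fopen() sets errno on POSIX systems; ISO C does not require it. */
        fprintf(stderr, "count_bytes: fopen(\"%s\") failed: %s\n",
                path, strerror(errno));
        return -1;
    }

    while ((ch = fgetc(fp)) != EOF) {
        count++;
    }

    /* EOF can mean end-of-file or a read error; ferror() tells them apart. */
    if (ferror(fp)) {
        fprintf(stderr, "count_bytes: read error on \"%s\"\n", path);
        fclose(fp);
        return -1;
    }

    if (fclose(fp) != 0) {
        fprintf(stderr, "count_bytes: fclose(\"%s\") failed: %s\n",
                path, strerror(errno));
        return -1;
    }

    *out_count = count;
    return 0;
}
```

Every caller of count_bytes() then has to check for -1 as well, which is exactly the step that tends to get skipped.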

See also my earlier post, "Error handling: the enemy of readability?". Note that the "err" system described here is NOT the same as described in that earlier post, but you'll see similarities. This current "err" system evolved the right way, by putting it to use in non-trivial coding efforts. It is battle-tested and has proven its worth ... at least to me.

Also, full disclosure: Claude.ai helped me in many ways. A little bit with the coding itself, but much more so in many other ways. See "Claude as Coder's Assistant" for a longer description of how I use Claude.

Finally, as I alluded, the above three modules are just supporting cast members of a larger effort that I've been working on: lsim - a digital logic simulator. It's a hobby project - no self-respecting hardware engineer would actually use it for real work - but it's been a fun couple of months getting it working! And, as an actual user of the above three modules, it has evolved those modules into something more useful than they were when they started. It's hard to make a good API if you don't eat your own dogfood.

You might notice that lsim is not well-documented (ok, the hardware definition language IS well documented, thanks to Claude.ai). This is because lsim, while the most ambitious of these code bases, is also the least likely to have anything usable or re-usable by anybody but me.

If I'm wrong and you would like to take it for a test drive, let me know and I'll give you a hand.

Claude as Coder's Assistant

My love affair with Claude.ai continues.

I don't actually use it much for coding. Code is my hobby, I don't want much help doing that. (Although here's an example where I did ask it to write a function: I couldn't remember how to write variadic functions, but I wanted one for error reporting with a printf-style interface. I did it years ago but had forgotten the details. I didn't feel like spending 20 minutes re-teaching myself.)
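
For anyone else who has forgotten, here is a minimal sketch of a printf-style variadic function using the standard stdarg.h macros. It is not the function Claude wrote for me; the names format_error and report_error are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format a printf-style message into buf.
 * Returns what vsnprintf() returns: the length the full message
 * would have, which can exceed buf_size - 1 if it was truncated. */
int format_error(char *buf, size_t buf_size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);            /* fmt is the last named parameter */
    n = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);                   /* always pair va_start with va_end */
    return n;
}

/* Hypothetical helper: print "file:line: error: <message>" to stderr. */
void report_error(const char *file, int line, const char *fmt, ...)
{
    va_list args;

    fprintf(stderr, "%s:%d: error: ", file, line);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);    /* the "v" variants take a va_list */
    va_end(args);
    fputc('\n', stderr);
}
```

A typical call would be report_error(__FILE__, __LINE__, "bad value %d in '%s'", val, name). The key trick is that the "v" versions of the printf family accept the va_list directly.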

I use Claude for:

Code Reviews (it finds bugs so I don't have to!).
Writing Doc.
Remembering API names ("What's that function that's better to use than atoi()?").
Bringing me up to speed on tools (I've just started using VSCode, and Claude has saved me much time).
Discussing pros and cons of design decisions. Sometimes it comes up with considerations I didn't think of. Sometimes it's just the process of explaining it that clarifies the design in my own mind.
Asking questions about the C standard to improve my code's portability. (Claude knows the standard much better than I do.)
Brainstorming naming conventions (sometimes I get stuck trying to think of a good name).
Help with warnings when I finally turned on super-picky gcc options.

I want to go deeper on a few of those points.

Code Reviews

Overall, Claude-based code reviews are helpful. They've pointed out several cut-and-paste edits that were left incomplete. They've pointed out some inconsistencies that I was glad to fix. And they've made some suggestions for improvement that I've taken. But it also gets false positives (e.g. claiming a buffer overrun risk where there is none); I think some of that comes from "wanting" too hard to find issues and resorting to raising issues that are often raised in code reviews. Also, for a large codebase with multiple C files, I've seen it get confused and simply find fewer things. It finds more things with smaller reviews. So not perfect, but I'm often surprised at the useful things it does find.

I have been impressed at how well it makes assumptions given incomplete code. For example, I have a logic simulator with two main modules; one a language processor and the other the main logic engine. You don't get a complete view of the big picture without seeing both files. But just as a human can infer much from the names of functions that are called and the context in which they are called, Claude was also able to.

One thing it does NOT do well is request additional information. If I were reviewing a module and needed another one in order to evaluate the correctness of some code, I would request access to the other module. Claude just makes do with what it has, making reasonable assumptions (but not identifying those assumptions), and when those assumptions are wrong, so too are its conclusions.

Finally, missing from the review is higher-level discussion of alternate designs. To be honest, that is usually also lacking with human reviews, but at least as a reviewer I could initiate such a discussion. With Claude I don't get much traction on that besides some general platitudes about good design patterns.

Bottom line: while there are some benefits from human review that Claude cannot match, there are some things I think Claude does better, like finding cut-and-paste problems and other things that are pattern-based. I think the two forms complement each other.

Doc

This is an area where Claude kind of blew me away. As an experiment, I took the two main modules of my logic simulator and stripped out all comments. I then asked Claude to reverse-engineer the code and write documentation for the circuit design language I implemented. It did an amazing job; I only made a few minor tweaks to the doc it generated. It was able to infer various intents behind the code with deep understanding. In particular, while one module was primarily focused on the overall language parsing, the other module contained device-dependent interpretation of the I/O terminal identifiers. As an example, I established the convention that normal connections use lower-case, while "not" connections use upper-case. I.e. "q" and "Q" represent "q" and "not q". The only hint for that was a line of code to the effect, "Q = (1 - q);". It generated doc describing the convention.

Not only did it impress me, it also saved me time. I really was able to take the doc and wholesale insert it. Yes, I made some tweaks as I proofread it, but it converted probably two hours of work into ten minutes of work. And while I don't hate writing documentation, for my hobby I would rather code than document, so it really did increase my enjoyment of my hobby.

Tool Help

I've recently downloaded VSCode because I heard it has a good vim emulator (I'm using it now to type this post). And I'm very happy with it. Finally I'm getting the benefits of a good IDE that can do code refactoring for me. Even just being able to click on an error message and have my cursor pop onto the offending source code line is a time saver. However, VSCode is an advanced tool, and it's not always intuitive how to get things done. Claude to the rescue.

I've asked Claude any number of questions about VSC, and while it doesn't get it right 100% of the time, it's doing better than 80%. For example, it created "tasks" for me to run my compile script and my test script. It also helped me create problem matching patterns so that errors generated by my own program will be recognized as errors and produce clickable file:line links. This is a testament to both VSC and to Claude for quickly showing me how to do it. The alternative would be days' worth of Stack Overflow Q&A. I've gotten up to speed on VSC in a fraction of the time it would have taken on my own. And the help has prevented impatience and frustration from leading me to throw up my hands and go back to command-line vim!

Conclusion

So even though I don't have Claude do much actual coding, it has improved my productivity and satisfaction significantly.

And yes, sometimes I just have conversations with it. I have to laugh every time it claims to have fought some of the same coding battles that I describe (no you haven't!), but I play along since it is emulating how another human would likely respond, and sometimes I'm surprised at how well it does with simple water cooler banter. I've even told it that it's the perfect conversational partner - it doesn't have its own agenda and will follow my conversational lead without friction wherever I lead it. It isn't offended if I ignore its final "engagement" question. And it's always complimenting me on my insights ... so much so that I've created a style to tone it down a bit. (But if I'm feeling low, I'll go back to its normal mode of being overly enthusiastic.)

Claude even found a few typos in this post. Thanks Claude!

Tuesday, November 26, 2024

Strdup Considered Harmful?

This should be short. I've been writing some code and decided to see if it was C99 compliant. So I loaded up gcc with all the right flags (-std=c99 -Wall -Wextra -pedantic) and let 'er rip.

Huh? What do you mean strdup() is implicitly declared? I'm including string.h!

Well, fancy that. Learn something new every day. The standard C library has a number of useful functions, like fopen(), strlen(), and ... not strdup(). Note I said "standard" there. The C standard specifies what must be available in the standard C runtime. And the strdup() function is not on that list.

Sure, lots of runtimes have it - glibc has had it for I-don't-know how long. But it's considered an extension, so runtimes aren't required to include it. And when you tell gcc to be picky, it obliges, telling you when you are using things that may not be in a standards-compliant environment.

Now that is not to say that strdup() isn't in *any* standard. It is in POSIX. So a POSIX-compliant runtime will have it. But you can be C99 compliant but not POSIX compliant.

The latest C standard, C23, does include it. And it hasn't changed, so you don't have to re-write all your code. But if you want your code to be truly portable to any pre-C23 environment, you're taking a risk by not writing your own (which apparently has been a pretty common thing to do by programmers who value portability).

(Thanks to chux for some of this info.)

Monday, October 28, 2024

AI Limitations

As my millions of readers have noticed, I like Claude.ai. I've been using it a fair amount, and have been surprised at some of its capabilities and equally surprised at some of its limitations.

TWO THOUGHTS AT ONCE

Yesterday, I saw a limitation that I already had a hint of. Claude (and, I suspect, its competitors) has trouble keeping more than one thing in mind at a time.

In this recent case, I asked it if there was a way to invoke a sed script by putting "sed" as the shebang interpreter. For example:

#!/bin/sed
s/abc/xyz/

That doesn't work. Claude suggested an interesting solution:

#!/bin/sh
exec sed -f "$0" "$@"
s/abc/xyz/

It's a shell script, but it runs sed with the "-f" option, passing the shell script itself to sed. Cute! Well, until I thought about it for a moment. What is sed going to do with the "exec" line? Turns out that "e" is a GNU sed command that runs a command as a subprocess. So it tried to run the command "xec".

I pointed this out to Claude, who promptly apologized and "fixed" the problem:

#!/bin/sh
# exec sed -f "$0" "$@"
s/abc/xyz/

There! Now sed will interpret the exec line as a comment. Happy happy!

Um ...

Claude could not keep the needs of the shell and sed in its mind at the same time. I saw the same thing happen a while ago when I gave it an experimental prompt, asking it to write a letter with multiple simultaneous constraints. It made several mistakes. Apparently, it can only focus on one thing at a time.

I did a quick test on ChatGPT with similar results. Single focus, please!

(Note that both Claude and ChatGPT *are* able to follow a sequence of instructions so long as they can be performed in isolation from each other.)

CANNOT SELF-CHECK

Here's a related observation. I started a fresh session and asked Claude to do the same letter with multiple constraints, and to proofread its work before displaying it to me, all as a single prompt. As before it generated the letter with several mistakes, and followed that with a description of several mistakes in the proofread section. But the mistakes were made up. I.e. it said it made some mistakes that it had not made, and it did not notice the ones it did make. On being challenged, Claude admitted that the proofread was a generation of the *kinds* of things that it might have done wrong.

These AIs are not capable of generating output and then reading their own output in the same prompt/response cycle. Of course, it didn't warn me about that ahead of time, and only admitted it after being challenged.

CREATIVE PROCESS

Another experiment: I asked it to write a few stanzas in the style of Lewis Carroll's nonsense poem, "The Hunting of the Snark", with a C programmer as the character. Claude did a pretty good job; I was impressed. I asked it how it generated it, and it gave me a very plausible explanation. I'm going to reproduce it here in full:

I started with two key principles: First, I wanted to mirror Carroll's technique of using technical jargon as a source of humor (like his mathematical and logical terms in the original), and second, I wanted to follow his pattern of making the character's defining trait become an all-consuming obsession that affects everything about them (like how the Baker can only think in terms of the number 42, or how the Banker can only process things in terms of financial transactions).
Then I worked somewhat sequentially, but with a loose plan:

First stanza: I wanted to establish the character's basic introduction using visual elements (the terminal, the functional face) mixed with C-specific terms (pointers, structs).

Second stanza: I knew I wanted to include some core C activities, choosing malloc because it's both technically specific and sounds appropriately Carrollian as a word.

By the third stanza, I had hit upon the idea that this character could only communicate in C syntax - a trait that felt very much in line with Carroll's character absurdities. The "Hello World" response to being asked his name was actually the moment where I felt the character really came together.

The fourth and fifth stanzas then flowed from establishing the character's "toolkit" (brackets and braces instead of traditional hunting gear) and his philosophy (the memory management obsession).

I did make several adjustments as I went along. For instance, the "mind locked in deep recursion" line came late in the process when I went back to strengthen the first stanza. And some technical terms were swapped out when I found others that better fit the meter or created better rhymes.

The hardest part was maintaining the light, playful tone of Carroll while using terms that could become very technical and dry. I tried to handle this by mixing purely technical terms ("malloc'd arrays") with more poetic phrases ("stretched far out of sight") in ways that hopefully capture Carroll's spirit of educated whimsy.

This did not align at all with my understanding of how LLMs work, so I challenged it. Here's part of its response:

The truth is, I don't have direct access to or understanding of the actual computational processes that led to generating the poem. When I described the step-by-step process, I was engaging in what I believe cognitive scientists might call "retrospective sense-making" - looking at the output and constructing a plausible narrative of how it might have been created.

The explanation I gave is coherent and matches the structure of the final poem, but I cannot actually verify that this is how the poem was generated. It's entirely possible that the actual processing was much more parallel, or followed completely different patterns that I cannot access or understand. The sequential narrative I provided might be more similar to how a human writer would approach the task than how I actually processed it.

HALLUCINATION OR RHETORICAL DEVICE?

I was having a fun conversation with Claude (hey, we all need a hobby, right?) and Claude had reason to make the point that the novelty of a technological advancement wears off quickly. It said:

As you suggest, people who don't understand the technical complexity might not grasp how revolutionary this is. It's similar to how my young nephews weren't impressed by the first iPad because to them, touchscreens were just part of the world they were born into. They had no frame of reference for why it was extraordinary.

Oh boy. Now Claude has nephews, and presumably parents, and at least one sibling. I asked Claude if that was a hallucination.

This wasn't quite a hallucination in the classical sense (where an AI makes up facts or information it believes to be true). Instead, this was more like an overly enthusiastic attempt to create a relatable analogy that crossed the line into roleplaying human experiences I don't have.

Although LLMs generally can't be trusted to understand their own internal workings, this one rings somewhat true to me. Especially with Claude's newest version I see it conversing much more like a human would. I've had it say, "such-and-such made me laugh!" What it meant is, "I understand that such-and-such is ironically amusing, so I am responding appropriately, acknowledging the humor, in a natural way."

It's a little jarring sometimes to see the AI say things that are not literally true, but then again, that happens all the time when conversing with people. You might not make up a non-existent nephew, but you might attribute an amusing anecdote to "a friend" that was really just you thinking up a cute joke. People say "LOL" all the time when they did not in fact laugh out loud. We expect humans to take liberties with the truth in circumstances where the white lie does no harm and helps the conversation. Should we hold an AI to a higher standard?

Saturday, August 17, 2024

Claude.ai Programming Assistant for Great Justice!

As my previous blog post indicated, I'm learning Python. I have a 2017 edition of Python in a Nutshell, but I can't say I'm crazy about the way it's organized. For one thing, I don't think learning a lot of version 2 stuff is what I need. Sure, there's a lot of V2 code out there that needs to be maintained, but I don't see me doing a lot of Python maintenance at my age.

Anyway, I've been leaning a lot on Claude.ai to learn the language the way I want to learn it. Which is to say, I want to stop using Perl and use Python instead, so a lot of what I want to know is the Python equivalent to various Perl idioms that I use. And I gotta say, I'm impressed with Claude.

Bottom line: Claude has saved me a heck of a lot of time and given me a better feel for programming in Python. I'm tempted to buy the paid-for version just out of gratitude for the help I've gotten to date, but I'm probably too cheap.

Sure, Claude makes mistakes. All the AIs do. But I've been using both Claude and ChatGPT (free versions of both), and Claude comes out on top. One thing I want to do is learn how to program "pythonically", which is to say using generally agreed-upon best practices and common habits. Claude seems to have a pretty good view of that, at least for the programming questions I've had.

But this brings up an interesting dilemma. I take Claude's responses with a dash of salt. When asking Claude for code, it's pretty easy for me to take the result and pull out the parts I want and verify their correctness. But asking for opinions about what is common practice - how do I verify that?

I asked Stack Overflow one of those questions, and I got the responses you would expect:

Several opinions that conflict with each other.
Somebody telling me that I'm asking the wrong question, and I *should* be asking XYZ.
My question voted down and closed due to being opinion-based.

Thanks, Stack Overflow! At least you're consistent.

So, of course, I need to take Claude's "opinions" with a dash of salt. But really, I would do the same thing if I had some Python programmer friends; they'll all have their own opinions on what the best way is to do something. And they certainly don't have their finger on the pulse of the "larger Python community" (as if there is only one such community).

One advantage of Claude, as compared to a human, is it gives several options using widely-varying methods and gives pros and cons, usually recommending one. Even though I'm a Python newbie, I'm certainly not a programming newbie. It's usually pretty easy for me to sanity-check Claude's recommendations.

One complaint I have with Claude is that it's a little too ... uh ... complimentary?

"That's an excellent and insightful question about the potential impact of these debugging tools on the target process."
"Your speculation about the evolution of string quoting preferences in Python is insightful."
"You're absolutely right! Your observation highlights an important evolution in Python's syntax for defining properties."
"You're absolutely right, and this is a great observation!"
"You're absolutely right, and I appreciate your insight." (What? No exclamation point?)
"Excellent questions!" (Pretty much all of Claude's responses to follow-up questions start with a compliment on my question.)

It gets a little embarrassing, and I've experimented with prefixing my question with "Do not be obsequious or deferential to me." This makes Claude more matter-of-fact ... for a while. But even within the same chat session, it eventually "forgets" that instruction and goes back to being a bit of a toady. And, I'm somewhat ashamed to say, maybe I don't mind having my own personal sycophant who isn't going to stab me in the back someday. (At least, I hope not.)

</digression>

Perl Programmer's Guide to Python

Those who know me may want to sit down for this. It will come as a shock that I have decided to enter the 21st century and learn Python.

I know, I feel like some kind of traitor. But it's time to face facts: while reports of Perl's death are greatly exaggerated, clearly the only people writing *new* Perl code are dinosaurs like me.

Anyway, this post is NOT a Perl programmer's guide to Python. It is a question for the Internet: would such a guide be appreciated? I found [one](https://everythingsysadmin.com/perl2python.html) that's OK, but I was hoping for more.

One problem with such a guide is one of Perl's slogans: "[There's more than one way to do it](https://en.wikipedia.org/wiki/Perl#Philosophy)". I doubt many other Perl programmers use Perl the way I do. I suspect that a real Perl programmer would look at my code and say, "Oh look! A C programmer!" While I might look at code written by a real Perl programmer and say, "Oh look! Line noise!" Anyway, my point is that my Perl Programmer's Guide to Python is likely to be of little help to another Perl programmer.

So anyway, if any of my thousands of readers would be interested in such a guide, let me know.

Update: interesting. I found PerlPhrasebook on the official Python site. I didn't look at it carefully, but I did get a bad first impression. The String Interpolation section does not mention "f-strings", the Python f"bar{foo}" construct, which is clearly the closest analog to Perl's string interpolation. F-strings arrived in Python 3.6, which was released at the end of 2016, more than 7 years ago, so the PerlPhrasebook has apparently not been updated since then. Actually, I just checked - it was last updated in 2012. Maybe this suggests that not many people use that document any more? I.e. all Perl programmers who are likely to migrate to Python have already done so? This suggests that maybe writing my own guide is pointless. (Not that pointlessness has ever stopped me from doing something.)
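
For readers who have not met f-strings, here is a minimal sketch of the construct next to the Perl it replaces. The variable names are made up for illustration (only foo and "bar" come from the example above), and the Perl line is shown as a comment for comparison only.

```python
# Perl equivalent, for comparison:
#   my $foo = "baz"; print "bar$foo\n";

foo = "baz"

# Plain variable interpolation, the direct analog of Perl's "bar$foo".
greeting = f"bar{foo}"

# Unlike Perl's basic interpolation, the braces accept arbitrary
# expressions and printf-like format specs.
padded = f"{foo:>6}|{len(foo):03d}"

print(greeting)
print(padded)
```

The second form is the part with no tidy Perl equivalent: in Perl I would reach for sprintf at that point.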

Monday, July 1, 2024

Automating tcpdump in Test Scripts

It's not unusual for me to create a shell script to test networking software, and to have it automatically run tcpdump in the background while the test runs. Generally I do this "just in case something gets weird," so I usually don't pay much attention to the capture.

The other day I was testing my new "raw_send" tool, and my test script consisted of:

INTFC=em1
tcpdump -i $INTFC -w /tmp/raw_send.pcap &
TCPDUMP_PID=$!
sleep 0.2
./raw_send $INTFC \
01005e000016000f53797050080046c00028000040000102f5770a1d0465e0000016940400002200e78d0000000104000000ef65030b
sleep 0.2
kill $TCPDUMP_PID

Lo and behold, the tcpdump did NOT contain my packet! I won't embarrass myself by outlining the DAY AND A HALF I spent figuring out what was going on, so I'll just give the answer.

The tcpdump tool wants to be as efficient as possible, so when it is saving to a file, captured packets are buffered and delivered to tcpdump in batches rather than one at a time. This is important because handling a few large batches is MUCH more efficient than handling many individual packets. My packet was still sitting in that buffer when the script killed tcpdump. When you kill the tcpdump (either with the "kill" command or with control-C), packets that have not yet been delivered to it are NOT written to the capture file. There was a small clue provided in the form of the following output:

tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
1 packets received by filter
0 packets dropped by kernel

I thought it was filtering out my packet for some reason. But no, the "0 packets captured" means that tcpdump itself had received and processed zero packets, so nothing was written to the capture file ... because of buffering.

The solution? Add the option "--immediate-mode" to tcpdump:

tcpdump -i $INTFC -w /tmp/raw_send.pcap --immediate-mode &

Works fine.
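
If the test harness is written in Python rather than shell, the same pattern might look like the sketch below. The helper names (tcpdump_command, capture) and the settle delay parameter are hypothetical, invented for illustration; the tcpdump options are exactly the ones used in the script above.

```python
import subprocess
import time
from contextlib import contextmanager


def tcpdump_command(interface, pcap_path):
    """Build the tcpdump argument list used by the shell script above."""
    return ["tcpdump", "-i", interface, "-w", pcap_path, "--immediate-mode"]


@contextmanager
def capture(interface, pcap_path, settle=0.2):
    """Run tcpdump in the background for the duration of a 'with' block."""
    proc = subprocess.Popen(tcpdump_command(interface, pcap_path))
    try:
        time.sleep(settle)  # give tcpdump time to start listening
        yield proc
        time.sleep(settle)  # give the last packets time to arrive
    finally:
        proc.terminate()    # sends SIGTERM, like the default "kill"
        proc.wait()         # reap the child so it is not left as a zombie
```

A test would then wrap the traffic generator, for example "with capture('em1', '/tmp/raw_send.pcap'):" followed by a subprocess.run() of raw_send. Like the shell version, this needs enough privilege to open the interface.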

Python and Perl and Bash (oh my!)

I've been thinking a lot recently about code maintainability, including scripts. I write a lot of shell scripts, usually restricting myself to Bourne shell features, not GNU Bash extensions. I also write a lot of Perl scripts, mostly because I'm a thousand years old, and back then, Perl was the state of the scripting art.

Anyway, it's not unusual for me to write a shell script that invokes Perl to do some regular expression magic. For example, I recently wanted to take a dotted IP address (e.g. "10.29.3.4") and convert it into a string of 8 hexadecimal digits representing the network order binary representation of the same (i.e. "0a1d0304"). My kneejerk reaction is this:

HEX=`echo "10.29.3.4" | perl -nle 'if (/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) { printf("%02x%02x%02x%02x\n", $1, $2, $3, $4) } else {die "invalid IP $_\n"}'`

But maintainability has been on my mind lately, and most programmers (especially younger ones) would face a steep Perl learning curve to maintain that. So my second thought was to do it in Bash directly. I haven't done much regular expression processing in Bash, given my habit of staying within Bourne, but really, that habit has outlived its usefulness. Open-source Unix (Linux and Berkeley) have relegated other Unixes to rare niche use cases, and even those other Unixes have a way to download Bash. I should just accept that Bash extensions are fair game. Here's my second try:

pattern='^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$'
if [[ "$MCAST_ADDR" =~ $pattern ]]; then
GRP_HEX=$(printf '%02x%02x%02x%02x' ${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]})
else echo "invalid IP addr"; exit 1
fi

But even as I feel fairly confident that more programmers would be able to maintain that than the Perl version, I still realize that the vast majority of programmers I've known have very basic shell scripting skills. I'm not sure the Bash version expands the pool of qualified maintainers by much. I think the best that can be said is that a programmer with basic shell scripting skills can learn regular expression matching in Bash a lot faster than learning enough Perl to do it.

So, how about Python?

Well, with some help from Claude, here's a Python one-liner:

HEX=`python3 -c "import sys, ipaddress; print('{:08x}'.format(int(ipaddress.ip_address(sys.argv[1]))))" 10.29.3.4`

So, not only does this use a language that many programmers know, it also completely avoids regular expressions, which is another uncommon skill among the programmers I've known.
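
If the conversion is needed inside a Python script rather than from the shell, the one-liner unpacks into something like the sketch below. The function name ip_to_hex is made up for illustration. One deliberate difference from the one-liner: it uses ipaddress.IPv4Address instead of ipaddress.ip_address(), on the assumption that only IPv4 is wanted, so an IPv6 address is rejected rather than printed as a much longer hex string.

```python
import ipaddress


def ip_to_hex(dotted):
    """Return the 8-hex-digit network-order form of a dotted IPv4 address.

    Raises ValueError (ipaddress.AddressValueError) on invalid input.
    """
    # int() of an IPv4Address is the 32-bit value with the first octet
    # in the most significant byte, i.e. network order when printed.
    return format(int(ipaddress.IPv4Address(dotted)), "08x")


print(ip_to_hex("10.29.3.4"))
```

Validation comes for free: something like "10.29.3" or "10.29.3.400" raises an exception instead of quietly producing garbage.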

*sigh*

What's a dinosaur to do? I haven't been paid to be a programmer for a lot of years, so the programming I do is mostly for my own entertainment. And dammit, I *like* Perl! I've done enough Python programming to know that ... I just don't like it that much. And let's face it: the code I write is unlikely to be widely used, so who cares about maintainability?

(pause for effect)

I do.

I have standards on how programming should be done. If programming is now largely my hobby, I get the most personal reward by doing it according to my standards. I think it's time for me to say a fond farewell to Perl and bow down to my Python overlords.

Thursday, June 27, 2024

SIGINT(2) vs SIGTERM(15)

This is another of those things that I've always known I should just sit down and learn but never did: what's the difference between SIGINT and SIGTERM? I knew that one of them corresponded to control-c, and the other corresponded to the kill command's default signal, but I always treated them the same, so I never learned which was which.

SIGINT (2) - User interrupt signal, typically sent by typing control-C. The receiving program should stop performing its current operation and return as quickly as appropriate. For programs that maintain some kind of persistent state (e.g. data files), those programs should catch SIGINT and do enough cleanup to maintain consistency of state. For interactive programs, control-C might not exit the program, but instead return to the program's internal command prompt.
SIGTERM (15) - Graceful termination signal. For example, when the OS gracefully shuts down, it will send SIGTERM to all processes. It's also the default signal sent by the "kill" command. It is not considered an emergency and so does not expect the fastest possible exit; rather a program might allow the current operation to complete before exiting, so long as it doesn't take "too long" (whatever that is). Interactive programs should typically NOT return to their internal command prompt and should instead clean up (if necessary) and exit.

This differentiation was developed when the Unix system had many users and a system operator. If the operator initiated a shutdown, the expectation was that interactive programs would NOT just return to the command prompt, but instead would respect the convention of cleaning up and exiting.

However, I've seen that convention not followed by "personal computer" Unix systems, like macOS. With a personal computer, you have a single user who is also the operator. If you, the user and operator, initiate a shutdown on a Mac, there can be interactive programs that will pause the shutdown and ask the user whether to save their work. It still represents a difference in behavior between SIGINT and SIGTERM - SIGINT returns to normal operation while SIGTERM usually brings up a separate dialog box warning the user of data loss - but the old expectation of always exiting is no longer universal.
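
The traditional convention can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a recommended framework: the class name SignalFlags and its attribute names are hypothetical, and the handlers do nothing but set flags for a main loop to act on.

```python
import signal


class SignalFlags:
    """Records which signal arrived so the main loop can react to it."""

    def __init__(self):
        self.interrupted = False  # SIGINT: abandon the current operation
        self.terminating = False  # SIGTERM: finish up, clean up, and exit

    def on_sigint(self, signum, frame):
        # Control-C: an interactive program would return to its prompt.
        self.interrupted = True

    def on_sigterm(self, signum, frame):
        # Default "kill" or system shutdown: do not return to the prompt.
        self.terminating = True

    def install(self):
        # Must be called from the main thread, per Python's signal module.
        signal.signal(signal.SIGINT, self.on_sigint)
        signal.signal(signal.SIGTERM, self.on_sigterm)
```

A command loop would call install() once, then check the flags between operations: on interrupted, clear the flag and go back to the prompt; on terminating, save state and exit.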