It is July 2024, and it is still surprising how erratic LLMs can be when they are asked to help with very small coding jobs. I work on many projects on my Mac. For a while, I have been using my own simple directory stack implementation. Remembering paths is something the computer can do for me.
It works well and has two parts: zsh functions loaded via .zshrc and a Python program that does the actual work, except, naturally, for the cd itself and the change of the prompt, which have to happen in the shell.
I thought it would be nice if my ‘cd’ variation could check whether a Python environment, i.e. a */bin/activate script, exists in the directory I changed into. If there is exactly one, it should source it. If there is none, it should do nothing, and if there are several, it should list them so that I can pick one.
Simple enough.
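A minimal sketch of that check, assuming it lives on the Python side of the tool. The function names find_activate_scripts and decide are made up for illustration and are not taken from the actual implementation. The sourcing itself is left out on purpose, because it has to happen in the calling zsh function.

```python
from pathlib import Path


def find_activate_scripts(directory):
    """Return the sorted paths matching <directory>/*/bin/activate."""
    base = Path(directory)
    # Only look one level down, mirroring the */bin/activate pattern.
    return sorted(p for p in base.glob("*/bin/activate") if p.is_file())


def decide(directory):
    """Return (action, scripts), where action is 'none', 'source' or 'choose'.

    'none'   -> no environment found, the shell wrapper does nothing
    'source' -> exactly one found, the shell wrapper can source scripts[0]
    'choose' -> several found, the shell wrapper lists them for the user
    """
    scripts = find_activate_scripts(directory)
    if not scripts:
        return "none", scripts
    if len(scripts) == 1:
        return "source", scripts
    return "choose", scripts
```

One difference to keep in mind: pathlib's `*` also matches dot-directories such as .venv, while the same pattern in zsh skips them unless the GLOB_DOTS option is set.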
Parts of this require zsh coding, which is not something I do a lot. Since Sonnet 3.5 has a usage limit even in the paid version, I tend to try my paid gpt4o first.
For this simple thing, I should not have. Today gpt4o was stunningly stupid. It managed the zsh syntax well enough, but then completely failed. For a while, I was stuck in that dreadful loop where one hopes the next version will finally work. I still abort those loops of idiocy way too late.
Claude 3.5 got it right. In my frustration, I had also introduced a bug, a typo, on my end. Both gpt4o and Claude would have pointed it out easily if they had seen that part of the code. Claude stood out because its debug hints let me see what I had done wrong. That went beyond what I currently expect from these tools.
Speaking of expectations: I am amazed at how dumb LLMs can still be. Today gpt4o was utterly stupid. Not sure why. Is it not familiar with zsh? Did the system prompt that got assigned to me, or to my region, suddenly change? Who knows.
It must be hard to make a living that depends on LLMs meeting expectations. They are really awesome, but can fall off a cliff at any point. That is pretty much the opposite of computer work in general.
I expect that people will develop all sorts of cargo cults in their work with these tools.