Friday, June 28, 2024

Bullshit All The Way Down

The bubble currently inflating the tech world is "Artificial Intelligence", or "AI". What the software actually is: statistical Large Language Models, aka "autocomplete on steroids". They create strings of words, statistically, based on the trillions of words they have stolen - er, used - to train themselves on how to pick the next word.
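
To make "autocomplete on steroids" concrete, here's a toy sketch - a bigram counter over a tiny made-up "corpus", nothing remotely like a real billion-parameter LLM, but the same basic move: look at what came before, and pick a next word in proportion to how often it followed during training.

```python
# Toy "autocomplete": count which word follows which in a tiny corpus, then
# generate text by repeatedly sampling a statistically likely next word.
# A stand-in for what an LLM does with billions of parameters, not a real one.
from collections import Counter, defaultdict
import random

corpus = ("the cat sat on the mat the cat ate the fish "
          "and the dog sat on the mat").split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Pick the next word in proportion to how often it followed `prev`."""
    words, weights = zip(*following[prev].items())
    return random.choices(words, weights=weights)[0]

word = "the"
output = [word]
for _ in range(8):
    word = next_word(word)
    output.append(word)
print(" ".join(output))  # plausible-looking word sequence, no meaning attached
```

Scale that up by a few trillion words of training text and a few hundred billion parameters and you get today's chatbots: much more fluent, same basic trick.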

In my 40-year career in computing, 1972-2012, I went through at least 3 waves of "AI". Wave 2, maybe, was Neural Nets, which underlie LLMs - those came out in, what, the 90s? CYC, which was going to be the "common sense" module, kind of disappeared (into the DoD?) a while ago??? No, here it is: automating claims denials by managing hospitals?!?!?

LOL, I just found, on my Lexington home office bookshelves, BrainMaker Software, Copyright 1988, 1989, 1990, Neural Network Simulation Software. Complete with a floppy disk to install!

I am offended by the current use of the term "AI". I understand, this is The Thing now, so, no one cares.

To me, AI is something much bigger than LLMs. I have always felt that the overall strategy was, find new tools, keep building a toolbox, let the tools all talk to each other, and keep your fingers crossed for emergent behavior. So saying LLMs are "AI" is totally bogus. LLMs should be viewed as just another tool in the AI toolbox. Maybe the coolest tool ever, but, still, not "AI".

Oops, "emergent behavior" maybe means "stuff starts happening that we like but have no idea why it is happening" - oh boy, another indecipherable oracle = just like LLMs!

Hmmm, this is getting serious. How many "waves" of "AI" have we had? Early chatbots (ELIZA), what, the 60s? LISP in the 60s & 70s? Early 80s, Rule-Based Expert Systems. Then inference engines? Neural nets starting, as per "BrainMaker", late 80s.

Then an AI desert? It seems like there were several of those. CYC is in there somewhere, coming out of the inference engine / LISP machine world?

How would I describe/define an LLM? I would describe it as a Bullshit Generator. Its job is to construct an answer that statistically matches the many corpora it trained on. Does that answer make sense? Is it a correct answer? Statistically, yes. Logically, no one has any idea - LLMs are completely clueless. [But they are incredibly good bullshit generators.]

This is where they should be integrated with inference engines & rule-based expert systems. But that's not what is happening.
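
Just to sketch the shape of that kind of integration (everything here is hypothetical - the get_llm_answer stub, the one-lambda "rule base"): let the LLM generate its statistically plausible answer, then run it past a rule-based checker before trusting it.

```python
# Hypothetical sketch: run the LLM's statistically plausible answer past a
# rule-based sanity check before trusting it. `get_llm_answer` is a stub and
# the "rule base" is a toy - the point is only the shape of the pipeline.

def get_llm_answer(question: str) -> str:
    """Stand-in for a real LLM call; returns a fluent but wrong answer."""
    return "Water boils at 150 degrees Celsius at sea level."

# Each rule: (check that must hold for the answer, explanation if it fails)
RULES = [
    (lambda answer: not ("boils" in answer and "150 degrees" in answer),
     "Water boils at 100 degrees Celsius at sea level, not 150."),
]

def checked_answer(question: str) -> str:
    answer = get_llm_answer(question)
    for holds, why in RULES:
        if not holds(answer):
            return f"REJECTED by rule base: {why} (LLM said: {answer!r})"
    return answer

print(checked_answer("At what temperature does water boil at sea level?"))
```

Real inference engines and expert systems are obviously far richer than one lambda, but the division of labor is the point: the statistical generator proposes, the logical layer disposes.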

Here is a recent (June 12) Washington Post article, describing many of the shortfalls of LLMs:

Is that chatbot smarter than a 4-year-old? Experts put it to the test.
These LLMs have been training themselves for years. For several years I have followed Janelle Shane's blog:
AI Weirdness
She's been playing with all the big LLM systems for 2-4 years? Occasionally they produce some OK stuff, after much tweaking. But the vast majority of what is generated is complete crap. It's laughable! (Note, Janelle subtitles her name with "A. I. HUMORIST".) But every corporation in the world is scrambling to incorporate these BS generators into their infrastructure ASAP!

I was going to go to one of her posts and pull an example, but then I realized, if I did that, I could not post the following:

NO LLM OR OTHER ARTIFICIAL INTELLIGENCE HAS BEEN USED IN THE CREATION OF THIS BLOG AND ITS POSTS.
I read somewhere - Doctorow maybe? - that once you start training LLMs on the output of LLMs, the bullshit factor is all there is. It death spirals immediately.
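
Here's a back-of-the-envelope way to see that spiral without any actual LLM: a "model" that just memorizes word frequencies, retrained each generation only on text sampled from the previous generation's model. Rare words tend to drop out, and once a word is gone it can never come back - the variety only shrinks.

```python
# Toy death spiral: retrain a frequency-counting "model" each generation on
# text sampled from the previous generation's model. Once a word's count hits
# zero it can never reappear, so diversity only goes one way.
import random
from collections import Counter

random.seed(42)
model = Counter({"the": 500, "cat": 200, "dog": 150,
                 "axolotl": 5, "quasar": 3})   # the original "human" corpus

for generation in range(10):
    words = list(model)
    weights = [model[w] for w in words]
    sample = random.choices(words, weights=weights, k=300)  # synthetic "text"
    model = Counter(sample)                                  # retrain on it
    survivors = [w for w in ("axolotl", "quasar") if w in model]
    print(f"gen {generation}: vocabulary={len(model)}, rare words left={survivors}")
```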

So that finally brings us to the title of this post. Tim O'Reilly is promoting, via ORA, "AI success stories". That is appropriate for him; he has in the past heralded the start of eras in computing. And, yes, there are many, many success stories for LLM use - playing devil's advocate, generating more possibilities. It can clearly be an incredible tool to help creatives be creative. [Doctorow, who is fairly negative on "AI", is appearing soon somewhere speaking with O'Reilly - that should be interesting. Oops, already happened, now available as an online course of videos?]

But such uses are of course not the concern. The concern is: content generation mills getting rid of all their human editors and experts & replacing them with these bullshit generators.

That is clearly the road to hell.

They all say they're not going to do it, then, oops, there they are, busted. Tsk, tsk, tsk.

The other issue here, per Doctorow: is this centaurs (humans guiding machines), or reverse centaurs (machines guiding humans - very bad, what horrible jobs to have)?

Back to the title: I do believe we are in danger of moving into a world where it is, indeed, bullshit all the way down. This borrows from the concept, attributed to Hindu cosmology, of the world being supported by 4 elephants standing on the back of a giant turtle, who is standing on the back of a gianter turtle, after which it's "turtles all the way down". Here's the Wikipedia article.

An odd rumination: I wonder to what degree the orange turd former bullshitter-in-chief's term as president greased the skids for the coming bullshit apocalypse?

I think to live in this world, we need the following:

  1. Every piece of media created should be stamped "AI used in the generation of this content." or "No AI was used in the generation of this content." They did it for GMOs, and, IMO, AI-generated bullshit is much more dangerous than GMOs (which aren't dangerous, per the USDA, etc.)

    YouTube recently started allowing you to specify whether or not your video is fake.

  2. We need places where AI content is banned - like Wikipedia, this blog, most blogs. [Already I get query responses and find myself trying to determine if I am looking at LLM word salad or not.] These places would be like wildlife sanctuaries, where we protect human thinking from AI bullshit.

    Of course, humans are capable of generating their own bullshit - but not at an automated scale that is impossible to keep up with. It is indeed a labor of Hercules shoveling the BS of the orange turd former bullshitter-in-chief, but it is doable. Against "AI"s, no way.

For 2-3 years, ending January 1, 2023, this blog was getting ~500 hits/day from PCs and Macs in France - so somebody's training system was using me as a corpus. After Jan 1, 2023, views went down to a handful/week - about right. That lull ended around the start of 2024: this blog now gets ~50 hits/day from Hong Kong? And from Singapore before that. So apparently it is still being used as a corpus.

Meanwhile ...

My daughter-in-law has gotten a Master's in Data Science (yay!!!) & is leading a team doing LLM work for 1 of the biggies. It really does seem like the biggest craze I can remember since ... forever? A corporate employee who is not catching this ship is clearly severely limiting their career prospects.

LOL, maybe the bottom line here is, I will remain as a 100% human source of information - how quaint - a pre-AI reference human! Of course, this gives me an excuse to not learn how to use the LLMs. I'm too old for long learning curves on anything new at this point.

Plus, I think I understand the shape of this technology, & I don't think it would be that interesting to me. The only thing I think would be interesting is, figuring out how to communicate with "CommonSense™".

Here's a late addition: AI Snake Oil. Their blog posts have been interesting.

[Updated 2024-11-26]

This post today from the AWS (Amazon Web Services) team gave me LOL after LOL. I'm going to do some bolding of the best of the best.

Hallucinations in large language models (LLMs) refer to the phenomenon where the LLM generates an output that is plausible but factually incorrect or made-up. This can occur when the model’s training data lacks the necessary information or when the model attempts to generate coherent responses by making logical inferences beyond its actual knowledge. Hallucinations arise because of the inherent limitations of the language modeling approach, which aims to produce fluent and contextually appropriate text without necessarily ensuring factual accuracy.

Remediating hallucinations is crucial for production applications that use LLMs, particularly in domains where incorrect information can have serious consequences, such as healthcare, finance, or legal applications. Unchecked hallucinations can undermine the reliability and trustworthiness of the system, leading to potential harm or legal liabilities. Strategies to mitigate hallucinations can include rigorous fact-checking mechanisms, integrating external knowledge sources using Retrieval Augmented Generation (RAG), applying confidence thresholds, and implementing human oversight or verification processes for critical outputs.

Oh boy, so on top of LLM Bullshit Generators, we can add RAG Bullshit Generators! &, even worse, "human oversight", reverse centaurs! I cannot imagine a worse job than trying to fact-check machine-generated bullshit.
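
For reference, the RAG pattern they're describing looks roughly like this - a toy sketch with a naive keyword-overlap retriever, a made-up document list, and a stubbed-out call_llm, nobody's real API, just to show where the "external knowledge" gets bolted on.

```python
# Hypothetical RAG sketch: retrieve passages relevant to the question, then
# hand them to the LLM as context so the answer is anchored to *something*.
# The retriever is naive word overlap and `call_llm` is a stub.

DOCUMENTS = [
    "The refund window for Acme widgets is 30 days from delivery.",
    "Acme support is available Monday through Friday, 9am to 5pm.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"(model output, conditioned on: {prompt[:70]}...)"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(rag_answer("How long is the refund window for widgets?"))
```

Of course, if the retrieved documents are themselves machine-generated bullshit, you are right back where you started.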

Meanwhile, every corporation in the world is being pushed to implement this crap. Generate bullshit, or be left behind?!?!? I think I'll choose, be left behind. Sigh.