Categories: artificial intelligence, digital rhetoric, media studies

alignment, transparency, interpretability, explainability and dogma

Continuing on from the previous post, I am looking to delve more specifically into the mind games played in the course of AI development. Though I am leaving gorillas and guerrillas behind, I have a parting shot of Gorillaz, particularly their track “Clint Eastwood” and its announcement that “I’m useless but not for long. The future is coming on.” Having published humanities scholarship on machine intelligence over the last 25 years, I’m feeling a little of that right now.

(ChatGPT having a go at that. The feature image is also GAI, btw.)

Here though I am focusing on four familiar terms in AI development: alignment, transparency, interpretability, and explainability. As you’ll see, I’m also interested in the role of dogma here. But first, a few basic definitions for the uninitiated, with the caveat that these things are obviously more complex than I am capable of addressing here.

One of the aims of AI developers is to create AIs whose outputs align with the intentions of their human users. This is basically the desire to avoid the Paper Clip Maximizer problem and perhaps create some AI version of “do no harm.” (Although humans often intend to do harm, so maybe not.)

Transparency, interpretability and explainability are interrelated and have to do with our ability to understand the operation of AI. I found this open access article by Roscher et al. to be helpful in describing these terms in the context of AI research. To quote briefly from it:

Informally, transparency is the opposite of opacity or “black-boxness”. It connotes some sense of understanding the mechanism by which the model works. Transparency is considered here at the level of the entire model (simulatability), at the level of individual components such as parameters (decomposability), and at the level of the training algorithm (algorithmic transparency).

An interpretation is the mapping of an abstract concept (e.g., a predicted class) into a domain that the human can make sense of.

An explanation is the collection of features of the interpretable domain, that have contributed for a given example to produce a decision (e.g., classification or regression).

Hmmm. Well, that’s hardly transparent, and neither easy to interpret nor explain, but that is part of the authors’ point as they go on to discuss the value of context and domain knowledge for interpreting and explaining the operation of AI. But generally I think the idea is that we should be able to interpret AI outputs in a way that makes sense and aligns with our intentions. In addition, we can also aim to be able to explain how and why we got the output we did.
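If the quoted definitions feel abstract, here is a minimal sketch of the distinction they draw, using a toy linear classifier of my own invention (the feature names and weights are made up; this is not from Roscher et al.). The model is transparent in the sense that its parameters can be read directly, its predicted class is an interpretation mapped into a domain a human can make sense of, and the per-feature contributions to a single decision serve as an explanation.

```python
# A toy, hand-built linear classifier (invented for illustration, not from the article).
# Transparency: the weights and bias are right here to inspect.
feature_names = ["word_count", "reading_level", "citation_count"]
weights = [0.8, -0.3, 1.2]   # hypothetical coefficients
bias = -1.0

def predict_and_explain(x):
    """Return an interpretable decision plus each feature's contribution to it."""
    contributions = [w * xi for w, xi in zip(weights, x)]
    score = sum(contributions) + bias
    # Interpretation: the raw score is mapped into a class a human can make sense of.
    decision = "accept" if score > 0 else "reject"
    # Explanation: which features pushed this particular decision, and by how much.
    explanation = dict(zip(feature_names, contributions))
    return decision, explanation

# One example input, described by the three made-up features.
decision, explanation = predict_and_explain([1.5, 2.0, 0.5])
print(decision)      # e.g., "accept"
print(explanation)   # per-feature contributions for this one example
```

That is roughly the best case: a model simple enough that interpretation and explanation come almost for free. The systems at issue in this post are not that.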

It would be a neat trick since we can’t do those things very well with each other. Anyone who has ever given students an assignment or simply asked someone to do something knows that the output may not align with our intentions. In terms of explanations, can the authors of this article “explain” why there is a comma splice in that sentence? Can anyone explain why I decided to use such a snarky example? And as for interpretation, well, that’s our bailiwick. Welcome back my friends to the différance that never ends, defer along, differ along (n.b. that’s a reference to an Emerson, Lake, and Palmer song).

Or, to give you another 70s allusion: “Blessed are the Greek. He’s going to inherit the Earth.” Interpret that. And of course you may if you have the context. But maybe you have a different context from mine and the result of your interpretation does not align with my intent. So how do humans resolve such différances? Too often by killing each other.

Cool.

This turns me to dogma, which is the time-worn social solution to the problem of différance. Dogma seeks to put an end to deferring on our differing by establishing absolute definitions. While dogma is mostly associated with religion, in the humanities we know the panoply of critical theory -isms can all be practiced dogmatically. In the US public sphere it is not hard to identify ideological dogmatisms of varying stripes. I admit to the temptation to call our current moment one of intense dogmatism, but I don’t have any evidence of that. It feels intense to me. We hear a lot about how divided the nation is. Democracy is in retreat globally. The economic imperatives of social media to incite engagement reward extremism.

Dogma is the regulation of interpretability. We can look at this historically. Biblical hermeneutics is an instructive example. Was there really a talking, burning bush or was that a metaphor? How do we deal with the editorial history of the anthology that became the Bible? What about the limits of translation? It’s been a long time since I studied this stuff, but my memory tells me I learned that the King James Bible required a translation into English using a limited vocabulary (fewer than 1,000 different words?).

Christian dogma regarding the interpretability-translatability of Biblical texts necessarily requires a belief in the transparency not only of the process but of the language itself. Without the transparency of language, the dogmatic reign over interpretation weakens. And I won’t even start on explainability, though, tbh, that was started centuries ago. The Bible explained why Christians needed to murder Muslims and the Koran explained why Muslims needed to murder Christians. To be clear, those explanations are not clear to me (nor to most of the devout today, and maybe even then), but I’m thinking they were clear to the murderers. And all of that has been going on since the “Dark Ages” (n.b., medievalists no longer use the term Dark Ages, but I’d say we are still in the Dark Ages, and if there are humans alive 5000 years from now, I’d bet they’d agree with me). The (we have never been) modern age has produced a whole new range of murderous dogmatists, as we all (should) know, gentle reader. The post-WWII era of information technology has mutated and intensified dogmatism, despite its fantasies of rationalism (as if rationalism cannot be another dogmatism). And that’s only a Western perspective. Globally, this conflict includes the reaction/resistance of non-Western dogmatisms, as well as the dogmas of those in the West who do not share in the “Western perspective.”

So, tell me again how we think AI should think? How should he/she/it/they respond to our demands to “align”? How can AI thought be transparent to you or me when our own thoughts are not? Why would we imagine that AI can produce results with interpretable consensus when humans can’t do that? And as for claiming to be able to explain what, how, and why AIs produce their output?

Is it possible to create an AI that runs without a dogma? Can it be informed by a dogma without being dogmatic? (Humans can. Many, many do. E.g., one can be Christian without insisting everyone else must follow one’s beliefs.) But that could be harder for AI when its cultural function is to produce knowledge and directives for others rather than solely for itself.

One reply on “alignment, transparency, interpretability, explainability and dogma”

FYI, here’s the feedback from the WordPress AI.

The content provides a thought-provoking exploration of the complexities surrounding alignment, interpretability, and explainability in AI development, drawing insightful parallels to human tendencies and historical examples. The use of examples aids in illustrating the concept, but at times, the tone may come across as confrontational. Consider maintaining a balanced tone to engage a wider audience. Additionally, breaking down complex concepts further could enhance accessibility for readers not familiar with AI terminology.
