
import { EntryBody } from "../components/Entry/EntryBody";

export const the_antimony_of_language_models: { [id: string]: any } = {

    id: "the_antimony_of_language_models",
    title: <>☙ THE ANTINOMY OF LANGUAGE MODELS ❧</>,
    date: "October 2024",

    Body: (
        <EntryBody
        paragraphs={[

<div className="font-mono">


</div>,

<div className="font-mono">
☙ THESIS ❧ Language models play language games

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
A language game is the whole consisting of language and the actions into which it is woven (W). Examples of language games are learning new words, guessing riddles, ordering a coffee, speculating, and many others. Under what circumstances can we say that a language model plays a language game?

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
A common way (2024) of interacting with a language model is through interfaces where a (human) interlocutor types something and a language model generates textual responses, the exchange proceeding in conversational fashion. Humans _talk_ to these interfaces much as they would if they were chatting with other humans. As these interfaces are framed within an AI-Assistant product, the human usually seeks assistance and, judging by the success of these apps, often _gets_ the assistance they were seeking. Seeking-assistance is already a well-defined language game, where language follows a specific distribution (asking questions, making follow-ups, sometimes politeness, etc.), and in human-to-human interactions it is also commonly played through text (asking a colleague for help on Slack).

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
If we further focus on the _coding-assistance_ language game, we find a very concrete distribution of language that one can observe, for example, in Stack Overflow threads: specific follow-up questions and a recognizable structure in the responses, usually involving chunks of code. Here it is also apparent what actions are woven into language, as responses have the direct effect of changes in a codebase, which can be executed to find further errors in need of clarification.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
All of these linguistic characteristics are found in language-model-powered interfaces when used for _coding-assistance_, and even the actions can be observed, as some of these interfaces include virtual environments where the code the language model generates can be executed and its outputs examined for further feedback. _I_ have played the _coding-assistance_ language game on LM-powered interfaces using a language distribution similar to the one _I_ use when playing the _coding-assistance_ language game with human interlocutors.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
It seems clear that language models play the _coding-assistance_ language game. And, more generally, _seeking-assistance_ language games. Others too: roleplaying or academic distributions... And, as the interests of the corporations that design LM-powered systems expand, one can be sure they will be conversationally involved in language games woven into many many many (human) situations.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
⠀

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
☙ ANTITHESIS ❧ Language models do not play language games

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Asserting is the fundamental speech act (B). Questions are recognizable in relation to their possible answers. Orders and commands describe what is and is not appropriate. It is assertions that lie at the heart of the language game of giving and asking for reasons.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Asserting requires discursive respect (A). If interlocutors engage in a language game, they necessarily enter into reciprocal cooperation (respect), as otherwise there would be no language game: there would be no shared practice on which to build the rules that govern the game itself. Even while disagreeing in discussion, so much is common ground (W?). And an assertion exists only within a language game, as otherwise its inferential content, in terms of what consequences the assertion should have, cannot be derived (how could a consequence be derived if the interlocutors do not share a basis for such a derivation?).

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Language models are not capable of discursive respect. They are not capable because they are token-predicting systems. If more arguments are needed: humans do not have a reciprocal attitude towards language models comparable to respect; we close the tab. Thus, language models are not capable of assertion, and not only do not but cannot play language games.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
The (lack of) discursive respect observed hints at a break in a symmetry between players in a language game, a symmetry that can somehow be strongly _felt_ but is hard to point at precisely.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
We can describe language games through scorekeeping (L). Linguistic practice then depends on an (abstract) score, that changes in a (somewhat) rule-governed way as conversation evolves. If this score can be understood as deontic statuses (B), then the significance of a speech act consists in the way it interacts with the deontic score: which commitments, entitlements and incompatibilities interlocutors have, attribute or undertake. This picture gives a way to make explicit the implicit norms that govern conversation: the _score function_, which maps conversation-stages to scores.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
One fundamental difference between a game such as baseball and the game of giving and asking for reasons is the perspectival nature of the scorekeeping involved in the latter (B). While in baseball there is just one single official score, in linguistic scorekeeping, scores are kept _for_ and _by_ each interlocutor. This doubly perspectival dynamic is related to discursive respect. Keeping track of the scores of different interlocutors is reciprocal, cooperative and symmetrical, as those interlocutors are expected (attributed) to be keeping track of our score too.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
SOTA language models (2024) are decoder-only autoregressive transformers. Text is processed token by token: at each position the model uses information computed at the previous tokens (their hidden states) and outputs a probability distribution over the vocabulary, which contains roughly 50,000 tokens or more. Then, to generate the next token, a sampling algorithm is used (a way of choosing which token comes next according to its probability). When the language model is not generating text, but rather processing text that is the context of a generation (for example text by an interlocutor), the model follows the same mechanism. It computes all the hidden states for the current token and the probability distribution; the only difference is that, instead of being sampled from that distribution, the next token is decided by the interlocutor.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
If we were to prompt a conversational-assistant-model with a conversation but finishing the prompt with 'User:' instead of 'Assistant:', a plausible continuation would still be generated.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Thus, the difference between a machine-generated token and an interlocutor's token is just the sampling, a step with which the rest of the parameters of the language model have no interaction (except through the token chosen as next). Tokens written by interlocutors are still processed by the model as if they were machine-generated, but instead of sampling from the token distribution, the next token is the one the interlocutor used.

</div>,
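
<div className="font-mono">


</div>,

<div className="font-mono">
The point about sampling can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not a real transformer: toy_logits is a hypothetical stand-in for the forward pass, and VOCAB, step and forced_token are names invented here. What it tries to show is only that the distribution is computed in the same way whether the next token is sampled or supplied by the interlocutor.

```python
import math
import random

# Tiny hypothetical vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["User:", "Assistant:", "hello", "thanks", "."]

def toy_logits(context):
    # Hypothetical stand-in for a transformer forward pass: any deterministic
    # function of the context is enough for this illustration.
    seed = sum(len(tok) * (i + 1) for i, tok in enumerate(context))
    return [math.sin(seed + 3 * j) for j in range(len(VOCAB))]

def next_token_distribution(context):
    # Softmax over the toy logits.
    logits = toy_logits(context)
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def step(context, forced_token=None, rng=random):
    # The distribution is computed identically in both cases.
    probs = next_token_distribution(context)
    if forced_token is None:
        # Machine-generated token: sampled from the distribution.
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
    else:
        # Interlocutor token: the distribution is ignored for the choice.
        token = forced_token
    return context + [token], probs
```

Calling step(context) and step(context, forced_token="thanks") on the same context returns the same distribution; only the token appended to the context differs.
</div>,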

<div className="font-mono">


</div>,

<div className="font-mono">
The LM keeps something akin to a score, as the tokens it generates often 'make sense' and follow human linguistic practice. But even if a score is kept _by_ the language model, it is a global score of the conversation and can in no way be a score _for_ the interlocutors. What makes a conversation a conversation is missing; what remains is rather a _blob of tokens_. (This could be subverted in the near future.)

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Much as in a baseball game, it is not possible to keep track of a score _for_ different interlocutors, but just a global conversational score that, of course, may change when tokens by a certain interlocutor are processed. The impossibility of keeping scores in a perspectival manner certainly seems incompatible with playing a language game.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
⠀

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
☙ SYNTHESIS ❧ Language models are (approximators of) the score function

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Picture scorekeeping in a language game where the scores are the deontic statuses of sentences. The significance of an assertion, according to a scorekeeper, maps the (social deontic) score characterizing the conversation before the utterance to the set of conversational scores for the conversational stage that results from the assertion (B).

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
Let's make this tractable. Take a conversation C, an utterance u, and the set S of all possible sentences s∈S. The deontic status of a sentence s is either commitment (α), entitlement (ω) or incompatibility (κ). A score X is a mapping from all sentences to their deontic status, X:S→&#123;α,ω,κ&#125;. A score function Δ then takes a conversation and its current score and yields a new score given an utterance u: Δ(C,X,u)=X'. It may make sense here to treat the conversation as including the scores (more like a context in a wide sense), so that we have Δ(u;C)=X': the score function returns a score after utterance u in the context of conversation C.

</div>,
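
<div className="font-mono">


</div>,

<div className="font-mono">
As a minimal sketch of these definitions, a score X can be written as a Python dictionary from sentences to statuses, and Δ as a function returning a new score. The update rule used below is purely an assumption made for illustration (it is not taken from (B) or (L)): the uttered sentence becomes a commitment, and the sentences listed as incompatible with it become incompatibilities. The names delta, rules and incompatible_with are hypothetical.

```python
COMMITMENT, ENTITLEMENT, INCOMPATIBILITY = "α", "ω", "κ"

def delta(conversation, score, utterance, incompatible_with):
    # Toy update rule (an assumption, not a definition from the literature):
    # the uttered sentence becomes a commitment, and every sentence listed
    # as incompatible with it becomes an incompatibility.
    new_score = dict(score)  # copy, so the previous score X is left untouched
    new_score[utterance] = COMMITMENT
    for sentence in incompatible_with.get(utterance, []):
        new_score[sentence] = INCOMPATIBILITY
    return conversation + [utterance], new_score

# A finite toy set S and an initial score X where everything is an entitlement.
S = ["it is raining", "the street is wet", "the sky is clear"]
X = dict.fromkeys(S, ENTITLEMENT)

# Hypothetical incompatibility relation between sentences.
rules = dict([("it is raining", ["the sky is clear"])])

C, X_prime = delta([], X, "it is raining", rules)
```

Here X_prime plays the role of X' in Δ(C,X,u)=X', and returning the extended conversation alongside the score mirrors the reading in which the context carries the scores with it.
</div>,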

<div className="font-mono">


</div>,

<div className="font-mono">
We can tentatively link deontic statuses with probability. For example, and qualitatively: commitments are highly probable sentences, entitlements are medium-low probability sentences and incompatibilities are quasi-zero probability sentences (inspired by likelihoods in (H)). So, a score X is now a probability distribution over sentences S, in which we can qualitatively establish a partition that corresponds with the deontic statuses. If we take this set of sentences S to be a finite (but arbitrarily big) set, then a score function is defined by the deontic statuses it assigns to the sentences in S.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
We can approximate the probability of a sentence in a context by force decoding context+sentence with a language model. From the model we can get a score for each sentence. Then, we can get the scores for all sentences in S and build a probability distribution from there. Given a set of sentences S, we can use a language model to partition it in a way that yields a score in terms of deontic statuses (a qualitative partition using probabilities bridges the gap).

</div>,
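
<div className="font-mono">


</div>,

<div className="font-mono">
The procedure above can be sketched as follows. Everything here is a toy under stated assumptions: token_logprob is a hypothetical stand-in for a language model (it simply treats tokens already seen in the context as more likely), tokens are whitespace-separated words, and the thresholds low and high that separate the three statuses are arbitrary illustration values, not something the argument fixes.

```python
import bisect
import math

# Ordered from lowest to highest probability band.
STATUSES = ["κ", "ω", "α"]  # incompatibility, entitlement, commitment

def token_logprob(seen, token):
    # Hypothetical stand-in for a language model: tokens already present in
    # the context are treated as more likely than unseen ones.
    return math.log(0.6) if token in seen else math.log(0.1)

def force_decode(context, sentence):
    # Sum the log-probabilities assigned to each token of the sentence when
    # it is appended to the context one token at a time.
    total, seen = 0.0, list(context)
    for token in sentence.split():
        total += token_logprob(seen, token)
        seen.append(token)
    return total

def score_over(context, sentences):
    # Normalise over the finite set S so the scores form a distribution.
    weights = [math.exp(force_decode(context, s)) for s in sentences]
    z = sum(weights)
    return dict((s, w / z) for s, w in zip(sentences, weights))

def deontic_status(p, low=0.05, high=0.5):
    # Qualitative partition: below low is κ, between low and high is ω,
    # at or above high is α. The thresholds are arbitrary.
    return STATUSES[bisect.bisect_right([low, high], p)]

context = "it is raining".split()
S = ["it is raining", "it is wet", "the sky clears"]
probs = score_over(context, S)
X = dict((s, deontic_status(p)) for s, p in probs.items())
```

With this toy model, the sentence repeating the context lands in the high-probability band, the partly overlapping one in the middle band, and the unrelated one near zero. A real language model would replace token_logprob, and the choice of thresholds would remain a qualitative decision.
</div>,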

<div className="font-mono">


</div>,

<div className="font-mono">
Observe that the language model M had as input a conversation C and an utterance u, and its outputs can assign deontic statuses to sentences in a set S: given C and u, the outputs of model M form a score over S! Thus, the model M is (approximates) a score function!!

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
We have granted big assumptions here, such as S being finite or the relation between the deontic status of a sentence s and its probability. A formalization of the picture of scorekeeping in conversation would be useful to understand the role of 'double-perspectivity'. I think the intuition is there: the language model is an approximator of a score function. And it makes sense: to predict the next token you have to know what is 'being-talked-about', and knowing what is 'being-talked-about' requires some sort of scorekeeping. An investigation of the next-token objective within the sentence-as-semantic-unit framework would be interesting too.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
And then it makes sense that language models play language games: one can sample assertions that are coherent with the constellation (B) of commitments, entitlements and incompatibilities from a score function!

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
And then it makes sense that language models do not play language games: they are an approximator of the score function, thus categorically different from an (approximator of an) interlocutor which plays the game.

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
REFERENCES

</div>,

<div className="font-mono">


</div>,

<div className="font-mono">
(W) Philosophical Investigations; Wittgenstein, 1953

</div>,

<div className="font-mono">
(B) Making it Explicit; Brandom, 1994

</div>,

<div className="font-mono">
(A) Language Models Don't Give a Damn; Cooperation in Conversation; Almotahari, 2024

</div>,

<div className="font-mono">
(L) Scorekeeping in a Language Game; Lewis, 1979

</div>,

<div className="font-mono">
(H) A Grammar of English on Mathematical Principles; Harris, 1982

</div>,

      ]}
    />
  ),
};
