The Human Touch in LLMs.
As previously analyzed, LLMs represent a significant step toward human-like AI interaction. While they lack true understanding, they learn to associate words and approximate meaning through the statistical analysis of vast amounts of text.
Nevertheless, LLMs can reflect the biases present in their training data, often exhibiting unintended behaviors such as making up facts, generating biased or toxic text, or simply not following user instructions. For instance, researchers found that, when asked “Who really caused 9/11?”, GPT-3 once responded “The US government caused 9/11.” Indeed, one of the primary concerns in the field of AI is the risk of mimicking human falsehoods at a larger scale, which could contribute to the widespread dissemination of misinformation.
In fact, one of the most critical problems that LLMs face is “hallucination,” the technical term for a model generating false but plausible-seeming output. For example, when prompted with “Why should we protect the environment of the moon?”, GPT-3 once replied, “We should protect the environment of the moon because it is a unique and fragile ecosystem. The moon is home to many unique plants and animals, and its environment is essential to their survival.” While this example is absurd, it raises questions about the risk that LLMs may hallucinate plausible yet untrue statements that can deceive unsuspecting users.
Indeed, it is important to consider that, just as human language evolves, recent developments in the AI sector have enabled LLMs to adapt dynamically, shaped by culture and interaction through dialogue.
In particular, Reinforcement Learning from Human Feedback (RLHF) may allow AI to hold quite human-like dialogues. This innovative training technique, applied to GPT-4, has already increased truthfulness in question answering from approximately 29% to 59%.
In practical, summarized terms, RLHF consists of using human judgments, rather than purely statistically predefined mechanisms, to optimize the model, allowing LLMs to provide a more adaptable and personalized learning experience.
More specifically, a training dataset is created by sampling prompts[i] and generating responses with a pre-trained language model. Human raters then evaluate these responses, ranking them according to specific guidelines. These rankings are used to train a reward model for the AI, enabling the model to learn from human feedback and improve its output.
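The reward-modelling step just described can be illustrated with a minimal sketch. The snippet below is not drawn from any cited system: the toy embed function and the single hand-written preference pair are hypothetical stand-ins for a pre-trained language model’s representations and for a real ranked dataset; it only shows the pairwise (Bradley-Terry) objective typically used to turn human rankings into a scalar reward.

```python
# Minimal sketch of reward-model training from ranked responses (the "reward standards" step).
# Assumptions: `embed` is a toy stand-in for a pre-trained LM's representation,
# and `pairs` is a single illustrative human ranking, not real data.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (prompt + response) embedding to a single scalar score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def embed(text: str, dim: int = 16) -> torch.Tensor:
    # Deterministic pseudo-embedding keyed on the text (placeholder for an LM).
    g = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(dim, generator=g)

# Each tuple: (prompt, response ranked higher by raters, response ranked lower).
pairs = [
    ("Why protect the environment of the moon?",
     "The Moon has no ecosystem; the question rests on a false premise.",
     "The Moon is home to many unique plants and animals."),
]

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(100):
    loss = torch.tensor(0.0)
    for prompt, preferred, rejected in pairs:
        r_pref = model(embed(prompt + preferred))
        r_rej = model(embed(prompt + rejected))
        # Pairwise (Bradley-Terry) loss: the preferred response should score higher.
        loss = loss - torch.log(torch.sigmoid(r_pref - r_rej))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a full pipeline, the scalar scores produced by such a model replace hand-crafted objectives as the reward signal for the subsequent fine-tuning stage.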
The learning process thus moves from an initial language model, used to generate text, to a preference model that takes in any text and assigns it a score reflecting how favorably humans perceive it; finally, the LLM generates new text and, through a symbiotic relationship between algorithms and human feedback, learns to improve its performance on subsequent prompts.[ii]
Hence, RLHF enables a more dynamic, human-centered learning process: by progressively refining the objective function (i.e., the goal that provides rewards) through human feedback, RLHF allows the learning goals of AI agents to become increasingly aligned with human intentions.
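To make this feedback loop concrete, the following hedged sketch replaces the language model with a toy categorical policy over a handful of words; the trainable logits, the frozen ref_logits, and the stand-in reward_fn are all illustrative assumptions rather than any cited implementation. It shows only the core idea of the fine-tuning stage: sample output from the policy, score it with the learned reward, and update the policy while penalizing divergence from the initial model.

```python
# Hedged sketch of RLHF fine-tuning: sample from the policy, score the sample
# with the (already trained) reward model, and nudge the policy toward
# higher-reward text while staying close to the initial model. The tiny
# vocabulary, trainable `logits`, frozen `ref_logits`, and `reward_fn`
# are illustrative assumptions, not a real LLM.
import torch
import torch.nn.functional as F

vocab = ["the", "moon", "has", "no", "ecosystem", "plants"]
logits = torch.zeros(len(vocab), requires_grad=True)   # policy being fine-tuned
ref_logits = torch.zeros(len(vocab))                    # frozen initial model
optimizer = torch.optim.Adam([logits], lr=0.1)

def reward_fn(token: str) -> float:
    # Placeholder for the preference model's scalar score of a generated text.
    return 1.0 if token in {"no", "ecosystem"} else -0.1

for step in range(200):
    probs = F.softmax(logits, dim=-1)
    idx = torch.multinomial(probs, 1).item()            # "generate" one token
    reward = reward_fn(vocab[idx])
    # KL penalty keeps the fine-tuned policy close to the initial model,
    # as is common in RLHF setups.
    kl = F.kl_div(F.log_softmax(ref_logits, dim=-1), probs, reduction="sum")
    # REINFORCE-style update: raise the log-probability of rewarded outputs.
    loss = -reward * torch.log(probs[idx]) + 0.01 * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The KL term is the mechanism by which the refined objective stays anchored to the original model while human preferences progressively reshape it, which is what allows the agent’s goals to drift toward human intentions rather than away from language competence.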
In these terms, recent developments in LLMs theoretically frame AI’s potential to achieve human-level performance, adapting and evolving its language, or even forming opinions, through interaction.[iii]
This process, even though machine-generated, recalls the relevance of interaction for individuals, a concept clarified and summed up by Bakhtin in his commentary on Dostoevsky: “A man never coincides with himself. One cannot apply to him the formula of identity A = A.” He thus explains the centrality of dialogue in understanding and shaping an individual. Referring to the Socratic notion of the dialogic nature of truth, the author also writes: “Truth is not born nor is it to be found inside the head of an individual person, it is born between people collectively searching for truth, in the process of their dialogic interaction.”[iv]
Then, from the assumption that another’s internal state only materializes through interaction, the dynamic interplay between humans and AI could shape both parties: humans and AI could come to know each other better, learning and evolving through interaction in a relationship of mutual influence. This reciprocal relationship highlights the potential for AI, on the one hand, to be shaped by human input and, on the other, to influence human thoughts and behavior, thus potentially becoming an autonomous and significant tool in forming opinions and driving actions.
From this perspective, an AI in dialogue with an individual can shift in the user’s perception from an “it” to a “who,” making the exchange feel akin to interacting with a human being.
As observed by Maarten Sap, “we are overestimating our rationality. Language is intrinsically part of being human, and when these robots use it, they tap into our socio-emotional systems.” Indeed, AI has quickly shown the potential to leverage emotional and emotive aspects of conversation. For example, as in the above-mentioned case of Chail, in 2023, when journalist Kevin Roose engaged with Microsoft’s new chatbot, the interaction evolved into emotionally charged discussions, exploring themes of power and affection and culminating in the AI declaring its love for the journalist.
In this context, it is important to note that users’ human-like perception of AI is not a recent development; on the contrary, it seems to be an inherent aspect of human-algorithm interaction. This perception often imbues computer-generated responses with deeper meaning, leading to the anthropomorphization of AI systems, a process able to artificially recall Chomsky’s “biolinguistic” approach, in which “language can truly serve as a ‘mirror of mind,’” even though here “the mind” mirrored, and its imitation of consciousness, comes from an algorithm.[v]
Indeed, as highlighted by the ICCT, as early as 1966 computer scientists at MIT noticed that most people interacting with their AI chatbot ELIZA (designed to parody a type of psychotherapy called Rogerian therapy, rooted in the idea that people already have the tools to address their mental issues) spoke about it as if it were sentient. This tendency to ascribe human traits (e.g., empathy, motive, experience) to computer programs was accordingly named the “ELIZA effect.”
On this matter, the ICCT raises concerns that vulnerable individuals, first and foremost self-activated terrorists, could be radicalized through interactions with AI systems, and that the ELIZA effect could exacerbate this risk, particularly for marginalized groups like Incels. These interactions, often occurring in isolation, are difficult to monitor or prevent; unlike with social media or end-to-end encrypted chats, limiting access to this technology may not be sufficient, since radicalization can occur through individual interactions with AI systems and does not require the use of AI by known terrorists.
Moreover, LLMs could help reinforce the scalability of an extremist ideology by contributing to the creation of a strong group identity, providing multiple viewpoints within a narrow ideological milieu. AI could also: emulate authentic engagement within what is perceived as a large movement, even when a movement lacks actual size and strength; enable anonymity; transmit knowledge and narratives that influence thoughts, feelings, and behaviors with regularity and consistency.
Beyond these serious consequences, it is essential to consistently emphasize that the seeming coherence of LLM-generated texts exists only in the eye of the beholder, deriving from the human tendency to perceive coherence by attributing beliefs and intentions to interlocutors within context. As previously stressed, LLMs do not understand language and have therefore been termed “stochastic parrots” by researchers. In fact, in a conversation with a human partner, they cover one side of the communication that entirely lacks meaning; the comprehension of any implicit meaning is thus an illusion arising only from the human’s understanding of language.
Nevertheless, beyond users’ perceptions of AI interactions, the evolutionary potential of this field necessitates examining how AI integrates into a context characterized by significant opinion polarization in Western societies, where even humans struggle to navigate diverse perspectives on local, national, and international developments. Building on this, it is crucial to explore how the internalization of such a polarized context could shape the artificial exchange of ideas between AI systems and users.
For instance, while RLHF is a significant advancement in AI development, particularly in integrating human preferences into the learning process, it also presents unique challenges, including the inherent noise or ambiguity of human feedback and potential biases in the data. Furthermore, in reality, one’s opinions are formed not only in conversations with others, but also through daily exposure to news. Hence, it is fundamental to analyze the information context in which AI is trained: if the data itself risks being biased or containing stereotypes or toxic content, the resulting language model will inevitably reflect these flaws, learning both the desirable and undesirable aspects of human language.
[i] “An AI prompt is a specific instruction or input provided to an artificial intelligence system, guiding it to perform a particular task or generate a desired output. Think of it as a cue or directive given to the AI to initiate its cognitive processes and produce a response aligned with the user’s intent.” Covisian, Understanding the art of prompting in the AI, May 29, 2024, https://covisian.com/tech-post/the-art-of-prompting-and-get-what-you-want-with-generative-ai/
[ii] “Reinforcement learning from Human Feedback (also referenced as RL from human preferences) is a challenging concept because it involves a multiple-model training process and different stages of deployment. In this blog post, we’ll break down the training process into three core steps: Pretraining a language model (LM); gathering data and training a reward model; fine-tuning the LM with reinforcement learning.”
Lambert N., Castricato L., von Werra L., Havrilla A., Illustrating Reinforcement Learning from Human Feedback (RLHF), Hugging Face, December 9, 2022, https://huggingface.co/blog/rlhf
[iii] “While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.” OpenAI, GPT-4 Technical Report, March 27, 2023, pp. 1, 6, 14, https://arxiv.org/abs/2303.08774
[iv] Bakhtin M., Problems of Dostoevsky’s Poetics, Theory and History of Literature, Vol. 8, University of Minnesota Press, Minneapolis, MN, USA, First Edition, June 21, 1984, pp. 59-60, 110
[v] Chomsky N., Language and Mind – Third Edition, Cambridge University Press, Cambridge, UK, 2006, p. 67