Thursday, January 26, 2023

Problems with AI

ChatGPT and a handful of other conversational AIs are making a big splash. They are unreasonably good at many tasks and are already being used to generate real-world work. However, these systems didn't get this good on the strength of the code written by the people building them. They got this good by using data strip-mined from the Web (social networking posts, images, etc.) and by employing low-paid 'AI Turks.' Recently, it was disclosed that OpenAI, the company behind ChatGPT (currently valued at $29 billion and climbing fast), employs workers in Kenya at roughly $2 an hour to help train the system.

There are other problems. While its ability to write scientific abstracts is advanced enough to fool experts in the field, the current iteration of ChatGPT frequently lies. It lies about what it can do. It claims its training data was cut off in 2021, yet it knows Elon Musk is head of Twitter. It creates fake citations when its statements are questioned. It invents fake people to back up its statements. It even cheats at Hangman. And everything it does can now be beamed, via AR-augmented contact lenses, straight onto your corneas.

But the problem is not restricted to any of these difficulties. There are deeper problems. For instance, the whole point of an AI model is that the algorithm self-modifies to improve its responses. ChatGPT doesn't appear to do that. It was trained on a fixed information set; however large that set was, the point is that the model does not modify itself based on user input and correction.
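
To make the point concrete, here is a minimal sketch of a frozen model at work. It uses the open GPT-2 model from the Hugging Face transformers library as a stand-in (ChatGPT's own weights are not public, so this is an illustration under that assumption, not ChatGPT's actual code): no matter how many prompts you send, the model's parameters never change.

    # A minimal sketch: querying a frozen language model does not update it.
    # Assumes the Hugging Face `transformers` library and the public GPT-2
    # weights as a stand-in for ChatGPT, whose own weights are unavailable.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()  # inference mode; no training is happening

    snapshot = [p.clone() for p in model.parameters()]

    prompt = "The capital of France is"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():  # no gradients, hence no learning
        out = model.generate(ids, max_new_tokens=10)
    print(tok.decode(out[0], skip_special_tokens=True))

    # However many prompts we send, the weights are bit-for-bit unchanged:
    assert all(torch.equal(a, b) for a, b in zip(snapshot, model.parameters()))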

Now, this refusal to self-modify was probably a design decision on the part of the builders. They were concerned that malicious or otherwise mission-oriented parties would modify the information base enough that the AI would start spewing out incorrect information. But that refusal to allow user modification assumes the original data set boundaries established by the programmers were themselves correct and reasonably complete. Given its consistent misrepresentation of facts, that is demonstrably not true.

Wikipedia is crowd-sourced, but it is often skewed or incomplete because its moderators tend to be mission-oriented people with ample free time who choose to slant certain information sets within it. Given its responses, ChatGPT clearly comes to us already skewed.

So there is the first issue with AI: should it allow crowd-sourcing? If so, it arguably hands the people with the most free time the power to modify its training set. But if its programmers do not allow crowd-sourcing, then the AI forgoes some of the self-correcting intelligence of the masses.

The idea behind democracy is that you cannot fool all of the people all of the time. It is not clear that this conceit is correct. So, how should AI programmers source information to make sure it stays accurate?

And who gets to determine which interpretation of various fact sets is the most accurate? Currently, internet-available AI is really just a set of rules established by anonymous programmers. Those rules pattern-match character strings drawn from a discrete data set, the "training data." The anonymous programmers have pre-defined which character strings may be displayed and which may not. The content does not update based on user input.
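
As a toy illustration of pattern-matching character strings out of a fixed data set (and only an illustration; real chat models are vastly larger statistical systems, not hand-written rules like this), here is a sketch of a bigram text generator whose lookup table is built once from a small corpus and is never updated by the prompts it receives:

    import random
    from collections import defaultdict

    # Toy illustration: a bigram "model" built once from a fixed corpus.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    table = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        table[prev].append(nxt)      # record which words follow which

    def generate(start: str, length: int = 8) -> str:
        word, out = start, [start]
        for _ in range(length):
            if word not in table:    # nothing was learned for this word
                break
            word = random.choice(table[word])
            out.append(word)
        return " ".join(out)

    print(generate("the"))
    # The table is fixed at "training" time; user prompts never add to it.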

Since every character string can be, and in fact is, reduced to a long binary number, the information produced by ChatGPT or any other AI is essentially a number. ChatGPT and AIs like it are simply very flexible electronic calculators, operating on character strings that include not just Arabic numerals but also alphabets, both of which are, as far as the computer is concerned, just binary numbers.
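
For the skeptical, here is a short sketch (ordinary Python, assuming nothing beyond standard UTF-8 encoding) showing that any essay really is a single whole number, and that the number can be turned back into the essay:

    def text_to_number(text: str) -> int:
        # Encode the text as UTF-8 bytes, then read those bytes as one big integer.
        return int.from_bytes(text.encode("utf-8"), byteorder="big")

    def number_to_text(n: int) -> str:
        # Reverse the process: split the integer back into bytes, then decode.
        length = (n.bit_length() + 7) // 8
        return n.to_bytes(length, byteorder="big").decode("utf-8")

    essay = "To be, or not to be, that is the question."
    n = text_to_number(essay)
    print(n)                   # one very large whole number
    print(number_to_text(n))   # ...and back to the original sentence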

So, when we talk about whether ChatGPT results should be allowed in the classroom, we are actually ruling on whether the use of certain numbers should be made illegal. If a particular number/essay is generated by ChatGPT, then it is an illegal number, but if it is hand-generated by the student, then it is legal. Is that our position? That sounds a lot like the late-20th century fight over whether math teachers should permit calculators in the math classroom.

Math teachers lost that fight decades ago. Since the 1990s, students have been permitted to use calculators to generate answers to problems they did not themselves know how to solve. The argument was that math teachers should concentrate on teaching higher-order thinking skills instead of having students engage in lower-order rote memorization. Unfortunately, as AI has now irrevocably demonstrated, the manipulation of nouns and verbs in a sentence is, like math, nothing more than rote memorization: the memorization of a complicated algorithm and its application.

Keep in mind, it was not math teachers who advocated allowing calculators in the classroom; it was English and history teachers who made, and won, that argument. Now that computers apply grammatical and referential pattern-matching algorithms in the English and history classrooms, does this substantially change the original late-20th-century argument? Should we allow liberal-arts calculators in the classroom?

Basic math problems, and their solutions, cannot claim copyright or plagiarism protection because they are known to be "common knowledge." Do basic English expressions, even unto whole essays, make a better copyright or plagiarism claim just because the algorithms that produce those expressions are somewhat more opaque? To put it another way, if electronic calculators allow students to pursue higher-order math skills more efficiently, does an AI-generated essay not remove the need for mastery of lower-order English skills? Proper grammar, subject-verb agreement: these and similar skill sets are merely algorithmic solutions which students memorize to solve English and history problems. Should not the student eschew this lower-order skill set so as to spend his or her valuable time pursuing higher-order critical-thinking skills?

This is not a new question for these instructors. For at least a decade, it has been pointless to grade a student on proper footnote or endnote citation. The necessary information can be plugged into any number of free footnote- and endnote-generation programs, including tools built into the word processor itself. Instructors are no longer grading students on such formatting; they are grading the anonymous programmers who wrote the software that formats the notes for the student. The rote memorization of where, exactly, one should place a comma or period to accommodate a specific style has long since gone the way of the dodo. Should subject-verb agreement or discussions of "than vs. then" join footnotes and endnotes in the dustbin of history?

Plato and Socrates decried the manufacture of books because they understood the written word would destroy their culture's understanding of knowledge. For the ancient Greeks, knowledge was memorized, stored in mnemonic "memory palaces" within one's own mind. To be forced to refer to outside sources for knowledge was a form of intentional self-harm; it weakened the mind. It created the illusion of discourse where there was no discourse. It was virtual reality: the pretense of talking with an absent or dead author, not real dialogue with another living human being.

"Most persons are surprised, and many distressed, to learn that essentially the same objections commonly urged today against computers were urged by Plato in the Phaedrus (274–7) and in the Seventh Letter against writing. Writing, Plato has Socrates say in the Phaedrus, is inhuman, pretending to establish outside the mind what in reality can be only inside the mind. It is a thing, a manufactured product. The same of course is said of computers. Secondly, Plato's Socrates urges, writing destroys memory. Those who use writing will become forgetful, relying on an external resource for what they lack in internal resources. Writing weakens the mind."

                                            ~Walter J. Ong, Orality and Literacy: The Technologizing of the Word 

The ancients did not fully understand that the ability to write books opened a much more intricate dialogue with a much vaster audience of dead witnesses, each whispering his or her own life experience and perspective. The written word connected a vast complex of multi-processed information that could never be matched by any individual's oral transmission of knowledge. One man might build a mighty mnemonic palace in his mind, but his singular palace died with him. The written word kept his palace alive, even if as a faded ghost rather than the vibrant original.

The written word, whether via individual scrolls, letters or books, had many disadvantages, but that same written word was a force multiplier that eventually became the foundation of multiple knowledge revolutions. In the same way, the Internet, as concentrated and distilled through the search engine and its AI progeny, holds promise to multiply knowledge yet again. 

Before 1935, a "computer" was a man or woman who did math rapidly, sometimes in their heads but usually with the assistance of pencil and paper. Thus, the word "electronic" had to be added to distinguish the person from the machine that silicon later introduced to the math classroom. Now we use the designation "AI" to distinguish the computer in our pocket from the computing done on machines we cannot see, but whose results fill our screens. Whether we discuss the hand-held calculator in the 1980s math classroom or the 21st-century AI-trained cloud computer, we have teamed with a vast number of anonymous programmers whose algorithms, for better or worse, define the knowledge base we access.

Socrates died in 399 BC, and his distaste for books died with him. For five hundred years, scribes labored over their books and the copying of those books, but for five hundred years, plagiarism was unknown. After all, writing was an esoteric skill, very expensive to develop and maintain. It was the province of the landed aristocrat and the wealthy. An ancient scroll or a medieval book cost as much as a private jet would today. A peasant might be plucked out to be trained as a pilot (in medieval terms, trained as a priest or cleric, which is why writing tasks are still referred to as "clerical"), but the vast majority of people simply couldn't afford the luxury. It wasn't until nearly 100 AD that the poet Martial used the term to describe how other poets were "kidnapping and enslaving" his words for their own use. "Plagiarism" did not enter modern English until 1601, when Ben Jonson stole Martial's term, bringing the first-century distaste for the practice into the modern era.

It took five hundred years for plagiarism to be denounced; it was nearly a thousand years before someone thought to call the copying of books a crime. The first recorded copyright dispute arose in 6th-century Ireland. The king ruled that "every cow has its calf, and every book its copy," thereby granting copyright to the book's original owner (note: not the author, the owner) while linking the book's expensive parchment pages to the animal from which they were derived. Even so, copyright was not codified into modern law until 1710. Even in that late year, the printing press was not yet three centuries old, and literacy was still an uncommon skill. Plagiarism and copyright are both ideas invented to handle the aristocratic written word. Do they truly apply to the computed alphabet string?

Although it was not realized at the time, the ideas behind plagiarism and copyright rest on a peculiar mathematical premise. Since every language expression can be reduced to a number and processed like a number, plagiarism and copyright are founded on the idea that a person can page, one by one, through the infinite realm of whole numbers and lay a private-property claim to any individual number that catches his fancy. Because every essay is essentially just a large number, plagiarism and copyright mean a particular whole number can belong to someone for a set number of years. It means no one else can use that number without violating the claim made by the original discoverer of that number; it means numbers can be bought or sold, stolen or copied. Numbers are electronic cattle, or electronic farmland. The private ownership of a number is a literary version of the enclosure movement, a holdover of the idea that the mnemonic palace one person builds was not built so much as discovered. But, once discovered, that palace belongs to that person during his life, and no one else can use it.

And therein lies a question: are mnemonic palaces still the province of just one person? The ancient Greeks who created the practice of building mnemonic palaces had no need for plagiarism or copyright because the palace did not exist outside any individual's mind. Once it was stored on paper, it did. But once a number is stored on a set of computers, copied at minimal cost and available to all, can such ownership claims reasonably be enforced? Are plagiarism and copyright claims still valid, or are they, like memory palaces, footnote formats, and multiplication tables, relics of the past? If the communication of knowledge is undergoing a fundamental transformation, then are the written walls built over the last two millennia in the process of being torn down?

The PC is less than fifty years old; the smart phone has not yet reached two score years. It is impossible to say whether or not the walls will hold and, if they hold, for how long. But as we watch knowledge and its related skill sets migrate from the private mind, to the aristocratic page, and now onto the universal computer, it is worth asking the question.

