Machine Intelligence
From psychotic calculators to ChatGPT
Today’s AI saw the light of day eighty-two years ago, when Warren S. McCulloch and Walter Pitts published “A logical calculus of the ideas immanent in nervous activity” in the December 1943 issue of Bulletin of Mathematical Biophysics.
The paper, in which they discussed networks of idealized and simplified artificial “neurons” and how they might perform simple logical functions, became the inspiration for computer-based “artificial neural networks” (later renamed “deep learning”) and their popular description as “mimicking the brain.”
The journey from that paper to ChatGPT was full of trials and tribulations, troughs (also known as “AI Winter”) and peaks (also known as “AI Bubble”).
In 1949, going beyond McCulloch and Pitts’ conjectures, psychologist Donald Hebb speculated about neural networks’ involvement in human learning. Hebb’s theory is often summarized as “neurons that fire together wire together,” describing how synapses—the connections between neurons—strengthen or weaken over time. It paved the way for the development of computer algorithms that were presumed to emulate the cognitive processes of the human brain. The manipulation of “weights”—numerical values representing the strength of the connection between two nodes in an artificial neural network—became the main preoccupation of researchers working in the approach to machine learning and AI called “Connectionism.”
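To make Hebb’s idea concrete, here is a minimal sketch in Python (the names and numbers are illustrative, not Hebb’s own formulation): the “weight” of a connection grows whenever the two neurons it links are active at the same time.

```python
import numpy as np

# Minimal Hebbian learning sketch (illustrative values, not Hebb's own
# formulation): the weight between two "neurons" grows in proportion to
# how often they are active at the same time.

rng = np.random.default_rng(0)
learning_rate = 0.1

# Activity of two neurons over ten time steps (1 = firing, 0 = silent).
pre = rng.integers(0, 2, size=10)    # presynaptic neuron
post = rng.integers(0, 2, size=10)   # postsynaptic neuron

weight = 0.0
for x, y in zip(pre, post):
    # "Neurons that fire together wire together": strengthen the
    # connection only when both fire in the same step.
    weight += learning_rate * x * y

print(f"final connection strength: {weight:.2f}")
```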
In 1957, psychologist Frank Rosenblatt, sponsored by the Office of Naval Research, developed the Perceptron, an artificial neural network for image identification. The previous year, however, a different approach to AI was born at a summer workshop at Dartmouth College. This was Symbolic AI: defining formal rules for manipulating symbols (e.g., words, numbers) and expressing human reasoning, i.e., drawing inferences and arriving at logical conclusions, in code. It became the dominant approach to AI for the next several decades.
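Rosenblatt’s perceptron can be sketched in a few lines of Python. The toy below follows the spirit of his learning rule rather than his actual hardware or procedure: weighted inputs pass through a threshold, and the weights are nudged whenever the prediction is wrong, until the network reproduces the logical OR function.

```python
import numpy as np

# A toy perceptron in the spirit of Rosenblatt's learning rule (a sketch,
# not his hardware or exact procedure): weighted inputs pass through a
# threshold, and the weights are nudged whenever the prediction is wrong.
# Here it learns the logical OR function.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # input patterns
y = np.array([0, 1, 1, 1])                      # OR targets

weights = np.zeros(2)
bias = 0.0
lr = 0.1  # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        prediction = 1 if xi @ weights + bias > 0 else 0
        error = target - prediction
        weights += lr * error * xi  # move the weights toward the correct answer
        bias += lr * error

print(weights, bias)
print([1 if xi @ weights + bias > 0 else 0 for xi in X])  # reproduces OR: 0, 1, 1, 1
```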
With the Perceptron failing to meet its sponsor’s expectations of human-level intelligence, and with astute marketing by Symbolic AI proponents, funding for neural network research evaporated and Connectionism went into a hibernation that lasted more than 15 years.
In 1979, Japanese computer scientist Kunihiko Fukushima proposed the neocognitron, a hierarchical, multilayered artificial neural network first used for Japanese handwritten character recognition and other pattern recognition tasks. The neocognitron inspired the development of convolutional neural networks (CNNs), which automatically learn which properties or characteristics of the data are important for the task at hand and are therefore less dependent than other image classification algorithms on humans to select those features.
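A toy example shows what “automatically selecting the important properties of the data” means in practice. In the Python sketch below, the filter is written by hand for clarity; in a trained CNN, such filters are learned from the data itself.

```python
import numpy as np

# A minimal sketch of the convolution at the heart of a CNN (toy values,
# not the neocognitron itself): a small filter slides over an image and
# responds strongly wherever the pattern it encodes (here, a vertical
# edge) appears. In a trained CNN, these filters are learned from data
# rather than designed by hand.

image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A hand-written vertical-edge detector; a CNN would learn such filters.
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

h, w = kernel.shape
feature_map = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        # Element-wise multiply the patch by the filter and sum the result.
        feature_map[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)

print(feature_map)  # large values mark where the vertical edge sits
```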
The most important breakthrough in the evolution of artificial neural networks came in 1986, when Geoffrey Hinton, David Rumelhart, and Ronald Williams published a pair of landmark papers popularizing “backpropagation” and showing its positive impact on the performance of neural networks. The term refers to the phase in which the algorithm propagates measures of the error in the network’s guesses backwards through its neurons, starting with those directly connected to the outputs. This allowed networks with intermediate “hidden” neurons between input and output layers to learn efficiently.
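The sketch below, a toy illustration rather than the notation of the 1986 papers, shows the idea: a small Python network with one hidden layer learns XOR, a task a single perceptron cannot solve, by propagating the output error backwards to adjust every weight.

```python
import numpy as np

# A toy backpropagation sketch (an illustration, not the 1986 papers'
# notation): a network with one hidden layer learns XOR, something a
# single perceptron cannot do. The error measured at the output is
# propagated backwards, layer by layer, to tell every weight how to change.

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0  # learning rate

for step in range(5000):
    # Forward pass: compute the network's guesses.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: error signals flow from the output layer to the hidden layer.
    output_delta = (output - y) * output * (1 - output)            # at the outputs
    hidden_delta = (output_delta @ W2.T) * hidden * (1 - hidden)   # one layer back

    # Gradient-descent updates for every weight and bias.
    W2 -= lr * hidden.T @ output_delta
    b2 -= lr * output_delta.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ hidden_delta
    b1 -= lr * hidden_delta.sum(axis=0, keepdims=True)

print(np.round(output, 2))  # should approach the XOR targets 0, 1, 1, 0
```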
Three years later, Yann LeCun and other researchers at AT&T Bell Labs successfully applied a backpropagation algorithm to a multi-layer artificial neural network that recognized handwritten ZIP codes. Given the hardware limitations of the time, however, training the network took about three days (still a significant improvement over earlier efforts).
Computer hardware, however, continued to improve following the trajectory of “Moore’s Law.” In his 2019 essay “The Bitter Lesson,” Richard Sutton, this year’s winner of the Turing Award, attributed all past AI breakthroughs to Moore’s Law, “or rather its generalization of continued exponentially falling cost per unit of computation.”
Sutton made this observation to support his attack on Symbolic AI (whose proponents, for years, ignored or belittled his seminal work on reinforcement learning), and he was right in his conclusion: “We have to learn the bitter lesson that building in how we think we think does not work in the long run.” But he was wrong about Moore’s Law and the falling cost of computation: it does not explain the 2012 triumph of Connectionism/Deep Learning, what we now call “AI.”
AlexNet, a GPU-supported artificial neural network designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, achieved in October 2012 an error rate of 15.3% in the ImageNet Large Scale Visual Recognition Challenge, compared to the 26.2% error rate of the second-best entry. The combination of special-purpose GPUs (a different type of chip from the general-purpose CPUs that followed Moore’s Law) and neural network algorithms processed more efficiently and accurately the 1.43 million labeled images retrieved from the Web and organized in the ImageNet database.
On December 13, 1999, Nvidia released the GeForce 256 DDR graphics card, the first graphics card to be marketed as a “Graphics Processing Unit” (GPU). Six years earlier, Nvidia’s three cofounders had identified the emerging market for specialized chips that would generate faster and more realistic graphics for video games. But they also believed that these graphics processing units, with their ability to break data into multiple components and process them simultaneously, could solve new challenges that general-purpose computer chips could not.
A key factor in Nvidia’s success was the release in 2007 of its Compute Unified Device Architecture, or CUDA. This parallel computing platform and programming model allows developers to leverage the power of GPUs for general-purpose computing. On December 13, 2011, Nvidia announced the opening of the CUDA platform by releasing the compiler source code to enable broader adoption and innovation.
Nvidia’s “accelerated computing” (massively parallel computing) has been applied to many domains, but the rapid development of GPU-based artificial neural networks since 2012—processing images, video, audio, and text—dramatically elevated its fortunes. Nvidia became the world’s most valuable company because the performance of the artificial neural networks at the heart of today’s AI depends on the parallelism of the hardware they are running on, specifically the GPU’s ability to process many linear algebra multiplications simultaneously.
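The reason for that dependence is easy to see: applying one layer of a neural network to a batch of inputs amounts to one large matrix multiplication whose millions of multiply-adds are independent of one another, exactly the kind of work a GPU does well. A rough Python sketch, with illustrative sizes rather than those of any particular model:

```python
import numpy as np

# Why neural networks map so well onto GPUs (a rough sketch with
# illustrative sizes, not those of any particular model): applying one
# layer to a batch of inputs is a single large matrix multiplication,
# and every entry of the result can be computed independently.

batch, n_in, n_out = 512, 1024, 4096
inputs = np.random.rand(batch, n_in).astype(np.float32)   # a batch of examples
weights = np.random.rand(n_in, n_out).astype(np.float32)  # one layer's weights

# Each of the batch * n_out output values is an independent dot product of
# n_in multiply-adds; a GPU computes vast numbers of them in parallel,
# while a CPU works through far fewer at a time.
outputs = inputs @ weights

dot_products = batch * n_out
multiply_adds = dot_products * n_in
print(f"{dot_products:,} independent dot products, "
      f"{multiply_adds:,} multiply-adds for a single layer")
```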
Parallelism is important not just because of the nature of the algorithms but because of what they process: vast quantities of data, quantities that choke and slow down traditional CPUs. “Scientific visualization,” representing numerical and spatial data as images to better understand complex scientific phenomena, was the field that coined the term “Big Data” in the late 1990s. The first Communications of the ACM article mentioning big data, published in August 1999, observed: “…fast computations spew out massive amounts of data. Where megabyte data sets were once considered large, we now find data sets from individual simulations in the 300GB range.”
The gigabytes turned into terabytes, which turned into exabytes of data.
In 1986, 99.2% of all storage capacity in the world was analog; by 2007, 94% of storage capacity was digital, a complete reversal of roles. The Web drove this digitization and data explosion, along with the development of new data management software and of algorithms either specially designed to take advantage of big data or, like artificial neural networks, benefiting from it. It turned out that “learning” by machines powered by artificial neural networks is highly correlated with the quantity of examples presented to them.
The digital data that trains, by example, the artificial neural networks that help ChatGPT “think” was born at the same time as the concept of artificial neural networks. On December 8, 1943, the prototype of the Mark 1, the world’s first programmable electronic digital computer, was successfully tested at Dollis Hill in the United Kingdom. It was taken apart and shipped to Bletchley Park, where it was delivered and re-assembled on January 18, 1944. As it was a large structure, its operators dubbed it Colossus, and they used it and the subsequent improved models to obtain a vast amount of high-level military intelligence from radiotelegraphy messages sent between the German High Command and its army commands throughout occupied Europe. By the end of the war, 63 million characters of high-grade German communications had been decrypted by 550 people helped by the ten Colossus computers.
On December 2, 1954, the Naval Ordnance Research Calculator (NORC) was presented by IBM to the United States Navy at the Naval Surface Weapons Center in Dahlgren, Virginia. At the presentation ceremony, this one-of-a-kind, first-generation vacuum tube computer calculated pi to 3,089 digits, a record at the time, in only 13 minutes.
Fast calculations, those that “spew out massive amounts of data,” were the first indicator of machine “intelligence” or even “superintelligence,” per the definition of “AI” as computers that are as smart as or smarter than humans.
As computer technology continued to evolve, it expanded to other “cognitive” functions beyond calculation, in ways that also drove the growth of digital data. The networking of computers was an important stage in this evolution.
On December 6, 1967, the Advanced Research Projects Agency (ARPA) at the United States Department of Defense issued a four-month contract to Stanford Research Institute (SRI) for the purpose of studying the “design and specification of a computer network.”
SRI was expected to report on the effects of selected network tasks on Interface Message Processors (today’s routers) and on “the communication facilities serving highly responsive networks.” In August 1968, ARPA sent an RFQ to 140 companies, and in December 1968, it awarded the contract for building the first four IMPs to Bolt, Beranek and Newman (BBN). These would become the first nodes of the network we know today as the Internet.
On December 9, 1968, Doug Engelbart demonstrated the oNLine System (NLS) to about one thousand attendees at the Fall Joint Computer Conference held by the American Federation of Information Processing Societies. The presentation, which later became known as “the mother of all demos,” defined the potential of computer networks as tools for collaboration and the “augmentation of our collective intelligence.”
On December 13, 1977, Bob Metcalfe, David Boggs, Charles Thacker, and Butler Lampson received a patent for the Ethernet, titled “Multipoint Data Communication System with Collision Detection.” The Ethernet became the dominant standard for local PC networks, and the first successful application of these networks was email, replacing paper-based memos with digital ones.
The shift from analog to digital was not limited to text. It happened in all types of information including video, images, and audio.
On December 24, 1877, Thomas Edison applied for a patent for a Phonograph that used tin foil cylinders to record and play back sound. On December 3, 1998, the Rio portable player was released to stores by Diamond Multimedia. The Rio was the second MP3 player on the market, but the first one to be commercially successful. The device ran on a single AA battery and featured 32 megabytes of storage, enough for about half an hour of music encoded in the MP3 compression format. It retailed for $200.
Special-purpose chips and lots of data brought us modern AI and ChatGPT. The latter, like all Large Language Models, is infamously known for its “hallucinations.” On December 27, 1948, the New York Times published a review of Norbert Wiener’s Cybernetics describing his solution for curing “psychotic calculators,” albeit one based on false premises. From the review:
By copying the human brain, says Professor Wiener, man is learning how to build better calculating machines. And the more he learns about calculators, the better he understands the brain. The cyberneticists are like explorers pushing into a new country and finding that nature, by constructing the human brain, pioneered there before them.
Psychotic Calculators. If calculators are like human brains, do they ever go insane? Indeed, they do, says Professor Wiener. Certain forms of insanity in the brain are believed to be caused by circulating memories which have got out of hand. Memory impulses (of worry or fear) go round & round, refusing to be suppressed. They invade other neuron circuits and eventually occupy so much nerve tissue that the brain, absorbed in its worry, can think of nothing else.
The more complicated calculating machines, says Professor Wiener, do this too. An electrical impulse, instead of going to its proper destination and quieting down dutifully, starts circulating lawlessly. It invades distant parts of the mechanism and sets the whole mass of electronic neurons moving in wild oscillations.
The cures administered to psychotic calculators are weirdly like the modern cures for insanity. One method is to overload the calculator with an extra strong electrical impulse in hope that the shock will stop the machine’s oscillations. This is rather like the shock treatment given to human psychotics. Another cure is to isolate part of the calculator’s mechanism, hoping to cut off the source of trouble. This is like the lobotomies which brain surgeons perform. Lobotomies sometimes work (for both machine and brain) but are apt to reduce, in both cases, the subject’s judgment.


