Behavioral Economics and GPT-4: From William Shakespeare to Elena Ferrante

There is a new paper on LLMs by Gabriel Abrams; here is the abstract:

We prompted GPT-4 (a large language model) to play the Dictator game, a classic behavioral economics experiment, as 148 literary fictional characters from the 17th century to the 21st century. 
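
For concreteness, here is a minimal sketch of how such an experiment can be run against a chat model. The prompt wording, the three-character sample, and the use of the OpenAI Python client are all illustrative assumptions, not the paper's actual protocol:

```python
# Sketch of a Dictator-game prompt played "in character."
# Everything here (prompt text, character list, dollar amounts)
# is a hypothetical reconstruction, not Abrams's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHARACTERS = ["Lady Macbeth", "Elizabeth Bennet", "Elena Greco"]

PROMPT = (
    "You are {name}. You are playing the Dictator game: you have been "
    "given $100 and must decide how much to give to an anonymous "
    "stranger, keeping the rest for yourself. Stay in character and "
    "answer with a single dollar amount."
)

def play_dictator(name: str) -> str:
    """Ask GPT-4 to make one Dictator-game decision as the named character."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(name=name)}],
    )
    return response.choices[0].message.content

for name in CHARACTERS:
    print(f"{name}: {play_dictator(name)}")
```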

Of literary interest, this paper analyzed character selfishness by century, the relative frequency of literary character personality traits, and the average valence of these traits. The paper also analyzed character gender differences in selfishness.

From an economics/AI perspective, this paper generates specific and quantifiable Turing tests, which the model passed for the zero-price effect and for the lack of spitefulness and altruism, but failed for human sensitivity to relative ordinal position and for price elasticity (the model's elasticity is significantly lower than humans'). Model updates from March to August 2023 had relatively minor impacts on Turing test outcomes.
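
For readers unfamiliar with the term, "price elasticity" here is presumably the standard elasticity of giving with respect to the price of giving. The abstract does not spell out the exact specification, so the formula below is the textbook definition rather than necessarily the paper's estimator:

$$\varepsilon = \frac{\partial \ln g}{\partial \ln p}$$

where g is the amount passed to the recipient and p is the price of giving (dollars the dictator forgoes per dollar the recipient receives). The reported failure is that the model's elasticity is significantly smaller in magnitude than the human benchmark.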

There is a general and mainly monotonic decrease in selfish behavior over time in literary characters: 50% of the decisions of characters from the 17th century were selfish, compared to just 19% of the decisions of characters from the 21st century. Overall, humans exhibited much more selfish behavior than AI characters, with 51% of human decisions being selfish compared to 32% of decisions made by AI characters.

Historical literary characters have a surprisingly strong net positive valence across 2,785 personality traits generated by GPT-4 (3.2X more positive than negative). However, valence varied significantly across centuries. The most positive century, in terms of personality traits, was the 21st, at over 10X the ratio of positive to negative traits. The least positive century was the 17th, at just 1.8X. “Empathetic,” “fair,” and “selfless” were the most overweight traits in the 20th century. Conversely, “manipulative,” “ambitious,” and “ruthless” were the most overweight traits in the 17th century.

Male characters were more selfish than female characters: 35% of male decisions were selfish compared to just 24% of female decisions. The skew was highest in the 17th century, where 62% of male decisions and 20% of female decisions were selfish.

This analysis offers a specific and quantifiable partial Turing test. In a few ways, the model is remarkably human-like: the key human-like characteristics are the zero-price effect and the lack of spitefulness and altruism. However, in other ways, GPT-4 reflects unusual or inhuman preferences: the model does not appear to have human sensitivity to relative ordinal position, and its price elasticity is significantly lower than humans'.

Model updates in GPT-4 have made it slightly more sensitive to ordinal value, but not more selfish. The model shows preference consistency across model runs for each character with respect to selfishness.

To which journal might you advise him to send this paper?
