Fakespeak – the language of fake news

Linguistic cues in fake news may be the key to its detection.

The picture shows the words real and fake.

Photo: Silje Susanne Alvestad.

About the project

Fake news is defined here as "news" items that the author knows to be false and publishes with the intent to deceive. The lion's share of the research on fake news detection is conducted by computer scientists alone.

However, linguists have shown that the linguistic features of a text vary according to its purpose. Thus, the language of fake news may be the key to its detection. This is the background of the linguistics-driven project "Fakespeak - the language of fake news. Fake news detection based on linguistic cues". The project involves a core team of linguists and computer scientists based in Norway and the UK.

Methods

The linguists will seek to reveal the grammatical and stylistic features of the language of fake news, referred to as "Fakespeak", in English, Norwegian and Russian.

To achieve this goal, they will first build new corpora of fake and real news from various online media outlets in all three languages, drawing on existing corpora where available, and then subject the datasets to thorough linguistic analysis.

The fake and genuine articles to be compared will be written by one and the same author, which controls for several potential sources of error. The linguists will apply methods and draw on insights from corpus linguistics, computational linguistics, applied linguistics (including forensic linguistics), pragmatics and rhetoric.

Taking the linguists' findings and existing fake news detection systems as their point of departure, the computer scientists will seek to improve these systems by automating the detection of the defining features of Fakespeak.

Objectives

The overall aim of the project is to enable fake news detection systems to discover and flag potentially harmful fake news items in a more accurate, efficient and timely manner than offered by current state-of-the-art systems.

By automating the detection of all and only the features of Fakespeak, the project team will enable the systems to detect and flag only deliberate disinformation, excluding, for example, inadvertent misinformation, satirical texts, parody, and texts reflecting a certain set of opinions. Thus, the project will take societal safety and security into consideration while safeguarding freedom of speech.

Financing

The Research Council of Norway, project-ID 302573.

Duration

2020-2025.

Events

The annual Fakespeak workshop 2024

Time: 24 January 2024, 10.15–16.15

Venues: HumSam-biblioteket, Georg Sverdrups hus, room 2531 (large meeting room) (first half of the day); Niels Treschows hus, room 1224 (12th floor) (second half of the day)

Full programme (abstracts below):

10.15	Welcome
10.30–11.15	Jack Grieve (University of Birmingham) "New directions in the linguistic analysis of fake news"
11.15–12.00	Rui Sousa-Silva (University of Porto) "From hallucination to disinformation: A forensic linguistics approach to AI-generated fake news detection"
12.00–12.30	Silje Susanne Alvestad, Nele Põldvere, Elizaveta Kibisova (University of Oslo) "Linguistics at Fakespeak"
12.30–14.00	Break
14.00–14.45	Petter Bae Brandtzaeg (University of Oslo and SINTEF Digital) "The future of free speech and fake news in an AI-driven society"
14.45–15.30	Morten Goodwin (University of Agder) "AI in the realm of fake speech: Capabilities and challenges"
15.30–16.00	Zia Uddin, Aleena Thomas, Asbjørn Følstad (SINTEF Digital) "Computer science at Fakespeak"
16.00	Closing

Abstracts:

Jack Grieve (University of Birmingham)
Building on research reported in my recent book The Language of Fake News, in this presentation, I consider directions for the linguistic analysis of fake news. I begin by discussing false fake news, focusing on the recent results of a register-based analysis of a diverse corpus of true and false news conducted with Bashayer Baissa and Matteo Fuoli, comparing these results to the similar register-based analysis of deceptive fake news presented in my book. I then propose two new directions for linguistic research on fake news. First, I outline possible research on the language of fake news by omission, arguing that this approach can facilitate the collection of deceptive news at scale and from across a wide range of authors, providing a basis for generalisable and meaningful empirical research on the language of fake news. Second, I consider the value of language modelling for the linguistic analysis of fake news, highlighting the potential of conducting analysis at the level of word tokens to identify deceptive language within texts, as opposed to conducting analysis at the level of word types to identify deceptive texts holistically, as is common in current research.

Rui Sousa-Silva (University of Porto)
The launch of ChatGPT in late 2022 made users aware that they could generate virtually any text, at any time, instantly. While generative AI systems, based on large language models (LLMs), may have a positive impact on several different applications (e.g. language learning), their ability to assist in the generation of human-like language has enabled the fabrication of all types of toxic and illegal content, which has in turn raised diverse challenges, including for forensic linguistic analyses. One of the nefarious effects of generative AI is the swift production of disinformation. Although some systems report being protected against such production, experience shows that they can be easily manipulated. This presentation revisits the concept of disinformation in the context of generative AI, and presents possible ways to address it from a forensic linguistics perspective.

Petter Bae Brandtzaeg (University of Oslo and SINTEF Digital)
In this talk, I will explore the transformative potential of communicative artificial intelligence (AI) on the foundations of free speech. While optimistic perspectives propose that AI will catalyze and equalize political participation, others express concern that it could potentially exert detrimental effects, thereby threatening our autonomy and free speech. I will discuss how communicative AI shifts our conceptualization of technology from a mere tool to an active partner in decision-making processes and content generation. Two key concepts, "content regimes" and "visibility regimes," will be introduced to elucidate AI's influence on public discourse, fake news and free speech. Content regimes are AI-powered systems, such as large language models, that dictate the nature and type of information we engage with, while visibility regimes control the prominence given to such information. By examining these regimes, I aim to assess their implications on political viewpoints and free speech. I will also present recent empirical work on public perception concerning AI's role in free speech.

Morten Goodwin (University of Agder)
Morten Goodwin will delve into the capabilities and challenges of artificial intelligence in the realm of fake speech. It is well known that technologies like ChatGPT make it possible to produce believable fake text, but did you know that the same is now true for vocal content? For example, James Earl Jones's agreement with Disney allows AI to use his voice, ensuring Darth Vader's continued legacy in future Star Wars films even after he has retired. Similarly, a case in England demonstrated the potential to access banking services using a fabricated voice, all made possible by AI. In this session, Morten will explore the advancements in AI that allow for the creation of realistic audio from simple text inputs, affecting industries from entertainment to cybersecurity. In his talk, he will navigate the complex ethical landscape, tackling issues of consent and misinformation, and ponder how society might respond as synthetic voices become increasingly indistinguishable from human ones, highlighting the critical need for discernment and sophisticated verification techniques.

The annual Fakespeak workshop 2022

Venue: PAM 389 (This room only takes 16 people, but the whole workshop can also be followed via Zoom.)

Zoom links:

Topic: The annual Fakespeak workshop, Day 1

Time: Nov 14, 2022 09:15 AM Oslo

Join Zoom Meeting

https://uio.zoom.us/j/64430865176?pwd=QnU2cVVaTTRZMGJMTEFraVVWOUZCZz09
Meeting ID: 644 3086 5176
Passcode: 830664

Topic: The annual Fakespeak workshop, Day 2

Time: Nov 15, 2022 09:15 AM Oslo

Join Zoom Meeting

https://uio.zoom.us/j/61103380184?pwd=bVNkTnZKQlIrZHBpSDFLbXYrV2xRZz09
Meeting ID: 611 0338 0184
Passcode: 776473

Workshop programme

Monday, November 14

09:15 – 09:20 Opening

09:20 – 09:55 The Fakespeak project

Introduction, Silje S. Alvestad, ILOS, UiO

The English part of the project, Nele Põldvere, ILOS, UiO and Zia Uddin, SINTEF, Oslo

Collaborating projects:

10:00 – 10:30 “Tech companies’ approaches to fake news”, Bente Kalsnes, the SCAM project, Kristiania University College

10:30 – 11:00 “Navigating social media. Experiences from PAR-TS (Pandemic rhetoric, trust and social media)”, Jannicke Fiskvik, SINTEF, Trondheim (via Zoom)

11:00 – 11:30 Break, lunch for project members and presenters

Industrial partners:

11:30 – 12:00 “How the Norwegian Media Authority works to fight fake news”, Pernille Huseby, Director of Communications and Consulting at the Norwegian Media Authority (via Zoom)

12:00 – 12:30 “A story of a team of rivals: a historical collaboration”, Helje Solberg, News Director in NRK and Chair of Faktisk

12:30 – 13:00 “Covid, conspiracy and war: thoughts on the challenges ahead”, Kristoffer Egeberg, Chief Editor of Faktisk

13:00 – 13:30 Break

Computer science:

13:30 – 14:00 “Challenges and Opportunities in Explainable Fact-Checking”, Vinay J. Setty, University of Stavanger (via Zoom)

14:00 – 14:30 “Content-based Fake News Detection with Logical Tsetlin Machine Rules”, Ole Christoffer Granmo, University of Agder (via Zoom)

Tuesday, November 15

09:15 – 10:00 The Fakespeak project, ctd.

09:15 – 09:35 The Norwegian part of the project, Aleena Thomas, SINTEF, Oslo and Silje S. Alvestad, ILOS, UiO

09:35 – 09:55 The Russian part of the project, Elizaveta Kibisova and Silje S. Alvestad, ILOS, UiO

09:55 – 10:00 Break

Media science:

10:00 – 10:30 “The ritual function of propaganda: Observations about language in Russian collective trolling of Norway”, Johanne Berge Kalsaas, University of Bergen (via Zoom)

Industrial partner:

10:30 – 11:00 “The role of fact-checking in defending Ukraine against Russian aggression”, Yevhen Fedchenko, Editor in Chief of StopFake.org (via Zoom)

Computer science:

11:00 – 11:30 “Technologies serving the needs of verification”, Nikos Sarris, CERTH (via Zoom)

11:30 – 12:00 Break, lunch for project members and presenters

12:00 – 14:30 Linguistics

12:00 – 12:30 “Exploring hybrid information operations online: how the Russian Internet Research Agency weaponised text and images”, William Dance, Lancaster University (via Zoom)

12:30 – 13:00 “Lying during speaking and writing: evidence from studying the process in experimentally collected narratives”, Victoria Johansson, Högskolan Kristianstad & Lund University, and Kajsa Gullberg, Lund University (via Zoom)

13:00 – 13:30 “The functional domain of epistemicity in language”, Henrik Bergqvist, Stockholm University (via Zoom)

Collaborating project:

13:30 – 14:00 “New linguistic methods applied to Russian political discourse”, Laura A. Janda, THREAT DEFUSER, University of Tromsø (via Zoom)

14:00 – 14:30 “A taxonomy of fake news for linguistic analysis”, Jack Grieve, University of Birmingham (via Zoom)

14:30 – 14:35 Closing

Start-up workshop, February 16-17

Programme

Tuesday, February 16:

9:15-9:45 Edson C. Tandoc Jr., “Scholarly definitions of fake news”
9:50-10:20 Sharon Levy, “On Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection, and related works from William Wang’s lab”
10:30-11:00 Geir Hågen Karlsen, “Influence operations in social and other kinds of media, political communication, one-sided history writing etc. The case of Russia”
12:00-12:45 Industrial collaboration partners
12:00-12:20 NTB (Geir Terje Ruud/Sarah Sørheim)
12:20-12:45 Faktisk.no (Kristoffer Egeberg)

Wednesday, February 17:

9:00-9:30 Maite Taboada, “The language of fake news and misinformation”
9:30-10:00 Helena Woodfield and Jack Grieve, “The language of fake news. Corpus studies”
10:15- Potentially collaborating projects
10:15-10:40 PAR-TS (Tor Olav Grøtan)
10:40-11:00 Threat-defuser (Laura Janda)
11:00-11:20 SCAM (Bente Kalsnes)

Published June 17, 2020 11:55 AM - Last modified Jan. 29, 2024 11:02 AM

Contact

Silje Susanne Alvestad, project leader

Fakespeak on Mastodon

Participants

Silje Susanne Alvestad, University of Oslo
Nele Põldvere, University of Oslo
Petter Bae Brandtzæg, University of Oslo
Atle Grønn, University of Oslo
Johan Laurits Tønnesson, University of Oslo
Elizaveta Kibisova, University of Oslo
Asbjørn Følstad
Zia Uddin
Jack Grieve
Till Christopher Lech
Aleena Thomas
