Algorithms and human journalists need to work together

BY ANDREAS GRAEFE | IN Digital Media | 04/09/2017

 A news article written by an algorithm, PollyBot


Reprinted from The Conversation


Ever since the Associated Press automated the production and publication of quarterly earnings reports in 2014, algorithms that automatically generate news stories from structured, machine-readable data have been shaking up the news industry. The promises of this technology – often referred to as automated (or robot) journalism – are enticing: Once developed, such algorithms could create an unlimited number of news stories on a specific topic at little cost. And they could do it faster, cheaper, with fewer errors and in more languages than any human journalist ever could.

This technology provides an opportunity to make money creating content for very small audiences – even, perhaps, customized news feeds for an audience of just one person. And when it works well, readers perceive the quality of automated news as on par with news written by human journalists.

As a researcher and creator of automated journalism, I’ve found that computerized news reporting can offer key strengths. I’ve also identified important weaknesses that highlight the importance of humans in journalism.


Identifying automation’s abilities

In January 2016, I published the “Guide to Automated Journalism,” which reviewed the state of the technology at the time. It also raised key questions for future research, and discussed potential implications for journalists, news consumers, media outlets and society at large. I found that, despite its potential, automated journalism is still in an early phase.

Right now, automated journalism systems are serving specialized audiences, large and small, with very particular information, producing recaps of lower-league sports events, financial news, crime reports and earthquake alerts. The technology is constrained to these types of tasks because there are limits to what sorts of information it can take in and process into text that humans can easily read and understand.

It works best when handling accurate, structured data, such as stock prices. In addition, algorithms can only describe what happened, not why, which makes the technology best suited to routine stories based solely on facts that leave little room for uncertainty and interpretation, such as when and where an earthquake happened.

And because the major benefit of computerized reporting is that it can do repetitive work quickly and easily, it is best used to cover repetitive topics that require producing a large number of similar stories, such as sporting event reports.
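
As a rough illustration of why structured, repetitive facts suit automation, here is a minimal sketch of template-based text generation. It is not how any particular newsroom system works; the `Quake` record, its field names, the template wording and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Quake:
    """One structured record; the field names are hypothetical."""
    magnitude: float
    place: str
    date: str
    time_utc: str


# A fixed sentence with slots for the facts. It states what happened,
# and has no way of saying why.
TEMPLATE = (
    "A magnitude {magnitude:.1f} earthquake struck {place} "
    "on {date} at {time_utc} UTC."
)


def write_alert(quake: Quake) -> str:
    """Fill the fixed template with the facts from one record."""
    return TEMPLATE.format(
        magnitude=quake.magnitude,
        place=quake.place,
        date=quake.date,
        time_utc=quake.time_utc,
    )


# Made-up records for illustration only.
quakes = [
    Quake(4.2, "near Exampleville", "1 January", "06:15"),
    Quake(5.0, "off the coast of Sampleton", "2 January", "21:40"),
]
alerts = [write_alert(q) for q in quakes]
for alert in alerts:
    print(alert)
```

The same function produces one story or ten thousand at essentially the same cost, which is the appeal for repetitive coverage. Everything it prints depends on the record being correct.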


Covering elections

Another useful area for automated news reporting is election coverage – specifically regarding results of the numerous polls that come out almost daily during major campaigns. In late 2016, I teamed up with fellow researchers and the German company AX Semantics to develop automated news based on forecasts for that year’s U.S. presidential election.

The forecasting data were provided by the PollyVote research project, which also hosted the platform for publishing the resulting texts. We established a completely automated process, from collecting and aggregating the raw forecasting data, to exchanging the data with AX Semantics and generating the texts, to publishing those texts.

Over the course of the election season, we published nearly 22,000 automated news articles in English and German. Because they came from a fully automated process, the final texts often had errors, such as typos or missing words. We also had to spend much more time than we had expected troubleshooting problems. Most of the issues came from errors in the source data, rather than the algorithm – highlighting another key challenge of automated journalism.
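
Because most problems traced back to the source data, one plausible safeguard is to check each record before any text is generated from it. The sketch below is a hypothetical illustration, not the process we actually ran; the field names (`candidate`, `forecaster`, `vote_share`), the checks and the sample records are assumptions.

```python
def find_problems(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record looks usable."""
    problems = []
    # Every field the text template would need must be present and non-empty.
    for field in ("candidate", "forecaster", "vote_share"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # A vote share is a percentage, so it must lie between 0 and 100.
    share = record.get("vote_share")
    if isinstance(share, (int, float)) and not 0 <= share <= 100:
        problems.append(f"vote_share out of range: {share}")
    return problems


# Made-up records: one clean, one with an empty name, one with a misplaced decimal.
records = [
    {"candidate": "Candidate A", "forecaster": "Example Poll", "vote_share": 52.3},
    {"candidate": "", "forecaster": "Example Poll", "vote_share": 47.7},
    {"candidate": "Candidate B", "forecaster": "Example Poll", "vote_share": 477.0},
]

usable = [r for r in records if not find_problems(r)]
held_back = [(r, find_problems(r)) for r in records if find_problems(r)]

print(f"{len(usable)} record(s) ready for text generation")
for record, problems in held_back:
    print("held back:", "; ".join(problems))
```

Checks like these catch only errors that can be described in advance. A plausible but wrong number would pass straight through, which is one reason a person still needs to look at the output.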


Finding the limits

The process of developing our own text-generating algorithms taught us firsthand about the potential and limits of automated journalism. It’s crucial to make sure the data is as accurate as possible. And it is easy to automate the process of creating text from a single set of facts, such as the results of a single poll. But adding insights, like comparing that poll to others in the past, is much harder.

Perhaps the most important lesson we learned was how quickly we reached the limits of automation. When developing the rules governing how the algorithm would turn data into text, we had to make decisions that might seem easy for people to make – such as whether a candidate’s lead should be described as “large” or “small,” and what signals could suggest a candidate had momentum in the polls.

Those sorts of subjective decisions are very hard to formulate into predefined rules that should apply to any situation that has occurred historically – much less to any situation that might occur in future data. One reason is that context matters: A four-point lead for Clinton in the run-up to the election, for example, was normal, whereas a four-point lead for Trump would have been big news. The ability to understand that difference and interpret the numbers accordingly is crucial for readers. It remains a barrier that algorithms will have a hard time overcoming.
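
A small sketch can make the difficulty concrete. The two rules below are illustrative assumptions, not the rules used in our project: the first labels a lead by size alone, the second compares it with what the same candidate has usually polled. The cutoffs (5 and 3 points) and the "usual" values are invented for the example.

```python
def describe_fixed(lead: float) -> str:
    """Context-free rule; the 5-point cutoff is an arbitrary assumption."""
    return "large" if abs(lead) >= 5 else "small"


def describe_in_context(lead: float, usual_lead: float) -> str:
    """Compare a lead with what the same candidate has usually polled.

    The 3-point tolerance is another arbitrary assumption.
    """
    if abs(lead - usual_lead) <= 3:
        return "in line with recent polls"
    return "a notable shift from recent polls"


# Leads are given from each candidate's own point of view. The "usual"
# values are invented for illustration, not taken from real polling data.
clinton_fixed = describe_fixed(4)
trump_fixed = describe_fixed(4)
clinton_context = describe_in_context(lead=4, usual_lead=4)
trump_context = describe_in_context(lead=4, usual_lead=-4)

print("Fixed rule:  ", clinton_fixed, "/", trump_fixed)
print("With context:", clinton_context, "/", trump_context)
```

The fixed rule gives both four-point leads the same label. The second rule separates them, but only because a person decided what counts as "usual" and how large a change is worth reporting, and those choices may not hold for the next election.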

But human journalists will have a hard time outcompeting automation when covering routine and repetitive fact-based stories that merely require a conversion of raw data into standard writing, such as sports recaps or company earnings reports. Algorithms will be faster at identifying anomalies in the data and generating at least first drafts of many stories.

All is not lost for the people, though. Journalists have plenty of opportunities to take on tasks algorithms cannot perform, like putting those numbers in proper context – as well as providing in-depth analyses, behind-the-scenes reporting and interviews with key people. The two types of coverage will likely become closely integrated, with computers using their strengths and the humans focusing on ours.



Andreas Graefe is Endowed Sky Research Professor, Macromedia University of Applied Sciences.


Disclosure statement

Andreas Graefe received funding from the Tow Center for Digital Journalism, Columbia Journalism School, and the Volkswagen Foundation for his work on automated journalism. He will also receive funding from the Google Digital News Initiative to continue this work in the context of automated election coverage.

