
Can Forensic Linguistics Pin Down the Author of a Trump Tweet?

Using data to assess whether the president's lawyer really wrote a message that renewed questions about obstruction of justice.

December 8, 2017

By Ben Zimmer

The Atlantic

On Saturday, a single tweet from Donald Trump—or at least one from the @realDonaldTrump Twitter account—seemed to turn everyone into an amateur forensic linguist, sifting for textual clues. Forty words (and an exclamation point) were enough to set off a frenzy of analysis: “I had to fire General Flynn because he lied to the Vice President and the FBI. He has pled guilty to those lies. It is a shame because his actions during the transition were lawful. There was nothing to hide!”

Trump’s personal lawyer, John Dowd, took the hit for a message that, some legal analysts argued, provided evidence that the president had obstructed justice. Dowd told CNN, Axios, and ABC News that he wrote those 40 ill-advised words. A corroborating source described the phrasing to ABC News as “sloppy.” The implication that Trump knew Flynn lied to the FBI may land the president in legal hot water with the Mueller investigation, regardless of whether Dowd really wrote the tweet or was just ineptly covering for his client.

But real forensic linguists offer a more complicated picture than internet sleuths. In an earlier piece, I suggested that a single tweet was not enough evidence to dust for linguistic fingerprints. And a single word couldn’t be a dead giveaway either, no matter how much people would like to portray the use of pled rather than pleaded as an obvious Trumpian solecism, especially when Dowd himself has been documented using pled at least once.

All the experts I’ve talked to since the tweet was published agree that there is no way to rule definitively whether it was written by Trump or Dowd—or by Trump and Dowd together, or by someone else entirely. But it’s still possible to analyze the tweet according to its linguistic features.

Two scholars at the University of Birmingham, Jack Grieve and Isobelle Clarke, have been conducting some linguistically informed data analysis of Trump’s Twitter style. Using methods previously applied to a study of trolling on Twitter, Grieve and Clarke have mined the complete corpus of tweets from @realDonaldTrump, helpfully collected in Brendan Brown’s Trump Twitter Archive, in order to discern stylistic and syntactic patterns. While their research is still in the preliminary stages, the latest controversy over the Flynn tweet was too good an opportunity to pass up.

Grieve and Clarke refrained from speculating about authorship, noting that this would require first determining which @realDonaldTrump tweets are actually written by the president, and then comparing those confirmed Trump tweets to a similar collection of texts known to be written by Dowd. That’s the type of approach that allows forensic linguists to determine, for instance, whether a famous letter purported to be written by Abraham Lincoln might actually have been written by his secretary, John Hay.

In the absence of such clear-cut authorship data, Grieve and Clarke looked at 65 linguistic features to see how typical or atypical the Flynn tweet (or as they call it, the “pled tweet”) is compared to the overall oeuvre of @realDonaldTrump. They found a number of features that are highly atypical for that Twitter account.

While many observers have focused on the word pled, Grieve and Clarke find it more noteworthy that it is preceded by the auxiliary verb has. This type of “perfect” aspect in the verb construction is actually rather rare in the @realDonaldTrump archive, occurring only 7 percent of the time.

The tweet displays even rarer linguistic features, all appearing in less than 5 percent of the overall corpus of 21,320 tweets. (Retweets have been filtered out.) “Specifically, ‘had to’ occurs in only 30 tweets, ‘those’ as a determiner occurs in only 43 tweets, existential ‘there’ occurs in only 262 tweets, and ‘because,’ which is used twice in the questioned tweet, occurs in only 280 tweets.” That @realDonaldTrump uses because so infrequently may reflect how seldom Trump, and those who speak on his behalf, explain their reasoning.
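To put those counts in proportion, here is a minimal sketch that converts each reported tweet count into a share of the 21,320-tweet corpus. The counts and corpus size come from the figures quoted above; the function and variable names are illustrative assumptions, not anything taken from Grieve and Clarke's own tooling.

```python
# Tweet counts reported in the article for the @realDonaldTrump corpus
# (retweets excluded). Names below are illustrative, not the researchers' code.
CORPUS_SIZE = 21_320

FEATURE_TWEET_COUNTS = {
    "had to": 30,
    "those (determiner)": 43,
    "existential there": 262,
    "because": 280,
}


def share_of_corpus(tweet_count, corpus_size=CORPUS_SIZE):
    """Return the percentage of tweets in the corpus that contain a feature."""
    return 100.0 * tweet_count / corpus_size


shares = {name: share_of_corpus(n) for name, n in FEATURE_TWEET_COUNTS.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of tweets")
```

By this arithmetic, each of the four features turns up in well under 2 percent of the account's tweets, comfortably below the 5 percent threshold.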

Grieve and Clarke build up from these basic linguistic features to look at “dimensions of stylistic variation” in the @realDonaldTrump collection. This kind of analysis allows for a deep dive into what the tweets might be achieving pragmatically, not just how they are structured. So, for instance, the absence or presence of certain features is associated with dimensions they call opinion (“overt claims related to judgment and beliefs of the poster”), prediction (“claims about plans, upcoming events, and the future state of the world”), and critique (“criticism of various people and policies”). The Flynn tweet has an unusually high score in the opinion and critique dimensions and a low one in the prediction dimension.

But even if the tweet is, according to this analysis, substantially non-Trumpian in various aspects of its style, we still cannot necessarily conclude that Dowd was therefore the real author. It could still come from Trump, but its language may appear unusual simply because its rhetorical purpose is unusual compared to his account’s tweets overall. Another major confounding factor is that the @realDonaldTrump account is not always controlled by the real Donald Trump, though some analysts have tried to find clues to determine when it is.

In August 2016, with the presidential campaign raging, David Robinson, a data scientist at Stack Overflow, published a text analysis of @realDonaldTrump, distinguishing tweets that Trump actually wrote from those composed by his staff. At the time, Trump was using his Samsung Galaxy smartphone for Twitter, but other presumably staff-written tweets were identified by their metadata as coming from an iPhone rather than an Android device. Robinson compared the Android and iPhone tweets and determined that they were stylistically quite different, with the Android tweets typically displaying more angry and negative language than the benign pronouncements from the staff iPhone.

Unfortunately for Twitterologists, Trump himself started using an iPhone in March 2017, so the metadata could no longer distinguish between the president’s preferred device and the one used by his staff. But the thousands of tweets collected during the campaign, when the Android/iPhone divide was stark, can now serve as training data to tell whether a new Twitter dispatch looks more like one of the old Android tweets presumed to be from Trump or one of his staff’s old iPhone tweets. Ben Sugerman, a senior scientist at the space and defense firm Areté Associates, created the site Did Trump Tweet It?, applying a machine-learning approach to tackle this question for every tweet emanating from @realDonaldTrump (and every one from the official @POTUS account for good measure).
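The general idea of using device-labeled tweets as training data can be sketched with a toy word-based classifier. This is only an illustration under loud assumptions: it uses a simple naive Bayes model, which is not necessarily what Did Trump Tweet It? uses, and the four training sentences are invented stand-ins, not real tweets. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data, NOT real tweets: invented stand-ins for
# Android-era (presumed Trump) and iPhone-era (presumed staff) messages.
TRAINING = [
    ("android", "crooked media is so dishonest and weak sad"),
    ("android", "the failing press is a total disaster so bad"),
    ("iphone", "thank you ohio join us tomorrow at the rally"),
    ("iphone", "join us tonight in florida thank you for the support"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label, documents per label, and the shared vocabulary."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for label, text in examples:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def log_scores(text, model):
    """Naive Bayes log score per label, with add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen in training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        scores[label] = score
    return scores


def probabilities(text, model):
    """Turn log scores into probabilities that sum to 1."""
    scores = log_scores(text, model)
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def classify(text, model):
    """Return the label with the highest score."""
    scores = log_scores(text, model)
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("the media is so dishonest", model))
print(probabilities("thank you join us at the rally", model))
```

A percentage like the ones the site reports would correspond to the output of something like `probabilities`, though a real system would train on thousands of tweets and likely draw on richer features than word counts.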

So what does Sugerman’s site say about the Flynn tweet? Contrary to the findings of Grieve and Clarke that the tweet is in many ways not typically Trumpian in style, Did Trump Tweet It? estimates the probability that Trump wrote those 40 words as 96.2 percent. While that seems like a big discrepancy, Sugerman acknowledges that there is no hard-and-fast “ground truth” for determining whether a particular tweet is from Trump or not. That the features Sugerman analyzes (which he details here) match the Flynn tweet closely to the tweets Trump seemed to pen during the campaign days is hardly conclusive.

For instance, if Dowd was indeed the primary author, was he trying to inhabit Trump’s voice to make it seem genuine—right down to that final exclamation point? That might explain the high score from Did Trump Tweet It?, but the analysis from Grieve and Clarke may be picking up on more subtle grammatical nuances, like the auxiliary verb has in has pled, or the because twofer. Those are examples of “function words” (including prepositions, pronouns, and conjunctions) that usually fly below the radar of our linguistic consciousness, as the University of Texas sociologist James W. Pennebaker argues in his book The Secret Life of Pronouns. It’s easy enough to add a Trumpian exclamation point at the end of a tweet, but there are other aspects of his style that might be more difficult to mimic.
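To see how much of the tweet is made up of such words, here is a rough sketch that counts a handful of function words in the 40-word text. The word list is a small illustrative assumption (a few pronouns, prepositions, conjunctions, articles, and auxiliaries), not Pennebaker's inventory or the 65 features that Grieve and Clarke examined.

```python
import re
from collections import Counter

# The text of the questioned tweet, as quoted in the article.
TWEET = (
    "I had to fire General Flynn because he lied to the Vice President "
    "and the FBI. He has pled guilty to those lies. It is a shame because "
    "his actions during the transition were lawful. There was nothing to hide!"
)

# A small, illustrative list of function words (an assumption for this sketch).
FUNCTION_WORDS = {
    "i", "he", "his", "it", "those", "there",   # pronouns and related forms
    "to", "during",                              # prepositions / infinitive "to"
    "and", "because",                            # conjunctions
    "a", "the",                                  # articles
    "had", "has", "is", "was", "were",           # auxiliaries and copulas
}

tokens = re.findall(r"[a-z']+", TWEET.lower())
function_counts = Counter(t for t in tokens if t in FUNCTION_WORDS)

print(f"{len(tokens)} words in total")
for word, count in function_counts.most_common():
    print(f"{word}: {count}")
```

Even this crude tally picks up the doubled because alongside the single has and those, the kind of small words a mimic is unlikely to think about.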

If nothing else, the conflicting analyses of Trump’s tweet offer a window into how forensic linguists tackle these issues, and how the complexity of communication so often makes it difficult to find tidy solutions to linguistic whodunits.
