How would you test Facebook’s Artificial Intelligence Enforcement against Hate Speech?

This is a real example from the past few days. The Washington Post reported:
At first glance, the Vindicator’s Facebook promotion did not seem designed to make waves.
The small newspaper, based out of Liberty, a Texas town of 75,000 outside of Houston, planned to post the Declaration of Independence on Facebook in 12 daily installments leading up to the Fourth of July — 242 years since the document was adopted at the Second Continental Congress in 1776.
But on the 10th day, the Vindicator’s latest installment was removed by Facebook. The company told the newspaper that the particular passage, which included the phrase “merciless Indian Savages,” went against its “standards on hate speech,”…
In the piece it wrote about the ordeal, the Vindicator’s managing editor, Casey Stinnett, said the newspaper believed the post had been flagged by an automated process. The passage that Facebook blocked, paragraphs 27-31, speaks unsparingly of England’s King George III as part of a list of dozens of complaints about the king that follows the text’s much-repeated opening lines.


The issue was resolved. The Vindicator reported:
Earlier this evening, July 3, the good folks at Facebook restored the post that is the subject of this article. An email from Facebook came in a little after The Vindicator’s office closed today and says the following:
“It looks like we made a mistake and removed something you posted on Facebook that didn’t go against our Community Standards. We want to apologize and let you know that we’ve restored your content and removed any blocks on your account related to this incorrect action.”
The Vindicator extends its thanks to Facebook. We never doubted Facebook would fix it, but neither did we doubt the usefulness of our fussing about it a little.

And here is what Facebook said about their enforcement in April:
Our policies are only as good as the strength and accuracy of our enforcement – and our enforcement isn’t perfect.
One challenge is identifying potential violations of our standards so that we can review them. Technology can help here. We use a combination of artificial intelligence and reports from people to identify posts, pictures or other content that likely violates our Community Standards. These reports are reviewed by our Community Operations team, who work 24/7 in over 40 languages. Right now, we have more than 7,500 content reviewers, over 40% more than the number at this time last year.
Another challenge is accurately applying our policies to the content that has been flagged to us. In some cases, we make mistakes because our policies are not sufficiently clear to our content reviewers; when that’s the case, we work to fill those gaps. More often than not, however, we make mistakes because our processes involve people, and people are fallible.

Facebook’s enforcement is an immense challenge from which I doubt fallible people can ever be removed. Nevertheless, the question remains: How would you test to avoid this false positive?
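One concrete answer to that testing question is a regression corpus: a curated set of documents that must never be flagged (historical texts, satire, quoted speech), run against every candidate model before release. Here is a minimal sketch of that idea; `naive_classifier`, `false_positives`, and the corpus entries are my own illustrative assumptions, not Facebook's actual system.

```python
def naive_classifier(text: str) -> bool:
    """Toy stand-in for a moderation model: flags any text containing
    a blocklisted phrase, with no sense of context or provenance."""
    blocklist = {"merciless indian savages"}
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocklist)

# Regression corpus: (document name, passage that must stay up).
HISTORICAL_CORPUS = [
    ("Declaration of Independence (grievances)",
     "...merciless Indian Savages, whose known rule of warfare is an "
     "undistinguished destruction of all ages, sexes and conditions."),
    ("Gettysburg Address (opening)",
     "Four score and seven years ago our fathers brought forth..."),
]

def false_positives(classifier, corpus):
    """Return the documents a classifier would wrongly take down."""
    return [name for name, passage in corpus if classifier(passage)]

flagged = false_positives(naive_classifier, HISTORICAL_CORPUS)
print(flagged)  # the naive model fails this regression check
```

A suite like this would have caught the Vindicator incident before deployment, though it only guards against false positives you have already thought to collect, which is exactly the limitation the replies below get at.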

Is this really a “false” positive? Whether it’s a historical document or not (I imagine we could find all manner of diatribes from Nazi Germany about Jews and such that are official historical record…) the phrase “merciless Indian Savages” tags correctly as hate speech in my book. If you’re going to post historical records and want them preserved as-is, I’d say this calls for a feature to petition FB (or whoever) to have a human review it, make a determination, and then post it with a disclaimer.

If I’d been involved in testing this feature, I’d have told my leads/managers that it would never work without a lot of human intervention, which would defeat the purpose.

Here’s why:

  • What’s considered hate speech changes from country to country, from person to person, and from time to time. The acceptable language of the 1860s is not the acceptable language of the 1900s, which is not the acceptable language of the 1940s, the 1980s, the 2000s, and so on. A learning algorithm such as the one FB is using will progressively eliminate more and more terms and phrases from the permitted lexicon until it becomes impossible to have any kind of meaningful discussion.
  • I’ve yet to find an algorithm that can handle satire, sarcasm, or irony. One common practice in Australian culture is to deliberately call friends insulting nicknames, many of them classifiable as hate speech (although I have to admit that the college acquaintance I remember only as “Gonad” may have earned that nickname as a compliment; I just never heard him called anything else). At least one wickedly satirical piece about Australia, written as a supposed “preamble to the Constitution,” is full of what looks like hate speech when it’s actually poking fun at everyone involved. Brits and the Irish often do the same kind of thing (Swift’s “A Modest Proposal” comes to mind).
  • By blocking what appears to be hate speech, FB would also block discussion of said speech, and ultimately of the attitudes behind that kind of language - which would have the effect of entrenching those attitudes. People do not often change their minds without being placed in a situation where they are confronted with evidence that they could be wrong - and even then it can take a long time for the cognitive dissonance to push people into reconsidering. All dictatorial regimes make specific topics unmentionable. Calling the unmentionable topics hate speech makes people feel good about helping to enforce the decrees (yes, I’m getting rather meta here).
  • Back to the original topic: “merciless Indian Savages” was, at the time, the accepted viewpoint of the people writing the document. By their standards it was objectively true. The indigenous peoples of the Americas at the time were mostly hunter-gatherers or mixed hunter-gatherer and farming communities. Their beliefs and justice systems were, to the European culture, barbaric, savage, and merciless. What they thought of the European immigrants was probably equally insulting.
  • The point above leads to an interesting problem: is it hate speech if it’s true? There are a number of factual statements that are absolutely insulting to members of certain groups - does this make stating those facts hate speech? Or should a duty of fact override a duty of not offending? I can’t give an answer to that question, although I personally would lean towards facts over not offending.

I apologize for what’s turned into a lengthy rant. My view is that enforcing any kind of hate speech ban is counterproductive at best, and the rather ranty list above gives some of the reasons why. If I had been testing at FB when this was introduced, I probably would have said as much.
