Paola Mata will introduce us to the natural language processing APIs, an underutilized but powerful set of APIs that have been updated for iOS 11, and explore the possibilities of harnessing their power to improve the user experience in apps.
Introduction
In this talk, I will cover Natural Language Processing. This year I had the pleasure of attending WWDC, and I was able to attend a session on natural language processing. Going into it, I knew little about the topic, except that it had to do with machine learning and linguistics.
What exactly is Natural Language Processing (NLP)? It’s a field in computer science, artificial intelligence, and computational linguistics. It’s concerned with the interactions between computers and humans using “natural language” (i.e. human language).
Core ML Framework
One of the most exciting announcements at WWDC was the new Core ML framework. What’s really cool about Core ML is that it allows us to incorporate machine learning into our apps without having much prior knowledge of how it works.
You can get up and running really quickly. The Core ML framework also supports several domain-specific frameworks. One of these is Vision, which allows us to add high-performance image analysis and computer vision to our apps. Another is GameplayKit, which allows us to architect and organize the game logic in gaming apps. Last is Foundation, which deals with natural language processing.
The NLP APIs are already being used, and have been for a while, in some of the apps that we already know and love - most notably, Siri!
Who should use the APIs?
Anyone whose app handles natural language as the input or output can use the APIs. For example, it can be useful if your app consumes a feed from an API, or maybe your users generate content within your app in the form of typed text, recognized handwriting, or transcribed speech.
Once we have raw text, we want to convert it into useful information that we can then use to improve the experience between our user and the device, or between two devices.
Let’s try to understand what we mean by useful information. To do that, we have to look at the fundamental building blocks of natural language processing. This starts with the concept of Tokenization, segmenting text into a specified unit, which can be a paragraph, sentence, or a word. Tokenization allows us to accomplish other tasks including:
- Language recognition
- Part of speech identification - determining whether a particular word may be a noun or a verb, etc.
- Lemmatization, which is a fancy NLP word that essentially means getting the root form of a word. For example, in English, that would mean the word without any pluralization or verb tenses, or maybe removing the possessive form from a word. Spanish is a little bit more complicated because there are a ton of irregular verbs.
- Named entity recognition - identifying whether a word or a set of words corresponds to a person, an organization or a company, or maybe a location.
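As a quick illustration of the first building block, tokenization, here is a sketch (assuming an iOS 11 / macOS 10.13 deployment target) that asks the tagger to enumerate word-level tokens:

```swift
import Foundation

// Sketch: enumerate word-level tokens using the .tokenType scheme (iOS 11+).
let text = "Natural language processing is fun"
let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
tagger.string = text
let range = NSRange(location: 0, length: text.utf16.count)

tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType,
                     options: [.omitWhitespace, .omitPunctuation]) { _, tokenRange, _ in
    // Each callback covers one word-level token of the input string.
    print((text as NSString).substring(with: tokenRange))
}
```

The same enumeration, with a different unit, gives you sentence- or paragraph-level tokens instead.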
Code Examples
Overview
The NSLinguisticTagger class is a Foundation class, which means it's available across all platforms, and it helps us with much of the language processing. The class has been around since iOS 5.0.
What's New in NSLinguisticTagger:
- The concept of units, which allows us to tag a specific body of text at the level of a paragraph, a sentence, or individual words.
- The ability to check for available schemes, using the function availableTagSchemes.
- Additional language support for 52 other languages.
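Checking for available schemes is a one-liner; for example (a sketch, and the exact list returned depends on the OS version and language):

```swift
import Foundation

// Ask which tag schemes are available for word-level tagging in English (iOS 11+).
let schemes = NSLinguisticTagger.availableTagSchemes(for: .word, language: "en")
print(schemes) // e.g. may include .tokenType, .lexicalClass, .lemma, .nameType
```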
Language Identification
Now let's take a look at the code. I played around with language identification in a playground because it is a quick way to get up and running. First, we initialize our NSLinguisticTagger with the specific tagSchemes we're interested in. In this example, we're only interested in the .language tagScheme. Then we set the string on the tagger, like this:
//: Playground
import Foundation
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = "Fuimos al cine y despues a tomar un helado"
let language = tagger.dominantLanguage
The string is a sentence in Spanish. Then we just read the dominantLanguage property and get back "es", which corresponds to Español.
Tagging Tokens
Tagging tokens is similar but a little bit more complex:
// TaggedToken is a simple tuple of the token text, its tag, and its range.
typealias TaggedToken = (token: String, tag: NSLinguisticTag, range: NSRange)

private func tag(text: String, scheme: NSLinguisticTagScheme, unit: NSLinguisticTaggerUnit = .word) -> [TaggedToken] {
    let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemes(for: unit, language: "en"),
                                    options: 0)
    tagger.string = text
    let range = NSMakeRange(0, text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitOther]
    var taggedTokens: [TaggedToken] = []
    tagger.enumerateTags(in: range, unit: unit, scheme: scheme, options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        let token = (text as NSString).substring(with: tokenRange)
        taggedTokens.append((token, tag, tokenRange))
    }
    return taggedTokens
}
We initialize our linguistic tagger with all of the available tagSchemes for the unit, specifying the language, because we do not know which scheme might be passed into this function. For options, I want to ignore any whitespace and anything that might be identified as unknown, so I pass .omitWhitespace and .omitOther.
Then, we enumerate through each token, passing in our range, unit, scheme, and options. When the closure is called, it receives the arguments tag and tokenRange.
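To make that concrete, here is a hypothetical call site for the helper above (the example sentence is mine), asking for the lexical class - noun, verb, and so on - of each word:

```swift
// Sketch: part-of-speech tagging via the tag(text:scheme:unit:) helper above.
let tokens = tag(text: "The quick brown fox jumps", scheme: .lexicalClass)
for (token, tag, _) in tokens {
    // tag is an NSLinguisticTag such as .noun, .verb, or .determiner.
    print("\(token): \(tag.rawValue)")
}
```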
Named Entity Recognition
Named entity recognition is not very different:
func tagNames(text: String) -> [TaggedToken] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text
    let range = NSMakeRange(0, text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
    var taggedTokens: [TaggedToken] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
        // Make sure that the tag that was found is in the list of tags that we care about.
        guard let tag = tag, tags.contains(tag) else { return }
        let token = (text as NSString).substring(with: tokenRange)
        taggedTokens.append((token, tag, tokenRange))
    }
    return taggedTokens
}
In this case, we know which tagScheme we want to use, so we can specify it when we initialize the tagger. Again, we pass in the string and the range.
A difference here is that under options, we're including .joinNames. So in a text that contains, for example, personal names, the first and last name will be joined into a single token. A city name with multiple words, such as New York, will also get joined into one token.
We're looking for three specific tags, so I use an array that contains .personalName, .placeName, and .organizationName. Any tag found is checked against that array to make sure it's one that is relevant to my needs. As before, an array of my taggedTokens is returned.
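A hypothetical call site (the example sentence is mine) might look like this; with .joinNames set, multi-word names should come back as single tokens:

```swift
// Sketch: named entity recognition via the tagNames(text:) function above.
let names = tagNames(text: "Tim Cook introduced Core ML at WWDC in San Jose")
for (token, tag, _) in names {
    // e.g. "Tim Cook" as a single .personalName token,
    // "San Jose" as a single .placeName token.
    print("\(token): \(tag.rawValue)")
}
```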
Demo App
I came up with a sample app to show several examples of this in action. The first one is a fantastic article from BuzzFeed about what happened in that elevator with Jay-Z and Solange. If you haven’t read this article, I recommend it.
The app scans the text and analyzes it for parts of speech. In the middle of the screenshot, I’m looking at verbs.
Next, we’ll look at lemmatization. In this example, I have text from a random article about why you should never trust the Rotten Tomatoes movie review site.
Here, I enter a search term, and that term is lemmatized, so it will be matched with all of the text that I input in my text view, in any form. For example, type in “movie”, and the app highlights all occurrences of “movie” in the text, including the plural form, “movies”, and the possessive form, “movie’s”.
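Under the hood, the lemmatized search behaves roughly like this; a sketch using the .lemma tagScheme (the helper name is mine, not from the demo app):

```swift
import Foundation

// Sketch: return the lemma (root form) of each word in the text (iOS 11+).
func lemmas(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let range = NSMakeRange(0, text.utf16.count)
    var results: [String] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, _, _ in
        // For the .lemma scheme, the tag's raw value is the root form itself.
        if let lemma = tag?.rawValue {
            results.append(lemma)
        }
    }
    return results
}

// Matching a search term then reduces to comparing lemmas:
// lemmas(in: "movies") and lemmas(in: "movie") should both yield ["movie"].
```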
For anyone interested, my sample projects are on GitHub.
Benefits of Natural Language Processing APIs
- Apple uses the same APIs, which makes the user experience consistent.
- The actual processing is completed on the device, which ensures user privacy.
- It’s performant.
Problems
Limits to Named Entity Recognition
Named entity recognition is limited. I'm in tech, so I decided to analyze an article that mentioned a lot of tech companies, including Twitter, Facebook, and BuzzFeed. For some reason, those companies were not recognized as organizations! However, others, like Squarespace and maybe Shutterstock and Tumblr, were.
Language Identification
Language identification within a single sentence does not work the best. I tried passing in some "Spanglish" to experiment, and changing one word in a sentence would affect how the other words were analyzed.
Other Projects
If you want more information about the NLP APIs, here are a few links for following up on them.
A WWDC Session had two really good examples of how to incorporate the APIs into hypothetical apps - Winnow and Whisk.
Ayaka Nonaka had a really great talk at Realm a couple of years ago on natural language processing. She uses the APIs to train a model to identify spam. It’s a couple of years old, so it’s not using the Swift 4 APIs, but it’s pretty similar and you can follow along.
Martin Mitrevski has a great recent blog post where he uses a simple algorithm along with the NLP APIs to find key terms in some of his blog posts.
About the content
This talk was delivered live in September 2017 at try! Swift NYC. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.