The news says machine learning is the Next Big Thing. But machine learning is happening way over there, on servers, in universities and in big companies with big data. What is it really, and what does it mean for over here, on mobile, in Swift? Are we –gulp– legacy tech? This talk will present a fast, concrete, down-to-earth survey of machine learning, from the perspective of iOS & Swift, summarizing the main techniques, tools, and learning resources. It’ll outline how TensorFlow is like AVFoundation, how model training is like UI design, and how you can use iOS to gather big (enough) data and to exercise modern models using fast native code.
I’m Alexis Gallagher, CTO of Topology Eyewear, and today I want to talk about machine learning from the point of view of a Swift or iOS developer who may not have encountered it before.
At Topology Eyewear, we make customized glasses. We take a video of your face and from that video, we use computer vision to build a detailed model. From that, we’re able to present an augmented reality view of what different kinds of glasses would look like on your face. You can choose the color, the design, and the size.
With all that information, we take Italian acetate plastic, which is what’s used for premium glasses elsewhere, and run it through a manufacturing robot to make a pair of glasses.
We use computer vision code to do this; in fact, most of computer vision now involves machine learning, so in a way, I’ve been working at a machine learning company without knowing it.
What is Machine Learning?
Most of the programming we do involves human thought. When you’re defining a function, it requires human analysis and thinking; and that gets turned into a function. Machine learning entails defining a function, not by human analysis alone, but with data as well.
As a simple example, suppose I wanted to make a function that would predict how tall a person is. I could measure the height of everyone in this room. Everyone in this room would be called the training set. I could then take the average of all the heights that I’ve measured; that’s called training the model. Then I’m going to write a function that does nothing but return that average. Calling that function is called executing the model.
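A minimal sketch of this averaging model in Python, the language recommended later in this talk for building models. The heights are made-up numbers for illustration, and the function names are hypothetical.

```python
# Hypothetical heights in centimetres, standing in for measurements of the room.
measured_heights_cm = [158.0, 165.0, 171.0, 176.0, 180.0]


def train(heights):
    """Fitting this model is nothing more than averaging the data."""
    return sum(heights) / len(heights)


def make_model(average):
    """Return a prediction function that ignores its input entirely."""
    def predict_height(person=None):
        return average
    return predict_height


average_height = train(measured_heights_cm)
predict_height = make_model(average_height)

# Calling the function is what the talk refers to as executing the model.
print(predict_height())
```

The prediction is the same for every person, which is exactly why this model is so weak: it has learned one number from the data and nothing else.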
We can improve upon this model, for example, by measuring the heights of the women and the men in the room separately, so that the function takes the gender of a person into account. This would be a slightly better machine learning model.
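As a rough sketch of that refinement, the model can store one average per group instead of a single number. The data below is invented for illustration, and the "F"/"M" labels and function names are assumptions of this example.

```python
from collections import defaultdict

# Hypothetical (gender, height in cm) pairs; not real measurements.
training_data = [("F", 160.0), ("F", 166.0), ("M", 174.0), ("M", 182.0)]


def train_by_group(samples):
    """Compute one average height per group label."""
    grouped = defaultdict(list)
    for group, height in samples:
        grouped[group].append(height)
    return {group: sum(hs) / len(hs) for group, hs in grouped.items()}


group_means = train_by_group(training_data)


def predict_height(gender):
    """Look up the average for the person's group."""
    return group_means[gender]


print(predict_height("F"), predict_height("M"))
```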
I could also measure everyone’s age, weight, and shoe size, and I could find a set of coefficients to multiply against all of those things that would better predict height. Still probably not the best model in the world, but it would start to get better. If you have big shoes, you’re probably taller.
What I’m describing is a linear regression. Linear regressions are not new: the idea of regression goes back to Francis Galton, who developed it while studying sweet peas in the 19th century.
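A minimal sketch of a linear regression, simplified to a single input (shoe size) so the least-squares formula fits in a few lines. The talk describes several inputs at once; that version needs one coefficient per input. The numbers are invented for illustration.

```python
# Hypothetical (shoe size, height in cm) data; not real measurements.
shoe_sizes = [38.0, 40.0, 42.0, 44.0, 46.0]
heights_cm = [160.0, 167.0, 175.0, 181.0, 189.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(shoe_sizes, heights_cm)


def predict_height(shoe_size):
    """Apply the learned coefficients to a new input."""
    return slope * shoe_size + intercept


print(round(slope, 2), round(intercept, 2), round(predict_height(43.0), 1))
```

Here the slope is the learned coefficient: it says how many centimetres of height the model adds per shoe size.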
Why the Hype?
If all of this is not new, why is there all the excitement and conversation about machine learning now?
Today, there is more data in general because of smartphones and the internet. We also have a lot more computing power, so we’re able to apply the techniques we already had to much more data.
Because of the increased amount of data, some of the models we’re building are working in surprising ways. A classic example of this is neural networks. Suppose we have a simple neural network made of cells that represent imaginary neurons: three input cells, three in the middle, and two output cells, giving us a total of nine variables.
You put some values in at the beginning, and you get values out at the end. You can think of the three layers as working like a function composition. Now, what if instead of having three input variables, we look at every color channel for every pixel of an image, and instead of having three layers, we have 48 layers?
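The small network described above can be sketched as two functions composed together: one for the middle layer and one for the output layer. The weights here are arbitrary, untrained numbers chosen for illustration, and the sigmoid activation is an assumption of this sketch rather than something the talk specifies.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(weights, biases):
    """Build a function mapping an input vector to an output vector.

    Each row of `weights` holds the incoming weights of one cell.
    """
    def apply(inputs):
        return [
            sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)
        ]
    return apply


# Arbitrary, untrained values: three inputs -> three middle cells -> two outputs.
middle = layer(
    [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.6]],
    [0.0, 0.1, -0.1],
)
output = layer(
    [[0.7, -0.5, 0.2], [-0.6, 0.4, 0.9]],
    [0.05, -0.05],
)


def network(inputs):
    """The whole network is just output(middle(inputs))."""
    return output(middle(inputs))


result = network([1.0, 0.5, -1.0])
print(result)
```

Scaling this up means longer input vectors and more composed layers; training means finding weight values that make the outputs useful.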
If we train this on thousands of different images, we end up with a network like this:
This is a diagram of the structure of the Inception v3 neural network, which is one of the best networks for classifying an image. You put an image in at the beginning, and at the end you get a classification of what kind of image it is.
At the last WWDC, Apple provided sample code that you can use to run Inception on iOS devices. You can run it very efficiently with a GPU-optimized implementation. For example, I could take a picture of a bottle, and it will tell me it’s a water bottle.
It is also guessing it might be an oxygen mask with a 1% chance. I took the picture on an airplane, and I think it spotted other things that look like they’re in an airplane, which happens to contain oxygen masks.
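Percentages like these typically come from a softmax step at the end of a classifier, which turns raw scores into probabilities that sum to one. The sketch below shows that step on its own; the labels and scores are invented for illustration and are not output from the real network.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp()
    exponentials = [math.exp(score - largest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical labels and raw scores for one photo.
labels = ["water bottle", "oxygen mask", "coffee mug"]
raw_scores = [6.0, 1.5, 0.5]

probabilities = softmax(raw_scores)

# Rank the guesses from most to least likely.
ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
for label, probability in ranked:
    print(f"{label}: {probability:.1%}")
```

The top guess dominates, but the unlikely labels still receive a small share, which is how a stray "oxygen mask" can show up at around one percent.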
Progression of Machine Learning
In 2014, if you wanted me to tell you whether there was a bird in a picture, I would have needed a research team to do that. But now, you could probably build something like this in a few days by taking the Inception network that’s already been trained and retraining its last layers so that it picks out birds.
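The idea of keeping the trained layers and retraining only the last one can be sketched with a toy model. Everything here is a stand-in: `pretrained_features` is a hypothetical frozen function playing the role of the early Inception layers, the "images" are pairs of numbers, and the labels are invented. A real project would load the actual pretrained network through a library such as TensorFlow.

```python
import math


def pretrained_features(pixels):
    """Frozen 'early layers': these fixed weights are never updated."""
    x0, x1 = pixels
    return [math.tanh(x0 + x1 - 1.0), math.tanh(x0 - x1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_last_layer(examples, learning_rate=0.5, epochs=200):
    """Fit only the final layer (logistic regression) by gradient descent."""
    # The frozen layers never change, so their outputs are computed once.
    featurized = [(pretrained_features(pixels), label) for pixels, label in examples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in featurized:
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(score) - label
            grad_w = [g + error * f for g, f in zip(grad_w, features)]
            grad_b += error
        count = len(featurized)
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias


# Hypothetical two-number "images": label 1 means bird, 0 means no bird.
examples = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
weights, bias = train_last_layer(examples)


def contains_bird(pixels):
    features = pretrained_features(pixels)
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if sigmoid(score) > 0.5 else 0


print([contains_bird(pixels) for pixels, _ in examples])
```

Only two weights and a bias are learned here, which hints at why retraining the end of a large network takes days rather than a research team.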
Machine Learning and Swift
Why should a Swift developer care, and does this affect mobile application development?
Yes, in fact, there are systems already using it: predictive typing on the keyboard, spam detection, face detection, face recognition, and OCR. These are all earlier machine learning systems, and there will be more opportunities to automatically process documents and comprehend images.
The state of machine learning reminds me of Swift 1.0. When Swift 1.0 came out, it had features that were unusual in mainstream languages, such as Swift enums. Because so many people paid attention to Swift, and because it was so important, those features suddenly became mainstream.
Can you build your own model in Swift?
There are two phases to any machine learning system:
- Defining the model and training it
- Deploying it and using it.
If you’re building a model now, the reality is, you should be doing it in Python with TensorFlow. Python is used because it’s widely used in scientific computing, and TensorFlow, along with many other libraries, assumes you’ll be writing in Python.
In those systems, you can find predefined models, so you don’t need to start from scratch. You bring your own data and train the model for the problem that you’re trying to solve. The opportunity to use Swift comes when you deploy it.
For instance, you could take a TensorFlow model and bundle the TensorFlow library directly into your iOS app, or you could host the model on a server and have your app talk to it.
You could also use the technology Apple provides: the Accelerate framework and the Metal Performance Shaders framework. Both of these offer primitive building blocks for neural networks and other machine learning algorithms. Metal Performance Shaders is how you can get real-time inference from the Inception network on an iOS device.
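The split between the two phases can be sketched with a toy model: training produces a small set of numbers, those numbers are exported, and the deployed side only needs to load them and do arithmetic. The JSON layout and function names below are assumptions made for illustration; they are not a TensorFlow or Apple file format.

```python
import json


def train(xs, ys):
    """Phase 1: fit y = slope * x + intercept by least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def export_model(parameters):
    """Serialize the trained parameters so another program can load them."""
    return json.dumps(parameters)


def load_model(serialized):
    """Phase 2: rebuild a prediction function from exported parameters."""
    parameters = json.loads(serialized)

    def predict(x):
        return parameters["slope"] * x + parameters["intercept"]

    return predict


# Made-up training data for illustration.
serialized_model = export_model(train([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))

# On the deployment side, only the serialized parameters are needed.
predict = load_model(serialized_model)
print(predict(4.0))
```

The deployment half is plain arithmetic over loaded numbers, which is why it ports naturally to Swift and to fast native code on the device.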
What is it like to work on machine learning problems?
On a programming level, it’s surprisingly like programming with AVFoundation, or with CoreImage. These systems all have a deferred execution model where you’re defining an object called a session, or something like a filter, but nothing actually happens until you push the go button.
With TensorFlow, you’re wiring up very primitive operations, like adding one matrix to another or adding a convolutional layer. If you’re working with a higher level API for it, that’s sort of what it feels like.
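A toy sketch of that deferred execution style: wiring up the operations only describes the computation, and nothing is calculated until the session is asked to run it. The `Constant`, `Add`, and `Session` classes are hypothetical stand-ins written for this example, not TensorFlow's actual API.

```python
class Constant:
    """A graph node that holds a fixed matrix (a list of rows)."""

    def __init__(self, matrix):
        self.matrix = matrix

    def evaluate(self, executed_ops):
        return self.matrix


class Add:
    """A graph node that describes an element-wise matrix addition."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, executed_ops):
        left = self.left.evaluate(executed_ops)
        right = self.right.evaluate(executed_ops)
        executed_ops.append("add")  # record that real work happened
        return [
            [a + b for a, b in zip(row_left, row_right)]
            for row_left, row_right in zip(left, right)
        ]


class Session:
    """Runs a graph on demand and records which operations executed."""

    def __init__(self):
        self.executed_ops = []

    def run(self, node):
        return node.evaluate(self.executed_ops)


# Wiring up the graph: no arithmetic happens on these three lines.
a = Constant([[1, 2], [3, 4]])
b = Constant([[10, 20], [30, 40]])
total = Add(a, b)

session = Session()
ops_before_run = list(session.executed_ops)  # still empty at this point

# Pushing the go button.
result = session.run(total)
print(ops_before_run, result, session.executed_ops)
```

Separating description from execution is what lets a framework decide later where and how to run the work, for example on a GPU.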
Working on machine learning is also surprisingly similar to user interface design, in the sense that it’s imprecise. One way to think about this is in terms of unit tests. Imagine you had a code base that worked 95% of the time and failed 5% of the time, and you couldn’t really explain the failures, because the code was just too complicated to understand.
Although there’s a lot of mathematics that goes into defining models, and how they fundamentally work, their outputs are difficult to interpret, and so you manage them in the same ways you manage a softer process, like user interface design. You could have a design for user interface. You might have a hunch that this is the right way to do it, but you don’t actually know if it’s any good until you put it in front of users and test it.
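One practical consequence is that a test for a model tends to check an accuracy threshold over many examples instead of demanding that every case pass. A minimal sketch, with a hypothetical stand-in model and made-up evaluation data:

```python
def accuracy(model, examples):
    """Fraction of examples where the model's answer matches the label."""
    correct = sum(1 for inputs, expected in examples if model(inputs) == expected)
    return correct / len(examples)


def is_tall(height_cm):
    """Hypothetical stand-in model: a fixed threshold on height."""
    return height_cm > 175


# Made-up evaluation data: (height in cm, labeled as tall?).
# The label for 174 disagrees with the model on purpose.
held_out = [
    (150, False), (155, False), (158, False), (160, False), (162, False),
    (165, False), (168, False), (170, False), (172, False), (174, True),
    (176, True), (178, True), (180, True), (182, True), (184, True),
    (186, True), (188, True), (190, True), (192, True), (195, True),
]

REQUIRED_ACCURACY = 0.95  # the bar is "good enough", not "always right"

measured = accuracy(is_tall, held_out)
assert measured >= REQUIRED_ACCURACY, f"accuracy dropped to {measured:.0%}"
print(f"accuracy: {measured:.0%}")
```

The test passes even though one example fails, and the threshold itself is a judgment call, much like deciding when a design has tested well enough with users.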
Machine learning is a natural technology to deploy on iOS, because these models are good at dealing with messy input from the real world, such as an image of a person’s face. Everyone’s face is a little different, and you have to deal with different kinds of lighting and contours.
The world around us is not full of rigid, logical, geometric forms. It’s full of amorphous things, blobby things like human bodies. Because iOS devices are what people interact with directly, they are a good place to put machine learning to work.
Machine learning software is the software that’s able to take amorphous inputs and translate them into something that’s more rigid, a specific classification instead of a giant pile of pixels. This is how we’re using it at Topology Eyewear.
Where to Start?
- TensorFlow tutorials and summit videos
- The Stanford Coursera course, Introduction to Machine Learning
- Siraj Raval makes great videos on YouTube that show a lot of simple projects made with just a few lines of code.
About the content
This talk was delivered live in March 2017 at try! Swift Tokyo. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.