Slug brandon kase cover

Grokking Lazy Sequences & Collections

When we code, we love to abstract over groups of same-shaped things. In Swift, we do so with Sequences and Collections, and transform them with operators like map, reduce, and filter. Sometimes we don’t want do our transformations until we actually need them – for example, if the transformations are computationally expensive. In this talk, Brandon describes what makes lazy sequences and collections tick, how to define new operators for them, and when you should consider using lazy sequences and collections in your code.


Introduction (00:00)

I am Brandon Kase. I work at Highlight on the SAP shorts. I first became acquainted to lazy sequences and collections when I was working on trying to find a good solution to this problem we were having. I thought that the best way to explain it would be to go through that situation.

Imagine you have a bunch of photos in a stack. These are your photos or your user’s photos. Let’s say your user has 10,000 photos, there should be 10,000 photos in the stack. Imagine you are looking at your phone from the top down. You see only one picture at a time, but you can swipe through them fast. Think about how you would implement this.

We need two views, at least, because we see two different cards. It would be nice if we had all the data in one structure; that way our views can project out pieces of the data that we need. However, there are many photos (e.g. 10,000 photos on their phone), and loading a photo’s data is expensive. You have to poke at some database somewhere. It does iO. Let’s say you do not care about photos. Specifically, if you ever have many things and loading one of those things is expensive, when you load all those things, it is probably expensive.

Holding Things (02:11)

In the photos framework there is this nifty API call that gives you all of the photos on a device, and it does that fast. It only loads the metadata of a specific photo when you poke at it. But it is an Objective-C structure, and we like Swift. In order to do Swifty, we want to transform our collections, and wrap it. You might think of arrays.

let fetchresult = PHAsset.fetchAllAssets/*...*/()

This is something, the API call, it is not exactly that, but we can get all the assets, and we get this fetch result. There is this enumerate assets using block, and that block is called for every single piece of photo metadata. Then we can append it to an array.

let fetchresult = PHAsset.fetchAllAssets/*...*/()
var arr = [PHAsset]()
fetchresult.enumerateAssetsUsingBlock { obj, /*...*/
  arr.append(obj as! PHAsset)
}

But, if we run this: it is slow :snail:. The first line is fast but, when we enumerate over everything, we have a problem. We need some other way to hold stuff. We cannot use arrays, but we can use a collection.

Iterator (04:10)

A collection is an indexable sequence. A sequence is an iterator maker, and an iterator can produce one or more elements. This is Swift 3 (future proof). Eventually, we will maybe produce nil, and that means we are done.

protocol IteratorProtocol {
  associatedtype Element
  mutating func next() -> Element?
}

Let’s say we want to iterate over the even numbers. We are going to have some internal state. Our next function is going to take a snapshot of the state, increment it by two, and then return that old snapshot of state. It starts at zero, the next time we call it will be two (the next one four, and so on).

struct Evens: IteratorProtocol {
  var state: Int = 0

  mutating func next() -> Int? {
    let curr = state
    state += 2
    return .some(curr)
  }
}

We can make our iterator, and print the first five. We can loop five times and call next. We know it is not nil because we just wrote that code. We get zero, two, four, six, eight.

Get more development news like this

let s = Evens()
for _ in 1...5 {
  print(s.next()!) // 0, 2, 4, 6, 8
}

But… this is bad code. Our for loop should iterate over the things. We should not have to use an explicitly unwrapped optional. Sequences can fix this.

Sequence (05:47)

A sequence is an iterator maker. To make Evens sequence, we stick it at the end of the definition of the struct:

struct Evens: IteratorProtocol, Sequence

Now we can just do this, and it is awesome.

for x in Evens().prefix(5) {
  print(x) // 0, 2, 4, 6, 8
}

There is this method prefix on sequences. We take the first five, and we can just say for X in that and print that out. This is much better. However, a sequence had to be an iterator maker. We did not make an iterator, we were an iterator. Why does this work?

In the standard library, this is defined. On sequences, if the iterator for your sequence is yourself, and you yourself conform to the iterator protocol, then make iterator is simply returning yourself. Now we are good. We can conform to sequence.

extension Sequence where
    Iterator == Self,
    Self: IteratorProtocol {
  func makeIterator() -> Iterator {
    return self
  }
}

What else can you do with sequences? (06:58)

for x in someSequence {
  print(x)
}

You can iterate over things. You can also map, filter and reduce… and you do not have to write those yourself. You can take the first few elements and you can skip the first few elements (you can do many things, but perhaps not everything that you might want to do).

A sequence must be able to produce an iterator (but that is all the guarantees that you get). When you iterate, you may have some sequence, and you iterate over it. Later, you may iterate over it more. It could give you the same or different stuff, it depends on your sequence. You could write a sequence that returns one three times and then stops. Or your could write a sequence that returns one forever. You do not know what is going to happen. Sometimes you want to know more stuff - we can use collections for that.

Collection (07:57)

protocol Collection: Sequence, Indexable

A collection is an indexable sequence. We are going to make our photo metadata a collection. We have a bunch of photos. Start index is the first index of your data in your collection. End index is the index after the last piece of data.

struct PhotosMetadata: Collection {
  /*...*/
  var startIndex: Int { /*...*/ }
  var endIndex: Int { /*...*/ }
  func index(after i: Int) -> Int { /*...*/ }
  subscript(i: Int) -> PHAsset { /*...*/ }
}

This method (index after), given some index, it gives you the next index. The one that does something useful for us is the subscript. When we provide an index, you get a PHAsset, and that is your photo metadata. Why do we need to deal with the index crap? It is useful sometimes, it is a good design.

Collection rules (09:26)

  • We have addressable positions; that means we can index.
  • You must support multi-pass iterations. If you iterate over a collection over here and you iterate over it again, it should be the same thing. You should get the same elements in the same order.
  • There exists a one-to-one mapping of indices to elements. There cannot be an infinite collection because you cannot have an infinite number of indices.
  • You should expect constant time subscript access.

Let’s take a look at an example of non-integer indices (see video). We have a linked list. The index, if we want to make a linked list a collection, is the actual node of the link list. Now, let’s think about it. The head of the list is the start index. The end index, one after the last index, is nil. To get from one node to the next node, to get from one index to the next index, all you do is your traverse the next arrow. Given a node, you can produce the value in constant time. that is the reasoning for those index stuff.

Let’s make that photo metadata collection. struct PhotosMetadata: Collection {

private let base: PHFetchResult

init(_ base: PHFetchResult) {
  self.base = base
}

We can wrap the PHFetchResult (what we get back from the API call). Our startIndex is zero.

var startIndex: Int {
  return 0
}

Usually it is zero if you are wrapping some collection that is backed by integer indices.

var endIndex: Int {
  return base.count
}

Your endIndex is usually the count of the number of elements in your collection. If you have integer indices, to go from one index to the next index, all you do is you add one.

func index(after i: Int) -> Int {
  return i + 1
}

We can get the data out by subscripting into our base.

subscript(i: Int) -> PHAsset {
  return base[i] as! PHAsset
}

You might have to do a method call (not the subscript, but it is similar). Now we can deal with our photos, with our photo metadata.

let a = PhotosMetadata(PHAsset.fetchAllAssets/*...*/())

This line of code runs fast. However, our PHAssets are Objective-C classes, and we want to go into struct land and Swift. We want to turn all of our PHAssets into this photo struct:

struct Photo {
  let url: NSURL
  /*...*/
}

It will have a URL (e.g. the path on disk). We have our PhotosMetadata. Then we map. We take our metadata, we map over it and we say “for every piece of PHAsset, I want to turn that into a photo struct”.

let metadata: PhotosMetadata
let photos = metadata.map{ Photo(url: $0.url) }

However, this is slow, because we are looking at every single piece of metadata. Instead, we want to delay the metadata load. If you do not care about photos, we want to delay operations on our group of items.

Lazy Groups of Things (12:44)

In Swift, when you have a LazySequence or a LazyCollection, all operations on that lazy sequence or lazy collection are delayed until the information is forced out (forced, as in you want to access this).

A LazySequence is a sequence:

protocol LazySequenceProtocol: Sequence

A LazyCollection is a lazy sequence and a collection.

protocol LazyCollectionProtocol: LazySequenceProtocol, Collection

Normal map (13:23)

For an eager sequencer collection, eager is the opposite of lazy.

// eager sequence/collection
[1,2,3].map{ $0 + 1 } // [2, 3, 4]

A normal Swift array is an eager collection. When we map over it, we get an array back. We are adding one to every element and we are getting two, three, four.

LazySequence map (13:43)

Evens().lazy.map{ $0 + 1 }
    // LazyMapSequence<Int, Int>
// nothing is computed yet!

When we take a sequence (our Evens that we made), there is this property lazy that exists on every single sequence in Swift. It says: all future operations are now delayed. Instead of giving us back something that has one added to everything, we get this lazy map sequence struct.

LazyCollection map (13:23)

If we take a Swift array, which is a collection, and we say lazy, the lazy property exists on every single Swift collection, and it says all future operations should be delayed, we get back this other thing.

[1,2,3].lazy.map{ $0 + 1 }
    // LazyMapRandomAccessCollection<Int, Int>
// nothing is computed yet!

This is a random access collection; we get a LazyMapRandomAccessCollection that turns integers into integers (that is what that typed means).

protocol LazyCollectionProtocol: LazySequenceProtocol, Collection

Map is defined on sequences, lazy sequences, lazy collections. Lazy collection is a lazy sequence. Lazy sequence is a sequence. We are overriding map, but we returned a different type. We did not return the same type every time. How can an overridden method return different types?

Protocol Default Implementations (15:11)

You have some protocol A.

protocol A { }
extension A {
  func woo() -> Self { return self }
  func hi() -> String {
    return "A"
  }
}

We can define two methods: woo, which returns this thing that conforms to the protocol, and hi, which returns a string that is just the letter A. We can make a struct that conforms to the protocol.

struct ConcA: A {}
print(ConcA().woo().hi()) // "A"

We can call those methods; it will work, and we will print A. We can make another protocol, B:

protocol B { }
extension B {
  func hi() -> Int {
    return 0
  }
}

We can define the method hi that returns an integer (not a string). It is going to be the number zero. If we have some struct that conforms to A and B, then this will not compile.

struct Mystery: A, B { }
print(Mystery().woo().hi()) // compile error!

The .woo will compile, but if you then call .hi, there is an ambiguity (Swift does not know - should I pick the one that returns a string or the one that returns an int).

If you say “B does everything that A does, and then just kidding, hi does something else”, Swift knows what to do.

// now B subsumes A
protocol B: A { }
extension B {
  func hi() -> Int {
    return 0
  }
}

When you say that you conform to B, you also conform to A. You can call woo, which only exists in A, but then you can call the hi from B. That is not a compiler (this is pretty cool!).

struct Mystery: B { }
print(Mystery().woo().hi()) // 0
// NOT a compile error!

Let’s take a step back. Think about sequences in collections, and specifically, lazy sequences, lazy collections. Swift has a bunch of these transformations, a bunch of these things that you can do to these lazy sequences and it has many things you can do to lazy collections. What you get is the lazy collection automatically, for free, gets everything that a lazy sequence can do, but then if certain operations you need to optimize, and if those optimizations require different return types, you can still do it.

LazyMapSequence? (17:31)

struct LazyMapSequence<S: Sequence>: LazySequenceProtocol {
  func makeIterator() -> LazyMapIterator</*...*/> { /*...*/ }
}

A lazy map sequence wraps a sequence, and it conforms to LazySequenceProtocol. Every single sequence in Swift must be able to produce an iterator. The iterator that we produced is a LazyMapIterator. The LazyMapIterator wraps some iterator. We are going to call this transformation on the elements as we are iterating over them. Our next step is just get the next element and call the transformation on it if that element is not nil.

struct LazyMapIterator<Base: Iterator, Element>: IteratorProtocol, Sequence {
  var _base: Base
  let _transform: (Base.Element) -> Element
  mutating func next() -> Element? {
    return _base.next().map(_transform)
  }
}

Let’s look at lazy map sequence again (and a picture of a burrito).

struct LazyMapSequence<S: Sequence>: LazySequenceProtocol

Inside the burrito there is the sequence that you are wrapping. The tortilla is the lazy map sequence. If we do a bunch of stuff to our collection or sequence, we are going to be wrapping it in many tortillas.

let a = [1,2,3].lazy.map{ $0 + 1 }.filter{ $0 != 3 }
// a: LazyFilterBidirectionalCollection<
//       LazyMapRandomAccessCollection<
//        Array<Int>, Int
//       >
//     >

Here we take a Swift array. We say: all future operations must be delayed. We map and then we filter. On the outside, the tortilla will be the filter. We will have a map tortilla, and our sequence or our collection. Tortillas everywhere.

Let’s now go back to our photos. If we stick a .lazy in there, that line of code executes fast:

let metadata: PhotosMetadata
let photos = metadata.lazy.map{ Photo(url: $0.url) }
// not slow!

Now let’s change our data more. To prepare to draw things, we will filter out all the videos, make our photo structs, and make some other struct about what we are going to draw in our view. Look at that code (it is pretty nice).

func prepareData(d: PhotosMetadata) -> ? {
  return d.lazy
   .filter{ $0.isPhoto() } // skip videos
   .map{ Photo(url: $0.url) } // make photos
   .map{ PhotoView(photo: $0) } // make view
}
LazyFilterBidirectionalCollection<
  LazyMapRandomAccessCollection<
    LazyMapRandomAccessCollection<
      PHAsset, Photo
    >, PhotoView
  >
>

But what does this return? What is the return type here? That is the problem. We can use tortilla inference to not think about it. We want Swift to magically tell us what all of our tortillas are, without us having to write it down. By that, I mean type inference.

Type Inference (20:07)

let m = d.lazy.map/*...*/
return m.first

Swift, local variables within a function, you do not need to give type annotations for. If you do a bunch of stuff in a function and you are not returning the thing that you have produced, but maybe some piece of it (e.g. the first element), you never need to write down the type (and life is good). Sometimes you need to return int. We can use tortilla erasure.

Type Erasure (20:56)

The idea here is you have some burrito that is gross because it is all tortillas and there is a tiny bit of meat in it. We do not want that. What we want is one piece of tortilla. We are going to smash them all into one. This is called type erasure.

let m = AnyCollection(
  d.lazy.filter{}.map{}/*...*/
)
// m: AnyCollection<PhotoView>

In Swift, if we have a collection (any sort), we can wrap it in an AnyCollection, and we get back the type of m now is something that we can maintain. It is one tortilla in AnyCollection, and then our final result. It is any collection of the photo views, if we do those filters and maps. We are trading type information for maintainable code. In the future, when you add another map transformation, you do not have to change the type signature of your function. On the other hand, the compiler is losing information about what your type is.

Type erasure in Swift, we have AnySequence and AnyCollection. We have a bunch of other variants on AnyCollection (that depends on the type of the index, which I do not want to go into, but it is in the back of this deck). We go back to the photos now.

func prepareData(d: PhotosMetadata) -> AnyCollection<PhotoView>

We can write down our type signature and we are happy, the code runs, we are good. Then we go and show our product team. Then they say, “I do not want every photo. I want every other photo - skip the first one, show the second one. Skip the third one, show the fourth one”. We need a new tortilla.

For an array, we can say, make it lazy, subsequent operations are lazy. Then call every other. What we want to print is two and four here. We are skipping one, we are skipping three.

let a = [1,2,3,4]
for x in a.lazy.everyOther() {
  print(x) // 2,4
}

The tortilla sequence is a.lazy.everyOther() sequence. It wraps some sequence. That is the base. Every single sequea.lazy.everyOther() iterator, which wraps an iterator. a.lazy.everyOther() iterator stores the iterator it wraps, and then calls next twice and throws away the information from the first next, every single time next is called on this iterator.

On every single lazy sequence in the universe, we want to have this method every other be available.

// the tortilla sequence
struct LazyEveryOtherSequence
    <S: Sequence>: LazySequenceProtocol {
  var base: S
  func makeIterator() -> LazyEveryOtherIterator<S.Iterator> {
    return LazyEveryOtherIterator(base: base.makeIterator())
  }
}

It will take your current sequence and treat that as the meat of your burrito, then wrap up that burrito in a LazyEveryOtherSequence tortilla.

struct LazyEveryOtherIterator
    <I: IteratorProtocol>: IteratorProtocol {
  var base: I
  mutating func next() -> I.Element? {
    if let _ = base.next() {
      return base.next()
    } else {
      return nil
    }
  }
}
extension LazySequenceProtocol {
  func everyOther() -> LazyEveryOtherSequence<Self> {
    return LazyEveryOtherSequence(base: self)
  }
}

You might be wondering, I do not want to go through this stuff, I do not want to do all this, I do not want to jump through these hoops. You could do that, and that may work. But you may go to product and they may ask for changes (“We want every fourth photo, not every other photo”). If you implemented every other in this way, you just call every other twice.

let skipped = photos.everyOther().everyOther()

The idea is if you build an operator with this machinery, you can stick it in different places. You can map and filter, and do every other and then do it twice in a row (you can do whatever you want!). This is precisely why you want the Swift machinery.

Recap (24:33)

  • Collections hold our photo metadata and photos. We wanted to hold stuff: we could not use arrays, we used a collection. We tried to map over it, but it was too slow because we did not want the maps to be applied.
  • We used a LazyCollection’s map that would postpone the transformation to later (Our types became wild… too many tortillas!).
  • We can use AnyCollection to give us maintainable type signatures. If we need to do other things to our data, we can create new operators.
  • If we create operators that can compose, they are more powerful.

I used the IBM Swift Sandbox, a website that runs Swift 3. It is lightweight. I also used Unary - you can wrap any single collection that is indexed via integers into a collection that is indexed via unary numbers specified by strings of the letter X. If that is interesting to you, check out the Appendix.

Q & A (26:06)

Q: When you called each other twice, normally you would just go in and change the method, but you are saying because it is fast because it is lazy, you can just chain them together? Is that the idea?

Brandon: It is probably less efficient than just changing the method, if you want to squeeze out crazy performance. I would argue that your code, it is readable, I guess. I do not know. It is a contrived example.

Q: It is a simple example, but if you had complex filters you were doing, weird transformations, then this is great because then you can see exactly what you are doing and it is fast because it is lazy.

Brandon: I should have mentioned that explicitly. The nice thing about solving this problem in this way is that our logic for how we transform the data is all in one place and was very straightforward.

Q: This is only for Swift 3?

Brandon: No, this exists in Swift 2, but the names of the protocols are different and I think there is less inferred automatically. I am not 100% sure about that, but this works. I implemented this in Swift 2.Something for our app. Not this code exactly, but if you change the names of types, would work. It is in the slide deck.

Q: Hi, I have a question about how smart the default implementation for the lazy types are around collections, rather than sequences. I think of map as an operation that has a sequentiality, and it seems like collections are defined partly in terms of sequences, if I am remembering how the collection structure works and what you said. Collections also allow random access. Let’s say I have a big collection, a million items, and they are all cheap items and it is an array, but then I have some operation that I want to do that is expensive (e.g. turning an item into some media asset). If I took that collection and then whack on the lazy, and then whack on the map, now I have this thing that was cheap to compute because nothing has actually been done yet. But now let’s say I want to go in and access, e.g. the 500,000th item in my new lazy map collection. Does it need to do the map operation on the 500,000 items leading up to the one I want because there is this sequentiality? Or does it only just pull that? Can I get a cheap, easy cache just by doing lazy map?

Brandon: When you do .lazy, you can do a bunch of eager things, and then as soon as you do .lazy, that says all future operations are now lazy. You can do a bunch of quick stuff, do a .lazy and then do a map. Your map is expensive. Right after you do the lazy, that will not be evaluated immediately, what you are saying will work if I have a big collection, then I go lazy, map, expensive operation.

Q: If I have a big collection, then I go lazy, map, expensive operation (*Brandon: and then index*), I can pull an intermediate item directly from the lazily mapped collection, it will only do the operation on that one?

Brandon: Correct. Isn’t that awesome? It is useful. The question was did I use this at work for photos, and the answer is yes. The story was, I made up parts of the story to get the teaching to happen, but the general idea was implemented.

Next Up: New Features in Realm Obj-C & Swift

General link arrow white

About the content

This content has been published here with the express permission of the author.

Brandon Kase

Brandon Kase is a full-stack software engineer at Highlight, where he is currently working on Shorts. He is excited that strong static typing and functional programming are becoming mainstream, and believes that techniques once reserved for academia will help industry produce more reliable software. Brandon is in general fascinated by anything and everything related to well-designed libraries and programming languages – like Swift!

4 design patterns for a RESTless mobile integration »

close