Richard Searle

home

Resource leak when using case classes with Spark UpdateStateByKey

16 May 2015

The Spark documentation describes how Scala case classes can be used to define the data flowing through the system.

There are unfortunately is a leak of the reflection created classes needed for serialization, especially when using updateStateByKey. These experiments illustrate the difficulty.

Update

Failure only occurs when master is local[*].

Given that local is only intended for unit testing, this is not a concern.