Richard Searle

Performance surprise with Scala regexp parser combinators

21 Aug 2012

Published examples of Scala regexp parser combinators generally have the form

    def identifier = """[_\p{L}][_\p{L}\p{Nd}]*""".r

This works fine, but because identifier is a def, its body is re-evaluated on every reference, so the underlying Java Pattern is recompiled each time the parser is used.

The behavior came to light during an upgrade from Java 6 and Scala 2.7.7 to Java 7 and Scala 2.9.2, when a performance degradation of roughly 10% was noted. Performance analysis showed an unexpectedly large number of calls to Pattern.compile, and the Java 7 implementation is evidently somewhat slower. Changing the def to val resolves the problem, without impact to the semantics: the pattern is compiled once and reused. In this case, the improvement was greater than 30%, which more than compensates for the degradation.
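The difference between def and val can be made visible with a small sketch. It is not the original parser code: the object name RegexDefVsVal, the counter, and the helper compileIdentifier are hypothetical, added only to count how often the regex is rebuilt. It uses plain scala.util.matching.Regex rather than the parser combinator library, on the assumption that the re-evaluation behaves the same way there.

```scala
import scala.util.matching.Regex

object RegexDefVsVal {
  // Hypothetical counter, present only to make each compilation visible.
  private var compilations = 0

  def compilationCount: Int = compilations

  // Builds the identifier regex; .r constructs a Regex, which compiles
  // a java.util.regex.Pattern.
  private def compileIdentifier(): Regex = {
    compilations += 1
    """[_\p{L}][_\p{L}\p{Nd}]*""".r
  }

  // def: the body runs on every reference, so the pattern is recompiled each time.
  def identifierDef: Regex = compileIdentifier()

  // val: the body runs once, when the object is initialized.
  val identifierVal: Regex = compileIdentifier()

  def main(args: Array[String]): Unit = {
    val start = compilationCount
    for (_ <- 1 to 1000) identifierDef.findFirstIn("foo_1 + 2")
    val viaDef = compilationCount - start

    val mid = compilationCount
    for (_ <- 1 to 1000) identifierVal.findFirstIn("foo_1 + 2")
    val viaVal = compilationCount - mid

    println("compilations via def: " + viaDef) // prints 1000
    println("compilations via val: " + viaVal) // prints 0
  }
}
```

Both forms match the same input, which is why the change leaves the semantics alone; only the number of compilations differs.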