Chunliang Lyu

I am a developer and researcher from China. Love to make fun and useful stuff. Co-founded Hyperlink.

SORM2: Digging into Scala Internals

Published: 2015-10-09

While developing one of my side projects, I was looking for a good database and ORM library. In the Python world, I have had a pretty happy experience with SQLAlchemy, which is mature and feature-rich. Things seem different in the Scala world. While you can of course use the Java libraries, they are cumbersome and sometimes not Scala-friendly. After some surveying, I found that the SORM framework seemed pretty promising. It has a very elegant API. I took this as an opportunity to learn about Scala internals, like reflection, compilation and annotations. Digging into the source and making customizations has turned out to be a pretty fun experience.

The original SORM is open source here. My own fork can be found here, and it contains pretty opinionated customizations. The fork is published on my personal Maven repository https://repo.chunlianglyu.com, so if anyone is interested, you can check it out by adding it as a dependency to your Maven or sbt project.

Introduction

The code on the website makes it very clear:

// Declare a model:
case class Artist( name: String, genres: Set[Genre] )
case class Genre( name: String ) 

// Initialize SORM, automatically generating schema:
import sorm._
object Db extends Instance(
  entities = Set( Entity[Artist](), Entity[Genre]() ),
  url = "jdbc:h2:mem:test"
)

// Store values in the db:
val metal = Db.save( Genre("Metal") )
val rock = Db.save( Genre("Rock") )
Db.save( Artist("Metallica", Set(metal, rock) ) )
Db.save( Artist("Dire Straits", Set(rock) ) )

// Retrieve values from the db:
// Option[Artist with Persisted]:
val metallica = Db.query[Artist].whereEqual("name", "Metallica").fetchOne() 
// Stream[Artist with Persisted]:
val rockArtists = Db.query[Artist].whereEqual("genres.item.name", "Rock").fetch()

As you can see, all you need to do is define the model Genre using a case class and register it with Entity[Genre](), and then you are done. No need to define tables, specify columns, or write any SQL. The framework also handles foreign keys and joins transparently, with a succinct and clean interface. Defining a relationship is as simple as defining a field that holds a reference to another class. Joining the two tables can be done like Db.query[Artist].whereEqual("genres.item.name", "Rock").fetch().

So where does the magic happen?

Decomposition of SORM

This section analyzes the core components of the SORM library.

Scala Reflection

Scala reflection is the basis of this framework. The reflection package is released as a separate library, scala-reflect. When we call Entity[Genre](), the Genre class is passed as a Type to the sorm.reflection.Reflection class. A Reflection object wraps various reflection calls, such as propertyValues, which returns a map of instance property names and values.

scala> 
import sorm.reflection.Reflection
case class Genre(name: String)
val reflection = Reflection[Genre]

scala> reflection.name
res0: String = Genre

scala> reflection.primaryConstructorArguments
res1: List[(String, sorm.reflection.Reflection)] = List((name,String))

scala> reflection.properties
res2: scala.collection.immutable.Map[String,sorm.reflection.Reflection] = Map(name -> String)
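
To give a feel for what such a wrapper builds on, here is a minimal sketch that reads the constructor parameters of a case class with scala-reflect directly. It is not SORM code: ReflectDemo and constructorParams are hypothetical names used only for illustration, and the sketch assumes Scala 2.11 or later with scala-reflect on the classpath.

```scala
import scala.reflect.runtime.universe._

object ReflectDemo {
  case class Genre(name: String)

  // Hypothetical helper (not part of SORM): names and types of the
  // constructor parameters of T, read through scala-reflect.
  def constructorParams[T: TypeTag]: List[(String, String)] =
    typeOf[T]
      .decl(termNames.CONSTRUCTOR) // the constructor symbol
      .asMethod                    // assumes a single, non-overloaded constructor
      .paramLists.head             // first parameter list
      .map(p => p.name.toString -> p.typeSignature.toString)
}
```

Calling ReflectDemo.constructorParams[ReflectDemo.Genre] should give a one-element list whose entry is named name, which is roughly the information primaryConstructorArguments exposes above.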

By using the reflected information, we can construct the database schema. The class sorm.mappings.EntityMapping is responsible for mapping a reflection to a database table schema. Take the table name as an example: for a master table, the trait MasterTableMapping defines it as lazy val tableName = ddlName(reflection.name). The function ddlName converts CamelCase to underscore_case.
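
As an illustration of that name conversion, the following is a minimal regex-based sketch. It is a hypothetical stand-in, not SORM's actual ddlName, and edge cases such as acronyms may well be handled differently by the library.

```scala
object DdlNames {
  // Hypothetical stand-in for SORM's ddlName (an assumption, not the
  // library's code): insert an underscore wherever a lowercase letter or
  // digit is followed by an uppercase letter, then lowercase everything.
  def ddlName(camel: String): String =
    camel
      .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
      .toLowerCase
}
```

With this sketch, DdlNames.ddlName("Genre") gives "genre" and DdlNames.ddlName("ArtistGenre") gives "artist_genre".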

Runtime compilation

Reflection alone is not enough for the SORM magic. Another core component is the generation and compilation of wrapper classes at runtime. The Scala compiler is shipped as a separate library, scala-compiler. When you call the function Entity[T](), a new Scala class is actually generated. Basically, you are calling the compiler at runtime.

In the file src/main/scala/sorm/persisted/PersistedClass.scala, you can find the function createClass(r: Reflection):

  private[persisted] def createClass
    [ T ]
    ( r : Reflection )
    : Class[T with Persisted]
    = {
      val mirror = runtimeMirror(Thread.currentThread().getContextClassLoader)
      val toolbox = mirror.mkToolBox()

      toolbox.eval(
        toolbox.parse(
          generateCode(r, generateName())
            .tap{ c => logger.trace("Generating class:\n" + c) }
        )
      ) .asInstanceOf[Class[T with Persisted]]
    }

The return value is a new class Class[T with Persisted]. The Persisted trait is defined as:

trait Persisted {
  def id : Long
  /**
   * Decompose an object of type T with Persisted into an id and and object of
   * type T.
   *
   * @tparam T The type of object Persisted was mixed into
   * @return A tuple of id and the object object it was mixed into
   */
  def mixoutPersisted[ T ] : ( Long, T )
}

For the example class case class Genre( name: String ), this will generate the following code for compilation (with the test class Genre defined in the sorm.test.ReflectionTest package).

class PersistedAnonymous1
  ( val id : Long,
    name : String )
  extends sorm.test.ReflectionTest.Genre( name )
  with sorm.Persisted
  {
    type T = sorm.test.ReflectionTest.Genre
    override def mixoutPersisted[ T ]
      = ( id, new sorm.test.ReflectionTest.Genre(name).asInstanceOf[T] )
    override def copy
      ( name : String = name )
      = new PersistedAnonymous1( id, name )
    override def productElement ( n : Int ) : Any
      = n match {
          case 0 => id
          case 1 => name
          case _ => throw new IndexOutOfBoundsException(n.toString)
        }
    override def productArity = 2
    override def equals ( other : Any )
      = other match {
          case other : sorm.Persisted =>
            id == other.id && super.equals(other)
          case _ =>
            false
        }
  }
classOf[PersistedAnonymous1]

As you can see, the new class adds the id property alongside the name defined in the original Genre class. What are the methods copy, productElement and productArity? Remember that Genre is a case class. All case classes in Scala extend the Product trait, which defines the abstract methods productElement and productArity. The copy method is likewise generated by the compiler for every case class, so the wrapper overrides it to carry the id along.
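
To make the Product part concrete, here is a small sketch of what a plain case class exposes through that trait. ProductDemo and fieldValues are hypothetical names for illustration only.

```scala
object ProductDemo {
  case class Genre(name: String)

  // Hypothetical helper: collect the field values of any case class
  // by walking the indices that Product exposes.
  def fieldValues(p: Product): List[Any] =
    (0 until p.productArity).map(p.productElement).toList
}
```

For ProductDemo.Genre("Rock"), productArity is 1 and fieldValues returns List("Rock"). The generated wrapper above reports an arity of 2 because it exposes id as an extra element in front of name.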

Speedup

If you trace the time, loading the toolbox and compiling the generated source takes at least 2 seconds. With lots of classes, the time increases linearly. This is why startup is very slow. Another drawback is that the scala-compiler library is pretty large (14.6MB for Scala 2.11.7). For me, this is unaffordable.
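
If you want to trace the time yourself, a tiny helper like the one below is enough. It is a generic sketch, not part of SORM, and the names Timing and timed are made up for this example. Wrap the instance initialisation (or any other block) in it.

```scala
object Timing {
  // Hypothetical helper: run a block, print how long it took,
  // and return the result together with the elapsed milliseconds.
  def timed[A](label: String)(block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label took $elapsedMs ms")
    (result, elapsedMs)
  }
}
```

For example, Timing.timed("register entities") { Entity[Genre]() } would report the cost of generating one wrapper class, assuming the SORM imports from the introduction are in scope.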

I do want to keep the transparency of the primary id column, so I have refactored the code to make models inherit from a Persistable trait.

trait Persistable {
  var id: Option[Long] = None
}

Now if you want to declare a model, you need to inherit from the Persistable trait:

case class Artist( name: String, genres: Set[Genre] ) extends Persistable

Seems pretty neat, so why didn't the original author do it this way? Because it introduces new problems.

equals

The originally generated class defines an equals method that considers the id property. After switching to the inheritance-based approach, we need to add the equals method to the Persistable trait. Besides comparing the id properties, I am also interested in the equality of the other properties, so that Genre("Rock") == Genre("Rock") always holds.

trait Persistable {
  var _id: Option[Long] = None

  // Provided by every case class through the Product trait
  def productArity: Int
  def productElement(i: Int): Any

  override def equals(other: Any): Boolean = other match {
    case other: Persistable =>
      // Once this instance has an id, the other id must match it;
      // an instance that has not been persisted yet ignores the id.
      val sameId = _id.isEmpty || _id == other._id
      // The arity check replaces the try/catch around productElement
      sameId &&
        productArity == other.productArity &&
        (0 until productArity).forall { i =>
          productElement(i) == other.productElement(i)
        }
    case _ => false
  }
}

unique annotation

Defining Entity[Genre](unique=Set("name")) is nice, but I want to go further. I think a decoration on the class itself is more intuitive. So here is the definition of the unique annotation.

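import scala.annotation.StaticAnnotation

// Marker annotation listing the fields that form a unique key
class unique(fields: String*) extends StaticAnnotation

@unique("name")
case class Genre(name: String)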

When we generate an Entity instance, we check the annotations.

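import scala.reflect.runtime.universe._

def entity(t: Type): Entity = {
  // Look for a @unique annotation on the class and read its string arguments
  val unique: Set[Seq[String]] =
    t.typeSymbol.asClass.annotations.find(a => a.tree.tpe =:= typeOf[unique]) match {
      case Some(anno) =>
        // The first child is the constructor reference; the rest are the arguments
        Set(anno.tree.children.tail.map(v => v.asInstanceOf[Literal].value.value.asInstanceOf[String]).toSeq)
      case None => Set()
    }
  Entity(Reflection(t), indexed = Set(), unique = unique)
}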

copy

Besides equals, we also lose the copy method. The generated wrapper class has a copy method that carries the id along, but now with inheritance we only have the case class's own copy(name) method and need to copy the id manually. The copy method is extremely important since case classes are immutable: if we want to change the name of a Genre, it is better to call genre.copy(name="Blue"). I haven't found a better solution for this, so just remember to copy the id field after every copy call.
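
The problem, and one possible way to make the manual step harder to forget, can be sketched as follows. This is only an illustration: CopyExample and copyKeepingId are hypothetical names and are not part of my fork or of SORM, and the Persistable trait is reduced to the _id field.

```scala
object CopyExample {
  // Reduced version of the trait from this post: only the id field
  trait Persistable {
    var _id: Option[Long] = None
  }

  case class Genre(name: String) extends Persistable

  // Hypothetical helper: apply a copy-style change and carry the id over,
  // since the compiler-generated copy only knows the constructor fields.
  def copyKeepingId[T <: Persistable](original: T)(change: T => T): T = {
    val updated = change(original)
    updated._id = original._id
    updated
  }
}
```

With rock = Genre("Rock") and rock._id = Some(1L), a plain rock.copy(name = "Blue") comes back with _id == None, while copyKeepingId(rock)(_.copy(name = "Blue")) keeps Some(1L).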

Constructor

Previously when we load an instance from db, we initialise an instance of the wrapper class. Now, we need to keep track of the original class.

Other customizations

Besides the modifications mentioned above, my fork also has the following changes here and there.

ignore certain fields

implicit val and lazy val will not be persisted to db.

handle JSON Data

I have added support for JValue from json4s, which is converted to a String when saving and converted back to a JValue when loading from the database.

minor stuff

  • Use the Java 8 time package instead of joda-time, so the minimum requirement is Java 8.
  • Remove the dependencies on joda-convert and guava; guava was only needed to convert CamelCase names to underscore_case.
  • Rename id to _id.
  • Use sbt instead of maven.
  • Use slf4j instead of scala-logging.

References

  • All the code refers to sorm 0.3.19.