Yet another blog about software development

My thoughts about {Golang,Java,Cassandra,Docker,any-buzzword-here}

Cache, Cache, Cache…

by Adam Jędro. Categories: cache / programming Tags: cache / tips

Cache, cache, cache… it is used in various forms in most applications. Among others, a few cache types are:
* opcode cache/accelerators such as APC for PHP
* in-memory cache, which very often is just a hash table (see the sketch after this list)
* distributed hash table like Memcached
* CPU caches (L1/L2/L3)
* browser cache
* and many others…
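
As an aside, the in-memory variant really can be as simple as a map guarded by a mutex. The sketch below is purely illustrative (the InMemoryCache name and its methods are my own, not part of the example later in this post) and only needs "sync" from the standard library:

// A minimal in-memory cache: a hash table guarded by a read/write mutex.
// Requires: import "sync"
type InMemoryCache struct {
	mu    sync.RWMutex
	items map[string]interface{}
}

func NewInMemoryCache() *InMemoryCache {
	return &InMemoryCache{items: make(map[string]interface{})}
}

// Get returns the cached value and whether the key was present.
func (c *InMemoryCache) Get(key string) (interface{}, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	value, ok := c.items[key]
	return value, ok
}

// Set stores or overwrites a value under the given key.
func (c *InMemoryCache) Set(key string, value interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}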

From an ORM that caches query results, through typical CRUD applications that use a cache for storing data, to your browser that caches static resources - cache is everywhere.

Recently I finished a task that solved a fairly trivial performance problem: request A is executed with high frequency and causes lots of database queries to produce a list of items. The easy way is to cache the final results and be done; the harder way is to reorganize the code and adapt it to the current requirements. We chose the first way as an ad-hoc solution, and that is why I want to show a simple way to make caching even better. Assume that request A queries a web service for all tasks assigned to a list of users. A cache-aware implementation looks easy; in Go it would be something like:

func getTasksForUsers(users []User) []Task {
	// Preallocate capacity only; appending to a slice created with a
	// non-zero length would leave zero-valued elements at the front.
	tasks := make([]Task, 0, len(users))
	missedUsers := make([]User, 0, len(users))

	// First pass: serve as much as possible from the cache.
	for _, user := range users {
		cachedTasks := getCachedTasksForUser(cacheKey(user))

		if cachedTasks != nil {
			tasks = append(tasks, cachedTasks...)
		} else {
			missedUsers = append(missedUsers, user)
		}
	}

	// Second pass: fall back to the database for the misses and
	// populate the cache with the results.
	for _, missedUser := range missedUsers {
		tasksFromDb := getTasksForUserFromDb(missedUser)

		if tasksFromDb != nil {
			tasks = append(tasks, tasksFromDb...)
			setCacheValue(cacheKey(missedUser), tasksFromDb)
		}
	}

	return tasks
}

A very similar implementation can be spotted in most codebases. There is a subtle race condition and a nasty for loop (we could do better with a bulk method for fetching tasks), but let's skip those kinds of details.
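
For illustration, the bulk variant could look roughly like the sketch below. Both getTasksForUsersFromDb (taking the whole slice and returning tasks grouped by user) and the user.ID field of type UserID are assumptions, not part of the original example:

// A sketch of handling the misses with a single bulk query instead of
// one query per user. getTasksForUsersFromDb and UserID are hypothetical.
func fetchMissedTasks(missedUsers []User) []Task {
	tasks := make([]Task, 0, len(missedUsers))

	// One round trip instead of len(missedUsers) queries.
	tasksByUser := getTasksForUsersFromDb(missedUsers)

	for _, user := range missedUsers {
		userTasks := tasksByUser[user.ID]
		if userTasks != nil {
			tasks = append(tasks, userTasks...)
			setCacheValue(cacheKey(user), userTasks)
		}
	}

	return tasks
}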

It works well if every user has at least one assigned task - eventually we will fill the cache and get a satisfying hit/miss ratio. If, however, most users don't have a single assigned task, we end up querying both the cache and the database on every request, and there is room for improvement. At work we call it the 'null marker pattern', and it can be introduced here as well. It extends the code above so that after each miss from the DB we set a 'null marker' in the cache for that particular key. The marker indicates that the value doesn't exist in the database, so there is no need to query for it again. The null marker should be deleted on every database INSERT, so the next request will eventually query the DB and fill the cache with the proper value (a sketch of that write path closes this post). This pattern can be especially important when an application executes lots of reads against the DB but the majority of the data doesn't even exist.

The pseudocode example, a bit more complicated than the previous one, looks like this:

func getTasksForUsers(users []User) []Task {
	// Preallocate capacity only, as before.
	tasks := make([]Task, 0, len(users))
	missedUsers := make([]User, 0, len(users))

	for _, user := range users {
		cachedTasks := getCachedTasksForUser(cacheKey(user))

		// Nothing in the cache at all: a real miss, go to the database.
		if cachedTasks == nil {
			missedUsers = append(missedUsers, user)
			continue
		}

		// A null marker means "we already know the DB has nothing",
		// so we neither query the DB nor add any tasks.
		if cachedTasks != NULL_MARKER {
			tasks = append(tasks, cachedTasks...)
		}
	}

	for _, missedUser := range missedUsers {
		tasksFromDb := getTasksForUserFromDb(missedUser)

		if tasksFromDb != nil {
			tasks = append(tasks, tasksFromDb...)
			setCacheValue(cacheKey(missedUser), tasksFromDb)
		} else {
			// Remember the negative result so the next request
			// doesn't hit the database again.
			setCacheValue(cacheKey(missedUser), NULL_MARKER)
		}
	}

	return tasks
}

It still has those weird race conditions, but hopefully you get the idea of what is happening here. Nothing special, no rocket science, but someone might find it useful. I didn't find a name for this technique on the Internet - feel free to leave a comment if you know more about it.
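
To complete the picture, the write path has to drop the null marker whenever tasks are inserted, so the next read repopulates the cache from the database. A minimal sketch, assuming hypothetical insertTaskForUserInDb and deleteCacheValue helpers, could look like this:

// A sketch of the write path: after a successful INSERT, the cached
// entry for that user (which may be the null marker) is removed.
// insertTaskForUserInDb and deleteCacheValue are hypothetical helpers.
func addTaskForUser(user User, task Task) error {
	if err := insertTaskForUserInDb(user, task); err != nil {
		return err
	}

	// Invalidate the cached entry so the next read goes to the DB.
	deleteCacheValue(cacheKey(user))

	return nil
}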


Feel free to share this post if you like it.