Collective Idea


Jason Roelofs

Playing with Go: Embarrassingly Parallel Scripts

By Jason Roelofs on December 03, 2012 in concurrency, go, and golang

I recently needed to take a list of domains and find which ones point to a specific IP address. For a small list, say fewer than 10, manually running dig in the console would work fine, but this list had almost 800 domains, so I needed a script. Since each domain lookup is a network request and therefore slow, running the lookups in parallel made sense. I could easily do this in Ruby, my language du jour, but I've written this kind of threaded code before, and frankly it can be tedious to set up and fragile, and it still won't have access to all of my system's resources due to the GVL¹. I've been keeping an eye on Google's Go for some time now and decided to see how it handled this problem.

I've been intrigued by Go since it was announced about three years ago. Here was a compiled, fast, lightweight, low-level language with many of the features we take for granted these days, such as garbage collection, that also adds a sophisticated concurrency model similar to Erlang's: very lightweight processes managed by the language runtime. It sounded like a perfect fit for my requirements.

The code I ended up with is here: https://gist.github.com/4170926. For the sake of comparison, I built a sequential version of the script alongside the parallel one and added timings for running both against the full list of domains.

Running these scripts for yourself is a one-liner: go run [script.go]. The input file domains.txt needs to be a newline-delimited list of domains. I’ll go over the more confusing parts of the two scripts to help with understanding what’s really going on here.

Objects?

Go's object model is very close to C's: structs hold data, and methods are functions that operate on those structs. Both scripts use only a small, two-field struct, DomainMap, to keep track of the IP address found for a given domain. I use the short form to initialize new instances of DomainMap: the values are assigned positionally, in the order the fields are declared at the top of the scripts.

type DomainMap struct {
	Domain    string
	IpMapping string
}

object := DomainMap{domain, ipAddress}

// Both of these are now true:
// object.Domain == domain
// object.IpMapping == ipAddress
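Go also supports keyed struct literals, where each field is named explicitly. The sketch below uses a placeholder domain and the documentation-only address 192.0.2.1 to show that both forms produce equal values. The keyed form is less fragile if fields are ever reordered or added, since any omitted field simply gets its zero value.

```go
package main

import "fmt"

// DomainMap mirrors the two-field struct used in both scripts.
type DomainMap struct {
	Domain    string
	IpMapping string
}

func main() {
	// Placeholder values; 192.0.2.1 is reserved for documentation.
	domain, ipAddress := "example.com", "192.0.2.1"

	// Positional literal: values must follow the field declaration order.
	positional := DomainMap{domain, ipAddress}

	// Keyed literal: fields are named, so order doesn't matter.
	keyed := DomainMap{IpMapping: ipAddress, Domain: domain}

	// Structs whose fields are all comparable can be compared with ==.
	fmt.Println(positional == keyed) // true

	// An omitted field gets its zero value (the empty string here).
	fmt.Println(DomainMap{Domain: domain}) // {example.com }
}
```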

Error handling

Go handles errors by returning multiple values from a function, where the last return value is conventionally of type error. You can discard any return value by assigning it to the blank identifier _, which is what this one-off script does:

rawIpAddresses, _ := net.LookupIP(domain)

Parallelism

The parallel version of the script has some new concepts that need explaining, particularly goroutines, channels, and channel communication.

A goroutine is a very lightweight process, sort of like a Ruby Fiber. Creating one is simple:

go domainLookup(responseChannel, domain)

Go takes the function call after the go keyword and runs it concurrently in a new goroutine, while the calling code carries on without waiting. Because the call no longer runs in line with the caller, its return values are discarded, so we need a different way to get results back. This is where channels come in.
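A tiny, self-contained sketch of that point: calling a function with go throws away whatever it returns, so a common approach is to wrap the call in a closure that sends the result on a channel. Here square is just a hypothetical stand-in for domainLookup.

```go
package main

import "fmt"

// square is a hypothetical stand-in for any function whose result we want.
func square(n int) int { return n * n }

func main() {
	// `go square(4)` would compile, but its return value would be discarded:
	// the go statement doesn't wait and has nowhere to put the result.
	results := make(chan int)

	// Wrap the call in a closure that sends the result on a channel instead.
	go func() {
		results <- square(4)
	}()

	// The receive blocks until the goroutine has sent its value.
	fmt.Println(<-results) // 16
}
```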

responseChannel := make(chan DomainMap)

As Go is statically typed, we need to declare the type of value a channel carries, and a channel accepts only values of that type. Communication through channels uses the "reverse-stabby" operator <-, which should be read as "the data on the right side flows to the left side":

// Write into the channel
responseChannel <- DomainMap{domain, ipAddress}
// Read from the channel
domainMap := <-responseChannel
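Channel types can also carry a direction, which lets the compiler enforce which side writes and which side reads. The scripts don't need this, so treat it as an optional refinement; send and receive below are hypothetical helpers and the values are placeholders.

```go
package main

import "fmt"

type DomainMap struct {
	Domain    string
	IpMapping string
}

// send can only write: chan<- marks a send-only channel.
func send(out chan<- DomainMap, d DomainMap) {
	out <- d // the value on the right flows into the channel on the left
}

// receive can only read: <-chan marks a receive-only channel.
func receive(in <-chan DomainMap) DomainMap {
	return <-in // the value flows out of the channel
}

func main() {
	// A bidirectional channel is assignable to either directional type.
	responseChannel := make(chan DomainMap)

	go send(responseChannel, DomainMap{"example.com", "192.0.2.1"})
	fmt.Println(receive(responseChannel)) // {example.com 192.0.2.1}

	// responseChannel <- 42 would not compile: the element type is DomainMap.
}
```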

And that's all the special syntax. The only real difference between the parallel and sequential scripts is the map-reduce-esque setup that waits for all the goroutines to finish. I didn't need to worry about thread pooling, system capabilities, or thread safety. Go makes concurrent code so easy to write that there's no excuse not to anymore. I was able to run almost 800 goroutines (one per domain), all firing off DNS queries, and get every answer back in less than 10 seconds, in a script that doesn't even look like it's running in parallel.
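Here is a minimal sketch of that fan-out/fan-in wait, assuming one goroutine per domain and a single shared channel. It is not the code from the gist: lookupAll and its resolve parameter are hypothetical, and resolve is stubbed with a map so the example runs without network access (in a real script it would wrap net.LookupIP).

```go
package main

import (
	"fmt"
	"sort"
)

type DomainMap struct {
	Domain    string
	IpMapping string
}

// lookupAll starts one goroutine per domain, then receives exactly
// len(domains) results from the shared channel. resolve is a hypothetical
// stand-in for the real DNS step.
func lookupAll(domains []string, resolve func(string) string) []DomainMap {
	responseChannel := make(chan DomainMap)

	// Fan out: each goroutine sends exactly one result.
	for _, domain := range domains {
		go func(d string) {
			responseChannel <- DomainMap{d, resolve(d)}
		}(domain) // pass domain as an argument so each goroutine gets its own copy
	}

	// Fan in: one blocking receive per goroutine waits for all of them.
	// Results arrive in completion order, not input order.
	var results []DomainMap
	for range domains {
		results = append(results, <-responseChannel)
	}
	return results
}

func main() {
	// Placeholder data; concurrent reads of a map with no writers are safe.
	fake := map[string]string{"a.example": "192.0.2.1", "b.example": "192.0.2.2"}
	resolve := func(d string) string { return fake[d] }

	results := lookupAll([]string{"a.example", "b.example", "c.example"}, resolve)
	sort.Slice(results, func(i, j int) bool { return results[i].Domain < results[j].Domain })
	fmt.Println(results) // [{a.example 192.0.2.1} {b.example 192.0.2.2} {c.example }]
}
```

Counting receives works only because every goroutine sends exactly one value; if a goroutine could return without sending, the collecting loop would block forever. Since results come back in completion order, the example sorts them before printing.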

Now that Go 1.0 is out, it's a great time to get familiar with the language. I highly recommend the Tour of Go for basic introductions to every major feature of the language, and there's a ton of documentation on the main website, golang.org. From the little time I've spent with Go so far, I see a very bright future for this language.

¹ Global VM Lock; more about Ruby's concurrency here: http://www.engineyard.com/blog/2011/ruby-concurrency-and-you/


18 Comments

  1. Sir

    Sir December 03, 2012

    Can you post the full domains.txt file on your gist as well?

  2. Grogenaut

    Grogenaut December 03, 2012

    Celluloid

  3. Carlos

    Carlos December 03, 2012

    Ignoring errors is not the greatest thing to do…

  4. Jason Roelofs

    Jason Roelofs December 03, 2012 http://collectiveidea.com

    @Sir: I’d rather not as it contains customer data.

    @Carlos: As this is a one-off script, re-running the script is good enough error handling for me. This would definitely be far different if it was a module run inside of a bigger application.

  5. Marcalc

    Marcalc December 03, 2012

    I know you wanted to use GO for this article, but you know you could have considered JRuby for this work, right? 

  6. jnml

    jnml December 03, 2012

    Please use gofmt whenever publishing Go source code.

  7. kikito

    kikito December 03, 2012

    Thanks for your blog post, it was very instructive.

    One question: what does this mean?

        domainMapping = append(domainMapping, on that line.

  8. kikito

    kikito December 03, 2012

    It seems the code was mangled by HTML tag stripping, but never mind, I think I figured it out. When you have a LEFT ARROW channel as a parameter, you are just using whatever that channel returns next as the parameter. I assume this is a blocking call.

  9. Jason Roelofs

    Jason Roelofs December 03, 2012 http://collectiveidea.com

    @kikito: http://golang.org/pkg/builtin/#append

    Append is a built-in function to work on the slice data type, and it always returns the modified slice because this call might resize the one you passed in or a new slice might be allocated depending on the capacity of said slice.

    Also yes [left arrow] is a blocking call.

  10. Job van der Zwan

    Job van der Zwan December 03, 2012

    Another nitpick: it’s not really parallel – it’s concurrent. For now, Go is single-core unless you explicitly tell it to use multiple cores1. As far as I can tell, your code is running single-core. Which actually makes this a nice example of how concurrency can be faster regardless!

    1 http://golang.org/pkg/runtime/#GOMAXPROCS

  11. Phil

    Phil December 03, 2012

    Here’s a list of domains I found https://raw.github.com/tarr11/Webmail-Domains/master/domains.txt

  12. john

    john December 03, 2012

    I tried this under Windows 7 and both scripts run the same… I have a Core 2 Duo. I also did: set GOMAXPROCS=2

    thx!

  13. Nico

    Nico December 04, 2012

    Go’s approach to parallelism reminds me of something…  ah!  Unix and its shells.  It’s very easy to parallelize shell scripts too…  And go channels look remarkably like pipes.  Of course, the shells are kinda sucky and outdated, so yes, Go is better.

  14. Anthony

    Anthony December 04, 2012

    Echoing a previous comment – If you were itching to give Go a try, that’s one thing, but saying that you couldn’t do it in Ruby because of the GVL is fallacious and misleading.
    You could easily have used JRuby and get an industrial-strength Ruby implementation without a GVL.

  15. Jason Roelofs

    Jason Roelofs December 05, 2012 http://collectiveidea.com

    @Anthony: I never said I couldn’t do it in Ruby. What I said was that Ruby’s GVL ensures that you won’t get full use of your system when trying to build concurrent systems. Yes you can switch to JRuby but then you’re not using Ruby, you’re using JRuby, and I wanted to branch out and try something completely outside of the Ruby ecosystem.

    @Job van der Zwan: Right, thanks for pointing that out! Had only glanced at some of that previously, I’ll be sure to remember that setting in the future.

  16. benolee

    benolee December 05, 2012 http://github.com/benolee

    @Jason: In MRI 1.9, threads blocked by IO will run in parallel. You don’t need JRuby to run a bunch of network requests on all your cores. I understand you just wanted to use Go, but please understand that the GVL doesn’t necessarily block ruby threads from running in parallel. 

    See Aaron Patterson’s MagmaRails 2012 talk for a simple demonstration http://www.youtube.com/watch?v=vERwKWqDC0c#t=11m00s

  17. benolee

    benolee December 05, 2012 http://github.com/benolee

    I wrote an example showing Ruby 1.9.3 on a Macbook Pro with a Core i7 resolving 800 random-ish hostnames: https://gist.github.com/ec353d84522531fe2bfa

    As you can see, it takes about 16 seconds, but the point is that the requests run in parallel on MRI with nothing but Thread.new.

    Don’t get me wrong, I think it’s great that you found a simple but practical example to introduce Go’s concurrency primitives, and I appreciate the time you took to write up this blog post. Kudos! I just find that there is a lot of confusion about concurrency when it comes to MRI’s thread implementation, and I think it’s a shame that Rubyists don’t realize they can parallelize IO-bound tasks.

  18. Jason Roelofs

    Jason Roelofs December 05, 2012 http://collectiveidea.com

    @benolee: If anything, I shouldn't have mentioned Ruby's GVL at all, as that ended up distracting from the point I was trying to make, which was to show my playing with concurrency in Go. Anything IO-bound is of course a very easily parallelizable task in any language, which puts us back in the realm of how hard it is to put together a good example. I never meant to say "Ruby sucks. What does this better?", but "I've done this in Ruby and I want to try another language now!" and to talk about my experiments.
