X10 Version Upgraded
Hi all! Just updated the version of X10 we use from 2.1.0 to 2.2.1 (the latest release currently available) on the research branch. You don't need to update your Neptune gem at all, since these changes are all on the AppScale side of things. Just build a new image with that branch and deploy away!
Version 0.1.3 Released!
Today's release adds support for two new types of computation: the Knowledge Discovery Toolkit (KDT) and Cicero. KDT lets you write Python code to analyze graphs, and that code is automatically converted to MPI, so you can write code quickly and have it run just as fast. We have support for the newest version available (0.2-preview) and as it requires ATLAS and LAPACK, the newest AppScale build on my research branch (lp:~cgb-cs/appscale/main-cgb-research) automatically installs it. It also installs numpy and scipy (as required by KDT), which have been long requested for Google App Engine apps (although I still have to verify that we don't erase the PYTHONPATH or otherwise prevent them from being used). Here's a sample script that utilizes the Graph500 benchmark from KDT (which assumes you already have the input in S3 in a bucket named 'neptune-testbin'):
puts neptune(
:type => "kdt",
:output => "/neptune-testbin/mpi-output13.txt",
:code => "/neptune-testbin/kdt-test/Graph500.py",
:procs_to_use => 16,
:nodes_to_use => 16,
:storage => "s3",
:EC2_ACCESS_KEY => "your access key here",
:EC2_SECRET_KEY => "your secret key here",
:S3_URL => "https://s3.amazonaws.com"
).inspect
We also add support for Cicero in this release. Cicero is a framework that allows for automatic task execution over Google App Engine and AppScale, so you just write a function in Python, Java, or Go, use oration to make an App Engine app out of it, and then Cicero can execute it for you automatically. You just tell Cicero how many executions you want (so this assumes your app is embarrassingly parallel), use the cicero job type, and you are good to go! A paper on Cicero is in the works, so check back for more details! Here's a sample script you can use for Cicero:
response = neptune :type => :cicero,
:nodes_to_use => {"cloud1" => "http://myapp.appspot.com"},
:tasks => 10000,
:function => "dfsp",
:output => "/output/dfsp/"
puts response.inspect
If you use the research branch specified above, the MPI support will now automatically download all the files in whatever directory you want to run code from, whereas before it only downloaded the one file you wanted to execute (making it impossible to include libraries and header files). So get coding and let us know what you think of it!
Version 0.1.2 Released!
After a bit of a delay we have a new version of Neptune out! This time around there aren't any new features, but we have added a unit test suite to get around the horribly long time that the integration tests were taking. We're down from about half an hour now to less than a second, and still at about the same code coverage levels. With that, update your Neptune gem and enjoy!
R Job Reference
Neptune has support for R, and deploys in a non-distributed environment over a single machine.
Go Job Reference
Neptune has support for Go, and deploys in a non-distributed environment over a single machine.
Output Job Reference
To get either the results of a compute job or the code used for a compute job, you use an output job. The results or code are located in the storage backend that you're using. This could be AppDB (whatever database that AppScale is using), or a remote database. Remote databases currently supported are Amazon S3, Google Storage, or Eucalyptus Walrus (as all three are API-compliant).
The parameters that can be used in an output job are:
:type (required) - The type of job to run. For output jobs, this should be set to :output.
Input Job Reference
Before running a compute job, you first have to put some input (usually the code to compute with) in the storage backend that you're using. This could be AppDB (whatever database that AppScale is using), or a remote database. Remote databases currently supported are Amazon S3, Google Storage, or Eucalyptus Walrus (as all three are API-compliant).
The parameters that can be used in an input job are:
:type (required) - The type of job to run. For input jobs, this should be set to :input.
UPC Job Reference
Neptune has support for UPC 2.12.1, and deploys in a distributed environment using the MPI backend.
X10 Job Reference
Neptune has support for X10 2.1.0, and deploys in a distributed environment using the MPI backend.
MPI Job Reference
Let's say you want to run an MPI job, but don't know how. Well, here are all the possible parameters that can be used with MPI jobs:
Neptune Picks up Best Paper at ScienceCloud 2011!
Just presented our paper on Neptune at ScienceCloud 2011 in San Jose and picked up the Best Paper award! For those interested, here's the abstract to our paper:
In this paper, we present the design and implementation of Neptune, a domain specific language (DSL) that automates configuration and deployment of existing HPC software via cloud computing platforms. We integrate Neptune into a popular, open-source cloud platform, and extend the platform with support for user-level and automated placement of cloud services and HPC components. Such platform integration of Neptune facilitates hybrid-cloud application execution as well as portability across disparate cloud fabrics. Supporting additional cloud fabrics through a single interface enables high throughput computing (HTC) to be achieved by users who do not necessarily own grid-level resources but do have access to otherwise independent cloud technologies. We evaluate Neptune using different applications that employ a wide range of popular HPC packages for their implementation including MPI, X10, MapReduce, DFSP, and dwSSA. In addition, we show how Neptune can be extended to support other HPC software and application domains, and thus be used as a mechanism for many task computing (MTC).
And here's a link to the paper if you want to give it a read. Enjoy!
Version 0.1.1 Released!
And again we have a release! This time we have support for programs written in Go and R, so get coding! I've included sample scripts in samples/go and samples/r that run the usual "Hello world" programs, and changed the AppScale side of things not to spawn up a machine just to run these programs - most of the time they're small and fast enough not to impact the system's performance. Try it out and let me know what you think!
Also, this is the version that will be in AppScale 1.5 - pending any further releases of course :)
Version 0.1.0 Released!
And another version is out! This time we have a verbose flag - in the past, Neptune jobs would clutter up standard out with everything that was going on. Now, it only does this with the verbose flag. Set it to anything (e.g., :verbose => "blah") and you're good to go. We're packing this version as the version that will be in AppScale 1.5, so new AppScale users will get everything posted to this date and before. Enjoy!
Version 0.0.9 Released!
It's been a little while since our last release, but here it is! Version 0.0.9 adds support for Stochastic Simulation Algorithms via StochKit - just use the "ssa" job type. We'll do a post soon with the particulars of how to lay out your code and the like, so stay tuned!
Version 0.0.8 Released!
As promised, MapReduce support is once again working! Neptune 0.0.8 fixes this support, so when you use an input job to put your data into the underlying datastore, it will also put it into HDFS in case you want to use it for MapReduce later. The test suite includes test cases for regular Hadoop MapReduce via Java WordCount, and for Hadoop MapReduce Streaming via a Ruby implementation of the Embarrassingly Parallel NAS Benchmark.
Also, I forgot to mention back in the 0.0.7 release that Walrus support was fixed, so just like for Google Storage, you can run the following:
neptune(
:type => "output",
:storage => "walrus",
:EC2_ACCESS_KEY => "your access key",
:EC2_SECRET_KEY => "your secret key",
:S3_URL => "http://ip of storage box/services/Walrus"
)
We also changed it so that for all the S3-like storage backends, you need to specify the URL starting with http, so keep that in mind when deploying jobs.
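A small guard can catch a missing scheme before a job is submitted. This is only a sketch: storage_url_ok? is a hypothetical helper, not part of the Neptune gem, and it assumes that "starting with http" covers both http:// and https:// URLs.

```ruby
# Hypothetical helper (not part of the Neptune gem): checks that a storage
# URL carries an explicit http:// or https:// scheme.
def storage_url_ok?(url)
  url.to_s.start_with?("http://", "https://")
end

puts storage_url_ok?("https://s3.amazonaws.com")          # true
puts storage_url_ok?("commondatastorage.googleapis.com")  # false
```

Running a check like this on :S3_URL before calling neptune gives an early, local error instead of a failed job.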
Also, the test coverage is up to almost 87%, as we now cover many more failure conditions.
So update your Neptune gem and get coding!
Version 0.0.7 Released!
And we have a new version out! Neptune 0.0.7 adds quite a bit of stability compared to previous releases thanks to the use of automated testing via good old-fashioned Test::Unit. We also run rcov to automatically see how much code we're covering in our tests and which code in particular we're missing. Right now we're at a little less than 65% coverage.
Our fancy new automated testing also revealed a number of tiny bugs to fix (a few around the auto-generation of makefiles) and a major one - when we added input jobs in 0.0.6, we wanted to use it to make job input / output chaining easier, but as a side-effect, it broke MapReduce jobs. These jobs need their input in HDFS when they start, and with all the different storage options we support, we weren't consistently putting the input in HDFS automatically. It's still something we're working out, but it's something we will fix for 0.0.8, so stay tuned for more updates from the world of Neptune!
Automated Testing for Neptune
We're looking into writing some nice automated tests for Neptune for the next release - it's mostly done, but we're also messing around with rcov to make sure we're covering most of the interesting cases. Hopefully this will make sure we keep Neptune stable across releases, so stay tuned!
Version 0.0.6 Released!
Yet another release is out! This time around we add support for "input" jobs. Previously, whenever we wanted to run a job, we had to copy the input over from our local machine or it had to already be in the underlying datastore. But if you just wanted to place a file in the datastore for later, it wasn't do-able. But now it is! Just run this:
result = neptune(
:type => "input",
:local => "get_mapreduce_output.rb",
:remote => "/neptune-testbin/testscript.rb",
:storage => "gstorage",
:EC2_ACCESS_KEY => "your access key",
:EC2_SECRET_KEY => "your secret key",
:S3_URL => "commondatastorage.googleapis.com"
)
puts result
From our example above, we indicate where our local copy of the file is (here it's another piece of Neptune code) and where we should store it in the datastore (as these use the S3 naming convention, they should begin with a slash '/'). For the short-term, the bucket should already exist (this matters for Google Storage but not the others). This method call then returns a boolean value corresponding to whether or not the operation succeeded. So upgrade to Neptune 0.0.6 and check back soon for more updates!
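To make the naming convention concrete, here is a rough sketch of how a remote path breaks down into a bucket and a key. The helper split_remote_path is hypothetical (it is not part of the Neptune gem), and it assumes the first path component is the bucket name, as in the S3 convention.

```ruby
# Hypothetical helper: splits an S3-style remote path into bucket and key.
# Raises if the path does not begin with a slash.
def split_remote_path(remote)
  unless remote.start_with?("/")
    raise ArgumentError, "remote path must begin with '/': #{remote}"
  end
  # Drop the leading slash, then split once: bucket first, the rest is the key.
  bucket, key = remote[1..-1].split("/", 2)
  [bucket, key]
end

bucket, key = split_remote_path("/neptune-testbin/testscript.rb")
puts "bucket = #{bucket}, key = #{key}"
```

For Google Storage, the bucket printed here is the one that has to exist before the input job runs.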
Version 0.0.5 Released!
And we have a new release out! Neptune 0.0.5 adds support for alternative storage backends to be used when storing the results of Neptune jobs. Before, we always stored the output of Neptune jobs in the underlying database that AppScale uses (dubbed 'AppDB' in AppScale-speak). Now, you can store the results to Amazon S3, Eucalyptus Walrus, and Google Storage automatically!
Two different ways are available to make use of this support. If you like, you can manually specify your credentials when you run each Neptune job:
output = neptune(
:type => "mpi",
:output => "/neptune-testbin/mpi-output4.txt",
:code => "cpi",
:nodes_to_use => 1,
:storage => "gstorage",
:EC2_ACCESS_KEY => "your access key",
:EC2_SECRET_KEY => "your secret key",
:S3_URL => "commondatastorage.googleapis.com"
)
puts "job started? #{output[:result]}"
puts "message = #{output[:msg]}"
Alternatively, you can put your credentials in your environment (à la the Eucalyptus style) and Neptune will automatically pick them up:
output = neptune(
:type => "mpi",
:output => "/neptune-testbin/mpi-output4.txt",
:code => "cpi",
:nodes_to_use => 1,
:storage => "s3"
)
puts "job started? #{output[:result]}"
puts "message = #{output[:msg]}"
For the moment, we don't have automated bucket creation when using Google Storage, so if you're using it, make sure to manually create your bucket ahead of time. We'll get it resolved soon!
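If you go the environment route, it can help to check what is actually set before starting a job. The sketch below is an assumption-heavy illustration: credentials_from is a hypothetical helper, and the environment variable names simply mirror the job parameters shown above (EC2_ACCESS_KEY, EC2_SECRET_KEY, S3_URL); whether Neptune reads exactly these names is an assumption.

```ruby
# Assumed variable names, mirroring the job parameters shown above.
CREDENTIAL_VARS = %w[EC2_ACCESS_KEY EC2_SECRET_KEY S3_URL]

# Hypothetical helper: collects non-empty credentials from an
# environment-like hash (defaults to the real process environment).
def credentials_from(env = ENV)
  CREDENTIAL_VARS.each_with_object({}) do |name, creds|
    value = env[name]
    creds[name.to_sym] = value unless value.nil? || value.empty?
  end
end

missing = CREDENTIAL_VARS.map(&:to_sym) - credentials_from.keys
puts missing.empty? ? "all credentials set" : "missing: #{missing.join(', ')}"
```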
The latest AppScale branch has the necessary support for Neptune 0.0.5, and when we release AppScale 1.5, it will have this support as well. Let us know if other storage backends would be preferable in your apps!
Version 0.0.4 Released!
A quick update once more! This time around, I fixed the syntax like we talked about back on the 0.0.3 release. Let's walk through the new syntax with an example. Let's suppose we want to compile some Unified Parallel C code and run it over its MPI backend. We begin by compiling the code:
result = neptune(
:type => "compile",
:code => "ring",
:output => "/baz",
:copy_to => "ring-compiled"
)
puts "out = #{result[:out]}"
puts "err = #{result[:err]}"
So here I've specified the type of job to run (a compilation job), where my code is located (in a folder named "ring"), and where to copy the compiled code to (a folder named "ring-compiled"). My "ring" folder has a Makefile in it that says:
all:
	/usr/local/berkeley_upc-2.12.1/upcc --network=mpi -o Ring Ring.c
The latest AppScale branch includes the UPC compiler, and the next release (1.5) will include it as well. So we compile our code and specify that the MPI backend should be used with the Neptune job / Makefile from above, and then can run our code over four nodes as follows:
output = neptune(
:type => "mpi",
:code => "ring-compiled/Ring",
:nodes_to_use => 4,
:procs_to_use => 8,
:output => "/baz/output"
)
puts "job started? #{output[:result]}"
puts "message = #{output[:msg]}"
Since our UPC code is compiled to use the MPI backend, we specify MPI as the type of job to run, as well as the location of our compiled code and where the output should be placed. As usual, we also specify how many machines we want to run over, but a new feature in 0.0.4 (when paired with the latest and greatest AppScale) is the ability to specify how many processors are needed. Here we specify 8 processors over 4 nodes, so each machine will get two processors scheduled for computation.
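The arithmetic behind that split can be sketched in a few lines. This is illustrative only: procs_per_node is a hypothetical helper, and it assumes processes are spread evenly across nodes, with any remainder going to the first few nodes. The actual scheduling on the AppScale side may differ.

```ruby
# Illustrative only: assumes an even spread of processes across nodes,
# with any remainder assigned to the first nodes.
def procs_per_node(procs_to_use, nodes_to_use)
  base, extra = procs_to_use.divmod(nodes_to_use)
  Array.new(nodes_to_use) { |i| i < extra ? base + 1 : base }
end

p procs_per_node(8, 4)   # => [2, 2, 2, 2]
```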
That should give you just enough to get going on Neptune 0.0.4. Happy coding!