Monday, December 17, 2012

The Architecture of Salvus (or, a bunch of my favorite programs)

Components

  • VPN : tinc, connects all computers at all sites into one unified network address space with secure communication
  • SSL : stunnel
  • Client : CoffeeScript client library that runs in web browser
  • Load balancer : HAProxy
  • Database : Apache Cassandra -- distributed, NoSQL, fault tolerant; this is the *only* long-term stateful part of the system
  • Compute : VMs running some TCP servers (e.g., python2, sage, console, projects, python3, R)
  • Hub : written in Node.js; SockJS server; connects with *everything* -- compute servers, Cassandra DB, other hubs, and clients.
  • HTTP server : nginx
  • Admin : Python program that uses paramiko to start/stop everything, configure VMs, etc.
  • Cloud : (mostly) KVM virtual machines in various places, plus public clouds...
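
One way to read the list above is as an inventory in which exactly one entry holds long-term state. The following is only a sketch of that reading in Python: the `Component` class, the `stateful` flag, and the helper `stateful_roles` are hypothetical names invented for illustration, not part of Salvus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of the component list (hypothetical model, not Salvus code)."""
    role: str
    software: str
    stateful: bool = False  # True only if it holds long-term state


# Roles and software names are taken from the list above.
COMPONENTS = [
    Component("VPN", "tinc"),
    Component("SSL", "stunnel"),
    Component("Client", "CoffeeScript library in the browser"),
    Component("Load balancer", "HAProxy"),
    Component("Database", "Apache Cassandra", stateful=True),
    Component("Compute", "TCP servers running in VMs"),
    Component("Hub", "Node.js with SockJS"),
    Component("HTTP server", "nginx"),
    Component("Admin", "Python with paramiko"),
    Component("Cloud", "KVM virtual machines and public clouds"),
]


def stateful_roles(components):
    """Return the roles whose long-term data must survive a restart."""
    return [c.role for c in components if c.stateful]


if __name__ == "__main__":
    print(stateful_roles(COMPONENTS))
```

Under this reading, every component except the database could in principle be stopped and recreated without losing long-term data, which is what makes the admin program's start/stop role tractable.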

ASCII Art Diagram

   Client    Client    Client   Client  ...
     /|\
      |
   https://salv.us (stunnel, sock.js)
      |
      |
     \|/
 HAProxy load balancers ........                      Admin  
 /|\       /|\      /|\      /|\
  |         |        |        |
  |http1.1  |        |        |
  |         |        |        |
 \|/       \|/      \|/      \|/
 Hub<----> Hub<---->Hub<---> Hub  <----------->   Cassandra <-->  ...
           /|\      /|\      /|\
            |        |        |
   ---------|        |        | (tcp)
   |                 |        |
   |                 |        |
  \|/               \|/       \|/
 Compute          Compute  Compute   Compute ...
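
To make the bottom half of the diagram concrete, here is a minimal Python sketch of a hub relaying a request to one of several compute servers over TCP. It is not the Salvus implementation (the real hub is written in Node.js): the newline-delimited JSON messages, the round-robin choice of compute server, and the names `ComputeHandler`, `start_compute_server`, and `Hub` are all assumptions made for illustration.

```python
import json
import socket
import socketserver
import threading


class ComputeHandler(socketserver.StreamRequestHandler):
    """Hypothetical compute server: answers one JSON request per line."""

    def handle(self):
        for raw in self.rfile:  # one newline-delimited JSON message per line
            request = json.loads(raw)
            try:
                # Illustration only: eval with empty builtins is NOT a sandbox.
                # In the architecture above, isolation comes from the VM.
                value = eval(request["code"], {"__builtins__": {}}, {})
                reply = {"id": request["id"], "result": repr(value)}
            except Exception as exc:
                reply = {"id": request["id"], "error": str(exc)}
            self.wfile.write((json.dumps(reply) + "\n").encode())


def start_compute_server():
    """Start a compute server on a free localhost port in a daemon thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ComputeHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class Hub:
    """Hypothetical hub: forwards code to compute servers in round-robin order."""

    def __init__(self, compute_addresses):
        self.compute_addresses = list(compute_addresses)
        self.requests_sent = 0

    def execute(self, code):
        index = self.requests_sent % len(self.compute_addresses)
        self.requests_sent += 1
        message = {"id": self.requests_sent, "code": code}
        with socket.create_connection(self.compute_addresses[index], timeout=5) as conn:
            conn.sendall((json.dumps(message) + "\n").encode())
            with conn.makefile("r") as reader:
                return json.loads(reader.readline())


if __name__ == "__main__":
    servers = [start_compute_server() for _ in range(2)]
    hub = Hub(server.server_address for server in servers)
    print(hub.execute("2 + 3"))
    print(hub.execute("1 / 0"))
    for server in servers:
        server.shutdown()
        server.server_close()
```

The sketch opens a new TCP connection per request to stay short; a long-running hub would more plausibly keep persistent connections to its compute servers and match replies to requests by `id`.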

Monday, December 10, 2012

What is the relationship between the IPython notebook and Salvus?

I'm at a conference at Brown University this week with Fernando Perez (who started IPython), so I'm writing a short post about some of the relationships between IPython and Salvus. I'll hopefully post again later this week once I've had more time to talk with Fernando.  We are both giving talks about notebooks on Wednesday.
  • The IPython Notebook is currently mainly aimed at single users running everything on their own computer, but Salvus is aimed in the exact opposite direction -- at tens of thousands of simultaneous users on a large cluster.
  • There is currently no code overlap between the two projects, except that they depend on some of the same third-party libraries (e.g., MathJax and CodeMirror).
  • The IPython notebook's backend is implemented using Tornado, but I implemented Salvus's using Node.js (for dynamic content) and Nginx (for static content).
  • When I last checked, the IPython frontend was implemented in straight JavaScript, whereas I'm implementing Salvus's frontend in CoffeeScript.  I had actually implemented the dynamic backend of Salvus using Tornado, but completely rewrote it because I needed better asynchronous database support and wanted to share more code between the client and server.
  • IPython is BSD-licensed open source code that is available right now and that anybody can very easily install on their own computer. In contrast, Salvus isn't available yet, and will (for now) only be available as a web app, since I designed it from the ground up to run on a distributed cluster.
  • IPython, Salvus, and the Sage notebook all have a similar feel, since they were all inspired by Mathematica's Notebook interface. I personally never used Mathematica notebooks, but a student of mine, Alex Clemesha, who did much early work on the Sage Notebook, was a big Mathematica user.
  • At some point, I hope to use the infrastructure (hardware, virtual machine management, etc.) I've developed for Salvus to also make the IPython notebook available as part of Salvus.
  • I'm designing Salvus mainly for using Sage, whereas IPython targets running pure Python and numpy/scipy code.
  • With Salvus, I'm putting more work into compatibility with Android and iOS devices, since I use my iPad 3 and Nexus 4 a lot. E.g., for tab completions in Salvus, I rewrote the CodeMirror completions listing so it is usable on my phone (many standard things are almost usable).
  • A huge amount of the work that went into the Sage Notebook over the years was to support embedded 2d and 3d graphics, interactive widgets and controls (the @interact decorator), a debugger, integrated Cython support, mathematics typesetting (MathJax), etc. For Salvus, I consider having all this functionality a very high priority, though it is a daunting amount of work to implement. I don't know whether interactive widgets (say) are a high priority for IPython, or even on their roadmap, but I'm sure I'll find out this week. Incidentally, the analogue of interact for Salvus will be much better than what is in the Sage Notebook, since many students and other users have asked me about fundamental shortcomings of interact over the years, and I'm addressing them from the ground up in the new implementation.
  • Salvus and Sage's underlying worksheet format is arbitrary HTML, whereas IPython's is Markdown. This has pros and cons.

Thursday, December 6, 2012

What is Salvus?

I started the Salvus project as a successor to the free public Sage Notebook server, which has over 100,000 accounts, despite frequently failing under load. The Salvus web server software is a complete rewrite from scratch of the notebook server, with a rethought design, running on hardware at the University of Washington and other providers (including Servedby.net). The primary goal is to make Sage and other sophisticated free open source mathematical software available to a large number of simultaneous users.

I will soon introduce Pro accounts that provide users with dedicated compute resources for commercial and research-level computations and courses, at a level far above what I can possibly provide for free. The revenue from the Pro accounts will go toward paying for hardware, hosting, and support, and toward improving Salvus and Sage itself. If enough users sign up for Pro accounts, the resulting revenue will enable me to push Sage development far beyond what I've been able to do using government grants and volunteer work.

What about source code? Unlike the Sage notebook, Salvus is a large distributed web application, not a program designed to run on a single user's computer. The commercialization center at the University of Washington imposed the condition that not all of the Salvus source code be open initially. However, I anticipate that much open source code will eventually come out of the project.

William Stein, University of Washington, wstein@uw.edu