I've got a bunch of spare hardware and I'm really fascinated by optimising C code and Linux, and by working with distributed systems. To combine these, I'm keen to start coding distributed solvers for (NP?) problems, all based on Open Source software.

I've done a few experiments with Condor on Linux, using a super simple "problem": adding up a set of 1 million integers. I chopped the data set into blocks of 1000 and scheduled a very simple test program that adds up the numbers in one block and submits the result back to the central Condor server. I would like to build on this, with the objective of furthering my understanding, again concentrating on optimising (the solver code, Linux) and doing this over many machines - distributed.
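
To make the setup concrete, here is a minimal sketch of the kind of program I mean. It is not the exact code I ran: the names `sum_block`, `num_blocks`, `sum_all_blocks` and `BLOCK_SIZE` are just illustrative, and the last function only stands in locally for what the scheduler and central server do between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1000u  /* integers per job, as in the experiment above */

/* The work of a single job: add up one block of integers.
 * A 64-bit accumulator avoids overflow when summing 32-bit values. */
int64_t sum_block(const int32_t *block, size_t n)
{
    assert(block != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += block[i];
    return total;
}

/* Number of jobs needed to cover n_values integers (rounds up). */
size_t num_blocks(size_t n_values)
{
    return (n_values + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Local stand-in for the distributed run: split the data into blocks,
 * "run" each job by calling sum_block, and combine the partial sums
 * the way the central server would. */
int64_t sum_all_blocks(const int32_t *data, size_t n_values)
{
    int64_t total = 0;
    size_t blocks = num_blocks(n_values);
    for (size_t b = 0; b < blocks; b++) {
        size_t start = b * BLOCK_SIZE;
        size_t remaining = n_values - start;
        size_t len = remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE;
        total += sum_block(data + start, len);
    }
    return total;
}
```

With 1 million integers this gives 1000 independent jobs, each returning a single number, which is why the network traffic is so small.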

Optimising - I'd like to stretch my knowledge of C, making the code more efficient than just a for loop. In particular, one thing I'd love to experiment with is CUDA. Doing this with a nice simple problem like adding up integers means I can concentrate on optimisation, but adding integers is a little /too/ simple.
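
As an example of what I mean by going beyond a plain for loop, here is a rough sketch of one classic trick: unrolling the loop over several independent accumulators so the additions don't all wait on each other. The function names are made up for illustration, and I'm assuming nothing about whether this actually wins; a compiler at a high optimisation level may well vectorise the simple loop on its own, so it would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one accumulator, one addition per iteration. */
int64_t sum_simple(const int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by four with independent accumulators, so consecutive
 * additions do not form a single dependency chain. */
int64_t sum_unrolled(const int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    /* Handle the 0-3 leftover elements. */
    for (; i < n; i++)
        a0 += v[i];
    return a0 + a1 + a2 + a3;
}
```

Both functions return the same result for any input; the interesting part is timing them against each other on real hardware.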

I'm pretty familiar with Linux, but I've not done much in terms of performance tuning, and think this could be just as interesting as improving the solver application.

Distributed - I want to really experiment with HPC, and again, Condor is the only scheduler I've used so far (I'm a bit of an Open Source fanatic).

I have quite a lot of spare hardware (say, 30+ Intel i7 machines), and a perfect solution would be a problem that involves very little network traffic & data transfer (the machines are spread out all over the place), absolutely 0 inter-node communication, and workloads that don't need a great deal of disk space. Also, pretty essential, I'd like a problem that would never end (NP?), so I can spend time continually learning in a loop - optimising, tweaking the code, optimising more, trying new hardware, optimising more, tweaking Linux, etc.

So, this is where I ask for your help, because I'm not a mathematician or a scientist: what category of workloads/problems would best suit learning about C/Linux optimisation and distributed systems/HPC? Thanks!