SOMA Crystal
SOMA News 26 Oct 2003

A Superfast Solver

It's a command line program

Watch this Newsletter, as BCSolve is currently being updated with even more features, and the Article will expand with a description of the program.
Obs:- New information at the end of the text.

Version: 0.82 Release. - - 1. Nov. 2003 The Program, and some data files
This program is released to the public domain. You are free to distribute it, as long as you distribute this license with it. You may use it for any non-commercial purpose. You may not modify it or reverse engineer it. Any damage that it causes to your computer or anything else is your responsibility.

Copyright 2003
Bruce Cropley

Bruce & Rubic's cube
From Melbourne in Australia, Bruce Cropley has been making a breathtakingly fast puzzle solver. This amazing program is currently supporting:

SOMA, Double SOMA, SOMA+Plus, and the Bedlam cube.

Although it is a DOS like, command line program, this BCSOLVE will produce all solutions to a given puzzle in extremely short time.

The BCSOLVE manual:

Discrete packing puzzle solver.
Version 0.82, 1 November 2003
Author: Bruce Cropley (cropleyb at
Usage: BCSolve082 [-dQS] [-s piece_set] [-F sols] [-ffigurefile] [-m model]

   -d: Show startup debugging (initial # moves per piece)
   -Q: Don't show solutions found. (for testing)
   -S: Show solution statistics
   -s: [SEDB]: Use which piece set
       SOMA, Extended SOMA, Double SOMA, Bedlam
       Usually defaults using the number of cubelets in the figure.
   -F: sols: Calculate a given number of solutions. Default is all.
   -f: figurefile: Solve a figure described in the figure file.
       Default is a cube. (Soma and Bedlam)
   -m: Model to solve within the file.
       Defaults to the first in the file.
       The solver just searches for the first line starting
       with / that contains the given model name.

Example:   BCSolve082 -ss -fcubit3.fig >cubit3.res
Included in the zip file are a few sample figure template files; these are example inputs to the -f argument.

It uses the number of cubelets in the figure to determine which piece set to use:

Anything other number and it will currently refuse to solve it. I plan to eventually offer support for any combination of pieces, as apparently does ISHINO Keiichiro's solver.

Note that BCSolve only copes with figures where the entire piece set is used.

Here's a win32 executable for the BCSolve for you. It was originally built with g++ under Linux, was ported first to Mac OS 10.2 and is currently also built with g++ for windows machines.

About Solvers
These are the 5 Soma solvers that we currently know about:

1. Bundgaard/McFarren's QBASIC program: Courtney made the algorithms, Thorleif the graphics and user interface. This is possibly the slowest solver on the list (1.5 seconds per figure on the average), but it's QBASIC? and the figures are easy to show in the web.

2. Ross Canning's Windows solver: Slow (1 sec./fig.), but very easy to use, and small in size to boot. Have the most amazing and easy graphical user interface.

3. Bob Nungester's Windows solver: Faster (.5 sec/fig.). NOTE: all of the above are timed for a SINGLE solution.

4. Sivy Farhi's DOS solver: Very fast. It can solve multiple solutions in a second.

5. Bruce Cropley's BCSolve: Extremely fast, written in GNU C++ using binary bit flipping to achieve a blazing speed. Solving multiple solutions in less than 1 second.

About BCSolve

Bruce writes (interspersed with snippets of conversation with Thorleif and Courtney):
I wrote BCSolve starting in late 2000. When I was a teenager my family played with a Soma puzzle, which I still have. I bought a Bedlam puzzle and was unable to solve it after 3 weeks of concerted effort. Being a professional software engineer, I had the idea of writing a program to put it back into the original shape. I realised that the problem was computationally very demanding and spent a couple of weeks writing an optimised by inflexible and hacky solver to just solve the Bedlam cube problem. I got it to find all 19186 solutions in a time of around 16 hours, and then forgot about it for a few years.

In August 2003, I started working on the solver again. I had the idea of using it as a way of practicing what I preach with regards to good software implementation. That is, writing test code first, refactoring, doing regular version control checkins, and other such things. For more information about my inspiration for quality software, have a look at and I was also keen to write something really FAST because I was (and still am) considering consulting work speeding up software.

BCSolve doesn't have any graphics yet. It is written in Object-Oriented C++ and writes out a textual representation of all the solutions it finds.

It finds 11520 SOMA cube solutions (including all rotations and reflections) and writes them to a file (~850K) in 0.56s on my 2.4GHz Pentium 4 running Windows XP with 256MB RAM. I am currently working on trimming the solutions so that each unique solution type is found only once. This is proving very tricky. *grin*

It finds solutions to the "Bedlam" cube puzzle at a rate of around 16.5 solutions per second on the same machine. In the Bedlam puzzle, there are 13 pieces - 12x5cubes and 1x4cubes. Each piece is different and there are 19186 unique solutions.

So how does it work?

Fast machine operations for exploring the search space

In C/C++ you can do a bitwise AND and bitwise OR something like this:

int a=0x06;
int b=0x14;

int a_or_b = a|b; // == 0x16
int a_and_b = a&b; // == 0x04

These bitwise operations will usually convert directly to a single instruction in machine code. I didn't actually need to use any assembly code. C++, being a compiled language, converted my bitwise operations into fast machine instructions.

Now if you represent each piece placement as a 32 bit or 64 bit integer (1 bit per cubelet), and you also represent the current occupancy of the entire cube as an integer of the same size, you can use very fast low level machine instructions to place and unplace pieces from the current figure state, and to check the validity of a given piece placement.

Bits and arbitrary figures

Let's assume we want to solve any arbitrary figure, and assume we allocate 1 bit for each small cube. We still have to formulate connection rules, telling which bits are connected, and thus which are available for positioning by each puzzle piece.

Initially I was only solving a Bedlam cube, so the algorithm that maps each cube into a bit of the integer only had to handle that case.
At that stage, I represented the 4*4*4 cube as 64 bits (in GNU C this is an "unsigned long long" native type) and converted each cubelet like so:

Cubelet(x,y,z) -> 1 << (x + 4*y + 16*z) for x,y,z in [0,3]

PiecePosition BITWISE-AND Structure NOT-EQUALS 0
THEN we can't put the piece in that position.

Note that only one of the cubelets has to collide to cause a problem.

However it doesn't really matter what the mapping is between the cubes in the figure and the bits.
The solver only really has to do conversions between 3d and bits when mapping each possible piece position to a (bit string == integer) figure occupancy initially, and the reverse when it finds solutions.
All the collision detection, placement and unplacement code just operates on the (integer) occupancies.

The way I've coded it is that missing cubes in the figure don't get allocated bits.
Anything outside the figure is just not considered during the search, only when we're initially working out all the possible positions for each piece.

Algorithmic Optimisations

Each time you place a piece, what can you then tell about what is possible?

Each piece can be placed in many positions.
But once we have placed a piece, that reduces the set of options for each remaining piece. We want to minimise the number of branches that we take at each node of the search tree. If we cut down the options remaining at each level, there is less to do at the later pieces if we get that far.

Now another thing we want to do is minimise the wasted time on searching when it is possible to tell that there are no solutions from a given position. The sooner we can tell that we are on a dead end branch the better.

Parity Optimisation

BCSolve doesn't use the parity optimisation yet, that could yield substantial improvements for some piece sets such as Double SOMA. I'm not sure how much it would help with the Bedlam cube as there are two +3/-3 pieces, 12 +1/-1 pieces and 1 0 piece.
I guess that you can only tell that there can be no solutions if you get further away from the desired parity (Bedlam: 0) than it is possible to return to with the remaining pieces.

Double SOMA optimisation

Whenever you have two or more identical pieces, they could be swapped in a given solution. We probably only want one such solution. One way of ensuring that you only get one of each such solution is to examine an ordering of the piece occupancies before placement.

For example, if the positions are allocated a number (eg board occupancy bit string), and we have 2 identical piece:

Piece 1 positions
3, 5, 6

Piece 2 positions
3, 5, 6

Then we can say we only want the solutions where piece 1 is numerically below piece 2.

ie we don't want any solution such as: piece 1: 5, piece 2: 3

You can generalise this to N identical pieces.

In the code, when you're going through the piece 2 positions, you would just say:

if piece2.position <= piece1.position move on to next piece2 position.

This should reduce the number of positions that you have to recurse into for each second piece by a factor of 2.

27 = 128, so it should speed it up by about that.

All Cubelets Coverable Optimisation

At each level of the search, BITWISE-OR up all the remaining positions of all the remaining pieces. BITWISE-OR this with the current figure state. If there are any bits missing from the figure solution state, then there is no way that we can possibly solve the figure from here.

All Pieces Placeable Optimisation

This optimisation of the BCSolve algorithm relies upon knowing that all the pieces given are in the solution. If any of the pieces cannot be placed anywhere at any level of the search, the figure is defined to be not solvable from there, and that branch is abandoned.

Piece Must Cover Optimisation

After we've chosen a piece to place but before we start iterating through all it's possible positions:

I've actually applied this to all the remaining pieces at each level of the search, as that should reduce their moves for all subsequent levels of the search.

Display characters

I have a function which sets up a set of SOMA pieces. I pass it a character which is the first piece letter. It then uses ASCII addition to calculate each successive piece character.

So - how does your program label the pieces:
In SOMA (1234567 VLTZABP)

In SOMAplus (V,L,T,Z,A,B,P,D,I,Q,S) 0,1,2,3,4,5,6,7,8,9,$

In DoubleSOMA 1,2,3,4,5,6,7,V,L,T,Z,A,B,P
First set: Second set:
a,b,c,d,e,f,g ; A,B,C,D,E,F,G

Nov 2004:- New information.
I haven't been spending any time on my solver over the last year. In that time, we've moved house again, had another child and been flat out at work as usual.

In January I made another optimisation to the solver that sped it up by about 35% on my Mac. I haven't tried it on my PC. I might write it up on my weblog. The change was to the core function that partitions the possible moves into possible and invalid moves.

In case you're interested, the optimisation went something like this:
   Old code:
   for each possible move:
       check if it is possible after making a new move
       if (possible)
           put it in array of new possible moves
           put it in array of new impossible moves

   copy new possible moves to start of array
   copy new impossible moves to end of array

   New code:
   for each possible move:
       check if it is possible after making a new move
       if (possible)
           put it in back in same array at the start
           put it in array of new impossible moves

   copy new impossible moves to end of array

The reason that this speeds it up so much is that it uses less cache RAM.
By not using one of the temporary arrays, we hit the same slot again soon after the previous contents are moved out.

It is amazing how much faster L1 and L2 cache is than main memory - it is comparable to the difference between main memory and disk.

Remember to try Bruce's WebLog here.

Submitted by Bruce Cropley <cropleyb at>
Edited by Thorleif Bundgaard <>

BACK to news index