Recovering Apple’s Wiki After Storage Failure or How I Learned to Love pg_resetxlog

Posted: February 2nd, 2016 | Author: | Filed under: Mac OS X, Mac OS X Server, Mountain Lion, postgres, Wiki | No Comments »

facepalmRecently I received a panicked phone call from a fellow sysadmin who was in a real jam. He had a customer who was dumping all their knowledge into Apple’s Wiki system running on top of Mountain Lion and Server 2.2.5. The storage system in the mini failed and they had to recover from backup, however the backup was setup using Carbon Copy Cloner and as we all know you cannot rely on a file-based backup system to backup a running postgres database.

After the data was restored the machine did boot but all the postgres services would not start, including the wiki. After reviewing the logs for quite some time I found some entries of pgstat wait timeout and then no log entries for about a day. I assumed that this was our hard drive failure window. Then two days later the log started producing tons of postgres crash statements, launchctl statements and this little nugget Jan 19th 13:29 database system was interrupted This was all the information I needed. From what I can tell, between the time that Carbon Copy Cloner calculated changes and the time that it copied the data some minute things changed within the database and so CCC didn’t get a proper clone. It appears that this error is caused when the database engine no longer knows where to start writing data back into the database. Basically, the counters were broken and had to be reset. Luckily postgres makes a tool called pg_resetxlog

The command has this basic structure:

pg_resetxlog
-x XID set next transaction ID
-m XID set next multitransaction ID
-o OID set next OID
-l TLI,FILE,SEG force minimum WAL starting location for new transaction log
/path/to/database/directory

Now the Apple Wiki postgres data is held within /Library/Server/PostgreSQL\ For\ Server\ Services/Data which is an important detail to hold onto. Within this directory are all the bits of info you’ll need to run the following calculations. You’ll also need this decimal to hex converter.

Source: http://www.postgresql.org/docs/9.0/static/app-pgresetxlog.html

A safe value for the next transaction ID (-x) can be determined by looking for the numerically largest file name in the directory pg_clog under the aforementioned postgres data directory, adding one, and then multiplying by 1048576. Note that the file names are in hexadecimal. It is usually easiest to specify the switch value in hexadecimal too. For example, if 0011 is the largest entry in pg_clog, -x 0x1200000 will work (five trailing zeroes provide the proper multiplier).

A safe value for the next multitransaction ID (-m) can be determined by looking for the numerically largest file name in the directory pg_multixact/offsets under the data directory, adding one, and then multiplying by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to specify the switch value in hexadecimal and add four zeroes.

A safe value for the next multitransaction offset (-O) can be determined by looking for the numerically largest file name in the directory pg_multixact/members under the data directory, adding one, and then multiplying by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to specify the switch value in hexadecimal and add four zeroes.

The WAL starting address (-l) should be larger than any WAL segment file name currently existing in the directory pg_xlog under the data directory. These names are also in hexadecimal and have three parts. The first part is the “timeline ID” and should usually be kept the same. Do not choose a value larger than 255 (0xFF) for the third part; instead increment the second part and reset the third part to 0. For example, if 00000001000000320000004A is the largest entry in pg_xlog, -l 0x1,0x32,0x4B will work; but if the largest entry is 000000010000003A000000FF, choose -l 0x1,0x3B,0x0 or more.

Once you have these four values you’re ready to try it out on your database. But before I began I requested a full bootable clone of the server as it was when they restored it, then I took this cloned and placed it into a VM in Fusion and snapped the VM before trying anything. Also, don’t forget that when you want to issue commands to the Apple postgres service you have to use the full path to the commands as well as use the _postgres user. My final command, which recovered the wiki system AND profile manager, looked like this:

sudo -u _postgres /Applications/Server.app/Contents/ServerRoot/usr/bin/pg_resetxlog -f -x 0x100000 -m 0x10000 -o 0x10000 -l 0x1,0x2,0x18 /Library/Server/PostgreSQL\ For\ Server\ Services/Data

Feel free to reach out if you are having issues.