Recovering Apple’s Wiki After Storage Failure or How I Learned to Love pg_resetxlog

Posted: February 2nd, 2016 | Author: | Filed under: Mac OS X, Mac OS X Server, Mountain Lion, postgres, Wiki | No Comments »

Recently I received a panicked phone call from a fellow sysadmin who was in a real jam. He had a customer who was dumping all their knowledge into Apple’s Wiki system running on top of Mountain Lion and Server 2.2.5. The storage system in the mini failed and they had to recover from backup. However, the backup was set up using Carbon Copy Cloner, and as we all know, you cannot rely on a file-based backup system to back up a running postgres database.

After the data was restored the machine did boot, but none of the postgres services would start, including the wiki. After reviewing the logs for quite some time I found some entries of pgstat wait timeout and then no log entries for about a day. I assumed that this was our hard drive failure window. Then two days later the log started producing tons of postgres crash statements, launchctl statements, and this little nugget:

Jan 19th 13:29 database system was interrupted

This was all the information I needed. From what I can tell, between the time that Carbon Copy Cloner calculated changes and the time that it copied the data, some minute things changed within the database, so CCC didn’t get a proper clone. It appears that this error occurs when the database engine no longer knows where to start writing data back into the database. Basically, the counters were broken and had to be reset. Luckily postgres ships a tool for this called pg_resetxlog.

The command has this basic structure:

pg_resetxlog
-x XID set next transaction ID
-m XID set next multitransaction ID
-o OID set next OID
-l TLI,FILE,SEG force minimum WAL starting location for new transaction log
/path/to/database/directory

Now the Apple Wiki postgres data is held within /Library/Server/PostgreSQL\ For\ Server\ Services/Data which is an important detail to hold onto. Within this directory are all the bits of info you’ll need to run the following calculations. You’ll also need this decimal to hex converter.

Source: http://www.postgresql.org/docs/9.0/static/app-pgresetxlog.html

A safe value for the next transaction ID (-x) can be determined by looking for the numerically largest file name in the directory pg_clog under the aforementioned postgres data directory, adding one, and then multiplying by 1048576. Note that the file names are in hexadecimal. It is usually easiest to specify the switch value in hexadecimal too. For example, if 0011 is the largest entry in pg_clog, -x 0x1200000 will work (five trailing zeroes provide the proper multiplier).
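The arithmetic above is easy to get wrong by hand, so here is a minimal bash sketch of it. The function name next_xid is made up for illustration; it only does the "add one, multiply by 1048576" step on a hexadecimal pg_clog file name and touches nothing on disk.

```bash
#!/bin/bash
# Hypothetical helper: turn the largest pg_clog file name (hex) into a -x value.
next_xid() {
    # 16#NAME makes bash read the name as hexadecimal; 1048576 is 0x100000.
    printf '0x%X\n' $(( (16#$1 + 1) * 1048576 ))
}

next_xid 0011   # prints 0x1200000, matching the example from the docs
```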

A safe value for the next multitransaction ID (-m) can be determined by looking for the numerically largest file name in the directory pg_multixact/offsets under the data directory, adding one, and then multiplying by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to specify the switch value in hexadecimal and add four zeroes.

A safe value for the next multitransaction offset (-O) can be determined by looking for the numerically largest file name in the directory pg_multixact/members under the data directory, adding one, and then multiplying by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to specify the switch value in hexadecimal and add four zeroes.
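As a rough sketch of the two multixact calculations, the snippet below finds the largest hexadecimal file name in a directory and applies the "add one, multiply by 65536" rule. Both function names are invented for this example, and it assumes the file names are all the same width (as they are in pg_multixact), so a plain byte-order sort matches numeric order.

```bash
#!/bin/bash
# Hypothetical helper: print the largest hex-named entry in a directory.
# Assumes fixed-width names, so sorting in the C locale equals numeric order.
largest_hex_name() {
    ls "$1" | grep -E '^[0-9A-Fa-f]+$' | LC_ALL=C sort | tail -n 1
}

# Hypothetical helper: turn that name into a value for -m (or -O).
next_multixact() {
    # 65536 is 0x10000, which is where the "add four zeroes" shortcut comes from.
    printf '0x%X\n' $(( (16#$1 + 1) * 65536 ))
}

next_multixact 0000   # prints 0x10000
```

On the server itself you would point largest_hex_name at pg_multixact/offsets or pg_multixact/members under the data directory, most likely via sudo -u _postgres since the directory belongs to that user.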

The WAL starting address (-l) should be larger than any WAL segment file name currently existing in the directory pg_xlog under the data directory. These names are also in hexadecimal and have three parts. The first part is the “timeline ID” and should usually be kept the same. Do not choose a value larger than 255 (0xFF) for the third part; instead increment the second part and reset the third part to 0. For example, if 00000001000000320000004A is the largest entry in pg_xlog, -l 0x1,0x32,0x4B will work; but if the largest entry is 000000010000003A000000FF, choose -l 0x1,0x3B,0x0 or more.
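The WAL rule can be sketched the same way. The function below is a hypothetical helper, not part of postgres: it splits a 24-character segment name into its three 8-digit hex parts and bumps the third part, rolling over into the second part when the third is already at 0xFF.

```bash
#!/bin/bash
# Hypothetical helper: given the largest file name in pg_xlog,
# print a TLI,FILE,SEG triple suitable for -l.
next_wal_location() {
    local name=$1
    local tli=$(( 16#${name:0:8} ))    # timeline ID, kept as-is
    local file=$(( 16#${name:8:8} ))   # second part
    local seg=$(( 16#${name:16:8} ))   # third part, must stay <= 0xFF
    if (( seg >= 255 )); then
        file=$(( file + 1 ))
        seg=0
    else
        seg=$(( seg + 1 ))
    fi
    printf '0x%X,0x%X,0x%X\n' "$tli" "$file" "$seg"
}

next_wal_location 00000001000000320000004A   # prints 0x1,0x32,0x4B
next_wal_location 000000010000003A000000FF   # prints 0x1,0x3B,0x0
```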

Once you have these four values you’re ready to try it out on your database. But before I began I requested a full bootable clone of the server as it was when they restored it. I then placed this clone into a VM in Fusion and took a snapshot of the VM before trying anything. Also, don’t forget that when you want to issue commands to the Apple postgres service you have to use the full path to the commands and run them as the _postgres user. My final command, which recovered the wiki system AND Profile Manager, looked like this:

sudo -u _postgres /Applications/Server.app/Contents/ServerRoot/usr/bin/pg_resetxlog -f -x 0x100000 -m 0x10000 -o 0x10000 -l 0x1,0x2,0x18 /Library/Server/PostgreSQL\ For\ Server\ Services/Data

Feel free to reach out if you are having issues.


Using Munki’s nopkg to Push User Level Profiles

Posted: July 24th, 2015 | Author: | Filed under: Mac OS X, munki | Tags: | No Comments »

I recently needed to push some user level profiles, CardDAV to be specific. I use Meraki MDM, but the custom mobileconfig profile would only install as a device profile, so I turned to my new munki install instead. If you’re not familiar with nopkg, check out this post: http://grahamgilbert.com/blog/2014/07/27/personal-automation-munki-part-2/

 

First, make sure you know the unique identifier of your profile. As an example we’ll use com.company.carddav.

Next, create a folder on your munki repo called profiles, then copy the profile into this folder.

Use the following bash scripts in your pkginfo file to check, install, and uninstall the profile as needed:

<key>installcheck_script</key>
<string>#!/bin/bash
# Exit 0 tells Munki the install is needed; exit 1 means the profile is already present.
USER=`/usr/bin/who | grep console | cut -d ' ' -f1`
sudo /usr/bin/profiles -P | grep com.company.carddav | grep "$USER"
if [ $? -eq 0 ]; then
exit 1
else
exit 0
fi
</string>


<key>postinstall_script</key>
<string>#!/bin/bash
# Download the profile and install it for the user logged in at the console.
USER=`/usr/bin/who | grep console | cut -d ' ' -f1`
/usr/bin/curl -L1 http://munki.yourmunkirepo.com/profiles/com.company.carddav.mobileconfig -o /tmp/profile.mobileconfig
sudo -u "$USER" /usr/bin/profiles -L -I -F /tmp/profile.mobileconfig
exit 0
</string>


<key>uninstall_method</key>
<string>uninstall_script</string>
<key>uninstall_script</key>
<string>#!/bin/bash
# Download the profile again so it can be passed to profiles for removal.
USER=`/usr/bin/who | grep console | cut -d ' ' -f1`
/usr/bin/curl -L1 http://munki.yourmunkirepo.com/profiles/com.company.carddav.mobileconfig -o /tmp/profile.mobileconfig
sudo -u "$USER" /usr/bin/profiles -L -R -F /tmp/profile.mobileconfig
exit 0
</string>


Open Directory Crashing from Wildcard SSL Certificate

Posted: March 14th, 2015 | Author: | Filed under: LDAP, Mac OS X Server, SSL | No Comments »

I encountered an issue recently where I imported a wildcard certificate into an Open Directory server. The import itself was fine, but as soon as I tried to select the certificate, Open Directory immediately stopped working.


2015-03-14 10:42:07.113 AM com.apple.launchd[1]: (org.openldap.slapd[20606]) Exited with code: 1
2015-03-14 10:42:07.113 AM com.apple.launchd[1]: (org.openldap.slapd) Throttling respawn: Will start in 10 seconds
2015-03-14 10:42:17.150 AM com.apple.launchd[1]: (org.openldap.slapd[20612]) Exited with code: 1
2015-03-14 10:42:17.150 AM com.apple.launchd[1]: (org.openldap.slapd) Throttling respawn: Will start in 10 seconds

To diagnose, I turned ldap off by way of launchd:

sudo launchctl unload /System/Library/LaunchDaemons/org.openldap.slapd.plist

And then told openldap to launch in debug mode without forking:

sudo /usr/libexec/slapd -d 99 -F /etc/openldap/slapd.d/

To which I received this reply:

TLS: attempting to read `/etc/certificates/server.inside.tld.ca.6C66FD3E997A9FD902DEA9050EE3F9A58EF63742.key.pem'.
TLS: could not use key file `/etc/certificates/server.inside.tld.ca.6C66FD3E997A9FD902DEA9050EE3F9A58EF63742.key.pem'.
TLS: error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch /SourceCache/OpenSSL098/OpenSSL098-47.2/src/crypto/x509/x509_cmp.c:406
55047382 main: TLS init def ctx failed: -1

That’s strange, I thought, so I cracked open /etc/openldap/slapd.d/cn=config.ldif in vim and found that at the bottom of the file the cert and the key had not changed over properly.

olcTLSCertificateKeyFile: /etc/certificates/server.inside.tld.ca.6C66FD3E997A9F
D902DEA9050EE3F9A58EF63742.key.pem
olcTLSCertificatePassphrase: "Mac OS X Server certificate management.6C66FD3E9
97A9FD902DEA9050EE3F9A58EF63742"
olcTLSCertificateFile: /etc/certificates/*.inside.tld.ca.8597F1FABB98A20805065
751BA49E3076EF84E60.cert.pem
olcTLSCACertificateFile: /etc/certificates/*.inside.tld.ca.8597F1FABB98A208050
65751BA49E3076EF84E60.chain.pem

Notice how the certificate key file does not match the cert or chain file? It’s like Server.app b0rk3d on parsing the wildcard symbol while modifying this file. The only way I’ve figured out to get OD back on its feet after this disaster is to remove these lines from cn=config.ldif and reboot the OD server. Even if I hand code the cert in, Open Directory stops crashing but the secure LDAP service still does not come up.
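If you want to confirm a mismatch like the one slapd reported before touching the config, one common approach is to compare the RSA modulus of the certificate and the key with openssl. The sketch below is only an illustration: the function name cert_matches_key is made up, it assumes an RSA key, and a passphrase-protected key (like the ones Server.app writes to /etc/certificates) will make openssl prompt unless you supply the passphrase with -passin.

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the private key belongs to the certificate.
# Exit 1 means mismatch; exit 2 means one of the files could not be read.
cert_matches_key() {
    local cert=$1 key=$2 cert_mod key_mod
    cert_mod=$(openssl x509 -noout -modulus -in "$cert" 2>/dev/null) || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$key" 2>/dev/null) || return 2
    # Both commands print "Modulus=<hex>"; identical output means a matching pair.
    [ "$cert_mod" = "$key_mod" ]
}

# Usage (paths are placeholders; quote them, since wildcard cert names contain a literal *):
# cert_matches_key "/etc/certificates/your.cert.pem" "/etc/certificates/your.key.pem" && echo "pair matches"
```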

I’ve since switched to an internal CA and now make a cert for each FQDN, which has been a way better experience.


How to Push Watchman Monitoring Windows Agent

Posted: November 10th, 2014 | Author: | Filed under: Uncategorized | No Comments »

Recently, I was granted access to the Windows beta agent. In a word, amazing. Truly, Allen and the guys at watchman have done an amazing job. Now, I have most of my clients enrolled in Meraki Systems Manager and I wanted to be able to push this agent to them without getting in the user’s face. I came up with the following and please keep in mind, I’m NOT a Windows sysadmin.


mkdir C:\temp
bitsadmin.exe /transfer "MSI" http://www.yourdomain.com/path/to/MonitoringClient.msi C:\temp\MonitoringClient.msi
bitsadmin.exe /transfer "regfile" http://www.yourdomain.com/path/to/monitoringclient.reg C:\temp\monitoringclient.reg
Regedit /s C:\temp\monitoringclient.reg
Msiexec.exe /I C:\temp\MonitoringClient.msi

I take this code and paste it line by line into the “Command Line” feature of Meraki Systems Manager.

For more info on Watchman Monitoring Windows Beta go here.
For Meraki Systems Manager go here.


How to Automate FileMaker Server Fail Over on the Mac

Posted: September 24th, 2014 | Author: | Filed under: Filemaker, Mac OS X, Mac OS X Server | No Comments »

I have this managed services client, amazing client, easily my best one and my most favourite. Their workflow relies heavily upon their client roster database, which is built on top of FileMaker. Recently I was doing their quarterly audit and noting all the single points of failure in the network. What I realized during this process was that FileMaker Server running on top of a Mac Mini Server is a pretty big single point of failure. Of course I have FMServer doing regular backups, but when confronted with the question "What is my recourse when the host running FM Server dies?" the answer was: quickly install FM Server on a different machine, pull last night’s backup out of nearline storage, and put the server back on its feet. Sounds not too bad, right? Wrong! There’s still another single point of failure.

Me. I’m a busy man! I don’t have time to deal with fires every morning; if that’s how I worked I would’ve gone mental or quit years ago. I need automated server fail-over, and without the ability to virtualize (due to budget, not hardware) I was at a loss. Hmmmm what to do…. Maybe I should bash it? Maybe I should bash it so hard that at the end there’s a bash script to bash it for me.

After a lot of bashing I have this script. With the help of cron, every 15 minutes it attempts a TCP connection to port 5003 on the primary, trying up to three times. If the port responds even once, the script runs an rsync job from the remote database backup folder to the local database backup folder and finally copies the latest backup into the databases folder. The idea here is for the latest backup created on the primary FileMaker server to be the production database on the backup. If the script cannot connect to port 5003 on the primary, it fires the local FileMaker up and sends out an email alert, and will continue to do so.
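To give a feel for the "try three times" port check, here is a minimal bash sketch. It is not the code from the syncFM script; the function name port_responds and its arguments are invented for this example. It uses bash’s built-in /dev/tcp redirection, which has no connect timeout of its own, so a host that silently drops packets could stall it.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if host:port accepts a TCP connection
# at least once within the given number of tries, 1 otherwise.
port_responds() {
    local host=$1 port=$2 tries=${3:-3} delay=${4:-5} i
    for (( i = 1; i <= tries; i++ )); do
        # Opening /dev/tcp/HOST/PORT in a subshell attempts the connection
        # and closes it again as soon as the subshell exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        (( i < tries )) && sleep "$delay"
    done
    return 1
}

# Usage (the host name is a placeholder):
# if port_responds fm-primary.example.com 5003; then
#     echo "primary is up: sync backups"
# else
#     echo "primary is down: start local FileMaker Server and send the alert"
# fi
```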

This script requires a few things:

Get the script off of github: https://github.com/syntaxcollector/syncFM

Note: there are a bunch of variables you’ll need to change at the top of the script. Once deployed, ensure that the final result is an FMP12 file inside your replica database folder. If not, edit the script: at the bottom you’ll find the rsync command, just tweak it for your environment. Post questions to comments.


Open Directory Replication 10.8.5 problems with Kerio Connect 8.3.0

Posted: June 22nd, 2014 | Author: | Filed under: Kerberos, Kerio, LDAP, Mac OS X, Mac OS X Server, Mountain Lion, Open Directory | Tags: , , , , | No Comments »

I recently was hired to implement an Open Directory Master/Replica on a network that wanted to leverage the Kerio Connect mail server. At first, all seemed fine. I created the directory and the replica, and installed the Kerio extension on both servers as instructed by the fine folks at Kerio. Now I’d just like to say that this is different from what I remember in the days of 10.6. Back then you only had to install the OD extension on the master; the replica would then copy the schema over so that it could import the extended schema data at that time.

The problem comes into play when you have a master with already provisioned users in Kerio and you want to add an OD replica. Since the replica does not copy over the extended LDAP schema it is unable to replicate any provisioned users. The result is that those users will not exist in the replica which is bad news if you have services relying on that replica. To resolve this problem use the following procedure on the replica you wish to build:

sudo slapconfig -createreplica <master IP> diradmin

Once complete, install the Kerio extension.

slapconfig -stopldapserver
slapadd -v -F /etc/openldap/slapd.d -c -l /var/db/openldap/openldap-data/backup.ldif
slapconfig -startldapserver

#gowellandinpiece
#replication


The New Customer Challenge

Posted: April 27th, 2014 | Author: | Filed under: Insight, Mac OS X, Mac OS X Server, Work | No Comments »

I’m an Apple consultant. I help small businesses who want nothing to do with the decision making aspect of technology: planning, budgeting, procurement, deployment, support, deprecation, and recycling. Out of all these contexts no task is more challenging than workstations.

For those who are in the field, you know what I’m talking about. You get a new customer, they have workstations… some are new, some are old, some have MacKeeper, the bastard ones are carrying old migrated home folders that originated from 10.4 and a Cisco VPN kext. Some have 16 mail accounts filling 70% of the disk but since they’re “disabled” in Mail.app you don’t see them at first. Now you have to dig to find out where the space is. Do this across 10 – 50 workstations and you will soon realize why I went bald early.

I needed a quick and dirty way to get some very specific data out of the machine and into a little text file. Yes, I’m sure there are MDM tools or what have you that will track every widget I don’t care about, but I don’t want that. It’s about workflow: if I don’t get an idea of what I’m stepping into before I step into it, I may find out something nasty far too late. In other words, I wouldn’t deploy an MDM before getting an idea of what’s going on.

Introducing sysAudit.sh: feel free to download here

usage: sysAudit.sh -c <client name> -s <ftp server> -u <username> [-p <password>]
OPTIONS:
-c unique identifier for audit, a folder of this name will be made on your ftp server
-s ftp server fqdn/path sans protocol ie: mybigfat.ftpserver.com
-u username to connect to ftp server
-p password for username, will prompt if none given
NOTE:
Requires root privileges to successfully deduce all features

Once I begin relations with the new customer I immediately gain admin access to all their machines. After placing the script somewhere on the web I can then push it out through ARD with something like this:

curl -o /tmp/sysAudit.sh http://www.copiousit.com/sysAudit.sh; chmod +x /tmp/sysAudit.sh; /tmp/sysAudit.sh -c clientname -s ftp.server.com -u ftpuser -p "ftpuserpass"

I also have it wrapped in AppleScript so that I can pop it over email to any remote machines, usually along with Meraki MDM as well. Just place this code into Script Editor, then save as an application. Place sysAudit.sh inside the package of the AppleScript app.

## change the switch arguments!
set path_ to (path to me as string)
set p to POSIX path of path_
## p ends with a slash; quoted form keeps app paths containing spaces intact
do shell script quoted form of (p & "sysAudit.sh") & " -c clientname -s ftp.server.com -u ftpuser -p 'ftpuserpass'" with administrator privileges


Automated Backups of Mac OS X Server 2.2.2

Posted: April 27th, 2014 | Author: | Filed under: DNS, Mac OS X, Mac OS X Server, Mountain Lion, Open Directory | No Comments »

Hi Everybody!

So I’ve been in the Mac game for quite some time now, and all along I was longing for a good automated backup solution. A few years ago a colleague and I got together and wrote osx-backup.sh, a simple shell script with a few variables inside. Simply edit the shell script and then install it as a cron job to run nightly. Features of this backup script include:

  • Open Directory archiving
  • Service Plists
  • CalDAV/CardDAV database
  • Profile Manager database
  • DNS records
  • Wiki database and binary files
  • Webmail

I’ve been using this script for years now under 10.6, 10.7 and 10.8. The version listed here is for Server 2.2.2 under 10.8.5.

Restoration of these backups is fairly simple to do as long as you know some postgres commands. Here’s the article on how to restore the wiki.

Calendar and webmail are fairly similar. DNS restoration is just a matter of placing the files back in /var/named and /etc/named.conf.

If you need to restore an Open Directory archive, you should use Apple’s latest knowledge base instructions. Just make sure that the server hostname matches the backup.

To restore OS X Server settings plists:

sudo serveradmin settings < /path/to/your-sa_backup-servicename-plist

Get the code here.


How to Rebuild Software RAID 10 in Mountain Lion – Command Line

Posted: February 27th, 2014 | Author: | Filed under: Mac OS X, Mac OS X Server, RAID | No Comments »

Removal

First, replace the disk 😉

Open a terminal session on the affected system and run:
diskutil appleRAID list
Find the UUID for the affected drive and the UUID for the RAID set, then fill in the blanks below:
sudo diskutil appleRAID remove <drive UUID> <RAID UUID>
For example:
sudo diskutil appleRAID remove EEDD0AD6-C448-48F7-A766-001C65338E99 7010C337-829C-4F08-B6A4-1C8A9E943CBD
Our RAID 10 now only has three disks attached.

Rebuild

First we need to identify the spare disk waiting for us in the system. Use diskutil list to do this. Here’s some example output. See if you can spot the disk that is not like the others.

# diskutil list

/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *119.9 GB   disk0
   1:                        EFI                         209.7 MB   disk0s1
   2:                  Apple_HFS ServerHD                118.9 GB   disk0s2
   3:                 Apple_Boot Recovery HD             784.2 MB   disk0s3
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk1
   1:                        EFI                         209.7 MB   disk1s1
   2:                 Apple_RAID                         2.0 TB     disk1s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk1s3
/dev/disk2
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk2
   1:                        EFI                         209.7 MB   disk2s1
   2:                 Apple_RAID                         2.0 TB     disk2s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk2s3
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk3
   1:                        EFI                         209.7 MB   disk3s1
   2:                 Apple_RAID                         2.0 TB     disk3s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk3s3
/dev/disk4
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                                                   *2.0 TB     disk4
/dev/disk5
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS Storage                *4.0 TB     disk5
thuja:~ pwladmin$

 

We can see in this list that disk4 has no partition map and thus is the new disk. We can now add this disk into our degraded RAID with:
sudo diskutil appleRAID add member <NewMemberDeviceName> <RAID UUID>
For example:
sudo diskutil appleRAID add member disk4 7010C337-829C-4F08-B6A4-1C8A9E943CBD
Your disk is now part of the raid set. If you use diskutil appleRAID list you’ll be able to check the progress of the rebuild.

===============================================================================
Name: Thuja RAID A
Unique ID: 920F03EB-DE44-49AA-9934-0EF53EF032D1
Type: Mirror
Status: Online
Size: 2.0 TB (2000054910976 Bytes)
Rebuild: manual
Device Node: -
-------------------------------------------------------------------------------
# DevNode UUID Status Size
-------------------------------------------------------------------------------
0 disk1s2 D4BCB349-3255-473B-B586-EAF066C5BD6D Online 2000054910976
1 disk3s2 E01DB36B-CDC4-458C-AC07-507433DCB481 Online 2000054910976
===============================================================================
===============================================================================
Name: Thuja Stripe RAID
Unique ID: 9D9FEE5F-5F04-4051-A0AB-A985DFFAF2A0
Type: Stripe
Status: Online
Size: 4.0 TB (4000109559808 Bytes)
Rebuild: manual
Device Node: disk5
-------------------------------------------------------------------------------
# DevNode UUID Status Size
-------------------------------------------------------------------------------
0 -none- 920F03EB-DE44-49AA-9934-0EF53EF032D1 Online 2000054779904
1 -none- 7010C337-829C-4F08-B6A4-1C8A9E943CBD Online 2000054779904
===============================================================================
===============================================================================
Name: Thuja RAID B
Unique ID: 7010C337-829C-4F08-B6A4-1C8A9E943CBD
Type: Mirror
Status: Degraded
Size: 2.0 TB (2000054910976 Bytes)
Rebuild: manual
Device Node: -
-------------------------------------------------------------------------------
# DevNode UUID Status Size
-------------------------------------------------------------------------------
0 disk2s2 13AFF0CD-77FB-4E14-9A89-A09C01ACA4C4 Online 2000054910976
1 disk4s2 EAE79161-3729-41FB-81A1-97CE878C1E31 1% (Rebuilding) 2000054910976
===============================================================================
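The two eyeball checks in this post (spotting the disk with no partition map, and finding the rebuild percentage) can be roughly scripted. The sketch below is a heuristic based only on the output format shown above; the function names are made up, and the awk patterns may need adjusting if your diskutil prints its columns differently.

```bash
#!/bin/bash
# Hypothetical helper: read `diskutil list` output on stdin and print the
# identifier of every disk whose entry 0 has no TYPE column, meaning the
# second field is already the size (which starts with an asterisk).
blank_disks() {
    awk '$1 == "0:" && $2 ~ /^\*/ { print $NF }'
}

# Hypothetical helper: read `diskutil appleRAID list` output on stdin and
# print "member percent" for each member that is still rebuilding.
rebuilding_members() {
    awk '/\(Rebuilding\)/ { print $2, $4 }'
}

# Usage on the affected system:
# diskutil list | blank_disks
# diskutil appleRAID list | rebuilding_members
```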

 


CrashPlan PROe 3.8.2010.2 on Mountain Lion 10.8.5

Posted: February 22nd, 2014 | Author: | Filed under: CrashPlan, Mac OS X | 1 Comment »

I recently had an issue where I could not get the CrashPlan PROe server running on a 10.8.5 Mac Mini. The app installed, but when I asked for the management interface on port 4280 I was greeted with a URL redirect and a blank white page. I was also getting this in /Library/Logs/PROserver/proserver.startup.err:


[02.22.14 11:45:02.159 INFO main temPropertiesLoader.loadSystemProperties] * loading properties from: conf/proserver.properties
com.code42.exception.DebugRuntimeException: Failed to start CPCentralServices.
at com.backup42.app.cpc.CPCentralServices.init(CPCentralServices.java:297)
at com.backup42.controller.CPCentralController.start(CPCentralController.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.backup42.main.CPServiceManager.invokeAll(CPServiceManager.java:120)
at com.backup42.main.CPServiceManager.start(CPServiceManager.java:89)
at com.backup42.main.CPServer.start(CPServer.java:123)
at com.backup42.main.CPServer.main(CPServer.java:387)
Caused by: java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:431)
at java.lang.Long.valueOf(Long.java:525)
at com.backup42.server.manage.ServerManager.initializeMyGuid(ServerManager.java:64)
at com.backup42.server.manage.OsXServerManager.initializeGuid(OsXServerManager.java:170)
at com.backup42.server.manage.ServerManagerService.initializeGuid(ServerManagerService.java:568)
at com.backup42.app.cpc.CPCentralServices.init(CPCentralServices.java:159)

The solution was the following:

Stop the service: launchctl unload /Library/LaunchDaemons/com.crashplan.proserver.plist
Edit the launchd plist to point to /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Commands/java instead of /usr/bin/java.
Remove the identity file: rm /Library/CrashPlan/.proserver_identity
Start the service and profit!