This is a story about Moodle, PHP, Active Directory, OpenLDAP, and how I stared a problem in the face for two days without realizing what I was looking at.
Pagination
I assumed maintenance of the LDAP syncing scripts plugin (‘local_ldap’) in 2016. One thing I did was to add PHPUnit coverge which I wrote about in Writing LDAP unit tests for a Moodle plugin.
I’ve received reports about a possible bug with the plugin, Active Directory, and large numbers of users. After standing up a test Active Directory server (which is a story for another day), I’ve been extending the unit tests for the local_ldap module to take advantage of pagination.
PHP added support for LDAP pagination in PHP 5.4. Moodle added support soon after in Moodle 2.4 (MDL-36119). Beyond some ‘range’ queries on the Active Directory side, there wasn’t anything in the plugin using pagination. The queries quickly revealed problems with the Active Directory code:
1 | 1) local_ldap_sync_testcase::test_cohort_group_sync |
Adding pagination is fairly straightforward, but you have to take care because not all LDAP implementations support it. Here’s an example cribbed from Moodle’s implementation:
1 | $connection = $this->ldap\_connect(); // Connection and bind. |
If the LDAP server doesn’t support pagination all the extra code is a no-op and you pass through the do…while loop once.
I wound up adding pagination in three places: the code that gets all the groups from the LDAP server, the code that searches for all the distinct attribute values (e.g. all possible values of ‘eduPersonAffiliation’), and the code in the unit test itself which tears down the test container in LDAP. Everything passed in Active Directory, so I made what I figured would be a pro forma push to Travis to test the code against OpenLDAP. I came back from lunch to read an error message I’d never even heard of:
1 | ldap_delete(): Delete: Critical extension is unavailable |
I’ve spent a fair amount of time debugging bizarre LDAP problems but “Critical extension is unavailable” was a new one on me:
1 | 1) local_ldap_sync_testcase::test_cohort_group_sync |
Researching this phrase led me to a discovery: if you’ve run a paginated query against LDAP in PHP, you need to tear down that connection and reconnect to the server afterwards. The code in question was in the PHPUnit code which tore down the environment between tests. It runs two queries; one deletes all the users and groups while the second deletes the organizational units. This was code I’d taken from the Moodle LDAP authentication module (‘auth_ldap’) and extended with pagination when the Active Directory tests failed.
Mocked up, this is what the code did before I modified it:
1 | Establish LDAP connection |
After adding pagination and connection closures, the code did this:
1 | Establish LDAP connection |
And it still didn’t work. I played around with various permutations for hours but didn’t make any progress. It was Friday, I was tired, I went home for a three-day weekend and didn’t think about it at all (maybe a little). When I got back in the office the problem was staring me right in the face.
In the old, unpaginated code, it wasn’t a problem to invoke ‘ldap_delete’ after retrieving results, using the same LDAP connection. With pagination that logic doesn’t work anymore; we need to ensure we have all the results first, recreate the connection, then separately run the deletes. Thus modified, the logic is like this:
1 | Establish LDAP connection |
The tests pass now. It’s interesting to me that this was never an issue with Active Directory, only with OpenLDAP. It’s a valuable lesson about when to refactor your code and check and your assumptions.