Critical pagination

This is a story about Moodle, PHP, Active Directory, OpenLDAP, and how I stared a problem in the face for two days without realizing what I was looking at.

Pagination

I assumed maintenance of the LDAP syncing scripts plugin (‘local_ldap’) in 2016. One thing I did was to add PHPUnit coverge which I wrote about in Writing LDAP unit tests for a Moodle plugin.

I’ve received reports about a possible bug with the plugin, Active Directory, and large numbers of users. After standing up a test Active Directory server (which is a story for another day), I’ve been extending the unit tests for the local_ldap module to take advantage of pagination.

PHP added support for LDAP pagination in PHP 5.4. Moodle added support soon after in Moodle 2.4 (MDL-36119). Beyond some ‘range’ queries on the Active Directory side, there wasn’t anything in the plugin using pagination. The queries quickly revealed problems with the Active Directory code:

1
2
3
4
5
6
7
1) local_ldap_sync_testcase::test_cohort_group_sync
ldap_list(): Partial search results returned: Sizelimit exceeded

/var/www/moodle/htdocs/local/ldap/locallib.php:144
/var/www/moodle/htdocs/local/ldap/locallib.php:626
/var/www/moodle/htdocs/local/ldap/tests/sync_test.php:176
/var/www/moodle/htdocs/lib/phpunit/classes/advanced_testcase.php:80

Adding pagination is fairly straightforward, but you have to take care because not all LDAP implementations support it. Here’s an example cribbed from Moodle’s implementation:

1
2
3
4
5
6
7
8
9
10
11
12
$connection = $this->ldap\_connect(); // Connection and bind.
$ldappagedresults = ldap\_paged\_results\_supported($this->config->ldap\_version, $connection);
$ldapcookie = '';
do {
if ($ldappagedresults) {
ldap\_control\_paged\_result($connection, $this->config->pagesize, true, $ldapcookie);
}
... // Whatever LDAP task you're doing
if ($ldappagedresults) {
ldap\_control\_paged\_result\_response($connection, $ldapresult, $ldapcookie);
}
} while ($ldappagedresults && $ldapcookie !== null && $ldapcookie != '');

If the LDAP server doesn’t support pagination all the extra code is a no-op and you pass through the do…while loop once.

I wound up adding pagination in three places: the code that gets all the groups from the LDAP server, the code that searches for all the distinct attribute values (e.g. all possible values of ‘eduPersonAffiliation’), and the code in the unit test itself which tears down the test container in LDAP. Everything passed in Active Directory, so I made what I figured would be a pro forma push to Travis to test the code against OpenLDAP. I came back from lunch to read an error message I’d never even heard of:

1
ldap_delete(): Delete: Critical extension is unavailable

Critically unavailable

I’ve spent a fair amount of time debugging bizarre LDAP problems but “Critical extension is unavailable” was a new one on me:

1
2
3
4
5
1) local_ldap_sync_testcase::test_cohort_group_sync
ldap_delete(): Delete: Critical extension is unavailable
/home/travis/build/moodle/local/ldap/tests/sync_test.php:405
/home/travis/build/moodle/local/ldap/tests/sync_test.php:67
/home/travis/build/moodle/lib/phpunit/classes/advanced_testcase.php:80

Researching this phrase led me to a discovery: if you’ve run a paginated query against LDAP in PHP, you need to tear down that connection and reconnect to the server afterwards. The code in question was in the PHPUnit code which tore down the environment between tests. It runs two queries; one deletes all the users and groups while the second deletes the organizational units. This was code I’d taken from the Moodle LDAP authentication module (‘auth_ldap’) and extended with pagination when the Active Directory tests failed.

Mocked up, this is what the code did before I modified it:

1
2
3
4
5
6
7
Establish LDAP connection
Get the top-level information about the test container
Get the users and groups from the test container
Delete those users and groups from the test container
Get the organizational units from the test container
Delete the organizational units from the test container
Delete the test container

After adding pagination and connection closures, the code did this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Establish LDAP connection
Get the top-level information about the test container
Do
Get the users and groups from the test container
Delete those users and groups from the test container
Loop
Close LDAP connection
Establish LDAP connection
Do
Get the organizational units from the test container
Delete the organizational units from the test container
Loop
Close LDAP connection
Establish LDAP connection
Delete the test container

And it still didn’t work. I played around with various permutations for hours but didn’t make any progress. It was Friday, I was tired, I went home for a three-day weekend and didn’t think about it at all (maybe a little). When I got back in the office the problem was staring me right in the face.

In the old, unpaginated code, it wasn’t a problem to invoke ‘ldap_delete’ after retrieving results, using the same LDAP connection. With pagination that logic doesn’t work anymore; we need to ensure we have all the results first, recreate the connection, then separately run the deletes. Thus modified, the logic is like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Establish LDAP connection
Get the top-level information about the test container
Do
Get the users and groups from the test container
Loop
Close LDAP connection
Establish LDAP connection
Delete those users and groups from the test container
Do
Get the organizational units from the test container
Loop
Close LDAP connection
Establish LDAP connection
Delete the organizational units from the test container
Delete the test container

The tests pass now. It’s interesting to me that this was never an issue with Active Directory, only with OpenLDAP. It’s a valuable lesson about when to refactor your code and check and your assumptions.