Creating new operators
The first thing, which has to be done, is to come up with the desired operators. Note that five operators are needed. There is one operator for each strategy. A strategy of an index is really like a plugin, which allows you to put in your own code.
Before getting started, I have compiled some test data:
CREATE TABLE t_sva (sva text);
INSERT INTO t_sva VALUES ('1118090878');
INSERT INTO t_sva VALUES ('2345010477');
Now that the test data is there, it is time to create an operator. For that purpose, PostgreSQL offers the CREATE OPERATOR command:
test=# \h CREATE OPERATOR
Command: CREATE OPERATOR
Description: define a new operator
Syntax:
CREATE OPERATOR name (
PROCEDURE = function_name
[, LEFTARG = left_type ] [, RIGHTARG = right_type ]
[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
[, RESTRICT = res_proc ] [, JOIN = join_proc ]
[, HASHES ] [, MERGES ]
)
Basically, the concept is as follows: an operator calls a function, which gets one or two parameters, one for the left argument and one for the right argument of the operator.
As you can see, an operator is nothing more than a function call. So consequently, it is necessary to implement the logic needed into those functions hidden by the operators. In order to fix the sort order, I have written a function called normalize_si:
CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$
BEGIN
RETURN substring($1, 9, 2) ||
substring($1, 7, 2) ||
substring($1, 5, 2) ||
substring($1, 1, 4);
END; $$
LANGUAGE 'plpgsql' IMMUTABLE;
Calling the function will return the following result:
test=# SELECT normalize_si('1118090878');
normalize_si
--------------
7808091118
(1 row)
As you can see, all we did is swap some digits. It is now possible to just use the normal string sort order. In the next step, this function can already be used to compare social security numbers directly. The first function needed is the lower than function, which is needed by first strategy:
CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$
BEGIN
RETURN normalize_si($1) < normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
There are two important things to note here:
- The function must not be written in SQL. It only works in a procedural or in a compiled language. The reason for that is that SQL functions can be inline under some circumstances and this would cripple the entire endeavor.
- The second issue is that you should stick to the naming convention used in this chapter—it is widely accepted by the community. Less than functions should be called _lt, less or equal to functions should be called _le, and so on.
Given this knowledge, the next functions needed by our future operators can be defined:
-- lower equals
CREATE OR REPLACE FUNCTION si_le(text, text) RETURNS boolean AS $$
BEGIN
RETURN normalize_si($1) <= normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
-- greater equal
CREATE OR REPLACE FUNCTION si_ge(text, text) RETURNS boolean AS $$
BEGIN
RETURN normalize_si($1) >= normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
-- greater
CREATE OR REPLACE FUNCTION si_gt(text, text) RETURNS boolean AS $$
BEGIN
RETURN normalize_si($1) > normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
So far, four functions have been defined. A fifth function for the equals operator is not necessary. We can simply take the existing operator because equals does not depend on sort order anyway.
Now that all functions are in place, it is time to define those operators:
-- define operators
CREATE OPERATOR <# ( PROCEDURE=si_lt,
LEFTARG=text,
RIGHTARG=text);
The design of the operator is actually very simple. The operator needs a name (in my case <#), a procedure, which is supposed to be called, as well as the datatype of the left and the right argument. When the operator is called, the left argument will be the first parameter of si_lt and the right argument will be the second argument.
The remaining three operators follow the same principle:
CREATE OPERATOR <=# ( PROCEDURE=si_le,
LEFTARG=text,
RIGHTARG=text);
CREATE OPERATOR >=# ( PROCEDURE=si_ge,
LEFTARG=text,
RIGHTARG=text);
CREATE OPERATOR ># ( PROCEDURE=si_gt,
LEFTARG=text,
RIGHTARG=text);
Depending on the type of index you are using, a couple of support functions are needed. In the case of standard B-trees, there is only one support function needed, which is used to speed things up internally:
CREATE OR REPLACE FUNCTION si_same(text, text) RETURNS int AS $$
BEGIN
IF normalize_si($1) < normalize_si($2)
THEN
RETURN -1;
ELSIF normalize_si($1) > normalize_si($2)
THEN
RETURN +1;
ELSE
RETURN 0;
END IF;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
The si_same function will either return -1 if the first parameter is smaller, 0 if both parameters are equal, and 1 if the first parameter is greater. Internally, the _same function is the workhorse, so you should make sure that your code is optimized.