Four Compelling Use Cases for PostgreSQL Table Functions
In this post, I’ll discuss four realistic use cases that call for user-defined table functions and I’ll explain how each case is met using complete, self-contained code examples.
Use Case #1: Parameterized View
In my blog post “Using Stored Procedures in Distributed SQL Databases”, I described an experiment to measure the speeds of transactions with a variable number of insert statements per transaction when these are implemented respectively as a sequence of top-level database calls or as a single call to a stored procedure that has the same effect. I computed the procedure-to-top-level-call speed ratios, as a function of the number of inserts per transaction. I did this by recording the result for each test in a table and by then running code like I describe in this section.
This table is a stylized version of the table that I used:
create table results( test_no int, method varchar(20), duration dec constraint results_duration_nn not null, constraint results_duration_chk check(duration > 0), constraint results_pk primary key (test_no, method));
Use this to insert some data to illustrate the behavior of the code that I describe in this section:
insert into results(test_no, method, duration) values (1, 'Method A', 10.1), (1, 'Method B', 8.6), (1, 'Method C', 5.4), (2, 'Method A', 20.6), (2, 'Method B', 17.3), (2, 'Method C', 9.9), (3, 'Method A', 30.4), (3, 'Method B', 23.8), (3, 'Method C', 16.2);
This table function computes the speed ratios for two specified methods for all the test_no values:
create function speed_ratios( method_1 in results.method%type, method_2 in results.method%type) returns table(test_no results.test_no%type, ratio dec) language sql as $body$ select test_no, method_a_duration/method_b_duration as ratio from ( select test_no, t1.duration as method_a_duration, t2.duration as method_b_duration from results t1 inner join results t2 using(test_no) where t1.method = method_1 and t2.method = method_2) as a; $body$;
Test it like this:
select test_no as "Test No.", to_char(ratio, '990.99')||'x' as "Speed Ratio" from speed_ratios('Method A', 'Method B') order by 1;
It produces this output:
Test no. | Speed Ratio ----------+------------- 1 | 1.17x 2 | 1.19x 3 | 1.28x
Of course, you could generate a report like this with a client-side program. But doing so would lose a major benefit. The table function approach lets you run arbitrary queries to answer arbitrary questions, like this:
select to_char( (select max(ratio) from speed_ratios('Method A', 'Method B'))/ (select max(ratio) from speed_ratios('Method B', 'Method C')), '990.99') as x;
Use Case #2: Pretty Printed Ad Hoc Reports
for Administrators
In “Use case #1”, there are three distinct values for results.test_no in the example sample data that I use. That means that there are three distinct comparison tables to be looked at: “Method A” vs “Method B”, “Method B” vs “Method C”, and “Method A” vs “Method C”. Rather than producing these by executing a .sql script with the three explicit “select… from speed_ratios(…)” statements, interspersed with appropriate \echo meta-commands, it would be nicer just to execute a single stored procedure. However, the only way to get output from an ordinary PL/pgSQL procedure is to use raise info (or raise notice). When you run such a program from a script, then every output line looks like this:
ysqlsh:raise_info_demo.sql:9: INFO: Some output
Moreover, the line numbers will, in general, all be different—so you can’t simply remove this awful clutter with a simple “search and replace all” in an ordinary text editor. This is especially frustrating for developers and administrators who, using other databases, assemble a kit-bag of scripts and programs to generate useful, and usably readable, reports that—just like for the present use case, assemble the output from several separate queries. Here’s where a table function comes to the rescue. This is the basic idea:
create function pretty_report_speed_ratios() returns table(t varchar) language plpgsql as $body$ begin for j in 1..5 loop t := ''; return next; end loop; t := '=================================================='; return next; t := 'Speed Ratios Report'; return next; t := '=================================================='; return next; t := ''; return next; <<"loop over the method combinations">> for j in 1..3 loop ... end loop "loop over the method combinations"; end; $body$;
Run the report like this:
select t from pretty_report_speed_ratios();
The loop over the method combinations starts like this:
<<"loop over the method combinations">> for j in 1..3 loop declare method_1 results.method%type; method_2 results.method%type; begin case j when 1 then method_1 := 'Method A'; method_2 := 'Method B'; when 2 then method_1 := 'Method B'; method_2 := 'Method C'; when 3 then method_1 := 'Method A'; method_2 := 'Method C'; end case; <<"do each method-pair combination">> ... end "do each method-pair combination"; end; end loop "loop over the method combinations";
Here’s the final missing piece:
<<"do each method-pair combination">> declare tt varchar(20); rr varchar(20); n int := 0; begin for tt, rr in ( select to_char(test_no, '9999999'), to_char(ratio, '99990.99')||'x' from speed_ratios(method_1, method_2) order by 1) loop n := n + 1; t := tt||' '||rr; if n = 1 then t := t||' << '||quot||method_2||quot||' to '||quot||method_1||quot; end if; return next; end loop; end "do each method-pair combination";
And here’s the output produced by pretty_report_speed_ratios():
================================================== Speed Ratios Report ================================================== Test No. Speed Ratio -------- ----------- 1 1.17x << 'Method B' to 'Method A' 2 1.19x 3 1.28x 1 1.59x << 'Method C' to 'Method B' 2 1.75x 3 1.47x 1 1.87x << 'Method C' to 'Method A' 2 2.08x 3 1.88x ==================================================
We can see at a glance that “Method B” is about 1.2x faster than “Method A”, that “Method C” is about 1.6x faster than “Method B”, and that, of course, “Method C” is therefore about 1.9x faster than “Method A”.
The full source text of the pretty_report_speed_ratios() table function is presented in the appendix so that you can simply copy and paste it into ysqlsh and then run it.
Use Case #3: Dynamic IN List
Assume that we have table t with this content:
k | v -----+------ 100 | t_01 200 | t_04 300 | t_42 400 | t_04 500 | t_10 600 | t_42 700 | t_07 800 | t_16 900 | t_42
The code to create this is shown in the “Built-in SQL table functions” section under “generate_series()”. Developers and administrators usually type an explicit in list in an ad hoc query like this:
select k, v from t where k in (200, 300, 500, 900) order by 1;
But what if this is to be issued from a PL/pgSQL procedure and the in list members aren’t known until run-time? The naïve programmer tends to think that this calls for dynamic SQL where the in list is constructed programmatically. But there’s a much better, and simpler to write, way:
select k, v from t where k in ( select unnest as k from unnest( array[200, 300, 500, 900]::int[])) order by 1;
The functionality of the unnest() built-in SQL table function can be implemented as a user-defined table function thus:
create function my_unnest(ks int[]) returns table(k int) language plpgsql as $body$ declare element int; begin foreach element in array ks loop k := element; return next; end loop; end; $body$;
Use it like this:
select k, v from t where k in ( select k from my_unnest( array[200, 300, 500, 900]::int[])) order by 1;
Of course, it’s pointless just to re-implement the built-in unnest(). Moreover, the built-in can accept an actual argument of any datatype. I illustrated a few possibilities in the section that described it. You’d have to implement each overload that you needed explicitly if you programmed it using PL/pgSQL.
My purpose in showing you my_unnest() is this:
- firstly, to show you, ordinarily, how to iterate over the elements in an array in PL/pgSQL;
- and secondly, to show you the overall structure of a table function that computes values that you can then use as an in list in a subquery.
Who knows what arbitrary computation you might want to use to produce the in list values—and, of course, the built-in unnest() simply has its fixed functionality.
Use case 4: Compact syntax for bulk insert
Consider the procedure created thus:
create procedure insert_rows(vs in varchar[]) language plpgsql as $body$ declare element t.v%type; begin foreach element in array vs loop insert into t(v) values(element); end loop; end $body$;
If we have a table created thus:
create table t( k uuid default gen_random_uuid() constraint t_pk primary key, v varchar(100));
then we can populate it thus:
call insert_rows( array['one', 'two', 'three']);
I used the same “default gen_random_uuid()” approach to populate the table’s primary key column in my code example in my “Using Stored Procedures in Distributed SQL Databases” post. I pointed out there that you must make gen_random_uuid() available by installing the PostgreSQL extension like this:
create extension pgcrypto
as explained in the YugabyteDB documentation here.
Notice that the procedure simply loops over the elements of the input array and executes an explicit single-row insert for each value. This looks to be inefficient; but PL/pgSQL doesn’t support methods for bulk binding to insert, update, or delete statements.
Here’s an alternative implementation of insert_rows() that uses unnest():
create or replace procedure insert_rows(vs in varchar[]) language plpgsql as $body$ begin insert into t(v) select unnest from unnest(vs::varchar[]); end; $body$;
Of course, the use of this second implementation of insert_rows() is the same as that for the first implementation that I showed. It’s certainly shorter than the implementation that uses this:
foreach element in array vs loop insert into t(v) values(element); end loop;
It’s therefore easier to understand and review. However, my tests showed that the performance of the two implementations, and of a third that uses my_unnest() was the same within the confidence limits of my measurements.
Conclusion
When developers have a rich set of tools, they’re much better placed to implement an optimal approach for a new use case than when they’re limited by an impoverished toolkit. This three part series of blog posts has presented several use cases where table functions showed their value. User-defined table functions are part of the larger picture of stored procedures, so to use them you have to use a distributed SQL database that supports these! YugabyteDB inherits PostgreSQL’s support, and in this post series I’ve described examples implemented both in SQL and in PL/pgSQL. The examples have used a very wide range of PostgreSQL’s SQL and stored procedure functionality. The fact that all the code that I’ve shown works identically in YugabyteDB and vanilla PostgreSQL is a testament to the success of our strategy to use the upper half of the PostgreSQL query layer simply “as is” on top of our own unique storage layer. This architecture makes YugabyteDB unique among distributed SQL databases.
What’s Next?
- Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
- Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
- Contact us to learn more about licensing, pricing or to schedule a technical overview.
Appendix
Here is the complete source code for the use case “Pretty printed ad hoc reports for administrators”:
create function pretty_report_speed_ratios() returns table(t varchar) language plpgsql as $body$ declare quot constant varchar(1) not null := ''''; begin for j in 1..5 loop t := ''; return next; end loop; t := '=================================================='; return next; t := 'Speed Ratios Report'; return next; t := '=================================================='; return next; t := ''; return next; t := 'Test No. Speed Ratio'; return next; t := '-------- -----------'; return next; <<"loop over the method combinations">> for j in 1..3 loop declare method_1 results.method%type; method_2 results.method%type; begin case j when 1 then method_1 := 'Method A'; method_2 := 'Method B'; when 2 then method_1 := 'Method B'; method_2 := 'Method C'; when 3 then method_1 := 'Method A'; method_2 := 'Method C'; end case; t := ''; return next; <<"do each method-pair combination">> declare tt varchar(20); rr varchar(20); n int := 0; begin for tt, rr in ( select to_char(test_no, '9999999'), to_char(ratio, '99990.99')||'x' from speed_ratios(method_1, method_2) order by test_no) loop n := n + 1; t := tt||' '||rr; if n = 1 then t := t||' << '|| quot||method_2||quot||' to '||quot||method_1||quot; end if; return next; end loop; end "do each method-pair combination"; end; end loop "loop over the method combinations"; t := '=================================================='; return next; for j in 1..5 loop t := ''; return next; end loop; end; $body$;
Run it like this:
\t on select t from pretty_report_speed_ratios(); \t off
The \t on meta-command simply turns off the clutter of the column header (the name of the column t), the underline for that, and the row count footer.