Generate SQL Script in PostgreSQL
DBAs often need to generate SQL statements, specifically DDL (Data Definition Language), using SQL queries and then execute them. PostgreSQL has many features that can be helpful here.
All SQL databases provide their metadata as relational views on the catalog (i.e. database dictionary). That is not a coincidence. Codd’s rule number 4 mandates that a dynamic online catalog must be based on the relational model. Therefore, it’s logical to use SQL queries to generate DDL dynamically from the database metadata and execute the output.
YugabyteDB also uses one process per connection, mirroring PostgreSQL’s protocol and query layer. This isn’t typically a scalability problem as connections can be spread across nodes. With YugabyteDB, all nodes are equal, accepting connections with read and write transactions. Ongoing work to add a database resident connection pool—based on Odyssey—aims to address applications without client-side connection pools and microservices with too many connection pools.
Join us at our free online Distributed SQL Summit on September 12 to hear about our new Connection Manager during the keynote.
Let’s walk through an example using YugabyteDB, which is Postgres-compatible and provides the same SQL language and catalog views as PostgreSQL.
yugabyte=# select format( $SQL$ alter table %1$8I.%2$-30I add column %3$I text not null default %4$L $SQL$ , schemaname, tablename, 'user_mod', 'init' ) as ddl from pg_tables where schemaname='public' and tablename like 'customer%'; ddl ------------------------------------------------------------------------------------------------------ alter table public.customer_customer_demo add column user_mod text not null default 'init' alter table public.customer_demographics add column user_mod text not null default 'init' alter table public.customers add column user_mod text not null default 'init' (3 rows) yugabyte=# \gexec ALTER TABLE ALTER TABLE ALTER TABLE
I have generated an ALTER TABLE statement to ADD COLUMN for a few tables following a pattern (%customer
) in their name and executed the statement.
Let’s look, in detail, at each feature that I used.
The DDL statement will be a character string (text
) in the result. Simple character strings can be enclosed in quotes (‘) but they may contain quotes themselves which have to be doubled (‘ ‘). This can quickly be a nightmare to read and maintain. PostgreSQL has a simple solution: Dollar-Quoted String Constants where a tag enclosed between two dollar signs ($) replaces a quote. The advantages are twofold. You can choose:
- a tag that will not collide with any text you have in the string
- different tags to enclose quotes in quotes
Note that the tag is not mandatory, and when there’s no need for multiple ones, $$ will appear a lot. Have you seen that same thing when defining a stored procedure or some procedural code in a DO block? If so, that’s because the PLpg/SQL code is declared in a character string as text and is interpreted as such at execution time.
Here is another example. Let’s use $PL$
for the PL/pgSQL code text, and $MESSAGE$
for the text to print with NOTICE:
yugabyte=# do $PL$ begin raise notice $MESSAGE$ select I'm a quote $MESSAGE$ ; end; $PL$; NOTICE: select I'm a quote DO
This is equivalent to, but much easier to read, than:
yugabyte=# do ' begin raise notice '' select I''''m a quote '' ; end; '; NOTICE: select I'm a quote DO
Because my DDL contains single quotes, or I want to document the intention of the text string, I built my ALTER TABLE … ADD COLUMN text with $SQL$
.
The DDL I build dynamically needs to contain schemaname
and tablename
into it. Of course, I could concatenate strings like this:
select 'alter table '||schemaname||'.'||tablename||' add column' from pg_tables where schemaname='public' and tablename like 'customer%';
But identifiers should be double quoted when they contain special characters, reserved, or case sensitive words. The query could be:
select 'alter table "'||schemaname||'"."'||tablename||'" add column' from pg_tables where schemaname='public' and tablename like 'customer%';
I did that a lot with the Oracle database but PostgreSQL is better with the format() function. Omitting double quotes is a bug that you will probably encounter when you have table names that contain more than the basic characters. Similar to a printf()
function, you can leave placeholders in the text to be replaced with the function arguments:
yugabyte=# select format('Hello %s','World'); format ------------- Hello World (1 row)
The same query, without the double quoted identifiers, is much simpler:
select format('alter table %s.%s add column', schemaname, tablename) from pg_tables where schemaname='public' and tablename like 'customer%';
And format has more than %s
. I change it to %I
to add quoted identifiers:
select format('alter table %I.%I add column', schemaname, tablename) from pg_tables where schemaname='public' and tablename like 'customer%';
Note that it does more than adding double quotes for all identifiers. It does this only when needed by the SQL identifier, because of case-sensitive name, reserved words, or special characters:
yugabyte=# select format('Hello %I','World'); format --------------- Hello "World" (1 row) yugabyte=# select format('Hello %I','world'); format ------------- Hello world
I like to align the keywords and use left padding by adding a width, and right padding with an additional ‘-
‘ flag link in: %30I.%-30I
When you have single quotes, you can add them by using the dollar-quoted strings, but also with the %L
in format:
yugabyte=# select format('Hello %L','world'); format --------------- Hello 'world' (1 row)
Sometimes you need the same argument at multiple places. Rather than repeating it in the list of arguments, an additional position can be interposed as %n$
. You can try this one:
select format ('I %1$s, you %1$s, we all %1$s for ice %2$s. ', 'scream','cream');
This is how I generated my DDL statements with placeholders for schemaname
, tablename
but also some constants that I found more convenient to put as format’s arguments.
The next thing is to run it. Ending my query by ; displays the result. If I end it with \gexec
it will execute it.
Summary: Generate SQL Script in PostgreSQL
To sum up, use:
$TAG$
to replace outer quotesformat()
to build strings dynamically%s
for text,%L
for test literals in single-quotes,%I
for identifiers that may need double quotes- catalog views to get metadata
\gexec
to execute the result
To clarify, the purpose of this is not to execute arbitrary DDL statements in production. Rather, it’s meant to create scripts that are easily maintainable, free from hardcoded text, and that won’t overlook double quotes for identifiers. Additionally, using a small additional where clause when querying from catalog views can aid in creating idempotent scripts that avoid executing existing statements again.
As a further example, I used this same technique in YugabyteDB to generate the 8 sequences. Watch my YugabyteDB Friday Tech Talk (YFTT), Best Practices Using Sequences, for more information.