Distributed SQL Tips and Tricks – June 29, 2020

June 29, 2020

Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in:

Regular expression queries (or pattern matching queries)

YugabyteDB inherits PostgreSQL upper layer, therefore it includes most PostgreSQL functionalities in a distributed cluster. One of them is REGEXP queries. PostgreSQL and thus YugabyteDB support three separate approaches to pattern matching: the SQL LIKE operator, the SIMILAR TO operator, and POSIX-style regular expressions. Some examples below:

yugabyte=# SELECT 'abc' LIKE 'c%'; -- false
yugabyte=# SELECT 'abc' LIKE 'a%'; -- true
yugabyte=# SELECT 'abc' SIMILAR TO '%(b|d)%'; --  true
yugabyte=# SELECT 'abc' SIMILAR TO '(b|c)%'; -- false
yugabyte=# SELECT 'abc' ~ '^(b|c)'; -- false
yugabyte=# SELECT 'abc' ~ '^a'; -- true

Check out the PostgreSQL docs for all examples and usages.

Getting the current timestamp in YSQL

Recently a user was getting incorrect results when inserting the current timestamp generated on the server into a column with datatype (plain) timestamp. The timestamp was generated using the now() function which returns a value whose data type is timestamptz—see below.

YugabyteDB inherits from PostgreSQL two types for timestamps: timestampz (short for timestamp with time zone) and timestamp (short for timestamp without time zone).

Both types store a bare timestamp value using the same on-disk representation. The crucial difference is due precisely to the accompanying data type metadata. It’s this that determines what happens both when you set a value and when you get a value. When you set and get a plain “timestamp” field (or PL/pgSQL variable) nothing at all happens. It’s up to you to interpret the values according to your own convention. But when you set a “timestamp with time zone” field (or PL/pgSQL variable) YSQL (Yugabyte’s SQL API) converts your value (which is understood with respect to the session’s current time zone) into UTC; and when you get it, it converts that stored UTC value to be correct in the session’s present time zone—which time zone might not be the same as it was when the value was stored.

The problem happens when we try to insert the current timestamp returned from now() into a timestamp column. now() returns the current timestamp in the session time zone, which might well not be the one that you chose as the one in which to interpret your plain timestamp values. When it’s inserted in the timestamp column, the time zone is dropped thus inserting a wrong value. An example below:

yugabyte=# create table test(a int primary key, b timestamp without time zone);
yugabyte=# select now(), pg_typeof(now());
          	now          	|    	pg_typeof    	 
-------------------------------+--------------------------
 2020-05-06 16:44:03.917735-07 | timestamp with time zone
(1 row)


yugabyte=# insert into test values (1, '2020-05-06 16:44:03.917735-07');
yugabyte=# select * from test;
 a |         	b         	 
---+----------------------------
 1 | 2020-05-06 16:44:03.917735
(1 row)

yugabyte=# show timezone;
  TimeZone
------------
 US/Pacific

We can see that the column “b” discards the time zone. This means that the stored result is wrong in the sense that it does not respect the user’s convention to understand stored plain timestamp values as UTC values. Of course, the stored value does correctly respect the defined semantics of plain timestamp and “now()”—so this is a user programmed bug! The user can remedy this by using (now() at time zone 'utc'):

yugabyte=# insert into test values (2, (now() at time zone 'utc'));
yugabyte=# select * from test;
 a |         	b         	 
---+----------------------------
 1 | 2020-06-10 19:38:22.859175
(1 row)

How to get query metrics in YSQL?

YugabyteDB has the pg_stat_statements extension from PostgreSQL installed by default. The query execution metrics can be viewed via the “YSQL All Ops” link on the https://yb-tserver-ip:9000/utilz page:

pg_stat_statements query execution metrics can be viewed via the "YSQL All Ops" yugabytedb

You can also query the data from YSQL:

yugabyte=# CREATE EXTENSION pg_stat_statements;
yugabyte=# select pg_sleep(3);
yugabyte=# SELECT query, calls, total_time, mean_time, stddev_time, rows FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
                                                      	query                                                      	| calls | total_time  |  mean_time  | stddev_time | rows
-------------------------------------------------------------------------------------------------------------------------+-------+-------------+-------------+-------------+------
 select pg_sleep($1)                                                                                                 	| 	1 | 3003.182174 | 3003.182174 |       	0 |	1
 create extension pg_stat_statements                                                                                 	| 	1 |   925.11284 |   925.11284 |       	0 |	0

For a quick example we’re using pg_sleep(3) function where the query lasts 3 seconds by sleeping. Note that each server returns only its local statistics. We’re currently working for global level query metrics.

What is the best way to check if a row exists?

The best way to check if a row exists is to use:

yugabyte=# CREATE TABLE test(id BIGSERIAL PRIMARY KEY);
yugabyte=# INSERT INTO test(id) VALUES (13);
yugabyte=# SELECT EXISTS(SELECT 1 FROM test WHERE id=11) AS "exists";
 exists
--------
 f
(1 row)

yugabyte=# SELECT EXISTS(SELECT 1 FROM test WHERE id=13) AS "exists";
 exists
--------
 t
(1 row)

The good thing about using EXISTS is that it always returns a boolean value instead of None (like in some languages such as Python) when just querying for the row:

yugabyte=# SELECT id FROM test WHERE id=11;
 id
----
(0 rows)

We’re continually adding to and updating the documentation to give you the information you need to make the most out of YugabyteDB. We had so many new and updated docs for you, that we wrote a blog post to cover recent content added, and changes made, to the YugabyteDB documentation – What’s New and Improved in YugabyteDB Docs