Distributed SQL Tips and Tricks – March 13, 2020
Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in.
How can I UPSERT multiple rows with an update?
PostgreSQL and YSQL enable you to do upserts using the INSERT ON CONFLICT
clause. Similar to multi-row inserts, you can also batch multiple upserts in a single INSERT ON CONFLICT
statement for better performance.
In case the row already exists, you can access the existing values using EXCLUDED.<column_name>
in the query.
In the example below we have a table with where we are keeping counters of different products and incrementing rows in batches:
CREATE TABLE products ( NAME TEXT PRIMARY KEY, quantity BIGINT DEFAULT 0 ); --- INSERT INTO products(name, quantity) VALUES ('apples', 1), ('oranges', 5) ON CONFLICT(name) DO UPDATE SET quantity = products.quantity + excluded.quantity; --- INSERT INTO products(name, quantity) VALUES ('apples', 1), ('oranges', 5) ON CONFLICT(name) DO UPDATE SET quantity = products.quantity + excluded.quantity; --- SELECT * FROM products; name | quantity ---------+---------- oranges | 10 apples | 2 (2 rows)
How can I optimize YugabyteDB for faster integration testing
There are several steps that we can take to make YugabyteDB faster when testing in integration tests, CI/CD, and other testing scenarios:
- Point gflag
--fs_data_dirs
to a ramdisk directory
This will make DML, DDL and create/destroy of a cluster faster because data is not written to disk. - Set gflag
--yb_num_shards_per_tserver=1
Reducing the number of shards lowers overhead when creating/dropping YCQL tables. - Set gflag
--ysql_num_shards_per_tserver here=1
Reducing the number of shards lowers overhead when creating/dropping YSQL tables. - Set gflag
--replication_factor=1
Keeping only the tablet leaders will remove replication overhead. - Use
TRUNCATE
instead ofDROP TABLE
followed byCREATE TABLE
between test cases if you want to keep the same table schema but delete all the rows in it.
These options are all explained in the yb-tserver and yb-master configuration pages.
What is the YSQL equivalent to in SQL Server’s ISNULL()?
The expression ISNULL(expression, replacement)
in MS SQL Server returns the replacement value if the expression returns `NULL`.
The same functionality can be achieved with the coalesce(arg1,arg2,argn)
function. The function takes an arbitrary number of arguments and returns the first value that `IS NOT NULL`. Example in SQL Server:
SELECT Isnull(my_column, 'Missing') FROM table;
The same functionality in YSQL:
SELECT COALESCE(my_column, ‘missing’) FROM table;
How can I find duplicate rows using YSQL?
In many cases we want to find duplicate rows in a table to clean them up or process them. This can be easily achieved with the GROUP BY
and HAVING
clauses.
Suppose we have a table of employees with the following data:
CREATE TABLE employees ( id BIGSERIAL, first_name TEXT, last_name TEXT ); --- INSERT INTO employees (first_name, last_name) VALUES ('John', 'Smith'), ('Bill', 'Potter'), ('John', 'Smith');
You can find duplicate rows using:
SELECT first_name, last_name FROM employees GROUP BY first_name, last_name HAVING Count(*) > 1; first_name | last_name ------------+----------- John | Smith (1 row)
You can also remove duplicate rows using DISTINCT ON
:
DELETE FROM employees WHERE id NOT IN (SELECT DISTINCT ON(first_name, last_name) id FROM employees); SELECT * FROM employees; id | first_name | last_name ----+------------+----------- 1 | John | Smith 2 | Bill | Potter (2 rows)
DISTINCT ON(expression [, ...])
selects only the first row of each set of rows where the given expression (in this case the combination of first_name and last_name) are the same.
New Blogs, Tutorials, and Videos
New Blogs
- Monitoring YugabyteDB with Prometheus and Grafana in Kubernetes
- Getting Started with Falco Runtime Security and Cloud Native Distributed SQL on Google Kubernetes Engine
- Cloud Native Meets Distributed SQL: Bringing Microservices, Kubernetes, Istio & YugabyteDB Together with Hipster Shop Demo
- 5 Query Pushdowns for Distributed SQL and How They Differ from a Traditional RDBMS
New Videos
- Install YugabyteDB on macOS using Docker and yugabyted
- Longhorn Distributed Block Storage & Cloud Native Distributed SQL on Google Kubernetes Engine
- Getting Started with Distributed SQL Colocated Tables
New and Updated Docs
- Read replica clusters
- JSON data types and functionality: This section includes over 35 new pages, including:
Get Started
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.
What’s Next?
- Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
- Get started with YugabyteDB on macOS, Linux, Docker and Kubernetes.
- Contact us to learn more about licensing, pricing or to schedule a technical overview.