Distributed SQL Tips and Tricks – August 3, 2020
Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in:
Is there a way to get “updated_at” timestamps from YCQL without additional columns?
On each INSERT/UPDATE/DELETE
YugabyteDB also stores by default the timestamp when the operation occurred. This can be easily looked up by using the WriteTime
function. WriteTime
returns the timestamp in microseconds when the value was written:
ycqlsh:yb_demo> create table users(id bigint PRIMARY KEY, name TEXT); ycqlsh:yb_demo> INSERT INTO users(id,name) VALUES (1, 'Billy Johnes'); ycqlsh:yb_demo> SELECT id,name,writetime(name) from users; id | name | writetime(name) ----+--------------+------------------ 1 | Billy Johnes | 1596097449565229 (1 rows)
How to check the per-row on-disk space usage in YugabyteDB?
Data is saved in --fs_data_dirs
director(ies), which contains both the transaction logs (WAL) and SSTable files (though you can store WAL files separately using --fs_wal_dirs
).
After inserting data, quite likely many of the rows are still in WAL and memtables and not yet flushed to SSTable format on disk. And the WAL files are not in compressed format. To get a better indication, we can either load a lot more data (such that the WAL portion is insignificant) or force a manual flush of the table, and then inspect the size of just the data directories.
After filling a sample table with 1M rows:
CREATE TABLE test(a bigint, b bigint, c bigint, d bigint, PRIMARY KEY((a), b);
We can see the on disk size of the directory where the SSTable files are stored is very small:
$ du -hs yb-data/tserver/data 60K yb-data/tserver/data
This confirms that most of the data is still in WAL and memtables.
After forcing a flush of data to disk using:
./bin/yb-admin --master_addresses 127.0.0.1 flush_table <keyspace> <table>
The data directory size is as follows:
$ du -hs yb-data/tserver/data 48M yb-data/tserver/data
So approximate on-disk size per row size for the given schema is about 48 bytes. This includes the overheads for the metadata (indexes, bloom filters, internal timestamps, etc.).
How to check if a jsonb field contains a property name in YCQL?
Assuming we have a table with rows below:
CREATE TABLE testjsonb (id text PRIMARY KEY,fields jsonb); INSERT INTO testjsonb(id,fields)values('3','{"property":"value3"}'); INSERT INTO testjsonb(id,fields)values('1','{"property3":"value3"}');
We can use IF NOT fields->>'property' = null
like below:
ycqlsh:yb_demo> select * from testjsonb IF NOT fields->>'property' = null; id | fields ----+----------------------- 3 | {"property":"value3"} (1 rows)
How to enable query logging in YSQL?
Query logging can be enabled using the --ysql_log_statement
gflag in yb-tserver.
Valid values are: none (off), ddl (only data definition queries, such as create/alter/drop), mod (all modifying/write statements, includes DDLs plus insert/update/delete/trunctate, etc), and all (all statements).
The query logs will show in postgres*.log
files in the logs directory.
As an example, we’ll start a cluster with yb-ctl
and write test queries:
$ ./bin/yb-ctl start --tserver_flags="ysql_log_statement=all" $ ./bin/ysqlsh yugabyte=# create extension if not exists pg_stat_statements; yugabyte=# SELECT * FROM pg_stat_statements;
And then we can check in the logs directory:
~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs$ ls postgresql-2020-07-30_163500.log yb-tserver.guru-predator.guru.log.INFO.20200730-163500.31158 yb-tserver.guru-predator.guru.log.WARNING.20200730-163500.31158 yb-tserver.INFO yb-tserver.WARNING
And check the recorded queries: (log trimmed below)
~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs$ cat postgresql-2020-07-30_163500.log ... I0730 16:35:46.708248 31619 async_initializer.cc:80] Starting to init ybclient I0730 16:35:46.708577 31619 client-internal.cc:1977] New master addresses: [127.0.0.1:7100] I0730 16:35:46.709975 31619 async_initializer.cc:84] Successfully built ybclient 2020-07-30 16:35:56.470 UTC [31612] LOG: statement: SELECT n.nspname as "Schema", c.relname as "Name", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' WHEN 'I' THEN 'index' END as "Type", pg_catalog.pg_get_userbyid(c.relowner) as "Owner" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','p','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2; 2020-07-30 16:36:19.483 UTC [31612] LOG: statement: create extension if not exists pg_stat_statements; I0730 16:36:19.483259 31612 thread_pool.cc:166] Starting thread pool { name: TransactionManager queue_limit: 150 max_workers: 50 } 2020-07-30 16:36:21.787 UTC [31612] LOG: statement: SELECT * FROM pg_stat_statements; ...
New Documentation, Blogs, Tutorials, and Videos
New Blogs
- TPC-C Benchmark: 10,000 Warehouses on YugabyteDB
- Getting Started with SQLPad and Distributed SQL on Google Kubernetes Engine
- Distributed SQL Change Management with Liquibase and YugabyteDB on GKE
- Porting Oracle to YugabyteDB
New Videos
- What’s New in YugabyteDB 2.2
- Distributed SQL Meets PostgreSQL
- Getting Started with Hasura GraphQL & YugabyteDB on GKE
- Getting Started with YugabyteDB on GKE with Helm 3
New and Updated Docs
ICYMI, to support the release of YugabyteDB 2.2, we published these new docs:
- Transactional distributed backups – YSQL
- Transactional distributed backups – YCQL
- Online index builds: Simple and unique indexes for YCQL
- Online index builds: Simple indexes for YSQL
- Deferred constraints: YSQL now supports DEFERRABLE INITIALLY IMMEDIATE and DEFERRABLE INITIALLY DEFERRED clauses on foreign keys
- Colocated tables: Database-level colocation for YSQL
- Benchmark YugabyteDB using TPC-C
Upcoming Events
- Aug 3 @ 1:30 pm PT [4:30 pm ET] – YugabyteDB: Bringing Together the Best of Amazon Aurora and Google Spanner
- Aug 4 @ 9:00 am PT [12:00 pm ET] – Data on Kubernetes Meetup: Running Distributed SQL on K8s Using YugabyteDB
- Aug 7 @ 11:30 am PT [2:30 pm ET] – YugabyteDB Community Q&A, Topic: Cloud Native
- Aug 12 @ 10:00 am PT [1 pm ET] – Webinar: Geo-distributed SQL databases: 9 techniques to reduce cross-region latency
- Aug 17 – 20 – KubeCon + CloudNativeCon Europe 2020 Virtual
- Aug 21 @ 11:30 am PT [2:30 pm ET] – YugabyteDB Community Q&A, Topic: GraphQL
Get Started
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.