Enhanced Order/OrderLine Schema Performance After MSSQL Migration to YugabyteDB

October 16, 2024

In this blog, we share why some Yugabyte customers chose to migrate to YugabyteDB from Microsoft SQL Server (MSSQL), feature a quick start guide (so you can DIY), and detail a top YugabyteDB performance tuning tip!

What Issue Are MSSQL Customers Trying to Solve?

Customers have reported that when you insert into a table with a sequentially increasing key in earlier SQL Server versions, all new rows are added to the same page at the end of the B-tree until that page is full. This leads to contention under high concurrency, an issue that occurs in both clustered and non-clustered indexes.

If you are reading this, you may have previously investigated PAGELATCH_EX / PAGELATCH_SH waits.

Suppose you are facing a similar situation on your SQL server application. If you can’t easily upgrade and seek an alternative to an SQL server hash partition strategy (with its limitations and associated application changes), a database migration to YugabyteDB is a great option.

Why Choose YugabyteDB?

Unlike traditional databases, YugabyteDB is designed to automatically and transparently split data across multiple servers. This ensures horizontal scalability and high availability.

Converting traditional database schemas to YugabyteDB (PostgreSQL syntax) can be done with a “lift and shift approach.” However, you can make simple, low-effort optimizations on the physical data model which result in great performance improvements. I’ll share an example later in the blog.

How to Get Started on YugabyteDB

DML/DDL conversion
1. You can easily convert your DML/DDL from transact-sql to PostgreSQL dialect using an ORM. But, if you do not use an ORM then more effort is required. ChatGPT-4 and, more recently, Claude 3.5 Sonnet excel at DDL/DML conversion can assist you in this task – provided you prompt them correctly. Testing the converted query is crucial to ensure you still obtain the correct results. These tools can help you fully convert at least 80% of your queries.
Install yugabyted (as detailed below)
1. YugabyteDB is completely open source, but you can also use the YugabyteDB fully managed service, YugabyteDB Aeon, if your application is mission-critical and you require additional assistance and peace of mind.
2. Get up and running first with a docker download of yugabyted RF1 to complete your functional testing. When you are ready to test performance, switch to Yugabyte Aeon.
To match your SQL server as closely as possible in yugabyted RF1, I recommend starting up YugabyteDB with these settings. Copy and paste this command:
```
docker run -d --name yugabyte -p7001:7000 -p9000:9000 -p15433:15433 -p5433:5433 -p9042:9042 \
yugabytedb/yugabyte:2024.1.1.0-b137 bin/yugabyted start \
--tserver_flags="yb_enable_read_committed_isolation=true,ysql_sequence_cache_method=server,ysql_pg_conf_csv={yb_enable_optimizer_statistics=on,yb_enable_base_scans_cost_model=on}" \
 --background=false
```
This command gives you a cost-based optimizer, read committed isolation level, and your sequences cached on the server and not in the connection. This is probably what you are already used to in SQL server!
Create your schema, connect your application, load your data, run ANALYZE, and complete your functional testing.

Remember to run ANALYZE after you have loaded your data! You need some statistics with the cost-based optimizer, look out for YugabyteDB AUTO-ANALYZE coming soon. For now, schedule a weekly ANALYZE to keep your stats current.

Since you’re migrating your database to another vendor, it’s a best practice to ensure that your test suite passes successfully on YugabyteDB using the PostgreSQL syntax. After this, you can start testing at scale. I recommend you test scaling YugabyteDB on Yugabyte Aeon.

Tuning Tip! You can lift and shift 80% of your application and then tweak the remaining 20% for fantastic results.

How to Micro-Tune in YugabyteDB

Imagine running an e-commerce website on a SQL server with a traditional OrderLines schema. Your most common query is a lookup of the shopping cart or a completed purchase. This query could return three rows, or a keen shopper might have as many as 30 rows (or more).

SELECT * FROM OrderLines WHERE ol.OrderID = $1 order by ID

Let’s review an example schema and see how you can micro-tune it in YugabyteDB. Again, you can leave this completely stock without any problems as YugabyteDB will hash the serial ID and doesn’t have the traditional problems associated with serial columns on b-tree.

CREATE TABLE public.OrderLines (
	ID BIGSERIAL,
	OrderID BIGINT NOT NULL,
	ProductID BIGINT NOT NULL,
	QuantityOrdered BIGINT NOT NULL,
	QuantityDelivered BIGINT,
	QuantityToShip BIGINT,
	RequestedDate TIMESTAMP,
	ReturnedOrderID BIGINT,
	UnitPrice DECIMAL(10, 2) NOT NULL,
	Discount DECIMAL(5, 2),
	CreatedAt TIMESTAMP,
	UpdatedAt TIMESTAMP, 
	PRIMARY KEY (ID HASH)
);

Your set-up may have evolved from the traditional orderline schema. You could add new columns with different data types.

In YugabyteDB, the hash function distributes values to tablets. With this, you can avoid hot latch problems. Your order lines table can linearly scale queries per second and the total number of rows. Petabytes of data per table is possible, you just need enough YugabyteDB nodes and everything scales with you.

You probably aren’t running this lookup on IndexOnlyScan in SQL server because you need to return too many columns to the client for a covering index.

SELECT * FROM OrderLines WHERE OrderID = $1 order by ID

In YugabyteDB, we can model the primary key to satisfy the most highly accessed query for the table object, which happens to be the parent key of the child table. (You can translate this tip to most of your parent keys for your child tables. Note: most!)

PRIMARY KEY, lsm (orderid HASH, id ASC)

What does a primary key like this mean?

It means for your most important select (Reading an entire OrderID). You get the entire row from one node in a multi-node, YugabyteDB deployment. Cross-node latency is minimized for an order with multiple items (rows). Additionally Sorting on ID ASC means that no sort is introduced to the plan.

Finally, and the best part. Because the primary key index is the table, this is the same as an IndexOnlyScan, and your previous two operations in SQL server have been reduced to just one operation.

If you need to keep ID uniqueness across different orders, add a unique constraint: an explicit business role and underlying unique index to detect the uniqueness.

ALTER TABLE public.OrderLines ADD CONSTRAINT uk_orderlines_id UNIQUE (ID);

This additional unique index does not offset the primary key change because we now reduce one hop on the lookup and remove the need for multiple indexes on the table object. This means any writes have fewer write requests to do – overall, a strong net gain.

One trade-off is that range queries purely on the ID column are no longer efficient, but that would typically be an unlikely query here.

You can drop multiple indexes with this primary key change. You might have more, but here are some you may have that the primary key can now serve.

DROP INDEX Idx2 ON public.OrderLines (OrderID);
DROP INDEX Idx3 ON public.OrderLines (OrderID, QuantityDelivered, ProductID);
DROP INDEX Idx4 ON public.OrderLines (OrderID, ReturnedOrderID);

YugabyteDB can push down predicates to the remote node to filter on columns not included in the index (primary key) condition with the Storage filter (renamed from Remote filter in 2024.1 and above).

Storage filter isn’t a good operation when you have to remove a lot of rows. Still, when your average row count per OrderID is a handful, the storage filter is perfect, and all these additional indexes become redundant.

So, using this simple change we have

Removed multiple indexes that served any queries that have an equality on OrderID
Kept uniqueness on ID with a unique ID
Provided IndexOnlyScan by accessing the table through the primary key

You don’t need to make these changes to provide linear scalability, but these changes give you linear scalability at less cost. All changes like this give you more queries per second per CPU.

Conclusion

This blog covers how to get started on YugabyteDB and examines the benefits of migrating from MSSQL, as well as sharing data-modeling tips to minimize cross-node latency.

If you want to skip using yugabyted RF1 to complete your functional testing, or you are ready to test performance but would like enablement or assistance, email our YugabyteDB experts with your application schema details. The team can authorize a free Yugabyte Aeon trial for qualified apps.