A New Approach to Scale RDBMS Platforms — Yugabyte
In some areas of database engineering, we need to extend and improve our current architecture and technological infrastructure because of changing and demanding requirements. In this article, I will try to explain our main problem about scaling our database system. We discovered and tried a new approach in order to improve our stability and scalability features.
What is the Problem ?
Let’s start wtih the problem that we suffer. Currently, we’ve been using PostgreSQL as one of our core database platform. PostgreSQL, is very popular and roboust open source database platform. If you didn’t hear about PostgreSQL, you can investigate the following link.
In releational database systems, we store our data on tables. Tables consist of rows. The main problem is that, the more your table is growing in terms of total number of rows in a single table the harder it is to ensure performance(regarding query response times and efficient I/O performance). Because, the database stores rows of a table on storage blocks. Whenever a new select query is sent to database it has to search related data on storage. The time it takes on disk depends on the amount of data in table. Note: This is only one reason for performance killer there are many reasons also.
In other words, I prepared a simple scenario to demonstrate the effect of table size on query performance. In this demo, I used 2 different PostgreSQL setup that they are identical in terms of PostgreSQL version and benchmark conditions. Details as follows;
PostgreSQL Version : 12
Number of Clients : 20
Number Of Threads : 2
Total Test Duration : 300 seconds
Results show that latency in PostgreSQL that has a bigger table size is higher than the other one.In addtition to this, ability to handle transaction per second also decreases when table size is growing. So the problem is clarified. We want to solve and eliminate this obstacle.
We discovered that we’re not able to distrubute data among different PostgreSQL server in our current PostgreSQL platform. However, there are some alternatives in order to solve this problem but in our use case the other ones are not considered. The main focus for this arcticle is Yugabyte. But some alternatives as follows;
- Table archiving
- Table partitioning
How Do we Solve Problem ?
To solve this problem, we need to distrubute total amount of data among different database servers. Because data size on a single server effects the database performance in terms of query response time.
After all we decided to use a database platform which is capable of distrubuting data among server equally. Yugabyte can be used to achive this goal. Yugabyte is a database platform that can scale horizontally and distrubute the data between tablet servers. (Yugabyte glossary will be given). With the help of Yugabyte, we’re able to obtain and manage smaller size virtual machines. This leads to better I/O performance and improves database performance.
Yugabyte is a distrubuted SQL database. It is a PostgreSQL native database platform and possible to use all features belonging the PostgreSQL because it uses open source PostgreSQL enginee
How Yugabyte Solves our Main Problem ?
Yugabyte solves our problem via distrubuting data between yugabyte tablet servers. Because it can be scaled horizontally and deployed in multi-regional envrionment we’re able to distrbute data and write/read requests between servers.
As seen, we can divide data into small chunks as many as we can.
How to Configure and Use Yugabyte ?
It is easy to start using Yugabyte. You can prefer by configuring a local cluster or a distrubuted cluster consist of tablet servers. You can use the following terms when talking about Yugabyte. These are not all but enough for beginning i think.
- Yugabyte Master Server : master servers are responsible for configuration management.
- Yugabyte Tablet Server : tablet servers are responsible for storing actual data.
- Tablet : yugabyte stores data in objects called “tablets”. Tablets are distrubuted among tablet servers.
In order to start using Yugabyte we have to install required database packages and apply post configurations to start Yugabyte cluster. You can follow these steps.
All servers that need to be in the cluster must be set up. The following script can be used for basic installation.
apt-get updateapt-get install sudo -yapt-get install vim -ysudo apt updateapt-get install python3 -yapt-get install python3*pip -yapt-get install curl -yapt-get install wget -yapt-get install python-is-python3 -ywget https://downloads.yugabyte.com/yugabyte-126.96.36.199-linux.tar.gztar xvfz yugabyte-188.8.131.52-linux.tar.gz && cd yugabyte-184.108.40.206/bash bin/post_install.sh
Servers that need to be in the cluster must be installed for Mserver roles. Here, on each server, the rpc address value should be updated to be the private ipv4 address of the server. The following script should be run on all servers.
./bin/yb-master \--master_addresses ip_1:7100,ip_2:7100,ip_3:7100 \--rpc_bind_addresses ip_address_for_server \--leader_failure_max_missed_heartbeat_periods 10 \--fs_data_dirs "/root/var/data" \--placement_cloud karadeniz \--placement_region tr-kastamonu \--placement_zone tr-kastamonu-taskopru \>& /root/var/data/yb-tserver.out &
Here, placement_cloud, placement_region and placement_zone values can be given as desired in on-premise and cloud environments. However, it will be very advantageous to use concepts such as data center, hall and rack in on-premise environments. Therefore, they should also be paid attention to.
Additionally, the /root/var/data directory is used as the data path by default. A separate directory can be used here as well. It is necessary to make sure that this directory exists.
The tserver role should be deployed to the servers that should be in the cluster. The following command can be used for this.
./bin/yb-tserver \--tserver_master_addrs ip_1:7100,ip_2:7100,ip_1:7100 \--rpc_bind_addresses ip_address_for_server \--start_pgsql_proxy \--pgsql_proxy_bind_address ip_address_for_server:5433 \--cql_proxy_bind_address ip_address_for_server:9042 \--fs_data_dirs "/root/var/data" \--placement_cloud demo \--placement_region us-west \--placement_zone us-west-2a \>& /root/var/data/yb-tserver2.out &
The points to be considered above are also valid in the article. After all these operations are done, the cluster installations are completed.
To confirm the cluster installations, the WEB UI should be checked, which should be open on the 7000 port of the servers. For instance,
At node_1_ip:7000 there should be an output like the one below.
As we installed and configured Yugabyte with 3 servers we expect total data size per server to be reduced by up to one-third. Let’s compare with PostgreSQL. In this case, I will insert the same amount of data both Yugabyte and PostgreSQL and compare the total storage size for database.
The results does not complain yugabyte is better than postgres or vice versa. I also share the test environment details.
Yugabyte Benchmark Results
# of Records : 1000000
# of Ops : 1000000
# of Thread : 4
OS : CentOS
TP : 8859
Runtime(seconds) : 112
Tool : ycsb
PostgreSQL Benchmark Results
# of Records : 1000000
# of Ops : 1000000
# of Thread : 4
OS : Ubuntu
TP : 2054
Runtime : 480
Tool : pgbench
Test results show that, yugabyte is performing better under heavy load in terms of SELECT and UPDATE requests. Because, SELECT and UPDATE request can be distrubuted among the yugabyte servers. But this does not imply that it is completely better than the other.
Also, because i studied with docker i had to use different OS types.
As can be seen from these results, the distribution of requests to the database between servers leads to performance gains. In this way, as the number of yugabyte database servers increases, it is in a position to respond to more requests with the same performance values.
Since Yugabyte physically divides the data and stores it on multiple servers, the amount of data size per server is less when compared to PostgreSQL. In this case, it causes a decrease in the number of I/O requests required to complete incoming requests to the database. Thus, performance gain is achieved.
Please do not accept the results of these tests until you have tried them yourself and tested according to your needs. I carried out these tests in my own environment and under the conditions I determined. Therefore, if you are going to make such a comparison, I suggest you do tests according to your own data model and conditions. Results may vary according to need.
When To Use Yugabyte
Of course, there is no database platform fulfills your all needs regarding performance,scalability,availability and disaster side features. The important point is to define your requirements and priorities. This is the key point.
I think the most important reason for choosing Yugabyte database is that it can work as distributed. The most prominent feature of Yugabyte is that it can meet OLTP loads simultaneously in multiple geographical regions. Secondly, since there is no master-slave model in the cluster structure, it can scale INSERT requests at the cluster and geographical region level. These features indicate that Yugabyte is suitable for OLTP requests that need high availability.For detailed information please see : yugabyte FAQ