15 SQL Query Optimisation Techniques
A database is an essential software component for any business: it holds organisational data and lets businesses build features to serve their customers. Here at Halodoc, we extensively use Relational Database Management Systems (RDBMS) as well as NoSQL databases for our specific needs.
We have set up active and passive monitoring systems to check for slow queries in the database. For passive monitoring, we have enabled the slow query log and use a performance-insight monitor. For active monitoring, we have alerts on CPU, memory and application behaviour. We have also created custom scripts that trigger alerts when query execution time crosses a particular threshold; these scripts also terminate queries based on predefined criteria.
This blog presents query optimisation techniques for designing sound logic and extracting maximum performance from the database. Query logic affects the database server, application server, network, I/O and end-user experience.
Query optimisation has many benefits, some of which are listed below.
- Minimise production issues: Every slow query consumes significant CPU, memory and IOPS. Most of the production database issues we have seen are caused by unoptimised queries.
- Avoid performance issues: A slow query means a slower response time for the applications using it, which results in a poor end-user experience. It is very important to test logic/queries with sufficient data before running them in production.
- Save infra cost: Unoptimised queries require more CPU, IOPS and memory. When queries are optimised, additional load can be handled by the same server.
Basic Considerations for Logic Design
- Understand the strengths and limitations of the database systems
With orchestration systems like Kubernetes, it is relatively easy to scale application servers, but to scale and load balance the database server it is customary to use a "source-replica" structure. This structure sends write loads to the source and read loads to the replicas.
Creating and maintaining replicas, and scaling them up and down, is expensive and not as straightforward as scaling application servers.
In summary, avoid embedding complex business logic in queries. Try to create lightweight SQL queries: avoid complex joins and long-running transactions.
- Consider data types carefully
The datatype of a column is an important part of table design and has a significant impact on performance.
For example, using varchar(36) to store fixed-length data such as a UUID incurs roughly a 20% query-execution penalty compared to char(36). Conversely, using char for a variable-length column can increase both the data size and the index size.
Designing tables with a basic understanding of datatypes, such as varchar(1000) vs text, can give performance benefits. An unnecessarily large datatype increases data on disk and reduces index performance.
- Index at table-design time, not after a production issue
Indexes are a critical part of table design. In startups, new features go live very frequently, so it is very important to create indexes when the table is created, or before new logic goes into production.
It is also important not to create too many indexes on a table, as they can slow down writes.
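As a minimal sketch (the table and column names here are illustrative, not from the article), defining indexes at table-creation time looks like this:

```sql
-- Define the index up front, together with the table,
-- rather than after a production incident.
CREATE TABLE users (
  id        BIGINT PRIMARY KEY AUTO_INCREMENT,
  phone_no  CHAR(15) NOT NULL,
  user_name VARCHAR(100) NOT NULL,
  INDEX idx_phone_no (phone_no)
) CHARACTER SET utf8mb4;
```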
Use Explain Plan for Query Analysis
EXPLAIN is a great MySQL utility for inspecting query behaviour. It works for SELECT, INSERT, UPDATE and DELETE queries. An explain plan offers many insights; below are a few important ones.
- Whether an index is available.
- Whether the index is actually used.
- How many rows are scanned to produce the result.
- Whether the ordering operation uses filesort or something else.
- The length of the index used.
- The access type.
- Running SHOW WARNINGS after EXPLAIN gives additional information.
The example below shows a query execution plan produced with EXPLAIN FORMAT=JSON.
From the plan we can understand how the query executes at the database:
- An index is available, listed in "possible_keys".
- The table is doing a full table scan, since "access_type" = "ALL".
- The index is not used, as shown by SHOW WARNINGS.
- Filesort is used for the "ordering_operation".
- Row scanning is high, per "rows_examined_per_scan".
Now we can optimise the query by comparing phone_no as a string (phone_no = '12345'), and, to remove the filesort caused by ORDER BY, a composite index on (phone_no, user_name) can be added, as shown in the explain plan below.
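As a hedged sketch of the fix (the users table and its columns are assumptions, since the original plan screenshots are not reproduced here):

```sql
-- Compare the VARCHAR column against a string literal, not a number:
EXPLAIN FORMAT=JSON
SELECT user_name FROM users
WHERE phone_no = '12345'
ORDER BY user_name;

-- Composite index that serves both the filter and the ORDER BY,
-- eliminating the filesort:
ALTER TABLE users ADD INDEX idx_phone_user (phone_no, user_name);
```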
Query optimisation techniques and Common mistakes
- Avoid Using Different Character Encoding:
Joining tables that use different charsets requires implicit conversion. Most of the time the optimiser does not perform a proper implicit conversion of the charset and cannot utilise the index. We should use the same encoding across tables to make better use of the database's indexing.
Both tables below use different CHARSETs.
In the example below, a.user_name = b.customer_name compares columns with different encodings, and hence it may not utilise indexes even though an index is present and the datatypes of user_name and customer_name are the same.
Fixing this requires altering the table's charset, or converting the charset at query time as shown in the query below.
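A minimal sketch, assuming a utf8mb4 users table joined to a latin1 customers table (table names are illustrative):

```sql
-- Option 1: permanently align the charsets.
ALTER TABLE customers CONVERT TO CHARACTER SET utf8mb4;

-- Option 2: convert at query time. Note that the CONVERT() side
-- cannot use its index, so Option 1 is preferable.
SELECT a.user_name
FROM users a
JOIN customers b
  ON a.user_name = CONVERT(b.customer_name USING utf8mb4);
```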
2. Understand the datatypes of compared columns:
In the tables above, phone_no is stored as VARCHAR in the first table and mobile_no as BIGINT in the second. When two columns with different datatypes are compared in a query, the index might not be utilised.
As in the example below:
One of the columns' datatypes needs to be altered so that they match.
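A sketch under the assumption that phone_no is VARCHAR(15) in users and mobile_no is BIGINT in contacts (both tables are hypothetical):

```sql
-- The mismatched datatypes in this join can prevent index use:
SELECT a.user_name
FROM users a
JOIN contacts b ON a.phone_no = b.mobile_no;

-- Fix: alter one column so the datatypes match.
ALTER TABLE contacts MODIFY mobile_no VARCHAR(15);
```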
3. Understand column datatypes when comparing with values:
In queries, avoid comparing a column with a value of a different datatype. When a query compares different datatypes, it relies on implicit datatype conversion.
For example, in the query below, mobile_no is a BIGINT column but the query compares it with a CHAR value.
For better index utilisation, comparisons should use the same datatype: integer to integer, or varchar to varchar.
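A sketch, assuming a contacts table with a BIGINT mobile_no column:

```sql
-- Mixed types force implicit conversion:
SELECT * FROM contacts WHERE mobile_no = '9876543210';

-- Same datatype on both sides of the comparison:
SELECT * FROM contacts WHERE mobile_no = 9876543210;
```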
4. Avoid function-based clauses in WHERE conditions:
When a query applies a function to a column in the WHERE or JOIN clause, it cannot utilise the index on that column.
One way to optimise the query below is by rewriting date(mail_sent) = date(now()) as follows.
Function-based indexes are available only in certain database versions, such as MySQL 8. If this kind of function use is unavoidable in MySQL 5.7, we can create a computed (generated) column and create an index on that.
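A sketch of both approaches, assuming an emails table with a mail_sent DATETIME column:

```sql
-- Slow: the function wraps the indexed column.
SELECT * FROM emails WHERE DATE(mail_sent) = DATE(NOW());

-- Faster: rewrite as a range so an index on mail_sent is usable.
SELECT * FROM emails
WHERE mail_sent >= CURDATE()
  AND mail_sent <  CURDATE() + INTERVAL 1 DAY;

-- MySQL 5.7 alternative: a stored generated column plus an index on it.
ALTER TABLE emails
  ADD COLUMN mail_sent_date DATE AS (DATE(mail_sent)) STORED,
  ADD INDEX idx_mail_sent_date (mail_sent_date);
```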
5. Avoid using DISTINCT and GROUP BY at the same time:
The query below has a performance overhead because it uses DISTINCT and GROUP BY together; GROUP BY already makes rows distinct. Using GROUP BY as in the query below also has a drawback: it returns arbitrary values for columns (such as b.customer_name) that are neither grouped nor aggregated.
There is no need to use DISTINCT and GROUP BY together. The query could be rewritten as follows.
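A sketch with assumed orders and customers tables:

```sql
-- Redundant: GROUP BY already de-duplicates the grouped columns.
SELECT DISTINCT a.id, b.customer_name
FROM orders a
JOIN customers b ON a.customer_id = b.id
GROUP BY a.id, b.customer_name;

-- Rewritten without the extra DISTINCT pass:
SELECT a.id, b.customer_name
FROM orders a
JOIN customers b ON a.customer_id = b.id
GROUP BY a.id, b.customer_name;
```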
6. Avoid using UNION and DISTINCT at the same time:
UNION itself produces distinct records, so we need not use DISTINCT with UNION, as shown in the query below.
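For example (table names assumed):

```sql
-- DISTINCT is redundant here; UNION already removes duplicates.
SELECT DISTINCT user_name FROM users
UNION
SELECT DISTINCT customer_name FROM customers;

-- Equivalent, without the redundant de-duplication passes:
SELECT user_name FROM users
UNION
SELECT customer_name FROM customers;
```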
7. Avoid selecting unnecessary columns:
Selecting unnecessary columns wastes memory, CPU cycles and network bandwidth. Every query should select only the required columns for better performance.
For example, the query below selects all columns; it is better to select the needed column names instead of * or extra columns.
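For instance (hypothetical users table):

```sql
-- Wasteful: fetches and transfers every column.
SELECT * FROM users WHERE id = 42;

-- Better: only the columns the application needs.
SELECT user_name, phone_no FROM users WHERE id = 42;
```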
8. Avoid long-running transactions:
When a DML query is executed over a large data range, the transaction may lock rows, or even the full table, for its whole execution time.
During this lock time the database disallows concurrent transactions; other DML queries end up in a wait state, which is similar to a small downtime on table access.
Breaking a long-running transaction into smaller transactions is more efficient when making an UPDATE or DELETE query.
In the queries above, the first transaction deletes a huge amount of data because of the long date range, and the second query does not utilise an index, so it responds slowly and locks the table during execution.
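One batching sketch, assuming an audit_logs table with a created_at column (both names are illustrative):

```sql
-- Instead of one huge DELETE over a long date range, delete in small
-- batches so each transaction holds its locks only briefly:
DELETE FROM audit_logs
WHERE created_at < '2020-01-01'
LIMIT 5000;
-- Repeat from the application or a scheduled job until 0 rows are affected.
```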
9. Avoid subquery where possible:
MySQL and many other databases perform slower when an SQL statement contains a subquery. A subquery creates a temporary table to store data, and sometimes that temporary table is created on disk, further slowing query execution.
We should avoid such queries as much as possible, as they do not scale.
10. Avoid using Order by in the Subquery:
Using ORDER BY in a subquery adds overhead, as it unnecessarily filesorts the subquery's data. Such an ORDER BY only makes sense when combined with a row limit.
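For example (tables and columns assumed):

```sql
-- Wasted work: the outer join discards the inner ordering.
SELECT u.user_name
FROM users u
JOIN (SELECT user_id FROM orders ORDER BY created_at) o
  ON o.user_id = u.id;

-- The inner ORDER BY only pays off together with LIMIT:
SELECT u.user_name
FROM users u
JOIN (SELECT user_id FROM orders ORDER BY created_at DESC LIMIT 10) o
  ON o.user_id = u.id;
```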
11. Avoid using Subquery in IN and NOT IN:
In the query below, NOT IN (select id from t2) creates a temp table.
We can optimise the above query in two ways.
- Select the id values with a first query and use the result in a second query.
- Or change the subquery to a LEFT JOIN, as below.
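The anti-join rewrite, using the t1/t2 tables from the example above:

```sql
-- Original: the NOT IN subquery may materialise a temp table.
SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2);

-- Rewritten as a LEFT JOIN anti-join:
SELECT t1.*
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.id IS NULL;
```

Note that if t2.id can contain NULLs, NOT IN returns no rows at all, so the LEFT JOIN form is also safer.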
12. Avoid joining tables with OR condition:
Joining tables with an OR condition may turn the query into something like a cross join, which takes a long time and heavy resources to execute.
For example, the query below, written with OR, can be optimised by changing the OR to UNION ALL, as shown below.
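A sketch of the rewrite (tables and columns assumed):

```sql
-- The OR in the join condition can degrade into a cross-join-like plan:
SELECT a.user_name
FROM users a
JOIN contacts b
  ON a.phone_no = b.phone_no OR a.email = b.email;

-- Each UNION ALL branch can use its own index:
SELECT a.user_name FROM users a JOIN contacts b ON a.phone_no = b.phone_no
UNION ALL
SELECT a.user_name FROM users a JOIN contacts b ON a.email = b.email;
```

Rows matching both conditions appear twice under UNION ALL; use UNION if duplicates must be removed.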
13. Avoid join with not equal condition:
When a query joins with the NOT EQUAL (!=) operator, it searches all rows and performs a full table scan. Joining tables with != is a poor join pattern; the query should be rewritten with logic that does not use the != operator.
14. Understanding Transaction and Deadlock:
When queries use different range indexes and execute simultaneously, the chance of a deadlock increases.
In the example query below, the columns sale_id, item and date have indexes. When these types of queries are executed concurrently, the first query may lock rows that the second query's transaction needs to complete, and similarly the second query may acquire locks on rows the first query needs.
We should avoid these kinds of situations while making DML queries.
As example:
One way to optimise this transaction is to divide the query into two parts: get the primary keys with a first query, then use the result in a second query.
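A sketch of the two-step pattern, assuming a sales table with indexes on sale_id, item and date (the values are illustrative):

```sql
-- Step 1: read only the primary keys.
SELECT sale_id FROM sales
WHERE item = 'book' AND date = '2021-01-01';

-- Step 2: update by primary key, so concurrent transactions lock rows
-- in a predictable order and the deadlock window shrinks.
UPDATE sales SET qty = qty - 1
WHERE sale_id IN (101, 102, 103);
```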
15. Avoid leading-wildcard searches:
When a query searches for a keyword with a wildcard at the beginning, it cannot utilise an index.
It is recommended to search using like 'keyword%' instead of like '%keyword%' so the index can be utilised, as shown below. For a genuine full-text search, it is advisable to use a Full-Text Index.
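For example (users table assumed):

```sql
-- Leading wildcard: the index on user_name cannot be used.
SELECT * FROM users WHERE user_name LIKE '%john%';

-- Trailing wildcard only: the index can be used.
SELECT * FROM users WHERE user_name LIKE 'john%';

-- For genuine keyword search, use a FULLTEXT index instead:
ALTER TABLE users ADD FULLTEXT INDEX ft_user_name (user_name);
SELECT * FROM users WHERE MATCH(user_name) AGAINST ('john');
```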
Conclusion:
We have covered some of the important query optimisation techniques that we have learned over the years and that are known to produce good results. There are various other optimisation options, such as using WHERE instead of HAVING where possible, pagination, and so on.
If you diligently follow the query optimisation techniques detailed above, you can benefit from improved query performance, fewer production issues, and lower cost through reduced resource usage.
Join us
We are always looking out for top engineering talent across all roles for our tech team. If challenging problems that drive big impact enthral you, do reach out to us at careers.india@halodoc.com
About Halodoc
Halodoc is the number 1 all-around healthcare application in Indonesia. Our mission is to simplify and bring quality healthcare across Indonesia, from Sabang to Merauke. We connect 20,000+ doctors with patients in need through our Tele-consultation service. We partner with 3500+ pharmacies in 100+ cities to bring medicine to your doorstep. We've also partnered with Indonesia's largest lab provider to provide lab home services, and to top it off we have recently launched a premium appointment service that partners with 500+ hospitals that allows patients to book a doctor appointment inside our application. We are extremely fortunate to be trusted by our investors, such as the Bill & Melinda Gates Foundation, Singtel, UOB Ventures, Allianz, GoJek and many more. We recently closed our Series B round and in total have raised USD 100 million for our mission. Our team works tirelessly to make sure that we create the best healthcare solution personalised for all of our patients' needs, and we are continuously on a path to simplify healthcare for Indonesia.