Types of Database Index
A database index is a data structure used to speed up database queries. It is similar to the catalog of books and can help quickly locate a certain row or rows in a table. A database index usually consists of a set of index keys (or index fields) whose values are stored in a data structure for fast lookup of specific rows.
Index type
From how indexes are implemented, they can be classified as follows:
- B+ tree index
- hash index
- Spatial indexes (e.g. R-trees)
- Full text index
- Bitmap index
Each index type is suitable for different data types and query types, and the appropriate index type should be selected according to specific needs.
B+ Tree Index
B+ tree is a balanced tree, each node contains a certain number of keywords and pointers to child nodes to support fast query and sorting. B+ tree index is suitable for various data types and supports range query and sorting, so it is widely used in many database systems.
When the database executes a query, it first searches the B-tree index for the node containing the keyword required for the query, and then finds the data record containing the keyword through the pointer. If the query is a range query, the B-tree index can also query nodes containing keywords within the range and return all data records that meet the conditions.
Advantages
- Support fast query and sorting, improve the performance of the database.
- Supports various data types, can index numeric, string and date/time data.
- Can handle large amounts of data because it is a balanced tree.
Disadvantages
- When inserting or deleting data, the B+ tree may need to be rebalanced to maintain the nature of the balanced tree, so the insertion or deletion operation may be slower.
- When updating data, it is necessary to delete the original data record and create a new data record, so it may consume a lot of time and space.
Hash Index
Hash index is a hash table-based index that uses a hash function to map data to buckets in the hash table.
When the database executes a query, it uses a hash function to calculate the hash value of the keyword required for the query, and then looks for a data record with the same hash value in the hash table. If multiple data records are found with the same hash value, comparison operators must be used to determine the actual data record.
Advantages
- Very fast when querying data because buckets are directly accessible.
- It is especially effective for equivalent queries, because the required data can be directly located.
- No sorting, so no extra time and space consumption.
Disadvantages
- Range queries and sorting are not supported.
- Hash collisions can cause queries to be slower because the data records must be further checked for matches using comparison operators.
- Not available for all data types, only for numeric data.
- Insert, update and delete operations may require rehashing to maintain the hashtable nature.
Spatial Index
Spatial index is an index specially used to deal with spatial data, usually used in geographic information system (GIS) and spatial database. It is an index for quickly querying the position and geometric relationship of spatial data.
Spatial indexes typically use grid structures such as grids, trellis graphs, or quadtrees to divide spatial data into many small grid cells. Each grid cell stores one or more data points within it, and uses spatial algorithms to quickly query other grid cells to determine the geometric relationship of the data points.
Advantages
- Can quickly query the location and geometric relationship of spatial data.
- It is very effective for querying the range of spatial data, such as rectangular range, circular range, etc.
- Projections and views for fast querying of data.
Disadvantages
- Relatively complex and requires knowledge of many spatial algorithms.
- Not suitable for non-spatial data.
- When the amount of data is large, a large amount of memory may be required.
Full Text Index
A full-text index is an index for fast searching of text content. It can search a text field in a database, or search the contents of a file in a document management system. A full-text index usually performs word segmentation on the text content (decomposes the text into words), and uses a data structure (such as an inverted index) to store word-segmented information for fast query of search terms.
Advantages
- You can quickly search text content.
- You can search for vague words.
- Can handle multilingual text.
Disadvantages
- Not applicable for non-text data.
- May require significant computational and storage resources, especially when the text data is large.
- There may be misjudgment or misidentification.
Bitmap Index
A Bitmap index is a special data index that speeds up queries by storing data as a binary bitmap (Bitmap). Each bit represents whether a data record meets a certain condition. For example, if a bit is 1, it means that the record meets a certain condition, and if it is 0, it means that the record does not meet the condition.
Advantages
- Efficient. Bitmap index can quickly perform a large number of logical operations, thereby improving query efficiency.
- Space efficiency. Bitmap index is very compact when storing data because it only needs to store 0 and 1 binary bits.
Disadvantages
- Difficulty updating data. When data changes, the entire Bitmap index needs to be regenerated, which can be time-consuming.
- Applies only to certain types of data. Bitmap indexing is only suitable for data with a limited number of categories, such as city or gender, etc.
Therefore, Bitmap indexes are usually suitable for columns with fewer distinct values to speed up queries.
Other Categories
Unique Index
From the perspective of uniqueness, indexes can be divided into unique indexes and non-unique indexes. A unique index can be used to ensure the unique constraint of a column in a table. It is usually used to implement primary key constraints and unique constraints. The database usually automatically creates a unique index for the primary key or unique constraints.
Advantages and disadvantages
The unique index can not only improve the efficiency of query, but also play a role in data verification, avoiding the generation of duplicate data, and improving the quality and accuracy of data.
Its disadvantage is that whenever there is data modification or insertion, it needs to check whether the unique constraint is violated.
Clustered Index
From the perspective of clustering ratio, indexes can be classified as clustered or non-clustered. The index keys of the clustered index are arranged in the same order as the data in the data table. Each table can only have one clustered index, because the data in the data table can only be sorted in one way.
In some advanced database optimizations, the clustering rate is also calculated for non-clustered indexes, so that the optimizer can evaluate the disk IO cost when returning to the table. When an index with a small clustering rate is returned to the table, it will cause disk IO jitter, which will significantly affect query performance. Generally speaking, the larger the clustering rate, the lower the cost of disk IO.
There are two implementations of clustered index,
- One is that the clustered index and table data are stored together, and each leaf node in the index tree stores a complete table row, represented by MySQL's InnoDB engine and SQL Server.
- The second is that the data of the clustered index and the table are separated. You can first have the table, and then define the clustered index, represented by DB2, PostgreSQL, and Oracle.
Advantages and disadvantages
The advantage of a clustered index is that it can reduce the number of disk I/Os, thereby improving query performance.
However, a clustered index also has some disadvantages, such as when new data is inserted, it may require moving data pages or reorganizing the index, which may negatively affect performance.
Secondary Index
From the perspective of whether it is a primary key, indexes can be divided into primary key indexes and secondary indexes. There can only be one primary key index and multiple secondary indexes.
Composite Index
From the perspective of the number of index columns included, indexes can be divided into single-column indexes and composite indexes. Composite indexes can meet various query conditions, thereby saving index space and index maintenance costs. The matching principle of the composite index follows the leftmost prefix matching principle. For details, please refer to "How to Create Efficient Indexes".
Function Index
A function index (or expression index) is an index based on a function or an expression. It uses a function or expression to provide a calculated value as an index column to build an index, which can improve query performance without modifying the application.
Partial Index(Conditional Index)
A partial index is an index built on a subset of a table, and the subset is defined by a conditional expression, and the index includes only those rows in the table that satisfy the conditional expression.
Summarize
- B+ tree index: B+ tree index is the most commonly used index type. It has the nature of a balanced tree and can effectively maintain a large amount of data. It is suitable for most data types, especially numeric and string data, and supports interval queries. ; Due to the ordered nature of B-tree indexes, using B-tree indexes can save the sorting operations required in SQL queries.
- Hash index: A hash index is an index type based on a hash table, which is suitable for exact match queries and has high query efficiency, but does not support range queries and sorting operations.
- Spatial index: Spatial index is an index for spatial data, which is used to process the query and management of spatial data, such as R-tree.
- Full-text index: Full-text index is an index type for text data, mainly used for text search and ranking.
- Bitmap index: Bitmap index is a special index type that uses a bitmap to maintain the index and is suitable for queries that process a large amount of Boolean data.
Contact us
PawSQL: https://www.pawsql.com
Email: service@pawsql.com
Twitter: https://twitter.com/pawsql
Follow us on WeChat: