- Posted by Gavin Soorma
- On June 6, 2019
- 0 Comments
- 12c, 18c, 19c, catalog, chunk, GDSCTL, GSM, new feature, sharding
Oracle Sharding is an architecture in which data is horizontally partitioned across a number of independent physical databases called shards. Think of it as one giant database partitioned into many small databases located on different servers – similar to the concept of one giant table being divided into a number of smaller partitions. Unlike the case of Oracle Partititioing where all the partitions of table are located in the same database, in Oracle Sharding the partitions of the same table are located on different databases.
All the shards together make up a single logical database, which is referred to as a sharded database or SDB.
Horizontal partitioning involves splitting a database table across the shard databases so that each shard contains the same table with the same set of columns but a different subset of rows. A table split up or partitioned in this manner is also known as a sharded table.
As far as the application is concerned a shared database looks like a single database and the number of shards and distribution of data across those shards are completely transparent to the application.
Sharding provides advantages like global distribution of data where shards are located in different geographical regions and each such sharded database has data relevant and distinct to the geographical region the shard is located in. It also provides the linear scalability of workloads, data and users as well as fault isolation where the failure of a shard will be transparent to other shards located in maybe different data centres as well as maybe different countries.
However, it should be kept in mind that applications that use sharding must have a well-defined data model and data distribution strategy and also primarily access data using a sharding key. Examples of a sharding key could be the CUSTOMER_ID or ORDER_ID columns in a shared table and this is mainly suited for OLTP applications.
In addition to the shard databases, we also have a Shard Catalog database which provides the centralized management of the shard database topology as well as performs tasks like automated shard deployment and co-ordinating multi or cross-shard queries.
The Shard Director is Global Service Manager (GSM) type network listener which provides high performance routing of application connections based on the sharding key.
We typically using the GDSCTL command line utility to manage the shard catalog as well as entire shard environment.
Data in the sharded database (SDB) is typically accessed via a global service which is defined via GDSCTL
Distribution of partitions across shards is achieved by creating partitions in tablespaces that reside on different shards. Each partition of a sharded table is stored in a separate tablespace on a separate shard based on the sharding or partition key.
A tablespace is a logical unit of data distribution in an SDB. Sharded table partitions are stored in different tablespaces.
A sharded table family is a set of tables that are sharded in the same manner and are typically tables linked by a parent-child relationship.
A chunk is a set of tablespaces that store corresponding partitions of all tables in a sharded table family. So for example take the case of a shared table family consisting of the CUSTOMERS, ORDERS and LINEITEMS. A single chunk (typically a single tablespace) will contain the relevant partitions of all the 3 tables and they will be located in the same tablespace. Having the corresponding partitions of related tables are always stored in the same shard helps minimize the number of multi-shard joins and improves shard performance.
In Oracle Sharding, tablespaces are created and managed as a unit called a tablespace set. A tablespace set consists of multiple tablespaces distributed across shards and all tablespaces in a tablespace set would have the same properties.
Sharding introduces some changes to the DDL statements like CREATE SHARDED TABLE, CREATE DUPLICATED TABLE, CREATE TABLESPACE SET and such DDL statements with this syntax can only be executed against a sharded database. We need to use the command ALTER SESSION ENABLE SHARD DDL.