DDP sharding technology

Overview

DDP (Distributed Data Partitioning) lets you distribute data across different physical nodes, improving query performance for large data volumes by taking advantage of each node's resources.

DDP is an SQL-only sharding implementation that leverages PostgreSQL core functionalities:

  • Partitioning: Uses PARTITION BY RANGE to create virtual shards that map to physical shard nodes
  • postgres_fdw: Creates foreign data wrapper connections to remote shard nodes, allowing the coordinator to query data transparently
  • dblink: Used for management operations like checking shard connection status and creating distributed restore points

No external middleware or third-party extension is required beyond what PostgreSQL already provides.
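
All three building blocks ship with PostgreSQL itself: postgres_fdw and dblink are contrib extensions, and declarative partitioning is built into the server. On a plain PostgreSQL instance they would be enabled like this (shown only to illustrate that nothing beyond core PostgreSQL is involved):

CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE EXTENSION IF NOT EXISTS dblink;
-- Declarative partitioning (PARTITION BY RANGE) requires no extension.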

How DDP Works

DDP uses the coordinator as the entry point for all queries. The coordinator maintains foreign table definitions that map to tables on the shard nodes via postgres_fdw. When a query is executed, PostgreSQL’s query planner uses partition pruning to route it to the appropriate shards, and the matching parts of the query are pushed to the shard nodes through postgres_fdw.
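
As an illustration of the coordinator-side objects this implies, the following sketch wires up a single shard by hand using only core PostgreSQL features. All names and connection options (shard0, measurements, host, credentials) are hypothetical; in a real DDP cluster the management functions described in the following sections take care of this.

-- Hypothetical, hand-rolled equivalent of the objects DDP manages on the
-- coordinator. Names, hosts and credentials are placeholders.

-- A postgres_fdw foreign server pointing at one physical shard node:
CREATE SERVER shard0 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'cluster-shard0', port '5432', dbname 'mydatabase');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard0
  OPTIONS (user 'app', password 'secret');

-- A range-partitioned parent table; each partition acts as a virtual shard:
CREATE TABLE measurements (
  id      bigint NOT NULL,
  payload text
) PARTITION BY RANGE (id);

-- A partition whose data physically lives on the shard, attached as a foreign
-- table (postgres_fdw expects a table of the same name to exist on the shard):
CREATE FOREIGN TABLE measurements_vs0
  PARTITION OF measurements FOR VALUES FROM (0) TO (1000000)
  SERVER shard0;

-- Queries against the parent are pruned to the matching partitions and the
-- work is forwarded to the shard through postgres_fdw:
SELECT * FROM measurements WHERE id = 42;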

Virtual Shards

DDP introduces the concept of virtual shards. Virtual shards are range partitions on the coordinator that map to foreign tables on the shard nodes. This allows fine-grained control over data distribution:

  • Multiple virtual shards can exist on a single physical shard
  • Virtual shards can be moved between physical shards for rebalancing
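
Continuing the hypothetical sketch above, multiple virtual shards on the same physical shard are simply several range partitions attached as foreign tables that point at the same foreign server:

-- Two more virtual shards backed by the same physical shard (same server):
CREATE FOREIGN TABLE measurements_vs1
  PARTITION OF measurements FOR VALUES FROM (1000000) TO (2000000)
  SERVER shard0;
CREATE FOREIGN TABLE measurements_vs2
  PARTITION OF measurements FOR VALUES FROM (2000000) TO (3000000)
  SERVER shard0;

Rebalancing then conceptually amounts to re-attaching one of these partitions against a different foreign server, although the actual procedure DDP uses may differ.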

Shard Connections

Each shard is connected to the coordinator via postgres_fdw foreign servers. DDP provides SQL functions to manage these connections:

  • ddp_create_shard_connection(): Creates a new FDW server connection to a shard
  • ddp_change_shard_connection(): Modifies an existing shard connection
  • ddp_drop_shard_connection(): Removes a shard connection
  • ddp_get_shard_status_connection(): Checks shard connection status
  • ddp_has_shard_connection(): Checks if a shard connection exists
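
Because these connections are ordinary postgres_fdw foreign servers, they can also be inspected from the coordinator with the standard catalogs, for example:

-- Foreign servers registered on the coordinator and their connection options:
SELECT srvname, srvoptions FROM pg_foreign_server;

-- User mappings associated with those servers:
SELECT srvname, usename FROM pg_user_mappings;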

Data Distribution

DDP provides functions to manage data distribution across shards:

  • ddp_create_vs(): Creates virtual shards with range partitioning
  • ddp_drop_vs(): Removes virtual shards
  • ddp_add_vs_in_shard(): Adds virtual shards to worker nodes using dblink
  • ddp_tables_distribution(): Reports table distribution information
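
Since virtual shards are regular partitions attached to foreign tables, the resulting layout can also be read directly from the catalogs. The following sketch, reusing the hypothetical measurements table from above, shows which physical shard backs each virtual shard (ddp_tables_distribution() reports this kind of information for you):

-- Show each partition of a distributed table and the foreign server it lives on:
SELECT c.relname AS virtual_shard,
       s.srvname AS physical_shard
FROM pg_inherits i
JOIN pg_class c          ON c.oid = i.inhrelid
JOIN pg_foreign_table ft ON ft.ftrelid = c.oid
JOIN pg_foreign_server s ON s.oid = ft.ftserver
WHERE i.inhparent = 'measurements'::regclass;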

Creating a basic DDP Sharded Cluster

Create the SGShardedCluster resource:

apiVersion: stackgres.io/v1alpha1
kind: SGShardedCluster
metadata:
  name: cluster
spec:
  type: ddp
  database: mydatabase
  postgres:
    version: '15'
  coordinator:
    instances: 2
    pods:
      persistentVolume:
        size: '10Gi'
  shards:
    clusters: 4
    instancesPerCluster: 2
    pods:
      persistentVolume:
        size: '10Gi'

This configuration will create a coordinator with 2 Pods and 4 shards with 2 Pods each. The coordinator uses postgres_fdw to connect to the shard nodes and route queries.

Distributed Restore Points

DDP supports creating distributed restore points across all shards using two-phase commit (2PC). This allows consistent point-in-time recovery across the entire sharded cluster:

SELECT ddp_create_restore_point('my_restore_point');
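
On each individual node this is conceptually equivalent to PostgreSQL's built-in pg_create_restore_point(), with two-phase commit ensuring all shards agree on the same point. Assuming a manual point-in-time recovery, the named restore point would then be used as the recovery target on every node:

-- Single-node equivalent of creating a named restore point:
SELECT pg_create_restore_point('my_restore_point');

-- During recovery, each node targets the same name, e.g. in postgresql.conf:
--   recovery_target_name = 'my_restore_point'
--   recovery_target_action = 'promote'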

Key Differences from Citus

Feature            | DDP                                            | Citus
-------------------|------------------------------------------------|---------------------------------
Implementation     | SQL-only using PostgreSQL core features        | PostgreSQL extension
Dependencies       | None (uses postgres_fdw, dblink, partitioning) | Citus extension
Query routing      | PostgreSQL partition pruning and FDW           | Citus distributed query engine
Data distribution  | Range-based virtual shards                     | Hash-based distribution
Coordinator        | Standard PostgreSQL with FDW                   | PostgreSQL with Citus extension