DDP (Distributed Data Partitioning) lets you distribute data across separate physical nodes to improve query performance on high data volumes, taking advantage of each node's resources.
DDP is an SQL-only sharding implementation that leverages PostgreSQL core functionalities:
- PARTITION BY RANGE: Used to create virtual shards that map to physical shard nodes
- postgres_fdw: Creates foreign data wrapper connections to remote shard nodes, allowing the coordinator to query data transparently
- dblink: Used for management operations like checking shard connection status and creating distributed restore points

No external middleware or third-party extension is required beyond what PostgreSQL already provides.
DDP uses the coordinator as the entry point for all queries. The coordinator maintains foreign table definitions that map to tables on the shard nodes via postgres_fdw. When a query is executed, PostgreSQL’s query planner routes the query to the appropriate shard based on the partition definitions.
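To make the mechanism concrete, here is a minimal sketch of that pattern in plain SQL; the table, server, credentials, and range boundaries are all illustrative assumptions, not the schema StackGres actually generates:

```sql
-- Hypothetical example: a range-partitioned parent table on the coordinator.
CREATE TABLE measurements (
    id   bigint NOT NULL,
    data text
) PARTITION BY RANGE (id);

-- A foreign server pointing at one shard node (names are illustrative).
CREATE SERVER shard0 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard0.example', port '5432', dbname 'mydatabase');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard0
    OPTIONS (user 'postgres', password 'secret');

-- A partition of the parent that is actually a foreign table on shard0.
CREATE FOREIGN TABLE measurements_vs0
    PARTITION OF measurements FOR VALUES FROM (0) TO (1000000)
    SERVER shard0;

-- The planner prunes to the single matching partition, so this query
-- is executed on shard0 via postgres_fdw.
SELECT * FROM measurements WHERE id = 42;
```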
DDP introduces the concept of virtual shards. Virtual shards are range partitions on the coordinator that map to foreign tables on the shard nodes, which allows fine-grained control over data distribution.
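In particular, nothing forces a one-to-one mapping between virtual and physical shards. Continuing the hypothetical schema from the previous sketch (again, all names and ranges are assumptions):

```sql
-- A second shard server, created like shard0 above.
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example', port '5432', dbname 'mydatabase');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'postgres', password 'secret');

-- Three more narrow virtual shards spread across the two physical shards.
CREATE FOREIGN TABLE measurements_vs1
    PARTITION OF measurements FOR VALUES FROM (1000000) TO (2000000)
    SERVER shard0;
CREATE FOREIGN TABLE measurements_vs2
    PARTITION OF measurements FOR VALUES FROM (2000000) TO (3000000)
    SERVER shard1;
CREATE FOREIGN TABLE measurements_vs3
    PARTITION OF measurements FOR VALUES FROM (3000000) TO (4000000)
    SERVER shard1;
-- Remapping a single range to another shard only affects that slice of
-- data, which is the fine-grained control virtual shards provide.
```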
Each shard is connected to the coordinator via postgres_fdw foreign servers. DDP provides SQL functions to manage these connections:
- ddp_create_shard_connection(): Creates a new FDW server connection to a shard
- ddp_change_shard_connection(): Modifies an existing shard connection
- ddp_drop_shard_connection(): Removes a shard connection
- ddp_get_shard_status_connection(): Checks shard connection status
- ddp_has_shard_connection(): Checks if a shard connection exists
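The signatures of these functions are not spelled out in this section, so the following calls are only a hypothetical sketch of registering and checking a shard connection; every argument shown is an assumption:

```sql
-- Hypothetical usage: the real parameter lists may differ.
SELECT ddp_create_shard_connection('shard0', 'shard0.example', 5432, 'mydatabase');

-- Verify that the connection exists and is reachable.
SELECT ddp_has_shard_connection('shard0');
SELECT ddp_get_shard_status_connection('shard0');
```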
DDP also provides functions to manage data distribution across shards:

- ddp_create_vs(): Creates virtual shards with range partitioning
- ddp_drop_vs(): Removes virtual shards
- ddp_add_vs_in_shard(): Adds virtual shards to worker nodes using dblink
- ddp_tables_distribution(): Reports table distribution information
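As above, the exact signatures are not documented here, so this is an assumed sketch of creating virtual shards and inspecting the resulting layout:

```sql
-- Hypothetical usage: table name, key, ranges, and shard count are assumptions.
SELECT ddp_create_vs('measurements', 'id', 0, 4000000, 4);

-- Report how tables are distributed across the shards
-- (assuming the function is set-returning).
SELECT * FROM ddp_tables_distribution();
```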
Create the SGShardedCluster resource:

```yaml
apiVersion: stackgres.io/v1alpha1
kind: SGShardedCluster
metadata:
  name: cluster
spec:
  type: ddp
  database: mydatabase
  postgres:
    version: '15'
  coordinator:
    instances: 2
    pods:
      persistentVolume:
        size: '10Gi'
  shards:
    clusters: 4
    instancesPerCluster: 2
    pods:
      persistentVolume:
        size: '10Gi'
```
This configuration will create a coordinator with 2 Pods and 4 shards with 2 Pods each. The coordinator uses postgres_fdw to connect to the shard nodes and route queries.
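Once the cluster is up, you can check from the coordinator that the postgres_fdw servers for the shards are in place; note that the actual server names are generated by StackGres:

```sql
-- List the foreign servers the coordinator uses to reach the shards.
SELECT srvname, srvoptions
FROM pg_foreign_server;
```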
DDP supports creating distributed restore points across all shards using two-phase commit (2PC). This allows consistent point-in-time recovery across the entire sharded cluster:
```sql
SELECT ddp_create_restore_point('my_restore_point');
```
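Because every shard records the restore point under the same name, that name can then presumably serve as a per-shard recovery target (for example, via recovery_target_name) when restoring from backups, yielding a consistent state across the whole sharded cluster.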
| Feature | DDP | Citus |
|---|---|---|
| Implementation | SQL-only using PostgreSQL core features | PostgreSQL extension |
| Dependencies | None (uses postgres_fdw, dblink, partitioning) | Citus extension |
| Query routing | PostgreSQL partition pruning and FDW | Citus distributed query engine |
| Data distribution | Range-based virtual shards | Hash-based distribution |
| Coordinator | Standard PostgreSQL with FDW | PostgreSQL with Citus extension |