Hotspot issue happens in indexed, horizontally scalable, distributed data storage systems. In GCP, there are quite a few services that can suffer from this issue. In the article, I will discuss why it happened and how to avoid it when we design a database/storage system.
In Google Cloud Platform (GCP), Cloud Spanner is a fully managed, scalable, and highly available relational database service. “Hotspotting” in the context of Cloud Spanner refers to a performance issue that occurs when a disproportionate(When too many requests are sent to the same server) amount of read or write traffic is directed to a small subset of nodes or data partitions, causing uneven load distribution and potentially degrading the overall performance of the database.Hotspot issue happens in indexed, horizontally scalable, distributed data storage systems. In GCP, there are quite a few services that can suffer from this issue. In the article, I will discuss why it happened and how to avoid it when we design a database/storage system.
Reasons and solutions for Hotspotting in Google Cloud Storage
1. Sequential Keys: – How to Design Primary Keys
Problem: Using monotonically increasing primary keys (like auto-incrementing IDs or timestamps) causes new data to be written to the same node/servers or partitions.
Solution: Use composite primary keys, random IDs (like UUIDs), or hash-based keys to distribute writes across different nodes.
--Combine Fields: Use more than one column to create a unique key. For example, combine a user ID with a timestamp.
PRIMARY KEY (user_id, timestamp)
-- Random IDs: Use random IDs like UUIDs to spread the data out.
PRIMARY KEY (uuid)
Avoid Sequential Numbers: Don’t use numbers that keep increasing (like 1, 2, 3) for your keys because they create hotspots.
2. Index Design – How to Design Indexes
Indexes can also cause hotspotting if not designed properly:
Avoid Hot Indexes
Distribute Index Writes: Design indexes to distribute writes. For example, avoid using monotonically increasing values as the leading column in your indexes.
Composite Indexes: Use more than one column for indexes to spread the load.
CREATE INDEX idx_user_timestamp ON my_table(user_id, timestamp);
3. Data Sharding/spliting
Shard your data manually or through schema design to distribute load across multiple nodes.
Horizontal Sharding/splitting
- Split/sharding Data: Divide your data into smaller pieces. For example, you can partition by user ID.
PRIMARY KEY (shard_id, user_id)
4. Query Design
Optimize your queries to avoid hotspotting:
Use Range Queries
- Spread Range Queries:: When querying data, use range queries that span multiple partitions.
SELECT * FROM my_table WHERE timestamp BETWEEN '2024-06-01' AND '2024-06-10';
5. Configuring Spanner
Leverage Cloud Spanner configurations:
Splitting and Partitioning
- Pre-split Data: Pre-split data into multiple ranges to distribute load.
- Manual Split: You can manually split a table to ensure better distribution
gcloud spanner databases execute-sql your-database-id --sql="ALTER TABLE your-table-name SPLIT AT (value);"
How To Identifying Hot Spots
To identify hot spots, you can monitor the following metrics in Cloud Spanner:
- High Latency: Increased read or write latencies can indicate hot spots.
- CPU Utilization: Uneven CPU utilization across nodes might point to hot spots.
- RPC Metrics: Metrics such as
reads
,writes
, andcommits
can help identify patterns of hot spotting.
Practical Example
Suppose you have a table with user transactions, and you currently use an auto-incrementing integer as the primary key. This could lead to hot spotting. Here’s how you can mitigate it:
Current Schema – The following schema causes hot spotting.
-- following schema desing cause the hotspotting
CREATE TABLE Transactions (
TransactionID INT64 NOT NULL,
UserID INT64 NOT NULL,
Timestamp TIMESTAMP NOT NULL,
Amount FLOAT64
) PRIMARY KEY (TransactionID);
1. Improved Schema – following are different way’s to improve the schema to avoid hot spotting.
- Add a random shard/split ID to distribute load:
GenerateShardID
/splitid using a random number generator or hash function.
Using theFARM_FINGERPRINT
function, we’ll insert data into theTransactions
table with a generatedShardID
.
CREATE TABLE Transactions (
ShardID INT64 NOT NULL,
TransactionID INT64 NOT NULL,
UserID INT64 NOT NULL,
Timestamp TIMESTAMP NOT NULL,
Amount FLOAT64
) PRIMARY KEY (ShardID, TransactionID);
INSERT INTO Transactions (ShardID, TransactionID, UserID, Timestamp, Amount)
VALUES (MOD(FARM_FINGERPRINT(CAST(123 AS STRING)), 10), 1, 123, '2024-06-13T12:34:56Z', 100.0);
Google cloud sql example
CREATE TABLE UserAccessLog (
ShardId INT64 NOT NULL
AS (MOD(FARM_FINGERPRINT(CAST(LastAccess AS STRING)), 2048)) STORED,
LastAccess TIMESTAMP NOT NULL,
UserId INT64 NOT NULL,
) PRIMARY KEY (ShardId, LastAccess, UserId);
2. Improved Schema with Hashing
- Hash the user ID to create a more even distribution:
- Use a function to hash
UserID
before inserting it into the table.
CREATE TABLE Transactions (
HashedUserID STRING(64) NOT NULL,
TransactionID INT64 NOT NULL,
UserID INT64 NOT NULL,
Timestamp TIMESTAMP NOT NULL,
Amount FLOAT64
) PRIMARY KEY (HashedUserID, TransactionID);
Conclusion
By understanding the causes of hot spotting and implementing strategies to distribute load more evenly across your database, you can significantly improve the performance and scalability of your Cloud Spanner instances. Regular monitoring and proactive schema design are key to avoiding hot spots and ensuring efficient operation.