Scaling
DiscordRDA provides features for scaling your bot to millions of guilds across multiple servers.
Scaling Strategies
Vertical Scaling
Run larger instances:
# Single powerful server
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
shards: :auto,
cache: :redis,
enable_scalable_rest: true
)
Pros: Simple, no coordination needed Cons: Single point of failure, hardware limits
Horizontal Scaling
Run multiple smaller instances:
Server 1: Shards 0-7
Server 2: Shards 8-15
Server 3: Shards 16-23
Pros: Fault tolerant, unlimited scale Cons: More complex, needs coordination
Scalable REST Client
Enable for high-traffic bots:
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
enable_scalable_rest: true
)
Benefits:
- Request queuing - Orders requests optimally
- Burst handling - Manages rate limit resets
- Priority system - User-facing requests first
REST Proxy
Offload REST to dedicated servers:
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
enable_scalable_rest: true,
rest_proxy: {
url: 'http://rest-proxy.internal:8080',
headers: { 'Authorization' => 'proxy-token' }
}
)
Proxy server handles:
- Rate limiting
- Request queuing
- Caching
- Request deduplication
Distributed REST
Multiple REST workers:
# On each app server
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
enable_scalable_rest: true,
rest_config: {
workers: 10, # Concurrent workers
queue_size: 10000, # Max queue depth
timeout: 30 # Request timeout
}
)
Distributed Caching
Share cache across all instances:
# All shards use same Redis
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
cache: :redis,
redis_config: {
host: 'redis-cluster.internal',
port: 6379,
cluster: true # Redis Cluster mode
}
)
Cache Invalidation
# Invalidate across all instances
bot.cache.invalidate(:guild, guild_id)
# Automatically propagated to all shards
Message Bus
Coordinate between instances:
# Using Redis pub/sub
bus = DiscordRDA::EventBus.new(
adapter: :redis,
redis_config: { host: 'redis.internal' }
)
# Subscribe to events
bus.subscribe('broadcast') do |message|
puts "Received: #{message}"
end
# Publish events
bus.publish('broadcast', { type: 'reload', data: {} })
Session Management
Session Transfer
Move guilds between shards:
# Transfer guild from shard 5 to shard 10
bot.reshard_manager.transfer_guild(
guild_id: '123456789',
from_shard: 5,
to_shard: 10
)
Use cases:
- Load balancing
- Shard maintenance
- Regional optimization
Session Persistence
Maintain sessions across restarts:
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
shards: :auto,
session_store: :redis # Persist sessions
)
# On restart, sessions are restored
# Users don't see "bot typing" interruptions
Load Balancing
Gateway Load Balancing
Distribute shards evenly:
# Kubernetes deployment
# Each pod gets shard assignment via env
shard_id = ENV['SHARD_ID'].to_i
total_shards = ENV['TOTAL_SHARDS'].to_i
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
shards: [[shard_id, total_shards]],
cache: :redis
)
Health Checks
# Kubernetes liveness probe
get '/health' do
status = bot.status
if status[:connected] && status[:latency] < 500
200
else
503
end
end
# Readiness probe
get '/ready' do
if bot.status[:shards].all? { |s| s[:status] == :connected }
200
else
503
end
end
Kubernetes Deployment
Deployment Config
apiVersion: apps/v1
kind: Deployment
metadata:
name: discord-bot
spec:
replicas: 8 # 8 shards
template:
spec:
containers:
- name: bot
image: my-bot:latest
env:
- name: SHARD_ID
valueFrom:
fieldRef:
fieldPath: metadata.name # Derive from pod name
- name: TOTAL_SHARDS
value: "8"
- name: REDIS_HOST
value: "redis-service"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
Service Config
apiVersion: v1
kind: Service
metadata:
name: bot-metrics
spec:
ports:
- port: 8080
name: metrics
selector:
app: discord-bot
Monitoring at Scale
Metrics Collection
# Prometheus metrics
require 'prometheus/client'
bot.on(:dispatch) do |event|
# Track events
EVENT_COUNTER.increment(labels: { type: event.type })
end
bot.on(:rate_limited) do |event|
# Track rate limits
RATELIMIT_COUNTER.increment(labels: { route: event.route })
end
# Expose metrics
get '/metrics' do
Prometheus::Client.registry.to_s
end
Centralized Logging
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
log_format: :json, # Structured logging
log_level: :info
)
# Ship to ELK/Fluentd
Tracing
# OpenTelemetry tracing
require 'opentelemetry'
bot.use(DiscordRDA::OpenTelemetryMiddleware)
# Traces include:
# - Gateway events
# - REST requests
# - Cache operations
# - Command execution
Database Scaling
Read Replicas
# Write to primary, read from replicas
DATABASE = {
write: PG.connect(host: 'db-primary'),
read: PG.connect(host: 'db-replica')
}
# Writes
DATABASE[:write].exec('INSERT ...')
# Reads
DATABASE[:read].exec('SELECT ...')
Connection Pooling
require 'connection_pool'
DB_POOL = ConnectionPool.new(size: 10, timeout: 5) do
PG.connect(host: 'db.internal')
end
# Use in handlers
DB_POOL.with do |conn|
conn.exec('SELECT * FROM users WHERE id = $1', [user_id])
end
Caching Layer
# Cache database queries
def get_user(user_id)
# Check cache first
if cached = bot.cache.get(:user_data, user_id)
return cached
end
# Fetch from DB
user = DB_POOL.with { |conn| conn.exec(...).first }
# Cache result
bot.cache.set(:user_data, user_id, user, ttl: 300)
user
end
Regional Deployment
Regional Gateways
Deploy shards close to users:
US-EAST: Shards 0-7 (for North America guilds)
EU-WEST: Shards 8-15 (for European guilds)
ASIA: Shards 16-23 (for Asian guilds)
Discord automatically routes guilds, but you can optimize:
# Preferred regions for shard
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
shards: [[0, 24]],
preferred_regions: ['us-east', 'us-central']
)
Disaster Recovery
Backup Strategy
# Regular state backups
Thread.new do
loop do
sleep 3600 # Every hour
backup = {
timestamp: Time.now,
guild_settings: GuildSettings.all.to_h,
user_data: UserData.recent.to_h
}
# Save to S3/GCS
S3.put_object(
bucket: 'bot-backups',
key: "backup-#{Time.now.to_i}.json",
body: JSON.dump(backup)
)
end
end
Failover
# Primary-Secondary setup
if primary_healthy?
bot.run
else
# Promote secondary
promote_to_primary!
bot.run
end
Performance Optimization
Async Processing
# Use Fibers for concurrent operations
bot.on(:message_create) do |event|
# Handle asynchronously
Fiber.new do
process_message(event.message)
end.resume
end
Lazy Loading
# Don't load until needed
def guild_config(guild_id)
@guild_configs ||= {}
@guild_configs[guild_id] ||= load_guild_config(guild_id)
end
Connection Reuse
# Keep-alive connections
HTTP_CLIENT = HTTP::Client.new(
keep_alive_timeout: 30
)
Cost Optimization
Right-Sizing
# Monitor resource usage
# Scale based on actual needs, not theoretical maximum
Spot Instances
# For non-critical shards
bot = DiscordRDA::Bot.new(
token: ENV['DISCORD_TOKEN'],
shards: [[0, 8]],
on_shutdown: :transfer # Transfer guilds to stable shards
)
Complete Scaling Architecture
┌─────────────────────────────────────────────────────────────┐
│ Load Balancer │
│ (CloudFlare/AWS ALB) │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Bot Pod 1 │ │ Bot Pod 2 │ │ Bot Pod N │
│ Shards 0-3 │ │ Shards 4-7 │ │ Shards N+ │
│ │ │ │ │ │
│ ┌──────────┐ │ │ ┌──────────┐ │ │ ┌──────────┐ │
│ │Shard 0 │ │ │ │Shard 4 │ │ │ │Shard N │ │
│ │Shard 1 │ │ │ │Shard 5 │ │ │ │Shard N+1 │ │
│ │Shard 2 │ │ │ │Shard 6 │ │ │ │... │ │
│ │Shard 3 │ │ │ │Shard 7 │ │ │ │ │ │
│ └──────────┘ │ │ └──────────┘ │ │ └──────────┘ │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└──────────────────────┼──────────────────────┘
▼
┌──────────────────┐
│ Redis Cluster │
│ (Shared Cache) │
└──────────────────┘
│
┌──────────────────┐
│ PostgreSQL │
│ (Primary+Replica)│
└──────────────────┘
Next Steps
- Learn Sharding - Foundation of scaling
- Rate Limiting - Critical for scale
- Caching - Reduce load