Scaling Your Deployment

Learn how to scale your AI services to handle varying workloads

This guide explains how to scale your Cubik deployments to handle various workload requirements.

Scaling Options

Cubik offers different scaling approaches depending on your deployment type:

  • Serverless deployments scale automatically with demand and can scale down to zero instances when idle.
  • Dedicated deployments run a configurable range of replicas and always keep their minimum replicas running.

Configuring Scaling

Serverless Configuration

  1. Navigate to your deployment: /workspace/{workspace-slug}/deployments/{deployment-slug}
  2. Click the "Settings" tab
  3. Select "Scaling" from the submenu
  4. Configure:
    • Minimum instances (0 or more; setting 0 allows the deployment to scale to zero when idle)
    • Maximum instances
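
If you prefer to script these settings instead of clicking through the UI, a minimal sketch follows. Cubik's API is not documented in this guide, so the base URL, endpoint path, auth header, and field names (min_instances, max_instances) are illustrative assumptions only; adapt them to the actual interface.

```python
# Hypothetical sketch: update serverless scaling settings over HTTP.
# The base URL, endpoint path, auth header, and field names are assumptions,
# not Cubik's documented API -- adjust them to match the real interface.
import os

import requests

CUBIK_API = "https://api.cubik.example/v1"      # assumed base URL
TOKEN = os.environ["CUBIK_API_TOKEN"]           # assumed auth token

def set_serverless_scaling(workspace: str, deployment: str,
                           min_instances: int, max_instances: int) -> None:
    """Set minimum and maximum instances for a serverless deployment."""
    if not 0 <= min_instances <= max_instances:
        raise ValueError("require 0 <= min_instances <= max_instances")
    resp = requests.patch(
        f"{CUBIK_API}/workspaces/{workspace}/deployments/{deployment}/scaling",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"min_instances": min_instances, "max_instances": max_instances},
        timeout=30,
    )
    resp.raise_for_status()

# Example: allow scale-to-zero when idle, cap bursts at 5 instances.
set_serverless_scaling("my-workspace", "my-model", min_instances=0, max_instances=5)
```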

Dedicated Configuration

  1. Navigate to your deployment: /workspace/{workspace-slug}/deployments/{deployment-slug}
  2. Click the "Settings" tab
  3. Select "Scaling" from the submenu
  4. Configure:
    • Minimum replicas
    • Maximum replicas
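
A dedicated deployment can be scripted the same way. As above, the endpoint, token variable, and field names (min_replicas, max_replicas) are illustrative assumptions rather than a documented Cubik API; the main difference is that dedicated deployments maintain their minimum replicas instead of scaling to zero, so the sketch requires at least one.

```python
# Hypothetical sketch (same assumed API as the serverless example above).
import os

import requests

CUBIK_API = "https://api.cubik.example/v1"      # assumed base URL
TOKEN = os.environ["CUBIK_API_TOKEN"]           # assumed auth token

def set_dedicated_scaling(workspace: str, deployment: str,
                          min_replicas: int, max_replicas: int) -> None:
    """Set minimum and maximum replicas for a dedicated deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    resp = requests.patch(
        f"{CUBIK_API}/workspaces/{workspace}/deployments/{deployment}/scaling",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"min_replicas": min_replicas, "max_replicas": max_replicas},
        timeout=30,
    )
    resp.raise_for_status()

# Example: keep 2 replicas warm, allow bursts up to 6.
set_dedicated_scaling("my-workspace", "my-model", min_replicas=2, max_replicas=6)
```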

Scaling Best Practices

  • Start Small: Begin with conservative scaling settings and adjust based on observed performance
  • Monitor Scaling Events: Use the monitoring dashboard to track scaling activities
  • Set Appropriate Minimums: Configure minimum instances for critical services to avoid cold starts
  • Consider Cost vs. Performance: Balance scaling settings to manage costs while maintaining performance
  • Test Load Scenarios: Validate scaling behavior under different load patterns (see the load-test sketch after this list)
  • Use Gradual Scaling: Configure step scaling for more predictable resource allocation
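
For the load-testing item above, the sketch below ramps up concurrent requests against a deployment so you can watch scaling events on the monitoring dashboard. The inference URL, auth token, and request payload are placeholders; replace them with whatever your deployment actually serves.

```python
# Hypothetical load-test sketch: send concurrent requests at increasing
# concurrency levels and report average latency. The URL, auth header, and
# payload below are placeholders, not a real Cubik endpoint.
import os
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://my-model.cubik.example/v1/predict"   # assumed inference URL
TOKEN = os.environ["CUBIK_API_TOKEN"]                     # assumed auth token

def one_request(_: int) -> float:
    """Send a single request and return its latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"inputs": "hello"},                         # placeholder payload
        timeout=60,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# Ramp concurrency in steps so you can observe gradual scaling behavior.
for concurrency in (1, 4, 16):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one_request, range(concurrency * 10)))
    print(f"concurrency={concurrency:>2}  "
          f"avg latency={sum(latencies) / len(latencies):.2f}s")
```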

Cost Considerations

  • Serverless deployments scale to zero when idle, minimizing costs
  • Dedicated deployments maintain minimum replicas, providing consistent performance at a fixed base cost
  • Monitor the scaling metrics to optimize cost-performance balance
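
To make the trade-off concrete, the back-of-the-envelope sketch below compares a serverless deployment billed only while instances are running against a dedicated deployment that keeps its minimum replicas on all month. Every rate and utilization figure in it is a made-up placeholder, not Cubik pricing; substitute the numbers from your own plan.

```python
# Back-of-the-envelope cost comparison. All prices and utilization numbers
# are hypothetical placeholders, NOT Cubik pricing.
HOURS_PER_MONTH = 730

def serverless_cost(rate_per_instance_hour: float, busy_fraction: float,
                    avg_instances_when_busy: float) -> float:
    """Serverless bills only while instances are running (scale-to-zero when idle)."""
    return rate_per_instance_hour * HOURS_PER_MONTH * busy_fraction * avg_instances_when_busy

def dedicated_cost(rate_per_replica_hour: float, min_replicas: int) -> float:
    """Dedicated keeps min_replicas running all month as a fixed base cost."""
    return rate_per_replica_hour * HOURS_PER_MONTH * min_replicas

# Example with placeholder numbers: traffic 20% of the time, ~2 instances when busy.
print(f"serverless ~${serverless_cost(1.50, 0.20, 2):.0f}/mo")   # hypothetical $1.50/hr
print(f"dedicated  ~${dedicated_cost(1.50, 2):.0f}/mo")          # 2 replicas always on
```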

Scaling Limits

  • Default account limit: 1 maximum instance per deployment
  • Request quota increases for higher scaling requirements
  • Regional scaling limits may apply