Scaling Your Deployment
Learn how to scale your AI services to handle varying workloads
This guide explains how to scale your Cubik deployments to handle various workload requirements.
Scaling Options
Cubik offers different scaling approaches depending on your deployment type:
- Serverless: instances scale automatically between a configurable minimum (which can be zero) and maximum, so idle deployments incur no compute cost
- Dedicated: a fixed pool of replicas scales between configured minimum and maximum counts, providing consistent performance at a base cost
Configuring Scaling
Serverless Configuration
- Navigate to your deployment:
/workspace/{workspace-slug}/deployments/{deployment-slug}
- Click on "Settings" tab
- Select "Scaling" from the submenu
- Configure:
- Minimum instances (0 up to the configured maximum; 0 allows idle deployments to scale to zero)
- Maximum instances
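If you prefer to script these settings rather than click through the UI, the sketch below applies the same minimum/maximum instance values over HTTP. The endpoint path, authentication header, and field names are assumptions made for illustration, not a documented Cubik API; adapt them to whatever automation interface your workspace exposes.

```python
# Hypothetical sketch: updating serverless scaling settings over HTTP.
# The base URL, auth scheme, endpoint path, and field names are assumptions,
# not a documented Cubik API -- adjust them to your actual tooling.
import os

import requests

API_BASE = "https://api.example.com"          # placeholder base URL
TOKEN = os.environ["CUBIK_API_TOKEN"]         # assumed token-based auth


def set_serverless_scaling(workspace: str, deployment: str,
                           min_instances: int, max_instances: int) -> dict:
    """Send the same min/max instance settings you would enter in the UI."""
    url = f"{API_BASE}/workspace/{workspace}/deployments/{deployment}/scaling"
    payload = {"min_instances": min_instances, "max_instances": max_instances}
    resp = requests.patch(url, json=payload,
                          headers={"Authorization": f"Bearer {TOKEN}"},
                          timeout=30)
    resp.raise_for_status()
    return resp.json()


# Example: allow scale-to-zero when idle, cap bursts at 5 instances.
# set_serverless_scaling("my-workspace", "my-deployment",
#                        min_instances=0, max_instances=5)
```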
Dedicated Configuration
- Navigate to your deployment:
/workspace/{workspace-slug}/deployments/{deployment-slug}
- Click on "Settings" tab
- Select "Scaling" from the submenu
- Configure:
- Minimum replicas (a sizing sketch follows these steps)
- Maximum replicas
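A common way to choose the minimum replica count is to size it against your steady baseline traffic. The sketch below is a back-of-the-envelope calculation; the per-replica throughput and headroom factor are assumptions you would replace with numbers from your own load tests.

```python
# Rough sizing for dedicated minimum replicas. The requests-per-second a
# single replica can serve is an assumption you measure for your own model.
import math


def min_replicas_for_baseline(baseline_rps: float,
                              rps_per_replica: float,
                              headroom: float = 0.3) -> int:
    """Replicas needed to serve steady traffic with some headroom."""
    required = baseline_rps * (1 + headroom) / rps_per_replica
    return max(1, math.ceil(required))


# Example: 40 req/s baseline, ~12 req/s per replica measured in load tests
# -> min_replicas_for_baseline(40, 12) == 5
```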
Scaling Best Practices
- Start Small: Begin with conservative scaling settings and adjust based on observed performance
- Monitor Scaling Events: Use the monitoring dashboard to track scaling activities
- Set Appropriate Minimums: Configure minimum instances for critical services to avoid cold starts
- Consider Cost vs. Performance: Balance scaling settings to manage costs while maintaining performance
- Test Load Scenarios: Validate scaling behavior under different load patterns (a minimal load-probe sketch follows this list)
- Use Gradual Scaling: Configure step scaling for more predictable resource allocation
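To put the load-testing advice above into practice, the sketch below fires stepped bursts of requests at a deployment and reports rough latency percentiles. The endpoint URL and request payload are placeholders; while it runs, watch the monitoring dashboard to confirm that instances or replicas scale out and back in as expected.

```python
# Minimal load-probe sketch for validating scaling behavior. The endpoint URL
# and payload are placeholders -- point this at your own deployment.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://example.com/my-deployment/predict"  # placeholder URL


def one_request(_: int) -> float:
    """Send a single request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT, json={"input": "ping"}, timeout=60)
    return time.perf_counter() - start


def burst(concurrency: int, total: int) -> None:
    """Fire `total` requests with `concurrency` workers and report latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(one_request, range(total)))
    p50 = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95) - 1]
    print(f"concurrency={concurrency} total={total} "
          f"p50={p50:.2f}s p95={p95:.2f}s")


# Ramp the load in steps and compare latency before and after scale-out.
# for c in (1, 5, 10, 20):
#     burst(concurrency=c, total=c * 10)
```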
Cost Considerations
- Serverless deployments scale to zero when idle, minimizing costs
- Dedicated deployments maintain minimum replicas, providing consistent performance at a fixed base cost
- Monitor the scaling metrics to optimize cost-performance balance
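As a rough way to compare the two models, the sketch below estimates monthly spend under a simple utilization model. All rates and parameters are illustrative assumptions rather than Cubik pricing; plug in your plan's actual numbers.

```python
# Rough cost comparison between serverless and dedicated scaling. The rates
# and utilization figures are illustrative assumptions, not real pricing.

def serverless_cost(busy_hours_per_day: float, instances_when_busy: float,
                    rate_per_instance_hour: float, days: int = 30) -> float:
    """Pay only while instances are running; idle time costs nothing."""
    return busy_hours_per_day * instances_when_busy * rate_per_instance_hour * days


def dedicated_cost(min_replicas: int, rate_per_replica_hour: float,
                   days: int = 30) -> float:
    """Pay for the minimum replicas around the clock."""
    return min_replicas * rate_per_replica_hour * 24 * days


# Example with assumed rates: bursty, low-volume traffic favors serverless,
# while sustained traffic favors dedicated.
# serverless_cost(busy_hours_per_day=4, instances_when_busy=2,
#                 rate_per_instance_hour=1.0)   # ~240 per month
# dedicated_cost(min_replicas=2, rate_per_replica_hour=1.0)  # ~1440 per month
```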
Scaling Limits
- Default account limit: 1 maximum instance per deployment
- Request quota increases for higher scaling requirements
- Regional scaling limits may apply