Cloud Deployment Guide¶
This guide covers deploying machine learning models to AWS SageMaker using easy_sm, including training jobs, model deployment, batch predictions, and processing jobs.
Overview¶
Easy_sm provides commands for deploying to AWS SageMaker with minimal configuration. All commands auto-detect your app configuration and IAM role from environment variables.
Prerequisites¶
- AWS CLI configured with credentials
- SageMaker execution role (see AWS Setup Guide)
- Docker image pushed to ECR (easy_sm push)
- SAGEMAKER_ROLE environment variable set
Environment Setup¶
Before using cloud commands, set your SageMaker IAM role:
# Set once per session
export SAGEMAKER_ROLE=arn:aws:iam::123456789012:role/SageMakerRole
# Or add to ~/.bashrc for persistence
echo 'export SAGEMAKER_ROLE=arn:aws:iam::123456789012:role/SageMakerRole' >> ~/.bashrc
Note: You can override auto-detection with the -r/--iam-role-arn flag on any command.
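For example, a one-off override on a training command (the ARN is illustrative; the same flag works on deploy, batch-transform, and process):
# Override the role for a single command
easy_sm train -n my-job -e ml.m5.large \
    -i s3://bucket/in -o s3://bucket/out \
    -r arn:aws:iam::123456789012:role/SageMakerRole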
Cloud Training¶
Overview¶
The train command launches SageMaker training jobs using your Docker image from ECR.
Basic Usage¶
# Train on SageMaker
easy_sm train -n job-name -e ml.m5.large \
-i s3://bucket/input -o s3://bucket/output
Parameters¶
| Parameter | Flag | Required | Description |
|---|---|---|---|
| Job Name | -n, --base-job-name | Yes | Prefix for training job name |
| Instance Type | -e, --ec2-type | Yes | EC2 instance type (e.g., ml.m5.large) |
| Input Path | -i, --input-s3-dir | Yes | S3 path to training data |
| Output Path | -o, --output-s3-dir | Yes | S3 path for model artifacts |
| Instance Count | -c, --instance-count | No | Number of instances (default: 1) |
| App Name | -a, --app-name | No | Override auto-detected app name |
| IAM Role | -r, --iam-role-arn | No | Override SAGEMAKER_ROLE env var |
Multi-Instance Training¶
# Train on multiple instances for distributed training
easy_sm train -n my-training-job -e ml.m5.xlarge \
-i s3://my-bucket/training-data \
-o s3://my-bucket/model-output \
-c 2
Instance Type Selection¶
CPU Instances (general purpose):
- ml.m5.large - 2 vCPU, 8 GB RAM - Good for small models
- ml.m5.xlarge - 4 vCPU, 16 GB RAM - Medium workloads
- ml.m5.2xlarge - 8 vCPU, 32 GB RAM - Larger datasets
- ml.m5.4xlarge - 16 vCPU, 64 GB RAM - Heavy compute

GPU Instances (deep learning):
- ml.p3.2xlarge - 1x V100 GPU, 8 vCPU, 61 GB RAM
- ml.p3.8xlarge - 4x V100 GPU, 32 vCPU, 244 GB RAM
- ml.g4dn.xlarge - 1x T4 GPU, 4 vCPU, 16 GB RAM - Cost-effective

Selection Guide:
- Small tabular models → ml.m5.large
- Medium ML models → ml.m5.xlarge
- Large models/datasets → ml.m5.2xlarge or higher
- Deep learning (CNNs, NLP) → ml.p3.2xlarge or ml.g4dn.xlarge
Training Output¶
The command outputs the S3 path to the trained model:
easy_sm train -n my-job -e ml.m5.large -i s3://bucket/in -o s3://bucket/out
# Output: s3://bucket/out/my-job/output/model.tar.gz
This output is designed to be piped to deployment commands (see Piped Workflows).
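For example, the path can be captured in a shell variable and handed straight to deploy, a sketch that mirrors the end-to-end workflow later in this guide:
# Capture the model artifact path and deploy it immediately
MODEL_PATH=$(easy_sm train -n my-job -e ml.m5.large -i s3://bucket/in -o s3://bucket/out)
easy_sm deploy -n my-endpoint -e ml.m5.large -m "$MODEL_PATH"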
Training Job Management¶
# List recent training jobs
easy_sm list-training-jobs -m 10
# List job names only (pipe-friendly)
easy_sm list-training-jobs -n -m 5
# Get model artifacts from completed job
easy_sm get-model-artifacts -j my-training-job
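The name-only output also composes with other commands; for instance, assuming the most recent job is listed first:
# Fetch artifacts for the most recently listed training job
easy_sm get-model-artifacts -j $(easy_sm list-training-jobs -n -m 1)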
Model Deployment¶
Easy_sm supports two deployment types:
1. Provisioned Endpoints - Dedicated instances, predictable performance
2. Serverless Endpoints - Auto-scaling, pay-per-invocation
Provisioned Endpoints¶
Overview¶
Provisioned endpoints run on dedicated EC2 instances with predictable performance and pricing.
Basic Usage¶
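A minimal sketch based on the parameters listed below; the endpoint name and model path are placeholders:
# Deploy a model to a provisioned endpoint
easy_sm deploy -n my-endpoint -e ml.m5.large \
    -m s3://bucket/models/model.tar.gz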
Parameters¶
| Parameter | Flag | Required | Description |
|---|---|---|---|
| Endpoint Name | -n, --endpoint-name | Yes | Unique name for endpoint |
| Instance Type | -e, --instance-type | Yes | EC2 instance type |
| Model Path | -m, --s3-model-location | Yes | S3 path to model.tar.gz |
| Instance Count | -c, --instance-count | No | Number of instances (default: 1) |
| App Name | -a, --app-name | No | Override auto-detected app name |
| IAM Role | -r, --iam-role-arn | No | Override SAGEMAKER_ROLE env var |
Example with Multiple Instances¶
# Deploy with 3 instances for high availability
easy_sm deploy -n prod-endpoint -e ml.m5.xlarge \
-m s3://bucket/models/model.tar.gz \
-c 3
Instance Type Selection¶
CPU Instances (cost-effective for most models):
- ml.t2.medium - 2 vCPU, 4 GB RAM - Minimal/testing
- ml.m5.large - 2 vCPU, 8 GB RAM - Small models
- ml.m5.xlarge - 4 vCPU, 16 GB RAM - Medium models
- ml.c5.xlarge - 4 vCPU, 8 GB RAM - Compute-optimized

GPU Instances (deep learning inference):
- ml.g4dn.xlarge - 1x T4 GPU, 4 vCPU, 16 GB RAM
- ml.p3.2xlarge - 1x V100 GPU, 8 vCPU, 61 GB RAM

Selection Guide:
- Development/testing → ml.t2.medium
- Production (small models) → ml.m5.large
- Production (medium models) → ml.m5.xlarge
- High throughput → ml.c5.xlarge or multiple instances
- Deep learning inference → ml.g4dn.xlarge
When to Use Provisioned Endpoints¶
- Predictable traffic patterns - Steady request rate
- Low latency requirements - <100ms response time
- High throughput - Thousands of requests per minute
- Long-running inference - Complex models with longer execution time
Serverless Endpoints¶
Overview¶
Serverless endpoints auto-scale based on traffic and charge per invocation. Ideal for variable workloads.
Basic Usage¶
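A minimal sketch based on the parameters listed below; the memory size and model path are placeholders:
# Deploy a model to a serverless endpoint
easy_sm deploy-serverless -n my-serverless-endpoint -s 2048 \
    -m s3://bucket/models/model.tar.gz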
Parameters¶
| Parameter | Flag | Required | Description |
|---|---|---|---|
| Endpoint Name | -n, --endpoint-name | Yes | Unique name for endpoint |
| Memory Size | -s, --memory-size-in-mb | Yes | Memory in MB (1024, 2048, 3072, 4096, 5120, 6144) |
| Model Path | -m, --s3-model-location | Yes | S3 path to model.tar.gz |
| Max Concurrency | -mc, --max-concurrency | No | Max concurrent invocations (default: 5) |
| App Name | -a, --app-name | No | Override auto-detected app name |
| IAM Role | -r, --iam-role-arn | No | Override SAGEMAKER_ROLE env var |
Memory Size Selection¶
Available memory sizes: 1024, 2048, 3072, 4096, 5120, 6144 MB
Selection Guide:
- Small models (<100MB) → 2048 MB
- Medium models (100-500MB) → 4096 MB
- Large models (>500MB) → 6144 MB
Example with Concurrency Control¶
# Deploy with higher concurrency for variable traffic
easy_sm deploy-serverless -n api-endpoint -s 4096 \
-m s3://bucket/models/model.tar.gz \
-mc 100
When to Use Serverless Endpoints¶
- Variable traffic - Sporadic or unpredictable requests
- Low request volume - <1000 requests per day
- Cost optimization - Pay only for invocations
- Development/testing - No idle costs
- Cold start tolerance - Can accept 1-3 second startup delay
Serverless Limitations¶
- Cold start latency - First request after idle period takes 1-3 seconds
- Max invocation time - 60 seconds per request
- Memory limits - Max 6144 MB
- No GPU support - CPU inference only
Provisioned vs Serverless Comparison¶
| Factor | Provisioned | Serverless |
|---|---|---|
| Cost | Fixed hourly rate | Pay per invocation |
| Latency | Consistent (<100ms) | Variable (cold starts) |
| Scaling | Manual (set instances) | Automatic |
| Idle Cost | Continuous | None |
| Best For | Steady traffic | Variable traffic |
| GPU Support | Yes | No |
| Max Memory | Instance-dependent | 6144 MB |
Deployment Output¶
Both deployment commands output the endpoint name on completion.
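For example, assuming the name is the only text printed, it can be captured in a shell variable:
# Capture the endpoint name for later use
ENDPOINT=$(easy_sm deploy -n my-endpoint -e ml.m5.large -m s3://bucket/model.tar.gz)
echo "Deployed: $ENDPOINT"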
Endpoint Management¶
List Endpoints¶
# List all endpoints
easy_sm list-endpoints
# Output format:
# EndpointName: my-endpoint
# Status: InService
# CreationTime: 2025-01-15 10:30:00
# ───────────────────────────────────
Delete Endpoint¶
# Delete endpoint (keeps endpoint configuration)
easy_sm delete-endpoint -n endpoint-name
# Delete endpoint and configuration
easy_sm delete-endpoint -n endpoint-name --delete-config
Making Predictions¶
Once deployed, invoke endpoints using AWS SDK or CLI:
Python (boto3):
import boto3

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName='my-endpoint',
    ContentType='text/csv',
    Body='1.0,2.0,3.0,4.0,5.0'
)
prediction = response['Body'].read()
print(prediction)
AWS CLI:
aws sagemaker-runtime invoke-endpoint \
--endpoint-name my-endpoint \
--content-type text/csv \
--body '1.0,2.0,3.0,4.0,5.0' \
output.txt
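The prediction is written to the trailing output file argument; view it with:
cat output.txt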
Batch Transform¶
Overview¶
Batch transform runs batch predictions on large datasets without deploying an endpoint.
Basic Usage¶
easy_sm batch-transform -e ml.m5.large --num-instances 1 \
-m s3://bucket/model.tar.gz \
-i s3://bucket/input-data \
-o s3://bucket/predictions
Parameters¶
| Parameter | Flag | Required | Description |
|---|---|---|---|
| Instance Type | -e, --ec2-type | Yes | EC2 instance type |
| Instance Count | --num-instances | Yes | Number of instances |
| Model Path | -m, --s3-model-location | Yes | S3 path to model.tar.gz |
| Input Path | -i, --s3-input-location | Yes | S3 path to input data |
| Output Path | -o, --s3-output-location | Yes | S3 path for predictions |
| Wait | -w, --wait | No | Wait until job completes (default: false) |
| Job Name | -n, --job-name | No | Custom job name (auto-generated if not provided) |
| App Name | -a, --app-name | No | Override auto-detected app name |
| IAM Role | -r, --iam-role-arn | No | Override SAGEMAKER_ROLE env var |
Example with Multiple Instances¶
# Process large dataset with parallel instances
easy_sm batch-transform -e ml.m5.xlarge --num-instances 5 \
-m s3://bucket/models/model.tar.gz \
-i s3://bucket/batch-data \
-o s3://bucket/batch-predictions
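To block until the transform finishes (useful in scripts and CI pipelines), add the -w flag from the parameter table, as in this sketch:
# Run a batch transform and wait for completion
easy_sm batch-transform -e ml.m5.large --num-instances 1 \
    -m s3://bucket/model.tar.gz \
    -i s3://bucket/input-data \
    -o s3://bucket/predictions \
    -w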
When to Use Batch Transform¶
- Large datasets - Millions of predictions
- Periodic predictions - Daily/weekly batch jobs
- No real-time requirements - Can wait minutes/hours
- Cost optimization - No idle endpoint costs
- High throughput - Parallel processing across instances
Processing Jobs¶
Overview¶
Processing jobs run data transformation and feature engineering on SageMaker.
Basic Usage¶
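A minimal sketch based on the parameters listed below; the script name is a placeholder, and S3 input/output are optional:
# Run a processing script on SageMaker
easy_sm process -f preprocess.py -e ml.m5.large -n my-processing-job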
Parameters¶
| Parameter | Flag | Required | Description |
|---|---|---|---|
| Script File | -f, --processing-file | Yes | Python script in processing/ directory |
| Instance Type | -e, --ec2-type | Yes | EC2 instance type |
| Job Name | -n, --job-name | Yes | Unique name for processing job |
| Input Path | -i, --s3-input-location | No | S3 path to input data |
| Output Path | -o, --s3-output-location | No | S3 path for output |
| App Name | -a, --app-name | No | Override auto-detected app name |
| IAM Role | -r, --iam-role-arn | No | Override SAGEMAKER_ROLE env var |
Processing Script Structure¶
Your processing script should be in app-name/easy_sm_base/processing/:
import os

import pandas as pd


def transform_features(df):
    """Example transformation; replace with your own feature engineering."""
    return df.dropna()


def process():
    """
    Processing function for data transformation.
    """
    # Input/output paths in SageMaker container
    input_path = '/opt/ml/processing/input'
    output_path = '/opt/ml/processing/output'

    # Load data
    df = pd.read_csv(os.path.join(input_path, 'data.csv'))

    # Transform data
    df_processed = transform_features(df)

    # Save output
    df_processed.to_csv(
        os.path.join(output_path, 'processed.csv'),
        index=False
    )


if __name__ == '__main__':
    process()
Example with Input/Output¶
# Run processing with S3 input/output
easy_sm process -f preprocess.py -e ml.m5.large -n preprocessing-job \
-i s3://bucket/raw-data \
-o s3://bucket/processed-data
When to Use Processing Jobs¶
- Data preprocessing - Feature engineering before training
- ETL workflows - Extract, transform, load operations
- Data validation - Quality checks and profiling
- Post-processing - Transform model outputs
Data Upload¶
Upload Data to S3¶
The upload-data command uploads local files to S3 for use in training or processing jobs.
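A sketch using the same flags as the end-to-end workflow below:
# Upload a local directory to S3
easy_sm upload-data -i ./data -t s3://my-bucket/training-data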
Complete Cloud Workflow¶
End-to-End Example¶
# 1. Set environment
export SAGEMAKER_ROLE=arn:aws:iam::123456789012:role/SageMakerRole
# 2. Build and push Docker image
easy_sm build
easy_sm push
# 3. Upload training data
easy_sm upload-data -i ./data -t s3://my-bucket/training-data
# 4. Train model
easy_sm train -n training-job-001 -e ml.m5.xlarge \
-i s3://my-bucket/training-data \
-o s3://my-bucket/models
# 5. Get model path
MODEL_PATH=$(easy_sm get-model-artifacts -j training-job-001)
# 6. Deploy to endpoint
easy_sm deploy -n prod-endpoint -e ml.m5.large -m $MODEL_PATH
# 7. Test endpoint
aws sagemaker-runtime invoke-endpoint \
--endpoint-name prod-endpoint \
--content-type text/csv \
--body '1.0,2.0,3.0' \
output.txt
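When testing is finished, delete the endpoint so the provisioned instance stops accruing charges (see Endpoint Management above):
# 8. Clean up
easy_sm delete-endpoint -n prod-endpoint --delete-config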
Monitoring and Troubleshooting¶
CloudWatch Logs¶
Training and endpoint logs are in CloudWatch:
- Training Jobs: log group /aws/sagemaker/TrainingJobs, with one log stream per job prefixed by the job name
- Endpoints: log group /aws/sagemaker/Endpoints/<endpoint-name>
View logs via AWS Console or CLI:
# Get training job logs
aws logs tail /aws/sagemaker/TrainingJobs --log-stream-name-prefix my-training-job --follow
# Get endpoint logs
aws logs tail /aws/sagemaker/Endpoints/my-endpoint --follow
Common Issues¶
Issue: Training job fails immediately
Check CloudWatch logs for errors. Common causes:
- Missing dependencies in requirements.txt
- Incorrect S3 paths
- IAM role lacks S3 permissions
- Code errors in training script
Issue: Endpoint creation fails
- Model artifacts missing or corrupted
- Serving code has errors
- Insufficient IAM permissions
- Instance type not available in region
# Check endpoint status
aws sagemaker describe-endpoint --endpoint-name my-endpoint
# View endpoint logs
aws logs tail /aws/sagemaker/Endpoints/my-endpoint --follow
Issue: Endpoint returns errors
- Input format doesn't match expected format
- Model file not found in container
- Serving code exception
# Test endpoint with verbose output
aws sagemaker-runtime invoke-endpoint \
--endpoint-name my-endpoint \
--content-type text/csv \
--body '1.0,2.0,3.0' \
--debug \
output.txt
Issue: High latency on serverless endpoint
- Cold start delay (normal for first request after idle)
- Increase memory size for faster initialization
- Consider provisioned endpoint for consistent latency
Issue: "AlgorithmError" during training
- Check training code for exceptions
- Verify data format matches expected input
- Check CloudWatch logs for stack traces
Cost Monitoring¶
View SageMaker costs:
- AWS Console → Cost Explorer → Filter by Service → SageMaker
- Set up cost alerts in CloudWatch

Cost optimization tips:
1. Serverless for low traffic - No idle costs
2. Right-size instances - Don't over-provision
3. Delete unused endpoints - Provisioned endpoints charge 24/7
4. Use batch transform - More cost-effective than endpoints for batch workloads
5. Multi-instance training - Faster training reduces billable time
Performance Optimization¶
Training:
- Use larger instances for faster training
- Use distributed training (-c flag) for large datasets
- Use GPU instances (ml.p3.x) for deep learning

Endpoints:
- Add more instances for higher throughput
- Use auto-scaling for variable traffic
- Choose CPU vs GPU based on model type

Batch Transform:
- Increase instance count for parallel processing
- Use larger instances for faster per-record inference
Next Steps¶
- Piped Workflows: See Piped Workflows Guide for command composition
- AWS Setup: See AWS Setup Guide for IAM and S3 configuration
- Local Testing: See Local Development Guide for testing before cloud deployment