Configuration

easy_sm uses JSON configuration files and environment variables to manage project settings.

Configuration File

Each project has a JSON configuration file named {app-name}.json in the project root.

Example Configuration

{
    "image_name": "my-ml-app",
    "aws_profile": "dev",
    "aws_region": "eu-west-1",
    "python_version": "3.13",
    "easy_sm_module_dir": "my-ml-app",
    "requirements_dir": "requirements.txt"
}

Configuration Fields

Field	Description	Example
`image_name`	Docker image name (used for ECR)	`my-ml-app`
`aws_profile`	AWS CLI profile name	`dev`, `prod`
`aws_region`	AWS region for SageMaker operations	`eu-west-1`, `us-east-1`
`python_version`	Python version for Docker image	`3.13`, `3.12`
`easy_sm_module_dir`	Directory containing `easy_sm_base/`	`my-ml-app`
`requirements_dir`	Path to the requirements file (a file, despite the field name)	`requirements.txt`
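If you script around easy_sm (for example in CI), loading the config into a typed object catches missing fields early. The following is a minimal sketch, not part of easy_sm: `AppConfig` and `load_config` are hypothetical names, and the field list is taken from the table above.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical typed view of {app-name}.json (fields from the table above)."""

    image_name: str
    aws_profile: str
    aws_region: str
    python_version: str
    easy_sm_module_dir: str
    requirements_dir: str  # despite the name, a path to a requirements file


def load_config(path: Path) -> AppConfig:
    """Read a config file and keep only the documented keys."""
    data = json.loads(path.read_text())
    known = {f.name for f in fields(AppConfig)}
    # Unknown keys are dropped; a missing key makes the dataclass constructor raise TypeError
    return AppConfig(**{key: value for key, value in data.items() if key in known})
```

For the example configuration, `load_config(Path("my-ml-app.json")).aws_region` gives the configured region. Dropping unknown keys rather than rejecting them is a choice made for this sketch only.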

Auto-Detection

Most commands auto-detect the configuration file when the current directory contains exactly one *.json file:

# Automatically finds my-ml-app.json in the current directory
easy_sm build
easy_sm train -n job-name -e ml.m5.large -i s3://... -o s3://...

To select the configuration file explicitly, pass the app name (the file name without .json) with the -a/--app-name flag:

easy_sm build -a my-ml-app

Multiple Config Files

If the current directory contains more than one *.json file, auto-detection fails because easy_sm cannot tell which file to use. Either:

Remove the extra JSON files, or
Use the -a flag to specify which app to use
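The lookup rules above can be sketched as follows. This illustrates the documented behavior and is not easy_sm's source: `resolve_config_path` is a hypothetical helper, and its error texts are made up.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(cwd: Path, app_name: Optional[str] = None) -> Path:
    """Pick the config file: -a wins, otherwise exactly one *.json must exist."""
    if app_name is not None:
        # Explicit -a/--app-name: the file is {app-name}.json, no search needed
        return cwd / f"{app_name}.json"

    candidates = sorted(cwd.glob("*.json"))
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise FileNotFoundError(f"No *.json configuration file in {cwd}")
    # More than one candidate is ambiguous, so refuse to guess
    names = ", ".join(p.name for p in candidates)
    raise RuntimeError(f"Multiple configuration files ({names}); use -a to choose one")
```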

Environment Variables

SAGEMAKER_ROLE (Required)

Set the ARN of the IAM role that SageMaker assumes to run your jobs:

export SAGEMAKER_ROLE=arn:aws:iam::123456789012:role/SageMakerRole

Persist Across Sessions

Append the export line to your shell's startup file: ~/.bashrc for bash or ~/.zshrc for zsh. For bash:

echo 'export SAGEMAKER_ROLE=arn:aws:iam::123456789012:role/SageMakerRole' >> ~/.bashrc
source ~/.bashrc

To use a different role for a single command, pass the -r/--iam-role-arn flag, which overrides SAGEMAKER_ROLE:

easy_sm train -r arn:aws:iam::123456789012:role/OtherRole ...
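The precedence between the flag and the environment variable can be sketched as below. `resolve_role` is hypothetical, and the loose ARN shape check is an extra assumption of this sketch that easy_sm may not perform.

```python
import os
import re
from typing import Mapping, Optional

# Loose shape check (assumption): arn:<partition>:iam::<12-digit account>:role/<name>
ROLE_ARN_RE = re.compile(r"arn:aws[\w-]*:iam::\d{12}:role/.+")


def resolve_role(cli_role: Optional[str], environ: Mapping[str, str] = os.environ) -> str:
    """Return the role from -r/--iam-role-arn if given, else from SAGEMAKER_ROLE."""
    role = cli_role or environ.get("SAGEMAKER_ROLE")
    if not role:
        raise RuntimeError("SAGEMAKER_ROLE environment variable not set")
    if not ROLE_ARN_RE.fullmatch(role):
        raise ValueError(f"Not an IAM role ARN: {role!r}")
    return role
```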

AWS Credentials

easy_sm uses the standard AWS credential chain:

Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
AWS CLI credentials file (~/.aws/credentials)
IAM role (when running on EC2/ECS)

The aws_profile in the config file specifies which profile to use from ~/.aws/credentials.
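To check up front that the profile named in aws_profile exists, you can read the shared credentials file with Python's configparser. A minimal sketch with a hypothetical `credentials_profiles` helper; it only looks at ~/.aws/credentials, so profiles defined solely in ~/.aws/config (SSO profiles, for example) are not listed.

```python
import configparser
from pathlib import Path


def credentials_profiles(path: Path = Path.home() / ".aws" / "credentials") -> set[str]:
    """Return the profile names defined in an AWS shared credentials file."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped and yields no profiles
    # In the credentials file, section names are bare profile names, e.g. [dev]
    return set(parser.sections())
```

A pre-flight check can then be as simple as `"dev" in credentials_profiles()`.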

Project Structure

After you run easy_sm init, your project has this structure:

my-project/
├── my-app.json                      # Configuration
├── requirements.txt                 # Python dependencies
└── my-app/                          # Module directory
    └── easy_sm_base/                # Template directory
        ├── Dockerfile               # Customize if needed
        ├── training/
        │   ├── train                # Entry point (shell script)
        │   └── training.py          # Your training code
        ├── prediction/
        │   └── serve                # Your serving code
        ├── processing/              # Processing scripts
        └── local_test/
            └── test_dir/            # Test data for local runs
                ├── input/           # Input data
                │   └── data/
                │       └── training/
                └── model/           # Model output
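A quick way to confirm that a checkout still matches this layout is to test for the expected paths. A sketch under assumptions: `missing_template_paths` is hypothetical, the path list is copied from the tree above (adjust it if your template differs), and `module_dir` corresponds to easy_sm_module_dir.

```python
from pathlib import Path

# Paths from the project tree, relative to the module directory (easy_sm_module_dir)
EXPECTED = [
    "easy_sm_base/Dockerfile",
    "easy_sm_base/training/train",
    "easy_sm_base/training/training.py",
    "easy_sm_base/prediction/serve",
    "easy_sm_base/processing",
    "easy_sm_base/local_test/test_dir/input/data/training",
    "easy_sm_base/local_test/test_dir/model",
]


def missing_template_paths(project_root: Path, module_dir: str) -> list[str]:
    """Return the expected template paths that do not exist, as POSIX-style strings."""
    base = project_root / module_dir
    return [f"{module_dir}/{rel}" for rel in EXPECTED if not (base / rel).exists()]
```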

Dockerfile Customization

easy_sm init generates a default Dockerfile. You can customize it to:

Install system dependencies
Add custom build steps
Set environment variables

Example customization:

# Keep the Python version in line with python_version in the config file
FROM python:3.13

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install
COPY requirements.txt /opt/program/requirements.txt
RUN pip install --no-cache-dir -r /opt/program/requirements.txt

# Copy code
COPY training /opt/program/training
COPY prediction /opt/program/prediction
COPY processing /opt/program/processing

# Set environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

WORKDIR /opt/program

# Keep any ENTRYPOINT or PATH setup from the generated Dockerfile so that
# SageMaker can still start the train and serve entry points

Maintain Entry Points

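Keep the original entry-point scripts (training/train and prediction/serve). SageMaker starts the container with the argument train for training jobs and serve for endpoints, so removing or renaming these scripts breaks both.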
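If the image executes these scripts directly, they also need their executable bit; a script that lost it cannot be started as a command. A minimal sketch of a pre-build check, using a hypothetical `non_executable_entry_points` helper that takes the easy_sm_base/ directory:

```python
import os
from pathlib import Path

ENTRY_POINTS = ("training/train", "prediction/serve")


def non_executable_entry_points(base: Path) -> list[str]:
    """Return entry points under easy_sm_base/ that are missing or not executable."""
    problems = []
    for rel in ENTRY_POINTS:
        script = base / rel
        # os.access(..., os.X_OK) checks execute permission for the current user
        if not (script.is_file() and os.access(script, os.X_OK)):
            problems.append(rel)
    return problems
```

Any script it reports can be fixed with chmod +x before building.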

Requirements File

The requirements.txt file lists the Python dependencies that are installed into the Docker image:

scikit-learn>=1.3.0
pandas>=2.0.0
joblib>=1.3.0
numpy>=1.24.0

Pin Versions

Pin exact versions for reproducibility:

scikit-learn==1.3.2
pandas==2.1.4
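To catch unpinned entries before a build, a small check can flag requirement lines without ==. A minimal sketch: `unpinned_requirements` is hypothetical, it skips comments and pip options such as -r, and it does not understand every requirement syntax (URLs and environment markers, for instance).

```python
import re

# Like pip, treat '#' as a comment only at line start or after whitespace
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        # Skip blank lines and pip options such as -r other.txt or --index-url
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```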

Docker Tags

Control the Docker image tag with the global --docker-tag option, placed before the subcommand:

# Build with custom tag
easy_sm --docker-tag v1.0 build

# Use tagged image for training
easy_sm --docker-tag v1.0 local train

# Push tagged image
easy_sm --docker-tag v1.0 push

The default tag is latest.
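The tag ends up in the image URI pushed to ECR. The format below is the standard ECR URI for non-China regions; assuming the repository is named after image_name, as the configuration table suggests, a hypothetical `ecr_image_uri` helper could build it. The tag check follows Docker's tag rules: up to 128 characters of letters, digits, underscores, periods, and hyphens, not starting with a period or hyphen.

```python
import re

# Docker tag rules: up to 128 chars of [A-Za-z0-9_.-], not starting with '.' or '-'
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def ecr_image_uri(account_id: str, region: str, image_name: str, tag: str = "latest") -> str:
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<image_name>:<tag>."""
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"Invalid Docker tag: {tag!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"
```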

Versioning Strategy

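Use semantic versioning for production images. Docker tags may contain periods, but SageMaker training job names may contain only letters, digits, and hyphens, so write the version with hyphens in the job name:

easy_sm --docker-tag v1.0.0 build
easy_sm --docker-tag v1.0.0 push
easy_sm --docker-tag v1.0.0 train -n prod-job-v1-0-0 ...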
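SageMaker rejects training job names that contain periods, so a version such as v1.0.0 cannot go into -n verbatim. A minimal sketch of deriving a valid name, using a hypothetical `job_name_for_version` helper and the documented naming constraint (1 to 63 characters, letters, digits, and hyphens, starting and ending with a letter or digit):

```python
import re

# SageMaker training job names: letters, digits and hyphens,
# starting and ending with a letter or digit
JOB_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_JOB_NAME_LEN = 63


def job_name_for_version(prefix: str, version: str) -> str:
    """Build a job name such as prod-job-v1-0-0 from a prefix and a version tag."""
    # Replace periods (and any other disallowed characters) with hyphens
    name = re.sub(r"[^a-zA-Z0-9-]+", "-", f"{prefix}-{version}").strip("-")
    if len(name) > MAX_JOB_NAME_LEN or not JOB_NAME_RE.fullmatch(name):
        raise ValueError(f"Not a valid SageMaker training job name: {name!r}")
    return name
```

For example, `job_name_for_version("prod-job", "v1.0.0")` returns "prod-job-v1-0-0". Job names must also be unique within an account and region, so re-running the same version needs an extra suffix.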

Multiple Environments

Manage multiple environments (dev, staging, prod) with separate config files:

my-project/
├── my-app-dev.json      # Dev environment
├── my-app-staging.json  # Staging environment
├── my-app-prod.json     # Production environment
└── my-app/
    └── easy_sm_base/

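Because the project directory now contains several *.json files, auto-detection does not apply; pass -a on every command to select the environment: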

# Dev
easy_sm build -a my-app-dev
easy_sm train -a my-app-dev -n dev-job ...

# Production
easy_sm build -a my-app-prod
easy_sm train -a my-app-prod -n prod-job ...
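Before promoting from one environment to the next, it helps to see which settings actually differ. A minimal sketch with a hypothetical `config_diff` helper that compares two config files key by key:

```python
import json
from pathlib import Path


def config_diff(a: Path, b: Path) -> dict[str, tuple]:
    """Map each key whose value differs between two config files to (value_in_a, value_in_b)."""
    left = json.loads(a.read_text())
    right = json.loads(b.read_text())
    # Keys present in only one file show up with None on the other side
    keys = sorted(left.keys() | right.keys())
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}
```

Running `config_diff(Path("my-app-dev.json"), Path("my-app-prod.json"))` lists only the fields that change between dev and prod.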

Configuration Validation

easy_sm validates configuration on each command:

App name: letters, digits, hyphens, and underscores only (prevents path traversal)
Config file: must exist and contain valid JSON
Required fields: every field listed under Configuration Fields must be present
IAM role: must be set via the SAGEMAKER_ROLE environment variable or the -r flag

If validation fails, easy_sm prints an error message such as:

Error: Configuration file 'my-app.json' not found
Error: Invalid app name: '../../../etc/passwd'
Error: SAGEMAKER_ROLE environment variable not set
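The checks above can be sketched in the order listed. This is a hypothetical `validate` function, not easy_sm's implementation: the app-name, missing-file, and role messages mirror the examples above, while the other messages are invented for the sketch.

```python
import json
import os
import re
from pathlib import Path
from typing import Mapping, Optional

# Field names from the Configuration Fields table
REQUIRED_FIELDS = (
    "image_name",
    "aws_profile",
    "aws_region",
    "python_version",
    "easy_sm_module_dir",
    "requirements_dir",
)
# Letters, digits, hyphens and underscores only: no '/', '.' or '..'
APP_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def validate(
    app_name: str,
    cwd: Path,
    cli_role: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Run the checks in the order listed above and return the parsed config."""
    # 1. App name: rejecting path separators and dots blocks path traversal
    if not APP_NAME_RE.fullmatch(app_name):
        raise ValueError(f"Invalid app name: '{app_name}'")

    # 2. Config file: must exist and contain a JSON object
    path = cwd / f"{app_name}.json"
    if not path.is_file():
        raise FileNotFoundError(f"Configuration file '{path.name}' not found")
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(f"Configuration file '{path.name}' is not valid JSON") from exc
    if not isinstance(config, dict):
        raise ValueError(f"Configuration file '{path.name}' must contain a JSON object")

    # 3. Required fields
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")

    # 4. IAM role: the -r flag value, else the SAGEMAKER_ROLE environment variable
    if not (cli_role or environ.get("SAGEMAKER_ROLE")):
        raise RuntimeError("SAGEMAKER_ROLE environment variable not set")
    return config
```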