Scaled Jobs

This guide covers configuring the pgt-scaledjob Helm chart for deploying event-driven jobs to Kubernetes. The chart creates KEDA ScaledJob resources that automatically scale job executions based on external event sources such as message queues, databases, or custom metrics.

When to Use ScaledJobs

ScaledJobs are ideal for:

  • Queue processing - Process messages from SQS, RabbitMQ, Kafka, Azure Service Bus, etc.

  • Event-driven workloads - Scale based on events from external systems

  • Batch processing - Process items in batches with automatic scaling

  • Background tasks - Execute tasks triggered by external events

For time-based scheduling, use CronJobs instead.

Prerequisites

Add the chart as a dependency in your Chart.yaml:

apiVersion: v2
name: my-scaledjob
version: 0.0.1
dependencies:
  - name: pgt-scaledjob
    version: 0.0.3
    repository: oci://public.ecr.aws/w9m9e0e9/pgt-helm-charts

After adding the dependency, run:

helm dependency update

ℹ️ KEDA Required

ScaledJobs require KEDA to be installed in your cluster. KEDA monitors your event sources and automatically creates Jobs when events are detected.


Basic Configuration

Required Fields
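The Values Reference marks name, organisationName, the three container.image fields, serviceAccount.name, and scaledjob.triggers as required. A minimal sketch of a values.yaml that sets them follows. Every value is a placeholder, the SQS trigger is an arbitrary example assumed to be passed through to KEDA unchanged, and the nesting under pgt-scaledjob assumes the chart is consumed as a dependency as in the Chart.yaml above.

```yaml
# values.yaml (sketch; all values are placeholders)
pgt-scaledjob:
  name: my-scaledjob
  organisationName: my-org
  container:
    image:
      registry: registry.example.com
      repository: my-team/my-worker
      tag: "1.0.0"
  serviceAccount:
    name: my-scaledjob
  scaledjob:
    triggers:
      # Assumed to be rendered into the ScaledJob as-is
      - type: aws-sqs-queue
        metadata:
          queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
          queueLength: "5"
          awsRegion: eu-west-1
```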

⚠️ Memory Requests and Limits

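Always set memory requests equal to memory limits. The scheduler then reserves exactly the amount of memory the container is allowed to use, so scheduling is predictable and the pod can never exceed its request, which makes it a late candidate for eviction under node memory pressure. Note that Kubernetes assigns the Guaranteed Quality of Service (QoS) class only when CPU requests also equal CPU limits for every container in the pod; with only memory matched, the pod is classed as Burstable.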
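In Kubernetes resource syntax, the rule looks like the fragment below. This is only a sketch: the Values Reference does not list a resources key, so where this block belongs in the chart's values is an assumption you should check against the chart itself. The 512Mi figure is a placeholder.

```yaml
# Standard Kubernetes container resources block.
# Its parent key in the pgt-scaledjob values is not documented here.
resources:
  requests:
    memory: 512Mi
  limits:
    memory: 512Mi   # same as the request
```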

Development Environment & Cost Optimization

When deploying ScaledJobs to non-production environments, consider these settings to reduce costs:
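A sketch of a non-production override using keys from the Values Reference. The specific numbers are illustrative rather than recommended defaults, and the nesting under pgt-scaledjob assumes the chart is used as a dependency.

```yaml
# values-dev.yaml (sketch)
pgt-scaledjob:
  affinity:
    nodeAffinity:
      preferSpotInstances: true   # chart default is false
  scaledjob:
    minReplicaCount: 0            # no jobs when there are no events
    maxReplicaCount: 5            # chart default is 100
    pollingInterval: 60           # seconds; chart default is 30
```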

💡 Spot Instances

Setting affinity.nodeAffinity.preferSpotInstances: true makes the scheduler prefer spot instances when they are available, which can significantly reduce compute costs. Spot instances can be reclaimed at short notice, so before enabling them, contact the platform team to discuss your requirements and confirm that your applications tolerate interruption.

💰 Cost-Saving Tips for ScaledJobs

  • Scale to zero: Set minReplicaCount: 0 to ensure no jobs run when there are no events to process

  • Limit max replicas: Use lower maxReplicaCount in non-production to prevent runaway scaling

  • Increase polling interval: A longer pollingInterval reduces API calls to your event source

  • Right-size resources: Development jobs often need fewer resources than production


KEDA Triggers

Triggers define what events cause KEDA to create new Jobs. KEDA supports many trigger types including message queues, databases, HTTP endpoints, and custom metrics.

AWS SQS Queue

Scale based on messages in an AWS SQS queue:
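A sketch using KEDA's aws-sqs-queue scaler. It assumes scaledjob.triggers is passed through to the ScaledJob unchanged and that values sit under the pgt-scaledjob dependency key. The queue URL and region are placeholders.

```yaml
pgt-scaledjob:
  scaledjob:
    triggers:
      - type: aws-sqs-queue
        metadata:
          queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
          queueLength: "5"      # target messages per job
          awsRegion: eu-west-1
```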

Azure Service Bus Queue

Scale based on messages in an Azure Service Bus queue:
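A sketch using KEDA's azure-servicebus scaler, under the same assumptions: triggers are passed through unchanged and values sit under the pgt-scaledjob dependency key. The namespace and queue names are placeholders.

```yaml
pgt-scaledjob:
  scaledjob:
    triggers:
      - type: azure-servicebus
        metadata:
          namespace: my-servicebus-namespace
          queueName: my-queue
          messageCount: "5"     # target messages per job
```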

RabbitMQ Queue

Scale based on messages in a RabbitMQ queue:
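A sketch using KEDA's rabbitmq scaler in QueueLength mode. It assumes triggers are passed through unchanged, that values sit under the pgt-scaledjob dependency key, and that the job container has a hypothetical RABBITMQ_URL environment variable holding the connection string.

```yaml
pgt-scaledjob:
  scaledjob:
    triggers:
      - type: rabbitmq
        metadata:
          protocol: amqp
          queueName: my-queue
          mode: QueueLength
          value: "5"                  # target messages per job
          hostFromEnv: RABBITMQ_URL   # hypothetical env var name
```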

Kafka Topic

Scale based on consumer lag in a Kafka topic:
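A sketch using KEDA's kafka scaler, assuming triggers are passed through unchanged and values sit under the pgt-scaledjob dependency key. Broker address, consumer group, and topic are placeholders.

```yaml
pgt-scaledjob:
  scaledjob:
    triggers:
      - type: kafka
        metadata:
          bootstrapServers: kafka.example.svc:9092
          consumerGroup: my-consumer-group
          topic: my-topic
          lagThreshold: "10"    # target lag per job
```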

PostgreSQL Query

Scale based on a PostgreSQL query result:
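A sketch using KEDA's postgresql scaler, assuming triggers are passed through unchanged and values sit under the pgt-scaledjob dependency key. The table, query, connection details, and the PG_PASSWORD environment variable are all hypothetical.

```yaml
pgt-scaledjob:
  scaledjob:
    triggers:
      - type: postgresql
        metadata:
          host: postgres.example.svc
          port: "5432"
          dbName: jobs
          userName: keda_reader
          passwordFromEnv: PG_PASSWORD   # read from the job container's env
          sslmode: require
          query: "SELECT COUNT(*) FROM tasks WHERE status = 'pending'"
          targetQueryValue: "5"          # target pending rows per job
```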

💡 More Triggers

KEDA supports 50+ trigger types. See the KEDA Scalers documentation for the full list and configuration options.


Trigger Authentication

When triggers need to authenticate with external services, use TriggerAuthentication to provide credentials securely.

💡 Creating Cloud Identities

Before configuring trigger authentication, you need to create IAM roles or managed identities. See:

AWS with Pod Identity (IRSA)

For AWS services using IAM Roles for Service Accounts:
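A sketch combining the IRSA ServiceAccount annotation with a pod-identity TriggerAuthentication. It assumes triggerAuthentication.podIdentity is rendered into KEDA's spec.podIdentity unchanged and that values sit under the pgt-scaledjob dependency key. The role ARN is a placeholder. Recent KEDA releases use the aws provider; older releases used aws-eks. Whether the chart also adds an authenticationRef to each trigger is not covered here, so check the rendered manifest.

```yaml
pgt-scaledjob:
  serviceAccount:
    name: my-scaledjob
    annotations:
      # IRSA: binds the ServiceAccount to an IAM role
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-scaledjob-role
  triggerAuthentication:
    enabled: true
    podIdentity:
      provider: aws
```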

Azure with Workload Identity

For Azure services using Workload Identity:
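A sketch for Azure Workload Identity, assuming triggerAuthentication.podIdentity is rendered into KEDA's spec.podIdentity unchanged and that values sit under the pgt-scaledjob dependency key. The client ID is a placeholder. Workload Identity also expects the pod label azure.workload.identity/use: "true"; how the chart sets pod labels is not shown in this sketch.

```yaml
pgt-scaledjob:
  serviceAccount:
    name: my-scaledjob
    annotations:
      # Client ID of the managed identity (placeholder)
      azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
  triggerAuthentication:
    enabled: true
    podIdentity:
      provider: azure-workload
```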

Credentials from Secrets

For services requiring username/password or connection strings:
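A sketch that feeds a RabbitMQ connection string from a Kubernetes Secret. It assumes triggerAuthentication.secretTargetRef uses KEDA's native entry shape (parameter, name, key) and that values sit under the pgt-scaledjob dependency key. The Secret name and key are hypothetical.

```yaml
pgt-scaledjob:
  triggerAuthentication:
    enabled: true
    secretTargetRef:
      - parameter: host               # scaler parameter to populate
        name: rabbitmq-credentials    # Secret name (hypothetical)
        key: connection-string        # key inside the Secret (hypothetical)
```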


Scaling Configuration

Polling Interval

How often KEDA checks the trigger source for events:
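A sketch of the setting, assuming values sit under the pgt-scaledjob dependency key. The value is in seconds.

```yaml
pgt-scaledjob:
  scaledjob:
    pollingInterval: 30   # chart default
```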

Replica Limits

Control the minimum and maximum number of concurrent jobs:
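A sketch of the two settings, assuming values sit under the pgt-scaledjob dependency key. The maxReplicaCount shown is an illustrative cap, not a recommendation.

```yaml
pgt-scaledjob:
  scaledjob:
    minReplicaCount: 0    # chart default
    maxReplicaCount: 20   # chart default is 100
```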

Scaling Strategy

Control how KEDA creates jobs based on events:

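| Strategy | Behaviour |
| --- | --- |
| default | Creates jobs for the pending events minus the jobs already running, capped at maxReplicaCount |
| accurate | Assumes the reported queue length excludes messages that are already being processed, so running jobs are not subtracted; still capped at maxReplicaCount |
| eager | Fills the free slots up to maxReplicaCount as quickly as possible while events are pending |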
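A sketch of selecting a strategy, assuming values sit under the pgt-scaledjob dependency key. The choice of accurate here is only an example.

```yaml
pgt-scaledjob:
  scaledjob:
    scalingStrategy: accurate   # default, accurate, or eager
```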


Job Configuration

Job Execution Settings
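A sketch of the job execution keys listed in the Values Reference, assuming values sit under the pgt-scaledjob dependency key. The activeDeadlineSeconds value is illustrative; the chart leaves it unset by default.

```yaml
pgt-scaledjob:
  scaledjob:
    parallelism: 1               # parallel pods per job
    completions: 1               # required successful completions
    backoffLimit: 3              # retries before the job is marked failed
    activeDeadlineSeconds: 600   # maximum job duration in seconds
    restartPolicy: OnFailure     # OnFailure or Never
```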

Job History

Configure how many completed jobs to retain:
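A sketch showing the chart defaults, assuming values sit under the pgt-scaledjob dependency key.

```yaml
pgt-scaledjob:
  scaledjob:
    successfulJobsHistoryLimit: 3
    failedJobsHistoryLimit: 1
```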


Container Configuration

Command and Arguments

Override the container's default entrypoint and arguments:
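A sketch, assuming values sit under the pgt-scaledjob dependency key. The command and arguments are hypothetical and depend entirely on your image.

```yaml
pgt-scaledjob:
  container:
    command: ["python", "-m", "worker"]   # replaces the image ENTRYPOINT
    args: ["--batch-size", "10"]          # replaces the image CMD
```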


Environment Variables

Direct Environment Variables
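A sketch, assuming environmentVariables accepts entries in the standard Kubernetes env shape (name and value) and that values sit under the pgt-scaledjob dependency key. The variable names are hypothetical.

```yaml
pgt-scaledjob:
  environmentVariables:
    - name: LOG_LEVEL
      value: info
    - name: BATCH_SIZE
      value: "10"   # env values must be strings
```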

Load from ConfigMap or Secret
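A sketch, assuming environmentVariablesFrom accepts entries in the standard Kubernetes envFrom shape and that values sit under the pgt-scaledjob dependency key. The ConfigMap and Secret names are hypothetical.

```yaml
pgt-scaledjob:
  environmentVariablesFrom:
    - configMapRef:
        name: my-scaledjob-config
    - secretRef:
        name: my-scaledjob-secrets
```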


External Secrets

The pgt-scaledjob chart includes pgt-secrets as a subchart for fetching secrets from AWS Secrets Manager or Azure Key Vault. For full configuration options, see the PGT Secrets documentation.


Volume Mounts

Mount ConfigMaps or Secrets as files:


Pod Configuration

Labels and Annotations

Tolerations

For scheduling on specific nodes:


Prometheus PodMonitor

Enable metrics scraping for ScaledJobs that expose metrics during execution:
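A sketch of the toggle from the Values Reference, assuming values sit under the pgt-scaledjob dependency key. Any further PodMonitor settings (port, path, interval) are not documented here and are omitted.

```yaml
pgt-scaledjob:
  podMonitor:
    enabled: true
```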


Complete Examples

AWS SQS Queue Processor

Process messages from an SQS queue with IRSA authentication:
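A sketch that combines the required fields, an IRSA-annotated ServiceAccount, a pod-identity TriggerAuthentication, and an SQS trigger. It assumes triggers and podIdentity are passed through to KEDA unchanged and that values sit under the pgt-scaledjob dependency key. Names, account IDs, and the queue URL are placeholders.

```yaml
pgt-scaledjob:
  name: sqs-processor
  organisationName: my-org
  container:
    image:
      registry: registry.example.com
      repository: my-team/sqs-processor
      tag: "1.0.0"
  serviceAccount:
    name: sqs-processor
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/sqs-processor
  triggerAuthentication:
    enabled: true
    podIdentity:
      provider: aws
  scaledjob:
    pollingInterval: 30
    minReplicaCount: 0
    maxReplicaCount: 20
    triggers:
      - type: aws-sqs-queue
        metadata:
          queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
          queueLength: "5"
          awsRegion: eu-west-1
```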

Azure Service Bus Queue Processor

Process messages from an Azure Service Bus queue with Workload Identity:
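A sketch that combines the required fields, a Workload Identity ServiceAccount annotation, a pod-identity TriggerAuthentication, and a Service Bus trigger. It assumes triggers and podIdentity are passed through to KEDA unchanged and that values sit under the pgt-scaledjob dependency key. Names and the client ID are placeholders.

```yaml
pgt-scaledjob:
  name: servicebus-processor
  organisationName: my-org
  container:
    image:
      registry: myregistry.azurecr.io
      repository: my-team/servicebus-processor
      tag: "1.0.0"
  serviceAccount:
    name: servicebus-processor
    annotations:
      azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
  triggerAuthentication:
    enabled: true
    podIdentity:
      provider: azure-workload
  scaledjob:
    pollingInterval: 30
    minReplicaCount: 0
    maxReplicaCount: 20
    triggers:
      - type: azure-servicebus
        metadata:
          namespace: my-servicebus-namespace
          queueName: my-queue
          messageCount: "5"
```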

RabbitMQ Queue Processor

Process messages from a RabbitMQ queue:
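A sketch that combines the required fields, a Secret-backed TriggerAuthentication, and a RabbitMQ trigger. It assumes triggers and secretTargetRef are passed through to KEDA unchanged and that values sit under the pgt-scaledjob dependency key. The Secret name and key, like every other value, are hypothetical.

```yaml
pgt-scaledjob:
  name: rabbitmq-processor
  organisationName: my-org
  container:
    image:
      registry: registry.example.com
      repository: my-team/rabbitmq-processor
      tag: "1.0.0"
  serviceAccount:
    name: rabbitmq-processor
  triggerAuthentication:
    enabled: true
    secretTargetRef:
      - parameter: host               # RabbitMQ connection string
        name: rabbitmq-credentials
        key: connection-string
  scaledjob:
    pollingInterval: 30
    minReplicaCount: 0
    maxReplicaCount: 10
    triggers:
      - type: rabbitmq
        metadata:
          protocol: amqp
          queueName: my-queue
          mode: QueueLength
          value: "5"
```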


Troubleshooting

Use Argo CD to investigate issues with ScaledJobs.

Viewing ScaledJob Status

  1. Navigate to your application in the Argo CD UI

  2. Locate the ScaledJob resource in the application tree

  3. Click on the ScaledJob to view its details including:

    • Current replica count

    • Trigger status

    • Last scale time

Viewing Job Executions

  1. In the Argo CD application tree, look for Job resources created by the ScaledJob

  2. Click on a Job to see its status and completion time

  3. Expand the Job to see its Pods

Checking Pod Logs

  1. In the application tree, find the Pod created by a Job

  2. Click on the Pod resource

  3. Select the Logs tab to view container output

Checking TriggerAuthentication

  1. Locate the TriggerAuthentication resource in the application tree

  2. Verify the authentication configuration matches your trigger requirements

Common Issues

Jobs not being created:

  • Verify KEDA is installed and running in the cluster

  • Check if the trigger source has events/messages

  • Verify TriggerAuthentication credentials are correct

  • Check KEDA operator logs for trigger errors

Jobs failing repeatedly:

  • Check Pod logs for error messages

  • Verify secrets and ConfigMaps are correctly configured

  • Ensure the ServiceAccount has required permissions

  • Check if backoffLimit is too low

Authentication errors:

  • Verify ServiceAccount annotations for IRSA/Workload Identity

  • Check TriggerAuthentication references match the trigger configuration

  • Ensure IAM roles/managed identities have correct permissions

Scaling issues:

  • Check maxReplicaCount isn't limiting scaling

  • Verify pollingInterval is appropriate for your use case

  • Review scalingStrategy for your workload pattern


Values Reference

| Value | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | nil | Required. ScaledJob name |
| organisationName | string | nil | Required. Organisation name |
| scaledjob.pollingInterval | int | 30 | How often KEDA checks triggers (seconds) |
| scaledjob.successfulJobsHistoryLimit | int | 3 | Successful jobs to retain |
| scaledjob.failedJobsHistoryLimit | int | 1 | Failed jobs to retain |
| scaledjob.maxReplicaCount | int | 100 | Maximum concurrent jobs |
| scaledjob.minReplicaCount | int | 0 | Minimum jobs to maintain |
| scaledjob.scalingStrategy | string | default | default, accurate, or eager |
| scaledjob.parallelism | int | 1 | Parallel pods per job |
| scaledjob.completions | int | 1 | Required successful completions |
| scaledjob.backoffLimit | int | 3 | Retries before job failure |
| scaledjob.activeDeadlineSeconds | int | nil | Maximum job duration |
| scaledjob.restartPolicy | string | OnFailure | OnFailure or Never |
| scaledjob.triggers | list | [] | Required. KEDA trigger configurations |
| affinity.nodeAffinity.preferSpotInstances | bool | false | Prefer scheduling on spot instances |
| container.image.registry | string | nil | Required. Container registry |
| container.image.repository | string | nil | Required. Image repository |
| container.image.tag | string | nil | Required. Image tag |
| container.command | list | [] | Container entrypoint override |
| container.args | list | [] | Container arguments |
| serviceAccount.name | string | nil | Required. ServiceAccount name |
| serviceAccount.annotations | object | {} | ServiceAccount annotations |
| triggerAuthentication.enabled | bool | false | Enable TriggerAuthentication |
| triggerAuthentication.secretTargetRef | list | [] | Secrets for authentication |
| triggerAuthentication.podIdentity | object | {} | Pod identity configuration |
| environmentVariables | list | [] | Environment variables |
| environmentVariablesFrom | list | [] | Load env vars from ConfigMap/Secret |
| volumes | list | [] | Volume mounts |
| podMonitor.enabled | bool | false | Enable PodMonitor |
| pgt-secrets.enabled | bool | false | Enable external secrets |
