AWS GameDay 2014

whoami

 

Henning Kristensen

Certified cloud juggler

henning@dashsoft.dk

 

Yay! GameDay!

250 Seats

Build

Break

Fix

Team HJA

Andreas from Germany
Justin from Colorado
...and myself

Architecture

How we split up the work...

Andreas: CloudFormation templates for queues, S3 buckets, and Auto Scaling groups

Justin: Ansible script for AMI generation

Myself: VPC configuration incl. NAT instance

Ansible?

  • Orchestration tool
  • Idempotent*
  • YAML
  • Playbooks with tasks
  • Modules

CloudFormation

  • AWS Native
  • JSON Template
  • Declarative
  • Broadly supported within AWS

create_vpc.yml:
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
  - var_files/gameday.yml
  vars:
    key_name: "{{appname}}"
    instance_type: t2.micro
    security_group: batchprocessingsg
    image: ami-4985b048
    master_image_name: master-image
  tasks:
  - name: Create VPC
    local_action:
      module: ec2_vpc
      state: present
...

var_files/gameday.yml:
vpc_net: 10.3
appname: imageprocessor4
nat_amiid: ami-11d6e610
iamrole: BatchProcessing-{{appname}}
create_vpc.yml (continued):
- name: Create VPC
  local_action:
    module: ec2_vpc
    state: present
    cidr_block: "{{vpc_net}}.0.0/16"
    resource_tags: { "Environment":"Gameday" }
    subnets:
    - cidr: "{{vpc_net}}.1.0/24"
      az: "{{region}}a"
      resource_tags: { "Environment":"Gameday", "Tier" : "Nat" }
    - cidr: "{{vpc_net}}.2.0/24"
      az: "{{region}}a"
      resource_tags: { "Environment":"Gameday", "Tier" : "App" }
    internet_gateway: True
    route_tables:
    - subnets:
        - "{{vpc_net}}.1.0/24"
      routes:
        - dest: 0.0.0.0/0
          gw: igw
    region: "{{region}}"
  register: vpc
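
The address plan in this task can be checked offline before anything is created. The following is a minimal sketch, assuming `vpc_net` is `10.3` as in var_files/gameday.yml; the helper name `check_plan` is hypothetical and not part of the playbook.

```python
import ipaddress
from itertools import combinations


def check_plan(vpc_cidr, subnet_cidrs):
    """Validate that every subnet lies inside the VPC block and none overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is outside {vpc}")
    for first, second in combinations(subnets, 2):
        if first.overlaps(second):
            raise ValueError(f"{first} overlaps {second}")
    return subnets


vpc_net = "10.3"  # assumed value, taken from var_files/gameday.yml
subnets = check_plan(
    f"{vpc_net}.0.0/16",
    [f"{vpc_net}.1.0/24", f"{vpc_net}.2.0/24"],  # "Nat" tier and "App" tier
)
print([str(subnet) for subnet in subnets])
```

A subnet outside the VPC block, such as `10.4.1.0/24`, makes `check_plan` raise `ValueError` instead of failing later in the AWS call.
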
create_vpc.yml (continued):
- name: NAT SG
  local_action:
    module: ec2_group
    name: natsg
    description: NAT Security Group
    vpc_id: "{{vpc.vpc_id}}"
    region: "{{region}}"
    rules:
    - proto: tcp
      from_port: 80
      to_port: 80
      cidr_ip: 0.0.0.0/0
    - proto: tcp
      from_port: 443
      to_port: 443
      cidr_ip: 0.0.0.0/0
    - proto: tcp
      from_port: 22
      to_port: 22
      cidr_ip: 0.0.0.0/0
  register: natsg
create_vpc.yml (continued):
- name: Create nat server
  local_action:
    module: ec2
    key_name: "{{appname}}"
    group_id: "{{natsg.group_id}}"
    instance_type: m1.small
    image: "{{nat_amiid}}"
    source_dest_check: False
    instance_tags:
       Name: NAT
    wait: yes
    vpc_subnet_id: "{{vpc['subnets'][0]['id']}}"
    region: "{{region}}"
    assign_public_ip: yes
  register: nat
create_vpc.yml (continued):
- name: Recreate VPC with NAT instance routing
  local_action:
    module: ec2_vpc
    state: present
    cidr_block: "{{vpc_net}}.0.0/16"
    resource_tags: { "Environment":"Gameday" }
    subnets:
    - cidr: "{{vpc_net}}.1.0/24"
      az: "{{region}}a"
      resource_tags: { "Environment":"Gameday", "Tier" : "Nat" }
    - cidr: "{{vpc_net}}.2.0/24"
      az: "{{region}}a"
      resource_tags: { "Environment":"Gameday", "Tier" : "App" }
    internet_gateway: True
    route_tables:
    - subnets:
        - "{{vpc_net}}.1.0/24"
      routes:
        - dest: 0.0.0.0/0
          gw: igw
    - subnets:
        - "{{vpc_net}}.2.0/24"
      routes:
        - dest: 0.0.0.0/0
          gw: "{{nat.instance_ids[0]}}"
    region: "{{region}}"
  register: vpc
create_vpc.yml (continued):
- name: Find existing master image
  local_action:
      shell aws --region {{region}} ec2 describe-images --filters Name=name,Values={{master_image_name}} | scripts/find_image.py {{master_image_name}}
  register: existing_master_image_id
- debug: var=existing_master_image_id

- name: Wipe master image
  local_action:
    shell aws --region {{region}} ec2 deregister-image --image-id {{existing_master_image_id.stdout}}
  when: existing_master_image_id.stdout|length > 0
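
scripts/find_image.py is not shown in the slides. The sketch below is a guess at what such a filter might look like, not the team's actual script: it assumes the JSON that `aws ec2 describe-images` prints (an `Images` list whose entries carry `Name` and `ImageId`) arrives on stdin, and it prints nothing when no image matches, so the `stdout|length > 0` condition above stays false. The function name `find_image_id` is hypothetical.

```python
import json
import sys


def find_image_id(describe_images_json, name):
    """Return the ImageId of the first image called `name`, or '' if none."""
    if not describe_images_json.strip():
        return ""  # the CLI printed nothing, e.g. because the call failed
    images = json.loads(describe_images_json).get("Images", [])
    for image in images:
        if image.get("Name") == name:
            return image["ImageId"]
    return ""


# Command-line use: aws ... describe-images ... | find_image.py master-image
if __name__ == "__main__" and len(sys.argv) > 1:
    print(find_image_id(sys.stdin.read(), sys.argv[1]), end="")
```

Printing without a trailing newline keeps `existing_master_image_id.stdout` empty when there is no match.
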
create_vpc.yml (continued):
- name: Create Master Instance
  register: master_instance
  local_action:
    module: ec2
    key_name: "{{appname}}"
    instance_type: "{{ instance_type }}"
    image: "{{ image }}"
    wait: yes
    user_data: "{{ lookup('file', 'var_files/master_image_userdata.txt') }}"
    group: "{{ security_group }}"
    instance_tags:
      Name: Master
    count: 1
    vpc_subnet_id: "{{vpc['subnets'][0]['id']}}"
    region: "{{ region }}"
- debug: var=master_instance.instances
var_files/master_image_userdata.txt:
#!/bin/sh

# Install ImageMagick, the AWS SDK for Python, and create a directory
yum install -y ImageMagick
easy_install argparse
mkdir /home/ec2-user/jobs

# Download and install the batch processing script
# The following command must be on a single line
wget -O /home/ec2-user/image_processor.py https://us-west-2-aws-training.s3.amazonaws.com/architecting-lab-3-creating-a-batch-processing-cluster-3.1/static/image_processor.py
create_vpc.yml (continued):
- name: Create Master AMI
  register: master_image
  local_action:
    module: ec2_ami
    description: "master worker image"
    instance_id: "{{ master_instance.instances[0].id }}"
    wait: yes
    name: "{{master_image_name}}"
    region: "{{ region }}"
- debug: var=master_image

- name: Terminate master instance
  local_action:
      shell aws --region {{region}} ec2 terminate-instances --instance-ids {{master_instance.instances[0].id}}
create_vpc.yml (continued):
- name: autoscaling cloudformation
  cloudformation:
    stack_name="autoscaling-cloudformation"
    state=present
    region="{{region}}"
    disable_rollback=false
    template=../cloudformation/autoscaling.json
  args:
    template_parameters:
      subnets: "{{vpc['subnets'][1]['id']}}"
      ami: "{{master_image.image_id}}"
      securitygroup: "{{batchprocessingsg.group_id}}"
  tags:
  - cloudform
../cloudformation/autoscaling.json:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "autoscaling",
  "Parameters": {
    "subnets": {
      "Description": "ids of the subnets to launch instances in",
      "Type": "CommaDelimitedList"
    },
    "ami": {
      "Description": "the ami id",
      "Type": "String",
      "MinLength": "1",
      "MaxLength": "255"
    },
    "securitygroup": {
      "Description": "security group id",
      "Type": "String",
      "MinLength": "1",
      "MaxLength": "255"
    }
  },
../cloudformation/autoscaling.json:
"Resources": {
  "iamrole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
      "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "ec2.amazonaws.com"
              ]
            },
            "Action": [
              "sts:AssumeRole"
            ]
          }
        ]
      },
      "Path": "/",

../cloudformation/autoscaling.json:
    "Policies": [
      {
        "PolicyName": "root",
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "Stmt1415740964000",
              "Effect": "Allow",
              "Action": [
                "sqs:*"
              ],
              "Resource": [
                "*"
              ]
            },
            {
              "Sid": "Stmt1415740989000",
              "Effect": "Allow",
              "Action": [
                "s3:*"
              ],
              "Resource": [
                "*"
              ]
            }
          ]
        }
      }
    ]
  }
},
../cloudformation/autoscaling.json:
"iamprofile": {
  "Type": "AWS::IAM::InstanceProfile",
  "Properties": {
    "Path": "/",
    "Roles": [
      {
        "Ref": "iamrole"
      }
    ]
  }
},
../cloudformation/autoscaling.json:
"launchconfig": {
  "Type": "AWS::AutoScaling::LaunchConfiguration",
  "Properties": {
    "EbsOptimized": false,
    "InstanceMonitoring": true,
    "ImageId": {
      "Ref": "ami"
    },
    "KeyName": "imageprocessor2",
    "SecurityGroups": [
      {
        "Ref": "securitygroup"
      }
    ],
    "AssociatePublicIpAddress": false,
    "IamInstanceProfile": {"Ref": "iamprofile"},
    "InstanceType": "t2.micro",
    "UserData": {
      "Fn::Base64": {
        "Fn::Join": [
          "",
          [
            "#!/bin/sh\n",
            "/usr/bin/python /home/ec2-user/image_processor.py &\n"
          ]
        ]
      }
    }
  }
},
../cloudformation/autoscaling.json:
"autoscalinggroup": {
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    "Cooldown": "10",
    "DesiredCapacity": 1,
    "MaxSize": 4,
    "MinSize": 1,
    "LaunchConfigurationName": {
      "Ref": "launchconfig"
    },
    "VPCZoneIdentifier": {
      "Ref": "subnets"
    },
    "AvailabilityZones": [
      "ap-northeast-1a"
    ]
  }
},
../cloudformation/autoscaling.json:
"addpolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": {
      "Ref": "autoscalinggroup"
    },
    "ScalingAdjustment": "1"
  }
},
"removepolicy": {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": {
      "Ref": "autoscalinggroup"
    },
    "ScalingAdjustment": "-1"
  }
},
../cloudformation/autoscaling.json:
    "cloudwatchalarm": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "ActionsEnabled": "true",
        "AlarmActions": [
          {
            "Ref": "addpolicy"
          }
        ],
        "AlarmDescription": "queue is too long",
        "AlarmName": "long-queue",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Dimensions": [
          {
            "Name": "QueueName",
            "Value": "input"
          }
        ],
        "EvaluationPeriods": 1,
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Namespace": "AWS/SQS",
        "OKActions": [
          {
            "Ref": "removepolicy"
          }
        ],
        "Period": 60,
        "Statistic": "Average",
        "Threshold": 10
      }
    }
  }
}
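
How the alarm and the two scaling policies interact can be sketched as a small simulation. This is an illustration, not how CloudWatch or Auto Scaling is implemented. It assumes one queue-depth datapoint per 60-second period and that the matching policy is applied once per period while the alarm stays in a state, and it ignores the cooldown. The function name `simulate` is hypothetical; the default values are taken from the template.

```python
def simulate(queue_depths, threshold=10, min_size=1, max_size=4, capacity=1):
    """Return the group capacity after each period of queue-depth readings."""
    history = []
    for depth in queue_depths:
        # ALARM (depth >= threshold) runs addpolicy (+1); OK runs removepolicy (-1).
        step = 1 if depth >= threshold else -1
        # The group never leaves the MinSize..MaxSize range.
        capacity = max(min_size, min(max_size, capacity + step))
        history.append(capacity)
    return history


print(simulate([0, 12, 30, 45, 50, 9, 3, 0, 0]))
# prints [1, 2, 3, 4, 4, 3, 2, 1, 1]
```

The group grows by one instance per period while at least 10 messages are visible, stops at MaxSize 4, and shrinks back to MinSize 1 once the queue drains.
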

What did we learn?

  • Ansible is not very idempotent when it comes to AWS resources
  • Productivity in Ansible and CloudFormation is comparable
  • A novice can learn to build something meaningful in Ansible in a couple of hours
  • Scripted infrastructure takes time to test!
  • Expect to spend 3-4x the time compared with using the console
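
The first point above is about reruns that create resources again instead of converging on the existing ones. A minimal sketch of the find-or-create pattern that an idempotent task relies on, using an in-memory stand-in for the cloud API; `FakeCloud` and `ensure_instance` are hypothetical names and no real AWS call is made.

```python
class FakeCloud:
    """In-memory stand-in for an API that launches and lists instances."""

    def __init__(self):
        self.instances = []

    def find_by_name(self, name):
        return [inst for inst in self.instances if inst["Name"] == name]

    def launch(self, name):
        instance = {"id": f"i-{len(self.instances):08x}", "Name": name}
        self.instances.append(instance)
        return instance


def ensure_instance(cloud, name):
    """Return (instance, changed); launch only when no instance has that name."""
    existing = cloud.find_by_name(name)
    if existing:
        return existing[0], False
    return cloud.launch(name), True


cloud = FakeCloud()
first, changed_first = ensure_instance(cloud, "NAT")
second, changed_second = ensure_instance(cloud, "NAT")
print(changed_first, changed_second, len(cloud.instances))
# prints True False 1
```

A task that calls `launch` directly would leave two NAT instances after two runs, which is the kind of behavior that makes scripted infrastructure slow to test.
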

Phase 2: Break

  • Security by obscurity
  • A mess? What is even running?
  • ...so we resorted to brain-dead destruction

Phase 3: Fix

  1. Generate new access keys
  2. Run the Ansible script
  3. At the same time, delete everything the breakers had left behind

Fame & Glory!

  • Most evil/best hack - traffic-shaping NAT

  • Best story - Don't commit AWS access keys to public GitHub repos

  • Fastest recovery - Team HJA

More than one way to do it?

Troposphere:
Python + CloudFormation - JSON
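
The idea behind that formula is to build the template from Python objects and serialise it at the end. It can be sketched with the standard library alone. This does not use Troposphere's own API; `scaling_policy` is a hypothetical helper, and the properties mirror the two scaling policies in autoscaling.json. The result is a fragment: the `autoscalinggroup` resource it refers to is left out.

```python
import json


def scaling_policy(adjustment):
    """Build one AWS::AutoScaling::ScalingPolicy resource as a plain dict."""
    return {
        "Type": "AWS::AutoScaling::ScalingPolicy",
        "Properties": {
            "AdjustmentType": "ChangeInCapacity",
            "AutoScalingGroupName": {"Ref": "autoscalinggroup"},
            "ScalingAdjustment": str(adjustment),
        },
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "autoscaling",
    "Resources": {
        "addpolicy": scaling_policy(1),
        "removepolicy": scaling_policy(-1),
    },
}

print(json.dumps(template, indent=2))
```

The two near-identical JSON blocks collapse into one function call each, which is the main gain of generating the template from code.
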

Game Day Exercises

Links