Monday, June 27, 2016

git flow with bamboo and docker

git flow is a well-known branching model for git repositories. The challenge, however, is making it work with the build, release, and deployment process. This post describes one simple recipe for implementing the end-to-end (E2E) process. The gist of the recipe is to use the software version of the package as the docker image tag.

Git flow branching model

The original post on the git flow branching model describes the model in detail. I also found this link very useful; it provides a good summary of git flow along with the commands for implementing the branching model.

If you are using the Atlassian suite of products, it is best to name branches after JIRA tickets for better integration and traceability.
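The feature cycle of the branching model can be sketched with plain git commands as below, using a branch named after a hypothetical JIRA ticket (JIRA-123); the throwaway repository exists only to make the example self-contained.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "build@example.com"
git config user.name "Build Agent"
git commit --allow-empty -q -m "initial commit"             # on the default branch
git checkout -q -b develop                                  # long-lived integration branch
git checkout -q -b feature/JIRA-123 develop                 # short-lived feature branch
git commit --allow-empty -q -m "JIRA-123: implement feature"
git checkout -q develop
git merge -q --no-ff -m "Finish JIRA-123" feature/JIRA-123  # merge back with a merge commit
git branch -d feature/JIRA-123                              # delete the finished feature branch
```

Naming the branch feature/JIRA-123 is what lets Bamboo and JIRA link the build back to the ticket.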

Bamboo build process

For every repository, create the following three plans:

CI and CD Plan

This plan builds from the develop branch and creates a docker image tagged "latest". The bamboo plan can deploy the image automatically to the CD environment. In addition, the QualDev (QA) team can request deployment to the QualDev environment.

Release Plan

This plan builds from master and release*/hotfix* branches. The docker images are tagged with the software version (the npm package version or maven version). Deployments of images from this build are typically on demand.

Feature Plan

This plan builds from feature* branches and doesn't generate any docker image. It is primarily for running unit and integration tests.


Bamboo plan and Docker Image

Following is a sample job in the bamboo plan that creates a docker image and pushes it to AWS ECR. It is based on a nodejs project. The project source includes a build.json file with placeholders for the build key and build number. The dockerfile replaces them with the values passed in the --build-arg parameters to the docker build command. build.json, along with the npm package version, provides the complete context of the build currently deployed in a given environment.

#!/bin/bash
# Configure aws (access key, secret key, then blank lines to accept defaults)
echo $bamboo_AWS_AKEY > 1.txt
echo $bamboo_AWS_SKEY >> 1.txt
echo "" >> 1.txt
echo "" >> 1.txt
aws configure < 1.txt
rm -f 1.txt  # don't leave credentials on disk
# Login to AWS ECR
LOGIN_STRING=`aws ecr get-login --region us-east-1`
${LOGIN_STRING}
PRODUCT=
COMPONENT=
# Extract the version from package.json
PACKAGE_VERSION=$(cat package.json | grep version | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g' | tr -d '[[:space:]]')
# The package version is the docker image tag
TAG=$PACKAGE_VERSION
BUILDKEY=$bamboo_buildKey
BUILDNUMBER=$bamboo_buildNumber
REPOURL=
# Build and push docker image
docker build --build-arg BUILD_KEY=$BUILDKEY --build-arg BUILD_NUMBER=$BUILDNUMBER -t $PRODUCT/$COMPONENT:$TAG -f dockerbuild/Dockerfile --no-cache=true .
docker tag $PRODUCT/$COMPONENT:$TAG $REPOURL/$PRODUCT/$COMPONENT:$TAG
docker push $REPOURL/$PRODUCT/$COMPONENT:$TAG
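The grep/awk/sed pipeline above works, but since node is already on the build agent, a more robust alternative (a design choice, not what the plan above used) is to let node parse package.json. The sample package.json content below is illustrative.

```shell
cd "$(mktemp -d)"
# A sample package.json to demonstrate the extraction
cat > package.json <<'EOF'
{
  "name": "sample-service",
  "version": "1.2.3"
}
EOF
# Let node parse the JSON instead of scraping it with grep/awk/sed
PACKAGE_VERSION=$(node -p "require('./package.json').version")
echo "$PACKAGE_VERSION"
```

This avoids breaking when package.json contains other keys named "version" (for example inside a dependency spec) or is reformatted.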

The following command in the dockerfile updates build.json:
# Update build key and number
RUN sed -i -- "s/BUILDKEY/$BUILD_KEY/g; s/BUILDNUMBER/$BUILD_NUMBER/g" ./build.json
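One detail that is easy to miss: the --build-arg values are only visible inside the build if the dockerfile declares matching ARGs. A minimal sketch follows; the base image and file layout are illustrative, not from the actual project.

```dockerfile
FROM node:6
# Declare the args so the --build-arg values are visible to RUN commands
ARG BUILD_KEY
ARG BUILD_NUMBER
WORKDIR /app
COPY . .
# Update build key and number placeholders in build.json
RUN sed -i -- "s/BUILDKEY/$BUILD_KEY/g; s/BUILDNUMBER/$BUILD_NUMBER/g" ./build.json
```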

Further, an API like the following (a hapi route in this project) can make the details of the running service available to internal users:
    const build = require('./build.json');

    {
      method: 'GET',
      path: '/about',
      config: {
        handler: function (request, reply) {
          // conf is the application's configuration module
          var about = {
            "name": process.env.npm_package_name,
            "version": process.env.npm_package_version,
            "buildKey": build.buildKey,
            "buildNumber": build.buildNumber,
            "config": conf.getProperties()
          };
          return reply(about);
        }
      }
    }


Friday, April 22, 2016

Polymer 1.0 Vulcanize Error

I recently started getting the following error when running "gulp" on the polymer starter kit:

Starting 'vulcanize'...
ERROR finding /starter-kit/app/elements/bower_components/bower_components/promise-polyfill/Promise.js
ERROR finding /starter-kit/app/elements/bower_components/whenever.js/whenever.js
ERROR finding /starter-kit/app/elements/bower_components/bower_components/bower_components/bower_components/bower_components/web-animations-js/web-animations-next-lite.min.js
ERROR finding starter-kit/app/elements/bower_components/paper-datatable/weakCache.js
ERROR finding starter-kit/app/elements/weakCache.js
:starter-kit$

I spent multiple hours searching Google but couldn't find a concrete answer. The same code and command worked on my colleagues' machines.

I figured out that one difference between their machines and mine was the npm repository: they were using the external repo while I was using our internal repo.

The Polymer/vulcanize team had released 1.14.9, which had a critical bug (https://github.com/Polymer/vulcanize/issues/332). As soon as they found it, they unpublished version 1.14.9. However, before they could unpublish, our internal repo had cached it.

To resolve this, I had to manually downgrade to 1.14.8, which I did by changing the repo path to the public npm repo.
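One guard against this class of problem is to pin exact tool versions rather than ranges, so a bad release that sneaks into a caching mirror can't be picked up silently. A hypothetical devDependencies fragment:

```json
{
  "devDependencies": {
    "vulcanize": "1.14.8"
  }
}
```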


Saturday, April 2, 2016

SSL Error - SSL_VERIFY_CERT_CHAIN:certificate verify failed

Background


Recently (last week), we installed a new SSL certificate on the Tomcat instances in production. The process involved:


  1. Create a new Java Keystore
  2. Generate a new CSR
  3. Obtain the certificate for our domain along with certificate chain
  4. Import the certificate with the certificate chain in the keystore
  5. Update Tomcat server.xml to point to new keystore
  6. Restart Tomcat process


The Tomcat instance hosts a SOAP WebService. The verification steps involved:


  1. Checking the certificate details in multiple browsers
  2. Verifying SOAP API invocation using SOAP-UI tool


The verification was successful and we applied the change in production.


Issue

Within a few hours, a couple of customers reported that they were not able to access the API. One customer shared the error log:


Caused by: javax.xml.ws.soap.SOAPFaultException: nested fault: SSL protocol error
error:140CF086:SSL routines:SSL_VERIFY_CERT_CHAIN:certificate verify failed
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
at com.sun.xml.ws.fault.SOAP11Fault.getProtocolException(SOAP11Fault.java:189)
at com.sun.xml.ws.fault.SOAPFaultBuilder.createException(SOAPFaultBuilder.java:122)
at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:119)
at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)


The first reaction was that we had messed up something in the deployment. Without reviewing
the error and understanding the root cause, the decision was made to restore the service,
and the change was rolled back. Since the old certificate was valid for a few more weeks,
it was a good decision.


Analysis



Later in the day I analyzed the error and concluded that there was no issue with the 
certificate or the deployment. Following is the new certificate chain when viewed in the 
browser



Following is the old certificate chain when viewed in the browser


While the root CA is the same, the intermediate CA changed from "Verisign Class 3 Secure Server CA - G3" to "Symantec Class 3 Secure Server CA - G4". This change happened because the new certificate we requested was SHA2: the Verisign class 3 certificate is SHA1, whereas the Symantec class 3 certificate is SHA2. Symantec issued new intermediate CA certificates after the Verisign acquisition in 2010.


Clients that don't have the Symantec class 3 intermediate certificate in their truststore will fail with the error SSL_VERIFY_CERT_CHAIN.

Resolution

To overcome this error, customers must import the intermediate certificate from the following link into their truststore:
https://knowledge.symantec.com/support/ssl-certificates-support/index?page=content&actp=CROSSLINK&id=INFO2045

The following page has instructions for installing the certificate on various platforms:
https://knowledge.symantec.com/support/ssl-certificates-support/index?page=content&id=INFO212

Conclusion

  1. When installing new certificates, notify customers in advance (a few weeks). Do this even if the change is limited to just an extension of the expiry date or a domain name change.
  2. Any change in the hashing algorithm, i.e. SHA1 to SHA2 or SHA2 to SHA3, should be announced well in advance to all customers. Different browsers have different timelines for migrating from SHA1 to SHA2. The biggest risk of such seemingly minor changes is to API integrations.
  3. Observe the certificate chain carefully. Just seeing the green page icon in the browser bar is not sufficient. Share the chain with customers if it is different from the existing certificate chain.

Update - 04/20/2016


I missed one important part in my analysis. I verified the certificate chain using the browser but never bothered to look at the chain in the keystore. It turns out the keystore didn't have the full certificate chain, and that caused clients to fail. If clients had had the intermediate certificate in their truststore, it would not have mattered. So the fix on our side was to import the root CA.


Update - 04/20/2016


Today we went through another issue related to the SHA1 to SHA2 update. One of our key customers was not prepared, and after the update they were not able to access our services. The client software was running on a Windows 2003 server that was never patched and lacked support for SHA2. They were seeing the following error while connecting to our service:

The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.

While this key customer was trying to figure out how to patch their system (which is not easy), we put together a workaround for them so that they could continue to use the service. Here is what we did:
  1. Asked the customer to use the non-secure port. Since the customer connects to our APIs over VPN, it was okay to use the non-secure port. However, this was not possible because the URL was hardcoded in their code and nobody knew where the source was or how to build it. So we went to option #2.
  2. We set up a new server:
    1. Installed the required software (Java + Tomcat + WAR file)
    2. Created a new self-signed SHA1 certificate for the domain
    3. Configured tomcat to use the new keystore and self-signed certificate
    4. Shared the certificate with the customer to import into their truststore
    5. Asked the customer to update /etc/hosts (or the Windows equivalent) on their machine to point the domain name to the IP of this new server. This avoided the need to change the hardcoded URL in the code.
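The certificate part of the workaround (steps 2.2 and 2.4) can be sketched with openssl as follows. The domain, file names, and password are illustrative; the real setup used a Java keystore, so the PKCS12 bundle would still need to be wired into Tomcat's server.xml.

```shell
cd "$(mktemp -d)"
# 2.2: create a self-signed SHA1 certificate for the (hypothetical) domain
openssl req -x509 -sha1 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=api.example.com" -keyout api.key -out api.crt
# Bundle key + certificate into a PKCS12 keystore that Tomcat can load
openssl pkcs12 -export -in api.crt -inkey api.key \
  -name tomcat -passout pass:changeit -out api.p12
# 2.4: api.crt is the file the customer imports into their truststore;
# confirm the signature algorithm is SHA1, which the unpatched client supports
openssl x509 -in api.crt -noout -text | grep "Signature Algorithm"
```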
The following links were especially useful when troubleshooting and recommending a solution to the customer for patching their Windows 2003 server: