Monday, June 27, 2016

git flow with bamboo and docker

git flow is a well-known branching model for git repositories. The challenge, however, is making it work with the build, release and deployment process. This post describes one simple recipe for implementing the end-to-end (E2E) process. The gist of the recipe is to use the software version of the package as the Docker image tag.

Git flow branching model

The original post on the git flow branching model describes the model in detail. I also found very useful a link that provides a good summary of git flow along with the commands for implementing the branching model.

If you are using the Atlassian suite of products, it is best to name branches after JIRA tickets for better integration and traceability.
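For reference, with the git-flow command-line extension installed, the day-to-day flow looks roughly like this (the JIRA ticket IDs and version numbers are placeholders):

# one-time setup: designates master and develop as the long-lived branches
git flow init
# feature work branches off develop; name the branch after the JIRA ticket
git flow feature start PROJ-123
git flow feature finish PROJ-123     # merges back into develop
# cutting a release from develop
git flow release start 1.2.0
git flow release finish 1.2.0        # merges into master and develop, and tags 1.2.0
# emergency fix branched off master
git flow hotfix start 1.2.1
git flow hotfix finish 1.2.1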

Bamboo build process

For every repository, create three plans as follows:

CI and CD Plan

This plan builds from the develop branch and creates a Docker image tagged "latest". The Bamboo plan can deploy the image automatically to the CD environment. In addition, the QualDev (QA) team can request deployment to the QualDev environment.

Release Plan

This plan builds from master and the release*/hotfix* branches. The Docker images are tagged with the package version (npm package version or Maven version). Deployments of images from this build are typically on demand.

Feature Plan

This plan builds from the feature* branches. It doesn't generate a Docker image; it is primarily for running unit and integration tests.
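In Bamboo, plan branches can be created automatically for branches that match a regular expression. Assuming the default git flow branch prefixes, patterns along these lines would route branches to the right plans (illustrative only):

# Release Plan - match release and hotfix branches
(release|hotfix)/.*
# Feature Plan - match feature branches
feature/.*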


Bamboo plan and Docker Image

Following is a sample job in the Bamboo plan that creates the Docker image and pushes it to AWS ECR. This is based on a Node.js project. The project source includes a build.json file with placeholders for the build key and build number. The Dockerfile replaces them with the values passed in the --build-arg parameters to the docker build command. build.json, along with the npm package version, provides the complete context of the build currently deployed in a given environment.

#!/bin/bash
# Configure the AWS CLI by feeding the access key, secret key and two blank
# answers (region and output format) to "aws configure"
echo $bamboo_AWS_AKEY > 1.txt
echo $bamboo_AWS_SKEY >> 1.txt
echo "" >> 1.txt
echo "" >> 1.txt
aws configure < 1.txt
# Login to AWS ECR ("aws ecr get-login" returns a ready-to-run "docker login" command)
LOGIN_STRING=`aws ecr get-login --region us-east-1`
${LOGIN_STRING}
PRODUCT=        # set to the product name
COMPONENT=      # set to the component/service name
# Extract the version from package.json
PACKAGE_VERSION=$(cat package.json | grep version | head -1 | awk -F: '{ print $2 }' | sed 's/[",]//g' | tr -d '[[:space:]]')
TAG=            # e.g. $PACKAGE_VERSION for release builds, "latest" for CI builds
# Bamboo build variables (also available to scripts as $bamboo_buildKey / $bamboo_buildNumber)
BUILDKEY=${bamboo.buildKey}
BUILDNUMBER=${bamboo.buildNumber}
REPOURL=        # set to the ECR registry URL
# Build and push the docker image
docker build --build-arg BUILD_KEY=$BUILDKEY --build-arg BUILD_NUMBER=$BUILDNUMBER -t $PRODUCT/$COMPONENT:$TAG -f dockerbuild/Dockerfile --no-cache=true .
docker tag $PRODUCT/$COMPONENT:$TAG $REPOURL/$PRODUCT/$COMPONENT:$TAG
docker push $REPOURL/$PRODUCT/$COMPONENT:$TAG

The following instructions in the Dockerfile accept the build args and update build.json (the ARG declarations are required for the values passed via --build-arg to be visible to the RUN step):
# Accept the build args and update the build key and number placeholders
ARG BUILD_KEY
ARG BUILD_NUMBER
RUN sed -i -- "s/BUILDKEY/$BUILD_KEY/g; s/BUILDNUMBER/$BUILD_NUMBER/g" ./build.json

Further, an API like the following can expose the details of the running service to internal users. The route below is written in hapi.js style; the conf object is assumed to be the application's configuration object.
    // build.json carries the build key and number injected at image build time
    const build = require('./build.json');

    {
      method: 'GET',
      path: '/about',
      config: {
        handler: function (request, reply) {
          // assemble name/version from the npm environment variables plus the build context
          var about = {
            "name": process.env.npm_package_name,
            "version": process.env.npm_package_version,
            "buildKey":  build.buildKey,
            "buildNumber": build.buildNumber,
            "config": conf.getProperties()
          };
          return reply(about);
        }
      }
    }
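Once the service is up, the endpoint can be checked with a simple request (host and port are placeholders):

curl http://localhost:8000/about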


Friday, April 22, 2016

Polymer 1.0 Vulcanize Error

I recently started getting the following error when running "gulp" on the Polymer Starter Kit:

Starting 'vulcanize'...
ERROR finding /starter-kit/app/elements/bower_components/bower_components/promise-polyfill/Promise.js
ERROR finding /starter-kit/app/elements/bower_components/whenever.js/whenever.js
ERROR finding /starter-kit/app/elements/bower_components/bower_components/bower_components/bower_components/bower_components/web-animations-js/web-animations-next-lite.min.js
ERROR finding starter-kit/app/elements/bower_components/paper-datatable/weakCache.js
ERROR finding starter-kit/app/elements/weakCache.js
:starter-kit$

I spent multiple hours searching on Google but couldn't find a concrete answer. The same code and command worked on my colleagues' machines.

I figured out that one difference between their machines and mine was the npm repository. They were using the external (public) repo while I was using our internal repo.

The Polymer/vulcanize team had released 1.14.9, which had a critical bug (https://github.com/Polymer/vulcanize/issues/332). As soon as they found it, they unpublished version 1.14.9. However, before they could unpublish, our internal repo had already cached it.

To resolve this I had to manually downgrade to 1.14.8, which I did by changing the repo path to the public npm repo.
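In practice the fix was something along these lines (assuming vulcanize lives in devDependencies; the registry flag points npm at the public registry instead of the internal mirror):

npm install --save-dev vulcanize@1.14.8 --registry https://registry.npmjs.org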


Saturday, April 2, 2016

SSL Error - SSL_VERIFY_CERT_CHAIN:certificate verify failed

Background


Recently (last week), we installed a new SSL certificate on the Tomcat instances in production. The process involved the following steps (a rough keytool sketch is shown after the list):


  1. Create a new Java Keystore
  2. Generate a new CSR
  3. Obtain the certificate for our domain along with certificate chain
  4. Import the certificate with the certificate chain in the keystore
  5. Update Tomcat server.xml to point to new keystore
  6. Restart Tomcat process
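
For reference, steps 1 to 4 look roughly like the following with keytool (the keystore name, alias and file names are illustrative, not the actual values used):

# 1. Create a new Java keystore with a fresh key pair
keytool -genkeypair -alias tomcat -keyalg RSA -keysize 2048 -keystore newkeystore.jks
# 2. Generate a new CSR to submit to the CA
keytool -certreq -alias tomcat -file mydomain.csr -keystore newkeystore.jks
# 4. Import the certificate chain first, then the domain certificate
keytool -importcert -trustcacerts -alias intermediate -file intermediate.crt -keystore newkeystore.jks
keytool -importcert -alias tomcat -file mydomain.crt -keystore newkeystore.jks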


The Tomcat instance hosts a SOAP web service. The verification steps involved:


  1. Checking the certificate details in multiple browsers
  2. Verifying SOAP API invocation using SOAP-UI tool


The verification was successful and we applied the change in production.


Issue

Within a few hours, a couple of customers reported that they were not able to access the API. One customer shared the error log:


Caused by: javax.xml.ws.soap.SOAPFaultException: nested fault: SSL protocol error
error:140CF086:SSL routines:SSL_VERIFY_CERT_CHAIN:certificate verify failed
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
at com.sun.xml.ws.fault.SOAP11Fault.getProtocolException(SOAP11Fault.java:189)
at com.sun.xml.ws.fault.SOAPFaultBuilder.createException(SOAPFaultBuilder.java:122)
at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:119)
at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)


The first reaction was that we had messed up something in the deployment. Without reviewing the error or understanding the root cause, the decision was to restore the service, and the change was rolled back. Since the old certificate was valid for a few more weeks, it was a good decision.


Analysis



Later in the day I analyzed the error and concluded that there was no issue with the certificate or the deployment. Comparing the new certificate chain with the old one (both viewed in the browser) showed that while the root CA is the same, the intermediate CA changed from "Verisign Class 3 Secure Server CA - G3" to "Symantec Class 3 Secure Server CA - G4". This change happened because the new certificate we requested was SHA2: the Verisign Class 3 intermediate certificate is SHA1, whereas the Symantec Class 3 intermediate certificate is SHA2. Symantec issued new intermediate CA certificates after the Verisign acquisition in 2010.


Clients that don't have the Symantec Class 3 intermediate certificate in their truststore will fail with the error SSL_VERIFY_CERT_CHAIN:certificate verify failed.

Resolution

To overcome this error, customers must import the intermediate certificate from the following link into their truststore:
https://knowledge.symantec.com/support/ssl-certificates-support/index?page=content&actp=CROSSLINK&id=INFO2045

The following page has instructions for installing the certificate on various platforms:
https://knowledge.symantec.com/support/ssl-certificates-support/index?page=content&id=INFO212
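
For Java-based clients, importing the downloaded intermediate certificate into the truststore looks roughly like this (the alias, file name and truststore path are placeholders):

keytool -importcert -trustcacerts -alias symantec-class3-g4 -file symantec_class3_g4.crt -keystore /path/to/client-truststore.jks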

Conclusion

  1. When installing new certificates, notify customers in advance (a few weeks ahead). Do this even if the change is limited to just an extension of the expiry date or a domain name change.
  2. Any change in the hashing algorithm, i.e. SHA1 to SHA2 or SHA2 to SHA3, should be announced well in advance to all customers. Different browsers have different timelines when it comes to migrating from SHA1 to SHA2. The biggest risk of such seemingly minor changes is to API integrations.
  3. Observe the certificate chain carefully. Just seeing the green padlock in the browser bar is not sufficient. Share the chain with customers if it is different from the existing certificate chain.

Update - 04/20/2016


I missed one important part in my analysis. I verified the certificate chain using the browser but never bothered to look at the chain in the keystore. It turns out the keystore didn't have the full certificate chain, and that caused clients to fail. If clients had had the intermediate certificate in their truststore, it would not have mattered. So the fix on our side was to import the root CA into the keystore so that the full chain is served.
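
A simple way to see exactly which chain the server presents, without relying on the browser (the hostname and port here are placeholders):

# prints every certificate the server sends; a missing intermediate is immediately visible
openssl s_client -connect api.example.com:443 -showcerts < /dev/null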


Update - 04/20/2016


Today we went through another issue related to the SHA1 to SHA2 update. One of our key customers was not prepared, and post-update they were not able to access our services. The client software was running on a Windows 2003 server that had never been patched and lacked support for SHA2. They were seeing the following error while connecting to our service:

The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.

While this key customer was trying to figure out how to patch their system (which is not easy), we put together a workaround so that they could continue to use the service. Here is what we did:
  1. Asked the customer to use the non-secure port. Since the customer connects to our APIs over VPN, it was acceptable to use the non-secure port. However, this turned out not to be possible because the URL was hardcoded in their code and nobody knew where the source was or how to build it. So we went to option #2.
  2. We set up a new server.
    1. Installed the required software (Java + Tomcat + the WAR file)
    2. Created a new self-signed SHA1 certificate for the domain (see the keytool sketch below)
    3. Configured Tomcat to use the new keystore and self-signed certificate
    4. Shared the certificate with the customer to import into their truststore
    5. Asked the customer to update /etc/hosts (or the Windows equivalent) on their machine to point the domain name to the IP of this new server. This avoided the need to change the hardcoded URL in the code.
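Step 2.2 above can be done with keytool roughly as follows (the domain, validity and file names are illustrative):

# generate a self-signed certificate using SHA1 so the unpatched Windows 2003 client can validate it
keytool -genkeypair -alias tomcat -keyalg RSA -keysize 2048 -sigalg SHA1withRSA -dname "CN=api.example.com" -validity 365 -keystore legacy-keystore.jks
# export the certificate so the customer can import it into their truststore
keytool -exportcert -alias tomcat -file legacy-cert.crt -keystore legacy-keystore.jks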
The following links were especially useful when troubleshooting and recommending a solution for the customer to patch their Windows 2003 server:



Friday, April 1, 2011

Cassandra 0.7.x - Understanding the output of nodetool cfhistograms


Command - Usage and Output
Cassandra provides the nodetool cfhistograms command to print statistic histograms for a given column family. Following is the usage:
./nodetool -h <host> -p <port> cfhistograms <keyspace> <column family>

The output of the command has the following 6 columns:
  • Offset
  • SSTables
  • Write Latency
  • Read Latency
  • Row Size
  • Column Count

Interpreting the output
  • Offset: This represents the series of values to which the counts in the other 5 columns correspond. It corresponds to the X-axis values of the histograms. The unit depends on the column being read.
  • SSTables: This represents the number of SSTables accessed per read. For example, if a read operation involved accessing 3 SSTables, you will find a positive value against offset 3. The values are recent, i.e. they cover the duration elapsed between two calls.
  • Write Latency: This shows the distribution of the number of operations across the range of offset values, where the offset represents latency in microseconds. For example, if 100 operations each took about 5 microseconds, you will find a positive value against offset 5 (or the nearest bucket).
  • Read Latency: This is similar to write latency. The values are recent, i.e. they cover the duration elapsed between two calls.
  • Row Size: This shows the distribution of rows across the range of offset values, where the offset represents size in bytes. For example, if you have 100 rows of size 2000 bytes, you will find a positive value against offset 2000.
  • Column Count: This is similar to row size. The offset values represent the column count.

Some additional details
  • Typically in a histogram the values are plotted over discrete intervals. Similarly, Cassandra defines buckets. The number of buckets is one more than the number of bucket offsets; the last bucket holds values greater than the last offset. The values you see in the Offset column of the output are the bucket offsets.
  • The bucket offsets start at 1 and grow by a factor of 1.2 each time (rounding and removing duplicates). They go from 1 to around 36M by default (creating 90+1 buckets), which gives timing resolution from microseconds to about 36 seconds, with less precision as the numbers get larger (see the EstimatedHistogram class, and the sketch below).
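
A rough sketch of how those default bucket offsets could be generated (this mirrors the description above rather than the actual EstimatedHistogram code; the bump-by-one on rounding collisions is an assumption):

# print the default bucket offsets: start at 1 and grow by ~1.2x each step
awk 'BEGIN { last = 1; print last;
             for (i = 1; i < 90; i++) {
                 nxt = int(last * 1.2 + 0.5);      # round to the nearest integer
                 if (nxt == last) nxt = last + 1;  # avoid duplicate offsets
                 print nxt;
                 last = nxt
             } }'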





Friday, March 11, 2011

Schema Management in Cassandra 0.7

Schema Management in Cassandra

Starting with Cassandra 0.7, schema management in Cassandra is very easy. It is as good as centralized schema management, but with no SPoF. Typical schema operations involve loading the schema initially, making changes to the existing schema (like adding a CF or modifying existing CF attributes), and dropping schema elements like CFs and keyspaces.

There are 3 ways these operations can be performed:

Load schema from cassandra.yaml using schematool or the JMX console: This option can be used to load the schema only once. Running it a second time in a cluster won't have any impact, so this is good for loading the initial schema.

schematool <host> <jmx port> import
OR
JConsole:MBeans->org.apache.cassandra.db->StorageService -> Operations -> loadSchemaFromYAML


Create/Modify schema using the Thrift APIs: This provides high flexibility and is good for applications that wish to create/drop keyspaces and column families on the fly. You cannot modify existing column families using these APIs. Refer to Cassandra Wiki - API for details of the available APIs. The following APIs are available:
  • describe_keyspace
  • describe_keyspaces
  • system_add_column_family
  • system_drop_column_family
  • system_add_keyspace
  • system_drop_keyspace

Create/Modify schema using cassandra-cli: This is the most flexible option available. It allows practically everything that options #1 and #2 allow collectively. The following commands are supported (a sample session follows the list). You can see the commands by entering the "help;" command in cassandra-cli. For details of a specific command, type "help <command>;", e.g. "help create keyspace;".
  • Describe keyspace
  • Show list of keyspaces
  • Add a new keyspace with the specified attribute(s) and value(s)
  • Update a keyspace with the specified attribute(s) and value(s)
  • Create a new column family with the specified attribute(s) and value(s)
  • Update a column family with the specified attribute(s) and value(s)
  • Delete a keyspace
  • Delete a column family
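
For example, creating a keyspace and a column family from cassandra-cli looks roughly like this on 0.7 (the names and attribute values are illustrative):

[default@unknown] connect localhost/9160;
Connected to: "TestCluster" on localhost/9160
[default@unknown] create keyspace DemoKS with replication_factor = 3;
[default@unknown] use DemoKS;
[default@DemoKS] create column family Users with comparator = UTF8Type;
[default@DemoKS] update column family Users with rows_cached = 10000;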

Under the hood

The Cassandra Wiki - Schema Updates page describes the operations in good detail. Following is a high-level summary:

  • Cassandra uses the Schema and Migrations column families in the system keyspace for maintaining the schema and the changes to it, respectively.
  • Schema changes made on one node are propagated to the other nodes in the cluster.
  • The Migrations CF tracks individual changes to the schema; the Schema CF contains a reference to the latest version in use.
  • Some manual cleanup may be needed if a node crashes while schema changes are being applied to the cluster.
  • To avoid concurrency issues, always push schema changes through one node.

Examples

Dropping a Keyspace

  • Connect to cassandra-cli on a node and run the drop keyspace command.

[root@rwc-sb6240-1 bin]# ./cassandra-cli
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] connect 20.17.221.19/9160;
Connected to: "NarenCluster072" on 20.17.221.19/9160
[default@unknown] drop keyspace KeyspaceMigration;
5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
[default@unknown] exit;
[root@rwc-sb6240-1 bin]#


  • The logs on the node will show the following events (DEBUG mode):

DEBUG [pool-1-thread-151] 2011-03-09 11:21:03,334 CassandraServer.java (line 759) drop_keyspace
DEBUG [MigrationStage:1] 2011-03-09 11:21:03,343 Table.java (line 397) applying mutation of row 35666261336631662d346138322d313165302d623865652d663930663861336635653166
...
DEBUG [CompactionExecutor:1] 2011-03-09 11:21:04,146 CompactionManager.java (line 109) Checking to see if compaction of Schema would be useful
DEBUG [MigrationStage:1] 2011-03-09 11:21:04,146 MigrationManager.java (line 106) Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
DEBUG [CompactionExecutor:1] 2011-03-09 11:21:04,147 CompactionManager.java (line 109) Checking to see if compaction of Migrations would be useful
DEBUG [ReadStage:14] 2011-03-09 11:21:04,150 MigrationManager.java (line 87) Their data definitions are old. Sending updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [ReadStage:15] 2011-03-09 11:21:04,151 MigrationManager.java (line 87) Their data definitions are old. Sending updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
...
DEBUG [pool-1-thread-151] 2011-03-09 11:21:05,629 StorageProxy.java (line 628) My version is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
DEBUG [pool-1-thread-151] 2011-03-09 11:21:05,629 StorageProxy.java (line 659) Schemas are in agreement.


  • On the other nodes the log entries will look like

DEBUG [ReadStage:9] 2011-03-09 11:12:19,250 MigrationManager.java (line 82) My data definitions are old. Asking for updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [ReadStage:9] 2011-03-09 11:12:19,253 MigrationManager.java (line 106) Announcing my schema is d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [MigrationStage:1] 2011-03-09 11:12:19,273 SchemaCheckVerbHandler.java (line 36) Received schema check request.
...
DEBUG [MigrationStage:1] 2011-03-09 11:12:20,681 MigrationManager.java (line 106) Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f

Thursday, December 17, 2009

Residential Gateway - Part 2

Since I am not very busy these days, you may see multiple posts from me in a single day :). In the last part I talked about the Residential Gateway in general. In this post I will talk about the WAN-side interface, i.e. DSL.


DSL stands for Digital Subscriber Line. It is the technology used to transmit digital content over the phone line (the very same line that is connected to your landline phone). Some of you must be wondering how that is possible. Will I be able to use my phone and the Internet simultaneously?

The technology has the answer. The phone line that we have today is under-utilized. It is used to carry only voice traffic, which is transmitted over the frequency band 300 Hz to 3400 Hz, whereas the cable is capable of carrying signals at much higher frequencies. The DSL technology makes use of the unused frequency bands to send and receive data.

What about simultaneous use? It is possible using a splitter/microfilter. A splitter is a small piece of hardware that is usually supplied by the broadband service provider. The phone line is connected to the splitter, which has two output ports: one port connects to the DSL modem, and the other connects to the phone. The splitter splits the signals based on frequency: the lower-frequency voice signals go to the phone, while the higher-frequency data signals go to the DSL modem.

There are a number of variations/standards of DSL technology. They primarily differ in two parameters, viz. the speed and the distance they support. Of course there are core technology differences as well. Note that signal quality deteriorates with distance, and it is not possible to install repeaters for data signals. Hence, distance plays an important role.


The most popular DSL standards are ADSL and VDSL.

ADSL: Asymmetric Digital Subscriber Line
As the name suggests, the download and upload speeds are different. In most home networks people download more than they upload, so this results in a very good user experience. The band from 25.875 kHz to 138 kHz is used for upstream communication, while 138 kHz to 1104 kHz is used for downstream communication.

ADSL supports download speeds of up to 12 Mbps and upload speeds of up to 1.5 Mbps. ADSL2+ extends the capability of ADSL by doubling the downstream bits, which is done by extending the downstream frequency band from 1.1 MHz to 2.2 MHz. As a result, ADSL2+ supports download speeds of up to 24 Mbps. ADSL works for distances of up to about 5000 meters from the exchange; the closer the exchange, the better the signal quality and speed.

Following diagram shows the ADSL2+ router in the broadband network:
ADSL2+ Router in Network

VDSL: Very High Bitrate DSL
It is similar to ADSL2+ but uses a higher frequency band, on the order of 30 MHz. As a result it provides very high download and upload speeds, approximately 100 Mbps. The indicated maximum speeds are achievable for distances of up to about 300 meters from the exchange.

A distance of 300 meters from the exchange is not practical in most cases. Hence, an Optical Network Unit (ONU) is used to provide service from a larger distance. The broadband service provider lays optical cables from the exchange to the locality (typically a building or group of buildings), where they are connected to the ONU. The ONU then connects to routers at home over the phone line (DSL). Optical cables are capable of carrying data at speeds of Gbps.

Due to its high speed, VDSL is ideal for IPTV and HDTV services. Since it also supports symmetric upload and download speeds, it is suitable for video conferencing.

Following diagram shows the VDSL network diagram:
VDSL Network


Refer to the following link for more technical differences between ADSL and VDSL:
http://www.pulsewan.com/data101/adsl_vdsl_basics.htm




Tuesday, December 15, 2009

Residential Gateway - Part 1

I worked for 2 years at a company that manufactures Residential Gateways. I primarily worked on the GUI and configuration customization of these gateways. At times the work involved debugging functional issues that required an understanding of the underlying protocols and standards. This provided me the opportunity to learn about various networking standards.

I referred to Wikipedia and RFCs for most of the things I learned. I am going to share my learning through a series of blog posts. I don't intend to capture internal details like multiplexing techniques or packet/frame formats, as those can be obtained from the standards documents. The information here will provide a conceptual understanding and some useful facts about the technologies.

In this first installment, let's understand what a Residential Gateway is.

Residential Gateway is quite a popular term in the USA but not a well-known term in India. In India people mostly refer to it as a modem because:
- It is just a modem, or
- If it is not just a modem, then people either don't know about its other features and/or don't use them.

The primary function of a Residential Gateway is to enable a broadband Internet connection for home users. It is a combination of a modem and a router. In addition, it provides other features like:
- Firewall
- NAT
- DHCP
- DNS
- VoIP

Following diagram shows the complete broadband ecosystem. The CPE in the diagram is the residential gateway:
Broadband Network

Following diagrams show what a typical home network looks like:
Home Network 1
Home Network 2
Following diagram shows the common ports available on a residential gateway and sample devices that can be connected:
Residential Gateway Ports and Devices


If you look at the diagrams above, you will notice that the most common interfaces on the home or LAN side are:
- USB
- Ethernet (RJ-45)
- Wireless (Wifi or 802.11x)
- Coax (For TV or STB)

On the WAN side the most common interfaces are HPNA (DSL) and cable (the same cable on which you get Cable TV service). Of these, the DSL interface is the most commonly used, at least in India. The primary reason for DSL's popularity is that the basic infrastructure, i.e. the phone line, is already present. Users just need to buy the modem-cum-router and they are ready to set up their home network.

I will talk about DSL and related technologies in the next part. Stay tuned...