Monthly Archives: January 2014

Virtualizations on Enterprise Level

Let’s talk about a popular technology that is changing the way businesses configure, store, manage, distribute and maintain their software as well as hardware systems. There is an inherent shift from mainframe computing to x86 based systems due to performance enhancements and cost effectiveness. With in-built support for virtualization many companies are adopting the virtual architecture in driving the software needs for their business.

Virtualization is a way of presenting a logical or virtual view of separate environments working independently or together by abstracting applications and components from a system consisting of hardware, software or both. The logical view can be different from the parent system as the virtual systems depend for their existence in terms of form and functionality on core or fundamental components of the parent system. This means any virtual view will piggy-back on the parent application/system to mimic or enhance all or a handful of functions into separate sub-systems customized according to the need and providing benefits like increased performance, scalability, cost effectiveness, maintenance etc.

When we hear the word “Virtualization” we think of VMware or Parallels software that will let you run Windows8/Windows Server 2012/Linux on a Mac OS or on another OS. On enterprise level there are different types of virtualization technologies providing solutions that tackles a specific field of interest or an area and can be customized or activated to work on their own or be integrated to work as part of a system.

Here is a nice little diagram I created to highlight various types of virtualization options that can be implemented to the need.

  1. Access Virtualization – This type of virtualization allows any device to access and use the application on a common environment without the need to know much about each other. The device will know how to connect to the application system and the application will accept those devices that were setup to connect and utilize the system without any hiccups or issues. An example could be remote devices like a laptop, tablet, think client, POS system, and terminals with different hardware accessing a server that runs an application capable to work with those devices.  This helps an organization be independent in deal with changes on the hardware side and preventing to rewrite the entire applications. Say for example a company makes a switch from Android based tablets to iOS based iPad and with access virtualization there is seamless migration and transition in accessing the same applications.
  2. Application Virtualization – This type of virtualization lets an application to be used on multiple and different systems. These applications would typically use an API or a framework to integrate the application with base systems regardless of their native underlying hardware. There would be some configuration and other customizations while integrating applications on different systems but overall it provides greater flexibility and operating system independence. Different vendor companies developing such software include Microsoft with their Microsoft Application Virtualization or App-V, Citrix with their popular XenApp product and VMware with their ThinApp product. Another type of application virtualization would be similar to porting of legacy applications when older hardware are decommission and not supported.
  3. Processing Virtualization – This type of virtualization operates at or below the Operating System level. There are different ways processing virtualization can make use of an operating system  that is encapsulated to work separately or as part of a system for workload sharing and failover or for linking such systems to distribute data to separate areas.  What that means is that one system can be made to appear as if there are many or many systems appear to work as one. There are scenarios where parallel computing is needed for certain set of work while raw GPU is needed to drive some graphics. Processing Virtualization helps dealing with such scenarios by letting Hypervisor or Virtual machine manager handle requests and delegate them to proper virtual system by managing and diverting the resources such process threads, GPU utilization cycles etc where they are needed.
  4. Network Virtualization – This type of virtualization pertains to dealing with links of system and their availability based on the security setup. Multiple systems can be virtualized to be used by different users based on their access level. For example finance users will have access to the finance virtual system while sales will have access to sales virtual system while both of them being hosted on the same hardware. These systems are set up in network to be made available to their user base.
  5. Storage Virtualization – This is one of the most common types of virtualization that is being adopted by various companies to manage their growing needs for data storage and access. Cloud based computing has really pushed the storage virtualization into every sector whether its enterprise or general consumer. This type of system where data storage can be dynamically managed by altering their capacity, resources and processing request has helped business with their changing application needs. Cost effectiveness is also a major factor as companies can start with a common server and use it for virtualizing several storage applications to be used across different groups of a company. For example, a server can host virtual database systems like Oracle, SQL Server or Teradata which can be accessed by different applications. The beauty of this setup is that constraints related to capacity, processing power, priority, user management etc. can be eliminated.  As need for growing data imparts need to add more storage capacity, newly modules of hard drives can be seamlessly and effortlessly distributed across different virtual database systems or streamlined towards the database that has the greatest need. The same stands for the other way in terms of accessing data through various medium from a common source. On consumer product level, an example would be Dropbox a cloud data storage service where users can start with certain amount of space but can buy more as their needs grow. It beautifully works with any device in accessing the stored data.
  6. Hardware Assisted Virtualization – This type of virtualization is actually more of a collection of techniques and hardware designed on a much lower level of virtualization. The newer chipsets have Virtualization capability at the hardware level and with related instruction sets featuring directed I/O, Extended Page tables, Rapid Virtualization Indexing, Tagged TLB, security etc. Such system has inherent support for hosting more Virtual machines per server and enables the software to interact more effectively with greater consolidation. Some of the noted capabilities include bypassing the Virtual machine manager or hypervisor in managing the low level instructions for memory management, I/O and hyper threading with compatible processor. Examples of hardware assisted virtualization include AMD-V technology in AMD chipsets and Intel VT in Intel chipsets.

SQL Tutorial – Data Retrieval (Part 1)

Structured Query Language is the universal language spoken by all the enterprise databases. If you ask a database developer what is his primary spoken language, he will most say “SQL”. That was a joke! But the point is when you are working on databases, understanding the SQL is of foremost importance since it is the building block for writing/reading Stored procedures, Derived Views, Macros etc which are most commonly used form of SQL usage for retrieving the data in a desired form.

Between various databases such as SQL Server, Teradata, Oracle, Netezza etc the statement functions and syntaxes can differ but the structure and fundamental concepts of SQL can be applied to any of them.

We will be using Production schema from AdventureWorks 2012 database on SQL Server 2012 for all our examples. Click on the picture for schema diagram.

Production Schema

SQL statements are categorized by their primary function. Below are the types of SQL Statement.

Types of SQL Statements

Before we start, lets understand the structure of Select Statement since this is the most commonly and frequently used Data Retrieval Statement. The Select and From clauses are required and the remaining clauses are optional.

Select Statement Construct

If the data that we are trying to retrieve lies in two separate tables, they would need to be joined using the Primary and Secondary Key relationship that exists between the two tables.

There are different types of joins and is easier to understand using Venn diagram from set theory,

  • Inner Join – Using this join returns all the common rows between two or more tables.

Select Table_A.*,Table_B.* from Table_A INNER JOIN Table_B on Table_A.Key = Table_B.Key

Inner Join Visual Diagram

  • Right Outer Join - This join as the name suggest will return matching rows from the table that is on the left side (A) of the join and return the entire dataset for the table on the right side (B).

Select Table_A.*,Table_B.* from Table_A RIGHT OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Right Join Visual Diagram

  • Left Outer Join – This join as the name suggest will return matching rows from the table that is on the right side of the join and return the entire dataset for the table on the left side.

Select Table_A.*,Table_B.* from Table_A LEFT OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Left Join Visual Diagram

  • Full Outer Join – Will give you all the rows from all the tables joined using this type of join.

1. Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Outer Join Visual Diagram

2. Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B on Table_A.Key = Table_B.Key where Table_A.Key is NULL OR Table_B.Key is NULL

Outer Join with NULL Keys

  • Cross Join – This is commonly known as Cartesian product and should be avoided. The join will return combination of all the rows from the two tables and return N x N number of rows.

Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B

 

Lets start with an example by constructing a basic Select statement that has all the clauses shown in the picture above..

SELECT  Product.Name ,
        ProductID ,
        ProductLine,
        ListPrice,
        AVG(ListPrice) AS Avg_ListPrice,
        StandardCost
FROM  AdventureWorks2012.Production.Product
WHERE ListPrice > = 100
AND StandardCost <= 300
AND ProductLine IS NOT NULL
GROUP BY ProductID,
         ListPrice,
         Product.Name,
         ProductLine,
         ListPrice,
         StandardCost
HAVING AVG(ListPrice) < 500
ORDER BY ProductID ASC,
         Product.Name ASC,
         ListPrice DESC

 

Complete Basic Select SQL Statement

The above example was for selecting the data from a single table. The tables will most likely be in normalized form when working on either RMDBS database with dimensional modeling or OLAP cubes and since the data would reside on a large number of normalized tables; while retrieving data, a SQL query needs to be constructed that will join different tables related by Primary keys and Foreign keys. I covered Normalization upto 3NF in this post.

Each table is created/designed to have at least a unique primary key (PK) for every row and the tables can be related to other tables by using Foreign Key(FK) as shown in the example below.

Primary Key and Foreign Key Relation

 

ProductID is the Primary Key in Product table and ProductSubcategoryID is one of the Foreign key. You can see that ProductSubcategory has ProductSubcategoryID as its Primary key and these two tables can be joined through the Primary Key – Foreign key join.

Let’s join Product and ProductSubcategory tables. Note how we applied an inner join on the third row. Right Outer join, Left Outer join or Outer Joins can also be applied depending on what result set you are trying to retrieve.

SELECT  Product.Name , ProductID , ProductLine , Product.ProductSubcategoryID
FROM  AdventureWorks2012.Production.Product AS Product
INNER JOIN AdventureWorks2012.Production.ProductSubcategory AS ProductSubCat
ON Product.ProductSubcategoryID = ProductSubCat.ProductSubcategoryID

 

Resultset 1 basic join

Let’s understand SubQuery concept.  Take tables “Product”, “BillofMaterials” and “UnitMeasure” for selecting products that are of subcategory code “12” and the products should be in “Each”. There are 38 unit measure codes in “BillofMaterials” table  and the Name of those unit measure codes are in the table “UnitMeasure”. So lets say you don’t want to look up the Name from the UnitMeasure table manually to find out the code(for “Each” in this case)  that you want to filter the result set for. We can write a subquery that will get the measure code for selected measure code name.

In short, we will need to add another query (subquery) in the where clause conditions that will retrieve records for “Each” and feed that into the main query’s Where clause condition.

SELECT  Product.Name ,
        ProductID ,
        ProductLine ,
        Product.ProductSubcategoryID,
BillofMaterial.UnitMeasureCode,
BillofMaterial.PerAssemblyQty
FROM    AdventureWorks2012.Production.Product AS Product
        INNER JOIN AdventureWorks2012.Production.BillOfMaterials AS BillofMaterial
ON Product.ProductID = BillofMaterial.BillOfMaterialsID
WHERE ProductLine IS NOT NULL
AND ProductSubcategoryID = 12
AND UnitMeasureCode = (SELECT UnitMeasure.UnitMeasureCode
FROM AdventureWorks2012.Production.UnitMeasure
WHERE UnitMeasure.Name = 'Each' )

Let’s learn about couple of operators -UNION, UNION ALL, EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL etc.

UNION operator will combine data from two or more tables and UNION ALL will retain the duplicate rows.

SELECT ProductSubcategoryID
      ,Name
  FROM AdventureWorks2012.Production.ProductSubcategory
  WHERE ProductSubcategoryID IS NOT NULL
  union
  SELECT ProductSubcategoryID
         ,[Name]
   FROM AdventureWorks2012.Production.product
   WHERE ProductSubcategoryID is NOT NULL

Union Query Example

EXCEPT operator will give results from the first half of the SQL but not from the Tables that are after EXCEPT operator.

Example:

SELECT ProductSubcategoryID
      ,Name
  FROM AdventureWorks2012.Production.ProductSubcategory
  WHERE ProductSubcategoryID IS NOT NULL
  EXCEPT
  SELECT ProductSubcategoryID
         ,[Name]
   FROM AdventureWorks2012.Production.product
   WHERE Name  IN ('Forks','Gloves','Handlebars','Headsets','Helmets')
Result for EXCEPT Operator

INTERSECT operator will give  only the matching rows from the queries.

Example.

SELECT  Product.Name ,
        ProductID ,
        ProductLine,
ListPrice,
StandardCost,
FinishedGoodsFlag
FROM  AdventureWorks2012.Production.Product
WHERE StandardCost> 1000
INTERSECT
SELECT  Product.Name ,
        ProductID ,
        ProductLine,
ListPrice,
StandardCost,
FinishedGoodsFlag
FROM  AdventureWorks2012.Production.Product
WHERE FinishedGoodsFlag = '1'
ORDER BY Name, ProductID

 

      AND ProductLine= 'M'
Intersect SQL Example

While using these operators, the select list should have equal and identical fields in terms of datatype and data length. Also note in the query above for INTERSECT, we put order by clause in the end after both the SQL statements.

In upcoming tutorials, I will give more examples of various analytical functions, nested queries and also examples of Data Definition language.

Normalizing Data

Depending on the design of relational database, the tables might contain lots of duplicate data or might contain incorrect entries of data in single line that you can’t delete without losing the other entry or it may contain data that shouldn’t exist.

Normalization is the process of effectively organizing & rearranging the data in a database into a more standard normal form that prevents these kind of issues.  Normalizing the data is due to the need for data integrity, flexibility and efficiency.  There are several levels of normalization but you might not need to apply all of them to your database. For RMDBS compliance, you can have your database specialist normalize a flat database into three levels at minimum. Here are all the levels listed.

  • First Normal Form (1NF)

  • Second Normal Form (2NF)

  • Third Normal Form (3NF)

  • Boyce‐Codd Normal Form (BCNF)

  • Fourth Normal Form (4NF)

  • Fifth Normal Form (5NF)

  • Domain/Key Normal Form (DKNF)

First Normal Form (1NF) – In this step we will eliminate repeating groups by creating separate table for each set of related attributes and assign a primary key (PK) to each of those tables.

These are the requirements for 1NF

  1. Each column must have a unique name.

  2. The order of the rows and columns doesn’t matter.

  3. Each column must have a single data type.

  4. No two rows can contain identical values.

  5. Each column must contain a single value.

  6. Columns cannot contain repeating groups.

1st Normal form - Analyticalviews.com

 

 

Second Normal Form (2NF) – Take the data that is only partly dependent on the Primary Key and put it in a seperate table. This means that we need to split up the table columns that are of cardinality “many-to-many relationship”. After splitting the data, foreign keys should be defined and added so that the tables can be related. In this step all the non key attributes are fully functionally dependent on the Primary Key.

The requirements for 2NF are listed below

  1. The database tables should be in 1NF form
  2. All of the non key fields are dependent on all of the key fields.

Second Normal form 2NF - Analyticalviews.com

 

 

Third Normal Form (3NF) – In this form, no non key attributes are transitively dependent on the Primary key. Transitive dependency means that one attribute is dependent on the second attribute, which in turn depends on a third attribute. When you are deleting data in a table, and if the tables are transitively dependent on each other, there will be unwanted loss of information. An easier way of understanding this step is that the data that is not dependent on Primary Key  should be separated out and stored in another table.

The requirements for 3NF are listed below

  1. The database tables should be in 2NF form
  2. Should not contain transitive dependencies

Third Normal form 3NF - Analyticalviews.com

 

 

The final tables after performing the first three normalization forms are now ready for use in a relational database.

Normalized tables - Analyticalviews.com

 

BusinessObjects vs Tableau Assessment

Tableau is becoming a very popular and a very rapidly adopted analytical tool. BusinessObjects has long held the crown of being the most popular Business Intelligence & Analytical Platform with its diverse and rich set of tools and capabilities.

Lets look at the assessment on their individual capabilities and how they line up against each other.

BOBJ Vs Tableau Tool comparision

 

I hope this helps organizations make an informed decision based on their needs.

 

Consequences of Over-Promising or Over-Delivering and the Right Process

At the end of the project if you deliver less than what was expected then you are pretty much on the path of risking your reputation. If you deliver more than the expected you are a hero. Many a times, especially with new managers taking new responsibilities and authority, more features are delivered by intention than what was asked for originally in spite of falling behind budget and added constraint on resources. It stretches and burns out the project team by making them work overtime and causing a higher stress level, all because of the manager’s commitment and desire on providing extra features that was never asked or intended in the first place.

This is a detrimental approach due to the fact that there could be business reasons why certain features are shut out on purpose or maybe the features are being held back for next release, maybe their systems are already maxed out on resource usage and cannot accommodate any extra functionality which hasn’t been scoped for. This is commonly known as “gold plating” in project management world and advises to clearly stay out of this practice.

The habit of delivering extra value or features also skews the project baselines for any future projects since the client/business users will perceive receiving extra features as an expected act in spite of being not accounted in the project scope or budget.  Gold platting eventually also leads to feature request by the customer in an ongoing project along with more changes over time. The nature of most businesses is that they change; the direction and strategic goals of an organization or a company changes quarterly or on yearly basis thereby potentially affecting any current & on-going implementations. Any added features or changes requested without invoking change management most certainly leads to a point of failure.  This is due to the fact that added features means more resource effort hours which translates into higher than assessed budget.

An example of correct procedure to follow during any change request is to assess the impact of this change, get authorization and approvals from appropriate authorities, change the scope, time and cost baselines to accommodate the changes, invoke change management and get approvals, change the project schedule and timelines, send clear communications on new dates and milestones and implement the changes.

Now there are times when a development team is working closely with the business user/base and they form a certain level of relationship where the developers are obliged to incorporate constant changes as requested by the client. Managers should be firm in controlling the scope and making sure proper procedures are followed and right parties/team members are involved in an open discussion while making clear its added impact on the ongoing work.

The practice of delivering extra without invoking proper protocols and process will cause more damage to your reputation and impact team morale due to over utilization. So the next time you think of a brilliant idea that will, in your eyes, add value to the business/project, go ahead with discussions by involving stake holders, program managers or client or committee who are in position and authorized to approve any changes and let the proper process take care of the rest!