Earned Value Management – Formulas, Calculations & Examples

Any project needs constant monitoring of time, material, and cost usage. If you see your team falling behind schedule, you can add more resources to stay on track, but that could increase the project cost (unless the extra hours are unpaid overtime!). This concept is commonly known as the triple constraint in the project management world: if one or more of the constraints (scope, time, or cost) changes in either direction, the other constraints are affected as well.

To stay on track, we universally use the tried-and-tested Earned Value Management calculations. Not only are they used for monitoring and controlling a project; they are also a valuable set of tools for reporting performance to stakeholders, sponsors, or your program manager.

Here is an example of a Project Performance Report.

Performance Report sample

Changes in project baselines can occur due to either internal reasons or external factors; for example, an internal reason could be resource turnover, while an external factor could be a change in requirements by the business.

Let’s assume there is a change in requirements and we invoke change management to have those changes approved and added to the project baselines. That will cause an increase in budget and time, which will need to be recalculated. We might also need to forecast “what if” situations, such as: if the schedule is immovable, how many more resources would I need to add, and what additional cost would be incurred?

These questions and analyses can be handled easily with EV calculations.

Here are the most commonly used terms and formulas. I created the diagram below to depict the terms in pictorial form for easier understanding.

Earned value management Diagram

  • Planned Value (PV): The estimated value of the work that was planned to be done by this point.
  • Earned Value (EV): The value of the work actually completed so far, i.e., where you currently are in the project.
  • Actual Cost (AC): The actual cost of the work completed.
  • Budget at Completion (BAC): The project’s total budget, as determined by the P.O.
  • Estimate at Completion (EAC): A forecasting term that helps determine the expected total cost at the end. EAC = AC + (BAC – EV)
  • Estimate to Complete (ETC): The expected remaining cost, from now until the end. ETC = EAC – AC
  • Cost Variance (CV): CV = EV – AC. You want your earned value to be higher than the actual cost (a positive CV).
  • Schedule Variance (SV): SV = EV – PV. You again want to keep your SV positive.
  • Cost Performance Index (CPI): CPI = EV / AC. A value greater than 1 means you are getting more value than you are spending.
  • Schedule Performance Index (SPI): SPI = EV / PV. A value greater than 1 means you are ahead of schedule.
  • Variance at Completion (VAC): VAC = BAC – EAC. This determines whether the project is expected to finish over or under budget.
  • To-Complete Performance Index (TCPI): TCPI = (BAC – EV) / (BAC – AC). This is the cost performance required on the remaining work to finish within budget.

Here is an example: let’s say we hired a coder with a billing rate of $50/hour to write 500 lines of code in 5 days (a total of 40 hours). The coder is expected to code 100 lines/day and work 8 hours/day.

Scenario on Day 1:  Coder worked 6 hours and finished 200 lines of code

  1. PV = $400 (per 100 lines of code, or 8 hours of work at $50/hour)
  2. EV = 200 lines of code x $4/line of code = $800 ($2000 budget ÷ 500 lines = $4/line)
  3. AC = 6 hours x $50/hour = $300
  4. CV = EV – AC = $800 – $300 = $500
  5. SV = EV – PV = $800 – $400 = $400
  6. CPI = EV/AC = $800/$300 = 2.67
  7. SPI = EV/PV = $800/$400 = 2
  8. BAC = Total Project Budget = $400/day x 5 days = $2000
  9. EAC = AC + (BAC – EV) = $300 + ($2000 – $800) = $1500 (atypical: the current variance is treated as a one-off and all future work is assumed to be accomplished at the budgeted rate)
  10. EAC = BAC / CPI = $2000 / 2.67 = $750 (typical: the current cost performance is assumed to continue for the rest of the project)
  11. EAC = AC + [(BAC – EV) / (CPI x SPI)] = $300 + [($2000 – $800)/(2.67 x 2)] = $525 (used when cost performance is poor and the completion date is firm)
  12. ETC = EAC – AC = $1200
  13. VAC = BAC – EAC = $500
  14. TCPI = (BAC – EV) / (BAC – AC) = 70.59%
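If you would rather script these than work them out by hand, here is a minimal Python sketch that reproduces the Day 1 numbers above. The function and variable names are my own, and it expects cumulative lines and hours, so for the daily scenarios that follow you would pass running totals.

```python
# Minimal EVM sketch for the coder example above:
# 500 total lines, 5 days, 8 planned hours/day, $50/hour billing rate.

RATE = 50.0                     # billing rate, $/hour
BAC = 5 * 8 * RATE              # Budget at Completion = $2,000
VALUE_PER_LINE = BAC / 500.0    # $4 earned per line of code
PV_PER_DAY = BAC / 5.0          # $400 planned per day

def evm(day, lines_done, hours_worked):
    """Core EVM metrics for cumulative progress through `day`."""
    pv = PV_PER_DAY * day             # Planned Value to date
    ev = VALUE_PER_LINE * lines_done  # Earned Value of work completed
    ac = RATE * hours_worked          # Actual Cost of work completed
    cpi, spi = ev / ac, ev / pv       # performance indexes
    eac_atypical = ac + (BAC - ev)                  # future work at budgeted rate
    eac_typical = BAC / cpi                         # current cost performance continues
    eac_firm_date = ac + (BAC - ev) / (cpi * spi)   # poor cost performance, firm date
    return {
        "PV": pv, "EV": ev, "AC": ac,
        "CV": ev - ac, "SV": ev - pv,
        "CPI": round(cpi, 2), "SPI": round(spi, 2),
        "EAC (atypical)": eac_atypical,
        "EAC (typical)": round(eac_typical, 2),
        "EAC (firm date)": round(eac_firm_date, 2),
        "ETC": eac_atypical - ac,
        "VAC": BAC - eac_atypical,
        "TCPI": round((BAC - ev) / (BAC - ac), 4),
    }

# Day 1: 200 lines finished in 6 hours
for name, value in evm(day=1, lines_done=200, hours_worked=6).items():
    print(f"{name}: {value}")
```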

Perform the same calculations for the following daily scenarios:

Scenario on Day 2:  Coder worked 8 hours and finished 50 lines of code

Scenario on Day 3:  Coder worked 6 hours and finished 100 lines of code

Scenario on Day 4:  Coder worked 10 hours and finished 50 lines of code

Scenario on Day 5:  Coder worked 10 hours and finished 100 lines of code

Earned Value Management Formulas and Calculations example

 

EVM Formula graph

I created a spreadsheet that will perform all the calculations for you and also give you the budget status on a daily basis (see the “Over or Under Budget?” column). Just fill in the cells highlighted in yellow.

And here is the attachment for the Excel sheet –> Excel Spreadsheet for EVM Formulas and Calculations

Determining the Critical Path and Float

Despite the critical path being such an important method for managing the successful and timely delivery of project deliverables, and despite it being, as the name implies, “critical”, it is surprisingly overlooked. How many times have you experienced a setback or delay in completing some of the project tasks, whether due to unexpected issues, dependencies, or identified risks? To stay on track, the critical path helps you mitigate the risk of falling behind the agreed schedule, and it also helps you manage those delays in a way that causes no real negative effect on the firm project completion date.

The critical path method is used to calculate early and late start and finish dates without taking any resource constraints into consideration. It highlights the one path, out of the various paths in your schedule network diagram, that results in the latest completion date of the project. Tasks with a calculated float greater than “0” can be delayed, but tasks that fall on the critical path cannot; any delay there will increase resources, schedule, and related cost. Microsoft Project has an option to display the critical path automatically, and it should be monitored closely by any project manager. The rationale is that delays on certain tasks are acceptable, but the maximum delay (without affecting the timeline) can only be what is permitted by the calculated slack or float. So when you notice your team going slow on certain tasks and missing completion dates, you will need to restructure the dates for those delayed tasks in such a way that the project completion date is not affected. This is where the critical path and slack come into the picture.

We can easily find the critical path using Microsoft Project and also visually highlight the entire path in the Gantt chart. Here are the sample tasks I created in MS Project; the tasks highlighted in yellow (in the task list) and red (in the Gantt chart) are the tasks on the critical path.

Critical Path in Microsoft Project with slack

 

Now let’s assume Task 2 has been delayed to start on 5/29 instead of 5/21 as originally planned. Since Task 2 has a slack of 9 days, we can delay its start date by up to 9 days before affecting the project completion date. See below in the Gantt chart how moving Task 2 has no negative effect on the total time frame.

Critical Path Non Critical Tasks moved

 

Calculating the slack manually will make this concept clear from the ground up. Let’s say you have 9 WBS tasks in your project plan and the duration has been determined for each task. The next step is to determine the predecessors of each subsequent task. This forms a schedule network diagram, which will be used for calculating the critical path. Start by designating the acronyms ES, EF, LS, and LF to the four corners of each task box, as shown in the diagram below.

Critical Path and Float1

The next step is to perform the forward pass calculation to determine the earliest start and earliest finish of each task. Start by assigning the value 0 to the ES of Task 1. Then add the duration of Task 1 to its ES value (which is zero for the first task) to determine the EF. So the EF for Task 1 is 0 + 2 = 2.

You will carry the value of EF = 2 to the ES of Task 2 and the ES of Task 3, since both are successor tasks. Again, add ES + duration on Tasks 2 and 3 to get their EF values.

Now, before you proceed, notice that Task 4 has two predecessor tasks. So you will need to calculate the EF for both of the predecessor tasks (Tasks 2 and 6) and pick the highest EF value.

After calculating the earliest finish for Task 2 and Task 6, we pick 16 as the ES for Task 4, since that is the higher of 7 and 16.

Keep repeating these steps for each task until all ES and EF are calculated and determined.

Critical Path and Float2

The backward pass is performed to calculate the LF and LS values and proceeds in the opposite direction of the forward pass, i.e., from Task 9 to Task 1.

Start by taking the earliest finish value of the last task and assigning it to the latest finish of that same task. Then subtract the duration from the latest finish to get the latest start.

The LF of Task 9 is 37, so the latest start will be 37 – 3 (its duration) = 34. Carry the LS value over to the LF of the preceding task (moving from right to left).

Keep performing the same steps until you reach Task 3. As you can see, the LF value for Task 3 can be carried over from more than one successor. Unlike the forward pass, where you pick the highest value, here you pick the lowest. So the correct value carried over to the LF of Task 3 is 10 (the lower of 10 and 18).

Critical Path and Float3

The final step before we can determine the tasks falling on the critical path is to calculate the float, or slack. Float is the number of days a particular task can be delayed before the project completion date is affected negatively.

To calculate float, subtract the earliest finish from the latest finish (LF – EF) for each task. After calculating the float for all the tasks, you will notice that there is one path on which every task has zero float. That means no slack or delay is allowed on any of those tasks. In our example, the critical path runs through Tasks 1, 3, 5, 4, 7, and 9.
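To check the arithmetic programmatically, here is a minimal Python sketch of the forward pass, backward pass, and float calculation described above. The small five-task network at the bottom is a hypothetical example for illustration only; it is not the nine-task network from the figures.

```python
from collections import defaultdict

def critical_path(durations, predecessors):
    """CPM forward/backward pass. durations: {task: duration},
    predecessors: {task: [predecessor tasks]}."""
    tasks = list(durations)
    successors = defaultdict(list)
    for task, preds in predecessors.items():
        for p in preds:
            successors[p].append(task)

    # Forward pass: ES = max(EF of predecessors), EF = ES + duration
    es, ef = {}, {}
    def forward(t):
        if t not in ef:
            es[t] = max((forward(p) for p in predecessors.get(t, [])), default=0)
            ef[t] = es[t] + durations[t]
        return ef[t]
    for t in tasks:
        forward(t)

    project_finish = max(ef.values())

    # Backward pass: LF = min(LS of successors), LS = LF - duration
    ls, lf = {}, {}
    def backward(t):
        if t not in ls:
            lf[t] = min((backward(s) for s in successors.get(t, [])), default=project_finish)
            ls[t] = lf[t] - durations[t]
        return ls[t]
    for t in tasks:
        backward(t)

    total_float = {t: lf[t] - ef[t] for t in tasks}       # float (slack) = LF - EF
    critical = [t for t in tasks if total_float[t] == 0]  # zero-float tasks
    return es, ef, ls, lf, total_float, critical

# Hypothetical example network (not the one in the figures above)
durations = {"A": 2, "B": 4, "C": 3, "D": 5, "E": 2}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
es, ef, ls, lf, total_float, critical = critical_path(durations, predecessors)
print("Float:", total_float)       # {'A': 0, 'B': 0, 'C': 1, 'D': 0, 'E': 0}
print("Critical path:", critical)  # ['A', 'B', 'D', 'E']
```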

Critical Path and Float4

Critical Path Report

Here is the video depicting each step.

A case for using a shared information library

Over time, an information library can be a great source of information for your team. Whether your company uses SharePoint, Microsoft TFS, FileStack, or OneHub, the idea is to adopt and implement a process and a system for archiving and maintaining code changes for each client, as well as any break-fixes or subsequent enhancements; in this example, using a SharePoint site.

The Objective

  • Help track changes for each implementation.
  • Check in the code changes per bank and maintain a history of all code revisions.
  • Use a break-fix SharePoint page that contains all the changes made by anyone, archived by date.
  • Have a separate section or page that includes all the comments or important information about an implementation.
  • Share it internally across various groups, such as the Product Development Group and the Product Support Group.

The Opportunity

  • There are many “break-fixes” or modifications that are added to an existing code base over a period of time. Tracking those changes, as well as handling job failures in the absence of the primary developer, can be challenging; a backup resource will face numerous difficulties without 360-degree knowledge of a particular implementation.
  • It will serve as a search site where any keyword can be quickly checked for related information. For example, a keyword search of “Custom code to rollup particular loan types” will return any items that were checked in or added on the SharePoint site. The scope of the search could include all the banks and will return results related to the keyword.
  • Repository of stored information may include added code snippets, additional logic or requirements implemented in the past, known bugs, discussions and log of any relevant information.
  • Will minimize the amount of time needed for a new or a backup resource to jump into job failures or other issues. Such site will also serve as a knowledge base and as a reference point to check before diving deep into any task.
  • Global search on SharePoint will enable you to look for a particular piece of information across all the banks. Such a wealth of knowledge will increase clarity and remove the need to reinvent a similar work product that might already be in use at another bank.
  • Important communications as well as details and insight on all clients can be maintained and retrieved when needed.
  • Such a site will help capture more accurate and detailed information along with the business rules and logic.

Solution

  • Create a SharePoint site that will have the following features:
    • Code versioning and comments on all subsequent check-ins of the original code.
    • Archival of all the changes, along with a discussion board and comments section.
  • A site for maintaining other key or supplemental information, which can be beneficial to all as common knowledge.
    • A site for archiving a set of common code that can be reused in a different implementation.
  • A site for archiving “know-how” and “how-to” knowledge and tips.

Other Benefits

  • Reduced work time for any fixes or “job fail” resolutions.
  • Easy transition of knowledge between the responsible resource and a new or backup resource.
  • Negating the need to reinvent the wheel by making use of the existing code library.

 

CONCLUSION

With a growing list of  clients, incorporating a shared repository system will surely help meet the challenges of maintaining relevant and critical information for each implementation.

SSIS – Forcing a sub package to succeed

Use this method when you need to force certain tasks to succeed in spite of failure scenarios. It applies to spreadsheet and flat-file loads that are non-critical to the remaining flow of tasks and should not cause the entire job to fail. The method involves moving the desired load tasks into a new sub-package and calling that sub-package from the parent via an Execute Package Task. A working example is set up on the TEST server for reference.

The setup and configuration is as follows:

1

In this example we will move the Flat File load Task to a new Sub-package.dtsx that will in turn be called from Execute Package Task.

Double-click on the Execute Package Task → Package → ReferenceType and change it to “External Reference”. The reason is that when the parent package is executed from a SQL Agent job, the child package will be accessible and able to run. Skipping this step will result in the error shown below.

 

2

 

3

After switching to External Reference, you’ll need to specify a connection from where the SQL Agent job will run.

4

After selecting the connection, select your sub-package containing the file loads.

Go back to the parent package, highlight the “Execute Package Task”, and open the Properties for that task. Change the value of the “ForceExecutionResult” property from “None” to “Success”. Make sure you don’t accidentally set this for the entire package!

5

 

Click on “Event Handlers” tab, and select “Execute package task” from the drop down list.

6

 

Click on the blue link seen in the screenshot above. This creates an “OnError” event for the “Execute Package Task”, thereby exposing new system variables that are otherwise not accessible. Make sure to click on the grid options icon and select “Show system variables”.

7

 

Scroll down, locate the “Propagate” variable, and change it from the default value of True to False. This ensures that any error occurring in that component (the Execute Package Task in this case) will not propagate to the other elements of the control flow. Make sure this is done ONLY on the OnError event of the “Execute Package Task”.

8

 

9

 

Add a “Send Mail Task” connected with an “On Failure” precedence constraint path. This will send an email notifying the intended recipient of the sub-package failure.

10

SQL Server Integration Services – An ETL Guide

SQL Server Integration Services has been a tool of choice for me since I first ventured into the SQL Server 2005 world. Since then, SQL Server has evolved dramatically, and more features have been added with increased capabilities. With SSIS 2012, you can now do column mapping; declare and change parameter properties during the runtime of a package; run VB or C# code with the Script Task and Script Component; create and share connections at the project level with the Connection Manager; use the Merge Join transformation with reduced memory usage; and perform cleansing with relative ease using the DQS Cleansing transformation.

One of the biggest features of SQL Server 2012 is the introduction of in-memory OLTP execution. The in-memory feature is becoming very sought after due to the dramatic performance gains as well as the dropping price of physical memory modules. Where SQL Server 2012 claims 5-20 times performance improvement, SAP HANA, with its new architecture, claims to boost performance by a whopping 300 times! As soon as I get my hands on HANA, I will do a comparison test. In-memory optimization gets this performance boost by taking advantage of hyper-threading technology and enabling concurrent processing that eliminates logical locks on the tables. Stored procedures are also compiled natively, and the result set is held in memory. This is all on top of a massive algorithm change that supports faster processing.

Anyway, let us get back to the ETL discussion. Where do ETL and other tools and capabilities, like OLAP cubes, the universe semantic layer, end systems, and source systems, stand in the overall picture of a systems setup?

I created this to understand the role of each layer.

Business Intelligence System Overview

 

From a pure information systems perspective, it all starts with the source system. What I didn’t show in the diagram above is that the source systems get their data from actual physical events: sales information from a P.O.S system, payroll data from a time and attendance system, customer information from a CRM system, and so on. Let us say a company uses SAP to place orders, create invoices, and store shipment data; these are all tied to physical events that happened in the real world. The data entered in those systems is then stored in the application’s native database or in external databases. ETL comes into the picture at this point and lets you work with the raw data by performing cleansing tasks, data profiling, checking referential integrity, catching NULL errors and invalid or corrupted records, and transforming the data for storage in a dimensional or OLAP-compliant schema. That’s just a very high-level description; a lot goes on behind the scenes, and we’ll see some of that in the coming sections. The ETL script or package stores the data into the respective tables in a data warehouse with proper cardinality and data integrity. The end systems where the final data is presented can access the data warehouse directly (through applications developed using .NET, Java, HTML5 APIs, etc.) or through intermediate tools like the BO universe designer, SQL Server Analysis Services cubes, etc. Each has its own strengths and set of capabilities. For example, if the objective is to create a reporting system where users can do data mining or analysis, SSAS and the BO universe provide that capability by allowing the addition of rules, calculations, etc. that can be used in a friendly GUI, even by a non-technical user.

Let us dive into the SSIS tool. When you log in to SQL Server Data Tools (previously called BIDS, Business Intelligence Development Studio), you will be prompted to pick a project type: SSIS, SSAS, or SSRS. Once you create an Integration Services project, you’ll see a package with the dtsx extension being created. The dtsx package is in XML form, and you can see the code by right-clicking on the package name in Solution Explorer and selecting “View Code”.

SQL Server Data Tools - SSIS


The view you see in the picture above is the default layout for SSIS. On the right side is the Solution Explorer, where you can create new connections, packages, etc.; in the center you create the Control Flow and Data Flow designs; and in the bottom section you can manage project-level connections and declare variables in the Variables section.

Control Flow is where you define the various tasks, arranged in a logical order and sequential pattern, to get the desired result. The control flow contains task components like the Data Flow Task, FTP Task, Script Task, and Execute Process Task, and containers like For Loop, Foreach Loop, etc. Tasks are executed in order by using precedence constraints; a precedence constraint is simply the path the process will take in case of success, completion, or failure.

Tasks are individual units of work that execute particular code, based on the type of task, with some user input. The great thing about SSIS is that users don’t need to write a function script in code form; tasks can simply be dragged and dropped. Try dragging a task into the Control Flow and double-clicking on it: a task editor window will open that lets you set the parameters, expressions, script, properties, etc.

Tasks are generally executed sequentially, and each task has a success, failure, or completion outcome that gets evaluated based on constraints, an expression, or a combination of both joined with logical OR or logical AND. (We’ll see those shortly in an example.)

Here are a few of the most popular tasks.

  • Analysis Services Processing Task – Processes a SQL Server Analysis Services cube, dimension, or mining model. Frequently used to update the analytical cube with the latest data.
  • Bulk Insert Task – Executes the BULK INSERT SQL command to load data into a table.
  • Data Flow Task – The data flow is the heart of the ETL package, where the various transformations happen in the data pipeline. This is the most frequently used task in any package. The data flow task lets you connect to any data source and outputs the data based on the transformations performed. Whenever this task is added to the control flow, a data flow task model is automatically created that can be accessed through the tab next to the “Control Flow” tab.
  • Execute Package Task – Lets you execute a package from within a package, making your design modular.
  • Execute Process Task – Executes a program external to the package. The task editor lets you select a program on your operating system to be executed with input and output parameters, or via an expression.
  • Execute SQL Task – Another popular task frequently used by database architects for executing a SQL statement. For example, the SQL might insert data from a source (e.g., an Excel sheet) as soon as that sheet is loaded with data by another task.
  • FTP Task – Lets you connect to a server using the FTP protocol to send and receive files.
  • Send Mail Task – Lets you send an email using the SMTP protocol. Commonly used to send a custom message on success, failure, or completion of a task.

Precedence constraints are the links that connect the various tasks and direct the flow of task execution in the designed order. Business rules or logic can be applied as part of an expression; for example, execute a task only when a defined variable has a value greater than 100. The constraint value determines whether and when the next task will be executed in response to the preceding task’s execution (a small conceptual sketch of this evaluation logic follows the list below).

There are three constraint values:

  • Success – The linked task executes only when the preceding task completes successfully.
  • Failure – The linked task executes only when the preceding task fails to complete.
  • Completion – The linked task executes on completion of the preceding task, regardless of success or failure.
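To make that evaluation logic concrete, here is a toy Python model of how a constraint value and an optional expression combine to decide whether the downstream task runs. This is only a conceptual sketch of the behavior described in this section, not SSIS code or its API, and the function names are hypothetical.

```python
# Toy model of SSIS-style precedence constraint evaluation (conceptual only).

def constraint_met(constraint_value, preceding_outcome):
    """constraint_value is 'Success', 'Failure', or 'Completion'."""
    if constraint_value == "Completion":
        return True                               # runs regardless of outcome
    return preceding_outcome == constraint_value  # must match Success/Failure

def should_run_next(constraint_value, preceding_outcome,
                    expression_result=True, combine="AND"):
    """Combine the constraint with an optional expression result
    (e.g. a row-count variable > 100) using logical AND or OR."""
    met = constraint_met(constraint_value, preceding_outcome)
    return (met and expression_result) if combine == "AND" else (met or expression_result)

# Run the next task only if the preceding task succeeded AND a
# hypothetical row-count variable exceeded 100.
row_count = 250
print(should_run_next("Success", "Success", row_count > 100))  # True
print(should_run_next("Success", "Failure", row_count > 100))  # False
```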

Precedence Constraints

 

 

Containers are used for grouping various tasks together logically into units of work. This not only helps with the visual aspect but also enables defining variables and event handlers on the container itself. There are 4 types of containers:

  • Task Host Container – This is not a visible item, but just an abstract interface concept.
  • Sequence Container – This container allows grouping tasks logically within separate boundaries. It can be collapsed and expanded as needed.
  • For Loop Container – This sets up a loop so that the tasks within the container are executed until a condition is satisfied.
  • Foreach Loop Container – This sets up a loop over files or records and executes the tasks for each file or record.

Data Flow is the core of any SSIS package, where most of the action takes place. It enables you to authenticate and connect to the database, load the relevant data into the database engine’s memory, perform transformations such as row sampling, term lookup, importing columns, fuzzy logic, etc., and export the data to destinations such as ODBC, OLE DB, raw file, Excel, or SQL Server. This is where the in-memory processing of data takes place, and it really shows the performance edge of the SQL Server database engine. The connections that can be used for sources and destinations are defined in the connection manager, which is globally available to any data flow task.

 

Note: More Content and Examples to follow. Stay Tuned.

 

Understanding the Requirements Traceability Matrix

One of the important documents that stands out during project planning is the requirements traceability matrix (RTM). It is the heart of the requirements collection phase and is a very effective document for maintaining, tracing, and implementing the requirements as the project progresses.

While working on an SDLC project, there tend to be numerous resources involved across the different phases. We can’t expect all of them to speak a common language when referring to an item, a task, or even the process itself, due to their different backgrounds, skill sets, and job roles.

One area where I often see people get confused is the difference between the requirements traceability matrix and the mapping matrix. The key distinction is that the former is maintained by the analyst throughout the requirements collection phase, while the latter is produced during the BRD or the mapping of physical requirements to their technical components (for example, an item tracked through different schemas and databases and the associated business rules).

The requirements captured and finalized in the requirements traceability matrix will help shape the WBS tasks and activities list, which in turn will be used for developing the project schedule. That is a very high-level description of the process; when you put it all into action, there are many steps and many documents produced along the way. That’s a topic of discussion for another day!

Going back to the requirements traceability matrix: the project manager and the analyst will work closely with the SMEs and stakeholders to capture their vision and the project objectives more accurately, and also to capture supplemental information such as who requested the requirement, the business justification, affected components, business rules, the related WBS entry, the source, etc., which comes in handy during the entire project. We’ll look at some of the elements used in the matrix more closely in the attached sample.

So how are these different elements captured and what are the steps involved?

The requirements collection process starts with the project manager meeting with the SMEs and stakeholders through one or more of the following activities:

  • Interviews
  • Focus groups
  • Cross-functional team meetings
  • Brainstorming sessions
  • Facilitated workshops
  • Prototype (my favorite and most common)
  • Surveys and Questionnaire

In my experience, acquiring a prototype and following it up with additional questions around the requested components is one of the most effective ways of gathering requirements. Without examples or prototypes, the analyst is left scrambling to understand the business requirements and translate them into technical deliverables, because business users and stakeholders at times tend to be vague about their intrinsic needs when defining a deliverable. I shouldn’t say this, but many users and stakeholders are of wavering mind and will change their requirements as the project timeline advances. In my opinion that is fine, because as the project progresses, the corporate goals and direction of the organization might change along with their changing consumer market, and those changes in goals need to be aligned with and incorporated into an ongoing project. This is where the RTM comes in handy: it is the master sheet for “tracing” the requirements and the details around each of them. One thing to remember is that there should be a unique identifier for each requirement, and requirements should never be deleted or re-ordered; the requirements are referenced throughout project communications and are included in BRDs and other documents like the RAID log, so deleting or re-ordering them would break those associations and cause confusion. I always use a flag on every entry that states whether the requirement is current or outdated and whether it is still needed.

 

Here is an example of a Requirements Traceability Matrix document.

Requirements Traceability matrix example

R.A.I.D Document

In my earlier post, “8 Must-Have Documents in a Project Manager’s Briefcase”, I mentioned the R.A.I.D document. Today I will dive a little deeper and show you some examples. R.A.I.D stands for Risks, Assumptions, Issues, and Decisions. You might see different variations of this document; some people like to use “Dependencies” instead of “Decisions”, whereas others use “Actions” in place of “Assumptions”. Personally, I feel that dependencies are captured anyway as part of the issues, and action items are also listed for the mitigation of the risk and issue logs, so it makes more sense to use the categories Risks, Assumptions, Issues, and Decisions.

This is what I wrote in my earlier post and I am just pasting it back instead of reinventing the wheel!

R.A.I.D Document- This document will have sections for documenting and maintaining risks, assumptions, issues and decisions logged throughout the project lifecycle.

  • Risk Log – This section contains the details about each risk: the affected components, impact on the timeline, level of severity, team member(s) responsible for resolving the risk, action needed, cause of the risk, impacted group, impacted work packages, resolution target date, etc. All risks should be numbered uniquely, in the order of discovery, and should be referred to by that number throughout the project life cycle.

RAID Document - Analyticalviews.com

  • Assumptions – This section lists all the assumptions made before the commencement of the project and during project execution as you follow through your activities. Every project will have assumptions such as the availability of required data, the estimated disk space for the database, the availability of required software, and things of that nature.

RAID Document - Analyticalviews.com

  • Issue Log – This section is similar to the risk log and contains a list of all the issues, numbered uniquely and in the order they were received, along with the impact of the issue, priority, affected components, team member(s) responsible for resolving the issue, a status indicator, the actual reported date, the projected completion date, dependencies, etc. Entries in the issue log should never be deleted or moved once they are resolved or when there is a change in status or priority, and each should have a unique ID so it can be referred to by that ID throughout the project.

RAID Document - Analyticalviews.com

  • Decisions – This section documents the decisions taken by any individual authorized in their position to do so, along with the decision details, department/group, date recorded, and the impacted and affected components.

RAID Document - Analyticalviews.com

Hope this is helpful to anyone trying to build a template. Or just contact me if you need my excel sheet.

Virtualizations on Enterprise Level

Let’s talk about a popular technology that is changing the way businesses configure, store, manage, distribute, and maintain their software as well as their hardware systems. There is an inherent shift from mainframe computing to x86-based systems due to performance enhancements and cost effectiveness. With built-in support for virtualization, many companies are adopting virtual architecture to drive the software needs of their business.

Virtualization is a way of presenting a logical or virtual view of separate environments working independently or together, by abstracting applications and components from a system consisting of hardware, software, or both. The logical view can be different from the parent system, as the virtual systems depend for their form and functionality on the core or fundamental components of the parent system. This means any virtual view will piggyback on the parent application or system to mimic or enhance all, or a handful, of its functions in separate sub-systems customized to the need, providing benefits like increased performance, scalability, cost effectiveness, easier maintenance, etc.

When we hear the word “virtualization” we think of VMware or Parallels software that lets you run Windows 8, Windows Server 2012, or Linux on Mac OS or another OS. At the enterprise level there are different types of virtualization technologies providing solutions that tackle a specific field of interest or area, and they can be customized or activated to work on their own or be integrated to work as part of a system.

Here is a nice little diagram I created to highlight the various types of virtualization options that can be implemented as needed.

  1. Access Virtualization – This type of virtualization allows any device to access and use an application in a common environment without either needing to know much about the other. The device knows how to connect to the application system, and the application accepts those devices that were set up to connect and use the system without any hiccups or issues. Examples are remote devices such as laptops, tablets, thin clients, POS systems, and terminals with different hardware accessing a server that runs an application capable of working with those devices. This helps an organization stay independent when dealing with changes on the hardware side and prevents having to rewrite entire applications. Say, for example, a company switches from Android-based tablets to iOS-based iPads; with access virtualization there is a seamless migration and transition in accessing the same applications.
  2. Application Virtualization – This type of virtualization lets an application be used on multiple, different systems. These applications typically use an API or a framework to integrate the application with the base systems regardless of their underlying hardware. There will be some configuration and other customization while integrating applications on different systems, but overall it provides greater flexibility and operating system independence. Vendors developing such software include Microsoft with Microsoft Application Virtualization (App-V), Citrix with their popular XenApp product, and VMware with their ThinApp product. Another form of application virtualization is similar to the porting of legacy applications when older hardware is decommissioned and no longer supported.
  3. Processing Virtualization – This type of virtualization operates at or below the operating system level. There are different ways processing virtualization can make use of an operating system that is encapsulated to work separately or as part of a system, for workload sharing and failover, or for linking such systems to distribute data to separate areas. What that means is that one system can be made to appear as if there are many, or many systems can appear to work as one. There are scenarios where parallel computing is needed for a certain set of work while raw GPU power is needed to drive some graphics. Processing virtualization helps with such scenarios by letting the hypervisor, or virtual machine manager, handle requests and delegate them to the proper virtual system, managing and diverting resources such as process threads and GPU utilization cycles to where they are needed.
  4. Network Virtualization – This type of virtualization pertains to the links between systems and their availability based on the security setup. Multiple systems can be virtualized to be used by different users based on their access level. For example, finance users will have access to the finance virtual system while sales will have access to the sales virtual system, with both hosted on the same hardware. These systems are set up on the network to be made available to their user base.
  5. Storage Virtualization – This is one of the most common types of virtualization being adopted by companies to manage their growing needs for data storage and access. Cloud-based computing has really pushed storage virtualization into every sector, whether enterprise or general consumer. A system in which data storage can be dynamically managed by altering capacity, resources, and the processing of requests has helped businesses with their changing application needs. Cost effectiveness is also a major factor, as companies can start with a common server and use it to virtualize several storage applications to be used across different groups of a company. For example, a server can host virtual database systems like Oracle, SQL Server, or Teradata, which can be accessed by different applications. The beauty of this setup is that constraints related to capacity, processing power, priority, user management, etc. can be eliminated. As growing data creates the need to add more storage capacity, new hard drive modules can be seamlessly and effortlessly distributed across the different virtual database systems or directed toward the database that has the greatest need. The same applies in the other direction, in terms of accessing data through various mediums from a common source. At the consumer product level, an example would be Dropbox, a cloud data storage service where users can start with a certain amount of space and buy more as their needs grow. It works beautifully with any device when accessing the stored data.
  6. Hardware-Assisted Virtualization – This type of virtualization is actually more a collection of techniques and hardware designed at a much lower level of virtualization. The newer chipsets have virtualization capability at the hardware level, with related instruction sets featuring directed I/O, extended page tables, rapid virtualization indexing, tagged TLB, security, etc. Such a system has inherent support for hosting more virtual machines per server and enables the software to interact more effectively, with greater consolidation. Some of the noted capabilities include bypassing the virtual machine manager, or hypervisor, when managing low-level instructions for memory management, I/O, and hyper-threading with a compatible processor. Examples of hardware-assisted virtualization include AMD-V technology in AMD chipsets and Intel VT in Intel chipsets.

Normalizing Data

Depending on the design of a relational database, its tables might contain a lot of duplicate data, incorrect entries on a single row that you can’t delete without losing another entry, or data that shouldn’t exist at all.

Normalization is the process of effectively organizing and rearranging the data in a database into a more standard, normal form that prevents these kinds of issues. We normalize data for the sake of data integrity, flexibility, and efficiency. There are several levels of normalization, but you might not need to apply all of them to your database. For RDBMS compliance, you can have your database specialist normalize a flat database to the first three levels at a minimum. Here are all the levels:

  • First Normal Form (1NF)

  • Second Normal Form (2NF)

  • Third Normal Form (3NF)

  • Boyce‐Codd Normal Form (BCNF)

  • Fourth Normal Form (4NF)

  • Fifth Normal Form (5NF)

  • Domain/Key Normal Form (DKNF)

First Normal Form (1NF) – In this step we eliminate repeating groups by creating a separate table for each set of related attributes and assigning a primary key (PK) to each of those tables.

These are the requirements for 1NF

  1. Each column must have a unique name.

  2. The order of the rows and columns doesn’t matter.

  3. Each column must have a single data type.

  4. No two rows can contain identical values.

  5. Each column must contain a single value.

  6. Columns cannot contain repeating groups.

1st Normal form - Analyticalviews.com

 

 

Second Normal Form (2NF) – Take the data that is only partly dependent on the primary key and put it in a separate table. This means we need to split up the table columns that are in a many-to-many relationship. After splitting the data, foreign keys should be defined and added so that the tables can be related. After this step, all the non-key attributes are fully functionally dependent on the primary key.

The requirements for 2NF are listed below

  1. The database tables should be in 1NF form
  2. All of the non key fields are dependent on all of the key fields.

Second Normal form 2NF - Analyticalviews.com

 

 

Third Normal Form (3NF) – In this form, no non-key attribute is transitively dependent on the primary key. A transitive dependency means that one attribute is dependent on a second attribute, which in turn depends on a third attribute. If tables are transitively dependent on each other, deleting data in one table can cause unwanted loss of information. An easier way to understand this step is that data not directly dependent on the primary key should be separated out and stored in another table.

The requirements for 3NF are listed below

  1. The database tables should be in 2NF form
  2. Should not contain transitive dependencies

Third Normal form 3NF - Analyticalviews.com

 

 

The final tables after performing the first three normalization forms are now ready for use in a relational database.
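To make the end state concrete, here is a minimal, runnable Python sketch (using sqlite3 so it works anywhere) of what a denormalized table versus a set of normalized tables might look like. The table and column names are hypothetical examples of mine, not the ones from the screenshots above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized (flat) design: customer and product details repeat on every order row.
cur.execute("""
    CREATE TABLE orders_flat (
        order_id      INTEGER,
        customer_name TEXT,
        customer_city TEXT,
        product_name  TEXT,
        product_price REAL
    )""")

# Normalized design: customers, products, and orders live in separate tables
# related through primary and foreign keys.
cur.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id)
    );
""")

# Customer details now live in exactly one row, so changing a customer's city
# is a single UPDATE instead of touching every order row in the flat table.
cur.execute("INSERT INTO customer VALUES (1, 'Acme Bank', 'Chicago')")
cur.execute("INSERT INTO product VALUES (10, 'Loan Module', 5000.0)")
cur.execute("INSERT INTO customer_order VALUES (100, 1, 10)")
conn.commit()

print(cur.execute("""
    SELECT o.order_id, c.name, p.name, p.price
    FROM customer_order o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product  p ON p.product_id  = o.product_id
""").fetchall())   # [(100, 'Acme Bank', 'Loan Module', 5000.0)]
```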

Normalized tables - Analyticalviews.com

 

BusinessObjects vs Tableau Assessment

Tableau is becoming a very popular and rapidly adopted analytical tool. BusinessObjects has long held the crown as the most popular business intelligence and analytics platform, with its diverse and rich set of tools and capabilities.

Let’s look at an assessment of their individual capabilities and how they line up against each other.

BOBJ vs Tableau Tool Comparison

 

I hope this helps organizations make an informed decision based on their needs.

 
