SQL Tutorial – Data Retrieval (Part 1)

Structured Query Language is the universal language spoken by all the enterprise databases. If you ask a database developer what is his primary spoken language, he will most say “SQL”. That was a joke! But the point is when you are working on databases, understanding the SQL is of foremost importance since it is the building block for writing/reading Stored procedures, Derived Views, Macros etc which are most commonly used form of SQL usage for retrieving the data in a desired form.

Between various databases such as SQL Server, Teradata, Oracle, Netezza etc the statement functions and syntaxes can differ but the structure and fundamental concepts of SQL can be applied to any of them.

We will be using Production schema from AdventureWorks 2012 database on SQL Server 2012 for all our examples. Click on the picture for schema diagram.

Production Schema

SQL statements are categorized by their primary function. Below are the types of SQL Statement.

Types of SQL Statements

Before we start, lets understand the structure of Select Statement since this is the most commonly and frequently used Data Retrieval Statement. The Select and From clauses are required and the remaining clauses are optional.

Select Statement Construct

If the data that we are trying to retrieve lies in two separate tables, they would need to be joined using the Primary and Secondary Key relationship that exists between the two tables.

There are different types of joins and is easier to understand using Venn diagram from set theory,

  • Inner Join – Using this join returns all the common rows between two or more tables.

Select Table_A.*,Table_B.* from Table_A INNER JOIN Table_B on Table_A.Key = Table_B.Key

Inner Join Visual Diagram

  • Right Outer Join - This join as the name suggest will return matching rows from the table that is on the left side (A) of the join and return the entire dataset for the table on the right side (B).

Select Table_A.*,Table_B.* from Table_A RIGHT OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Right Join Visual Diagram

  • Left Outer Join – This join as the name suggest will return matching rows from the table that is on the right side of the join and return the entire dataset for the table on the left side.

Select Table_A.*,Table_B.* from Table_A LEFT OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Left Join Visual Diagram

  • Full Outer Join – Will give you all the rows from all the tables joined using this type of join.

1. Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B on Table_A.Key = Table_B.Key

Outer Join Visual Diagram

2. Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B on Table_A.Key = Table_B.Key where Table_A.Key is NULL OR Table_B.Key is NULL

Outer Join with NULL Keys

  • Cross Join – This is commonly known as Cartesian product and should be avoided. The join will return combination of all the rows from the two tables and return N x N number of rows.

Select Table_A.*,Table_B.* from Table_A OUTER JOIN Table_B

 

Lets start with an example by constructing a basic Select statement that has all the clauses shown in the picture above..

SELECT  Product.Name ,
        ProductID ,
        ProductLine,
        ListPrice,
        AVG(ListPrice) AS Avg_ListPrice,
        StandardCost
FROM  AdventureWorks2012.Production.Product
WHERE ListPrice > = 100
AND StandardCost <= 300
AND ProductLine IS NOT NULL
GROUP BY ProductID,
         ListPrice,
         Product.Name,
         ProductLine,
         ListPrice,
         StandardCost
HAVING AVG(ListPrice) < 500
ORDER BY ProductID ASC,
         Product.Name ASC,
         ListPrice DESC

 

Complete Basic Select SQL Statement

The above example was for selecting the data from a single table. The tables will most likely be in normalized form when working on either RMDBS database with dimensional modeling or OLAP cubes and since the data would reside on a large number of normalized tables; while retrieving data, a SQL query needs to be constructed that will join different tables related by Primary keys and Foreign keys. I covered Normalization upto 3NF in this post.

Each table is created/designed to have at least a unique primary key (PK) for every row and the tables can be related to other tables by using Foreign Key(FK) as shown in the example below.

Primary Key and Foreign Key Relation

 

ProductID is the Primary Key in Product table and ProductSubcategoryID is one of the Foreign key. You can see that ProductSubcategory has ProductSubcategoryID as its Primary key and these two tables can be joined through the Primary Key – Foreign key join.

Let’s join Product and ProductSubcategory tables. Note how we applied an inner join on the third row. Right Outer join, Left Outer join or Outer Joins can also be applied depending on what result set you are trying to retrieve.

SELECT  Product.Name , ProductID , ProductLine , Product.ProductSubcategoryID
FROM  AdventureWorks2012.Production.Product AS Product
INNER JOIN AdventureWorks2012.Production.ProductSubcategory AS ProductSubCat
ON Product.ProductSubcategoryID = ProductSubCat.ProductSubcategoryID

 

Resultset 1 basic join

Let’s understand SubQuery concept.  Take tables “Product”, “BillofMaterials” and “UnitMeasure” for selecting products that are of subcategory code “12” and the products should be in “Each”. There are 38 unit measure codes in “BillofMaterials” table  and the Name of those unit measure codes are in the table “UnitMeasure”. So lets say you don’t want to look up the Name from the UnitMeasure table manually to find out the code(for “Each” in this case)  that you want to filter the result set for. We can write a subquery that will get the measure code for selected measure code name.

In short, we will need to add another query (subquery) in the where clause conditions that will retrieve records for “Each” and feed that into the main query’s Where clause condition.

SELECT  Product.Name ,
        ProductID ,
        ProductLine ,
        Product.ProductSubcategoryID,
BillofMaterial.UnitMeasureCode,
BillofMaterial.PerAssemblyQty
FROM    AdventureWorks2012.Production.Product AS Product
        INNER JOIN AdventureWorks2012.Production.BillOfMaterials AS BillofMaterial
ON Product.ProductID = BillofMaterial.BillOfMaterialsID
WHERE ProductLine IS NOT NULL
AND ProductSubcategoryID = 12
AND UnitMeasureCode = (SELECT UnitMeasure.UnitMeasureCode
FROM AdventureWorks2012.Production.UnitMeasure
WHERE UnitMeasure.Name = 'Each' )

Let’s learn about couple of operators -UNION, UNION ALL, EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL etc.

UNION operator will combine data from two or more tables and UNION ALL will retain the duplicate rows.

SELECT ProductSubcategoryID
      ,Name
  FROM AdventureWorks2012.Production.ProductSubcategory
  WHERE ProductSubcategoryID IS NOT NULL
  union
  SELECT ProductSubcategoryID
         ,[Name]
   FROM AdventureWorks2012.Production.product
   WHERE ProductSubcategoryID is NOT NULL

Union Query Example

EXCEPT operator will give results from the first half of the SQL but not from the Tables that are after EXCEPT operator.

Example:

SELECT ProductSubcategoryID
      ,Name
  FROM AdventureWorks2012.Production.ProductSubcategory
  WHERE ProductSubcategoryID IS NOT NULL
  EXCEPT
  SELECT ProductSubcategoryID
         ,[Name]
   FROM AdventureWorks2012.Production.product
   WHERE Name  IN ('Forks','Gloves','Handlebars','Headsets','Helmets')
Result for EXCEPT Operator

INTERSECT operator will give  only the matching rows from the queries.

Example.

SELECT  Product.Name ,
        ProductID ,
        ProductLine,
ListPrice,
StandardCost,
FinishedGoodsFlag
FROM  AdventureWorks2012.Production.Product
WHERE StandardCost> 1000
INTERSECT
SELECT  Product.Name ,
        ProductID ,
        ProductLine,
ListPrice,
StandardCost,
FinishedGoodsFlag
FROM  AdventureWorks2012.Production.Product
WHERE FinishedGoodsFlag = '1'
ORDER BY Name, ProductID

 

      AND ProductLine= 'M'
Intersect SQL Example

While using these operators, the select list should have equal and identical fields in terms of datatype and data length. Also note in the query above for INTERSECT, we put order by clause in the end after both the SQL statements.

In upcoming tutorials, I will give more examples of various analytical functions, nested queries and also examples of Data Definition language.