Monthly Archives: September 2013

Brainstorming on Project Kickstarter – Creating the next social media Assistant and Trend mapper powered by your posts

•       Users can pose a question to the crowd.

•       Questions to be sorted and organized by categories.

•       Questions posed will appear on user’s screen based on the keywords, interests, topics a user picks while creating their profile.

•       Users to get points if they decide to answer a question.

•       Best answer to be decided or promoted by their number of endorsements (up/down) or once a limit of endorsements is achieved.

•       Over time, a database will be created with best answers.

•       Users can view other answers by their ranking.

•       Any question previously asked will get an automatic answer based on the best answer.

•       Information to be distributed based on the popularity of the crowd. Crowd powered.

•       Analyze replies to decide on agree/not agree on the original post.

Collection of Database Concepts

I will use this section to add bits of assorted information related to databases and with time I will revisit this section to add more mini-topics.

Schemas-

In a big organization what are the chances that groups or individuals would create views with the same name? It happens very often, for example let’s say there are separate divisions operating under a parent company selling products in completely different category and market. The business lines will obviously have their own strategies, processes and the way they do their daily business in every aspect while making sure their fundamentals are aligned with the parent company’s goal. IT resources from both divisions working on sales data from the parent company’s database will most likely create logical views or macros to support their line of business and the chances are that out of hundreds of views, some will be named the same. To avoid this scenario, RMDBS provides a hierarchical cataloging of views/tables/procedures by their schema names. The database admin can create a schema for any user or group and the schemas can have views with the same name. This helps in identifying the right entity and provides flexibility in designing and working on multiple applications by multiple groups in a company. The views would be used by the schema name followed by the view name. For ex schema1.inventory, schema2.inventory.

So in short a schema is basically a space for different users or groups with separate access privileges and gives the ability to store, run or edit views or stored procedures.

Joining Fact Tables -

Each business process translates into a dimensional table containing related contextual data based on the similar category of description. For example a sales model can have a fact table containing the measures of different quantities of unit sold with a bunch of dimensional tables surrounding this fact table. There can be a location table that would contain information of the retailers who sold the unit quantities; this information would contain retailer code, name, address, zip code, store open indicator, active flag and so on. Another table would be sales calendar containing the company sales calendar with different groupings by, say for example, week end dates, month, quarter, year etc.  Each of these dimensional tables would have a primary key through which they can be joined on a many to one relationship with the fact table.  In the simplest form a fact is surrounded by a number of dimensional tables joined through a key. This allows the query engine to pick up the data requested by the constraints and scan the fact table for related entries.

Dimension Tables for Textual Context Usage

Dimension tables are integral part of any fact table dimensional modeling schema. Dimension tables contain textual context that are associated with a business process measurements stored in fact tables.

Such a table can contain anywhere between 20 to 100 attributes or columns. These tables are smaller in terms of number of rows but are considered wide with much large text content.  Each dimensional table has to have a primary key through which they can be linked or associated to the fact table. The attributes serve as primary source of grouping, query filters, constraints or labels. Attributes can at times have cryptic or coded names but it is considered good practice to name them according to the description of the content of a column. This makes it easier for user/developer to understand and use rather than memorizing what each coded name means.  Although many time the operational codes have some logic to the naming convention which is aligned with the applications and the source data used. For example, the first four letters could be related to area and the next three could be district and so on.

It is said that the analytical power of any DW/BI is directly proportional to the quality and depth of the attributes.  A dimensional table designed to contain multiple levels ranging from the most granular to the highest possible under the business rules and capabilities will give the most flexibility in usage of the contained data in any Business Intelligence application or tools. These levels being in hierarchical grouping allows different tools to produce resultant data in the form required or as needed according to the specification.  The hierarchical data should have description added as a column in spite of the redundancy of data. This allows ease of use and better overall query performance. Dimensional tables should not be normalized to a degree where descriptive information is stored in another supplemental table in form of snow flaking. Since dimensional tables have significantly fewer rows than a fact table, further normalization would rarely make a difference in database size.