Thursday, 31 July 2014

From RDBMS to NoSQL to DBaaS

Relational Database Management Systems (RDBMS) were a subject I paid special attention to when I was at college.  Having used DB2 and Oracle, I found that the attention NoSQL databases have been getting over the past two years or so made me wonder why we need them, and whether RDBMS will stay relevant in the coming decades.

The ACID (Atomicity, Consistency, Isolation and Durability) properties and referential integrity that an RDBMS provides cannot be compromised in many systems.  For other systems, however, different needs have led to the emergence and widespread use of NoSQL (Not Only SQL) systems.

An obvious reason that convinced me of the need for NoSQL is the increasing volume of unstructured and semi-structured data, and other documents, used in the social networks that most of us use.  Then the other reasons slowly became clear.  One is the speed at which we want a post to be published and the number of reads and likes to be displayed.  The concurrency at which social networks and other systems are being used, by many of us around the world, is also on the rise.  Above all this, we want 24 x 7 availability of most sites, without which our satisfaction will drop.

It is to satisfy the above requirements that the use of NoSQL is on the rise.  Let us see how this is made possible.

1. Use of distributed databases
Distributed databases can be located at servers at any geographic location.  This means that they could be available on servers across the internet.  Also they could be located on the cloud infrastructure.  Distributed databases support replication and duplication, thereby enabling continuous availability.  Since data is available across many locations, concurrent usage is also made possible.

2. Horizontal scaling (sharding)
All users of Facebook want a quick login and quicker updates.  Given this, it is wiser to store the data of Indian users on servers in India and that of Canadian users on servers in Canada than to store it at an arbitrary place on the globe.  This is an example of sharding.
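To make the idea concrete, here is a minimal sketch of region-based shard routing in Python.  The server names, the country codes and the fallback shard are all made up for illustration; a real system would also have to handle users who move and shards that fill up.

```python
# Hypothetical shard map: country code -> server holding that country's users.
SHARDS = {
    "IN": "db-india.example.com",
    "CA": "db-canada.example.com",
}
# Assumed fallback for countries that have no dedicated shard.
DEFAULT_SHARD = "db-global.example.com"


def shard_for(country_code: str) -> str:
    """Return the server that stores users from the given country."""
    return SHARDS.get(country_code.upper(), DEFAULT_SHARD)


for code in ("in", "CA", "BR"):
    print(code, "->", shard_for(code))
```

The application looks up the shard first and then sends the query only to that server, so a login from India never has to touch the servers in Canada.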

3. Scalability
Many of the NoSQL databases are capable of storing large quantities of data.  With Big Data, the volume of data generated every second is on the rise.  Hence the ability to store ever larger amounts of data becomes important.

4. Schema-less databases
To enable storage of semi-structured and unstructured data, these databases do not store data in tables with a fixed schema.  Instead, data is stored in document, column, key-value or graph databases.

Let us have a look at a couple of ways in which data is stored.

a) Document
Documents that contain semi-structured data are stored.  The MongoDB database stores documents.  This database is platform independent and holds JSON-like documents.
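As a rough illustration of what "JSON-like documents" means, the sketch below builds two documents as plain Python dictionaries and prints them as JSON.  It does not talk to MongoDB at all, and the field names and values are invented for the example.

```python
import json

# Two hypothetical documents in the same collection.  They share some
# fields, but the second one has a nested sub-document and a list that
# the first one does not have.
students = [
    {"student_name": "aditya", "school_name": "sun shine", "city": "bangalore"},
    {
        "student_name": "lily",
        "school_name": "sun shine",
        "standard": "IV",
        "guardian": {"name": "thomas", "relation": "father"},
        "languages": ["english", "hindi"],
    },
]

for doc in students:
    print(json.dumps(doc, sort_keys=True))

# Fields present in one document but not in the other.
print(sorted(set(students[0]) ^ set(students[1])))
```

Neither document had to be declared up front, which is the practical meaning of schema-less here.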

b) Column
A column is a tuple of three elements (name, value, timestamp):
student_name: {name: "student_name", value: "vishnu", timestamp: 123456789}

A Column Family is a set of Columns.  This is in some ways similar to a table, but the main difference is that the same set of Columns need not be provided for every row in a Column Family.  Notice the difference between the two sets of columns given below.

    student_name: {name: "student_name", value: "aditya", timestamp: 123456789}
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789}
    city: {name: "city", value: "bangalore", timestamp: 123456789},
    student_name: {name: "student_name", value: "lily", timestamp: 123456789}
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789}
    standard: {name: "standard", value: "IV", timestamp: 123456789},
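The two sets of columns above can be sketched in Python as follows.  This is only a model of the idea using dictionaries; it is not how HBase or any other column store is implemented, and the row keys "row1" and "row2" are my own assumption.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """One column: a (name, value, timestamp) tuple."""
    name: str
    value: str
    timestamp: int


def make_row(timestamp: int, **values: str) -> dict:
    """Build one row: a mapping from column name to Column."""
    return {name: Column(name, value, timestamp) for name, value in values.items()}


# Both rows live in the same column family but carry different columns.
students = {
    "row1": make_row(123456789, student_name="aditya",
                     school_name="sun shine", city="bangalore"),
    "row2": make_row(123456789, student_name="lily",
                     school_name="sun shine", standard="IV"),
}

for key, row in students.items():
    print(key, sorted(row))
```

Asking for `students["row2"].get("city")` simply returns `None`, whereas a relational table would need a `city` column with a NULL in it.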

HBase, which is modeled on Google's Bigtable and runs on top of Hadoop, uses column-oriented storage.  It is an open source database from the Apache Software Foundation.  HBase is used with Hadoop to store critical data, the size of which is much smaller than the Big Data that Hadoop can store.

Database As A Service
As the name indicates, the database is provided as a service by a cloud provider.  The cloud provider does the installation, upgrades and maintenance activities on the database, and the customers invoke services on it.  For the customer, DBaaS reduces the time taken for installation and maintenance and the manpower required for database management.  With the emergence of the Cloud, DBaaS is not a surprise.  IBM Cloudant is a DBaaS that stores JSON documents.

Saturday, 19 July 2014

IBM MDM in the Big Data - Interoperability between products in the Big Data platform

The goal of Big Data is to obtain valuable insights through analysis.  With the IBM InfoSphere Master Data Management system serving as the single repository of trusted data, let us discuss some of the key Big Data products with which IBM InfoSphere MDM can be integrated.

Customer data from a single source or from multiple sources is loaded into MDM.  Also, there are downstream systems that receive data from MDM.  Some of the MDM APIs are integrated with InfoSphere DataStage for Extract, Transform and Load (ETL) operations, to form the MDM Connector.  The MDM Connector can be used for ETL operations involving MDM.

It is important to determine the quality of data from a data source before it is loaded to the MDM server.  The IBM InfoSphere Information Analyzer can be used for assessing the quality and structure of the data before loading it into MDM.  In addition, MDM can be configured to leverage the standardization and matching features of IBM InfoSphere QualityStage.

The term Big Data encompasses structured data and unstructured data.  IBM MDM provides a trusted single view of structured data.  IBM InfoSphere Data Explorer, the tool used to derive insights from Big Data, uses the MDM Connectors to access data from the MDM database to obtain a holistic view of the entities.

InfoSphere MDM has a Probabilistic Matching Engine that can be used for matching parties to identify suspected duplicates.  This Probabilistic Matching Engine can be configured for use by InfoSphere BigInsights.  InfoSphere BigInsights is a product that supports storage of large volumes of unstructured, semi-structured and structured data and provides data analysis capabilities on such data.  InfoSphere Data Click can also be used with MDM, to load master data into BigInsights and other analysis systems.
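To give a feel for what matching parties on a score rather than on exact equality looks like, here is a toy sketch in Python.  It is not the algorithm of the IBM Probabilistic Matching Engine; the fields, the weights and the threshold are all assumptions I have made for illustration, and the string comparison comes from Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much agreement on each field contributes.
WEIGHTS = {"name": 0.5, "date_of_birth": 0.3, "city": 0.2}
# Assumed cut-off at or above which a pair is flagged as a suspected duplicate.
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Similarity of two strings in [0, 1], ignoring case and outer spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing value is not treated as agreement
    return SequenceMatcher(None, a, b).ratio()


def match_score(p1: dict, p2: dict) -> float:
    """Weighted sum of per-field similarities for two party records."""
    return sum(
        weight * similarity(p1.get(field, ""), p2.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def is_suspected_duplicate(p1: dict, p2: dict) -> bool:
    return match_score(p1, p2) >= THRESHOLD


a = {"name": "Vishnu Kumar", "date_of_birth": "1990-04-12", "city": "Bangalore"}
b = {"name": "vishnu kumar ", "date_of_birth": "1990-04-12", "city": "bangalore"}
c = {"name": "Lily Thomas", "date_of_birth": "1985-11-02", "city": "Chennai"}

print(is_suspected_duplicate(a, b), is_suspected_duplicate(a, c))
```

The point is that records `a` and `b` differ in case and spacing and would fail an exact comparison, yet they still score high enough to be flagged for a data steward to review.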

While MDM provides a single trusted view of data, business processes are required to ensure that the master data is accurate from the point of creation.  The IBM Business Process Management Process Center and Process Designer components can be used to create workflows that govern data-steward-oriented tasks.  Master Data Management along with Business Process Management enables organizations to take critical business decisions immediately.  A Customer Relationship Management (CRM) solution available in the Cloud (SaaS) can also be integrated with IBM MDM, which enables it to obtain a 360 degree view of its customers.

MDM data can be exported and predictive analysis can be performed using the Cognos Business Intelligence reports.

Details on these integrations, and on integrations with other products, can be found at the links below.
IBM InfoSphere Master Data Management v 11.3.0
Master Data Management, Business Process Management and Services Oriented Architecture 

Saturday, 12 July 2014

Master Data Management (MDM) in Big Data

"Maintaining a golden record of every entity" - this is precisely what a Master Data Management (MDM) system does.

MDM stores a cleansed, de-duplicated trusted view of structured data and plays a major role amidst big data flowing in from the social networks and streaming data.

I do see many organizations use master data to improve their performance.

A Diabetes clinic calls a patient's mobile number when his/her consultation is due.  So they do maintain the master data of their patients.  When the system is good enough to store all medical data about the patient, based on the tests he/she undergoes, the Doctor's analysis reports and the medicines prescribed, each time the patient consults a Doctor, then the system becomes capable of providing a complete view of the patient's health.

A retailer sends a customer an SMS, a month before the customer's Birthday, with a greeting and offering a 5% discount for what the customer shops during that one month.  So, here the master data of a customer is stored along with the mobile and date of birth and there is a system to send a message a month before his/her birthday.  By doing so, the retailer ensures that they maintain a good relationship with their customers.  When this retailer stores the list of items that the customer purchases, the total cost he/she pays, the mode of payment (cash or card) along with the Date of purchase, the retailer will be able to predict when the customer may visit again.
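A system like the retailer's could pick the customers to greet with something as small as the sketch below.  I am assuming that "a month" means 30 days and that each customer record has a `date_of_birth` field; the customer data is invented.  A customer born on 29 February would only be matched in leap years, which a real system would need to handle.

```python
from datetime import date, timedelta


def due_for_greeting(customers: list, today: date, days_ahead: int = 30) -> list:
    """Return the customers whose birthday falls exactly days_ahead days from today."""
    target = today + timedelta(days=days_ahead)
    return [
        c for c in customers
        if (c["date_of_birth"].month, c["date_of_birth"].day) == (target.month, target.day)
    ]


# Hypothetical master data held by the retailer.
customers = [
    {"name": "aditya", "mobile": "9845012345", "date_of_birth": date(1980, 8, 11)},
    {"name": "lily", "mobile": "9000000000", "date_of_birth": date(1990, 12, 1)},
]

for c in due_for_greeting(customers, today=date(2014, 7, 12)):
    print("Send greeting and 5% discount offer to", c["name"], "on", c["mobile"])
```

Run once a day, this selects each customer exactly once a year, which is why the date of birth and the mobile number have to be correct in the master data.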

An insurance firm informs its customer through an SMS or email that insurance payment is due in a month.  This is again a system in which the customer's details are stored along with the payment date of insurance.  Hence the company makes sure that they do not lose their customer.  The company has to ensure that such a message is being sent each time an installment has to be paid. 

Some banks are able to classify their customers as Classic, Premium etc., based on the balance they maintain in their account over a period of time.  This easily indicates the extent to which they maintain big data.
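A classification of this kind can be as simple as averaging the balances over the period and comparing the result with a threshold.  The sketch below assumes just two classes and a threshold I have made up; actual banks will have their own rules.

```python
from statistics import mean

# Assumed minimum average balance for the Premium class,
# in the currency of the account.
PREMIUM_MIN_AVERAGE = 100_000


def classify(monthly_balances: list) -> str:
    """Classify a customer by the average balance held over the period."""
    if not monthly_balances:
        raise ValueError("need at least one balance to classify a customer")
    if mean(monthly_balances) >= PREMIUM_MIN_AVERAGE:
        return "Premium"
    return "Classic"


print(classify([120_000, 90_000, 110_000]))
print(classify([5_000, 7_000, 6_500]))
```

Using the average over several months rather than a single day's balance keeps one large deposit from changing a customer's class.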

One of the examples we consider for Big Data is Facebook.  This social website also holds master data of its users.  It asks each user for name, city, employment and date of birth.  The family relationships, close friends list, friends and Likes of a user, along with the other primary details, contribute to the master data.  Alerts on friends' birthdays, the list of probable friends of a user, the groups a user may like to join and the personalities a user likes may be derived from the master data.

A good Master Data Management system will ensure that the data is cleansed, duplicate data is not present and the data is trusted.
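As a very small illustration of cleansing and de-duplication, the sketch below standardises a name and a mobile number and then keeps one record per cleansed pair.  The field names and the sample records are assumptions of mine, and a real MDM system does far more than this, for example fuzzy matching and survivorship rules.

```python
import re


def cleanse(record: dict) -> dict:
    """Standardise a raw customer record (assumed fields: name, mobile)."""
    return {
        # Collapse repeated spaces and use consistent capitalisation.
        "name": " ".join(record["name"].split()).title(),
        # Keep only the digits of the mobile number.
        "mobile": re.sub(r"\D", "", record["mobile"]),
    }


def deduplicate(records: list) -> list:
    """Keep the first record seen for each cleansed (name, mobile) pair."""
    seen = {}
    for raw in records:
        clean = cleanse(raw)
        seen.setdefault((clean["name"], clean["mobile"]), clean)
    return list(seen.values())


# Hypothetical raw input; the first two records describe the same person.
raw_records = [
    {"name": "  aditya   rao", "mobile": "98450-12345"},
    {"name": "Aditya Rao", "mobile": "9845012345"},
    {"name": "lily", "mobile": "90000 00000"},
]

for record in deduplicate(raw_records):
    print(record)
```

Cleansing has to come before de-duplication, because the first two records only look identical after they have been standardised.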

This master data plays a major role in analysis.  With a complete view of a patient's current health and history, a consulting Doctor will be able to easily make out the drugs to which the patient is allergic and which medicines would not suit the patient because of medicines being taken now or taken in the past, and prescribe treatment accordingly.  A Banker would be able to suggest a Recurring Deposit (or some other plan) to an account holder, based on the balance in his/her account or on the monthly salary being deposited to that account.

With these being the uses of analysis to an end user, the benefits to an organization that uses MDM would be much greater.  A retailer can find out the lean seasons and try to give discounts during those periods of the year.  During the peak seasons, they can increase the stock.  They can also find out which products sell well in a particular geography and increase the stock of those products.  With fast networks, stocks can be replenished as required.

When hospital chains start having a Master Data Management system, it will make the life of a patient much easier.  This becomes all the more important for patients with critical illnesses.  Having such a system could also help in medical research.

These are just examples I have noticed; further details can be found at the links below.
IBM Master Data Management for Big Data
IBM Think Big - Big Data & MDM
IBM Master Data Management: The key to leveraging Big Data
How MDM Fits with Big Data, Mobile and Cloud
IBM Master Data Management - Solutions for Healthcare