Sunday, March 31, 2013

IT & BA LAB Session 10: 26/03/2013

Assignment 1: Create 3 vectors x, y and z, choosing random values for them and ensuring they are of equal length. Combine them with
T <- cbind(x,y,z)
Create a 3-dimensional plot of the result (using all 3 plot types taught in class).

Commands :
> library(rgl)    # plot3d() comes from the rgl package
> Random1<-rnorm(30,mean=0,sd=1)
> Random1
> x<-Random1[1:10]
> x
> y<-Random1[11:20]
> y
> z<-Random1[21:30]
> z
> T<-cbind(x,y,z)
> T
> plot3d(T[,1:3])


> plot3d(T[,1:3],col=rainbow(64))

> plot3d(T[,1:3],col=rainbow(64),type='s')
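The commands above can be collected into one self-contained sketch. It assumes the rgl package (which provides plot3d) is installed, and adds a set.seed() call, not part of the original session, purely so the random values are reproducible:

```r
# Sketch of the 3D-plot assignment; assumes the 'rgl' package is installed.
library(rgl)

set.seed(42)                                  # reproducibility (not in the original session)
Random1 <- rnorm(30, mean = 0, sd = 1)        # 30 standard-normal draws
x <- Random1[1:10]                            # split into three equal-length vectors
y <- Random1[11:20]
z <- Random1[21:30]
T <- cbind(x, y, z)                           # 10 x 3 matrix of coordinates

plot3d(T[, 1:3])                              # type 1: default point cloud
plot3d(T[, 1:3], col = rainbow(64))           # type 2: colour-coded points
plot3d(T[, 1:3], col = rainbow(64), type = 's')  # type 3: spheres instead of points
```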
Screenshots:



Assignment no 2:
Read the documentation of rnorm and pnorm.
Create 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y) Hint: ?factor
3. Colour-coded version of the graph
4. Smoothed best-fit line for the curve

Commands :
> library(ggplot2)    # qplot() comes from the ggplot2 package
> x<-rnorm(200,mean=5,sd=1)
> y<-rnorm(200,mean=3,sd=1)
> z1<-sample(letters,5)
> z2<-sample(z1,200,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> qplot(x,y)

> qplot(x,z,alpha=I(2/10))

> qplot(x,z)

> qplot(x,y,geom=c("point","smooth"))



> qplot(x,y,colour=z)



> qplot(log(x),log(y),colour=z)
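As with the first assignment, the session above can be sketched as one self-contained script. It assumes the ggplot2 package is installed (qplot() is deprecated in recent ggplot2 versions but still works), and adds set.seed() for reproducibility:

```r
# Sketch of the qplot assignment; assumes the 'ggplot2' package is installed.
library(ggplot2)

set.seed(1)                           # reproducibility (not in the original session)
x  <- rnorm(200, mean = 5, sd = 1)    # first random variable
y  <- rnorm(200, mean = 3, sd = 1)    # second random variable
z1 <- sample(letters, 5)              # 5 random category labels
z2 <- sample(z1, 200, replace = TRUE) # assign each observation a category
z  <- as.factor(z2)                   # 5-level factor

qplot(x, y)                                  # plot 1: basic X-Y scatter
qplot(x, y, colour = z)                      # plots 2-3: X-Y split/coloured by z
qplot(x, y, geom = c("point", "smooth"))     # plot 4: points plus a smoothed fit
```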



Screenshots:


Friday, March 22, 2013

Business Applications Lab Session #9 on 19th Mar 2013

Data Visualization Using Tableau

There is a great deal of discussion about the value of analytics and big data management in the technology industry today. In my view, data is only useful if it helps provide insights into what customers want (for finding opportunities) and what customers do; patterns that are not obvious from raw data can be the difference between leading and losing.

In my research for this task I came across a number of tools that may be interesting. One of them is Tableau. Tableau is a data visualization tool that allows users to easily connect to data sets, from very simple flat data files (.csv, .xls, .txt) to very complex SQL data structures (Hadoop, SQL Server, Oracle, etc.). Tableau can analyze data while it stays in the repository, or the data can be imported into Tableau for offline processing.

Some of the useful things you can do with Tableau are showcased here using small caselets:
Real Estate Industry
The real estate industry thrives on data. Your ability to get insight from it can set you apart.
  • Monitor trends for home prices, sales volume and foreclosures.
  • Do detailed site analysis using demographic data and Tableau's built-in mapping capabilities.
  • Provide clients with customized market reports.

 Real estate report

Retail Industry
Lots of data is already available to retailers to make good decisions – from loyalty programs and web analytics to third-party information and point-of-sale details. But there’s a big gap between having the data and putting it to work for you. Tableau’s analytical depth and visualization capabilities can help improve your retail analytics by allowing you to:
  • Create interactive dashboards that support real-time decisions
  • Incorporate geographical-based data for targeted segmentation
  • Blend multiple data sources for more robust analysis
Retail segmentation analysis

Health Care Analytics

Healthcare costs can quickly spin out of control. Misallocation of resources can quickly bring down quality of care. To keep efficiency and profitability moving in the right direction, you need to see all your key healthcare reporting metrics across hospitals, programs, and regions. You need to cut that data many different ways and share it with key employees in order to manage your business more effectively. Use Tableau to:
  • Understand profitability by specialties, HRGs (Healthcare Resource Groups), gender, and age.
  • Identify patterns of cost and profitability by admission method and specialty.
  • Provide interactive, web-based dashboards to staff so they can get exactly the data they need right on the floor and in real time.
Patient cycle time dashboard

Government reporting
Government data is complex and enormous, and so are the challenges facing those who work with it. With Tableau Desktop, you can query millions of rows of data in seconds, drag and drop to visualize any dataset, and even publish your analysis to Tableau Public to meet transparency reporting requirements. Governments and public-private organizations are using Tableau to:
  • Present enormous countrywide datasets clearly and allow drill-down to local areas.
  • Provide online access to public data without programming.

government transparency dashboard
Banking Analytics
Banks distinguish themselves by the quality of their service. With Tableau you can offer customers a new level of insight and stand out from the competition. Customers from RBC Wealth Management to the Macquarie Group to Fifth Third Bank use Tableau for their banking analytics. Banks use Tableau to:
  • Provide web-based tools for clients and salespeople to track the value of savings and investments
  • Provide what-if analysis to help clients understand the effects of changes in investment decisions
  • Monitor loans and manage risk across geographies with interactive banking dashboards
  • Dynamically produce reports on outstanding accounts that require attention
investment dashboard

Regardless of role or industry, Tableau is rapid-fire business intelligence that equips anyone to analyze data quickly. Its intuitive user interface means there's no need for canned reports, dashboard widgets, or templates to get started. All you need is your data and the questions you want to answer.

In my personal experience as well, we have used Tableau at the McKinsey Knowledge Center to deliver reports to various clients by linking it directly to our project management tool. Tableau works wonders by linking reports to live data and helps in setting up reports at different levels for use throughout the client organisation.

Divij Sharma

Friday, March 15, 2013

Business Applications Lab Session #8 on 12th Mar 2013

Session # 8 :


In this session we learnt about the panel data generation and its various models.

Panel data combines cross-sectional and time-series data: the same cross-sectional units (here, US states) are observed over multiple time periods.
The basic function used for panel data estimation is plm, from the R package of the same name.

The data set we have used in this session is "Produc".

The description for the same is as under. 

- state : the state
- year : the year
- pcap: public capital stock
- hwy: highway and streets
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by employment in non-agricultural payrolls
- unemp: state unemployment rate

Use the data set "Produc" , a panel data set within plm package for panel estimations.
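As a minimal sketch, assuming the plm package is installed, the data set can be loaded and indexed like this (plm would also pick up the first two columns, state and year, as the panel index by default):

```r
# Load the Produc panel data set shipped with the 'plm' package.
library(plm)

data("Produc", package = "plm")
head(Produc)                                      # state, year, pcap, hwy, ..., unemp

# Declare the panel structure explicitly: 48 states observed over 17 years.
pdata <- pdata.frame(Produc, index = c("state", "year"))
pdim(pdata)                                       # prints the panel dimensions
```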




Assignment :
Estimate all 3 models and decide which model best fits the data set for panel estimation.

Solution : 
Step 1: estimate the pooling model
Step 2: estimate the fixed effects model
Step 3: estimate the random effects model
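The three estimations can be sketched as follows. The formula is the one used in the plm package's own Produc examples (log gross state product regressed on capital, employment and unemployment), and the object names match those used in the tests later in this post:

```r
# Estimate the three panel models on Produc; assumes the 'plm' package is installed.
library(plm)
data("Produc", package = "plm")

fm <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

pooled  <- plm(fm, data = Produc, model = "pooling")  # Step 1: pooled OLS
fixed1  <- plm(fm, data = Produc, model = "within")   # Step 2: fixed effects (within)
random1 <- plm(fm, data = Produc, model = "random")   # Step 3: random effects

summary(fixed1)                                       # inspect one of the fits
```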

To choose the model that best fits the data set "Produc", we need to run pairwise hypothesis tests among the 3 models and select the best fit at the end.

Test1 :
Between pooling and fixed model

Command :
pFtest(fixed1, pooled)
Test details :
H0 (null): the individual index and time-based parameters are all zero
Alternative hypothesis: at least one of the index and time-based parameters is non-zero

Since the p-value is very low, the null hypothesis is rejected in favour of the alternative.

Hence the fixed model is better than the pooling model.


Test2: 
Between pooling and random model

Command :
plmtest(pooled)

Test details :
H0 (null): the individual index and time-based parameters are all zero : Pooling model
Alternative hypothesis: at least one of the index and time-based parameters is non-zero : Random model

Since the p-value is very low, the null hypothesis is rejected in favour of the alternative.

Hence the random model is better than the pooling model.


Test3: 
Between fixed and random model

Command :
We use the Hausman test:
phtest(random1, fixed1)

Test details :
H0 (null): the individual effects are not correlated with any regressor : Random model
Alternative hypothesis: the individual effects are correlated with the regressors : Fixed model

The test indicates that one of the two models is inconsistent. Since the p-value is very low, the null hypothesis is rejected.

Hence the fixed model is better than the random model.
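Putting the three tests together, a runnable sketch (re-estimating the models so the snippet is self-contained) might look like this:

```r
# Run the three pairwise model-selection tests on Produc.
library(plm)
data("Produc", package = "plm")
fm <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

pooled  <- plm(fm, data = Produc, model = "pooling")
fixed1  <- plm(fm, data = Produc, model = "within")
random1 <- plm(fm, data = Produc, model = "random")

pFtest(fixed1, pooled)     # Test 1: fixed vs pooled (F test for individual effects)
plmtest(pooled)            # Test 2: random vs pooled (Lagrange multiplier test)
phtest(random1, fixed1)    # Test 3: fixed vs random (Hausman test)
```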


Conclusion :
We can conclude that the fixed effects model best fits the "Produc" data set for panel estimation, i.e., significant correlation is observed between the individual effects and the regressor variables.
Hence, we would choose the "Fixed" model to estimate the panel data presented by the "Produc" data set.