Sine Plot of Global Temperature of Earth

The Earth is more than 4.5 billion years old, and because humans evolved only very recently relative to that age, we have scarce information about prehistoric times beyond the fossils, rocks and other geological remains available for study. Even within the very short span of humankind on this planet, though, the global temperature has gone up and down.

Global Temperatures


I found this image in one of the discussions in the Analytics group on LinkedIn. You can see an irregular but seasonal sine-like wave in the global temperature of the Earth when it is plotted on a time-series horizontal scale. The memory of the eruption of Mount Pinatubo in Pampanga still lingers at the back of my head; I think I was in my first few years of primary school when the volcano erupted, destroying thousands of hectares of field crops and sending its sulfuric ash as far as some provinces in the Visayas.

I am not saying that the cooling of the planet is mainly caused by volcanic eruptions; we also have to consider that the major contributors to warming are solar activity, weather conditions, man-made pollution and oceanic temperature patterns.

As I learned in my Natural Sciences subjects, it is hard to predict when a volcano will erupt. Statistically speaking, though, if you were to fit an additive smoothing model or even an ARIMA model to the data points in the plot above, the next eruption large enough to cause a sudden decrease in global temperature may happen soon. Why?

The highs and lows in the plot are a few centuries apart. Because pollution greatly accelerates the warming of the planet, the cooling phases have become shorter. Mother Earth needs to cool down soon. A rough sketch of the kind of fit I have in mind is shown below.
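
To make the forecasting idea concrete, here is a minimal sketch in R of fitting additive (Holt) smoothing and an ARIMA model to a toy annual temperature-anomaly series. The anomaly values are invented for illustration only; they are not the data behind the plot above.

# Hypothetical example: additive smoothing and ARIMA on a made-up anomaly series.
set.seed(1)
anomaly <- ts(cumsum(rnorm(150, mean = 0.005, sd = 0.1)), start = 1870, frequency = 1)

hw.fit <- HoltWinters(anomaly, gamma = FALSE)   # additive (trend) smoothing; no seasonal term for annual data
predict(hw.fit, n.ahead = 10)                   # rough ten-year extrapolation

ar.fit <- arima(anomaly, order = c(1, 1, 1))    # a simple ARIMA(1,1,1) fit
predict(ar.fit, n.ahead = 10)$pred              # ARIMA forecast for the next ten years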

Personality Test by Talentoday.com

Psychometrics has always been an amazing field of Psychology, and I have always been inspired to analyse the collected data to try to understand how human behaviour works, in Organizational Psychology at least. Talentoday offers a win-win situation: they collect data for their research, and you, as the test taker, get to explore how you behave, especially in your professional career.

Radar Chart

The test is scored against a standardized norm group, so that z-scores and sten scores are used to interpret the results across test takers. I also noticed a pattern in the questions that is common in these kinds of tests, where validity is measured to predict the outcome based on the indicators. The rest of the data collection and calibration process is described on the website once you have finished the exam. The radar chart above is interactive.
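
As a rough illustration of that kind of scoring (my own sketch, not Talentoday's actual procedure), raw scores can be standardized into z-scores and then mapped onto the 1-to-10 sten scale:

# Illustrative only: convert hypothetical raw scores to z-scores and sten scores.
raw  <- c(32, 45, 51, 38, 47)                     # made-up raw scores for five dimensions
z    <- (raw - mean(raw)) / sd(raw)               # z-scores against this small sample
sten <- pmin(pmax(round(2 * z + 5.5), 1), 10)     # sten = 2z + 5.5, clipped to 1..10
data.frame(raw, z = round(z, 2), sten)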

The exam result is divided into five clusters. They are listed below in descending order of my results, where my highest score is 6.5 and my lowest is 4. These clusters are the main personality dimensions considered essential in the professional world.

  • Dare
  • Excel
  • Manage
  • Adapt
  • Communicate

The exam also prepares a motivation scale – the things that drive you to achieve your goals and the things you may need to work on. Obviously, mine is communication. Each of the clusters mentioned above also receives an ordinal rating from 1 to 10, giving more meaning to your score at a lower level.

Motivation Scale

Lastly, you are presented with your talent ID, which lists the empowering attitudes that make you unique.

Once done, you can even receive a PDF of the summary and detailed report based on your answers. After six months you can take the test again, as part of the test-retest method, to see whether your outlook has changed. I have not tried comparing my results with those of my friends or with people who have taken the test within the same organization, but an option to compare results within the same industry or job specification would be interesting. I guess they are still collecting more data, even if the responses are not scientifically sampled, given the dependence between respondents and the deviation from randomness.

Talentoday Personal Book by Adrian Cuyugan

https://www.scribd.com/embeds/266359664/content?start_page=1&view_mode=scroll&access_key=key-8GMySBkZBhRvEx4R47x8&show_recommendations=true
Join and take your personality exam. Let’s compare!


WordCloud Twitter Text Analysis on CSC using R

These past few days, I have been reading a lot about non-parametric tests on natural language, since one of the projects I am currently working on involves natural language processing via machine learning. This is fairly advanced, and the Data Science course offered on Coursera has not started yet, so I am relying fully on what I have been reading in fora and blog articles. Acquiring tweets from Twitter requires a few libraries because of the OAuth authentication Twitter has implemented. If you have not installed these packages, you will need them before you can reproduce my code.

install.packages("twitteR")
install.packages("ROAuth")
install.packages("RCurl")
install.packages("tm")
install.packages("wordcloud")
install.packages("RColorBrewer")

Then, load the libraries, as usual.

library(twitteR)
library(ROAuth)
library(RCurl)
library(tm)
library(wordcloud)
library(RColorBrewer)

Set the RCurl package option so that SSL requests use a CA certificate bundle (cacert.pem) via the cainfo setting.

options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

Download the CA certificate file that the OAuth handshake will use and save it to your working directory.

download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

For easier code reading, assign these links to objects in your environment. Note that the consumerKey and consumerSecret values are redacted because they are unique to your own Twitter API application. You need to create a developer account on Twitter to acquire your own keys.

reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "MVdO2NE****************"
consumerSecret <- "oRZ9ff2yWvf9*************************c"
twitCred <- OAuthFactory$new(consumerKey = consumerKey, consumerSecret = consumerSecret,
                             requestURL = reqURL, accessURL = accessURL, authURL = authURL)
twitCred$handshake(cainfo = "cacert.pem")
save(twitCred, file = "credentials.RData") # keep the authorised credentials for later sessions

After running the code above, the R console will give you a link similar to the one below:

> twitCred$handshake(cainfo = "cacert.pem")
To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=pK8JJiDb6j*******************************
When complete, record the PIN given to you and provide it here: 

Copy and paste the link into your browser and click Authorize to allow the app to access the Twitter API. Pause here and enter the PIN given by Twitter back into the R console. If it is successful, R, being the introvert that it is, will not give you any message; if the PIN is wrong, it will give you an error such as Unauthorized. You can then check whether you can access the Twitter API:

registerTwitterOAuth(twitCred)

This is where you can start scraping tweets. Twitter only allows you to extract a maximum of 1,500 tweets, covering a limited number of past days. If you want a constant feed, you may need to build a custom function to do it for you (a rough sketch follows the search call below). For this analysis, let us just take a sample of the most recent tweets while I am writing this code.

csc <- searchTwitter("@CSC", n = 1500)
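
For the "constant feed" idea above, here is a minimal, hypothetical helper (my own sketch, not part of twitteR) that re-runs the search on a timer and accumulates only the tweets it has not seen before, keyed by status ID:

# Hypothetical helper: poll searchTwitter() repeatedly and accumulate new tweets.
# Assumes the OAuth handshake above has already been registered.
collectTweets <- function(query, rounds = 3, wait = 300, n = 1500) {
    seen <- character(0)                            # status IDs already collected
    all.tweets <- list()
    for (i in seq_len(rounds)) {
        batch <- searchTwitter(query, n = n)
        ids <- sapply(batch, function(t) t$getId())
        all.tweets <- c(all.tweets, batch[!(ids %in% seen)])
        seen <- c(seen, ids)
        if (i < rounds) Sys.sleep(wait)             # wait (in seconds) before the next poll
    }
    all.tweets
}
# Example (not run): csc.feed <- collectTweets("@CSC", rounds = 2, wait = 600)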

Check the first tweet that was collected.

csc.firsttweet <- csc[[1]] # the most recent tweet in the list
csc.firsttweet$getScreenName()
csc.firsttweet$getText()

Or check the head and tail of your list.

head(csc); tail(csc)

Once you have checked that you have a good number of tweets, prepare your data and convert it to a corpus.

csc.frame <- do.call('rbind', lapply(csc, as.data.frame))
csc.corpus <- Corpus(VectorSource(csc.frame))

Next, homogenize the text by converting it to lowercase and removing stop words, punctuation marks and numbers. Note that I added a few extra words to be removed, because these are values from the other columns of the data set that came along when we downloaded the tweets.

csc.corpus <- tm_map(csc.corpus, content_transformer(tolower)) # Convert to lowercase
csc.corpus <- tm_map(csc.corpus, removePunctuation) # Remove punctuation
csc.corpus <- tm_map(csc.corpus, removeNumbers) # Remove numbers
csc.corpus <- tm_map(csc.corpus, removeWords, c(stopwords('english'), 'false', 'buttona', 'hrefhttptwittercomtweetbutton', 
'relnofollowtweet', 'true', 'web', 'relnofollowtwitter', 'april', 'hrefhttptwittercomdownloadiphone', 'iphonea', 
'relnofollowtweetdecka', 'via', 'hrefhttpsabouttwittercomproductstweetdeck', 'hrefhttpwwwhootsuitecom', 'httptcoqqqiaipk', 
'androida', 'cschealth', 'cscanalytics', 'csccloud', 'relnofollowhootsuitea', 'cscmyworkstyle', 'cscaustralia', 'hrefhttptwittercomdownloadandroid')) # Remove stop words
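
As an aside, a simpler alternative sketch: assuming the tweet text sits in the text column of csc.frame (the usual column name from twitteR's as.data.frame), building the corpus from that column alone keeps metadata such as the source client and status URLs out of the vocabulary, so far fewer ad-hoc stop words are needed.

# Alternative (assumes csc.frame has a "text" column with the tweet text).
csc.text.corpus <- Corpus(VectorSource(csc.frame$text))
csc.text.corpus <- tm_map(csc.text.corpus, content_transformer(tolower))
csc.text.corpus <- tm_map(csc.text.corpus, removePunctuation)
csc.text.corpus <- tm_map(csc.text.corpus, removeNumbers)
csc.text.corpus <- tm_map(csc.text.corpus, removeWords, stopwords("english"))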

Prepare the document-term matrix.

csc.dtm <- DocumentTermMatrix(csc.corpus)
csc.dtm.matrix <- as.matrix(csc.dtm)

Or the term-document matrix, whichever you prefer.

csc.tdm <- TermDocumentMatrix(csc.corpus)
csc.tdm.sum <- sort(rowSums(as.matrix(csc.tdm)), decreasing = T) # Sum of frequency of words
csc.tdm.sum <- data.frame(keyword = names(csc.tdm.sum), freq = csc.tdm.sum) # Convert keyword frequency to DF
csc.tdm.sum

Plot the wordcloud.

cloudcolor <- brewer.pal(8, "Paired")
wordcloud(csc.tdm.sum$keyword, csc.tdm.sum$freq, scale=c(8,.2), min.freq=1, max.words=Inf, random.order=T, rot.per=.3, colors=cloudcolor)

@CSC

Wordcloud using the twitteR and tm packages in R

Yes! It is CSC‘s birthday this April! In my next few posts, I will perform some sentiment analysis on this data set, where the keyword false is the most frequent and most prominent word used.