The R Project for statistical computing
Eric Fouh, Christopher Poirel
CS 5604
Fall 2010
What is R?
Uses of R
• statistical system
• data handling and storage facility
• calculations on arrays, in particular matrices
• integrated collection of tools for data analysis
• graphical tool for data analysis
• programming language (an implementation of the S language)
Structure of R
• R functions and datasets are stored in packages
• Hundreds of contributed packages (written by different authors) are available
• R is provided with about 25 "standard" and "recommended" packages, including:

Package name   Description
base           Base R functions
datasets       Base R datasets
graphics       R functions for base graphics
stats          R statistical functions
utils          R utility functions
Matrix         Sparse and dense matrix classes and methods
class          Functions for classification
cluster        Functions for cluster analysis
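To see which of these packages a given installation actually provides, installed.packages() can be filtered by its priority argument. This is a minimal sketch assuming a standard R installation; which recommended packages appear depends on how R was installed, so the printed lists will vary from machine to machine.

```r
# Packages that ship with every R installation have priority "base"
base_pkgs <- unique(installed.packages(priority = "base")[, "Package"])
print(sort(unname(base_pkgs)))

# Recommended packages (e.g. class, cluster, Matrix) are usually installed too
rec_pkgs <- unique(installed.packages(priority = "recommended")[, "Package"])
print(sort(unname(rec_pkgs)))

# library() attaches a package's functions to the search path
library(stats)
print("package:stats" %in% search())
```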
R and Information Retrieval
IR concepts and the R packages that support them:
• Text preprocessing; term weighting and scoring: tm package. Constructs a term-document matrix using one of the following weighting functions: TF (weightTf) or TF-IDF (weightTfIdf), e.g. (crude is a sample corpus of Reuters news articles shipped with tm)
  tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
• Vector space model for scoring: clv package. The dot.product function returns the cosine similarity of two vectors.
• Vector space classification: class package. knn performs k-nearest-neighbour classification on a dataset.
• Hierarchical clustering: cluster package. Computes agglomerative hierarchical clusters on a dataset (e.g. agnes).
• Latent Semantic Indexing: base package. svd performs a singular value decomposition of a matrix.
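The scoring, classification and clustering ideas in this mapping can be tried without any add-on packages. This is an illustration under stated assumptions, not the packages' own code: the term counts and document labels are made up, cosine() is a hand-written stand-in for clv's dot.product, the nearest-neighbour step is written by hand with cosine similarity (class::knn(train, test, cl, k) instead uses Euclidean distance), and stats::hclust is used in place of the cluster package's agnes().

```r
# Toy term-document matrix: rows = terms, columns = documents (made-up counts)
tdm <- matrix(c(2, 1, 0, 0,
                1, 2, 0, 0,
                0, 0, 3, 1,
                0, 1, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("oil", "price", "goal", "match"),
                              c("d1", "d2", "d3", "d4")))

# Cosine similarity: dot product divided by the product of the vector norms
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Vector space scoring: rank every document against a query vector
query <- c(oil = 1, price = 1, goal = 0, match = 0)
scores <- apply(tdm, 2, cosine, b = query)
print(sort(scores, decreasing = TRUE))

# 1-nearest-neighbour classification by cosine similarity (hypothetical labels)
labels <- c(d1 = "economy", d2 = "economy", d3 = "sport", d4 = "sport")
new_doc <- c(oil = 0, price = 1, goal = 0, match = 0)
nn <- names(which.max(apply(tdm, 2, cosine, b = new_doc)))
print(labels[[nn]])

# Pairwise document similarities: normalise columns to unit length, then t(unit) %*% unit
unit <- sweep(tdm, 2, sqrt(colSums(tdm^2)), "/")
sim <- crossprod(unit)

# Agglomerative hierarchical clustering on cosine distance (1 - similarity)
hc <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```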
Getting started with R
• To start R (from the system shell)
  $ R
• To quit R
  > q()
• To see installed packages
  > library()
• To load a package
  > library(class)
• To start the HTML help system
  > help.start()
• To create a vector
  > x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
• To create a matrix
  > x <- array(1:20, dim = c(4, 5))  # 4-by-5 array filled column by column with the numbers 1 to 20
• To display an object
  > x
• To delete an object
  > rm(x)
• To load data from a file
  > HousePrice <- read.table("houses.data")
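The commands above can be collected into a short script. In this sketch the data file is hypothetical: a few made-up rows are written to a temporary file so that read.table() has something to read, and header = TRUE takes the column names from the first line.

```r
# Vector from the slide
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(mean(x))

# array() fills its values column by column, so column 2 holds 5, 6, 7, 8
m <- array(1:20, dim = c(4, 5))
print(dim(m))
print(m[, 2])

# Hypothetical data file with made-up values, written to a temporary location
path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area",
             "52.00 111 830",
             "54.75 128 710",
             "57.50 101 1000"), path)
houses <- read.table(path, header = TRUE)
print(houses)

# Delete the vector; ls() no longer lists it
rm(x)
print("x" %in% ls())
```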
Examples (1)
• Term-Document Matrix
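A term-document matrix can also be built by hand in base R, which makes the weighting explicit. This sketch assumes a three-document toy corpus, a hypothetical three-word stopword list and the common TF-IDF variant tf × log2(N / df); tm's TermDocumentMatrix() with weightTfIdf does this on a real corpus and may differ in details such as tokenisation and normalisation.

```r
# Toy corpus (made-up sentences)
docs <- c(d1 = "oil prices rise as oil supply falls",
          d2 = "crude oil prices and supply",
          d3 = "the team won the match")

# Tokenise: lower-case, split on non-letters, drop empty strings
tokens <- lapply(docs, function(d) {
  w <- strsplit(tolower(d), "[^a-z]+")[[1]]
  w[w != ""]
})

# Remove stopwords (hypothetical short list; tm ships a proper one)
stop_words <- c("as", "and", "the")
tokens <- lapply(tokens, function(w) w[!w %in% stop_words])
vocab <- sort(unique(unlist(tokens)))

# Raw term frequencies: rows = terms, columns = documents
tf <- sapply(tokens, function(w) as.integer(table(factor(w, levels = vocab))))
rownames(tf) <- vocab

# Inverse document frequency: log2(number of documents / documents containing the term)
df <- rowSums(tf > 0)
idf <- log2(ncol(tf) / df)

# TF-IDF: each row (term) is scaled by its idf
tfidf <- tf * idf
print(round(tfidf, 3))
```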
Examples (2)
• Eigenvalues and eigenvectors
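eigen() from base R returns the eigenvalues, sorted in decreasing order, and the corresponding eigenvectors as columns. This sketch uses small made-up matrices; it checks A v = λ v for each pair and shows one link to LSI: the eigenvalues of XᵀX are the squared singular values of X.

```r
# A small symmetric matrix (made-up values)
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(A)
print(e$values)
print(e$vectors)

# Each eigenpair satisfies A v = lambda v
for (i in seq_along(e$values)) {
  v <- e$vectors[, i]
  stopifnot(isTRUE(all.equal(as.vector(A %*% v), e$values[i] * v)))
}

# Eigenvalues of t(X) %*% X equal the squared singular values of X
X <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 3,
              0, 1, 1), nrow = 4, byrow = TRUE)
ev <- eigen(crossprod(X), symmetric = TRUE)$values
sv <- svd(X)$d
print(cbind(eigenvalue = ev, singular_value_squared = sv^2))
```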
Examples (3)
• Low-rank approximation
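A rank-k approximation keeps only the k largest singular values of the SVD, which is the step LSI applies to a term-document matrix. This sketch uses a made-up 4×4 matrix and k = 2, and checks the Eckart-Young result that the Frobenius-norm error equals the square root of the sum of the squared discarded singular values.

```r
# Made-up 4x4 term-document matrix (rows = terms, columns = documents)
X <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 1,
              0, 0, 3, 1,
              0, 1, 1, 2), nrow = 4, byrow = TRUE)

s <- svd(X)   # X = U diag(d) t(V); singular values in s$d, decreasing
k <- 2        # number of dimensions kept

# Rank-k approximation from the k largest singular triplets;
# nrow = k stops diag() from reading a length-1 vector as a matrix size
Xk <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])
print(round(Xk, 2))

# Frobenius-norm error of the approximation
err <- norm(X - Xk, type = "F")
print(err)

# Documents as k-dimensional vectors (the LSI "concept space")
docs_k <- diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])
print(round(docs_k, 2))
```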