ECS 132 Final Project - University of California,...

21
ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham Vijayaragavan December 5, 2011 1

Transcript of ECS 132 Final Project - University of California,...

Page 1: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

ECS 132 Final Project

Rajiv BattulaAlex Devlin

Edward HuangGowtham Vijayaragavan

December 5, 2011

1

Page 2: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

1 Introduction

Using the knowledge we gained in ECS 132, we set out to examine two datasets in hopes of drawingmeaningful conclusions. The first set of data was a survey of recent college graduates, administeredby the National Center for Science and Engineering Statistics (NCSES), which is a part of theNational Science Foundation. This survey gave us an insight into what college graduates look forwhen seeking job opportunities and how they feel about the positions they eventually accept. Morespecifically, we targeted Computer Science, Economics and Communications Majors. We looked atwhat graduates from these majors thought about salaries and found some interesting facts aboutthese three groups of people.

Next, we looked at data that the researchers from the “Stanford Network Analysis Project” (SNAP)gathered from the production recommendations given to users on Amazon.com. Specifically, theresearchers looked at the “Customers Who Bought This Item Also Bought” feature on Amazon.com.Researchers from SNAP looked a product, i, frequently co-purchased with another product, j. Theirresultant data consists of edges from a product i to a product j, if product i is often purchasedalongside product j.

2 National Survey of Recent College Graduates

This survey was given out to college graduates who received their bachelors or masters degreesbetween July 1,2000 and June 30, 2002. Since this survey was administered in 2003, it is outdated,and many significant events have transpired since. This data may not be completely accurate, orrepresentative, of the present day.

2.1 Importance of Compensation

2.1.1 Salary Expectations

We wanted to see how much Computer Science, Economics, and Communication majors valuecompensation. Firstly, we looked at how much importance graduates of each major place on salaries.With 1 being “Very Important” and 4 being “Not Important At All”, we found that these majorsgive almost equal importance to salaries. Computer Science Majors gave an average importance of1.43 to salaries while their Economics counterparts gave an average of 1.48 importance. Lastly, welooked at how much importance Communications majors gave to salary, with an average importanceof 1.42.

1

Page 3: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

The 95% confidence intervals for the difference between the Computer Science averages and theother two averages for salary importance are as follows.

Difference Between From To

Computer Science - Economics -0.09185297 -0.01962764

Computer Science - Communications -0.03736083 0.05199687

As we can see, the ranges of the confidence intervals are small. This means that our sample of collegegraduates is fairly representative of the overall population. As for the actual comparison, computerscience graduates appear to value their salaries to about the same degree as Communications majorsand Economics majors. The difference between Computer Science and Economics may only includenegative values, but they are so minuscule that for all intents and purposes, we cannot concludeany difference in how much they value their salaries. The graphs of the densities of each major arein the appendix.

2.1.2 Salary Satisfaction

Next, logically we wanted to find out how satisfied graduates were with their salaries on their currentjob. Here, we found a bigger gap. Again, with 1 as “Very Satisfied” and 4 as “Very Dissatifised”,CS majors were more satisfied with their salaries with an average of 1.73. The Economics andCommunications majors were less satisfied with an average of 1.90 and 1.98 respectively.

2

Page 4: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Again, we have the confidence intervals, with 95% confidence, for the differences in the satisfactionof compensation for Computer Scientists with the other two majors.

Difference Between From To

Computer Science - Economics -0.1719048 -0.07435523

Computer Science - Communications -0.2631668 -0.143071

The range here is slightly more pronounced, though still not extremely large. Though there isa statistically significant chance that Computer Science graduates are more satisfied with theirsalaries than Economics and, to a greater extent, Communications graduates, the difference itselfin the population may not be great.

We also see that these confidence intervals are similar to the last ones about salary expectations.The confidence intervals are small which means that the sample size selected by the NCSES wasquite representative of larger sample size.

2.2 Professional Conferences

We wanted to find out how many graduates of each major have been to a professional meeting inthe past year (relative to 2003). We looked at the question

“During the past year, did you attend any professional society or association meetingsor professional conferences?”

3

Page 5: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Major Percent who attend conferences

Computer Science 32.94%

Economics 48.71%

Communications 49.67%

As shown, it seems that a much smaller percentage of Computer Science graduates attend confer-ences compared to their peers from the Economics and Communications majors.

The confidence interval, with 95% confidence, for this question came out to be as follows.

Difference Between From To

Computer Science - Economics -0.1580744 -0.1573609

Computer Science - Communications -0.1677969 -0.1669039

The range of the confidence interval for computer science graduates is much lower than the othertwo ranges, so that there is little doubt that computer scientists as a whole attend conferences lessthan the other majors. To find the reasons behind this finding, we would need to observe manyother factors about Computer Science Majors.

2.3 Job Satisfaction

In this section, we calculate the average of the computer science graduates’ satisfaction in 9 fields.Each of the fields could be ranked from 1, as “Very Satisfied”, to 4, as “Very Dissatifised.” Thus,the average could also fall between 1 and 4. For the computer science graduates, the mean of these

4

Page 6: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

9 fields averaged together was 1.616, indicated that the majority of computer scientists were fairlysatisfied with their jobs. In the survey, there was also a field from 1 to 4 for overall satisfactionwith one’s job. We compared this number to the mean of only the “Overall Satisfaction” field got1.747.

With 95% confidence, the confidence interval is -0.1716763 to -0.08994155

Both statistics indicate that most computer scientists are satisfied with their jobs. However, thefact that the overall satisfaction rating is higher than the average of the different factors allows usto draw some interesting speculations. In particular, computer scientists may find their jobs to begreater than the sum of its parts.

2.4 Parents’ Educational Background and Salary

We wanted to use linear regression to see if there was some kind of relationship between theeducation level of the parents of a graduate and their salary. Our initial hypothesis was thatsomeone who has more educated parents would likely be earning more due to the presumption thatthe more educated parents are able to provide more opportunities for their children.

Where the education level is:

5

Page 7: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Number Education level

1 Less Than High School

2 High School Diploma

3 Some College

4 Bachelor’s Degree

5 Master’s Degree

6 Professional Degree

7 Doctorate

Our findings are as follows.

Minimum 75510.51

Maximum 83961.95

As we can see from the graph, it appears that graduates who have more educated parents havehigher salaries. We feel that this can be explained by the idea that more educated parents can givetheir children early access to tools that may help them later on in life. There are many factors totake into consideration that might help explain this relationship between parents education leveland salaries.

We decided to go further and break down to Mother and Father.

6

Page 8: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Minimum 77182.92

Maximum 80990.75

Minimum 75121.9

Maximum 83824.19

7

Page 9: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Interestingly, graduates with well educated mothers don’t seem to have salaries significantly higherthan those with less educated mothers. On the other hand, it appears as though graduates withmore educated fathers are likely to earn more than those with less educated fathers. Again, thereare many social and economical factors that may explain this observation. Note that we want toemphasize that salary is not the only measure of success. There have been many admirable people,and peers, in the field of computer science who, despite having less educated parents, have beenand are very successful.

3 Amazon Product Links

This data shows the relationships between products that become evident when observing consumerbuying patterns. Analyzing this kind of data has been key to Amazon’s success; once an accurateprediction can be made about a product, the retailer can customize the online shopping experiencefor the consumer. Aside from some ethical issues, data like this has much practical significance forboth the consumer and retailer. Note that, in the data from STAN, a product i being frequentlypurchased with j does not necessarily indicate the same for product j with i.

3.1 Confidence Interval for Mean of Degree Distribution

Out of 262111 products (nodes) with 1234877 edges (co-purchasing links), we found the followingdata. Code is in the Appendix.

Mean 4.794336

Standard Deviation 0.7233048

With 95% confidence, the confidence interval for the product links ranges from 4.791542 to4.797129.

3.2 Proportion of outlinks not reciprocated

This is the confidence interval for the proportion of outlinks that are not reciprocated. Meaningthat we only take into account one-way links.

Number of outlinks NOT reciprocated 564707

Number of outlinks 1234877

Proportion 0.457 = 45.7%

With 95% confidence, the confidence interval for the proportion of outlinks that are not reciprocatedis from 0.4559988 to 0.4585975.

According to our analysis, almost half of the outlinks that products have are reciprocated. Thisproportion is an amazing indication of Amazon’s ability to market the right products to the rightconsumers. The consumer is appropriately introduced to products they might not have otherwisefound, and Amazon’s sales are bolstered, so in a way, everyone reaps the benefits. With its sophis-ticated targeted system of e-commerce, there is no doubt why Amazon is in the forefront of onlinesales.

8

Page 10: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

4 Summary

By observing data from the National Survey of Recent College Graduates (from 2003), we were ableto draw some very interesting conclusions. There are many more, seemingly infinite, combinationsof data points we could have observed, which would have revealed some social phenomena. Fromthe Amazon product links, we saw just a piece of how an online retailer was able to weather thedot-com crash, and further become the largest online retailer. The number of interpretations wecan draw from raw data is breathtaking. If we widen our scope from just college graduates andproduct associations to many other areas, it is startling to imagine the kind of possibilities.

5 Credits

Alex wrote the initial parsing code to import the raw data into R. Everyone helped write and debugcode for problem 2, and each part of Problem 1, with a each member having a slight focus on eachpart. Rajiv did the writeup in LATEX. We are happy to say that everyone in group worked togetherwell and contributed equally to this project and over the course of the quarter.

9

Page 11: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

A Graphs of Salary Importance and Satisfaction Densities

Importance given to Salary.

10

Page 12: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

11

Page 13: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

Actual Satisfaction with Salary on the job.

12

Page 14: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

B Code

B.1 Problem 1A

1 prob1a i n i t <− function ( ) {2 SATSAL CS <<− NULL3 SATSAL ECON <<− NULL4 SATSAL COMM <<− NULL56 FACSAL CS <<− NULL7 FACSAL ECON <<− NULL8 FACSAL COMM <<− NULL9

10 satSalCsSD <<− NULL11 facSalCsSD <<− NULL12 satSalEconSD <<− NULL13 facSalEconSD <<− NULL14 satSalCommSD <<− NULL15 facSalCommSD <<− NULL1617 satSalCsMean <<− NULL18 facSalCsMean <<− NULL19 satSalEconMean <<− NULL20 facSalEconMean <<− NULL21 satSalCommMean <<− NULL22 facSalCommMean <<− NULL2324 csmajor <<− c (116730)

13

Page 15: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

25 econmajor <<− c (419230)26 comm <<− c (766610)27 }282930 prob1 readinput <− function ( ) {31 # Open f i l e f o r read ing32 con <− f i l e ( ’ECG03 .DAT’ )33 open ( con )3435 # We parse through the f i l e and add the response o f s a l a r y s a t i s f a c t i o n36 # and sa l a r y e xp e c t a t i on to t h e i r r e s p e c t i v e majors37 while ( length ( myline <− readLines ( con , n=1, warn=FALSE) ) > 0) {3839 # s p l i t and u n l i s t each l i n e so t ha t we can acces s the e lements o f the l i n e as

a vec t o r40 myline <− noquote ( unlist ( s trsp l i t ( myline , ’ ’ ) ) )4142 mymajor <− paste ( myline [ 5 7 5 : 5 8 0 ] , c o l l a p s e=’ ’ )4344 i f (mymajor==csmajor ) {45 SATSAL CS <<− c (SATSAL CS, myline [ 9 2 6 ] )46 FACSAL CS <<− c (FACSAL CS, myline [ 3 8 4 ] )47 }48 i f (mymajor==econmajor ) {49 SATSAL ECON <<− c (SATSAL ECON, myline [ 9 2 6 ] )50 FACSAL ECON <<− c (FACSAL ECON, myline [ 3 8 4 ] )51 }52 i f (mymajor==comm) {53 SATSAL COMM <<− c (SATSAL COMM, myline [ 9 2 6 ] )54 FACSAL COMM <<− c (FACSAL COMM, myline [ 3 8 4 ] )55 }56 }57 close ( con )5859 # For each major r ep l a c e a va lue ’L ’ ( not answered ) wi th NA60 SATSAL CS <<− noquote (SATSAL CS)61 SATSAL CS [SATSAL CS==’L ’ ] <<− NA62 SATSAL CS <<− as . integer (SATSAL CS)6364 FACSAL CS <<− noquote (FACSAL CS)65 FACSAL CS [FACSAL CS==’L ’ ] <<− NA66 FACSAL CS <<− as . integer (FACSAL CS)6768 SATSAL ECON <<− noquote (SATSAL ECON)69 SATSAL ECON[SATSAL ECON==’L ’ ] <<− NA70 SATSAL ECON <<− as . integer (SATSAL ECON)7172 FACSAL ECON <<− noquote (FACSAL ECON)73 FACSAL ECON[FACSAL ECON==’L ’ ] <<− NA74 FACSAL ECON <<− as . integer (FACSAL ECON)7576 SATSAL COMM <<− noquote (SATSAL COMM)77 SATSAL COMM[SATSAL COMM==’L ’ ] <<− NA78 SATSAL COMM <<− as . integer (SATSAL COMM)7980 FACSAL COMM <<− noquote (FACSAL COMM)81 FACSAL COMM[FACSAL COMM==’L ’ ] <<− NA82 FACSAL COMM <<− as . integer (FACSAL COMM)83 }

14

Page 16: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

848586 prob1 mean <− function ( ) {87 # This func t i on c a l c u l a t e s and ou tpu t s the means8889 satSalCsMean <<− mean(SATSAL CS, na .rm=T)90 facSalCsMean <<− mean(FACSAL CS, na .rm=T)91 satSalEconMean <<− mean(SATSAL ECON, na .rm=T)92 facSalEconMean <<− mean(FACSAL ECON, na .rm=T)93 satSalCommMean <<− mean(SATSAL COMM, na .rm=T)94 facSalCommMean <<− mean(FACSAL COMM, na .rm=T)9596 cat ( ”mean o f SATSAL CS : ” , satSalCsMean , ”\n” )97 cat ( ”mean o f SATSAL ECON: ” , satSalEconMean , ”\n” )98 cat ( ”mean o f SATSAL COMM: ” , satSalCommMean , ”\n” )99 cat ( ”mean o f FACSAL CS : ” , facSalCsMean , ”\n” )

100 cat ( ”mean o f FACSAL ECON: ” , facSalEconMean , ”\n” )101 cat ( ”mean o f FACSAL Comm: ” , facSalCommMean , ”\n” )102103 jpeg ( ’ images/Sa la rySat i s f a c t i onGraph . jpg ’ )104105 barplot ( yl im=c ( 1 . 5 , 2 . 0 ) , of fset=1.5 ,c ( satSalCsMean , satSalEconMean , satSalCommMean)

−1.5 , xlab=”Majors” , ylab=”Average S a t i s f a t i o n ” , names . arg=c ( ”CS” , ”Econ” , ”Comm” ) )

106107 dev . of f ( )108109 jpeg ( ’ images/SalaryExpectationGraph . jpg ’ )110111 barplot ( yl im=c ( 1 . 0 0 , 1 . 7 5 ) , of fset=1.00 ,c ( facSalCsMean , facSalEconMean ,

facSalCommMean) −1.00 , xlab=”Majors” , ylab=”Average Importance” , names . arg=c ( ”CS” , ”Econ” , ”Comm” ) )

112113 dev . of f ( )114 }115116 prob1 sd <− function ( ) {117 # Here we c a l c u l a t e the s tandard d e v i a t i on us ing b u i l t−in R func t i on s118119 satSalCsSD <<− sd (SATSAL CS, na .rm=T)120 facSalCsSD <<− sd (FACSAL CS, na .rm=T)121 satSalEconSD <<− sd (SATSAL ECON, na .rm=T)122 facSalEconSD <<− sd (FACSAL ECON, na .rm=T)123 satSalCommSD <<− sd (SATSAL COMM, na .rm=T)124 facSalCommSD <<− sd (FACSAL COMM, na .rm=T)125126 cat ( ” sd o f SATSAL CS : ” , satSalCsSD , ”\n” )127 cat ( ” sd o f SATSAL ECON: ” , satSalEconSD , ”\n” )128 cat ( ” sd o f SATSAL COMM: ” , satSalCommSD , ”\n” )129 cat ( ” sd o f FACSAL CS : ” , facSalCsSD , ”\n” )130 cat ( ” sd o f FACSAL ECON: ” , facSalEconSD , ”\n” )131 cat ( ” sd o f FACSAL COMM: ” , facSalCommSD , ”\n” )132 }133134 prob1 mean CI <− function ( ) {135136 # Ca l cu l a t e con f idence i n t e r v a l f o r the d i f f e r e n c e between the two majors .137138 # CS and Econ

15

Page 17: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

139 MoE1 <− 1 .96 ∗ sqrt ( ( satSalCsSD ˆ2)/length (SATSAL CS) + ( satSalEconSD ˆ2)/length (SATSAL ECON) )

140 MoE2 <− 1 .96 ∗ sqrt ( ( facSalCsSD ˆ2)/length (FACSAL CS) + ( facSalEconSD ˆ2)/length (FACSAL ECON) )

141142 # CS and Communications143 MoE3 <− 1 .96 ∗ sqrt ( ( satSalCsSD ˆ2)/length (SATSAL CS) + (satSalCommSDˆ2)/length (

SATSAL COMM) )144 MoE4 <− 1 .96 ∗ sqrt ( ( facSalCsSD ˆ2)/length (FACSAL CS) + ( facSalEconSD ˆ2)/length (

FACSAL COMM) )145146 # Then ca l cua t e the Confidence I n t e r v a l147 CI1 = satSalCsMean − satSalEconMean − MoE1148 CI2 = satSalCsMean − satSalEconMean + MoE1149150 CI5 = satSalCsMean − satSalCommMean − MoE3151 CI6 = satSalCsMean − satSalCommMean + MoE3152153 CI3 = facSalCsMean − facSalEconMean − MoE2154 CI4 = facSalCsMean − facSalEconMean + MoE2155156 CI7 = facSalCsMean − facSalCommMean − MoE4157 CI8 = facSalCsMean − facSalCommMean + MoE4158159160 cat ( ”The con f idence i n t e r v a l f o r the d i f f e r e n c e betwen CS and Econ graduates ’

s a t i s f a c t i o n i s : ( ” , CI1 , ” , ” , CI2 , ” ) \n” )161 cat ( ”The con f idence i n t e r v a l f o r the d i f f e r e n c e betwen CS and Econ graduates ’

importance i s : ( ” , CI3 , ” , ” , CI4 , ” ) \n” )162163 cat ( ”The con f idence i n t e r v a l f o r the d i f f e r e n c e betwen CS and Comm graduates ’

s a t i s f a c t i o n i s : ( ” , CI5 , ” , ” , CI6 , ” ) \n” )164 cat ( ”The con f idence i n t e r v a l f o r the d i f f e r e n c e betwen CS and Comm graduates ’

importance i s : ( ” , CI7 , ” , ” , CI8 , ” ) \n” )165166 }

B.2 Problem 1B

Very similar Problem 1A, only major changes are shown here.

1 prob1b proport ion <− function ( ) {2 #Table p rov ide s the counts o f each type o f element , organ i zed a l p h a b e t i c a l l y .34 countsCS <− table (PRO MEETINGS CS)5 proportionCS <<− countsCS [ 2 ] / ( countsCS [ 1 ] + countsCS [ 2 ] )6 cat ( ”Proport ion o f CS majors who have attended p r o f e s s i o n a l meetings in the past

year i s : ” , proportionCS , ”\n” )78 countsEcon <− table (PRO MEETINGS ECON)9 proport ionEcon <<− countsEcon [ 2 ] / ( countsEcon [ 1 ] + countsEcon [ 2 ] )

10 cat ( ”Proport ion o f Econ majors who have attended p r o f e s s i o n a l meetings in the pastyear i s : ” , proportionEcon , ”\n” )

1112 countsComm <− table (PRO MEETINGS COMM)13 proportionComm <<− countsComm [ 2 ] / ( countsComm [ 1 ] + countsComm [ 2 ] )14 cat ( ”Proport ion o f Econ majors who have attended p r o f e s s i o n a l meetings in the past

year i s : ” , proportionComm , ”\n” )15 }

16

Page 18: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

1617 prob1b CI <− function ( ) {18 n1 = table (PRO MEETINGS CS) [ 1 ] + table (PRO MEETINGS CS) [ 2 ]19 n2 = table (PRO MEETINGS ECON) [ 1 ] + table (PRO MEETINGS ECON) [ 2 ]20 n3 = table (PRO MEETINGS COMM) [ 1 ] + table (PRO MEETINGS COMM) [ 2 ]2122 SD1 = sqrt ( proportionCS∗(1−proportionCS )/n1 )23 SD2 = sqrt ( proport ionEcon∗(1−proport ionEcon )/n2 )24 SD3 = sqrt ( proportionComm∗(1−proportionComm)/n2 )2526 CI CS ECON = sqrt ( (SD1ˆ2)/n1 + (SD2ˆ2)/n2 )27 CI CS COMM = sqrt ( (SD1ˆ2)/n1 + (SD3ˆ2)/n3 )2829 cat ( ”Conf idence i n t e r v a l f o r the d i f f e r e n c e between CS and Econ majors : ( ” ,

proportionCS − proport ionEcon − CI CS ECON, ” , ” , proportionCS + CI CS ECON, ” )\n” )

30 cat ( ”Conf idence i n t e r v a l f o r the d i f f e r e n c e between CS and Comm majors : ( ” ,proportionCS − proportionComm − CI CS COMM, ” , ” , proportionCS + CI CS COMM, ” )\n” )

31 }

B.3 Problem 1C

Only major changes from 1A

1 prob1 mean <− function ( ) {2 satavgsumCsMean <<− mean(SATAVGSUM CS, na .rm=TRUE)3 satoveral lCsMean <<− mean(SATOVERALL CS, na .rm=TRUE)45 cat ( ”mean o f satavgsum cs : ” , satavgsumCsMean , ”\n” )6 cat ( ”mean o f s a t o v e r a l l c s : ” , satoveral lCsMean , ”\n” )78 jpeg ( ” images/Prob1c . jpg ” )9 barplot ( yl im=c ( 1 . 5 0 , 1 . 80 ) , of fset=1.50 , c ( satavgsumCsMean , satoveral lCsMean )

−1.50 , names . arg=c ( ”Avg Sum” , ”Overa l l ” ) )10 dev . of f ( )11 }1213 prob1 sd <− function ( ) {14 satavgsumCsSD <<− sd (SATAVGSUM CS, na .rm=TRUE)15 satovera l lCsSD <<− sd (SATOVERALL CS, na .rm=TRUE)1617 cat ( ” sd o f satavgsum cs : ” , satavgsumCsSD , ”\n” )18 cat ( ” sd o f s a t o v e r a l l c s : ” , satoveral lCsSD , ”\n” )19 }2021 prob1 mean CI <− function ( ) {22 MoE1 <− 1 .96 ∗ satavgsumCsSD/sqrt ( length (SATAVGSUM CS) )23 MoE2 <− 1 .96 ∗ satovera l lCsSD/sqrt ( length (SATOVERALL CS) )2425 cat ( ”The con f idence i n t e r v a l f o r s a t i s f a c t i o n o f CS i s : ( ” , satavgsumCsMean − MoE1

, ” , ” , satavgsumCsMean + MoE1, ” ) \n” )26 cat ( ”The con f idence i n t e r v a l f o r o v e r a l l s a t i s f a c t i o n o f CS i s : ( ” ,

satoveral lCsMean − MoE2, ” , ” , satoveral lCsMean + MoE2, ” ) \n” )27 }

17

Page 19: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

B.4 Problem 1D

1 prob1 readinput <− function ( ) {2 con <− f i l e ( ’ECG03 .DAT’ )3 open ( con )45 while ( length ( myline <− readLines ( con , n=1, warn=FALSE) ) > 0) {6 myline <− noquote ( unlist ( s trsp l i t ( myline , ’ ’ ) ) )78 mymajor <− paste ( myline [ 5 7 5 : 5 8 0 ] , c o l l a p s e=’ ’ )9

10 i f (mymajor==csmajor ) {11 edu f a th e r <− myline [ 3 5 6 ]12 edu mother <− myline [ 3 5 7 ]13 edu f a th e r <− as . integer ( edu f a th e r )14 edu mother <− as . integer ( edu mother )15 edu f a th e r [ edu f a th e r == ’ 8 ’ ] <− NA16 edu mother [ edu mother == ’ 8 ’ ] <− NA17 sa l a r y <− paste ( myline [ 9 0 6 : 9 1 2 ] , c o l l a p s e=’ ’ )18 s a l a r y <− as . integer ( s a l a r y )19 s a l a r y [ s a l a r y == ’ 999998 ’ ] <− NA20 sa l a r y [ s a l a r y == ’ 0 ’ ] <− NA21 FATHER EDU CS <<− c (FATHER EDU CS, edu f a th e r )22 MOTHER EDU CS <<− c (MOTHER EDU CS, edu mother )23 SALARY <<− c (SALARY, s a l a r y )24 }25 }26 close ( con )27 }2829 prob1 r e g r e s s <− function ( ) {30 md <<− matrix (nrow=length (SALARY) , ncol=3)31 md[ , 1 ] <<− FATHER EDU CS32 md[ , 2 ] <<− MOTHER EDU CS33 md[ , 3 ] <<− SALARY3435 i <− 136 for ( a in md[ , 3 ] ) {37 i f ( i s .na(md[ i , 1 ] + md[ i , 2 ] + md[ i , 3 ] ) ) {38 md[ i , 3 ] <<− NA39 md[ i , 2 ] <<− NA40 md[ i , 1 ] <<− NA41 }42 i <− i + 143 }4445 md <<− na . omit (md)4647 momAndDad <− md[ , 1 ] + md[ , 2 ]4849 lmout <<− lm(md[ , 3 ] ˜ momAndDad)50 lmoutMom <<− lm(md[ , 3 ] ˜ md[ , 2 ] )51 lmoutDad <<− lm(md[ , 3 ] ˜ md[ , 1 ] )5253 jpeg ( ” images/MomEduSalary . jpg ” )5455 plot (md[ , 2 ] , md[ , 3 ] , x lab=”Education l e v e l o f Mother” , ylab=” Salary ” )56 abline ( lmoutMom)5758 dev . of f ( )

18

Page 20: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

5960 jpeg ( ” images/DadEduSalary . jpg ” )6162 plot (md[ , 1 ] , md[ , 3 ] , x lab=”Education l e v e l o f Father ” , ylab=” Salary ” )63 abline ( lmoutDad )6465 dev . of f ( )6667 jpeg ( ” images/ParentsEduSalary . jpg ” )6869 plot (momAndDad, md[ , 3 ] , x lab=”Education l e v e l o f Parents ” , ylab=” Salary ” )70 abline ( lmout )7172 dev . of f ( )7374 cat ( ”Minimum : ” , min( lmout$f itted . va lue s ) , ” \n” )75 cat ( ”Maximum: ” , max( lmout$f itted . va lue s ) , ” \n” )76 }

B.5 Problem 2

1 prob2 i n i t <− function ( ) {2 myline <<− NULL3 amazon <<− NULL4 outdegrees <<− NULL5 ub e r l i s t <<− l i s t ( l i s t ( ) )6 }78 read input <− function ( ) {9 con <− f i l e ( ’ amazon0302 . txt ’ )

10 open ( con )1112 myline <<− readLines ( con , n=4, warn=FALSE)13 myline <<− readLines ( con , warn=FALSE)14 myline <<− as . integer (noquote ( unlist ( s tr sp l i t ( myline , ’ \ t ’ ) ) ) )15 amazon <<− matrix (nrow = ( length ( myline )/2) , ncol = 2)16 oddrows <− seq (1 , (dim( amazon ) [ 1 ] ) ∗2 , by=2)17 amazon [ , 1 ] <<− myline [ oddrows ]1819 # amazon now conta ins a l l rows o f amazon0302 . t x t20 amazon [ , 2 ] <<− myline [ oddrows + 1 ]2122 close ( con )23 }2425 prob2a <− function ( ) {26 # Stor ing the outdegree f r e qu en c i e s27 outdegrees <<− table ( amazon [ , 1 ] )2829 # Ca l cu l a t e the mean , s tandard dev ia t i on , and i n t e r v a l .30 EV <− mean( outdegrees )31 SD <− sd ( outdegrees )32 MoE <− 1 .96 ∗ SD/sqrt ( length ( outdegrees ) )3334 cat ( ”The 95% con f idence i n t e r v a l f o r the product l i n k s i s : ( ” , EV − MoE, ” , ” , EV

+ MoE, ” ) \n” )35 }36

19

Page 21: ECS 132 Final Project - University of California, Davisheather.cs.ucdavis.edu/~matloff/132/ExemplarTermProjects/...ECS 132 Final Project Rajiv Battula Alex Devlin Edward Huang Gowtham

37 prob2b <− function ( ) {38 count <− 03940 for ( i in 1 : ( length ( myline )/2) ) {41 u b e r l i s t [ [ 1 ] ] [ [ ( amazon [ i , 1 ]+1) ] ] <<− NA42 }43 for ( i in 1 : ( length ( myline )/2) ) {44 u b e r l i s t [ [ 1 ] ] [ [ ( amazon [ i , 1 ]+1) ] ] <<− append( u b e r l i s t [ [ 1 ] ] [ [ ( amazon [ i , 1 ]+1) ] ] , (

amazon [ i , 2 ]+1) )45 }46 for ( i in 1 : ( length ( myline )/2) ) {47 u b e r l i s t [ [ 1 ] ] [ [ ( amazon [ i , 1 ]+1) ] ] <<− as . vector (na . omit ( u b e r l i s t [ [ 1 ] ] [ [ ( amazon [ i

, 1 ]+1) ] ] ) )48 }4950 }

20