SAS Shootout - Analysis on Baby Boomer Population

SAS Shootout 2016 Group 5

Contents
1. Executive Summary
2. Problem Understandings
   i. Healthcare
   ii. Labor forces landscape
   iii. Tax & Social Security
2. Approach
   Research question #1
   Research question #2
   Research question #3
3. Literature Review
   i. Healthcare
   ii. Labor forces
   iii. Tax and Social Security
4. Data Exploration
   Disease Table
   Labor Force Population
   Social Security Tax Table
5. Data Preparation
   Disease data
   Labor force population
   Social Security Tax
      Estimation of collected Taxes
      Estimation of Tax Benefits
6. Modelling
   Problem question #1: Trend of diseases
   Problem question #2: Trend of Labor force Population
   Problem question #3: Estimation of the year that the Tax Benefit paid out exceeds the Collected Taxes
7. Conclusions
8. References



1. Executive Summary

In this project, we focus on the effects of a changing population on three main areas: healthcare (demand for services and disease prevalence), the overall labor force landscape, and the U.S. government's policies on tax and Social Security. Population and demographic shifts are ongoing and affect many areas of our society. By understanding past trends, we can anticipate their future effects and propose appropriate solutions.

We began with data understanding and data exploration. For the disease dataset, we drew line graphs to analyze disease trends and to find which age groups are affected by which diseases. We also studied the number of cases of each disease across years. The disease dataset is joined with the census dataset to find the percentage of the total population affected by a particular disease. We performed a similar study on the labor population: the trends of the various sectors of the workforce are analyzed, and future trends are predicted from them. For Social Security revenue, we calculated an average benefit payout for the retired population and assumed that this amount will not change in the future, neglecting inflation. We also assumed a constant tax rate. Using the projected labor force population, we then calculated the tax collected and the benefits paid out, and identified the year in which the payout will exceed the tax revenue.

In the healthcare sector, we found that cancer declines over the years, whereas osteoarthritis is stagnant, with its trend projected to remain the same in the future. Dementia and septicemia, however, tend to increase, so we recommend that the government concentrate research investment on these two diseases so that their future impact can be reduced.

The changes in demography will have a major impact on the workforce. The healthcare emphasis will shift to gerontology, and more people will be involved in the care of older people. The work environment has to become friendlier to senior citizens, as they make up more than twenty percent of the population. In education, instead of K-12, the government will need to focus more on retraining workers.

The Social Security corpus will be negative by the year 2038. By increasing the tax rate or decreasing the benefit payout amount, this year shifts to 2052. The practical implications of increasing the tax or decreasing the benefits are huge and politically very sensitive.


2. Problem Understandings

i. Healthcare

Disease prevention and improved treatment are two important strategies in healthcare. People could be warned of diseases years before they occur, which would help them save money and increase the chance of a cure. However, not all diseases are predictable. For some diseases, it is very difficult to find associations between cases and gender, age, and other factors. In this project, we look for the association between disease case numbers and gender and age, to project which diseases are more likely to be preventable. Diseases and sick time also have a very large impact on the economy.

Another issue in the healthcare industry is the limitation of funds and human resources. If we could find the patterns of each disease over the years, we could predict which diseases will be prevalent in the next few years or decades. Pharmaceutical companies could then focus on researching treatments for the specific trending diseases, which benefits both companies and patients. In this project, we find the peak year of each disease and its future trend. Using time series methodology, we predict which diseases will be prevalent in the future.

ii. Labor forces landscape

The change in demography (an aging population) will affect employment and the availability of the workforce in different sectors. The spread across age groups will also differ. An aging population will require more health care and less childcare and education. The total demand remains roughly the same: childcare shifts to health care, with importance given to gerontology, and education shifts from K-12 to adult retraining, as middle-aged people will need retraining to be able to work longer. Machinery and work environments need to be redesigned for an aged population.

iii. Tax & Social Security

Current situation: since Social Security was established in 1935, the government has collected 18 trillion dollars in taxes and paid out 15.2 trillion. In the next 16 years, the entire boomer cohort will reach retirement age, placing a large burden on the government, which will receive less tax revenue and pay more to eligible benefit recipients. Different governments address this problem in multiple ways:

1. Reducing the payout amount
2. Increasing the retirement age
3. Optionally increasing the working age
4. Giving an incentive to work at old age by providing a % of SS
5. Increasing taxes in other fields
6. Reserving a % of work for old people


2. Approach

With CRISP-DM as the base of our research approach, we used the data provided by the Centers for Disease Control and Prevention and the Bureau of Labor Statistics, together with the Tax and Social Security data, in SAS products such as Base SAS, JMP, and SAS EM, to answer the following research questions.

Research question #1
Assuming that the effect of technology changes on diagnosis and treatment procedures will continue forward, what are the past and future trends of disease prevalence and treatments within the population segments?

Research question #2
What is the future relationship between the number of industry workers and the population segments, and what is its effect on other systems? For example, how does the change in population affect Social Security revenue?

Research question #3
What are the effects of the changing labor population on Tax & Social Security policies, and in which year does the benefit payout exceed the tax revenue?


We have studied the current policies of Tax and Social Security. Given the data on Occupation and Pay and on Social Security (tax rates and capped wage), we estimated the average annual amount of tax the U.S. government collects from a working person, as well as the average annual amount it pays a retired person. With the projected labor force data, we could then predict when the Social Security fund (mostly financed by the collected tax) will not have enough money to pay benefits to recipients, which might lead to a financial crisis for the government if it is not anticipated. Next, we examine different strategies for tax and Social Security policy, anticipating the future effects of each policy change. The two main strategies that we examined are:

- Increasing the tax limit.
- Increasing the payroll tax rate for employee/employer.

With the current policies, it is anticipated that the Tax Benefit will exceed the Social Security fund in 2038.
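The estimation described above can be sketched as a year-by-year comparison of taxes collected against benefits paid. This is only a minimal illustration under stated assumptions: the function name `first_deficit_year` is hypothetical, every head count and dollar amount below is a made-up placeholder rather than a value from our data, and the per-person amounts are held constant as in our no-inflation, constant-tax-rate assumption.

```python
def first_deficit_year(workers, retirees, tax_per_worker, benefit_per_retiree):
    """Return the first year in which benefits paid exceed taxes collected.

    workers / retirees: dicts mapping year -> projected head count.
    tax_per_worker / benefit_per_retiree: constant annual dollar amounts.
    Returns None if the payout never exceeds revenue within the projection.
    """
    for year in sorted(workers):
        collected = workers[year] * tax_per_worker
        paid_out = retirees[year] * benefit_per_retiree
        if paid_out > collected:
            return year
    return None


# Made-up projection (NOT report data): workers shrink while retirees grow.
workers = {2030 + i: 160_000_000 - 500_000 * i for i in range(20)}
retirees = {2030 + i: 60_000_000 + 1_500_000 * i for i in range(20)}

year = first_deficit_year(workers, retirees,
                          tax_per_worker=6_000.0,
                          benefit_per_retiree=15_000.0)
print(year)
```

A policy change such as a higher payroll tax rate can be explored in this sketch by raising `tax_per_worker` and re-running the comparison.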

3. Literature Review

We have reviewed various research papers and articles related to the changing population and its impacts in a variety of fields.

i. Healthcare

Recent articles in Newsweek and the Washington Post, based on research published in the peer-reviewed literature, report that most measures to prevent disease might not save money. Further, they noted, the presidential candidates who advocate increased use of preventive services as a cost-saving device are misguided and are setting false expectations that prevention can cure the ailing U.S. health care system. Taking into consideration technological changes in medicine and industry, we can expect in the future:

- Medicine – a preventive cure for heart attack.
- Industry – self-driving cars and trucks become commercially successful, leading to fewer accidents, shorter drive times, and more efficient logistics.

See Appendix C for motor vehicle accident figures and heart failure statistics.

ii. Labor forces

The X11 procedure, an adaptation of the U.S. Bureau of the Census X-11 Seasonal Adjustment program, seasonally adjusts monthly or quarterly time series. The procedure makes additive or multiplicative adjustments and creates an output data set containing the adjusted time series and intermediate calculations. The X11 procedure also provides the X-11-ARIMA method developed by Statistics Canada. This method fits an ARIMA model to the original series and then uses the model forecast to extend the original series. This extended series is then seasonally adjusted by the standard X-11 seasonal adjustment method. The extension of the series improves the estimation of the seasonal factors and reduces revisions to the seasonally adjusted series as new data become available. The X11 procedure incorporates sliding spans analysis. This type of analysis provides a diagnostic for determining the suitability of seasonal adjustment for an economic series.


With this procedure we can plot the trend along with the seasonality of labor force employment. It requires at least 36 monthly or 12 quarterly readings. Our data consist of annual readings, which cannot be used directly, so we derived quarterly figures from the given data and ran the analysis on those.
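The report does not spell out how the quarterly figures were derived from the annual readings. One possible approach, shown here purely as an assumption, is linear interpolation between consecutive annual values; the function name `annual_to_quarterly` and the sample numbers are hypothetical.

```python
def annual_to_quarterly(annual):
    """Linearly interpolate annual readings into quarterly readings.

    annual: list of (year, value) pairs sorted by year, one per year.
    Returns a list of ((year, quarter), value) pairs. Each annual value is
    placed at Q1 and the following three quarters move in equal steps toward
    the next year's value; the final year contributes only its Q1 point.
    """
    quarterly = []
    for (year, value), (_, next_value) in zip(annual, annual[1:]):
        step = (next_value - value) / 4
        for q in range(4):
            quarterly.append(((year, q + 1), value + step * q))
    last_year, last_value = annual[-1]
    quarterly.append(((last_year, 1), last_value))
    return quarterly


# Made-up annual readings, for illustration only.
series = annual_to_quarterly([(2000, 100.0), (2001, 104.0), (2002, 112.0)])
for (year, quarter), value in series:
    print(year, quarter, value)
```

Note that interpolated quarters carry no real within-year information, so any seasonal component estimated from such a series reflects the interpolation rather than the labor market.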

iii. Tax and Social Security

Payroll taxes are taxes levied on the gross wages of workers. From the inception of the Social Security program through 2013, payroll taxes have constituted 96% of all income to the Social Security program (excluding interest on the Trust Fund). The Social Security Act of 1935 set the taxable maximum at $3,000; by 2013 it had risen to $113,700. Income earned above this amount is not subject to Social Security taxes. The threshold was initially a fixed amount that was not indexed for inflation or wage levels.

Social Security benefit amounts are generally related to the amount of Social Security payroll taxes paid by workers over the course of their lifetimes. The Social Security Administration has an online calculator that provides an estimate of monthly old-age benefits based on your earnings, birth date, and expected retirement age. The results can be delivered in either today's dollars or future (inflated) dollars.
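The effect of the taxable maximum can be illustrated with a small sketch. The 2013 cap of $113,700 comes from the text above; the 6.2% rate is an assumption (the employee-side Social Security rate), since this section does not state the rate used, and the function name `payroll_tax` is hypothetical.

```python
def payroll_tax(wage, taxable_max, rate):
    """Social Security payroll tax owed on one worker's annual wage.

    Only wages up to taxable_max are taxed; income above it is exempt.
    """
    return round(min(wage, taxable_max) * rate, 2)


TAXABLE_MAX_2013 = 113_700  # taxable maximum quoted in the text
ASSUMED_RATE = 0.062        # assumption: employee-side rate, not from the report

below_cap = payroll_tax(50_000, TAXABLE_MAX_2013, ASSUMED_RATE)
above_cap = payroll_tax(200_000, TAXABLE_MAX_2013, ASSUMED_RATE)
print(below_cap, above_cap)
```

A worker earning $200,000 owes the same amount as one earning exactly $113,700, which is why raising the tax limit is one of the policy levers considered in this report.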


Tax benefit calculation

The formula is based on the average indexed monthly earnings, or AIME, in the 35 highest-earning years after age 21, up to the Social Security wage base. In 2012, this ceiling is $110,100. "If a person works (fewer) than 35 years, missing years are filled in with zeros. If they have worked more than 35 years, only the highest-earning years will be considered," says Charles C. Scott, president of Pelleton Capital Management Ltd. in Scottsdale, Ariz. Bulankov says earnings from a worker's 35 highest-earning years are tallied at age 62 and indexed for inflation, resulting in the AIME.

The Social Security Administration determines the primary insurance amount, or PIA, by applying a PIA formula to the AIME. The AIME is "divided into three segments, called bend points (which are adjusted each year for inflation), giving you the worker's PIA," says Scott.

For example, assume you have a 62-year-old born in 1950 whose total indexed earnings over his 35 highest-earning years were $2 million. The $2 million divided by 420 months gives the worker an AIME of $4,762. The first bend point, $767 of the AIME, is multiplied by 90 percent. The difference between $767 and the second bend point of $4,624 ($3,857) is multiplied by 32 percent. The amount above $4,624 ($138) is multiplied by 15 percent. These percentages and limits are set by the SSA.

So let's apply this formula to find out what the Social Security benefit would be at full retirement age.

- The first bend point gives you a benefit of $690.30 ($767 x 0.9 = $690.30).
- The second bend point gives you a benefit of $1,234.24 ($3,857 x 0.32 = $1,234.24).
- The third bend point gives you a benefit of $20.70 ($138 x 0.15 = $20.70).
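The bend-point arithmetic above can be reproduced with a short sketch. It uses the 2012 bend points and percentages quoted in the text; the function name `primary_insurance_amount` is hypothetical, and decimal arithmetic is used so that the rounding down to the next-lowest dime is exact.

```python
from decimal import Decimal, ROUND_FLOOR

# 2012 bend points and percentages as quoted in the text.
BEND_1 = Decimal("767")
BEND_2 = Decimal("4624")
RATES = (Decimal("0.90"), Decimal("0.32"), Decimal("0.15"))


def primary_insurance_amount(aime):
    """Apply the three-segment PIA formula to a monthly AIME in dollars."""
    aime = Decimal(aime)
    first = min(aime, BEND_1)                               # up to $767
    second = max(min(aime, BEND_2) - BEND_1, Decimal(0))    # $767 to $4,624
    third = max(aime - BEND_2, Decimal(0))                  # above $4,624
    total = first * RATES[0] + second * RATES[1] + third * RATES[2]
    # Amounts are rounded down to the next-lowest dime.
    return total.quantize(Decimal("0.1"), rounding=ROUND_FLOOR)


pia = primary_insurance_amount(4762)
print(pia)  # 1945.2
```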

The sum of all of these amounts is $1,945.24. Because amounts are rounded down to the next-lowest dime, this worker's PIA, which is the amount the worker would receive at full retirement age (66), is $1,945.20.

* Due to the nature of the data provided, we apply a different formula to calculate the Social Security benefit. We explain this formula later in this report.

4. Data Exploration

Disease Table

In the disease dataset, we notice that over these 13 years cancer is the most prevalent disease compared with the other diseases. Dementia and Alzheimer's disease are the least prevalent. Further exploration of the data shows that the rate of cancer increases with age, i.e., the older the patient, the higher the probability of cancer. The number of people affected by cancer drops after the age of 75. This might be because of the lethality of cancer.

[Figure: panels for 2013 and 2000]


Above is the trend of Asthma cases from 1990 to 2010. Over these 21 years, Asthma cases are almost stable, with a low point in 1998 and a peak in 2003. Asthma is a chronic respiratory disease characterized by episodes or attacks of inflammation and narrowing of the small airways. It can be caused by allergens (e.g., pollen), infections, exercise, changes in the weather, and exposure to airway irritants (e.g., tobacco smoke). Because it is a chronic disease that can be triggered by many factors, we cannot conclude that the weather or air quality in 1998 or 2003 differed markedly from other years. Overall, Asthma cases remain very steady over these 21 years.

Above is the trend of cancer cases from 1990 to 2010. We can see that cancer cases are dropping significantly. Cancer therapy has developed very fast over the last 20 years, in both chemical and physical therapy. Many biotech companies are now developing more advanced technologies, such as CAR-T cell therapy and targeted cancer therapies, to treat cancer more effectively. Cancer has long been considered one of the deadliest diseases in the world, and many leading researchers in biotech and related industries have focused on it. It appears they have made substantial progress.

Above is the trend of Osteoarthritis (OA) cases from 1990 to 2010. OA cases have clearly grown very fast over the past 21 years. This is reasonable because OA is a very common joint disease. Medical care for osteoarthritis patients in the United States costs $185.5 billion a year, according to a new study. Of that amount, insurers pay $149.4 billion while patients pay $36.1 billion in out-of-pocket costs. According to research by the Arthritis Foundation:

1. One in two adults will develop symptoms of knee OA during their lives.
2. One in four adults will develop symptoms of hip OA by age 85.
3. One in 12 people 60 years or older have hand OA.

So as the total population grows over the years, the number of OA cases will increase significantly. Meanwhile, there are still more female patients than male patients. By 2010, we did not see any sign of a peak, so the number of OA cases will probably keep growing in the next few years. This will be verified in the time series modeling.

Above is the trend of Septicemia cases from 1990 to 2010. Its curve is similar to that of OA, increasing steadily over the years. For Septicemia, however, the female and male case counts are very close. Septicemia is a serious bloodstream infection that must be treated in the hospital. If left untreated, septicemia can progress into sepsis. Treatment for sepsis often involves a prolonged stay in the intensive care unit and complex therapies, which incur high costs. In further research we will therefore pay more attention to this disease, to see how it develops and how it affects the overall cost of health care.


Above is the trend of Dementia cases from 1990 to 2010. In 2010 dollars, the cost associated with dementia treatment and care is estimated to be between $42,000 and $56,000 per person per year. Because about 15% of persons 70 and older have dementia, the total cost in the US is an astounding $157 billion to $215 billion per year. How Dementia develops will therefore affect health care costs significantly. From the above plots, we notice that from 1990 to about 2000 the average number of Dementia cases keeps increasing. After 2000, however, the cases tend to be steady, with some fluctuation.

Next, we would like to find out which diseases occur more frequently among children and teenagers, so we focus on the age group 0_17.

For most diseases, as with Dementia above, the case count in this age group is 0. Diseases usually do not affect children, but there are exceptions.

The above is the plot of cancer and Septicemia cases in age group 0_17. It shows that children and teenagers can also get cancer, even though the chance is very small.

Data used for time series analysis

Why we focus only on age group 55-64: we use only the time series of each disease in age group 55-64, because the US retirement age is 67. People in this age group will soon become a group that is no longer working but suffers from more disease. The health condition of this age group will significantly affect health care costs in the next few years. The data look like this:


Figure 1: Septicemia_55-64

We notice that Septicemia cases in age group 55-64 have increased significantly in the most recent 8 years, so we will conduct a time series analysis for this group.

Figure 2: Dementia_55-64

Dementia is not a big issue in age group 55-64. Thus we will not conduct a time series analysis on Dementia.


Figure 3: Osteoarthritis_55-64

Osteoarthritis looks likely to be a big problem in age group 55-64 over the next few years. We will conduct a time series analysis on Osteoarthritis.

Figure 4: Cancer_55-64

In the above plots, cancer cases keep decreasing, which suggests we will probably spend less on cancer in the future. We will check this in the time series analysis.

Labor Force Population

Analyzing the labor force population data, we can see that the age groups 16-19 and 35-44 shrink from 2000 to 2013. From this we can understand that the aging population is growing relative to the younger population, which supports our problem statement about the baby boomers. Apart from the population size, we can also see a trend of people moving towards management and service-related work, and a drop in sales and production-related occupations. Construction and natural resources work appears stagnant across all the years.

Social Security Tax Table

A study of the wage limit trend shows that the value rises gradually from year to year, with no spikes. A further literature review showed that, although the nominal value of the tax max grew from $3,000 in 1937 to $113,700 in 2013, in inflation-adjusted dollars the tax limit declined from 1937 until the late 1960s and then grew once it was indexed to wage growth in 1975. In wage-adjusted dollars, the tax limit has remained roughly constant since the mid-1980s.

5. Data Preparation

Disease data

Absolute case counts are not useful for predicting the future, because the total population changes over the years. A rising case count alone does not show that a disease has become more prevalent. We therefore join the disease and census tables and calculate cases as a percentage of the total population.

The disease table contains missing values, and imputation does not look reasonable or meaningful. For example, Dementia and Alzheimer's disease occur only in the older age groups, which is why the younger age groups have no values for them. Imputing any value for these age groups would be incorrect, so we leave the missing values as they are: we neither impute them nor let them add noise to the research. For GLM modelling only, we replace them with 0.

Disease and census table join

The age intervals differ between the disease data and the census data. To normalize them, we combined the census age groups so that they match the disease age groups. The final age groups are 0-17, 18-24, 25-44, 45-54, 55-64, 65-74, 75-84, and 85+. In addition, the age column values are coded differently in the two tables, so we updated the age values in the census data to match the disease data before joining. Detailed SQL queries are given in Appendix II.

For modelling purposes and for uniformity, we use only the genders 'male' and 'female' and drop the observations with gender 'all'.
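
The regrouping, join, and percentage calculation described above can be sketched as follows. This is a minimal illustration, not the queries actually used: the toy rows and the names census_raw, disease_raw, age_band, and pct_cases are hypothetical, and the sketch assumes that both population and cases are recorded in thousands.

```sas
/* Hypothetical toy rows; real column names and values may differ. */
data census_raw;
  length age_group $20 gender $6;
  input year age_group $ gender $ pop_thousands;
  datalines;
2005 55_to_59_years Female 9000
2005 60_to_64_years Female 7000
2005 55_to_59_years Male 8500
2005 60_to_64_years Male 6500
;

data disease_raw;
  length age_group $5 gender $6;
  input year age_group $ gender $ cases_thousands;
  datalines;
2005 55-64 Female 320
2005 55-64 Male 300
;

proc sql;
  /* Collapse the 5-year census bands into the disease table's band. */
  create table census_grouped as
  select year, gender,
         case when age_group in ('55_to_59_years', '60_to_64_years')
              then '55-64' else 'other' end as age_band length=5,
         sum(pop_thousands) as pop_thousands
  from census_raw
  group by year, gender, calculated age_band;

  /* Join on year, age group, and gender, then express cases as a
     percentage of the population in the same cell. */
  create table disease_pct as
  select d.year, d.age_group, d.gender,
         100 * d.cases_thousands / c.pop_thousands as pct_cases
  from disease_raw as d
       inner join census_grouped as c
       on d.year = c.year
          and d.age_group = c.age_band
          and d.gender = c.gender;
quit;

proc print data=disease_pct;
run;
```

With these toy numbers the two cells are 320 / 16,000 and 300 / 15,000, so both rows work out to 2 percent.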

After these manipulations, we calculate the percentage of disease cases in the total population for each year and each age group. This percentage shows how widely the disease is distributed in the total population. The raw data are then transposed so that they can be used in the time series analysis. Below is a sample of the transformed data.

Above is the time series data grouped by disease and gender. For example, Asthma_All is the sum of the case counts over all age groups for both sexes.

Above is the time series data grouped by disease and gender for age group 0-17. For example, Asthma_All is the sum of the case counts for both sexes in age group 0-17.

Above is the time series data grouped by disease and gender for age group 55-64. For example, Asthma_All is the sum of the case counts for both sexes in age group 55-64.

Labor force population

As with the disease dataset, the age intervals of the labor force and census data differ, so we normalized the census dataset to match the labor force dataset. The age intervals are: 16 to 19 years, 20 to 24 years, 25 to 34 years, 35 to 44 years, 45 to 54 years, 55 to 64 years, and 65 years and over.

Labor force and census table join

The census population is merged so that its age intervals fall into the groups listed above. The gender 'both sexes' is used for modelling, because the labor force data have no gender variable. After these manipulations, we calculate the percentage of the labor force in the total population for each year and each age group. This value shows the trend in how the labor force is distributed across the various sectors. Below is an example of the resulting data.

Time series modeling needs the data as a series of observations at different points in time. For a consistent layout, and to take advantage of SAS macros, we reshaped the data into wide format: 13 rows, one per year, with the population of each age group in its own column. Below is a sample of the reformatted data.
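
A long-to-wide reshaping of this kind can be sketched with PROC TRANSPOSE. This is only an illustration under assumptions: the dataset and variable names (labor_long, industry, age_code, labor_pop, series) are hypothetical, the S rows are made-up values, and the column names follow the letter-plus-age-code pattern (for example M2534) that the time series macros in the appendix expect.

```sas
/* Hypothetical long-format rows: one row per year, industry letter,
   and age code. */
data labor_long;
  length industry $1 age_code $4;
  input year industry $ age_code $ labor_pop;
  datalines;
2000 M 2534 10982000
2000 S 2534 7100000
2001 M 2534 11107000
2001 S 2534 7000000
;

/* Build the wide column name from industry letter + age code. */
data labor_named;
  set labor_long;
  length series $5;
  series = cats(industry, age_code);
run;

proc sort data=labor_named;
  by year;
run;

/* One row per year, one column per series (M2534, S2534, ...). */
proc transpose data=labor_named out=labor_wide(drop=_name_);
  by year;
  id series;
  var labor_pop;
run;
```

Because every series becomes a column with a predictable name, a single macro variable such as `%let case=M2534;` is enough to point the same PROC ARIMA code at any of them.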

The columns are coded as follows:

First Letter Industry:
M - Management, professional, and related occupations
N - Natural resources, construction, and maintenance occupations
P - Production, transportation, and material moving occupations
S - Sales and office occupations
T - Service occupations

Age Groups Coding:
1619 - 16_to_19_years
2024 - 20_to_24_years
2534 - 25_to_34_years
3544 - 35_to_44_years
4554 - 45_to_54_years
5564 - 55_to_64_years
6599 - 65_years_and_ove

This coding let us develop a single, efficient program that runs the time series analysis on all the variables.

Social Security Tax

To predict when the tax benefit paid out will exceed the Social Security fund, and given the limitations of the data provided, we made the following assumptions:

- All non-working individuals over 65 will receive tax benefit.
- The annual mean wage of each occupation will not change through the years (data acquired in 2014).
- The social security tax rate = 0.124 will not change.
- The Social Security fund before 2000 is 0.

The Labor Force Population data, Occupation and Pay data, and Social Security data are used to estimate the tax collected each year and the tax benefit paid out each year. Finally, the formatted and projected labor force population data are used to examine the Tax and Social Security policies.

Estimation of collected Taxes

Using the 2013 wage limit (assumed not to change) and past occupation and pay data, we calculate the annual Taxable Wage from the Annual Mean Wage. Multiplying it by 12.4% gives the average Annual Collected Tax per person in each occupation.
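
The per-occupation calculation can be sketched as below. It is a minimal example under stated assumptions: the occupations and wages are hypothetical, and the Taxable Wage is assumed to be the Annual Mean Wage capped at the wage limit, which is how a taxable maximum normally works but is not spelled out in the text.

```sas
/* Hypothetical occupations and wages, for illustration only. */
data occ_tax;
  length occupation $12;
  input occupation $ annual_mean_wage;
  wage_limit = 113700;    /* 2013 taxable maximum, from the text */
  tax_rate   = 0.124;     /* combined employee + employer rate   */
  /* Assumption: wages above the limit are not taxed. */
  taxable_wage         = min(annual_mean_wage, wage_limit);
  annual_collected_tax = tax_rate * taxable_wage;
  datalines;
Management 120000
Sales 40000
Service 25000
;

/* Unweighted mean across occupations, as the report does. */
proc means data=occ_tax mean;
  var annual_collected_tax;
run;
```

In this toy data only the Management wage exceeds the limit, so its tax is 0.124 * 113,700 = $14,098.80 rather than 0.124 * 120,000.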

(Creating Annual Collected Tax variable)

We then take the mean of the Annual Collected Tax across all occupations as the average amount of tax the government collects from a working person and his or her employer in a working year, and multiply it by the labor force population to get the total taxes collected in a specific year:

Annual Collected Tax = $6,653.1111
Total Collected Taxes (of a specific year) = Labor Force Population * Mean of Annual Collected Tax

Estimation of Tax Benefits

Since we assumed that the mean wage does not change, we can calculate the tax benefit using the formula provided by the current government (Appendix).
- AIME is the Average Indexed Monthly Earnings, which we assume equals Annual Mean Wage / 12

=> We can get the Annual Tax Benefit (Paid out) for each occupation:

Unfortunately, we do not have data on the retired population in each industry. We therefore take the mean of the Annual Tax Benefit (paid out) across occupations as the average tax benefit a pensioner receives in each year of retirement.
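
As a rough sketch of the benefit calculation, the block below applies the standard three-tier Social Security benefit formula (90%, 32%, and 15% of successive slices of AIME). The bend points of $816 and $4,917 are assumed here to be the 2014 values; the report's own formula is in its appendix and may use different figures. The occupations, wages, and variable names are hypothetical.

```sas
/* Hypothetical wages; bend points are assumed 2014 values. */
data occ_benefit;
  length occupation $12;
  input occupation $ annual_mean_wage;
  bend1 = 816;
  bend2 = 4917;
  aime  = annual_mean_wage / 12;   /* simplification used in the text */
  /* Monthly benefit: 90% / 32% / 15% of the three AIME slices. */
  pia = 0.90 * min(aime, bend1)
      + 0.32 * max(min(aime, bend2) - bend1, 0)
      + 0.15 * max(aime - bend2, 0);
  annual_benefit = 12 * pia;
  datalines;
Management 120000
Sales 40000
Service 25000
;

/* Unweighted mean across occupations, used as the per-retiree benefit. */
proc means data=occ_benefit mean;
  var annual_benefit;
run;
```

Because the formula replaces a smaller share of income at higher wages, the mean benefit depends on the mix of occupations, which is why the lack of retired population counts by industry matters.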

=> Annual Tax Benefit = $21,203.496
Total Tax Benefit (Paid out) (in a specific year) = Retiring Population * Mean of Annual Tax Benefit (Paid out)

6. Modelling

Research question #1: Trend of disease

Time series analysis: We applied AR(1), AR(2), MA(1), MA(2), ARIMA(1,0,1), and ARIMA(1,1,1) models to see how disease cases change from year to year.

Cancer in age group 55-64

Above is the output of ARIMA(1,1,1). All the other time series models give similar results. The model does not work well, since it generates a large AIC (175.9) and BIC (178.9), so it is not a valid forecasting model. However, the forecasts can still be used to estimate the trend of cancer cases, which is downward. We therefore suggest that cancer cases will decrease in the next few years, and so will the health care cost of cancer.
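
For reference, an ARIMA(1,1,1) model for a yearly case series can be written in backshift notation as below. The symbols are notation introduced here for illustration, not taken from the report: $y_t$ is the case count in year $t$, $B$ is the backshift operator, $\mu$ is the mean of the differenced series, $\phi_1$ and $\theta_1$ are the autoregressive and moving-average coefficients, and $\varepsilon_t$ is white noise.

```latex
\[
(1 - \phi_1 B)\,\bigl[(1 - B)\,y_t - \mu\bigr] = (1 - \theta_1 B)\,\varepsilon_t,
\qquad B\,y_t = y_{t-1}
\]
\[
\mathrm{AIC} = -2 \ln L + 2k
\]
```

Here $L$ is the maximized likelihood and $k$ the number of estimated parameters. AIC and BIC are most informative when compared across candidate models fitted to the same data, where the lower value indicates the better trade-off between fit and complexity.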

Osteoarthritis

Above is the output of ARIMA(1,1,1). All the other time series models give similar results. The model does not work well, since it generates a large AIC (181.36) and BIC (184.2), so it is not a valid forecasting model. However, the forecasts can still be used to estimate the trend of Osteoarthritis cases, which is flat. We therefore suggest that Osteoarthritis cases will probably not change much in the next few years, and neither will the health care cost of Osteoarthritis.

Septicemia

Above is the output of ARIMA(1,1,1). All the other time series models give similar results. The model does not work well, since it generates a large AIC (132.71) and BIC (132.54), so it is not a valid forecasting model. However, the forecasts can still be used to estimate the trend of Septicemia cases, which is upward. We therefore suggest that Septicemia cases will probably increase, and so will the health care cost of Septicemia.

Dementia

Above is the output of ARIMA(1,1,1). All the other time series models give similar results. The model does not work well, since it generates a large AIC (101.1) and BIC (103.9), so it is not a valid forecasting model. However, the forecasts can still be used to estimate the trend of Dementia cases, which is upward. We therefore suggest that Dementia cases will probably increase in the next few years, and so will the health care cost of Dementia.

Research question #2: Trend of labor force population

Models

Suppose that a response variable $Y$ can be predicted by a linear function of a regressor variable $X$. You can estimate $\beta_0$, the intercept, and $\beta_1$, the slope, in $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$. We assumed that the working population depends on the total population and used PROC REG to score the future working population by age group for each industry. The model was not good: it produced negative numbers for the 16-19 age group, and the trend was static.

The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data by using the autoregressive integrated moving-average (ARIMA) or autoregressive moving-average (ARMA) model. We used ARIMA models to predict the future working population, but the basic assumptions of time series modeling are not fulfilled. Such models need a large amount of historical data and produce forecasts for the next few points (t+1, t+2, t+3), whereas we have only 12 years of past data and want an estimate 45 years into the future. PROC X11, which is based on the US Census estimation logic, faces the same problem.
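
One common way to score future values with PROC REG is to append the future rows with a missing response, as sketched below. The datasets, variable names, and numbers (hist, future, total_pop, working_pop) are hypothetical and only illustrate the mechanism; they are not the report's data.

```sas
/* Hypothetical history: total population and working population. */
data hist;
  input year total_pop working_pop;
  datalines;
2009 1000 410
2010 1010 412
2011 1020 416
2012 1030 419
2013 1040 421
;

/* Future rows carry a projected total_pop but a missing response. */
data future;
  input year total_pop;
  working_pop = .;
  datalines;
2014 1050
2015 1060
;

data to_score;
  set hist future;
run;

/* Rows with a missing response are left out of the fit but still
   receive a predicted value in the OUTPUT data set. */
proc reg data=to_score;
  model working_pop = total_pop;
  output out=scored p=predicted;
run;
quit;
```

A straight line fitted this way keeps extrapolating at a constant slope, so for a shrinking group it will eventually cross zero, which is consistent with the negative values reported for the 16-19 age group.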

Research question #3: Estimation of the year in which the Tax Benefit paid out exceeds the Collected Taxes

Step 1: Given the projected labor force and retiring population data, we created a variable Fund, which represents the government's Social Security fund.
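
The Fund variable is a running balance. Written as a recurrence, with notation introduced here for illustration ($L_t$ for the labor force population and $R_t$ for the retiring population in year $t$, and $t_0$ for the first year in the data), the Social Security code in the appendix computes:

```latex
\[
\mathrm{Fund}_t = \mathrm{Fund}_{t-1} + x\,L_t - y\,R_t,
\qquad \mathrm{Fund}_{t_0 - 1} = 0
\]
\[
x = 6653.1111 \ \text{(mean annual collected tax per worker)}, \qquad
y = 21203.496 \ \text{(mean annual benefit per retiree)}
\]
```

The year of interest is the first $t$ for which $\mathrm{Fund}_t < 0$.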

The result shows that, under the current Tax and Social Security policies, the Social Security fund becomes negative from 2038, which means the government will not have enough money to pay benefits to recipients after 2038.

Step 2: Assessing the effects of changing policies

2.1. Increase the Tax Limit by 10 per cent (Tax Limit = 113,700*1.1). Applying the same procedure to calculate the average annual tax collected from one working person gives Annual Collected Tax = $6,701.6042. Changing X to 6701.6042 in the previous program gives:

Conclusion: If we increase the Tax Limit by 10 per cent and keep everything else constant, the Social Security fund will not have money to pay benefits to recipients after 2039.
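
Rather than editing X by hand for each policy, the fund calculation can be wrapped in a macro with the per-worker tax as a parameter. The sketch below assumes a projection table with the columns year, laborforce, and retiring_pop, as in the Social Security code in the appendix; the macro name fund_scenario, the deficit flag, and the toy rows are hypothetical.

```sas
/* Hypothetical projections; real inputs come from the projected
   labor force and retiring population tables. */
data pop_proj;
  input year laborforce retiring_pop;
  datalines;
2030 1000 300
2031 1000 320
2032 1000 340
;

/* x = average annual tax collected per worker,
   y = average annual benefit paid per retiree. */
%macro fund_scenario(x=, y=21203.496, out=);
  data &out;
    set pop_proj;
    retain fund 0;
    fund = fund + (&x * laborforce - &y * retiring_pop);
    deficit = (fund < 0);   /* 1 once the running balance is negative */
  run;

  proc sql;
    select min(year) as first_deficit_year
    from &out
    where deficit = 1;
  quit;
%mend fund_scenario;

%fund_scenario(x=6653.1111, out=base)
%fund_scenario(x=6701.6042, out=limit_up10)
```

Each call writes its own output table, so the baseline and the policy scenarios can be compared side by side.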

2.2. Increase the Payroll Tax Rate for employee/employer by 1 percentage point (tax rate = 12.4 + 1 = 13.4). Applying the same procedure to calculate the average annual tax collected from one working person gives Annual Collected Tax = $7,189.6523.
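
This figure can be checked directly. Since the Annual Collected Tax of every occupation is the tax rate times an unchanged Taxable Wage, the mean scales in proportion to the rate:

```latex
\[
6653.1111 \times \frac{13.4}{12.4} \approx 7189.65
\]
```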

Conclusion: If we raise the Tax Rate by 1 percentage point and keep everything else constant, the government's Social Security fund will have enough money to pay recipients until 2052.

7. CONCLUSIONS

Healthcare

The cost of health care for Cancer is likely to fall over the next several years, and the cost of Osteoarthritis is not expected to change much. However, Dementia and Septicemia will cost more, since their case counts appear to be increasing from year to year. We therefore suggest that the government collect more detailed data on these two diseases to forecast how the cost will change, for example how much each patient costs at different phases of the disease.

Social Security Revenue

In the next 16 years, the entire boomer cohort will reach retirement age, and the U.S. government's Social Security system will probably face serious challenges in paying benefits to recipients from then on. In this paper, we examined the changing demographic and age structure in the U.S., as well as the effects of policy changes on the U.S. Tax and Social Security system. Our model predicted that, under current policies, the Social Security fund will run out of money after 2038. That prediction makes sense, since a large working population will have retired by that time. Next, we tested how policy changes might affect the Social Security system. Our model suggested that raising the Tax Limit (threshold) by 10% makes little difference (the fund runs out in 2039), whereas increasing the Tax Rate for employee/employer by 1 percentage point (0.5 percentage points each for employee and employer) keeps the Social Security fund in positive balance until 2052. Since many other factors can affect the Tax and Social Security system, this project cannot conclude which policy is best. The U.S. government might take its results as suggestions when building development strategies.

8. References

1. "Social Security Facts" by James D. Agresti and Stephen F. Cardone, Just Facts, January 27, 2011. Revised 4/6/2016. <justfacts.com/socialsecurity.asp>
2. https://www.ssa.gov/policy/docs/policybriefs/pb2011-02.html
3. http://health.usnews.com/health-news/family-health/pain/articles/2009/11/30/osteoarthritis-costs-us-over-185-billion-a-year
4.
5. http://www.healthline.com/health/septicemia#Overview1

I. Appendix I

Disease data

libname shootout 'M:\wga\shoot out';

proc contents data=shootout.census_population;
run;

proc contents data=shootout.disease_cases;
run;

proc print data=census(obs=9);
  * var year;
run;

/* CHAR columns default to length 8 in PROC SQL, which would truncate
   'both sexes' and the age group labels, so widths are given explicitly. */

/* Census rows for both sexes, excluding 2011-2013. */
proc sql;
  create table shootout.census_bothsex
    (year char, gender char(10), age char(20), population num);
  insert into shootout.census_bothsex (year, gender, age, population)
    select date, gender, age_group, Population_in_thousands
    from shootout.census_population
    where date not in ('2013','2012','2011') and gender='both sexes';
quit;

/* Census rows for all genders, excluding 2011-2013. */
proc sql;
  create table shootout.census
    (year char, gender char(10), age char(20), population num);
  insert into shootout.census (year, gender, age, population)
    select date, gender, age_group, Population_in_thousands
    from shootout.census_population
    where date not in ('2013','2012','2011');
quit;

/* Census totals over all ages, excluding 2011-2013. */
proc sql;
  create table shootout.census_allage
    (year char, gender char(10), age char(20), population num);
  insert into shootout.census_allage (year, gender, age, population)
    select date, gender, age_group, Population_in_thousands
    from shootout.census_population
    where date not in ('2013','2012','2011') and age_group='Population_all_ages';
quit;

/* Same as above with a numeric year; date is character, so convert it. */
proc sql;
  create table shootout.test
    (year num, gender char(10), age char(20), population num);
  insert into shootout.test (year, gender, age, population)
    select input(date, 4.), gender, age_group, Population_in_thousands
    from shootout.census_population
    where date not in ('2013','2012','2011') and age_group='Population_all_ages';
quit;

proc print data=shootout.test(obs=8);
run;

proc print data=shootout.census(obs=9);
run;

data disease;
  set shootout.disease_cases;
run;

/* census regroup by age_group: census1 */
data shootout.census1;
  set shootout.census_population;
run;

proc sql;
  /* Recode gender to match the disease table. */
  update shootout.census1
    set gender = case
      when gender='both sexes' then 'All'
      when gender='female' then 'Female'
      when gender='male' then 'Male'
    end;

  /* Map the census age bands onto the disease table's age groups.
     Duplicate, unreachable WHEN branches have been removed. */
  update shootout.census1
    set Age_group = case
      when Age_group='00_to_04_years' then '0-17'
      when Age_group='5_to_13_years' then '0-17'
      when Age_group='14_to_17_years' then '0-17'
      when Age_group='18_to_24_years' then '18-24'
      when Age_group='25_to_29_years' then '25-44'
      when Age_group='30_to_34_years' then '25-44'
      when Age_group='35_to_39_years' then '25-44'
      when Age_group='40_to_44_years' then '25-44'
      when Age_group='45_to_49_years' then '45-54'
      when Age_group='50_to_54_years' then '45-54'
      when Age_group='55_to_59_years' then '55-64'
      when Age_group='60_to_64_years' then '55-64'
      when Age_group='65_to_69_years' then '65-74'
      when Age_group='70_to_74_years' then '65-74'
      when Age_group='75_to_79_years' then '75-84'
      when Age_group='80_to_84_years' then '75-84'
      when Age_group='85_years_and_over' then '85+'
      else 'else'
    end;

  /* Drop the bands that have no counterpart in the disease table. */
  delete from shootout.census1
    where age_group='else';
quit;

proc sql;
  select * from shootout.census1
  order by age_group, date;
quit;

proc means data=shootout.census1 sum;
  class date gender age_group;

  var Population_in_thousands;
run;

data SHOOTOUT.disease;
  set SHOOTOUT.DISEASE_CASES;
  date = put(year, z4.);
run;

* proc contents data=SHOOTOUT.disease;
* run;

proc sql;
  alter table SHOOTOUT.disease drop year;
quit;

proc print data=SHOOTOUT.disease(obs=10);
run;

/* end of update */
/* join two tables */

/* scensus2 is kept as written; it is not created in the code shown. */
data shootout.census3;
  set scensus2;
run;

data shootout.disease;
  set SHOOTOUT.DISEASE_CASES;
  Age_group = put(age, $5.);
  date = put(year, z4.);        /* character, to match date in census1 */
  gender = input(gender, $6.);
  case = input(Cases_in_1000s, comma12.);
run;

proc sort data=shootout.disease;
  by date age_group gender;
run;

proc sort data=shootout.census1;
  by date age_group gender;
run;

proc print data=shootout.census1(obs=4);
run;

data two;
  merge shootout.census1 shootout.disease;
  by date age_group gender;
run;

proc print data=shootout.census1(obs=8);
run;

proc sql;
  select *
  from SHOOTOUT.disease natural join shootout.census2;
quit;

/* The same join written out on the three key columns. */
proc sql;
  create table one as
  select *
  from shootout.disease
       left join shootout.census1
       on  disease.gender = census1.gender
       and disease.Age_group = census1.Age_group
       and disease.date = census1.date;
quit;

proc print data=one(obs=3);
run;

/* data exploration */

/* Define the titles */
title1 "Currency Percent Change Against the U.S. Dollar";
title2 "Quarterly Data - Baseline 1/1/2008";

/* Define the axis characteristics */
axis1 label=none
      order=('01Jan2008'd to '01Jan2013'd by year)
      offset=(1,1)pct;
axis2 label=(angle=90 "Percent Change") minor=(n=1);

/* Define the symbol characteristics */
symbol1 interpol=join color=vibg height=14pt font='Arial' value='80'x; /* Euro */
symbol2 interpol=join color=mob height=14pt font='Arial' value='A3'x; /* Pound */
symbol3 interpol=join color=depk height=14pt font='Arial' value='A5'x; /* Yen */

/* Define the legend */
legend1 repeat=1 shape=symbol(5,2.5) label=none frame;

/* Create the graph */
proc gplot data=shootout.census2;
  plot (Euro Pound Yen)*YearQuarter / overlay legend=legend1 haxis=axis1 vaxis=axis2;
  format YearQuarter year4.;
run;
quit;

Labor force data

libname sasshoot 'D:\3spring2016\SASHOOT';

/* Rebuild the census population in the labor force age groups. */
proc sql;
  drop table sasshoot.NewCensus;
  create table sasshoot.NewCensus
    (year numeric, age_group char(16), pop numeric);

  /* 16-19 is approximated as 4/5 of the 15-19 census band. */
  insert into sasshoot.NewCensus
    select date, '16_to_19_years', population_in_thousands * 0.8
    from sasshoot.census_population
    where gender='both sexes' and age_group = ('15_to_19_years');

  insert into sasshoot.NewCensus
    select date, age_group, population_in_thousands * 1.0
    from sasshoot.census_population
    where gender='both sexes' and age_group = ('20_to_24_years');

  insert into sasshoot.NewCensus
    select date, '25_to_34_years', sum(population_in_thousands)
    from sasshoot.census_population
    where gender='both sexes'
      and (age_group = ('25_to_29_years') or age_group = ('30_to_34_years'))
    group by date;

  insert into sasshoot.NewCensus
    select date, '35_to_44_years', sum(population_in_thousands)
    from sasshoot.census_population
    where gender='both sexes'
      and (age_group = ('35_to_39_years') or age_group = ('40_to_44_years'))
    group by date;

  insert into sasshoot.NewCensus
    select date, '45_to_54_years', sum(population_in_thousands)
    from sasshoot.census_population
    where gender='both sexes'
      and (age_group = ('45_to_49_years') or age_group = ('50_to_54_years'))
    group by date;

  insert into sasshoot.NewCensus
    select date, '55_to_64_years', sum(population_in_thousands)
    from sasshoot.census_population
    where gender='both sexes'
      and (age_group = ('55_to_59_years') or age_group = ('60_to_64_years'))
    group by date;

  insert into sasshoot.NewCensus
    select date, age_group, population_in_thousands
    from sasshoot.census_population
    where gender='both sexes'

      and age_group = ('65_years_and_over');
quit;

/* Recode the labor force age groups to the census naming.
   '65_years_and_ove' is the 16-character truncation stored in NewCensus. */
proc sql;
  update sasshoot.labor_force_population set age_group = '16_to_19_years' where age_group='16-19years';
  update sasshoot.labor_force_population set age_group = '20_to_24_years' where age_group='20-24years';
  update sasshoot.labor_force_population set age_group = '25_to_34_years' where age_group='25-34years';
  update sasshoot.labor_force_population set age_group = '35_to_44_years' where age_group='35-44years';
  update sasshoot.labor_force_population set age_group = '45_to_54_years' where age_group='45-54years';
  update sasshoot.labor_force_population set age_group = '55_to_64_years' where age_group='55-64years';
  update sasshoot.labor_force_population set age_group = '65_years_and_ove' where age_group='65years and over';
quit;

/* Join labor force to census and compute the labor force percentage. */
proc sql;
  drop table sasshoot.NewLFP;
  create table sasshoot.NewLFP as
    select a.*,
           pop*1000 as TotalPop,
           Labor_Force_Pop / (pop * 1000) * 100 as Percentage
    from sasshoot.labor_force_population a, sasshoot.NewCensus b
    where a.year=b.year and a.age_group=b.age_group;
quit;

Time series SAS code

libname shootout 'D:\3spring2016\SASHOOT';

/* Driver: submit the macro definitions that follow before running these calls. */
%let case=M2534;
title 'M2534';
%ar1;
%ar2;
%ma1;
%arima101;
%datats;
%arima111;
footnote 'end';

********************************************;
*ARIMA(1,0,0) or AR(1);
%macro ar1;
proc arima data=shootout.Labor_force_ts;
  identify var=&case(1);
  estimate p=1 method=ml;
  forecast lead=12 id=year out=results;
run;
%mend ar1;

*ARIMA(2,0,0);
%macro ar2;
proc arima data=shootout.Labor_force_ts;
  identify var=&case(1);
  estimate p=2;
  forecast lead=12 id=year out=results;
run;
%mend ar2;

*ARIMA(0,0,1);
%macro ma1;
proc arima data=shootout.Labor_force_ts;

  identify var=&case(1);
  estimate q=1;
  forecast lead=12 id=year out=results;
run;
%mend ma1;

*ARIMA(1,0,1);
%macro arima101;
proc arima data=shootout.Labor_force_ts;
  identify var=&case(1);
  estimate p=1 q=1;
  forecast lead=12 id=year out=results;
run;
%mend arima101;

/* Add the first difference of the selected series. */
%macro datats;
data ts;
  set shootout.Labor_force_ts;
  dif=dif(&case);
run;
%mend datats;

*ARIMA(1,1,1);
%macro arima111;
proc arima data=ts;
  identify var=dif(1);
  estimate p=1 q=1;
  forecast lead=12 id=year out=results;
run;
%mend arima111;

PROC X11 SAS Code

/* Each annual value is repeated four times to form a quarterly series. */
data sales;
  input sales @@;
  date = intnx( 'month', '01jan2000'd, (_n_-1)*3 );
  format date monyy7.;
/*
title 'Monthly Labor Force Data (in 1000) in Management, professional, and related occupations Age 35-44';
datalines;
3355500 3355500 3355500 3355500
3360500 3360500 3360500 3360500
3289500 3289500 3289500 3289500
3251500 3251500 3251500 3251500
3250750 3250750 3250750 3250750
3252750 3252750 3252750 3252750
3265000 3265000 3265000 3265000
3318000 3318000 3318000 3318000
3321000 3321000 3321000 3321000
3193250 3193250 3193250 3193250
3113500 3113500 3113500 3113500
3143500 3143500 3143500 3143500
3202500 3202500 3202500 3202500
3248250 3248250 3248250 3248250
;
*/
datalines;
11575250 11575250 11575250 11575250
11760750 11760750 11760750 11760750

11795000 11795000 11795000 11795000
11982250 11982250 11982250 11982250
12133000 12133000 12133000 12133000
12311250 12311250 12311250 12311250
12604750 12604750 12604750 12604750
12947000 12947000 12947000 12947000
13190250 13190250 13190250 13190250
13054500 13054500 13054500 13054500
12935750 12935750 12935750 12935750
13136500 13136500 13136500 13136500
13511000 13511000 13511000 13511000
13678000 13678000 13678000 13678000
;

proc x11 data=sales;
  quarterly date=date;
  var sales;
  tables d11;
run;

title 'Monthly Labor Force Data (in 1000) in Management, professional, and related occupations All Ages';

/* Keep the original (B1) and seasonally adjusted (D11) series. */
proc x11 data=sales noprint;
  quarterly date=date;
  var sales;
  output out=out b1=sales d11=adjusted;
run;

proc sgplot data=out;
  series x=date y=sales / markers
    markerattrs=(color=red symbol='asterisk')
    lineattrs=(color=red)
    legendlabel="original";
  series x=date y=adjusted / markers
    markerattrs=(color=blue symbol='circle')
    lineattrs=(color=blue)
    legendlabel="adjusted";
  yaxis label='Original and Seasonally Adjusted Time Series';
run;

title "Management Labor Force";

data ManAge;
  input date TOTAL A16to19 A20to24 A25to34 A35to44 A45to54 A55to64 A65Abv;
  datalines;
2000 10055000 454000 2606000 10982000 13422000 12201000 5151000 1485000
2001 10055000 453000 2636000 11107000 13442000 12453000 5482000 1470000
2002 10055000 410000 2581000 10781000 13158000 12705000 6005000 1540000
2003 10055000 354000 2624000 10822000 13006000 12889000 6561000 1673000
2004 10055000 345000 2526000 10793000 13003000 13118000 6988000 1759000
2005 10055000 355000 2587000 10890000 13011000 13122000 7386000 1894000
2006 10055000 358000 2636000 11023000 13060000 13568000 7800000 1974000
2007 10055000 359000 2763000 11428000 13272000 13659000 8304000 2003000
2008 10055000 318000 2815000 11623000 13284000 13824000 8659000 2238000
2009 10055000 303000 2679000 11407000 12773000 13625000 8954000 2477000
2010 10055000 308000 2522000 11365000 12454000 13330000 9176000 2588000
2011 10055000 290000 2611000 11643000 12574000 13277000 9383000 2768000
2012 10055000 339000 2690000 11894000 12810000 13329000 9908000 3074000
2013 10055000 326000 2777000 12084000 12993000 13206000 10055000 3271000

;

title 'Monthly Labor Force Data (in 1000) in Management, professional, and related occupations ';

/* date holds the calendar year as a plain number, so the axis and
   reference line use year values rather than SAS date literals. */
proc sgplot data=ManAge noautolegend;
  scatter x=date y=TOTAL;
  scatter x=date y=A20to24 / markerattrs=(symbol=asterisk);
  scatter x=date y=A25to34 / markerattrs=(symbol=asterisk color=green);
  scatter x=date y=A35to44 / markerattrs=(symbol=asterisk color=green);
  xaxis values=(2000 to 2013 by 1);
  refline 2000 / axis=x;
run;

Social Security SAS Code

libname thaidata 'H:\Documents';

/* Running Social Security fund balance:
   x = mean annual tax collected per worker,
   y = mean annual benefit paid per retiree. */
data exceeding_year;
  set thaidata.working_retiring_pop;
  x = 6653.1111;
  y = 21203.496;
  retain fund 0;
  fund = fund + (x*laborforce - y*retiring_pop);
run;

proc print data=exceeding_year;
  var year fund;
  format fund dollar20.;
run;