Data Mining in the Pharmaceutical Industry

Introduction

• Data mining is the process of extracting information from large data sets through the use of algorithms and techniques drawn from the fields of statistics, machine learning and database management systems.

• “Mining” means to find something that already exists.

• Therefore, data mining can be defined as a process of

identifying hidden patterns and relationships, and

trends within data.

• Traditional methods often involve:

1) manual work

2) interpretation of data.

• These methods are slow, expensive and highly subjective.

• Data mining, popularly called knowledge discovery in large data, enables organizations to make calculated decisions by:

• assembling

• accumulating

• analyzing and

• accessing corporate data.

[Figure: the knowledge discovery process. Labels in the diagram: Raw Data, Integration, Data Warehouse, Target Data, Transformed Data, Patterns and Rules, Interpretation & Evaluation, Understanding, Knowledge.]

• The scope of pharmaceutical applications is large and it

may involve drug manufacturing processes as well as

data processing.

• Data processing and analysis is a key area in the

pharmaceutical industry.

• Data mining can help achieve a vision of the pharmaceutical industry in which pharmaceutical companies deliver drugs and develop test kits (including genetic tests) and computer programs to deliver the best drug to the patient.

• Pharmaceutical companies can also apply data mining methods to huge masses of genomic data to predict how a patient’s genetic makeup determines his or her response to a drug therapy.

• Genomic data: the complete set of chromosomal and extrachromosomal genes of an organism, a cell, an organelle or a virus; the complete DNA component of an organism.

Data mining uses a variety of tools, such as:

• Query and reporting tools.

• Analytical processing tools: used to analyze information from multiple database systems at one time.

• Decision support system (DSS) tools: interactive computer-based systems intended to help decision makers use data and models to identify problems, solve problems and make decisions.

DATA MINING TECHNIQUES.

•Many organizations generate mountains of data about their newly discovered drugs, their performance reports, etc.

•This data is a strategic resource. Making the most of these strategic resources will improve the quality of pharma industries.

• The data mining process has six important steps:

1. Problem Definition.

2. Knowledge acquisition.

3. Data selection.

4. Data Preprocessing.

5. Analysis and Interpretation.

6. Reporting and Use.

The data mining process can also be described as:

1. Definition of the objectives of the analysis.

2. Selection and pretreatment of the data.

3. Exploratory analysis.

4. Specification of the statistical methods.

5. Analysis of the data.

6. Evaluation and comparison of methods.

7. Interpretation of the chosen model.

1. Definition of the objectives of the analysis.

Understanding the project objectives and

requirements from a business perspective and then

converting this knowledge into a data mining

problem definition with a preliminary plan

designed to achieve the objectives.

Relevant data sources for the pharma industry are:

•clinical data (patient data, pharmaceutical data,

medical treatments, length of stay);

•administrative data (staff skills, overtime, nursing

care hours, staff sick leave);

• financial data (treatment costs, drug costs, staff

salaries, accounting, cost-effectiveness studies); and

• organizational data (room occupation, facilities,

equipment).

Data mining is used to support:

•The clinicians at the point of care delivery;

•The controlling of clinical treatment pathways;

•The administrative and management tasks; and

•Efficient management of organizational and

financial data.

Associations, Mining Frequent Patterns.

• These methods identify rules of affinity among collections of items.

• Rules of affinity are relationships among data items.

• The patterns of interest are those that occur frequently in the data.

• The applications of association rules include:

• market basket analysis

• attached mailing in direct marketing

• fraud detection

• department store floor/shelf planning, etc.

• association of diseases with drugs

• association and analysis of staff movements

• tracking physicians' adoption of drugs through customers' prescriptions
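
The idea of "rules of affinity" can be made concrete with the two standard measures used in association rule mining, support and confidence. The following is a minimal sketch, assuming a small invented set of prescription records; the drug names, the data and the function names are hypothetical placeholders, not part of the original slides.

```python
from itertools import combinations

# Hypothetical prescription records: each set lists drugs dispensed together.
# The drug names are placeholders, not real products.
prescriptions = [
    {"drug_A", "drug_B"},
    {"drug_A", "drug_B", "drug_C"},
    {"drug_A", "drug_C"},
    {"drug_B"},
    {"drug_A", "drug_B"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    pairs = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            pairs[pair] = s
    return pairs

print(frequent_pairs(prescriptions, min_support=0.5))
print(confidence({"drug_A"}, {"drug_B"}, prescriptions))
```

With this toy data, only the pair (drug_A, drug_B) is frequent at a 0.5 support threshold. Real association miners such as Apriori extend the same counting to larger itemsets and prune the search.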

Classification And Prediction.

• Classification and prediction are two data analysis techniques that are used to describe data classes and predict future data classes.

• E.g. a credit card company whose customers' credit history is known can classify each customer record as Good, Medium, or Poor.

•Predicting consumer behavior

•Predicting the likelihood of success in a drug

adoption process

•Predicting the percentage accuracy in performance of

a drug

•Classifying the historical health records

•Predicting which types of drugs are most likely to be retained, most likely to be dropped, and most likely to change their composition.

Predicting pharma product behavior and attitude

•Predicting demand projections by seasonal variations

•Predicting the performance progress of segments

throughout the performance period

•Identifying the best profile for different drugs

•Classify trends of movements through the

organization for successful/unsuccessful patient

historical records

•Categorization of drugs, diseases and patients.

• Classification schemes based on decision trees and neural networks are very useful in the pharma industry.

• Decision trees: a decision tree is a common knowledge representation used for classification.

• In classification, one is given data from a specific instance, and the decision tree predicts, based on the data, to which of two or more classes the instance belongs.

• Each instance contains data from multiple attributes.

• The tree is built from training instances: collections of previously acquired data that have been sorted into class labels.

• The learning algorithm builds the tree by determining which tests best divide the training instances into separate classes.
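
One common way to decide which test "best divides" the instances is information gain, the reduction in class entropy after splitting on an attribute. The sketch below assumes this criterion (the slides do not name one) and uses an invented four-patient data set; the attribute names `age_group` and `renal_function` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Attribute whose test best separates the classes (root of the tree)."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))

# Hypothetical training instances: does the patient respond to the drug?
patients = [
    {"age_group": "young", "renal_function": "normal"},
    {"age_group": "old", "renal_function": "normal"},
    {"age_group": "young", "renal_function": "impaired"},
    {"age_group": "old", "renal_function": "impaired"},
]
responds = ["yes", "yes", "no", "no"]

print(best_attribute(patients, responds))
```

A full tree learner would apply `best_attribute` recursively to each resulting subset until the subsets are pure or no attributes remain.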

• Neural Networks

–Learn through training

–Resemble biological networks in structure

–Can produce very good

predictions

–Not easy to use and to

understand

–Cannot deal with

missing data

• Uses a Bayesian neural network.

• The prior probability is the probability that any report contains a reference to the adverse event.

• The posterior probability is the probability that a report has a link between the drug and the adverse event.

• Determines the “strength” of the link between adverse event and drug (called the Information Component, or IC).

• More complicated than it appears: a patient may consume multiple drugs, so which one caused the adverse event?

[Figure: Bayesian neural network linking Drug and Adverse Event; the output is the strength of the link between adverse event and drug.]
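
The Information Component is usually written as the base-2 logarithm of the ratio between the observed joint probability of drug and event and the probability expected if the two were independent, which is the same as log2(posterior / prior). The sketch below is a simplified, count-based estimate under that assumption; it leaves out the Bayesian smoothing a real system would apply, and the report counts are invented.

```python
from math import log2

def information_component(n_both, n_drug, n_event, n_total):
    """Plain frequency estimate of IC = log2(P(drug, event) / (P(drug) * P(event))).

    n_both  : reports mentioning both the drug and the adverse event
    n_drug  : reports mentioning the drug
    n_event : reports mentioning the adverse event
    n_total : all reports in the database
    """
    p_both = n_both / n_total
    p_drug = n_drug / n_total
    p_event = n_event / n_total
    return log2(p_both / (p_drug * p_event))

# Hypothetical counts: the pair is reported four times more often than
# independence would predict, so IC is about 2.
print(information_component(n_both=20, n_drug=100, n_event=50, n_total=1000))
```

An IC near zero means the drug and the event co-occur about as often as chance would predict; a clearly positive IC flags the pair for closer review.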

• Classification works on discrete and unordered data, while prediction

works on continuous data.

• E.g. discrete data: the data set below shows a group of discrete data.

Music format    Number sold
CD albums       140
CD singles      70
Downloads       55
Vinyl           5
Total sales     270

• This is called discrete data because the units of measurement (for example, CDs) cannot be split up; there is nothing between 1 CD and 2 CDs.

• E.g. continuous data: the distances below are continuous data.

Distance in miles: 0.1 0.2 0.6 1.1 1.2 1.8 2.0 2.7 3.4 4.6 6.2 8.0 12.1 14.2

• This data is called continuous because the scale of measurement (distance) has meaning at all points between the numbers given, e.g. we can travel a distance of 1.2 or 1.85 or even 1.632 miles.

• Regression is often used for prediction, as it is a statistical method for numeric prediction.

• Primary emphasis should be placed on the selection, measurement accuracy and predictive efficiency of any new drug discovery.

• Simple or multiple regression is the basic prediction model that enables a decision maker to forecast each criterion status based on predictor information.

• Neural network technology is useful in different areas of business.
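
Simple regression with one predictor can be sketched with the ordinary least-squares formulas for slope and intercept. The example below assumes a single predictor and uses invented dose and response values chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Forecast the criterion for a new predictor value."""
    return slope * x + intercept

# Hypothetical data: dose (predictor) and measured response (criterion).
doses = [1, 2, 3, 4]
responses = [3, 5, 7, 9]

slope, intercept = fit_line(doses, responses)
print(slope, intercept, predict(5, slope, intercept))
```

Multiple regression follows the same principle with several predictors and is normally solved with a linear algebra library rather than by hand.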

CLUSTERING.

• It is a method by which similar records are grouped together.

• Clustering is usually used to mean segmentation.

• An organization can build a hierarchy of classes that group similar events.

• Using clustering, patients can be grouped based on age, name, diseases, etc.

• In business, clustering helps identify groups of similar items and characterize customer groups based on purchasing patterns, etc.
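
Grouping patients by age can be illustrated with k-means, one widely used clustering method. The slides do not say which algorithm is meant, so this is only a sketch under that assumption, restricted to one numeric attribute; the ages and starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one numeric attribute.

    Repeatedly assigns each value to its nearest center, then moves each
    center to the mean of the values assigned to it.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical patient ages and two arbitrary starting centers.
ages = [22, 25, 27, 61, 64, 70]
centers, groups = kmeans_1d(ages, centers=[20, 60])
print(centers)
print(groups)
```

Here the patients fall into a younger and an older segment. With several attributes, the absolute difference would be replaced by a distance over all of them, usually after scaling.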

DATA MINING AND STATISTICS.

• The ability to build a successful predictive model depends on past data.

• Data Mining is designed to learn from

past success and failures and will be

able to predict what will happen

next (future prediction).

• The Data Mining tool checks the

statistical significance of the

predicted patterns and reports.

The difference between Data Mining

and statistics

• Data mining automates the statistical process, which would otherwise require several tools.

• Statistical inference is assumption driven in the

sense that a hypothesis is formed and tested

against data.

• Data Mining, in contrast is discovery driven.

That is, the hypothesis is automatically

extracted from the given data.

Data Mining can answer analytical

questions such as:

• What new molecules have been discovered, and what issues surround them?

• What factors or combinations are directly

impacting the drugs?

• What are the best and outstanding drugs?

• Which drugs are likely to be retained?

• How to optimally allocate resources to ensure

effectiveness and efficiency? etc.

• An intelligent text mining system could provide a platform for extracting and managing specific information at the entity level.

• For example, information pertaining to

• genes

• proteins

• diseases

• organisms

• chemical substances, etc. can be analytically extracted for patterns.

It would also provide insights into interrelationships such as

• protein-protein

• Gene-gene

• Protein-Chemical

• Gene-Disease and

• Drug-Drug interactions.

• Text mining can be applied to biomedical literature, clinical documents and other medical literary sources for data curation and database population in a semi-automated manner.

Applications Of Data Mining In

The Pharmaceutical Industry

• A lot of information is hidden in the legacy

systems.

• This information can easily be extracted.

• Most of the time, however, this cannot be done directly from the legacy systems, because they were not built to answer unpredictable questions.

• A user-interface may be designed to accept all kinds

of information from the user (e.g. weight, sex, age,

foods consumed, reactions reported, dosage, length of

usage).

• Then, based upon the information in the databases

and the relevant data entered by the user,

• a list of warnings or known reactions (accompanied

by probabilities) should be reported.

• Note that user profiles can contain large amounts of

information, and efficient and effective data mining

tools need to be developed to probe the databases for

relevant information.

• Secondly, the patient's (anonymous) profile should be recorded along with any adverse reactions reported by the patient, so that future correlations can be reported.

• Over time, the databases will become much larger, and interaction data for existing medicines will become more complete.

• The amount of existing pharmaceutical information (pharmacological properties, dosages, contraindications, warnings, etc.) is enormous;

• however, this fact reflects the number of medicines on the market, rather than an abundance of detailed information about each product.

One of the major problems with pharmaceutical

data is a lack of information.

• A food and drug administration department has estimated that only about 1% of serious events are reported to it.

• Fear of litigation may be a contributing factor; however, most health care providers simply don't have the time to fill out reports of possible adverse drug reactions.

•Furthermore, it is expensive and time consuming

for pharmaceutical companies to perform a

thorough job of data collection, especially when

most of the information is not required by law.

•Finally, one should note that the food and drug

administration department does not require

manufacturers to test new medicines for potential

interactions.

Four stages of drug development

• Research finds new drugs.

• Development tests and predicts drug behavior.

• Clinical trials test the drug in humans.

• Commercialization takes the drug and sells it to likely consumers (doctors and patients).

APPLICATIONS OF DATA

MINING IN THE

PHARMACEUTICAL INDUSTRY

1) Clinical data analysis – evaluates and streamlines large amounts of information. Data mining helps to see trends, irregularities, and risks during product development and launch.

2) Marketing and sales analysis – the identification of the most profitable products and the allocation of marketing funds. Data mining here helps to examine consumer behavior in terms of prescription renewals and product purchases.

3) Customer analysis – using data mining one can

develop more targeted customer profiles that focus

not only on products, but also on the ability to pay

for them by analyzing historical health trends in

combination with demographics.

4) Physician targeting – target physicians who have high prescription rates of a certain drug or treatment with information on new drugs that treat complementary symptoms or conditions.

DEVELOPMENT OF NEW

DRUGS.

• This can be achieved by clustering the

molecules into groups according to the

chemical properties of the molecules via

cluster analysis.

• Every time a new molecule is discovered, it can be grouped with other chemically similar molecules.

•Mining can help us to measure the chemical activity of a molecule on a specific disease, say tuberculosis, and find out which part of the molecule is causing the action.

•This way we can combine a vast number of molecules into a super molecule that keeps only the specific part of each molecule which is responsible for the action, while inhibiting the other parts.

•This would greatly reduce the adverse effects

associated with drug actions.

• Researchers use high-speed screening to test tens, hundreds, or thousands of drugs very quickly.

• The general goal is to find activity on

relevant genes or to find drug compounds that

have desirable characteristics.

• The data mining techniques used in developing new drugs are clustering, classification and neural networks.

• The basic objective is to determine compounds with similar activity.

• The reason is that compounds with similar activity behave similarly.

• This is possible only when we have a known compound and are looking for something better.

• When we don’t have a known compound but have a desired activity and want to find a compound that exhibits it, data mining comes to the rescue.

DEVELOPMENT TESTS AND

PREDICTS DRUG BEHAVIOR

• Several issues affect the success of a drug and can impact its future development:

1) Adverse reactions to drugs are reported spontaneously and not in any organized manner.

2) We can only compare adverse reactions among the drugs of our own company, not with drugs from competing firms.

3) We only have information on the patient taking the drug, not on the adverse reaction that the patient is suffering from.

Solution

• All of this can be solved by creating a data warehouse for drug reactions and running business intelligence (BI) tools on it.

• BI tools: business intelligence tools are a type of software designed to retrieve, analyze and report data.

• This broad definition includes everything from spreadsheets, visual analytics, and querying software to data mining, warehousing, and decision engineering.

•The drug undergoes testing in animals and human tissue to observe its effect and to determine how much of the drug to consume for the desired effect, or how dangerous the drug is.

•The data mining techniques that can be used here are classification and neural networks.

• The goal here is to predict whether the treatment will aid patients.

• If the drug will not aid patients, it serves no purpose.

• Predicting drug behavior is essential when we have data supporting use of the drug and also have training data that shows the effects of the drug (positive or negative).

• The test should be able to predict which patients will benefit, e.g. which treatment will help sickle cell anemia patients.

How it works

•Information such as gender, body weight, disease state, etc. plays a crucial role.

•This data is fed into a neural network, which predicts whether the patient will benefit from the drug.

•Each training record carries exactly one of two classifications: yes or no.

•The network is trained on the yes classifications and a snapshot is taken of the neural network.

•Then the network is trained on the no classifications and another snapshot is taken.

•The output is yes or no, depending on whether the inputs are more similar to the yes or the no training data.

•E.g. ARTMAP.

[Figure: neural network with inputs Weight, Height, Gender and Blood Pressure and output "Patient Benefits?". There is an array of weights, one for each “template”; the template closest to the input is chosen, and the path of “least resistance” is chosen for the output.]
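
The "closest template" idea can be illustrated with a deliberately simple stand-in: average the yes examples into one template, average the no examples into another, and label a new patient by whichever template is nearer. This is not ARTMAP itself, only a nearest-template sketch of the same intuition; the feature order (weight in kg, height in cm, gender coded 0/1, systolic blood pressure) and all values are invented.

```python
from math import dist

def build_template(examples):
    """Average the feature vectors of one class into a single template."""
    n = len(examples)
    return tuple(sum(column) / n for column in zip(*examples))

def classify(features, templates):
    """Return the label whose template is closest (Euclidean) to the input."""
    return min(templates, key=lambda label: dist(features, templates[label]))

# Hypothetical training data: (weight, height, gender, blood pressure).
benefited = [(60, 165, 0, 120), (65, 170, 1, 125)]
not_benefited = [(95, 175, 1, 160), (100, 180, 0, 165)]

templates = {
    "yes": build_template(benefited),      # snapshot after the yes examples
    "no": build_template(not_benefited),   # snapshot after the no examples
}

print(classify((62, 168, 0, 118), templates))
print(classify((98, 178, 1, 158), templates))
```

In practice the features would be scaled to comparable ranges first, since otherwise attributes with large numeric values dominate the distance.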

CLINICAL TRIALS TEST THE

DRUG IN HUMANS

• The company tests drugs in actual patients on a larger scale.

• The company has to keep track of data about patient progress.

• Because the government wants to protect the health of citizens, many rules govern clinical trials.

• In developed countries, a food and drug administration oversees trials.

• The data mining technique used here can be neural networks.

• Here data is collected by the pharmaceutical company and undergoes statistical analysis to determine the success of the trial.

• Data is generally reported to the food and drug administration department and inspected closely.

• Too many negative reactions might indicate that the drug is too dangerous.

• An example of an adverse event is a medicine causing drowsiness.

• The goal is to detect when too many adverse events occur, or to detect a link between a drug and an adverse event.

• Too many adverse events linked to a drug might indicate that the drug is too dangerous or that the health of patients is at risk.

• Adverse events are reported to the food and drug administration when a link is suspected.

• One can feed information on the adverse events pertaining to drugs into a neural network and let the network indicate what is meant by ‘too many’.

Benefits

• Research Stage – instead of trial and error, data mining can help find drugs that have desirable activity

• Development Stage – data mining can help predict who will benefit from drug

• Clinical Trials Stage – data mining protects patients and helps regulate drug testing

• Commercialization Stage – data mining can optimize use of sales resources like manpower, advertising

CONCLUSION.

• Due to increased computerization and consumer/patient awareness, reporting (via the internet) by health care workers can easily be facilitated.

• Data collection in hospitals and extended care facilities is not

difficult, and this information is of high quality since such

institutions typically have tailored diets for their patients, and

maintain accurate records of treatments, lab tests, and

administration of prescriptions.

• Furthermore, given the popularity of the internet, it is

relatively easy for consumers to voluntarily fill in and submit

detailed profiles of themselves.

•It is observed that data mining techniques are seldom used in a pharmaceutical environment.

•Data mining can help find drugs that have desirable activity and predict who will benefit from a drug.

•Data mining protects patients, helps regulate drug testing, and optimizes the use of sales resources such as manpower and advertising.