Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes Kendra...

Date posted: 21-Dec-2015

Transcript of Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes Kendra...

Page 1: Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes Kendra Baughman York Marahrens’ Lab UCLA.

Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes

Kendra Baughman

York Marahrens’ Lab

UCLA

Page 2: Overview

Goal

Background

Prior Studies

Strategy

Results

Remaining Tasks

Future Directions

Page 3: Goal

Determine if there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

Page 4: Background – Alu Elements

Repetitive sequence

Transposons (DNA sequences that make copies of themselves and insert elsewhere in the genome)

Over 1 million in human genome

~50 subfamilies categorized by sequence differences

Page 5: Prior Studies

“Repetitive sequence environment distinguishes housekeeping genes”

Eller, Daniel et al., submitted

“Alu abundance positively correlates with gene expression level”

C.D. Eller et al., submitted

Page 6: Higher Alu concentration near widely expressed genes

[Figure: percent Alu (y-axis, 0 to 20) for the HK, TS, and RS gene groups; p = 2e-45]

Page 7: Higher Alu concentration near highly expressed genes

Page 8: Alu Subfamilies

[Figure: axes labeled Subfamily and # Alu in the Subfamily]

Page 9: Data

Human gene expression levels from microarray data (Stan Nelson’s lab, UCLA)

Alu information from UCSC Genome Browser, RepeatMasker tracks

Page 10: Goal, reiterated

Determine if there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

Page 11: Strategy

Find Alu “near” high and low expression genes (within 20kb)

Perform multiple sequence alignment on Alu sequences

Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

Page 12: Strategy

Find Alu “near” high and low expression genes (within 20kb)

Perform multiple sequence alignment on Alu sequences

Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

Page 13: Screening the genes…

Used Perl scripts to extract information from MySQL databases

Grouped genes by expression level in R

Chose genes in top and bottom 20%

[Figure: axes labeled Genes and Expression Level]
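
The gene screening step can be sketched in a few lines. The slides name Perl, MySQL, and R for this step; the Python below is only an illustrative stand-in, and the function name `split_by_expression`, the dictionary input, and the rounding of the 20% cutoff are assumptions, not the lab's actual code.

```python
def split_by_expression(expression, fraction=0.2):
    """Return (high, low): gene names in the top and bottom `fraction` by expression.

    `expression` maps gene name -> expression level (hypothetical input format).
    The cutoff count is rounded down, which is an assumption of this sketch.
    """
    ranked = sorted(expression, key=expression.get)  # ascending expression level
    k = int(len(ranked) * fraction)                  # number of genes in each tail
    low = ranked[:k]
    high = ranked[len(ranked) - k:]                  # empty list when k == 0
    return high, low


if __name__ == "__main__":
    levels = {f"gene{i}": float(i) for i in range(10)}
    high, low = split_by_expression(levels)
    print("high:", high)
    print("low:", low)
```

Genes in the middle 60% are simply left out, which mirrors the slide's choice to compare only the two extremes.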

Page 14: Screening the Alu…

Used MySQL queries to determine flanking region

Used Perl scripts to screen

Alu located within 20kb of genes

Omitted Alu in overlapping flanking regions

PERCENTAGES OF ALU THROWN OUT

Flank   Chrom1 1st 20mb   Chrom10   Chrom19 1st 20mb
10kb    3%                6%        20%
20kb    7%                7%        28%
50kb    17%               11%       50%

[Diagram labels: HI-gene, LO-gene; HI-Alu, ??-Alu, LO-Alu]
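
A minimal sketch of the Alu screening rule follows. It assumes each Alu is reduced to a single coordinate and that an Alu is thrown out when it falls within the flank of both a HI and a LO gene (the "??-Alu" case); the slides do not spell out either detail. The names `Gene`, `near`, and `classify_alu` are hypothetical.

```python
from dataclasses import dataclass

FLANK = 20_000  # flanking distance in bp, matching the 20kb on the slide


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    group: str  # "HI" or "LO" expression group


def near(gene, chrom, pos, flank=FLANK):
    """True if position `pos` lies inside the gene or within `flank` bp of it."""
    return gene.chrom == chrom and gene.start - flank <= pos <= gene.end + flank


def classify_alu(chrom, pos, genes, flank=FLANK):
    """Return "HI", "LO", "??" (overlapping HI and LO flanks), or None (not near any gene).

    Assumption of this sketch: only a HI/LO conflict makes an Alu ambiguous.
    """
    groups = {g.group for g in genes if near(g, chrom, pos, flank)}
    if not groups:
        return None
    if len(groups) > 1:
        return "??"
    return next(iter(groups))


if __name__ == "__main__":
    genes = [
        Gene("geneA", "chr1", 100_000, 110_000, "HI"),
        Gene("geneB", "chr1", 140_000, 150_000, "LO"),
    ]
    for pos in (95_000, 125_000, 160_000, 300_000):
        print(pos, classify_alu("chr1", pos, genes))
```

Widening `flank` makes more flanking regions overlap, which is consistent with the larger thrown-out percentages the table reports for 50kb.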

Page 15: Strategy

Find Alu “near” high and low expression genes (within 20kb)

Perform multiple sequence alignment on Alu sequences

Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

Page 16: Alignment Process…

First alignment tool: ClustalW
– Slow, inaccurate

Second alignment tool: T-COFFEE
– Can’t handle hundreds of sequences

Third alignment tool: MUSCLE

Aligning thousands of sequences = big gaps and processing limitations

Chose to analyze by subfamily (S, Sp/q)
– Aligned elements around highly expressed genes
– Aligned elements around poorly expressed genes
– Profile high/low alignment
– Consensus sequence alignment

Page 17: Alignment viewed in Jalview

Page 18: Alignments of Alu Sp/q and AluS Elements

[Figure: alignment panels for AluS and AluSp/q; legend: High Alu, High conserv., Low conserv.]

Page 19: Strategy

Find Alu “near” high and low expression genes (within 20kb)

Perform multiple sequence alignment on Alu sequences

Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

Page 20: Alu consensus sequence and frequency of consensus base

Frequency of consensus base, aligned position by position to each Alu consensus sequence:

AluS

Alu w/ a base:  *5547666896759699995769699999999999*9989979
All Alu:        0444762289674300448576809499545545409449808
High Alu:       TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
Low Alu:        CANCC-CGCCT-CGTAATCCCAA--------AATGTT--TG-G
Alu w/ a base:  77488 66899 67444999995        455645  98 9
All Alu:        76044 55899 37444989894        454045  98 8

AluSp/q

Alu w/ a base:  596**65559458765699999978999999966566******
All Alu:        0860005458443600233333323333333345400000000
High Alu:       TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
Low Alu:        CG---A-AA--------------------CTCCGT--T---CT
Alu w/ a base:  56   5 69                    555655  6   99
All Alu:        55   4 58                    444544  0   77
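
The two frequency rows on this slide ("All Alu" and "Alu w/ a base") can be illustrated with a short sketch. It assumes the input is a set of already aligned, equal-length sequences with `-` as the gap character; the function name `column_consensus` and the return format are hypothetical, and the slide's single-character frequency encoding is not reproduced.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """For each alignment column return (consensus base, freq among all, freq among non-gap).

    `aligned` is a list of equal-length aligned sequences (hypothetical input format).
    Columns made up entirely of gaps are reported as (gap, 0.0, 0.0).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*aligned):
        bases = [c for c in column if c != gap]
        if not bases:
            result.append((gap, 0.0, 0.0))
            continue
        base, count = Counter(bases).most_common(1)[0]
        # count / len(column): fraction of all elements carrying the consensus base
        # count / len(bases):  fraction of elements with any base at this position
        result.append((base, count / len(column), count / len(bases)))
    return result


if __name__ == "__main__":
    toy = ["ACG-", "ACT-", "A-G-", "TCGA"]
    for position, row in enumerate(column_consensus(toy), start=1):
        print(position, row)
```

In the toy alignment the last column has a base in only one sequence, so its consensus frequency is low among all sequences but 1.0 among sequences with a base, which is the kind of contrast the two rows are meant to expose.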

Page 21: Remaining Tasks

Analyze the remaining subfamilies

Determine whether identified motifs agree across subfamilies

BLAST motifs against all Alu sequences and correlate alignment scores with expression level

Page 22: Future Directions

Cluster alignments into a relationship tree to see if HI and LO Alu groups cluster differently from each other
– Create a matrix of pairwise alignments and cluster these into a tree using nearest neighbour clustering

Use Hidden Markov Models or Gibbs sampling to identify sequence motifs (non-multiple sequence alignment method of motif finding)
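
As a rough illustration of the clustering idea, the sketch below builds a tree by nearest neighbour (single-linkage) clustering. It assumes the sequences are already aligned to equal length and uses the fraction of mismatching columns as the pairwise distance; the slides propose a matrix of pairwise alignments instead, so the distance function is a stand-in. The names `mismatch_fraction` and `single_linkage_tree` are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Distance between two aligned sequences: fraction of columns that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_tree(seqs):
    """Cluster `seqs` (name -> aligned sequence) into a nested-tuple tree.

    At each step the two clusters whose closest members are nearest are merged.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(
                mismatch_fraction(seqs[a], seqs[b])
                for a in clusters[i][1]
                for b in clusters[j][1]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    toy = {"hi1": "AAAA", "hi2": "AAAT", "lo1": "CCCC", "lo2": "CCCG"}
    print(single_linkage_tree(toy))
```

If HI and LO elements really differ, they would be expected to end up in separate subtrees, as the made-up toy sequences do here.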

Page 23: Acknowledgements

Danny Eller

York Marahrens

Marc Suchard

Chiara Sabatti

SoCalBSI

NIH/NSF