source: orange-bioinformatics/docs/reference/obiAssess.htm @ 1661:6c5f448f2563

Revision 1661:6c5f448f2563, 8.9 KB checked in by mitar, 2 years ago (diff)

Moved style CSS file.

<html>

<head>
<title>obiAssess: pathway enrichment for each sample</title>
<link rel=stylesheet href="style.css" type="text/css">
<link rel=stylesheet href="style-print.css" type="text/css" media=print>
</head>

<body>
<h1>obiAssess: pathway enrichment for each sample</h1>
<index name="modules/assess enrichment gsea">

<p>Gene Set Enrichment Analysis (GSEA) is a method that tries to identify groups of genes that are regulated together. It computes pathway enrichments for the data set as a whole. ASSESS is inspired by GSEA, but computes enrichments for each individual sample in the data set.</p>

<p>ASSESS takes gene expression data with sample phenotypes and computes enrichments for the given gene sets. First, pathway &quot;models&quot; have to be built with <code>AssessLearner</code>. They are then used to calculate an enrichment for each pair of sample and pathway.</p>
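
<p>To give a feel for the kind of statistic involved, the following is a minimal sketch of the unweighted running-sum enrichment score used by classic GSEA, written in plain Python. It is an illustration only and not the parametric model that obiAssess uses; the function name <code>running_sum_enrichment</code> and the toy gene names are assumptions made for this example.</p>

```python
def running_sum_enrichment(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score for one ranked gene list.

    Hypothetical helper for illustration; not part of obiAssess.
    """
    gene_set = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, extreme = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a gene from the set, step down otherwise.
        if gene in gene_set:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        # Remember the largest deviation from zero seen so far.
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Toy ranking (most up-regulated first); gene names are made up.
ranked = ["g1", "g2", "g3", "g4"]
top_score = running_sum_enrichment(ranked, ["g1", "g2"])
bottom_score = running_sum_enrichment(ranked, ["g3", "g4"])
```

<p>A gene set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, which is also how the signed enrichments in the examples on this page can be read.</p>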

<h2>AssessLearner</h2>

<p>This class builds models that can later be used to determine enrichment scores for each example. Note that the domains of the input data to <code>AssessLearner</code> and <code>Assess</code> instances must be the same.</p>

<dl class=attributes>

<dt>__call__(self, data, organism, geneSets, minSize=3, maxSize=1000, minPart=0.1, classValues=None, rankingf=None)</dt>
<dd>Function <code>__call__</code> returns an instance of class <code>Assess</code>, which can be given an example and returns its enrichment in all pathways. Argument descriptions follow.

<dl class=arguments>

  <dt>data</dt>
  <dd>An <A href="ExampleTable.htm"><CODE>ExampleTable</CODE></A> with gene expression data. Each example
  should correspond to a sample with its phenotype (class value). Attributes represent individual genes. Their names
  should be meaningful gene aliases.</dd>

  <dt>organism</dt>
  <dd>Organism code as used in KEGG. Needed for matching gene names in the data to those in the gene sets. Some
  examples: <code>hsa</code> for human, <code>mmu</code> for mouse. This is a required argument.</dd>

  <dt>classValues</dt>
  <dd>A pair of class values describing the two distinct phenotypes on which gene correlations
  are computed. Only examples with one of the chosen class values are considered for analysis. If not specified, the first
  two class values in the <code>classVar</code> attribute descriptor are used.</dd>

  <dt>geneSets</dt>
  <dd>A Python dictionary of gene sets, where each key is a gene set name that maps to a list of gene aliases for the genes
  in that gene set. Default: gene sets in your collection directory.</dd>

  <dt>minSize, maxSize</dt>
  <dd>Minimum and maximum number of genes from a gene set that must also be present in the data set for that gene set to be analysed.
  Defaults: 3 and 1000.</dd>

  <dt>minPart</dt>
  <dd>Minimum fraction of genes from a gene set that must also be present in the data set for that gene set to be analysed. Default: 0.1.</dd>

  <dt>rankingf</dt>
  <dd>Specifies the model type for individual gene sets. See the source code for reference. We recommend leaving this parameter unset, in which case the parametric model from Edelman et al., 2006 is used.</dd>

</dl>

</dd>

</dl>
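
<p>The effect of <code>minSize</code>, <code>maxSize</code> and <code>minPart</code> can be sketched in plain Python. This is an illustration under assumptions, not the code obiAssess runs: the helper name <code>filter_gene_sets</code> is hypothetical, gene names are matched literally (the module itself matches gene aliases for the given organism), and the bounds are assumed to be inclusive.</p>

```python
def filter_gene_sets(gene_sets, data_genes, min_size=3, max_size=1000, min_part=0.1):
    """Keep gene sets with enough of their genes present in the data set.

    Hypothetical helper; inclusive bounds are an assumption.
    """
    data_genes = set(data_genes)
    kept = {}
    for name, genes in gene_sets.items():
        genes = set(genes)
        if not genes:
            continue
        present = genes & data_genes
        enough = min_size <= len(present) <= max_size
        covered = len(present) / float(len(genes)) >= min_part
        if enough and covered:
            kept[name] = sorted(present)
    return kept


# Same shape as the geneSets argument: gene set name -> list of gene aliases.
# The names below are placeholders, not real pathways or genes.
gene_sets = {
    "set A": ["g1", "g2", "g3", "g4"],           # 3 of 4 genes present
    "set B": ["g1", "g9"],                       # only 1 gene present
    "set C": ["g%d" % i for i in range(1, 61)],  # 4 of 60 genes present
}
data_genes = ["g1", "g2", "g3", "g5"]
kept = filter_gene_sets(gene_sets, data_genes)
```

<p>With the default thresholds only <code>set A</code> survives here: <code>set B</code> has fewer than three genes in the data, and <code>set C</code> has less than a tenth of its genes in the data.</p>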

<h2>Assess</h2>

<dl class=attributes>

<dt>__init__(**kwargs)</dt>
<dd>Function <code>__init__</code> is usually called only by <code>AssessLearner</code>. It is used to save the built &quot;model&quot; data: all keyword arguments are saved into the object's namespace.</dd>

<dt>__call__(example)</dt>
<dd>Returns the enrichments of all gene sets for this example. Enrichments are returned in a dictionary, where keys are gene sets and values are their enrichments. Note that the example's domain must be the same as the domain on which the &quot;model&quot; was built.</dd>


</dl>

<h3>Example 1</h3>

<p>This example prints enrichments for the first sample in the data set. It uses KEGG as the gene set source.</p>

<p class="header"><a href="assess1.py">assess1.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code>import orange
import obiAssess
import obiGeneSets

gs = obiGeneSets.collections([":kegg:hsa"])
data = orange.ExampleTable("DLBCL.tab")

asl = obiAssess.AssessLearner()
ass = asl(data, "hsa", geneSets=gs)

print "Enrichments for the first example (10 pathways)"
enrichments = ass(data[0])
for patw, enric in sorted(enrichments.items())[:10]:
    print patw, enric
</xmp>

<p>Output:</p>

<xmp class=code>Enrichments for the first example (10 pathways)
[KEGG] 1- and 2-Methylnaphthalene degradation -0.84674671525
[KEGG] 3-Chloroacrylic acid degradation -0.587923507915
[KEGG] ABC transporters - General -0.292198856631
[KEGG] Acute myeloid leukemia 0.305086037192
[KEGG] Adherens junction 0.387903973883
[KEGG] Adipocytokine signaling pathway 0.404448748545
[KEGG] Alanine and aspartate metabolism 0.400113861834
[KEGG] Alkaloid biosynthesis I -0.677360944415
[KEGG] Alkaloid biosynthesis II -0.437492650183
[KEGG] Allograft rejection 0.491535468415
</xmp>
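
<p>The listing above is ordered alphabetically by gene set name. Since the result is an ordinary dictionary, it can be ordered by enrichment instead. A small sketch, using a plain dictionary filled with three of the values printed above as a stand-in for a real <code>Assess</code> result:</p>

```python
# Stand-in for the dictionary returned by ass(data[0]); values are copied
# from the output above.
enrichments = {
    "[KEGG] 1- and 2-Methylnaphthalene degradation": -0.84674671525,
    "[KEGG] Adherens junction": 0.387903973883,
    "[KEGG] Allograft rejection": 0.491535468415,
}

# Sort (name, enrichment) pairs by enrichment, highest first.
by_enrichment = sorted(enrichments.items(), key=lambda kv: kv[1], reverse=True)

for name, value in by_enrichment:
    print("%s %.3f" % (name, value))
```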

<h3>Example 2: transforming data sets</h3>

<p>This example builds a new data set in which the attributes are gene sets instead of genes. It prints the first 10 attributes for the first example of the transformed data set. Note that the output matches the previous example, apart from floating point discrepancies.</p>

<p class="header"><a href="assess2.py">assess2.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code>import orange
import obiAssess
import obiGeneSets

gs = obiGeneSets.collections([":kegg:hsa"])
data = orange.ExampleTable("DLBCL.tab")

asl = obiAssess.AssessLearner()
ass = asl(data, "hsa", geneSets=gs)

def genesetsAsAttributes(data, ass, domain=None):
    """
    Construct new data set with gene sets as attributes from data
    set "data" with assess model "ass".
    """

    # Collect, for every gene set, its enrichment in each example.
    ares = {}
    for ex in data:
        cres = ass(ex)
        for name, val in cres.items():
            ares.setdefault(name, []).append(val)

    # Fix the attribute order: alphabetical by gene set name.
    ares = sorted(ares.items())

    if not domain: # construct new domain instance if needed
        domain = orange.Domain([ orange.FloatVariable(name=name) \
            for name in [ a[0] for a in ares]], data.domain.classVar )

    # One row per example: its enrichments followed by its class value.
    examples = [ [ b[zap] for a,b in ares ] + \
        [ data[zap][-1] ]   for zap in range(len(data)) ]

    et = orange.ExampleTable(domain, examples)
    return et

tdata = genesetsAsAttributes(data, ass)

print "First 10 attributes of the first example in transformed data set"
for pathw, enric in zip(tdata.domain,tdata[0])[:10]:
    print pathw.name, enric.value
</xmp>

<p>Output:</p>

<xmp class=code>First 10 attributes of the first example in transformed data set
[KEGG] 1- and 2-Methylnaphthalene degradation -0.846746742725
[KEGG] 3-Chloroacrylic acid degradation -0.587923526764
[KEGG] ABC transporters - General -0.292198866606
[KEGG] Acute myeloid leukemia 0.305086046457
[KEGG] Adherens junction 0.387903988361
[KEGG] Adipocytokine signaling pathway 0.404448747635
[KEGG] Alanine and aspartate metabolism 0.400113850832
[KEGG] Alkaloid biosynthesis I -0.6773609519
[KEGG] Alkaloid biosynthesis II -0.437492638826
[KEGG] Allograft rejection 0.491535454988
</xmp>
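
<p>The size of the discrepancy between the two outputs can be checked directly. The sketch below compares the first three values printed in Example 1 and Example 2 using a relative tolerance of 1e-6; the tolerance is an assumption chosen for illustration, not a value specified by the library.</p>

```python
# First three enrichments printed in Example 1 and in Example 2, same order.
example1 = [-0.84674671525, -0.587923507915, -0.292198856631]
example2 = [-0.846746742725, -0.587923526764, -0.292198866606]

REL_TOL = 1e-6  # assumed tolerance, for illustration only


def nearly_equal(a, b, rel_tol=REL_TOL):
    """True when a and b differ by at most rel_tol relative to the larger one."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


identical = [a == b for a, b in zip(example1, example2)]
close = [nearly_equal(a, b) for a, b in zip(example1, example2)]
```

<p>None of the pairs is bit-for-bit identical, but all of them agree within the tolerance, so comparisons between the two representations should use a tolerance rather than <code>==</code>.</p>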

<h3>Example 3: testing transformed data set quality</h3>

<p>We measure the CA and AUC of the transformed data set using cross validation and compare them to those of the original data set. Care needs to be taken to prevent overfitting: we must not use any knowledge about the testing set when creating &quot;ASSESS models&quot;, and we have to use the same &quot;ASSESS model&quot; for both the learning and the testing set. We solve this by saving the model built on the learning set to a global variable, from which it is picked up when the testing set is transformed.</p>

<p class="header">part of <a href="assess3.py">assess3.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code># Assumes asl, gs, data and genesetsAsAttributes as defined in Example 2.
offer = None

def transformLearningS(data):
    ass = asl(data, "hsa", geneSets=gs)
    et = genesetsAsAttributes(data, ass)

    global offer
    offer = (et.domain, ass) #save assess model

    return et

def transformTestingS(data):
    global offer
    if not offer:
        # no model saved: the learning set has to be transformed first
        raise RuntimeError("transformLearningS must be called before transformTestingS")

    domain, ass = offer
    offer = None #each saved model is used only once

    return genesetsAsAttributes(data, ass, domain)


import orngBayes, orngTest, orngStat
learners = [ orngBayes.BayesLearner() ]

resultsOriginal = orngTest.crossValidation(learners, data, folds=10)
resultsTransformed = orngTest.crossValidation(learners, data, folds=10,
    pps = [("L", transformLearningS), ("T", transformTestingS)])

print "Original", "CA:", orngStat.CA(resultsOriginal), "AUC:", orngStat.AUC(resultsOriginal)
print "Transformed", "CA:", orngStat.CA(resultsTransformed), "AUC:", orngStat.AUC(resultsTransformed)
</xmp>

<p>Output:</p>

<xmp class=code>Original CA: [0.8214285714285714] AUC: [0.78583333333333338]
Transformed CA: [0.80714285714285716] AUC: [0.84250000000000003]
</xmp>
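
<p>The same discipline (fit the transformation on the learning set only, then reuse it unchanged on the testing set) can be shown without Orange. The following is a plain-Python sketch under stated assumptions: <code>fit_centering</code>, <code>apply_centering</code>, <code>folds</code> and the toy numbers are hypothetical stand-ins for building an ASSESS model and applying it, and the model is passed explicitly instead of through a global variable.</p>

```python
def fit_centering(rows):
    """Learn per-column means from the learning set only (toy 'model')."""
    n = float(len(rows))
    return [sum(col) / n for col in zip(*rows)]


def apply_centering(rows, means):
    """Transform any rows with a model that was fitted elsewhere."""
    return [[x - m for x, m in zip(row, means)] for row in rows]


def folds(rows, k):
    """Yield (learning, testing) splits; row i is tested in fold i % k."""
    for f in range(k):
        learn = [r for i, r in enumerate(rows) if i % k != f]
        test = [r for i, r in enumerate(rows) if i % k == f]
        yield learn, test


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]  # made-up data

transformed = []
for learn, test in folds(rows, 2):
    model = fit_centering(learn)            # testing rows are never seen here
    learn_t = apply_centering(learn, model)
    test_t = apply_centering(test, model)   # same model for both sets
    transformed.append((learn_t, test_t))
```

<p>Fitting the model on all rows before splitting would let information from the testing rows leak into the transformation, which is the overfitting risk described above.</p>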

<HR>
<H2>References</H2>

<p>Edelman E, Porrello A, Guinney J, Balakumaran B, Bild A, Febbo PG, Mukherjee S. Analysis of sample set enrichment scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. Bioinformatics. 2006 Jul 15; 22(14):e108-16.</p>

</body>
</html>