source: orange-bioinformatics/docs/reference-html/obiGene.htm @ 1662:427f1876f3e6

Revision 1662:427f1876f3e6, 9.2 KB checked in by mitar, 2 years ago

Revamped documentation.

<html>

<head>
<title>obiGene: gene matching and gene info</title>
<link rel=stylesheet href="style.css" type="text/css">
<link rel=stylesheet href="style-print.css" type="text/css" media=print>
</head>

<body>
<h1>obiGene: gene matching and gene info</h1>
<index name="modules/gene match matching info">

<p>The <code>obiGene</code> module provides access to NCBI gene info and gene name matching.</p>

<hr>

<h2>Gene name matching</h2>

<p>Genes usually have multiple aliases. When combining data from different sources (for example, expression data from one dataset with gene sets from another), care must be taken to match gene aliases that represent the same gene. The implemented matching methods are based on sets of aliases, where each set groups the aliases of a single gene. Target gene aliases are the aliases that a matcher may output as matching results. The matches for a query gene alias are the target gene aliases that reside in the same sets of aliases as the query.</p>
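<p>The idea of matching through sets of aliases can be illustrated with a minimal, self-contained sketch. It does not use <code>obiGene</code>; the function name <code>match_aliases</code> and the tiny alias sets are hypothetical and only illustrate the principle described above.</p>

```python
def match_aliases(alias_sets, targets, gene):
    """Return the targets that share at least one alias set with `gene`.

    Hypothetical helper, not part of obiGene.
    """
    targets = set(targets)
    matches = set()
    for aliases in alias_sets:
        if gene in aliases:
            # every target in the same alias set counts as a match
            matches |= set(aliases) & targets
    return sorted(matches)


# Illustrative alias sets: each set groups the aliases of one gene.
alias_sets = [
    {"hsa:10574", "cct7"},
    {"hsa:5357", "pls1"},
]
targets = ["hsa:10574", "hsa:5357"]

result = match_aliases(alias_sets, targets, "cct7")
```

<p>Here <code>result</code> holds the single target that lives in the same alias set as the query, and a query that appears in no alias set yields an empty list.</p>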

<h2>Common interface</h2>

<p>Since all gene matchers are subclasses of the class <code>Matcher</code>, they all support the methods <code>set_targets</code>, <code>match</code>, <code>explain</code> and <code>umatch</code>.</p>

<h3>Matcher</h3>

<dl class=attributes>
<dd>An abstract gene matcher class. All gene matchers should implement the functions <code>set_targets</code>, <code>match</code> and <code>explain</code>.</dd>
<dl class=attributes>
<dt>set_targets(targets)</dt>
<dd>Sets the gene aliases in the input list (of strings) as target gene aliases. Abstract.</dd>
<dt>match(gene)</dt>
<dd>Returns a list of target gene aliases that share a set of aliases with the input gene. If there are no matches, it returns an empty list. Abstract.</dd>
<dt>explain(gene)</dt>
<dd>Returns gene matches with their explanations as a list of tuples. The first element of each tuple is a list of the target genes found in one set of aliases that matched the input gene; the second element is that set of aliases. Abstract.</dd>
<dt>umatch(gene)</dt>
<dd>Returns the unique matching gene alias. If the <code>match</code> function returns exactly one gene alias, that alias is returned. Otherwise, the function returns <code>None</code>.</dd>
</dl>
</dl>
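<p>The interface above could look roughly like the following sketch. The class names <code>SketchMatcher</code> and <code>SketchDirect</code> are hypothetical stand-ins written for illustration; they are not the actual <code>obiGene</code> classes and only mirror the documented behaviour of <code>umatch</code> and the documented shape of the <code>explain</code> result.</p>

```python
class SketchMatcher(object):
    """Hypothetical stand-in for the abstract matcher interface."""

    def set_targets(self, targets):
        raise NotImplementedError

    def match(self, gene):
        raise NotImplementedError

    def explain(self, gene):
        raise NotImplementedError

    def umatch(self, gene):
        # Return a match only when it is unambiguous, otherwise None.
        matches = self.match(gene)
        return matches[0] if len(matches) == 1 else None


class SketchDirect(SketchMatcher):
    """Matches a gene only against the targets themselves."""

    def set_targets(self, targets):
        self.targets = set(targets)

    def match(self, gene):
        return [gene] if gene in self.targets else []

    def explain(self, gene):
        # One tuple per matching alias set: (matched targets, alias set).
        found = self.match(gene)
        return [(found, set(found))] if found else []


m = SketchDirect()
m.set_targets(["hsa:10574", "hsa:5357"])
unique = m.umatch("hsa:10574")
missing = m.umatch("hsa:0")
```

<p>Because <code>umatch</code> is written only in terms of <code>match</code>, a concrete matcher in this sketch needs to supply just the three abstract methods.</p>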

<h2>Concrete matchers and their use</h2>

<p>Almost all matchers are subclasses of the <code>MatcherAliasesPickled</code> class. The only exception is <code>MatcherDirect</code>, where caching would be pointless.</p>

<h3>MatcherAliasesKEGG or GMKEGG</h3>

<dl class=attributes>
<dd>Uses aliases from the KEGG database for matching.</dd>
<dl class=attributes>
<dt>__init__(organism, ignore_case=True)</dt>
<dd>Initializes the gene matcher for the given organism.</dd>
</dl>
</dl>

<h3>MatcherAliasesGO or GMGO</h3>

<dl class=attributes>
<dd>Uses aliases from GO annotations.</dd>
<dl class=attributes>
<dt>__init__(organism, ignore_case=True)</dt>
<dd>Initializes the gene matcher for the given organism.</dd>
</dl>
</dl>

<h3>MatcherAliasesDictyBase or GMDicty</h3>

<dl class=attributes>
<dd>Uses aliases from dictyBase.</dd>
<dl class=attributes>
<dt>__init__(ignore_case=True)</dt>
<dd>Initializes the gene matcher.</dd>
</dl>
</dl>

<h3>MatcherAliasesNCBI or GMNCBI</h3>

<dl class=attributes>
<dd>Uses aliases from the NCBI gene info database.</dd>
<dl class=attributes>
<dt>__init__(organism, ignore_case=True)</dt>
<dd>Initializes the gene matcher for the given organism.</dd>
</dl>
</dl>

<h3>MatcherDirect or GMDirect</h3>

<dl class=attributes>
<dd>Matches directly against the target gene aliases (possibly ignoring case).</dd>
<dl class=attributes>
<dt>__init__(ignore_case=True)</dt>
<dd>Initializes the gene matcher.</dd>
</dl>
</dl>

<p>Gene name matchers can be either chained (applied in sequence until one produces a match) or joined (their overlapping sets of aliases are combined). Both can be accomplished with the <code>matcher</code> function.</p>

<h3>matcher(targets, direct=True, ignore_case=True)</h3>
<dl class=attributes>
<dd>Builds a new matcher from the list of matchers given as the first argument. Matchers in the input list are chained. If a list element is itself a list, the matchers in it are joined by combining their overlapping sets of aliases.</dd>
<dl class=arguments>
<dt>direct</dt>
<dd>If <code>True</code> (default), an instance of <code>MatcherDirect</code> is inserted in front of the specified gene matcher sequence.</dd>
<dt>ignore_case</dt>
<dd>Specifies the handling of letter case for the added direct matcher.</dd>
</dl>
</dl>

<h3>Example: using different gene matchers to match onto KEGG gene aliases</h3>

<p>The following example tries to match input genes onto KEGG gene aliases. As the results show, GO aliases alone cannot match any of the genes onto KEGG gene aliases. For the last gene, only the joined GO and KEGG aliases produce a match.</p>

<p class="header"><a href="geneMatch.py">geneMatch.py</a></p>

<xmp class=code>import obiGene
import obiKEGG

# target aliases: KEGG genes for human (9606 is the human NCBI ID)
targets = obiKEGG.KEGGOrganism("9606").get_genes()

gmkegg = obiGene.GMKEGG("9606")
gmgo = obiGene.GMGO("9606")
# the inner list joins the KEGG and GO aliases into a single matcher
gmkegggo = obiGene.matcher([[gmkegg, gmgo]], direct=False)

# all three matchers map onto the same KEGG targets
gmkegg.set_targets(targets)
gmgo.set_targets(targets)
gmkegggo.set_targets(targets)

genes = [ "cct7", "pls1", "gdi1", "nfkb2", "dlg7" ]

# print one row per gene: unique match (or None) for each matcher
print "%12s" % "gene", "%12s" % "KEGG", "%12s" % "GO", "%12s" % "KEGG+GO"
for gene in genes:
    print "%12s" % gene, "%12s" % gmkegg.umatch(gene), \
          "%12s" % gmgo.umatch(gene), \
          "%12s" % gmkegggo.umatch(gene)
</xmp>

<p>Output:</p>

<xmp class=code>        gene         KEGG           GO      KEGG+GO
        cct7    hsa:10574         None    hsa:10574
        pls1     hsa:5357         None     hsa:5357
        gdi1     hsa:2664         None     hsa:2664
       nfkb2     hsa:4791         None     hsa:4791
        dlg7         None         None     hsa:9787
</xmp>


<h2>Auxiliary functionality</h2>

<h3>MatcherAliases</h3>

<dl class=attributes>
<dd>Gene matcher based on sets of aliases. A subclass of <code>Matcher</code>.</dd>
<dl class=attributes>
<dt>__init__(aliases, ignore_case=True)</dt>
<dd>Constructs a gene matcher based on sets of aliases. Input aliases have to be represented as a list of sets, where each set contains the equivalent aliases of one gene.</dd>
<dt>to_ids(gene)</dt>
<dd>Returns the indices of the sets of aliases (as given to the constructor) that include the input gene alias.</dd>
</dl>
</dl>
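<p>One plausible way to support such a lookup is a dictionary from each alias to the indices of the sets that contain it. The sketch below is an assumption about how this could be done, not the actual <code>MatcherAliases</code> implementation; the names <code>build_index</code> and <code>lookup_ids</code> are hypothetical, and <code>ignore_case</code> is modelled simply by lower-casing keys.</p>

```python
def build_index(alias_sets, ignore_case=True):
    """Map every alias to the indices of the alias sets containing it."""
    index = {}
    for i, aliases in enumerate(alias_sets):
        for alias in aliases:
            key = alias.lower() if ignore_case else alias
            index.setdefault(key, set()).add(i)
    return index


def lookup_ids(index, gene, ignore_case=True):
    """Return the indices of the alias sets that include `gene`."""
    key = gene.lower() if ignore_case else gene
    return index.get(key, set())


# Illustrative alias sets, given as a list of sets as the constructor expects.
alias_sets = [
    {"hsa:10574", "CCT7"},
    {"hsa:5357", "PLS1"},
]

index = build_index(alias_sets)
ids = lookup_ids(index, "cct7")

strict = build_index(alias_sets, ignore_case=False)
strict_ids = lookup_ids(strict, "cct7", ignore_case=False)
```

<p>With case ignored, the lower-case query finds the first set; with case-sensitive keys the same query finds nothing.</p>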

<h3>MatcherAliasesPickled</h3>

<dl class=attributes>
<dd>An abstract class for alias matchers that support pickling. A subclass of <code>MatcherAliases</code>. Its subclasses must implement the functions <code>filename</code>, <code>create_aliases</code> and <code>create_aliases_version</code>, which are needed for automatic pickling to work. Gene aliases are loaded lazily, only when they are actually needed, because loading the aliases of the individual components of a joined gene matcher is often unnecessary.</dd>
<dl class=attributes>
<dt>filename()</dt>
<dd>Returns the filename for the pickled file. Different organism and gene matcher combinations should have different filenames. Abstract.</dd>
<dt>create_aliases()</dt>
<dd>Returns a list of sets of gene aliases. Abstract.</dd>
<dt>create_aliases_version()</dt>
<dd>Returns the version of the gene aliases. If a file containing a pickled gene matcher with the same version exists, the matcher is read from that file. If not, it is rebuilt. Abstract.</dd>
</dl>
</dl>
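<p>The combination of lazy loading and a version-checked pickle file can be sketched with the standard library alone. The class <code>SketchPickledAliases</code>, its <code>builds</code> counter and the on-disk layout (a pickled <code>(version, aliases)</code> tuple) are assumptions made for illustration; the real class may store its data differently.</p>

```python
import os
import pickle
import tempfile


class SketchPickledAliases(object):
    """Hypothetical lazy, version-checked cache of alias sets."""

    def __init__(self, path):
        self.path = path
        self._aliases = None  # nothing is loaded until first use
        self.builds = 0       # counts rebuilds, for illustration only

    def create_aliases(self):
        self.builds += 1
        return [{"hsa:10574", "cct7"}]

    def create_aliases_version(self):
        return "v1"

    @property
    def aliases(self):
        # Lazy loading: read or build the aliases on first access.
        if self._aliases is None:
            self._aliases = self._load_or_build()
        return self._aliases

    def _load_or_build(self):
        version = self.create_aliases_version()
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                saved_version, saved_aliases = pickle.load(f)
            if saved_version == version:
                return saved_aliases  # cached copy is current
        # No usable cache: rebuild and store with the current version.
        aliases = self.create_aliases()
        with open(self.path, "wb") as f:
            pickle.dump((version, aliases), f)
        return aliases


cache_path = os.path.join(tempfile.mkdtemp(), "aliases.pickle")
first = SketchPickledAliases(cache_path)
first.aliases    # builds the aliases and writes the cache file
second = SketchPickledAliases(cache_path)
second.aliases   # same version on disk, so the cache file is read
```

<p>In this sketch, constructing an object touches neither the disk nor the alias source; only the first access to <code>aliases</code> does.</p>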

<h3>MatcherSequence</h3>

<dl class=attributes>
<dd>Supports chaining of gene matchers. The user defines the order in which the gene matchers are used. Gene matchers are queried in sequence until a match is found. The matching target aliases are then returned.</dd>
<dl class=attributes>
<dt>__init__(matchers)</dt>
<dd>Input is a list of gene matcher objects (instances of subclasses of <code>Matcher</code>).</dd>
</dl>
</dl>
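<p>Chaining amounts to returning the first non-empty result. The following sketch shows that logic in isolation; <code>SketchTable</code> and <code>match_in_sequence</code> are hypothetical names, and the dictionary-backed matcher merely stands in for any object with a <code>match</code> method.</p>

```python
class SketchTable(object):
    """Hypothetical matcher backed by a dict of gene -> list of targets."""

    def __init__(self, table):
        self.table = table

    def match(self, gene):
        return self.table.get(gene, [])


def match_in_sequence(matchers, gene):
    """Query matchers in order and return the first non-empty result."""
    for m in matchers:
        found = m.match(gene)
        if found:
            return found
    return []


first = SketchTable({"cct7": ["hsa:10574"]})
second = SketchTable({"cct7": ["other"], "pls1": ["hsa:5357"]})
chain = [first, second]

from_first = match_in_sequence(chain, "cct7")
from_second = match_in_sequence(chain, "pls1")
```

<p>The order of the list matters: once an earlier matcher produces a match, later matchers are not consulted for that gene.</p>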

<h3>MatcherAliasesPickledJoined</h3>

<dl class=attributes>
<dd>Creates a new matcher by joining the gene aliases of the input gene matchers. Sets of aliases are joined if they have aliases in common. The joined gene matcher is pickled only if all input gene matchers support pickling.</dd>
<dl class=attributes>
<dt>__init__(matchers)</dt>
<dd>Constructs a joined gene matcher based on the input matchers. The parameter <code>ignore_case</code> of the joined matcher is set to the common value of <code>ignore_case</code> in the input matchers.</dd>
</dl>
</dl>
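<p>Joining overlapping sets of aliases can be sketched as a repeated merge. The function <code>join_alias_sets</code> is a hypothetical illustration, not the library's code, and the alias sets are illustrative rather than taken from the databases. They are chosen to mirror the <code>dlg7</code> case in the earlier example, where neither source alone links the query to the target.</p>

```python
def join_alias_sets(*collections):
    """Merge alias sets from several sources when they share an alias."""
    joined = []  # invariant: the sets in `joined` are pairwise disjoint
    for collection in collections:
        for aliases in collection:
            merged = set(aliases)
            kept = []
            for existing in joined:
                if existing & merged:
                    merged |= existing   # absorb every overlapping set
                else:
                    kept.append(existing)
            kept.append(merged)
            joined = kept
    return joined


# Illustrative data: the first source knows the target identifier,
# the second source knows the queried alias; they share one alias.
kegg_like = [{"hsa:9787", "DLGAP5"}]
go_like = [{"DLGAP5", "dlg7"}]

joined = join_alias_sets(kegg_like, go_like)
```

<p>After joining, the query alias and the target identifier end up in the same set, which is what allows a joined matcher to find matches that its components cannot find individually.</p>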

<h2>Further examples</h2>

<h3>Listing pathways with given genes</h3>

<p>The following example works in conjunction with <code>obiKEGG</code>. For each gene in a list of mouse gene names, it finds the pathways that contain the gene.</p>

<p class="header"><a href="geneMatch1.py">geneMatch1.py</a></p>

<xmp class=code>import obiGene
import obiKEGG

keggorg = obiKEGG.KEGGOrganism("mmu")
kegg_genes = keggorg.get_genes()

query = [ "Fndc4", "Itgb8", "Cdc34", "Olfr1403" ]

gm = obiGene.GMKEGG("mmu")  # use KEGG aliases for gene matching
gm.set_targets(kegg_genes)  # set KEGG gene aliases as targets

pnames = keggorg.list_pathways()

for name in query:
    match = gm.umatch(name)  # matched KEGG alias or None
    if match:
        pwys = keggorg.get_pathways_by_genes([match])
        print name, "is in", [ pnames[p] for p in pwys ]
</xmp>

<p>Output:</p>

<xmp class=code>Fndc4 is in []
Itgb8 is in ['Cell adhesion molecules (CAMs)',
             'ECM-receptor interaction',
             'Regulation of actin cytoskeleton',
             'Focal adhesion']
Cdc34 is in ['Ubiquitin mediated proteolysis']
Olfr1403 is in ['Olfactory transduction']
</xmp>

</body>
</html>