source: orange/orange/doc/modules/orngSQL.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 14.8 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

Made XPM version of the icon 32x32.

Line 
1<html>
2<head>
3<link rel="stylesheet" href="../style.css" type="text/css">
4</head>
5<body>
6
7<h1>orngSQL: Orange's Interface to SQL databases</h1>
8<index name="modules+database access"/>
9<p name="SQL">
10
11<p>The orngSQL module provides access to relational databases from Orange.
12It currently supports <a href="http://www.mysql.com/">MySQL</a> through <a
13href="http://sourceforge.net/projects/mysql-python">MySQL for Python</a> 
14and <a href="http://www.postgresql.org/">Postgres</a> through
15<a href="http://www.initd.org/tracker/psycopg/wiki/PsycopgTwo">Psycopg 2</a>.
16Support for other <a href="http://www.python.org/dev/peps/pep-0249/">Python DB-API 2.0</a>
17compliant databases can and will be added upon request.
18</p>
19<p>
20Unlike <a href="orngMySQL.htm">OrngMySQL</a>, orngSQL
21does not support the automatic detection of discrete attributes - this was intentional, since the required "select distinct" can be too slow on large databases. The support for
22writing data back into the database exists, but is rudimentary.
23</p>
24<h2>SQLReader</h2>
25
26<p><code><index name="classes/SQLReader (in orngSQL)">SQLReader</index></code> establishes a connection with a database and provides the methods needed to fetch the data from the database into Orange.</p>
27
28<p class=section>Attributes</p>
29
30<dl class=attributes>
31<dt>className</span></dt>
32<dd>The name of the class variable</dd>
33
34<dt>metaNames</span></dt>
35<dd>The list of the names of meta attributes</dd>
36
37<dt>discreteNames</span></dt>
38<dd>The list of the names of the variables whose type is to be discrete</dd>
39
40<dt>query</dt>
41<dd>The SQL query to be executed</dd>
42<dt>domain</dt>
43<dd>The Orange <a href="../reference/Domain.htm">domain</a> used.</dd>
44</dl>
45
46<p class=section>Methods</p>
47<dl class=attributes>
48<dt>execute(statement [,domain=None])</dt>
49<dd>Executes SQL select statement <code>statement</code>.
50If the <code>domain</code> argument is provided, the new domain is used.</dd>
51
52<dt>update()</dt>
53<dd>Executes a pending SQL query.
54</dd>
55
56<dt>data()</dt>
57<dd>returns The Orange ExampleTable produced by the last executed query.</dd>
58
59<dt>connect(uri)</dt>
60<dd>Connects to the database, specified in the <code>uri</code>.
61</dd>
62<dt></dt>
63
64<dt>disconnect()</dt>
65<dd>Disconnects from the database.</dd>
66<dt></dt>
67</dl>
68
69<p><code><index name="classes/SQLWriter (in orngSQL)">SQLWriter</index></code>
70establishes a connection with a database and provides the methods needed to create
71an appropriate table in the database and/or write the data from an ExampleTable into
72the database.</p>
73
74<p class="section">Methods</p>
75<dl class="attributes">
76<dt>write(self, data, table, renameDict = None)</dt>
77<dd>Writes the data into the table. If <code>renameDict</code> 
78is provided, the names used are remapped - the orange attribute "X" is written
79into the database column <code>renameDict["X"]</code> of the table.
80</dd>
81
82<dt>create(self, data, table, renameDict = None, typeDict = None)</dt>
83<dd>Creates the required SQL table, then writes the data into it. If
84<code>typeDict</code> is provided, the SQL type of the orange attribute "X" is set to
85<code>typeDict["X"]</code>.</dd>
86
87<dt>connect(uri)</dt>
88<dd>Connects to the database, specified in the <code>uri</code>.
89</dd>
90<dt></dt>
91
92<dt>disconnect()</dt>
93<dd>Disconnects from the database.</dd>
94<dt></dt>
95</dl>
96
97<p><code><index name="methods/loadSQL (in orngSQL)">loadSQL(filename, dontCheckStored = False, domain = None)</index></code>Executes the sql query stored in the file under <code>filename</code>. This method allows the standard file reading functions with .sql files.</p>
98
99<p><code><index name="classes/saveSQL (in orngSQL)">saveSQL</index></code> Does nothing. This function will be implemented as soon as someone suggests a good way to do so.</p>
100
101<h2>Attribute Names and Types</h2>
102
103<p>The rows returned by an SQL query have to be converted into orange
104<code>Example</code>s. Each column in a row has to be converted into a
105certain attribute type. The following conversions between SQL and Orange
106types are used:
107</p>
108<ul>
109<li><code>STRING</code> and <code>DATETIME</code> attributes are converted to Orange strings</li>
110<li>The attributes whose names are contained in <code>discreteNames</code> are converted
111to Orange discrete attributes.</li>
112<li>The attributes whose names are listed in <code>metaNames</code></li>
113<li>Other attributes are converted to continuous Orange attributes</li>
114<li>The attribute in <code>className</code> is set as the class attribute. If no <code>className</code> is set,
115the column with the name "class" in the returned SQL query is set as the class attribute. If no such column exists,
116the last column is set as the class attribute.</li>
117</ul>
118
119<h2>Orange domain handling</h2>
120<p>In Orange, attributes in different ExampleTables can represent different things, even if they share the same names.
121To remedy this, Orange employs <a href="../reference/Domain.htm">domains.</a>. To make the reuse and creation of new domains easier,
122Orange provides the <a href="../reference/DomainDepot.htm">DomainDepot</a>.</p>
123<p>When constructing an <a href="../reference/ExampleTable.htm">ExampleTable</a> of results, <code>sqlReader</code> also has to somehow create a
124domain. This can be done in three ways:
125<ul>
126<li>By default, the domain is constructed automatically, based on <code>className</code>, <code>metaNames</code>, and <code>discreteNames</code>.
127</li>
128<li>The domain can be provided as an argument to <code>execute()</code>.</li>
129<li>The domain can be set explicitly. The set domain is then used with the query, affecting the attribute types of
130the <code>ExampleTable</code>, returned by <code>data()</code></li>
131</ul>
132</p>
133<h2>Examples</h2>
134
135<p>The examples below were tested with the postgress adapter.
136To use them, you will probably have to modify them according to your
137database setup. In the example, the name of the database is assumed to be "test",
138the username "user" and the password "somepass"
139</p>
140
141<p>We start with setting-up a small data table in the database to be used in
142our examples:</p>
143
144<p class="header"><a href="bus-postgres.sql">bus-postgres.sql</a> (uses <a href=
145"bus.txt">bus.txt</a>)</p>
146<xmp class="code">DROP TABLE bus;
147CREATE TABLE bus
148  (id varchar(5),
149   line integer,
150   daytime varchar,
151   temp float,
152   weather varchar,
153   arrival varchar);
154
155\COPY "bus" FROM 'bus.txt' USING DELIMITERS '   '
156SELECT * from bus;
157</xmp>
158
159<p class="header"><a href="bus-mysql.sql">bus-mysql.sql</a> (uses <a href=
160"bus.txt">bus.txt</a>)</p>
161<xmp class="code">DROP TABLE bus;
162CREATE TABLE bus
163  (id varchar(5),
164   line integer,
165   daytime varchar,
166   temp float,
167   weather varchar,
168   arrival varchar);
169
170LOAD DATA LOCAL INFILE 'bus.txt' INTO TABLE bus;
171SELECT * from bus;
172</xmp>
173
174
175<p>If everything goes well, running the script from shell by something like</p>
176<xmp class="code">mysql -u user -p < bus.sql</xmp> or
177<xmp class="code">psql -u user -p < bus-postgres.sql</xmp>
178
179<p>should produce something along the lines of:</p>
180
181<xmp class="printout">
182id      line    daytime temp    weather arrival
1831       10      morning 10      sunny   late
1842       11      morning 13      rainy   late
1853       9       morning 15      rainy   late
1864       10      evening 25      sunny   on-time
1875       9       evening 29      sunny   on-time
1886       11      morning 26      sunny   late
1897       9       evening 9       rainy   on-time
1908       9       midday  20      rainy   late
1919       11      midday  21      sunny   late
19210      10      evening 5       rainy   on-time
19311      10      midday  8       rainy   late
19412      9       morning 5       rainy   on-time
195</xmp>
196
197<p>Let us now write the script that reads some data from this table from
198the database and produces an <a href="../reference/ExampleTable.htm">ExampleTable</a>.</p>
199
200<p class="header"><a href="sql-new1.py">sql-new1.py</a></p>
201<xmp class="code">
202
203</xmp>
204
205<p>Running this script reports on four data instances and lists the
206attributes (with types) in new Orange data table:</p>
207
208<xmp class="printout">['1', '10', 'morning', 10.000, 'sunny', 'late']
209['4', '10', 'evening', 25.000, 'sunny', 'on-time']
210['10', '10', 'evening', 5.000, 'rainy', 'on-time']
211['11', '10', 'midday', 8.000, 'rainy', 'late']
212
213StringVariable 'id'
214EnumVariable 'line'
215EnumVariable 'daytime'
216FloatVariable 'temp'
217EnumVariable 'weather'
218EnumVariable 'arrival'
219</xmp>
220
221<p>Now let us suppose that some of the attributes should actually be treated
222as discrete variables. The arrival column will now be explicitly set as the
223class variable.</p>
224
225<p class="header"><a href="sql-new2.py">sql-new2.py</a></p>
226<xmp class="code">import orange, orngMySQL
227
228t = orngMySQL.Connect('localhost','root','','test')
229data = t.query("SELECT * FROM busclass WHERE line='10'")
230for x in data:
231    print x
232
233print
234for a in data.domain.attributes:
235    print a
236print 'Class:', data.domain.classVar
237</xmp>
238
239<p>By running it, we get:</p>
240
241<xmp class="printout">['10', 'morning', 10.000, 'sunny', 'late'], {"id":'1'}
242['10', 'evening', 25.000, 'sunny', 'on-time'], {"id":'4'}
243['10', 'evening', 5.000, 'rainy', 'on-time'], {"id":'10'}
244['10', 'midday', 8.000, 'rainy', 'late'], {"id":'11'}
245
246EnumVariable 'line'
247EnumVariable 'daytime'
248FloatVariable 'temp'
249EnumVariable 'weather'
250Class: EnumVariable 'arrival'
251</xmp>
252
253<p>For our final examples, we load Iris data set in Orange, write it to
254a data base, and read back only the data records whose sepal
255length is below 5 mm. By default, all discrete and string orange attributes
256are treated as VARCHARs and all continuous are treated as FLOATs.</p>
257
258<p class="header"><a href="sql-new4.py">sql-new4.py</a>  (uses <a href=
259"iris.tab">iris.tab</a>)</p>
260<xmp class="code"># Description: Writes a data set to and reads from an SQL database
261# Category:    file formats
262# Classes:     ExampleTable, orngSQL.SQLReader, orngSQL.SQLWriter
263# Uses:        iris.tab
264# Referenced:  orngSQL.htm
265
266import orange, orngSQL, orngTree
267
268data = orange.ExampleTable("iris")
269print "Input data domain:"
270for a in data.domain.variables:
271    print a
272r = orngSQL.SQLReader('mysql://user:somepass@localhost/test')
273w = orngSQL.SQLWriter('mysql://user:somepass@localhost/test')
274
275t.write('iris', data, overwrite=True)
276
277sel = t.query("SELECT petal_width, petal_length FROM iris WHERE sepal_length<5.0")
278print "\n%d instances returned" % len(sel)
279print "Output data domain:"
280for a in sel.domain.variables:
281    print a
282</xmp>
283
284<p>The output of the script above is:</p>
285<xmp class="printout">Input data domain:
286FloatVariable 'sepal length'
287FloatVariable 'sepal width'
288FloatVariable 'petal length'
289FloatVariable 'petal width'
290EnumVariable 'iris'
291
29222 instances returned
293Output data domain:
294FloatVariable 'petal_width'
295FloatVariable 'petal_length'
296</xmp>
297<p>Usually, one needs more control over the sql names and types. Here's the same example
298as above, with some names and types set explicitly:</p>
299<p class="header"><a href="sql-new5.py">sql-new5.py</a>(uses <a href=
300"iris.tab">iris.tab</a>)</p>
301<xmp class="code"># Description: Writes a data set to and reads from an SQL database
302# Category:    file formats
303# Classes:     ExampleTable, orngSQL.SQLReader, orngSQL.SQLWriter
304# Uses:        iris.tab
305# Referenced:  orngSQL.htm
306
307import orange, orngSQL, orngTree
308
309data = orange.ExampleTable("iris")
310print "Input data domain:"
311for a in data.domain.variables:
312    print a
313r = orngSQL.SQLReader('mysql://user:somepass@localhost/test')
314w = orngSQL.SQLWriter('mysql://user:somepass@localhost/test')
315# the following line only works with mysql because it uses the enum type.
316w.create('iris', data,
317    renameDict = {'sepal length':'seplen',
318        'sepal width':'sepwidth',
319        'petal length':'petlen',
320        'petal width':'petwidth'},
321    typeDict = {'iris':"""enum('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')"""})
322
323
324r.execute("SELECT petwidth, petlen FROM iris WHERE seplen<5.0;")
325data = r.data()
326print "\n%d instances returned" % len(data)
327print "Output data domain:"
328for a in data.domain.variables:
329    print a
330</xmp>
331<p>And the output:</p>
332<xmp class="printout">Input data domain:
333FloatVariable 'sepal length'
334FloatVariable 'sepal width'
335FloatVariable 'petal length'
336FloatVariable 'petal width'
337EnumVariable 'iris'
338CREATE TABLE "iris" ( "seplen" DOUBLE, "sepwidth" DOUBLE, "petlen" DOUBLE, "petwidth" DOUBLE, "iris" enum('Iris-setosa', 'Iris-versicolor', 'Iris-virginica') );
339
34022 instances returned
341Output data domain:
342FloatVariable 'petwidth'
343FloatVariable 'petlen'
344</xmp>
345
346<p>When accessing data, stored in a relational database, SQL queries are often reused.
347Unfortunately, an sql query is not sufficient to retrieve the data. Before executing the
348query, one must first connect to a database. It is also impossible to infer important
349properties, such as the desired class variable, from the retrieved data.
350</p>
351<p>The simplest method of accessing data, stored in a relational database, from Orange, in most cases,
352is through .sql files with special comments. The examples listed below should be more or less self-explanatory.
353The statements after the keywords <code>uri</code>, <code>discrete</code>, <code>class</code> and <code>metas</code>
354are actually evaluated by the python interpeter. The returned values are then used appropriately.
355</p>
356<p>For now, there is no write support for .sql files. To transfer data from Orange back into an SQL database,
357the orng should be used directly.
358</p>
359<p class="header"><a href="sql-new-embed.sql">sql-new-embed.sql</a></p>
360<xmp class="code">--orng uri 'mysql://user:somepass@localhost/test'
361--orng discrete ['registration', 'num', 'time of day', 'arrival']
362--orng meta ['weather', 'arrival', 'time']
363--orng class ['arrival']
364
365SELECT
366    "id" as registration,
367    line as num,
368    daytime as "time of day",
369    temp as temperature,
370    weather,
371    arrival
372FROM
373    bus
374WHERE
375    line='10';
376</xmp>
377<p class="header"><a href="sql-new-embed.py">sql-new-embed.py</a></p>(uses <a href=
378"sql-new-embed.sql">sql-new-embed.sql</a>)
379<xmp># Description: Reads data from a database
380# Category:    file formats
381# Classes:     orngSQL.SQLReader
382# Referenced:  orngSQL.htm
383
384import orange, orngSQLFile
385
386
387orange.registerFileType("SQL", orngSQLFile.loadSQL, None, ".sql")
388data = orange.ExampleTable('sql-new-embed.sql')
389for x in data:
390    print x
391
392print
393for a in data.domain.attributes:
394    print a
395</xmp>
396<p>The expected output:</p>
397<xmp class=printout>
398(('registration', 253, 2, 2, 2, 31, 0), ('num', 254, 2, 2, 2, 0, 1), ('time of day', 254, 7, 7, 7, 0, 1), ('temperature', 4, 2, 12, 12, 31, 1), ('weather', 254, 5, 5, 5, 0, 1), ('arrival', 254, 7, 7, 7, 0, 1))
399(('registration', 253, 2, 2, 2, 31, 0), ('num', 254, 2, 2, 2, 0, 1), ('time of day', 254, 7, 7, 7, 0, 1), ('temperature', 4, 2, 12, 12, 31, 1), ('weather', 254, 5, 5, 5, 0, 1), ('arrival', 254, 7, 7, 7, 0, 1))
400['id', '10', 'morning', 10.0], {"weather":'sunny', "arrival":'late'}
401['id', '10', 'evening', 25.0], {"weather":'sunny', "arrival":'on-time'}
402['id', '10', 'evening', 5.0], {"weather":'rainy', "arrival":'on-time'}
403['id', '10', 'midday', 8.0], {"weather":'rainy', "arrival":'late'}
404
405EnumVariable 'registration'
406EnumVariable 'num'
407EnumVariable 'time of day'
408FloatVariable 'temperature'
409
410</xmp>
411</body>
412</html> 
Note: See TracBrowser for help on using the repository browser.