source: orange/orange/doc/modules/orngMySQL.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 10.2 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

Made XPM version of the icon 32x32.

Line 
1<html>
2<head>
3<link rel=stylesheet href="../style.css" type="text/css">
4</head>
5<body>
6
7<h1>orngMySQL: Orange's Interface to MySQL</h1>
8<index name="modules+database access"/>
9<index name="SQL"/>
10
11<p>Module orngMySQL is Orange's gateway to <a
12href="http://www.mysql.com/">MySQL DBMS</a>. It sits on a top of a <a
13href="http://sourceforge.net/projects/mysql-python">MySQL for
14Python</a> that provides for a package called MySQLpy for interface
15between MySQL and Python. While MySQLpy enables you to manipulate data
16in MySQL DBMS from Python, orngMySQL takes care that the Orange's data
17tables can be transferred to and from MySQL data bases. orngMySQL
18implements a single class called Connect, which establishes a link
19with the data base and provides means for communication.</p>
20
21<h2>Connect</h2>
22
23<p><code><index name="classes/Connect (in orngMySQL)"/>Connect</code> establishes a connection with a data base and
24provides for a number of methods that are used for interchange of the
25data between Orange and MySQL.</p>
26
27<p class=section>Attributes</p>
28
29<dl class=attributes>
30<dt>host</span></dt>
31<dd>Computer hosting the MySQL DBMS.</dd>
32
33<dt>user</span></dt>
34<dd>User name for MySQL account.</dd>
35
36<dt>passwd</span></dt>
37<dd>Password that provides access to the account.</dd>
38
39<dt>db</dt>
40<dd>Name of the data base to establish a connection with.</dd>
41</dl>
42
43
44<p class=section>Methods</p>
45<dl class=attributes>
46<dt>query(statement [,use=None])</dt>
47<dd>Executes SQL select statement <code>statement</code> and returns
48     the data in the form of Orange's example table
49     (<code>ExampleTable</code>). If the return data has to conform to
50     some existing data domain, this has to be passed using
51     <code>use</code> argument.</dd>
52
53<dt>load(table [use])</dt>
54<dd>Returns a complete data set from the MySQL table <code>table</code>
55     in the form of Orange's example table. The same can be obtained
56     by calling <code>query</code> with SELECT * FROM statement, which
57     is in fact exactly what this method does. Like in
58     <code>query</code>, the domain of Orange's table to be used with
59     returned data set can be passed using the argument <code>use</code>.</dd>
60
61<dt>showTables()</dt>
62<dd>Lists the tables in the data base.</dd>
63
64<dt>write(table, data [, overwrite])</dt>
65<dd>Stores Orange data set <code>data</code> to the MySQL table
66     <code>table</code>. If the table already exists, throws
67     exception. To overwrite, use <code>overwrite=0</code>.</dd>
68<dt></dt>
69<dd></dd>
70
71</dl>
72
73<h2>Attribute Names and Types</h2>
74
75<p>While there are few (or almost no) restrictions for attribute names
76in Orange, names of the fields in SQL tables have to comply to quite a
77few restrictions. When writing data to MySQL, orngMySQL replaces any
78special characters (like blanks, slashes, ...) with underscores, and
79if an attribute name in Orange data set is an SQL keyword, it will
80appear in the tables with a "$" prefix.</p>
81
82<p>Following type conversion between SQL field types and Orange's
83attribute types is used when reading from mySQL:</p>
84<ul>
85  <li>ENUM, BIT, BOOLEAN, BINARY fields are converted to
86  discrete attributes,</li>
87  <li>FLOAT, REAL, DOUBLE are converted to continuous attributes,</li>
88  <li>CHAR, TEXT, STRING, DATE, TIME, VARCHAR, TIMESTAMP, LONGVARCHAR
89       are converted to string attributes,</li>
90  <li>INT, INTEGER, BIGINT, SMALLINT, DECIMAL, TINYINT are converted
91       to continuous atribute if there are more than 10 different field
92       values used in the table, otherwise, a corresponding Orange's
93       attribute is discrete.</li>
94</ul>
95
96<p>When writing to mySQL, ENUM will be used for discrete attributes,
97FLOAT for continuous attributes and CHAR(200) for string attributes.</p>
98
99<p>There are two special kinds of attributes in Orange: class
100attributes and meta attributes. In each data set, there can be only one
101class attribute but any number of meta attributes. When reading the
102data from MySQL, orngMySQL parses the field names: those preceded
103with "m$" will be stored as meta attribute in Orange's data table, and
104"c$" will be treated as an Orange's class attribute.</p>
105
106<p>When orngMySQL writes to mySQL, the prefix notation as described
107above will be used if there are any class or meta attributes in
108Orange's data table.</p>
109
110<h2>Examples</h2>
111
112<p>The example below will only work if you have installations of <a
113href="http://www.mysql.com/">MySQL</a> and <a
114href="http://sourceforge.net/projects/mysql-python">MySQLpy</a> on
115your computer; neither are included in Orange distribution.</p>
116
117<p>We start with setting-up a small data table in MySQL to be used in
118our examples:</p>
119
120<p class="header"><a href="bus.sql">bus.sql</a> (uses <a href=
121"bus.txt">bus.txt</a>)</p>
122<xmp class=code>SELECT VERSION();
123USE test;
124-- DROP TABLE bus;
125CREATE TABLE bus
126  (id varchar(5),
127   line enum('9','10','11'),
128   daytime enum('morning','evening', 'midday'),
129   temp float,
130   weather enum('rainy','sunny'),
131   arrival enum('late','on-time'));
132LOAD DATA LOCAL INFILE 'bus.txt' INTO TABLE bus;
133
134SELECT * FROM bus;
135</xmp>
136
137<p>If you already have a table called bus in the data base test and
138want to replace it with the data above, uncomment the line "-- DROP
139TABLE bus;" (remove leading "--").</p>
140
141<p>If everything goes well, running the script from shell by something like</p>
142
143<xmp class=code>mysql -u root < bus.sql</xmp>
144
145<p>should have the following output:</p>
146
147<xmp class=printout>VERSION()
1485.0.20-nt
149id      line    daytime temp    weather arrival
1501       10      morning 10      sunny   late
1512       11      morning 13      rainy   late
1523       9       morning 15      rainy   late
1534       10      evening 25      sunny   on-time
1545       9       evening 29      sunny   on-time
1556       11      morning 26      sunny   late
1567       9       evening 9       rainy   on-time
1578       9       midday  20      rainy   late
1589       11      midday  21      sunny   late
15910      10      evening 5       rainy   on-time
16011      10      midday  8       rainy   late
16112      9       morning 5       rainy   on-time
162</xmp>
163
164<p>Let us now write the script that reads this table from mySQL and
165in Orange data table stores, say, only the rows that concern bus line
16610:</p>
167
168<p class="header"><a href="sql1.py">sql1.py</a></p>
169<xmp class=code>import orange, orngMySQL
170
171t = orngMySQL.Connect('localhost','root','','test')
172data = t.query ("SELECT * FROM bus WHERE line='10'")
173for x in data:
174    print x
175
176print
177for a in data.domain.attributes:
178    print a
179</xmp>
180
181<p>Running this script reports on four data instances and lists the
182attributes (with types) in new Orange data table:</p>
183
184<xmp class=printout>['1', '10', 'morning', 10.000, 'sunny', 'late']
185['4', '10', 'evening', 25.000, 'sunny', 'on-time']
186['10', '10', 'evening', 5.000, 'rainy', 'on-time']
187['11', '10', 'midday', 8.000, 'rainy', 'late']
188
189StringVariable 'id'
190EnumVariable 'line'
191EnumVariable 'daytime'
192FloatVariable 'temp'
193EnumVariable 'weather'
194EnumVariable 'arrival'
195</xmp>
196
197<p>Say that our script that stores the data set in MySQL would be a
198little different and would
199
200<p class="header"><a href="busclass.sql">busclass.sql</a> (uses <a href=
201"bus.txt">bus.txt</a>)</p>
202<xmp class=code>USE test;
203-- DROP TABLE busclass;
204CREATE TABLE busclass
205  (m$id varchar(5),
206   line enum('9','10','11'),
207   daytime enum('morning','evening', 'midday'),
208   temp float, weather enum('rainy','sunny'),
209   c$arrival enum('late','on-time'));
210LOAD DATA LOCAL INFILE 'bus.txt' INTO TABLE busclass;
211
212SELECT * FROM busclass;
213</xmp>
214
215<p>Notice that now we have prefixed the id field of the table letting
216orngMySQL know this will be used as meta attribute. We did similar to
217arrival field, this time dedicating this field to a class
218attribute. The script below should reveal the effects of these
219subtle changes:</p>
220
221<p class="header"><a href="sql2.py">sql2.py</a></p>
222<xmp class=code>import orange, orngMySQL
223
224t = orngMySQL.Connect('localhost','root','','test')
225data = t.query("SELECT * FROM busclass WHERE line='10'")
226for x in data:
227    print x
228
229print
230for a in data.domain.attributes:
231    print a
232print 'Class:', data.domain.classVar
233</xmp>
234
235<p>By running it, we get:</p>
236
237<xmp class=printout>['10', 'morning', 10.000, 'sunny', 'late'], {"id":'1'}
238['10', 'evening', 25.000, 'sunny', 'on-time'], {"id":'4'}
239['10', 'evening', 5.000, 'rainy', 'on-time'], {"id":'10'}
240['10', 'midday', 8.000, 'rainy', 'late'], {"id":'11'}
241
242EnumVariable 'line'
243EnumVariable 'daytime'
244FloatVariable 'temp'
245EnumVariable 'weather'
246Class: EnumVariable 'arrival'
247</xmp>
248
249<p>Once we have a class attribute, we can run some learner on our data
250set. Let us read the complete data on buses from MySQL, and construct
251a classification tree to predict when the busses will be late:</p>
252
253<p class="header"><a href="sql3.py">sql3.py</a></p>
254<xmp class=code>import orange, orngMySQL, orngTree
255
256t = orngMySQL.Connect('localhost','root','','test')
257data = t.query("SELECT * FROM busclass")
258tree = orngTree.TreeLearner(data)
259orngTree.printTxt(tree, nodeStr="%V (%1.0N)", leafStr="%V (%1.0N)")
260</xmp>
261
262<p>Here's our newly induced model (numbers in brackets are number of
263instances that reached a specific node in the tree):</p>
264<xmp class=printout>root: late (12)
265|    daytime=evening: on-time (4)
266|    daytime=midday: late (3)
267|    daytime=morning: late (5)
268|    |    temp<7.500: on-time (1)
269|    |    temp>=7.500: late (4)
270</xmp>
271
272<p>For our final example, we load Iris data set in Orange, write it to
273MySQL data base, and read from it only those data records whose sepal
274length is below 5 mm.</p>
275
276<p class="header"><a href="sql4.py">sql4.py</a>  (uses <a href=
277"iris.tab">iris.tab</a>)</p>
278<xmp class=code>import orange, orngMySQL, orngTree
279
280data = orange.ExampleTable("iris")
281print "Input data domain:"
282for a in data.domain.variables:
283    print a
284
285t = orngMySQL.Connect('localhost','root','','test')
286t.write('iris', data, overwrite=True)
287
288sel = t.query("SELECT petal_width, petal_length FROM iris WHERE sepal_length<5.0")
289print "\n%d instances returned" % len(sel)
290print "Output data domain:"
291for a in sel.domain.variables:
292    print a
293</xmp>
294
295<p>The output of the script above is:</p>
296<xmp class=printout>Input data domain:
297FloatVariable 'sepal length'
298FloatVariable 'sepal width'
299FloatVariable 'petal length'
300FloatVariable 'petal width'
301EnumVariable 'iris'
302
30322 instances returned
304Output data domain:
305FloatVariable 'petal_width'
306FloatVariable 'petal_length'
307</xmp>
308</body>
309</html> 
Note: See TracBrowser for help on using the repository browser.