Changeset 7411:83d327c6c9ce in orange


Ignore:
Timestamp:
02/04/11 11:59:28 (3 years ago)
Author:
miha <miha.stajdohar@…>
Branch:
default
Convert:
62fbf536525c65d86388ee70b74e3fe16bc37af8
Message:

refactored documentation

File:
1 edited

Legend:

Unmodified
Added
Removed
  • orange/Orange/network/__init__.py

    r7381 r7411  
    105105edges. The number of edge types is unlimited, but needs to be set in advance. 
    106106 
    107 Before constructing a graph, you will also need to decide for the underlying  
    108 data structure. The differences for smaller graphs (e.g. with less than  
    109 100 nodes) should be insignificant, while for the larger, the decision should  
    110 be based upon the expected number of edges ("density" of the graph) and  
    111 the operations you plan to execute. Graphs with large number of edges  
    112 (eg.>n2/4, where n is the number of vertices) should be represented with  
    113 adjacency matrices (class GraphAsMatrix), graphs with small number of edges  
    114 with adjacency lists (GraphAsList) and those in between with adjacency trees  
    115 (GraphAsTrees). Regarding the speed, matrices are generally the fastest, while  
    116 some operations, such as finding all edges leading from certain node, will  
    117 sometimes be faster with lists or trees. 
     107Before constructing a graph, you will also need to decide for the underlying 
     108data structure. The differences for smaller graphs (e.g. with less than 100 
     109nodes) should be insignificant, while for the larger, the decision should be 
     110based upon the expected number of edges ("density" of the graph) and the 
     111operations you plan to execute. Graphs with large number of edges (eg.>n2/4, 
     112where n is the number of vertices) should be represented with adjacency 
     113matrices (class :obj:`Orange.network.GraphAsMatrix`), graphs with small number 
     114of edges with adjacency lists (:obj:`Orange.network.GraphAsList`) and those in 
     115between with adjacency trees (:obj:`Orange.network.GraphAsTree`). Regarding 
     116the speed, matrices are generally the fastest, while some operations, such as 
     117finding all edges leading from certain node, will sometimes be faster with 
     118lists or trees. 
    118119 
    119120One thing that is not supported (at the moment?) are multiple edges of the  
     
    132133 
    133134You can choose between three constructors, all derived from a single ancestor 
    134 Graph: 
     135:obj:`Orange.network.Graph`: 
    135136 
    136137.. class:: GraphAsMatrix(nVertices, directed[, nEdgeTypes]) 
    137138 
    138     Edges are stored in a matrix with either n2 or n(n+1)/2 elements, depending 
    139     upon whether the graph is directed or not. (In C++, it is stored as float * 
    140     pointing to an array of length n*n*nEdgeTypes or (n*(n+1))/2*nEdgeTypes 
    141     elements, where nEdgeTypes is the number of edge types.) This 
    142     representation is suitable for smaller graphs and for dense large graphs. 
    143     For graph with only one edge type, this representation is more economical 
    144     than representation with lists or trees when the number of edges is larger 
    145     than n2/4. Inserting, deleting and checking the edges is fast; listing the 
    146     neighbours of a certain node is fast unless the graph is sparse, in which 
    147     case a graph represented with a list or a tree would be faster. 
     139    Bases: :obj:`Orange.network.Graph` 
     140 
     141    Edges are stored in a matrix with either n^2 or n(n+1)/2 elements, 
     142    depending upon whether the graph is directed or not. (In C++, it is stored 
     143    as float * pointing to an array of length n*n*nEdgeTypes or 
     144    (n*(n+1))/2*nEdgeTypes elements, where nEdgeTypes is the number of edge 
     145    types.) This representation is suitable for smaller graphs and for dense 
     146    large graphs. For graph with only one edge type, this representation is 
     147    more economical than representation with lists or trees when the number of 
     148    edges is larger than n2/4. Inserting, deleting and checking the edges is 
     149    fast; listing the neighbours of a certain node is fast unless the graph is 
     150    sparse, in which case a graph represented with a list or a tree would be 
     151    faster. 
    148152     
    149153.. class:: GraphAsList(nVertices, directed[, nEdgeTypes])  
     154     
     155    Bases: :obj:`Orange.network.Graph` 
    150156     
    151157    Edges are stored in an ordered lists of neighbours, one list for each node. 
     
    164170.. class:: GraphAsTree(nVertices, directed[, nEdgeTypes])  
    165171     
     172    Bases: :obj:`Orange.network.Graph` 
     173     
    166174    This structure is similar to GraphAsTree except that the edges are stored 
    167175    in trees instead of lists. This should be a structure of choice for all 
     
    171179    of edges is really small). Internally, nodes of the tree contain the vertex 
    172180    number, two pointers and a list of floats. With one edge type, this should 
    173     be 16 bytes on 32-bit platforms. An ordinary undirected graph with 10 
    174     vertices stored in a matrix would thus be constructed by 
     181    be 16 bytes on 32-bit platforms. 
    175182 
    176183Examples 
    177184-------- 
    178185 
    179 graph = orange.GraphAsMatrix(10, 0) 
     186An ordinary undirected graph with 10 vertices stored in a matrix would thus be 
     187constructed by:: 
     188 
     189    >>> graph = orange.GraphAsMatrix(10, 0) 
    180190 
    181191A directed graph with 1000 vertices and edges of three types, stored with 
    182 adjacency trees would be constructed by 
    183  
    184 graph = orange.GraphAsTree(1000, 1, 3) 
     192adjacency trees would be constructed by:: 
     193 
     194    >>> graph = orange.GraphAsTree(1000, 1, 3) 
    185195 
    186196Usage 
    187197===== 
    188198 
    189 All three graph types are used the same way, independent of the underlying 
    190 structure. All methods are defined in basic class :obj:Orange.network.Graph. 
     199All three graph types are used in the same way, independent of the underlying 
     200structure. All methods are defined in basic class :obj:`Orange.network.Graph`. 
    191201 
    192202.. class:: Graph 
     
    220230        integers even when objects are given. 
    221231 
    222     .. Basic Operations 
    223     .. ---------------- 
    224  
    225     .. Indexing 
    226     .. -------- 
     232    **Indexing** 
    227233     
    228234    Vertices are referred to by either integer indices or Python objects of any 
     
    231237    "Gender", "Height", "Weight"] then graph["Age", "Height"] would be equivalent 
    232238    to graph[0, 2] and graph.getNeighbours("Weight") to graph.getNeighbours(3). 
    233     Vertex identifier doesn't need to be a string, , it can be any Python 
     239    Vertex identifier doesn't need to be a string, it can be any Python 
    234240    object. 
    235241     
     
    258264    the neighbours of the corresponding (not necessarily the first) node. 
    259265 
    260     .. Getting and Setting Edges 
    261     .. ------------------------- 
     266    **Getting and Setting Edges** 
    262267     
    263268    .. method:: graph[v1, v2] 
    264269     
    265         For ordinary graphs with a single edge type, graph[v1, v2] returns the 
    266         weight of the edge between v1 and v2, or None if there is no edge 
    267         (edge's weight can also be 0). Edges can also be set by assigning them 
    268         a weight, e.g.graph[2, 5]=1.5. As described above, if objects is set, 
    269         we can use other objects, such as names, as v1 and v2 (the same goes 
    270         for all other functions described below). For graphs with multiple edge 
    271         types, graph[v1, v2] returns a list of weights for various edge types. 
    272         Some (or all, if there is no edge) elements of the list can be None. If 
    273         the edge does not exist, graph[v1, v2] returns a list of Nones, not a 
    274         None. You can assign a list to graph[v1, v2]; in graph with three edge 
     270        For graphs with a single edge type, graph[v1, v2] returns the weight of 
     271        the edge between v1 and v2, or None if there is no edge (edge's weight 
     272        can also be 0). 
     273         
     274        For graphs with multiple edge types, graph[v1, v2] returns a list of 
     275        weights for various edge types. Some (or all, if there is no edge) 
     276        elements of the list can be None. If the edge does not exist, graph[v1, 
     277        v2] returns a list of Nones, not a None. 
     278         
     279        Edges can also be set by assigning them a weight, e.g.graph[2, 5]=1.5. 
     280        As described above, if objects is a set, we can use other objects, such 
     281        as names, as v1 and v2 (the same goes for all other functions described 
     282        below). 
     283         
     284        You can assign a list to graph[v1, v2]; in graph with three edge 
    275285        types you can set graph[2, 5] = [1.5, None, -2.0]. After that, there 
    276286        are two edges between vertices 2 and 5, one of the first type with 
    277287        weight 1.5, and one of the third with weight -2.0. To remove an edge, 
    278         you can assign it a least of Nones or a single None, e.g. graph[2, 
    279         5]=None; this removes edges of all types between the two nodes. The 
    280         list returned for graphs with multiple edge types is actually a 
     288        you can assign it a list of Nones or a single None, e.g. graph[2, 
     289        5]=None; this removes edges of all types between the two nodes.  
     290         
     291        The list returned for graphs with multiple edge types is actually a 
    281292        reference to the edge, therefore you can set e = graph[v1, v2] and then 
    282293        manipulate e, for instance e[1]=10 or e[0]=None. Edge will behave just 
    283294        as an ordinary list (well, almost - no slicing ets). However, don't try 
    284295        to assign a list to e, eg e=[1, None, 4]. This assigns a list to e, not 
    285         to the corresponding edge... graph[v1, v2, type] This is defined only 
    286         for graph with multiple edge types; it returns the weight for the edge 
    287         of type type between v1 and v2, or None if there is no such edge. You 
    288         can also establish an edge by assigning a weight (e.g. graph[2, 5, 2] = 
    289         -2.0) or remove it by assigning it a None (graph[2, 5, 2] = None). 
    290         edgeExists(v1, v2[, type]) Returns true if the edge between v1 and v2 
    291         exists. For multiple edge type graphs you can also specify the type of 
    292         the edge you check for. If the third argument is omitted, the method 
    293         returns true if there is any kind of edge between the two vertices. It 
    294         is recommended to use this method when you want to check for a node. In 
    295         single edge type graphs, if graph[v1, v2]: will fail when there is an 
    296         edge but it has a weight of zero. With multiple edge types, if 
    297         graph[v1, v2]: will always success since graph[v1, v2] returns a non- 
    298         empty list; if there is no edges, this will be a list of Nones, but 
    299         Python still treats it as "true". addCluster(list_of_vertices) Creates 
    300         a cluster - adds edges between all listed vertices. Queries 
    301          
    302         Graph provides a set of function that return nodes connected to a 
    303         certain node. 
     296        to the corresponding edge...  
     297         
     298    .. method:: graph[v1, v2, type]  
     299         
     300        This is defined only for graph with multiple edge types; it returns the 
     301        weight for the edge of type type between v1 and v2, or None if there is 
     302        no such edge. You can also establish an edge by assigning a weight 
     303        (e.g. graph[2, 5, 2] = -2.0) or remove it by assigning it a None 
     304        (graph[2, 5, 2] = None). 
     305         
     306    .. method:: edgeExists(v1, v2[, type])  
     307         
     308        Returns true if the edge between v1 and v2 exists. For multiple edge 
     309        type graphs you can also specify the type of the edge you check for. If 
     310        the third argument is omitted, the method returns true if there is any 
     311        kind of edge between the two vertices. It is recommended to use this 
     312        method when you want to check for a node. In single edge type graphs, 
     313        if graph[v1, v2]: will fail when there is an edge but it has a weight 
     314        of zero. With multiple edge types, if graph[v1, v2]: will always 
     315        success since graph[v1, v2] returns a non- empty list; if there is no 
     316        edges, this will be a list of Nones, but Python still treats it as 
     317        "true".  
     318         
     319    .. method:: addCluster(list_of_vertices)  
     320         
     321        Creates a cluster - adds edges between all listed vertices. 
     322     
     323    **Queries** 
     324         
     325    Graph provides a set of functions that return nodes connected to a certain 
     326    node. 
    304327     
    305328    .. method:: getNeighbours(v1[, type]) 
     
    324347        undirected graphs this function is equivalent to getNeighbours. 
    325348         
    326         If objects is set, functions return a list of objects (names of 
    327         vertices or whatever objects you stored in objects). Otherwise, a list 
    328         of integer indices is returned. If you want to force Graph to return 
    329         integer indices even if objects is set, set graph.returnIndices to 
    330         True. 
    331          
    332         Of the three operations, the expensive one is to look for the vertices 
    333         with edges pointing to the given edge. There is no problem when graph 
    334         is represented as a matrix (graphAsMatrix); these are always fast. On 
    335         directed graph, getEdgeFrom is always fast as well. 
    336          
    337         In undirected graphs represented as lists or trees, the edge between 
    338         vertices with indices v1 and v2 is stored at the list/tree in the 
    339         smaller of the two indices. Therefore to list all neighbours of v1, 
    340         edges with v2<v1 are copied form v1's list, while for edges with v2>v1 
    341         the function needs to look for v1 in each v2's list/tree. Lookup in 
    342         trees is fast, while in representation with adjacency list, the 
    343         function is slower for v1 closer to nVertices/2. If v1 is small there 
    344         is a great number of v2>v1 whose lists are to be checked, but since the 
    345         lists are ordered, v1 is more to the beginning of these lists (and when 
    346         a vertex with index higher than v1 is encountered, we know that v1 is 
    347         not on the list). If v2 is great, there it is more toward the end of 
    348         the list, but there is smaller number of lists to be checked. 
    349         Generally, the average number of traversed list elements for 
    350         getNeighbours/getEdgesFrom/getEdgesTo on undirected graphs with 
    351         p*nVertices2 edges is p(nVertices-v1)v1. 
    352          
    353         To sum up, if you have a large undirected graph and intend to query for  
    354         neighbours (or, equivalently, edges to or from a node) a lot, don't use  
    355         GraphAsList. If the graph is small or you won't use these functions, it  
    356         doesn't matter. 
    357          
    358         For directed graphs, getEdgesFrom is trivial. The other two functions 
    359         are even slower than for undirected graphs, since to find the edges 
    360         leading from any vertex to a given one, all lists/trees need to be 
    361         searched. So, if your algorithm will extensively use getEdgesTo or 
    362         getNeighbours and your graph is large but the number of edges is less 
    363         than nEdges2/2, you should use GraphAsTree or, to be faster but consume 
    364         more memory store the graph as GraphAsMatrix. If the number of edges is 
    365         greater, GraphAsMatrix is more economic anyway. (This calculation is 
    366         for graph with only one edge type; see the description of graph types 
    367         for details. 
    368          
    369         However, this is all programmed in C++, so whatever you do, the 
    370         bottleneck will probably still be in your Python code and not in C++. 
    371         You probably cannot miss by using GraphAsTree disregarding the size of 
    372         the graph and the operations you perform on it. 
    373      
    374     .. Graph Analyses 
    375     .. -------------- 
     349    If objects is set, functions return a list of objects (names of 
     350    vertices or whatever objects you stored in objects). Otherwise, a list 
     351    of integer indices is returned. If you want to force Graph to return 
     352    integer indices even if objects is set, set graph.returnIndices to 
     353    True. 
     354     
     355    Of the three operations, the expensive one is to look for the vertices with 
     356    edges pointing to the given edge. There is no problem when graph is 
     357    represented as a matrix (:obj:`Orange.network.GraphAsMatrix`); these are 
     358    always fast. On directed graph, getEdgeFrom is always fast as well. 
     359     
     360    In undirected graphs represented as lists or trees, the edge between 
     361    vertices with indices v1 and v2 is stored at the list/tree in the 
     362    smaller of the two indices. Therefore to list all neighbours of v1, 
     363    edges with v2<v1 are copied form v1's list, while for edges with v2>v1 
     364    the function needs to look for v1 in each v2's list/tree. Lookup in 
     365    trees is fast, while in representation with adjacency list, the 
     366    function is slower for v1 closer to nVertices/2. If v1 is small there 
     367    is a great number of v2>v1 whose lists are to be checked, but since the 
     368    lists are ordered, v1 is more to the beginning of these lists (and when 
     369    a vertex with index higher than v1 is encountered, we know that v1 is 
     370    not on the list). If v2 is great, there it is more toward the end of 
     371    the list, but there is smaller number of lists to be checked. 
     372    Generally, the average number of traversed list elements for 
     373    getNeighbours/getEdgesFrom/getEdgesTo on undirected graphs with 
     374    p*nVertices2 edges is p(nVertices-v1)v1. 
     375     
     376    To sum up, if you have a large undirected graph and intend to query for 
     377    neighbours (or, equivalently, edges to or from a node) a lot, don't use 
     378    :obj:`Orange.network.GraphAsList`. If the graph is small or you won't use 
     379    these functions, it doesn't matter. 
     380     
     381    For directed graphs, getEdgesFrom is trivial. The other two functions are 
     382    even slower than for undirected graphs, since to find the edges leading 
     383    from any vertex to a given one, all lists/trees need to be searched. So, if 
     384    your algorithm will extensively use getEdgesTo or getNeighbours and your 
     385    graph is large but the number of edges is less than nEdges2/2, you should 
     386    use :obj:`Orange.network.GraphAsTree` or, to be faster but consume more 
     387    memory store the graph as :obj:`Orange.network.GraphAsMatrix`. If the 
     388    number of edges is greater, :obj:`Orange.network.GraphAsMatrix` is more 
     389    economic anyway. This calculation is for graph with only one edge type; 
     390    see the description of graph types for details. 
     391     
     392    However, this is all programmed in C++, so whatever you do, the bottleneck 
     393    will probably still be in your Python code and not in C++. You probably 
     394    cannot miss by using :obj:`Orange.Network.GraphAsTree` disregarding the 
     395    size of the graph and the operations you perform on it. 
     396 
     397    **Graph Analyses** 
    376398     
    377399    .. method:: getConnectedComponents() 
    378400     
    379         Returns list of all connected components sorted descending by component 
    380         size. 
     401        Return a list of all connected components sorted descending by 
     402        component size. 
    381403     
    382404    .. method:: getDegreeDistribution() 
    383405     
    384         Returns degree distribution as dictionary of type 
     406        Return degree distribution as dictionary of type 
    385407        {degree:number_of_vertices}. 
    386408     
    387409    .. method:: getDegrees() 
    388410     
    389         Returns list of degrees. List size matches number of vertices. Index of 
    390         given degree matches index of corresponding vertex. 
     411        Return a list of degrees. List size matches number of vertices. Index 
     412        of given degree matches index of corresponding vertex. 
    391413     
    392414    .. method:: getHubs(n) 
    393415     
    394         Returns list of n largest hubs. 
     416        Return a list of n largest hubs. 
    395417     
    396418    .. method:: getShortestPaths(u, v) 
    397419     
    398         Returns list of vertices in the shortest path between u and v. 
     420        Return a list of vertices in the shortest path between u and v. 
    399421     
    400422    .. method:: getDistance(u, v) 
    401423     
    402         Returns a distance between vertices u and v. 
     424        Return a distance between vertices u and v. 
    403425     
    404426    .. method:: getDiameter() 
    405427     
    406         Returns a diameter of the graph. 
     428        Return a diameter of the graph. 
    407429 
    408430============================= 
     
    438460class Network(orangeom.Network): 
    439461     
    440     """Data structure for representing directed and undirected networks. 
     462    """Bases: :obj:`Orange.network.GraphAsList`, :obj:`Orange.network.Graph`  
     463     
     464    Data structure for representing directed and undirected networks. 
    441465     
    442466    Network class holds network structure information and supports basic  
Note: See TracChangeset for help on using the changeset viewer.