Saturday, April 13, 2013

Python: explains exactly how a repository works


http://www.franz.com/agraph/allegrograph/doc/python-tutorial/python-tutorial-40.html


Python Sesame API Tutorial for AllegroGraph 4.2

This is an introduction to the Python client API to AllegroGraph RDFStore™ version 4.2 from Franz Inc.

The Python Sesame API offers convenient and efficient access to an AllegroGraph server from a Python-based application. This API provides methods for creating, querying and maintaining RDF data, and for managing the stored triples.

The Python Sesame API deliberately emulates the Aduna Sesame API to make it easier to migrate from Sesame to AllegroGraph. The Python Sesame API has also been extended in ways that make it easier and more intuitive than the Sesame API.

Contents

Overview
Prerequisites (Linux)
Terminology
Creating Users with WebView
Running Python Tutorial Examples
Creating a Repository and Triple Indices
Asserting and Retracting Triples
A SPARQL Query
Statement Matching
Literal Values
Importing Triples
Exporting Triples
Searching Multiple Graphs
Namespaces
Free Text Search
Select, Ask, Describe, and Construct Queries
Parametric Queries
Range Matches
Federated Repositories
Prolog Rule Queries
Loading Prolog Rules
RDFS++ Inference
Geospatial Search
Social Network Analysis
Transaction Control
Duplicate Triples

Overview

The Python client tutorial rests on a simple architecture involving AllegroGraph, disk-based data files, Python, and a file of Python examples called tutorial_examples_40.py.

AllegroGraph 4.1 Server contains the Python API, which is part of the AllegroGraph installation.

Python communicates with AllegroGraph through HTTP port 8080 in this example. Python and AllegroGraph may be installed on the same computer, but in practice one server is shared by multiple clients.

Load tutorial_examples_40.py into Python to view the tutorial examples.

Each lesson in tutorial_examples_40.py is encapsulated in a Python function, named exampleN(), where N ranges from 0 to 21 (or more). The function names are referenced in the title of each section to make it easier to compare the tutorial text and the living code of the examples file.

Prerequisites (Linux)

The tutorial examples can be run on a Linux system, running AllegroGraph and the examples on the same computer ("localhost"). The tutorial assumes that AllegroGraph and Python 2.5 have been installed and configured using the procedure posted on this webpage.

Terminology

We need to clarify some terminology before proceeding.

"RDF" is the Resource Description Framework defined by the World Wide Web Consortium (W3C). It provides an elegantly simple means for describing multi-faceted resource objects and for linking them into complex relationship graphs. AllegroGraph Server creates, searches, and manages such RDF graphs.

A "URI" is a Uniform Resource Identifier. It is a label used to uniquely identify various types of entities in an RDF graph. A typical URI looks a lot like a web address: <http://www.company.com/project/class#number>. In spite of the resemblance, a URI is not a web address. It is simply a unique label.

A "triple" is a data statement, a "fact," stored in RDF format. It states that a resource has an attribute with a value. It consists of three fields:

Subject: The first field contains the URI that uniquely identifies the resource that this triple describes.
Predicate: The second field contains the URI identifying a property of this resource, such as its color or size, or a relationship between this resource and another one, such as parentage or ownership.
Object: The third field is the value of the property. It could be a literal value, such as "red," or the URI of a linked resource.

A "quad" is a triple with an added "context" field, which is used to divide the repository into "subgraphs." This context or subgraph is just a URI label that appears in the fourth field of related triples.

A "quint" is a quad with a fifth field used for the "tripleID." AllegroGraph Server implements all triples as quints behind the scenes. The fourth and fifth fields are often ignored, however, so we speak casually of "triples," and sometimes of "quads," when it would be more rigorous to call them all "quints."

A "resource description" is defined as a collection of triples that all have the same URI in the subject field. In other words, the triples all describe attributes of the same thing.

A "statement" is a client-side Python object that describes a triple (quad, quint).

In the context of AllegroGraph Server:

A "catalog" is a list of repositories owned by an AllegroGraph server.
A "repository" is a collection of triples within a catalog, stored and indexed on a hard disk.
A "context" is a subgraph of the triples in a repository.
If contexts are not in use, the triples are stored in the background (default) graph.
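To make the vocabulary concrete, here is a minimal sketch in plain Python. It does not use the AllegroGraph client at all: the Quint tuple, the resource_descriptions() helper, and the example URIs (including the "age" predicate and the "staff" graph) are assumptions made up for illustration. It models each stored fact as a five-field quint and groups the quints by subject to form resource descriptions.

```python
from collections import namedtuple, defaultdict

# Hypothetical stand-in for a "quint": subject, predicate, object,
# context (subgraph) and triple id. This is not an AllegroGraph class.
Quint = namedtuple("Quint", ["subject", "predicate", "object", "context", "triple_id"])

ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

store = [
    Quint(ALICE, "http://example.org/ontology/name", "Alice", None, 1),
    Quint(ALICE, "http://example.org/ontology/age", "42", None, 2),
    # A non-empty fourth field places this fact in a named subgraph (context).
    Quint(BOB, "http://example.org/ontology/name", "Bob", "http://example.org/graphs/staff", 3),
]


def resource_descriptions(quints):
    """Group quints by subject URI; each group is one resource description."""
    groups = defaultdict(list)
    for quint in quints:
        groups[quint.subject].append(quint)
    return dict(groups)


descriptions = resource_descriptions(store)
for subject in sorted(descriptions):
    print("%s is described by %d triple(s)" % (subject, len(descriptions[subject])))
```

A context of None plays the role of the background (default) graph in this sketch.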

Creating Users with WebView

Each connection to an AllegroGraph server runs under the credentials of a registered AllegroGraph user account.

Initial Superuser Account

The installation instructions for AllegroGraph advise you to create a default superuser called "test", with password "xyzzy". This is the user (and password) expected by the tutorial examples. If you created this account as directed, you can proceed to the next section and return to this topic at a later time when you need to create non-superuser accounts.

If you created a different superuser account you'll have to edit the tutorial_examples_40.py file before proceeding. Modify these entries near the top of the file:

AG_USER = 'test'
AG_PASSWORD = 'xyzzy'

Otherwise you'll get an authentication failure when you attempt to connect to the server.

Users, Permissions, Access Rules, and Roles

AllegroGraph user accounts may be given any combination of the following three permissions:

Superuser
Start Session
Evaluate Arbitrary Code

In addition, a user account may be given read, write or read/write access to individual repositories.

You can also define a role (such as "librarian") and give the role a set of permissions and access rules. Then you can assign several users to a shared role. This lets you manage their permissions and access by editing the role instead of the individual user accounts.

A superuser automatically has all possible permissions and unlimited access. A superuser can also create, manage and delete other user accounts. Non-superusers cannot view or edit account settings.

A user with the Start Session permission can use the AllegroGraph features that require spawning a dedicated session, such as Transactions and Social Network Analysis. If you try to use these features without the appropriate permission, you'll encounter authentication errors.

A user with permission to Evaluate Arbitrary Code can run Prolog Rule Queries. This user can also do anything else that allows executing Lisp code, such as defining select-style generators, or doing eval-in-server, as well as loading server-side files.

WebView

WebView is AllegroGraph's HTTP-based graphical user interface for user and repository management. It provides a SPARQL endpoint for querying your triple stores as well as various tools that let you create and maintain triple stores interactively.

To connect to WebView, simply direct your Web browser to the AllegroGraph port of your server. If you have installed AllegroGraph locally (and used the default port number), use:

http://localhost:10035

You will be asked to log in. Use the superuser credentials described in the previous section.

The first page of WebView is a summary of your catalogs, repositories, and federations. Click the user account link in the lower left corner of the page. This exposes the Users and Roles page, which is the environment for creating and managing user accounts.

To create a new user, click the [add a user] link. This exposes a small form where you can enter the username (one symbol) and password. Click OK to save the new account.

The new user will appear in the list of users. Click the [view permissions] link to open a control panel for the new user account. Use the checkboxes to apply permissions to this account (superuser, start session, evaluate arbitrary code).

It is important that you set up access permissions for the new user. Use the form to create an access rule by selecting read, write or read/write access, naming a catalog (or * for all), and naming a repository within that catalog (or * for all). Click the [add] link. This creates an access rule for your new user, and the rule will appear in the permissions display. With a read/write rule that names * for both the catalog and the repository, this new user can log in and perform transactions on any repository in the system.

To repeat, the "test" superuser is all you need to run all of the tutorial examples. This section is for the day when you want to issue more modest credentials to some of your operators.
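The way access rules combine can be sketched in a few lines of plain Python. This is only a model of the behavior described above, not server code: the AccessRule tuple and the allows() and can_access() helpers are hypothetical names, and the rule set is made up for illustration.

```python
from collections import namedtuple

# Hypothetical model of one access rule: a mode ("read", "write" or
# "read/write"), a catalog name and a repository name. "*" means "all".
AccessRule = namedtuple("AccessRule", ["mode", "catalog", "repository"])


def allows(rule, catalog, repository, wanted):
    """Return True if a single rule grants the wanted access ("read" or "write")."""
    catalog_ok = rule.catalog in ("*", catalog)
    repository_ok = rule.repository in ("*", repository)
    mode_ok = wanted in rule.mode.split("/")
    return catalog_ok and repository_ok and mode_ok


def can_access(is_superuser, rules, catalog, repository, wanted):
    """Superusers have unlimited access; other users need a matching rule."""
    if is_superuser:
        return True
    return any(allows(rule, catalog, repository, wanted) for rule in rules)


operator_rules = [
    AccessRule("read", "*", "*"),                     # may read everything
    AccessRule("read/write", "scratch", "tutorial"),  # may also write one repository
]

print(can_access(False, operator_rules, "scratch", "tutorial", "write"))
print(can_access(False, operator_rules, "/", "tutorial", "write"))
print(can_access(True, [], "/", "tutorial", "write"))
```

The first call succeeds because of the second rule, the second call fails because no rule grants write access in the root catalog, and the third succeeds because the account is a superuser.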

Running Python Tutorial Examples

The AllegroGraph Python Tutorial examples are in the tutorial subdirectory where you unpacked the Python client tar.gz file. Navigate to that directory and follow one of these examples:

$ python tutorial_examples_40.py            runs all tests.
$ python tutorial_examples_40.py all        runs all tests.
$ python tutorial_examples_40.py 10         runs example10()
$ python tutorial_examples_40.py 1 5 22     runs examples 1, 5, and 22
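The command-line behavior listed above can be sketched as follows. This is not the dispatch code from tutorial_examples_40.py, which is not reproduced here; the three stub functions, the EXAMPLES table, and the select_examples() and run() helpers are assumptions used only to show the idea of mapping arguments to exampleN() functions.

```python
# Hypothetical stand-ins for the tutorial's exampleN() functions.
def example1():
    return "example1 ran"


def example5():
    return "example5 ran"


def example10():
    return "example10 ran"


EXAMPLES = {1: example1, 5: example5, 10: example10}


def select_examples(args):
    """Translate command-line arguments into a list of example numbers.

    No arguments, or the single word "all", selects every example;
    otherwise each argument is read as an example number.
    """
    if not args or args == ["all"]:
        return sorted(EXAMPLES)
    return [int(arg) for arg in args]


def run(args):
    """Call the selected examples in order and collect their results."""
    return [EXAMPLES[number]() for number in select_examples(args)]


print(run(["1", "10"]))
print(run(["all"]))
```

In a real script the argument list would come from sys.argv[1:].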

Creating a Repository and Triple Indices (example1())

The first task is to attach to our AllegroGraph Server and open a repository. This task is implemented in example1() from tutorial_examples_40.py.

In example1() we build a chain of Python objects, ending in a "connection" object that lets us manipulate triples in a specific repository. The example1() function opens (or creates) a repository by building a series of client-side objects, culminating in the connection object. The connection object will be passed to other functions in tutorial_examples_40.py.

The example first connects to an AllegroGraph Server by providing the endpoint (host IP address and port number) of an already-launched AllegroGraph server. This creates a client-side server object, which can access the AllegroGraph server's list of available catalogs through the listCatalogs() method:

def example1(accessMode=Repository.RENEW):
    server = AllegroGraphServer(AG_HOST, AG_PORT, 'user', 'password')
    print "Available catalogs", server.listCatalogs()

This is the output so far:

    >>> example1()
    Defining connnection to AllegroGraph server -- host:'localhost' port:8080
    Available catalogs ['/', 'scratch']

This output says that the server has two catalogs, the default rootCatalog, '/', and a named catalog 'scratch' that someone has created for some experimentation.

In the next line of example1(), we use the server.openCatalog() method to create a client-side catalog object. In this example we will connect to the default rootCatalog by calling openCatalog() with no arguments. When we look inside the root catalog, we can see which repositories are available:

    catalog = server.openCatalog()
    print "Available repositories in catalog '%s': %s" % (catalog.getName(), catalog.listRepositories())

The corresponding output lists the available repositories.
(When you run the examples, you may see a different list of repositories.)

    Available repositories in catalog 'None': ['tutorial']

The next step is to create a client-side repository object representing the repository we wish to open, by calling the getRepository() method of the catalog object. We have to provide the name of the desired repository (AG_REPOSITORY, bound to python-tutorial in this case), and select one of four access modes:

Repository.RENEW clears the contents of an existing repository before opening. If the indicated repository does not exist, it creates one.
Repository.OPEN opens an existing repository, or throws an exception if the repository is not found.
Repository.ACCESS opens an existing repository, or creates a new one if the repository is not found.
Repository.CREATE creates a new repository, or throws an exception if one by that name already exists.

Repository.RENEW is the default setting for the example1() function of tutorial_examples_40.py. It can be overridden by calling example1() with the appropriate argument, such as example1(Repository.OPEN).

    myRepository = catalog.getRepository(AG_REPOSITORY, accessMode)
    myRepository.initialize()

A new or renewed repository must be initialized, using the initialize() method of the repository object. If you try to initialize a repository twice you get a warning message in the Python window but no exception.

The goal of all this object-building has been to create a client-side connection object, whose methods let us manipulate the triples of the repository. The repository object's getConnection() method returns this connection object, which we bind to the variable conn.

    conn = myRepository.getConnection()
    print "Repository %s is up! It contains %i statements." % (
        myRepository.getDatabaseName(), conn.size())

The size() method of the connection object returns how many triples are present. In the example1() function, this number should always be zero because we "renewed" the repository.
This is the output in the Python window:

    Repository pythontutorial is up! It contains 0 statements.
    <franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x0127D710>
    >>>

Whenever you create a new repository, you should stop to consider which kinds of triple indices you will need. This is an important efficiency decision. AllegroGraph uses a set of sorted indices to quickly identify a contiguous block of triples that are likely to match a specific query pattern.

These indices are identified by names that describe their organization. The default set of indices is called spogi, posgi, ospgi, gspoi, gposi, gospi, and i, where:

S stands for the subject URI.
P stands for the predicate URI.
O stands for the object URI or literal.
G stands for the graph URI.
I stands for the triple identifier (its unique id number within the triple store).

The order of the letters denotes how the index has been organized. For instance, the spogi index contains all of the triples in the store, sorted first by subject, then by predicate, then by object, and finally by graph. The triple id number is present as a fifth column in the index. If you know the URI of a desired resource (the subject value of the query pattern), then the spogi index lets you retrieve all triples with that subject as a single block.

The idea is to provide your repository with the indices that your queries will need, and to avoid maintaining indices that you will never need.
We can use the connection object's listValidIndices() method to examine the list of all possible AllegroGraph triple indices:

    indices = conn.listValidIndices()
    print "All valid triple indices: %s" % (indices)

This is the list of all possible valid indices:

    All valid triple indices: ['spogi', 'spgoi', 'sopgi', 'sogpi', 'sgpoi', 'sgopi', 'psogi', 'psgoi', 'posgi', 'pogsi', 'pgsoi', 'pgosi', 'ospgi', 'osgpi', 'opsgi', 'opgsi', 'ogspi', 'ogpsi', 'gspoi', 'gsopi', 'gpsoi', 'gposi', 'gospi', 'gopsi', 'i']

AllegroGraph can generate any of these indices if you need them, but it creates only seven indices by default. We can see the current indices by using the connection object's listIndices() method:

    indices = conn.listIndices()
    print "Current triple indices: %s" % (indices)

There are currently seven indices:

    Current triple indices: ['i', 'gospi', 'gposi', 'gspoi', 'ospgi', 'posgi', 'spogi']

The indices that begin with "g" are sorted primarily by subgraph (or "context"). If your application does not use subgraphs, you should consider removing these indices from the repository. You don't want to build and maintain triple indices that your application will never use. This wastes CPU time and disk space. The connection object has a convenient dropIndex() method:

    print "Removing graph indices..."
    conn.dropIndex("gospi")
    conn.dropIndex("gposi")
    conn.dropIndex("gspoi")
    indices = conn.listIndices()
    print "Current triple indices: %s" % (indices)

Having dropped three of the triple indices, there are now four remaining:

    Removing graph indices...
    Current triple indices: ['i', 'ospgi', 'posgi', 'spogi']

The i index is for deleting triples by using the triple id number. The ospgi index is sorted primarily by object value, which makes it possible to grab a range of object values as a single block of triples from the index. Similarly, the posgi index lets us reach for a block of triples that all share the same predicate.
We mentioned previously that the spogi index lets us retrieve blocks of triples that all have the same subject URI.

As it happens, we may have been overly hasty in eliminating all of the graph indices. AllegroGraph can find the right matches as long as there is any one index present, but using the "right" index is much faster. Let's put one of the graph indices back, just in case we need it. We'll use the connection object's addIndex() method:

    print "Adding one graph index back in..."
    conn.addIndex("gspoi")
    indices = conn.listIndices()
    print "Current triple indices: %s" % (indices)
    return conn

This is the output:

    Adding one graph index back in...
    Current triple indices: ['i', 'gspoi', 'ospgi', 'posgi', 'spogi']

The last line (return conn) returns a reference to the new connection object. This is the value returned by example1() when it is called by other functions in tutorial_examples_40.py. The other functions then use the connection object to access the repository.
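Why does a sorted index let the server fetch matching triples "as a single block"? The following toy model in plain Python suggests the idea. It is not AllegroGraph's storage format: build_index() and block() are hypothetical helpers, and the triples use short made-up labels instead of full URIs. Each index is simply the same rows with their columns reordered and then sorted, so all rows that share a leading value sit next to each other and can be located with a binary search.

```python
import bisect

# Each row is (subject, predicate, object, graph, id), using short
# made-up labels rather than full URIs.
triples = [
    ("bob", "name", "Bob", "g1", 4),
    ("alice", "type", "Person", "g1", 1),
    ("bob", "type", "Person", "g1", 3),
    ("alice", "name", "Alice", "g1", 2),
]


def build_index(rows, order):
    """Reorder the columns as named by order (e.g. "posgi") and sort the rows."""
    positions = ["spogi".index(letter) for letter in order]
    return sorted(tuple(row[i] for i in positions) for row in rows)


def block(index, leading_value):
    """Return the contiguous slice of rows whose first column equals leading_value."""
    keys = [row[0] for row in index]
    low = bisect.bisect_left(keys, leading_value)
    high = bisect.bisect_right(keys, leading_value)
    return index[low:high]


spogi = build_index(triples, "spogi")
posgi = build_index(triples, "posgi")

print(block(spogi, "alice"))  # every triple about alice, as one slice
print(block(posgi, "type"))   # every triple with the "type" predicate
```

Without the posgi ordering, finding all "type" triples in this model would mean scanning every row of spogi, which hints at why choosing the right indices matters.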

Asserting and Retracting Triples (example2())

In example2(), we show how to create resources describing two people, Bob and Alice, by asserting individual triples into the repository. The example also retracts and replaces a triple. Assertions and retractions to the triple store are executed by 'add' and 'remove' methods belonging to the connection object, which we obtain by calling the example1() function (described above).

Before asserting a triple, we have to generate the URI values for the subject, predicate and object fields. The Python Sesame API to AllegroGraph Server predefines a number of classes and predicates for the RDF, RDFS, XSD, and OWL ontologies. RDF.TYPE is one of the predefined predicates we will use.

The 'add' and 'remove' methods take an optional 'contexts' argument that specifies one or more subgraphs that are the target of triple assertions and retractions. When the context is omitted, triples are asserted/retracted to/from the background graph. In the example below, facts about Alice and Bob reside in the background graph.

The example2() function begins by calling example1() to create the appropriate connection object, which is bound to the variable conn.

def example2():
    conn = example1()

The next step is to begin assembling the URIs we will need for the new triples. The createURI() method generates a URI from a string. These are the subject URIs identifying the resources "Bob" and "Alice":

    alice = conn.createURI("http://example.org/people/alice")
    bob = conn.createURI("http://example.org/people/bob")

Bob and Alice will be members of the "person" class (RDF:TYPE person).

    person = conn.createURI("http://example.org/ontology/Person")

Both Bob and Alice will have a "name" attribute.

    name = conn.createURI("http://example.org/ontology/name")

The name attributes will contain literal values.
We have to generate the Literal objects from strings:

    bobsName = conn.createLiteral("Bob")
    alicesName = conn.createLiteral("Alice")

The next line prints out the number of triples currently in the repository.

    print "Triple count before inserts: ", conn.size()

This is the output:

    Triple count before inserts: 0

Now we assert four triples, two for Bob and two more for Alice, using the connection object's add() method. After the assertions, we count triples again (there should be four) and print out the triples for inspection.

    ## alice is a person
    conn.add(alice, RDF.TYPE, person)
    ## alice's name is "Alice"
    conn.add(alice, name, alicesName)
    ## bob is a person
    conn.add(bob, RDF.TYPE, person)
    ## bob's name is "Bob":
    conn.add(bob, name, bobsName)
    print "Triple count: ", conn.size()
    for s in conn.getStatements(None, None, None, None, False):
        print s

The "None" arguments to the getStatements() method say that we don't want to restrict what values may be present in the subject, predicate, object or context positions. Just print out all the triples.

This is the output at this point. We see four triples, two about Alice and two about Bob:

    Triple count: 4
    (<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
    (<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")
    (<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
    (<http://example.org/people/bob>, <http://example.org/ontology/name>, "Bob")

We see two resources of type "person," each with a literal name.

The next step is to demonstrate how to remove a triple. Use the remove() method of the connection object, and supply a triple pattern that matches the target triple. In this case we want to remove Bob's name triple from the repository. Then we'll count the triples again to verify that there are only three remaining.
Finally, we re-assert Bob's name so we can use it in subsequent examples, and we'll return the connection object.

    conn.remove(bob, name, bobsName)
    print "Triple count: ", conn.size()
    conn.add(bob, name, bobsName)
    return conn

This is the output:

    Triple count: 3
    <franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x01466830>
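The assert, retract, and re-assert round trip can be rehearsed without a server. The sketch below is a plain-Python stand-in, not the AllegroGraph connection object: TinyStore is a hypothetical class whose add(), remove() and size() methods merely mirror the names used above, and it keeps triples in a Python set, which is a simplification.

```python
class TinyStore(object):
    """Hypothetical in-memory stand-in for the connection's triple methods."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # Assert one triple. A set keeps a single copy of each triple,
        # which is a simplification made for this sketch.
        self._triples.add((subject, predicate, obj))

    def remove(self, subject, predicate, obj):
        # Retract one fully specified triple, if it is present.
        self._triples.discard((subject, predicate, obj))

    def size(self):
        return len(self._triples)


store = TinyStore()
store.add("alice", "type", "Person")
store.add("alice", "name", "Alice")
store.add("bob", "type", "Person")
store.add("bob", "name", "Bob")

counts = [store.size()]             # after the four assertions
store.remove("bob", "name", "Bob")
counts.append(store.size())         # after retracting Bob's name
store.add("bob", "name", "Bob")
counts.append(store.size())         # after re-asserting it

print("Triple counts: %s" % counts)
```

The three counts follow the same sequence as the tutorial output: four triples, then three, then four again once Bob's name is restored.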

A SPARQL Query (example3())

SPARQL stands for the "SPARQL Protocol and RDF Query Language," a recommendation of the World Wide Web Consortium (W3C). SPARQL is a query language for retrieving RDF triples.

Our next example illustrates how to evaluate a SPARQL query. This is the simplest query, the one that returns all triples. Note that example3() continues with the four triples created in example2().

def example3():
    conn = example2()
    try:
        queryString = "SELECT ?s ?p ?o WHERE {?s ?p ?o .}"

The SELECT clause returns the variables ?s, ?p and ?o in the bindingSet. The variables are bound to the subject, predicate and object values of each triple that satisfies the WHERE clause. In this case the WHERE clause is unconstrained. The dot (.) in the fourth position signifies the end of the pattern.

The connection object's prepareTupleQuery() method creates a query object that can be evaluated one or more times. (A "tuple" is an ordered sequence of data elements in Python.) The results are returned in an iterator that yields a sequence of bindingSets.

        tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
        result = tupleQuery.evaluate()

Below we illustrate one (rather heavyweight) method for extracting the values from a binding set, indexed by the name of the corresponding column variable in the SELECT clause.

        try:
            for bindingSet in result:
                s = bindingSet.getValue("s")
                p = bindingSet.getValue("p")
                o = bindingSet.getValue("o")
                print "%s %s %s" % (s, p, o)

This is the output:

    http://example.org/people/alice http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
    http://example.org/people/alice http://example.org/ontology/name "Alice"
    http://example.org/people/bob http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
    http://example.org/people/bob http://example.org/ontology/name "Bob"

The Connection class is designed to be created for the duration of a sequence of updates and queries, and then closed.
In practice, many AllegroGraph applications keep a connection open indefinitely. However, best practice dictates that the connection should be closed, as illustrated below. The same hygiene applies to the iterators that generate binding sets.

        finally:
            result.close()
    finally:
        conn.close()
        myRepository = conn.repository
        myRepository.shutDown()
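The nested try/finally structure is what guarantees that the result iterator is closed before the connection, even when reading the results fails. Here is a minimal sketch of that pattern in plain Python. The Closable class and the run_query() function are hypothetical stand-ins, not part of the AllegroGraph client; they only record the order in which things are closed.

```python
class Closable(object):
    """Hypothetical resource that records when it is closed."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        self.log.append("closed " + self.name)


def run_query(log, fail=False):
    """Open a connection and a result, then close both in reverse order."""
    conn = Closable("connection", log)
    try:
        result = Closable("result", log)
        try:
            if fail:
                raise ValueError("simulated failure while reading results")
            log.append("read results")
        finally:
            result.close()   # inner resource is released first
    finally:
        conn.close()         # outer resource is released last


ok_log = []
run_query(ok_log)
print(ok_log)

failed_log = []
try:
    run_query(failed_log, fail=True)
except ValueError:
    pass
print(failed_log)
```

In both runs the result is closed before the connection; in the failing run the exception still propagates to the caller after both close() calls have happened.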

Statement Matching (example4())

The getStatements() method of the connection object provides a simple way to perform unsophisticated queries. This method lets you enter a mix of required values and wildcards, and retrieve all matching triples. (If you need to perform sophisticated tests and comparisons you should use a SPARQL query instead.)

Below, we illustrate two kinds of 'getStatements' calls. The first mimics traditional Sesame syntax, and returns a Statement object at each iteration. This is the example4() function of tutorial_examples_40.py. It begins by calling example2() to create a connection object and populate the pythontutorial repository with four triples describing Bob and Alice. We're going to search for triples that mention Alice, so we have to create an "Alice" URI to use in the search pattern:

def example4():
    conn = example2()
    alice = conn.createURI("http://example.org/people/alice")

Now we search for triples with Alice's URI in the subject position. The "None" values are wildcards for the predicate and object positions of the triple.

    print "Searching for Alice using getStatements():"
    statements = conn.getStatements(alice, None, None)

The getStatements() method returns a repositoryResult object (bound to the variable "statements" in this case). This object can be iterated over, exposing one result statement at a time. It is sometimes desirable to screen the results for duplicates, using the enableDuplicateFilter() method. Note, however, that duplicate filtering can be expensive. Our example does not contain any duplicates, but it is possible for them to occur.
    statements.enableDuplicateFilter()
    for s in statements:
        print s

This prints out the two matching triples for "Alice."

    Searching for Alice using getStatements():
    (<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
    (<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")

At this point it is good form to close the repositoryResult object, because such objects occupy memory and are rarely reused in most programs.

    statements.close()
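The wildcard matching and the duplicate filter can both be modeled in a few lines of plain Python. This is a sketch of the idea only: get_statements() and without_duplicates() are hypothetical functions, not the client's getStatements() or enableDuplicateFilter(), and the triples use short made-up labels.

```python
def get_statements(triples, subject=None, predicate=None, obj=None):
    """Yield every triple that matches the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    for triple in triples:
        if all(want is None or want == have for want, have in zip(pattern, triple)):
            yield triple


def without_duplicates(statements):
    """Yield each distinct statement once, remembering what was already seen."""
    seen = set()
    for statement in statements:
        if statement not in seen:
            seen.add(statement)
            yield statement


triples = [
    ("alice", "type", "Person"),
    ("alice", "name", "Alice"),
    ("bob", "type", "Person"),
    ("alice", "name", "Alice"),   # a deliberate duplicate
]

about_alice = list(get_statements(triples, subject="alice"))
filtered = list(without_duplicates(get_statements(triples, subject="alice")))

print(about_alice)
print(filtered)
```

The set of already-seen statements grows with the number of distinct results, which suggests one reason why duplicate filtering can be expensive on large result sets.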

Literal Values (example5())

The next example, example5(), illustrates some variations on what we have seen so far. The example creates and asserts typed and plain literal values, including language-specific plain literals, and then conducts searches for them in three ways:

getStatements() search, which is an efficient way to match a single triple pattern.
SPARQL direct match, for efficient multi-pattern search.
SPARQL filter match, for sophisticated filtering such as performing range matches.

The getStatements() and SPARQL direct searches return exactly the datatype you ask for. The SPARQL filter queries can sometimes return multiple datatypes. This behavior will be one focus of this section.

If you are not explicit about the datatype of a value, either when asserting the triple or when writing a search pattern, AllegroGraph will deduce an appropriate datatype and use it. This is another focus of this section. This helpful behavior can sometimes surprise you with unanticipated results.

Setup

The example5() function begins by obtaining a connection object from example1(), and then clears the repository of all existing triples.

def example5():
    conn = example1()
    conn.clear()

For the sake of coding efficiency, it is good practice to create variables for namespace strings. We'll use this namespace again and again in the following lines. We have made the URIs in this example very short to keep the result displays compact.

    exns = "http://people/"

The example creates new resources describing seven people, named alphabetically from Alice to Greg. These are URIs to use in the subject field of the triples. The example shows how to enter a full URI string, or alternately how to combine a namespace with a local resource name.

    alice = conn.createURI("http://example.org/people/alice")
    bob = conn.createURI("http://example.org/people/bob")
    carol = conn.createURI("http://example.org/people/carol")
    dave = conn.createURI(namespace=exns, localname="dave")
    eric = conn.createURI(namespace=exns, localname="eric")
    fred = conn.createURI(namespace=exns, localname="fred")
    greg = conn.createURI(namespace=exns, localname="greg")

Numeric Literal Values

This section explores the behavior of numeric literals.

Asserting Numeric Data

The first section assigns ages to the participants, using a variety of numeric types. First we need a URI for the "age" predicate.

    age = conn.createURI(namespace=exns, localname="age")

The next step is to create a variety of values representing ages. Coincidentally, these people are all 42 years old, but we're going to record that information in multiple ways:

    fortyTwo = conn.createLiteral(42)          # creates long
    fortyTwoDouble = conn.createLiteral(42.0)  # creates double
    fortyTwoInt = conn.createLiteral('42', datatype=XMLSchema.INT)
    fortyTwoLong = conn.createLiteral('42', datatype=XMLSchema.LONG)
    fortyTwoFloat = conn.createLiteral('42', datatype=XMLSchema.FLOAT)
    fortyTwoString = conn.createLiteral('42', datatype=XMLSchema.STRING)
    fortyTwoPlain = conn.createLiteral('42')   # creates plain literal

In four of these statements, we explicitly identified the datatype of the value in order to create an INT, a LONG, a FLOAT and a STRING. This is the best practice.

In three other statements, we just handed AllegroGraph numeric-looking values to see what it would do with them. As we will see in a moment, 42 creates a LONG, 42.0 becomes a DOUBLE, and '42' becomes a "plain" (untyped) literal value. (Note that plain literals are not quite the same thing as typed literal strings. A search for a plain literal will not always match a typed string, and vice versa.)

Now we need to assemble the URIs and values into statements (which are client-side triples):

    stmt1 = conn.createStatement(alice, age, fortyTwo)
    stmt2 = conn.createStatement(bob, age, fortyTwoDouble)
    stmt3 = conn.createStatement(carol, age, fortyTwoInt)
    stmt4 = conn.createStatement(dave, age, fortyTwoLong)
    stmt5 = conn.createStatement(eric, age, fortyTwoFloat)
    stmt6 = conn.createStatement(fred, age, fortyTwoString)
    stmt7 = conn.createStatement(greg, age, fortyTwoPlain)

And then add the statements to the triple store on the AllegroGraph server. We can use either add() or addStatement() for this purpose.
    conn.add(stmt1)
    conn.add(stmt2)
    conn.add(stmt3)
    conn.addStatement(stmt4)
    conn.addStatement(stmt5)
    conn.addStatement(stmt6)
    conn.addStatement(stmt7)

Now we'll complete the round trip to see what triples we get back from these assertions. This is how we use getStatements() in this example to retrieve and display triples for us:

    print "\nShowing all age triples using getStatements(). Seven matches."
    statements = conn.getStatements(None, age, None)
    for s in statements:
        print s

This loop prints all age triples to the interaction window. Note that the retrieved triples are of six types: two longs, an int, a float, a double, a string, and a "plain literal." All of them say that their person's age is 42. Note that the triple for Greg has the plain literal value "42", while the triple for Fred uses "42" as a typed string.

    Showing all age triples using getStatements(). Seven matches.
    (<http://people/greg>, <http://people/age>, "42")
    (<http://people/fred>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#string>)
    (<http://people/eric>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>)
    (<http://people/dave>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)
    (<http://people/carol>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#int>)
    (<http://people/bob>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>)
    (<http://people/alice>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)

If you ask for a specific datatype, you will get it. If you leave the decision up to AllegroGraph, you might get something unexpected such as a plain literal value.
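The datatype choices seen in this round trip can be summarized in a small plain-Python sketch. It is not the client's createLiteral() implementation: infer_datatype() is a hypothetical helper that only encodes the behavior reported above, namely that an explicit datatype is honored, a Python integer becomes a long, a Python float becomes a double, and a string with no datatype becomes a plain literal.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def infer_datatype(value, datatype=None):
    """Return the datatype URI a literal would carry, or None for a plain literal.

    Hypothetical helper that mirrors the behavior described in the text;
    it does not call AllegroGraph.
    """
    if datatype is not None:
        return datatype            # an explicit datatype always wins
    if isinstance(value, float):
        return XSD + "double"      # 42.0 becomes a double
    if isinstance(value, int):
        return XSD + "long"        # 42 becomes a long
    return None                    # '42' stays a plain (untyped) literal


samples = [
    (42, None),
    (42.0, None),
    ("42", XSD + "int"),
    ("42", XSD + "string"),
    ("42", None),
]
for value, datatype in samples:
    print("%r -> %s" % (value, infer_datatype(value, datatype)))
```

Passing the datatype explicitly, as four of the seven createLiteral() calls do, removes any dependence on this kind of inference.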

Matching Numeric Data

This section explores getStatements() and SPARQL matches against numeric triples.

In the first example, we asked AllegroGraph to find an untyped number, 42.

Query type: getStatements()
Query: conn.getStatements(None, age, 42)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#long>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p 42 .}
Matches: No direct matches.

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42)}
Matches:
    "42"^^<http://www.w3.org/2001/XMLSchema#int>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
    "42"^^<http://www.w3.org/2001/XMLSchema#long>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() query returned triples containing longs only. The SPARQL direct match didn't know how to interpret the untyped value and found zero matches. The SPARQL filter match, however, opened the doors to matches of multiple numeric types, and returned ints, floats, longs and doubles.

"Match 42.0" without explicitly declaring the number's type.

Query type: getStatements()
Query: conn.getStatements(None, age, 42.0)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#double> (Matches a double but not a float.)

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p 42.0 .}
Matches: No direct matches.

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42.0)}
Matches:
    "42"^^<http://www.w3.org/2001/XMLSchema#int>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
    "42"^^<http://www.w3.org/2001/XMLSchema#long>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() search returned a double but not the similar float. The filter match returned all numeric types that were equal to 42.0.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#int>." Note that we have to use a variable (fortyTwoInt) bound to a Literal value in order to offer this int to getStatements(). We can't just type the value into the getStatements() method directly.

Query type: getStatements()
Query: conn.getStatements(None, age, fortyTwoInt)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#int>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#int>}
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#int>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#int>)}
Matches:
    "42"^^<http://www.w3.org/2001/XMLSchema#int>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
    "42"^^<http://www.w3.org/2001/XMLSchema#long>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned exactly what we asked for: ints. The filter match returned all numeric types that match in value.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#long>." Again we need a bound variable to offer a Literal value to getStatements().

Query type: getStatements()
Query: conn.getStatements(None, age, fortyTwoLong)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#long>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#long>}
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#long>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#long>)}
Matches:
    "42"^^<http://www.w3.org/2001/XMLSchema#int>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
    "42"^^<http://www.w3.org/2001/XMLSchema#long>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned longs. The filter match returned all numeric types.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#double>."

Query type: getStatements()
Query: conn.getStatements(None, age, fortyTwoDouble)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#double>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#double>}
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#double>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#double>)}
Matches:
    "42"^^<http://www.w3.org/2001/XMLSchema#int>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
    "42"^^<http://www.w3.org/2001/XMLSchema#long>
    "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned doubles. The filter match returned all numeric types.
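The difference between a typed match and a filter match can be illustrated with a toy model. This is only a sketch under stated assumptions: the triples are plain Python tuples, the function names typed_match() and filter_match() are made up for illustration, and nothing here calls AllegroGraph or reproduces how it stores values internally. A typed match requires the datatype to agree; a filter comparison looks only at the numeric value.

```python
# Toy model of the seven age triples: (subject, lexical form, datatype).
# A datatype of None stands for a plain (untyped) literal.
AGE_TRIPLES = [
    ("alice", "42", "long"),
    ("bob", "4.2E1", "double"),
    ("carol", "42", "int"),
    ("dave", "42", "long"),
    ("eric", "4.2E1", "float"),
    ("fred", "42", "string"),
    ("greg", "42", None),
]

NUMERIC_TYPES = {"int", "long", "float", "double"}

def numeric_value(lexical, datatype):
    """Return the number a literal denotes, or None for non-numeric literals."""
    if datatype in NUMERIC_TYPES:
        return float(lexical)
    return None

def typed_match(lexical, datatype):
    """Model of a typed match: datatype and numeric value must both agree."""
    if datatype not in NUMERIC_TYPES:
        raise ValueError("this sketch only models numeric datatypes")
    target = float(lexical)
    return [s for s, lex, dt in AGE_TRIPLES
            if dt == datatype and numeric_value(lex, dt) == target]

def filter_match(number):
    """Model of FILTER (?o = number): compare by value across numeric types."""
    return [s for s, lex, dt in AGE_TRIPLES
            if numeric_value(lex, dt) == number]

print(typed_match("42", "int"))   # only the int triple
print(filter_match(42))           # every numeric triple, whatever its type
```

In this model the string and plain literals never take part in a numeric comparison, which is consistent with the match results listed above.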

Matching Numeric Strings and Plain Literals

At this point we are transitioning from tests of numeric matches to tests of string matches, but there is a gray zone to be explored first. What do we find if we search for strings that contain numbers? In particular, what about "plain literal" values that are almost, but not quite, strings?

"Match '42'^^<http://www.w3.org/2001/XMLSchema#string>." This example asks for a typed string to see if we get any numeric matches back.

Query type: getStatements()
Query: conn.getStatements(None, age, fortyTwoString)
Matches: "42"^^<http://www.w3.org/2001/XMLSchema#string> (It did not match the plain literal.)

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSc

Matching Date, Time, and Datetime Literals

"Match date." What happens if we search for the date literal we defined? We'll use the "date" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query type: getStatements()
Query: conn.getStatements(None, age, date)
Matches: "1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p '1984-12-06'^^<http://www.w3.org/2001/XMLSchema#date> .}
Matches: "1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o ='1984-12-06'^^<http://www.w3.org/2001/XMLSchema#date>)}
Matches: "1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>

All three queries match narrowly, meaning the exact date and datatype we asked for is returned.

"Match datetime." What happens if we search for the datetime literal? We'll use the "datetime" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query type: getStatements()
Query: conn.getStatements(None, age, datetime)
Matches: "1984-12-06T09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p '1984-12-06T09:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> .}
Matches: "1984-12-06T09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = '1984-12-06T09:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime>)}
Matches: "1984-12-06T09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

The matches are specific for the exact date, time and type.

"Match time." What happens if we search for the time literal? We'll use the "time" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query type: getStatements()
Query: conn.getStatements(None, age, time)
Matches: "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time> .}
Matches: "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>)}
Matches: "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>

The matches are specific for the exact time and type.

"Match datetime with offset." What happens if we search for a datetime with an offset from Zulu (UTC) time?

Query type: getStatements()
Query: conn.getStatements(None, age, datetimeOffset)
Matches: "1984-12-06T08:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Query type: SPARQL direct match
Query: SELECT ?s ?p WHERE {?s ?p "1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .}
Matches: "1984-12-06T08:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Query type: SPARQL filter match
Query: SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)}
Matches: "1984-12-06T08:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Note that we searched for "1984-12-06T09:00:00+01:00" but found "1984-12-06T08:00:00Z". It is the same moment in time.
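That the two datetime strings name the same moment can be checked with Python's standard library alone. This sketch does not touch AllegroGraph; it assumes Python 3.7 or later (for datetime.fromisoformat), whereas the tutorial code itself is Python 2.

```python
from datetime import datetime, timezone

# The value we searched for, written with a +01:00 offset.
offset_form = datetime.fromisoformat("1984-12-06T09:00:00+01:00")

# The value the store reported, expressed in Zulu (UTC) time.
zulu_form = datetime(1984, 12, 6, 8, 0, 0, tzinfo=timezone.utc)

# Timezone-aware datetimes compare by the instant they denote.
same_instant = offset_form == zulu_form

# Normalizing the offset form to UTC gives the string the store returned.
normalized = offset_form.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(same_instant, normalized)
```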

Importing Triples (example6() and example7())

The Python Sesame API client can load triples in either RDF/XML format or NTriples format. The example below calls the connection object's add() method to load an NTriples file, and addFile() to load an RDF/XML file. Both methods work, but the best practice is to use addFile().

Note: If you get a "file not found" error while running this example, it means that Python is looking in the wrong directory for the data files to load. The usual explanation is that you have moved the tutorial_examples_40.py file to an unexpected directory. You can clear the issue by putting the data files in the same directory as tutorial_examples_40.py, or by setting the Python current working directory to the location of the data files using os.chdir().

The RDF/XML file contains a short list of v-cards (virtual business cards), like this one:

    <rdf:Description rdf:about="http://somewhere/JohnSmith/">
      <vCard:FN>John Smith</vCard:FN>
      <vCard:N rdf:parseType="Resource">
        <vCard:Family>Smith</vCard:Family>
        <vCard:Given>John</vCard:Given>
      </vCard:N>
    </rdf:Description>

The NTriples file contains a graph of resources describing the Kennedy family, the places where they were each born, their colleges, and their professions. A typical entry from that file looks like this:

    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#first-name> "Joseph" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#middle-initial> "Patrick" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#last-name> "Kennedy" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#suffix> "none" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#alma-mater> <http://www.franz.com/simple#Harvard> .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-year> "1888" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#death-year> "1969" .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#sex> <http://www.franz.com/simple#male> .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#spouse> <http://www.franz.com/simple#person2> .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#has-child> <http://www.franz.com/simple#person3> .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#profession> <http://www.franz.com/simple#banker> .
    <http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-place> <http://www.franz.com/simple#place5> .
    <http://www.franz.com/simple#person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.franz.com/simple#person> .

Note that AllegroGraph can segregate triples into contexts (subgraphs) by treating them as quads, but the NTriples and RDF/XML formats cannot include context information. They deal with triples only, so there is no place to store a fourth field in those formats. In the case of the add() call, we have omitted the context argument so the triples are loaded into the default background graph (sometimes called the "null context"). The addFile() call includes an explicit context setting, so the fourth field of each vcard triple will be the URI of that context.

The connection size() method takes an optional context argument. With no argument, it returns the total number of triples in the repository. Below, it returns 16 for the vcards context argument and 1214 for the null context ('null') argument.

The example6() function of tutorial_examples_40.py creates a dedicated session connection to AllegroGraph, using methods you have seen before, plus the Connection object's openSession() method:

    def example6(close=True):
        print "Starting example6()."
        server = AllegroGraphServer(AG_HOST, AG_PORT, AG_USER, AG_PASSWORD)
        catalog = server.openCatalog(AG_CATALOG)
        myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
        myRepository.initialize()
        conn = myRepository.getConnection()
        conn.clear()
        conn.openSession()

The dedicated session is not immediately pertinent to the examples in this section, but will become important in later examples that reuse this connection to demonstrate Prolog Rules and Social Network Analysis.

The variables path1 and path2 are bound to the RDF/XML and NTriples files, respectively.

        path1 = "./python-vcards.rdf"
        path2 = "./python-kennedy.ntriples"

The triples about the vcards will be added to a specific context, so naturally we need a URI to identify that context.

        context = conn.createURI("http://example.org#vcards")

In the next step we use addFile() to load the vcard triples into the #vcards context:

        conn.addFile(path1, None, format=RDFFormat.RDFXML, context=context)

Then we use add() to load the Kennedy family tree into the null context:

        conn.add(path2, base=None, format=RDFFormat.NTRIPLES, contexts=None)

Now we'll ask AllegroGraph to report on how many triples it sees in the null context and in the #vcards context:

        print "After loading, repository contains %i vcard triples in context '%s'\n and %i kennedy triples in context '%s'." % (conn.size(context), context, conn.size('null'), 'null')
        return conn

The output of this report was:

After loading, repository contains 16 vcard triples in context 'http://example.org#vcards'
 and 1214 kennedy triples in context 'null'.

The code below is found in example7() of tutorial_examples_40.py. It borrows the same triples we loaded in example6(), above, and runs two unconstrained retrievals. The first uses getStatements(), and prints out the subject URI and context of each triple.

    def example7():
        conn = example6()
        print "Match all and print subjects and contexts"
        result = conn.getStatements(None, None, None, None, limit=25)
        for row in result:
            print row.getSubject(), row.getContext()

This loop prints out a mix of triples from the null context and from the #vcards context. We set a limit of 25 triples because the Kennedy dataset contains over a thousand triples.

The following loop, however, does not produce the same results. This is a SPARQL query that should match all available triples, printing out the subject and context of each triple. We limited this query by using the DISTINCT keyword. Otherwise there would be many duplicate results.

        print "\nSame thing with SPARQL query (can't retrieve triples in the null context)"
        queryString = "SELECT DISTINCT ?s ?c WHERE {graph ?c {?s ?p ?o .} }"
        tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
        result = tupleQuery.evaluate()
        for i, bindingSet in enumerate(result):
            print bindingSet[0], bindingSet[1]
        conn.close()

In this case, the loop prints out only v-card results. The SPARQL query is not able to access the null context when a non-null context is also present.
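For the "file not found" situation mentioned in the note earlier in this section, a quick check before loading can save some confusion. The helper missing_files() below is a hypothetical convenience, not part of the tutorial or the AllegroGraph client; it uses only the standard os module, and the file names are the ones bound to path1 and path2 in example6().

```python
import os

def missing_files(paths, base_dir=None):
    """Return the entries of `paths` that do not exist as files, resolved
    against `base_dir` (or the current working directory when it is None)."""
    base = base_dir if base_dir is not None else os.getcwd()
    return [p for p in paths if not os.path.isfile(os.path.join(base, p))]

# File names taken from example6(); whether they exist depends on your setup.
data_files = ["python-vcards.rdf", "python-kennedy.ntriples"]
absent = missing_files(data_files)
if absent:
    print("Not found in %s: %s" % (os.getcwd(), ", ".join(absent)))
    # One possible remedy is os.chdir("<directory holding the data files>").
```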

Exporting Triples (example8() and example9())

The next examples show how to write triples out to a file in either NTriples format or RDF/XML format. The output of either format may be optionally redirected to standard output (the Python command window) for inspection.

Example example8() begins by obtaining a connection object from example6(). This means the repository contains v-card triples in the #vcards context, and Kennedy family tree triples in the null context.

    def example8():
        conn = example6()

In this example, we'll export the triples in the #vcards context.

        context = conn.createURI("http://example.org#vcards")

To write triples in NTriples format, call NTriplesWriter(). You have to tell it the path and file name of the exported file. If the output file argument is None, the writers write to standard output. You can uncomment that line if you'd like to see it work. This code exports the vcards triples in NTriples format:

        outputFile = "/tmp/temp.nt"
        #outputFile = None
        if outputFile is None:
            print "Writing NTriples to Standard Out instead of to a file"
        ntriplesWriter = NTriplesWriter(outputFile)
        conn.export(ntriplesWriter, context)

To write triples in RDF/XML format, call RDFXMLWriter(). This code exports the Kennedy triples in RDF/XML format.

        outputFile2 = "/tmp/temp.rdf"
        #outputFile2 = None
        if outputFile2 is None:
            print "Writing RDF to Standard Out instead of to a file"
        rdfxmlfWriter = RDFXMLWriter(outputFile2)
        conn.export(rdfxmlfWriter, 'null')

The export() method writes out all triples in one or more contexts. This provides a convenient means for making local backups of sections of your RDF store. The 'null' argument targets the triples of the default graph. If two or more contexts are specified, then triples from all of those contexts will be written to the same file. Since the triples are "mixed together" in the file, the context information is not recoverable. If the context argument is omitted, all triples in the store are written out, and again all context information is lost.

Finally, if the objective is to write out a filtered set of triples, the exportStatements() method can be used. The example below (from example9()) writes out all familyName triples from the vcards context to standard output.

        familyName = conn.createURI("http://www.w3.org/2001/vcard-rdf/3.0#FN")
        conn.exportStatements(None, familyName, None, False, RDFXMLWriter(None), context)
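The "file name or None" convention used by the writers can be sketched in a few lines. The function write_ntriples() below is a hypothetical stand-in written for illustration; it is not the library's NTriplesWriter and it assumes each term has already been formatted as an NTriples string. It shows one way a writer can fall back to standard output when no file name is given.

```python
import sys

def write_ntriples(triples, output_file=None):
    """Write (subject, predicate, object) term strings as NTriples lines.

    With output_file=None the lines go to standard output, otherwise to the
    named file. Returns the number of lines written."""
    lines = ["%s %s %s .\n" % (s, p, o) for s, p, o in triples]
    if output_file is None:
        sys.stdout.writelines(lines)
    else:
        with open(output_file, "w") as handle:
            handle.writelines(lines)
    return len(lines)

sample = [
    ("<http://somewhere/JohnSmith/>",
     "<http://www.w3.org/2001/vcard-rdf/3.0#FN>",
     '"John Smith"'),
]
written = write_ntriples(sample)  # no file name, so this prints to stdout
```

Note that the output carries no fourth field, which is why context information cannot be recovered from an exported NTriples file.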

Searching Multiple Graphs (example10())

We have already seen contexts (subgraphs) at work when loading and saving files. In example10() we provide more realistic examples of contexts, and we explore the FROM, FROM DEFAULT, and FROM NAMED clauses of a SPARQL query to see how they interact with multiple subgraphs in the triple store. Finally, we will introduce the dataset object. A dataset is a list of contexts that should all be searched simultaneously. It is an object for use with SPARQL queries.

To set up the example, we create six statements, and add two of each to three different contexts: context1, context2, and the null context. The process of setting up the six statements follows the same pattern as we used in the previous examples:

    ## Create URIs for resources, predicates and classes.
    exns = "http://example.org/people/"
    alice = conn.createURI(namespace=exns, localname="alice")
    bob = conn.createURI(namespace=exns, localname="bob")
    ted = conn.createURI(namespace=exns, localname="ted")
    person = conn.createURI(namespace=exns, localname="Person")
    name = conn.createURI(namespace=exns, localname="name")
    ## Create literal name values.
    alicesName = conn.createLiteral("Alice")
    bobsName = conn.createLiteral("Bob")
    tedsName = conn.createLiteral("Ted")
    ## Create URIs to identify the named contexts.
    context1 = conn.createURI(namespace=exns, localname="context1")
    context2 = conn.createURI(namespace=exns, localname="context2")

The next step is to assert two triples into each of three contexts:

    ## Assemble new statements and add them to the contexts.
    conn.add(alice, RDF.TYPE, person, context1)
    conn.add(alice, name, alicesName, context1)
    conn.add(bob, RDF.TYPE, person, context2)
    conn.add(bob, name, bobsName, context2)
    conn.add(ted, RDF.TYPE, person)   # Added to null context
    conn.add(ted, name, tedsName)     # Added to null context

Note that the final two statements (about Ted) were added to the null context (the unnamed default graph).

GetStatements

The first test uses getStatements() to return all triples in all contexts (context1, context2, and null). This is default search behavior, so there is no need to specify the contexts in the conn.getStatements() method. Note that conn.size() also reports on all contexts by default.

    statements = conn.getStatements(None, None, None)
    print "All triples in all contexts: %s" % (conn.size())
    for s in statements:
        print s

The output of this loop is shown below. The context URIs are in the fourth position. Triples from the null context have no context value.

All triples in all contexts: 6
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context1>)
(<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/context1>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)
(<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")

The next match explicitly lists 'context1' and 'context2' as the only contexts to participate in the match. It returns four statements. The conn.size() method can also address individual contexts.

    statements = conn.getStatements(None, None, None, [context1, context2])
    print "Triples in contexts 1 or 2: %s" % (conn.size([context1, context2]))
    for s in statements:
        print s

The output of this loop shows that the triples in the null context have been excluded.

Triples in contexts 1 or 2: 4
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/context1>)
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context1>)

This time we use getStatements() to search explicitly for triples in the null context and in context 2.

    statements = conn.getStatements(None, None, None, ['null', context2])
    print "Triples in contexts null or 2: %s" % (conn.size(['null', context2]))
    for s in statements:
        print s

The output of this loop is:

Triples in contexts null or 2: 4
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)

The lesson is that getStatements() can freely mix triples from the null context and named contexts. It is all you need as long as the query is a very simple one.
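The context selection shown above can be modeled with plain tuples. This is a sketch only: the quads are simplified strings, get_statements() is a hypothetical function written for illustration rather than the client's getStatements(), and the null context is represented by None, with the string 'null' accepted in the contexts list as in the tutorial calls.

```python
# Six quads mirroring the example: (subject, predicate, object, context).
# A context of None stands for the null context (the unnamed default graph).
QUADS = [
    ("alice", "rdf:type", "Person", "context1"),
    ("alice", "name", "Alice", "context1"),
    ("bob", "rdf:type", "Person", "context2"),
    ("bob", "name", "Bob", "context2"),
    ("ted", "rdf:type", "Person", None),
    ("ted", "name", "Ted", None),
]

def get_statements(quads, contexts=None):
    """Return the quads whose context is listed in `contexts`.

    With contexts=None every quad is returned; the string 'null' in the
    list selects quads stored in the null context."""
    if contexts is None:
        return list(quads)
    wanted = {None if c == "null" else c for c in contexts}
    return [q for q in quads if q[3] in wanted]

print(len(get_statements(QUADS)))                            # all contexts
print(len(get_statements(QUADS, ["context1", "context2"])))  # named only
print(len(get_statements(QUADS, ["null", "context2"])))      # null plus one
```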

SPARQL using FROM, F
