GISIN: Technical Introduction
This document provides an introduction to the Global Invasive Species Information Network (GISIN). The goal of this system is to give users of the World Wide Web easier access to the large amount of data that is available on invasive species. Currently, accessing data on invasive species consists primarily of searching through Google, searching the scientific literature to discover individuals who have data, or asking others, i.e., by simple word-of-mouth. Once the data is found, filtering it to find exactly what one is looking for and converting it into a desired format can be time-consuming.
GISIN will change the user experience by enabling 1) a centralized registry of web sites and services with information on invasive species, 2) web sites where end-users can contribute data and make it available to other web sites, 3) web sites with summaries, maps, and models of invasive species distributions based on all available data, and 4) web portals that allow browsing across all available invasive species data.
This project will succeed only if those with invasive species data, referred to as data providers, can easily add their data to the system. For this reason the system has been built to make that step as simple as possible.
The system must also provide services that are of value to the end-user community. This means it must be easy to access the available data. Ease of access means the system must support the standards, platforms, and languages that are used today on the Internet. It also means any additional standards or vocabularies must be simple, well documented, and easily available.
GISIN has been created and is managed by the Global Invasive Species Information Network (GISIN) and is based on the framework originally created by NISbase, a collaboration between the Smithsonian Institution and the US Geological Survey (Greg Ruiz, Brian Steves, Pam Fuller, and Shawn Dalton) that can be seen at http://www.nisbase.org.
2. System Components
The ISS is made up of a variety of components (Figure 1). With the exception of the Registry, any of these components may be created by anyone with access to a web server attached to the Internet. The Registry is maintained by the National Institute for Invasive Species Science (NIISS). Providers include a large variety of organizations and individuals who have information on invasive species and an available web server. Providers include herbariums, museums, resource managers, volunteer groups, and scientific research teams. A Common is a special case of a provider that includes a web site where end-users with data can add their own data into the system without having to manage a web server themselves. Consumers are web servers that wish to access invasive species data within the system. Consumers include servers that provide summaries of the available data, maps of individual species distributions, and models of their potential spread. A portal is a web site that integrates data from multiple providers, making it easier for end-users to browse the information.
Existing examples of portals that are similar to GISIN within the biological and biodiversity community include the Global Biodiversity Information Facility (GBIF) and the Nonindigenous Species Database Network (NISbase). Both of these systems allow users to search across multiple sources of data for information on organisms. GBIF does not handle data that is specific to invasive species and NISbase is focused on aquatic species. GISIN will move forward from these systems to provide more comprehensive support of invasive species data across a wide range of data sources.
3. Communication Protocol
One key element in GISIN is a protocol that allows the various web servers to communicate with one another and exchange data. This communication protocol has been specifically designed to meet the needs of the invasive species community, but it borrows from other work in the web services industry to ensure it is as simple, fast, and reliable as possible.
These protocols emerged from client-server technology developed when computer networks first appeared. The World-Wide-Web relies upon a communication protocol called Hypertext Transfer Protocol (HTTP). Browsers such as Internet Explorer, Firefox, and Netscape operate as the client on the Web while 'web servers' operate as the servers, or 'providers'. Whenever the user enters a uniform resource locator (URL) or clicks on a 'hyperlink', the browser makes a request to a web server for additional information. The first part of a URL defines the protocol, typically 'http'; the next part is a domain name (e.g., www.google.com), which identifies the web server to receive the request; the next portion identifies the specific resource (typically a file) desired from the server; and the remainder of the URL contains parameters for that resource. These parts of a URL can be seen by examining a typical URL created when searching for 'http' within Google:
'http' is the acronym for Hypertext Transfer Protocol, 'www.google.com' identifies the desired server for the request, 'search' tells the server to execute a search page, while the parameters 'source', 'hl', and 'q' indicate how to execute the search. The web server then returns an HTML file containing the results of the search. Right-clicking on the HTML page in Internet Explorer and selecting 'View Source' will show the actual data that is received from the server. The browser then formats this data for the user to view. The GISIN system uses an approach almost identical to this example for exchanging data on invasive species between computers.
Web services are used heavily on the Web to exchange information between computers. If you've ever had one of those annoying ads pop up within your browser, you were probably seeing the result of a web service request to another computer. Web services are also used to display weather information and MapQuest maps in Web pages. The major difference between a web service request and a URL entered by a user is that the web service request does not originate with a user's click. Web services are executed by a computer to request information from another computer. The information is then typically parsed and converted into a more readable format for end-users.
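As a minimal sketch, the parts of such a URL can be separated with Python's standard urllib.parse module. The URL below is an illustrative assumption: the parameter names 'source', 'hl', and 'q' come from the text, but their values are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical search URL; the parameter values are assumed for illustration.
url = "http://www.google.com/search?source=hp&hl=en&q=http"

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.scheme)  # the protocol
print(parts.netloc)  # the server that receives the request
print(parts.path)    # the resource requested from the server
print(params)        # the parameters passed to that resource
```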
GISIN includes a web service protocol specially designed for the exchange of invasive species data on the Internet. There is a specific set of steps that are required to use the protocol (Figure 2).
Figure 2. Simplified diagram showing the operation of the protocol.
3.1 User Requests Information
This request can come from clicking on a link in a browser or from a researcher starting a program to extract information by using a web service. The user will specify parameters such as the type of information they are looking for or the location they are interested in. If the user wants to know which species are invasive in New Zealand, he or she might enter:
- Location: New Zealand
- Bio Status: Non-Indigenous AND Harmful
3.2 Create Request String
Software will convert the user's request into a request string and then send it to the server of interest. For the example above the request string might look like:
http://www.invasivespecies.org/giss.php?DataType=CheckList&LocationType=Country&LocationCode=New%20Zealand&Native=No&Harmful=Yes
'www.invasivespecies.org' identifies the server while 'giss.php' is a file on the server of interest and both of these would have been obtained from the registry. DataType indicates the user wants check lists. LocationType indicates we are looking for a country while LocationCode specifies the particular country of interest. Native and Harmful then indicate the specific parameters for this search.
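The construction of the request string can be sketched in Python as follows. This is only an illustration under stated assumptions: build_request is a hypothetical helper, not part of GISIN, and the space in 'New Zealand' is percent-encoded as %20 because a URL cannot contain a raw space.

```python
from urllib.parse import urlencode, quote

def build_request(base_url, **params):
    """Return base_url followed by an encoded query string (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)

# Server and file name as given in the text; in practice they would be
# obtained from the registry.
request = build_request(
    "http://www.invasivespecies.org/giss.php",
    DataType="CheckList",
    LocationType="Country",
    LocationCode="New Zealand",
    Native="No",
    Harmful="Yes",
)
print(request)
```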
3.3 Parse Request For Parameters
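The server that receives the request will parse out the parameters to obtain:
- DataType: CheckList
- LocationType: Country
- LocationCode: New Zealand
- Native: No
- Harmful: Yes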
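A minimal sketch of this parsing step, written in Python for illustration (the file name 'giss.php' suggests the actual server script is PHP). The function parse_request is a hypothetical name, and the request URL is the example from section 3.2 with the space percent-encoded.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_request(request_url):
    """Return the query parameters of request_url as a dict (hypothetical helper)."""
    query = urlsplit(request_url).query
    # parse_qsl decodes percent-encoded characters such as %20.
    return dict(parse_qsl(query))

params = parse_request(
    "http://www.invasivespecies.org/giss.php"
    "?DataType=CheckList&LocationType=Country"
    "&LocationCode=New%20Zealand&Native=No&Harmful=Yes"
)
print(params)
```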
3.4 Build A SQL Query String
A Structured Query Language (SQL) string would typically be built that executes the desired operation such as:
Note: This example is shown for instructional purposes, but in reality the types, codes, native, and harmful fields would be converted to integer identifiers for speed.
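One way such a query string might be assembled is sketched below. The table name 'checklist' and the column names are assumptions made for the example, not the actual GISIN schema. The sketch uses '?' placeholders so that user-supplied values are passed to the database separately from the SQL text.

```python
def build_query(params):
    """Return (sql, values) for a checklist search (hypothetical schema)."""
    # Assumed mapping from request parameters to column names.
    columns = {
        "LocationType": "location_type",
        "LocationCode": "location_code",
        "Native": "native",
        "Harmful": "harmful",
    }
    clauses, values = [], []
    for key, column in columns.items():
        if key in params:
            clauses.append(f"{column} = ?")
            values.append(params[key])
    sql = "SELECT species_name FROM checklist"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values

sql, values = build_query({
    "LocationType": "Country",
    "LocationCode": "New Zealand",
    "Native": "No",
    "Harmful": "Yes",
})
print(sql)
print(values)
```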
The query string is sent to the database which will return a record set. The record set will contain the first row of data that matched the user's request and a cursor that can be used to obtain the remaining data. This cursor is similar to a cursor in a spreadsheet program such as Excel, where you can move the cursor down to examine each row.
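The cursor behaviour described above can be illustrated with Python's built-in sqlite3 module and an in-memory database. The table layout and the species names ('Species A' and so on) are placeholders invented for the sketch, not real GISIN data.

```python
import sqlite3

# In-memory database with a hypothetical checklist table and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checklist "
    "(species_name TEXT, location_code TEXT, native TEXT, harmful TEXT)"
)
conn.executemany(
    "INSERT INTO checklist VALUES (?, ?, ?, ?)",
    [
        ("Species A", "New Zealand", "No", "Yes"),
        ("Species B", "New Zealand", "No", "Yes"),
        ("Species C", "New Zealand", "Yes", "No"),
    ],
)

# Executing the query returns a cursor positioned before the first match.
cursor = conn.execute(
    "SELECT species_name FROM checklist "
    "WHERE location_code = ? AND native = ? AND harmful = ? "
    "ORDER BY species_name",
    ("New Zealand", "No", "Yes"),
)
first = cursor.fetchone()  # the first matching row
rest = cursor.fetchall()   # moving the cursor through the remaining rows
print(first, rest)
conn.close()
```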
3.6 Fetch the Record Set and Convert the Records to XML
The script will then fetch all the records that match the user's request from the database and will format them into XML. Below is a portion of an XML document for our example:
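As a rough sketch of the conversion step, fetched records could be turned into XML with Python's xml.etree.ElementTree module. The element names 'CheckList', 'Species', and 'ScientificName' and the species names are assumptions for illustration only, not the actual GISIN schema or data.

```python
import xml.etree.ElementTree as ET

def records_to_xml(records):
    """Return an XML string with one Species element per record (assumed schema)."""
    root = ET.Element("CheckList")
    for name in records:
        species = ET.SubElement(root, "Species")
        ET.SubElement(species, "ScientificName").text = name
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")

xml_text = records_to_xml(["Species A", "Species B"])
print(xml_text)
```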
3.7 Parse XML for information
The data is returned to the client as an HTTP response and can then be parsed for display to the user. This display could be as an HTML table or the data could be stored in a database for later retrieval.
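On the client side, parsing the response and displaying it as an HTML table might look like the sketch below. The XML layout and element names are the same illustrative assumptions as elsewhere in this section, and parse_checklist is a hypothetical helper.

```python
import html
import xml.etree.ElementTree as ET

def parse_checklist(xml_text):
    """Return the species names found in an XML response (assumed schema)."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("ScientificName")]

# Hypothetical response body with a placeholder species name.
response = (
    "<CheckList>"
    "<Species><ScientificName>Species A</ScientificName></Species>"
    "</CheckList>"
)
names = parse_checklist(response)

# Build a simple HTML table, escaping each name before embedding it.
rows = "".join(f"<tr><td>{html.escape(name)}</td></tr>" for name in names)
html_table = f"<table>{rows}</table>"
print(html_table)
```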
The registry provides a central location to find out specific details about information and other invasive species related resources. The registry can be searched through a web browser or through web services to find resources in a particular area that contain information about specific invasive species.
The registry can be considered a directory to the contents of the GISIN information system, containing metadata about information providers and the type of content they offer.