GISIN Protocol Specification
Every organization tends to collect data in a unique way. This document defines the protocol that GISIN uses to transfer data between organizations in a standard way. For information on why the specification includes the features it does, please see the requirements documentation.
This document is written for those who wish to provide data to GISIN through file upload or a web service and assumes you are familiar with the Global Invasive Species Information Network (GISIN). The protocol specification described here has borrowed from other specifications wherever possible. See the DarwinCore documentation, the OpenGIS Web Map Service (WMS), and Geography Markup Language (GML) for related topics.
Note: The documentation on web services has been moved to the new Web Service page.
A large number of individuals and organizations helped with the initial idea, content, and reviews of this protocol. They include: Jim Graham and Annie Simpson for coordinating and documenting the effort; Jerry Cooper, Bob Morris, and Michael Browne for original work on the IASPS documentation, which was a starting point for this document; Greg Ruiz and Jim Carlton for their Framework for Vector Science; Pam Fuller, Greg Ruiz, Brian Steves, and Shawn Dalton for their development of NISBase, the groundbreaking work on implementing invasive species data exchange; Michael Browne for facilitating the development of the status categories; Robert Hilliard, Kevin Thiele, and Aaron Wilton for assistance in integrating vocabularies; Roger Hyam, Renato De Giovanni, and Markus Doring for help with implementing TAPIR; Donald Hobern and Hannu Saarenmaa for overall guidance; and Liz Sellers, Rob Emery, Jacob Asiedeau, Greg Newman, Catherine Jarnevich, Silvia Ziller, Andrea Grosse, Olivier de Munck, and the staff of the Invasive Species Specialist Group and the Global Invasive Species Database.
If we have forgotten anyone please let us know!
The protocol has been designed with potential data providers in mind, recognizing that these organizations tend to have simple, flat databases and minimal technical resources to modify those databases and make them available as a web service. At the same time, the protocol must perform at high speeds to allow for both a large number of providers and providers with very large data sets. The protocol is used by both the file upload and web service capabilities of GISIN.
2.1 Data Models
Below are the "Data Models" or types of data that are supported by the protocol.
Note: We broke out ManagementStatus and ImpactStatus from SpeciesStatus when we realized there could be multiple records for ManagementStatus and/or ImpactStatus for a single SpeciesStatus record (i.e. a single species and location). This allows us to keep the Model records flat.
The GISIN protocol was created as an extension to the DarwinCore standard. This standard refers to each of the types of data in a data model as a "concept". Concepts can also be thought of as the labels over the columns in a table or spreadsheet file, or as the fields in a database.
3.1 Common Concepts
The following Concepts are used as fields to identify the Date, Taxonomy, Location, and the Language for text results within the data for all Models.
Note: If a provider does not support a particular Concept it should simply not return that Concept as an element of the record.
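The rule in the note above can be sketched in Python as follows. This is only an illustration: the `build_record` helper, the dictionary representation of a record, and the use of `None` to mark an unsupported Concept are assumptions made for the example, not part of the protocol.

```python
def build_record(values):
    """Return a record holding only the Concepts the provider supports.

    `values` maps Concept names to values. None marks a Concept the
    provider does not support (an assumption made for this sketch);
    such Concepts are left out of the record entirely.
    """
    return {concept: value for concept, value in values.items() if value is not None}


record = build_record({
    "ScientificName": "Bromus tectorum",
    "CountryCode": "US",
    "County": None,  # not supported by this hypothetical provider
})
print(record)
```

An empty string is kept, because an empty element can carry meaning of its own (for example, an empty EndValidDate); only Concepts marked as unsupported are dropped.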
Dates are represented as documented in International Standard ISO 8601. The format is YYYY-MM-DD, where YYYY is the year in the Gregorian calendar, MM is the month, and DD is the day. See Markus's web page for a quick summary. At least a year is required; month and day are preferred as well.
The Modified concept is the date on which the record was last changed. Data consumers use it if they cache data. If the provider does not maintain a DateLastModified value in its database, it should always return the current date.
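As a rough sketch of how a provider or consumer might check these dates, the Python function below accepts a year alone, a year and month, or a full date. The function name `parse_partial_date` and the tuple it returns are illustrative choices, not something the protocol defines.

```python
import re
from datetime import date

# YYYY, optionally followed by -MM, optionally followed by -DD.
_ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(text):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (year, month, day).

    Parts that are absent come back as None. Raises ValueError when the
    text is malformed or names an impossible date such as 2008-02-30.
    """
    match = _ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 calendar date: {text!r}")
    year, month, day = (int(part) if part else None for part in match.groups())
    # Let the standard library reject out-of-range months and days.
    date(year, month or 1, day or 1)
    return year, month, day
```

For example, `parse_partial_date("2008-05")` returns `(2008, 5, None)`, while `parse_partial_date("05/17/2008")` raises `ValueError`.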
The StartValidDate and EndValidDate represent the range of when a "status" data Model is valid. Data providers should return an empty element for the EndValidDate if the status is still current.
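A consumer could test whether a status record applies on a given day along the following lines. The sketch assumes complete YYYY-MM-DD dates and treats EndValidDate as inclusive; both are assumptions, since the text above does not say how the end of the range is interpreted. The function name `is_valid_on` is hypothetical.

```python
from datetime import date


def is_valid_on(start_valid, end_valid, when):
    """Return True if a status record is valid on the date `when`.

    `start_valid` and `end_valid` are ISO 8601 strings (YYYY-MM-DD).
    An empty `end_valid` means the status is still current.
    The end date is treated as inclusive (an assumption).
    """
    if when < date.fromisoformat(start_valid):
        return False
    if end_valid == "":
        return True
    return when <= date.fromisoformat(end_valid)
```

With this sketch, `is_valid_on("2001-01-01", "", date(2020, 1, 1))` is true because the empty EndValidDate marks the status as still current.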
ScientificName is the primary means of identifying a taxon. At least a Genus is required. Species, subspecies, variety, author, and date may be provided in standard taxonomic notation. Including Kingdom is recommended in all requests with ScientificNames, to resolve the few conflicts where the same genus appears in more than one Kingdom.
Examples of scientific names include:
Note: We need to define how Taxonomic Concepts will be included.
Invasive species location data can be in one of three forms: 1) country codes with additional concepts, 2) standard location codes, and/or 3) geographic coordinates.
Readers should recognize that certain providers will have only local names while others will have only geographic coordinates (of a variety of types). Consumers may request just LocationNames, just Geographic coordinates, or both. The data provider should provide all the information it has available that meets the requested content.
International codes for countries are defined by ISO 3166 and can be specified with the CountryCode concept. Once a country code is specified, a state/province should be specified with the StateProvince concept. If appropriate, a county (which includes cantons) should then be specified with the County concept. If available, the name of the local area can be specified with the LocalityName concept. A LocalityType should accompany a LocalityName to ensure it is unique. If names are not associated with a LanguageCode, they are assumed to be in the default language of the provider.
Another approach is to identify the location of the record with a known defined standard (e.g. US_HUC, US_FIPS, AR_PostalCode). This is done by providing a LocationStandard and LocationCode. Supported LocationStandards are listed below. There can only be one location code per record.
Please contact a member of the GISIN steering committee to have a new standard added.
A precise coordinate can be specified with the DecimalLatitude and DecimalLongitude concepts. These values should be in decimal degrees. The accuracy of the coordinates should be provided with the HorizontalAccuracy concept and how the coordinates were created should be provided with the GeographicProtocol concept if this information is available.
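Providers whose source data is stored as degrees, minutes, and seconds need to convert it before filling these concepts. The following Python sketch shows one way to do the conversion and a basic range check; the helper names `dms_to_decimal` and `validate_coordinates` are illustrative and not part of the protocol.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N, S, E or W to decimal degrees.

    Southern latitudes and western longitudes come back as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def validate_coordinates(latitude, longitude):
    """Raise ValueError unless the pair is a plausible decimal-degree position."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return latitude, longitude
```

A value of 40 degrees, 35 minutes, 6 seconds north converts to approximately 40.585, which would be the DecimalLatitude.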
Filtering for geographic coordinates is only supported for geographic bounding boxes with the DecimalLatitudeMin, DecimalLatitudeMax, DecimalLongitudeMin, and DecimalLongitudeMax parameters.
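A provider might apply such a bounding-box filter as sketched below. The example assumes records are dictionaries keyed by Concept name, that the bounds are inclusive, and that the box does not cross the 180-degree meridian; none of these details are specified above, and `filter_by_bounding_box` is a hypothetical helper.

```python
def filter_by_bounding_box(records, lat_min, lat_max, lon_min, lon_max):
    """Return the records whose coordinates fall inside the box.

    The four bounds correspond to DecimalLatitudeMin, DecimalLatitudeMax,
    DecimalLongitudeMin and DecimalLongitudeMax. Bounds are inclusive and
    the box is assumed not to cross the 180-degree meridian.
    """
    return [
        record
        for record in records
        if lat_min <= record["DecimalLatitude"] <= lat_max
        and lon_min <= record["DecimalLongitude"] <= lon_max
    ]


# Made-up sample records, for illustration only.
records = [
    {"ScientificName": "Bromus tectorum", "DecimalLatitude": 40.5, "DecimalLongitude": -105.1},
    {"ScientificName": "Bromus tectorum", "DecimalLatitude": 51.5, "DecimalLongitude": -0.1},
]
inside = filter_by_bounding_box(records, 35.0, 45.0, -110.0, -100.0)
print(len(inside))
```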
Languages are specified with ISO 639-2 codes. These are 3-letter codes. In some cases both a bibliographic code and a terminology code are provided. In this case GISIN uses the terminology code. Some examples are below.
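To illustrate the choice between the two kinds of code, the Python sketch below maps a few ISO 639-2 bibliographic codes to their terminology equivalents. The table is deliberately partial and the function name `to_terminology_code` is hypothetical; a real implementation would load the full ISO 639-2 list.

```python
# A few ISO 639-2 languages that have distinct bibliographic (B) and
# terminology (T) codes. This table is partial and for illustration only.
BIBLIOGRAPHIC_TO_TERMINOLOGY = {
    "fre": "fra",  # French
    "ger": "deu",  # German
    "chi": "zho",  # Chinese
    "dut": "nld",  # Dutch
}


def to_terminology_code(code):
    """Return the terminology form of an ISO 639-2 code.

    Codes without a separate bibliographic form, such as "spa" for
    Spanish, are returned unchanged (lower-cased).
    """
    code = code.lower()
    return BIBLIOGRAPHIC_TO_TERMINOLOGY.get(code, code)
```

Under this sketch, German text submitted as "ger" would be normalized to "deu", while "spa" (Spanish) and "eng" (English) pass through unchanged.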
3.1.5 Globally Unique Identifiers
Globally Unique Identifiers, or GUIDs, are a mechanism to uniquely identify data that is moved around the Internet. GUIDs are extremely important both to make sure we are not duplicating records and to make sure that the originators of data are identifiable. GUIDs are also a means to allow corrections to data to be updated and to have old data removed from a system.
GISIN recommends GUIDs be attached to each record in the original source of the data. When this is not possible, providers should add a GUID and cite the original source in the Citation concept. If a provider changes the contents of a record that they do not own, they should create a new GUID and cite the original source in the Citation concept. Since GUIDs are used to identify a unique record, they should never be reused with a different record.
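The rule for changed records can be sketched in Python as follows. The "GUID" dictionary key, the idea of citing the original record by its GUID, and the placeholder GUID strings are all assumptions made for the example; they do not reflect the actual GISIN GUID format or how a Citation should be worded.

```python
def derive_record(original, changes, new_guid):
    """Return a changed copy of a record that another provider owns.

    The copy receives its own GUID and cites the original record in the
    Citation concept. The "GUID" key and citing by the original GUID
    are assumptions made for this sketch.
    """
    if new_guid == original["GUID"]:
        raise ValueError("a changed record must not reuse the original GUID")
    derived = {**original, **changes}
    derived["GUID"] = new_guid
    derived["Citation"] = original["GUID"]
    return derived


# Placeholder values only; real GUIDs come from a GUID authority.
original = {"GUID": "original-guid-placeholder", "ScientificName": "Bromus tectorum", "CountryCode": "US"}
derived = derive_record(original, {"CountryCode": "CA"}, "new-guid-placeholder")
print(derived["Citation"])
```

The original record is left untouched, so its GUID continues to identify exactly the record it was first attached to.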
GUIDs must also be traceable or resolvable. Any GUID must be able to be used to determine the original source of a record, typically by entering it into a web browser as a URL (in the case of GISIN GUIDs, with the addition of http://www.). GUIDs must also last for a very long time (indefinitely). This means that a "GUID Authority" must be used to provide the GUID. An authority is defined as an organization that has made a long-term commitment to providing a resolving functionality for GUIDs.
There are a variety of formats for GUIDs. GISIN uses a GUID standard that is easy to read and resolve. The format appears as:
Where the values in brackets would be replaced with:
GISIN has volunteered to be a GUID authority, so if you as a data provider do not have another authority to use, you can contact GISIN to obtain an InstitutionCode and CollectionCode(s) and then determine your own unique CatalogNumbers. The format for GUIDs from GISIN is:
The SpeciesStatus model contains information on the status of a species within a specific location, and within a specific date range.
3.2.1 SpeciesStatus Concepts
SpeciesStatuses are required to support the following general Concepts:
SpeciesStatuses support the following additional Concepts. Possible values for the Concepts are listed below this table.
3.3.1 SpeciesResourceURL Concepts
SpeciesResourceURLs are required to support the following general Concepts:
SpeciesResourceURLs support the following additional Concepts as fields:
Occurrence data are especially important for modeling present and future distributions of species. The GISIN system uses other TDWG standards in the management of occurrence information.
3.4.1 Occurrence Concepts
Occurrences are required to support the following general Concepts:
Occurrences support the following additional Concepts:
The preferred datums are World Geodetic System 1984 (WGS84) or High Accuracy Reference Network (HARN), but providers may have data in datums that are not global and may not have the facilities to convert them to a global datum. Ignoring the datum can cause errors of thousands of meters! We strongly encourage providers to supply data in WGS84 or HARN. Consumers should always check the datum, or else filter to accept only the datums they support.
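A consumer-side datum filter might look like the sketch below. The field name "GeodeticDatum" is borrowed from DarwinCore as an assumption, because the name of the datum concept is not shown in the text above; `filter_by_datum` is likewise a hypothetical helper.

```python
ACCEPTED_DATUMS = frozenset({"WGS84", "HARN"})


def filter_by_datum(records, accepted=ACCEPTED_DATUMS):
    """Keep only the records whose datum is one the consumer accepts.

    Records with a missing or empty datum are dropped, since their
    coordinates cannot be interpreted safely. "GeodeticDatum" is an
    assumed field name.
    """
    return [
        record
        for record in records
        if (record.get("GeodeticDatum") or "").upper() in accepted
    ]
```

Dropping records with no datum is a conservative choice; a consumer could instead keep them and flag the coordinates as uncertain.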
3.5.1 ImpactStatus Concepts
ImpactStatus represents the type of impact a species is having on a habitat. Multiple ImpactStatuses should be provided for species that impact multiple habitats (e.g. marine and terrestrial).
ImpactStatuses are required to support the following general Concepts:
ImpactStatuses support the following additional Concepts:
Note: we need more quantifiable terms here
DispersalStatus represents the type of dispersal a species exhibits in a habitat. Multiple DispersalStatuses should be provided for species that disperse into multiple habitats (e.g. marine and terrestrial).
3.6.1 DispersalStatus Concepts
DispersalStatuses are required to support the following general Concepts:
DispersalStatuses support the following additional Concepts:
3.7.1 ManagementStatus Concepts
ManagementStatus represents the type of management activities involved with a species in a specified area. Multiple ManagementStatuses should be provided if multiple activities take place in the same time period.
ManagementStatuses are required to support the following general Concepts:
ManagementStatuses support the following additional Concepts:
Carlton, J.T. and G.M. Ruiz. 2002. Principles of Vector Science and Integrated Vector Management. In Mooney, H. et al. (eds.) Best Practices for the Prevention and Management of Alien Invasive Species. Island Press.
IUCN. 2000. Guidelines for the prevention of biodiversity loss due to biological invasion. IUCN – The World Conservation Union, Gland, Switzerland.
Appendix A - Issues
See the protocol issues in the technical documentation
Appendix B - Changes