.. _services_harvesting:
Harvesting services
===================
This section describes the services used to create, update and manage GeoNetwork
harvesters. These services allow complete control over harvester behaviour.
Authentication is required for all services described in this section. In addition, these services can only be run by users with the **Administrator** profile.
.. index:: xml.harvesting.get
Get harvester definitions (xml.harvesting.get)
----------------------------------------------
Retrieves information about one or all configured harvesters.
Request
```````
Called without parameters, this service returns all harvesters. Example::

  <request/>

Otherwise, an **id** parameter can be specified to request the definition of a specific harvester instance::

  <request>
    <id>123</id>
  </request>
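
The two request forms can also be generated programmatically. The following is a minimal sketch, assuming the request body is a ``request`` root element with an optional ``id`` child; the function name ``build_get_request`` is hypothetical and not part of GeoNetwork.

```python
import xml.etree.ElementTree as ET


def build_get_request(harvester_id=None):
    """Return the XML body for xml.harvesting.get as bytes.

    Without an id the body is an empty request (all harvesters);
    with an id it carries a single <id> child.
    """
    root = ET.Element("request")  # assumed root element name
    if harvester_id is not None:
        ET.SubElement(root, "id").text = str(harvester_id)
    return ET.tostring(root)


print(build_get_request())
print(build_get_request(123))
```

The bytes returned would be sent as the POST body; sending the request and authenticating as an administrator are not shown here.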
Response
````````
When called without parameters the service returns HTTP status code 200 along
with an XML document with all harvester instances. The XML document has a root element called ``nodes`` with a ``node`` child for each harvester.
**Example of an xml.harvesting.get response for a GeoNetwork harvester**::
test 10619cc50-708b-11da-8202-000d9335aaaehttp://www.fao.org/geonetworktruetruetruenone0 0 0/3 ? * *falseinactivefalsefalsefalse
If you specify a harvester **id** parameter in the request, then the XML document returned has a ``node`` root element that describes the harvester.
**Example of an xml.harvesting.get response for a WebDAV harvester**::
test 10619cc50-708b-11da-8202-000d9335aaaehttp://www.mynode.org/metadatadefault.gifadminadmin0 0 0/3 ? * *falsefalsetrueinactivefalse
Each harvester has some common XML elements, plus
additional elements that are specific to each harvesting type.
The common XML elements are described at :ref:`harvesting_nodes`.
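
Because the root element differs between the two request forms (``nodes`` for all harvesters, ``node`` for a single one), client code may want to treat both uniformly. The sketch below shows one way to do that. The ``id`` and ``type`` attributes, the ``site/name`` path and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; attribute and child names are assumptions.
SAMPLE = """
<nodes>
  <node id="125" type="geonetwork">
    <site><name>test 1</name></site>
  </node>
  <node id="126" type="webdav">
    <site><name>test 2</name></site>
  </node>
</nodes>
"""


def list_harvesters(xml_text):
    """Return (id, type, name) tuples for a nodes or single node document."""
    root = ET.fromstring(xml_text)
    # A request with an id returns a bare <node>; without it, <nodes>.
    nodes = [root] if root.tag == "node" else root.findall("node")
    return [
        (n.get("id"), n.get("type"), n.findtext("site/name"))
        for n in nodes
    ]


for harvester in list_harvesters(SAMPLE):
    print(harvester)
```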
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Object not foundObjectNotFoundEx
.....
See :ref:`exception_handling` for more details.
Errors
``````
- **ObjectNotFoundEx** If a harvester definition with the specified **id**
cannot be found.
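
A client can turn the HTTP 500 error document into an exception of its own. This is a sketch only: it assumes the error document carries ``message`` and ``class`` children, as the example above suggests, and the names ``HarvestingServiceError`` and ``check_response`` are hypothetical.

```python
import xml.etree.ElementTree as ET


class HarvestingServiceError(Exception):
    """Raised when a service answers with an error document."""

    def __init__(self, error_class, message):
        super().__init__(f"{error_class}: {message}")
        self.error_class = error_class
        self.message = message


def check_response(status_code, body):
    """Return the parsed XML root, or raise if the service reported an error."""
    root = ET.fromstring(body)
    if status_code == 500:
        # Assumed layout: <message> and <class> children of the root.
        raise HarvestingServiceError(
            root.findtext("class", default="unknown"),
            root.findtext("message", default=""),
        )
    return root
```

The same helper could be reused for every service in this section, since they all report exceptions in the same way.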
.. index:: xml.harvesting.add
.. _xml.harvesting.add:
Create harvester instance (xml.harvesting.add)
----------------------------------------------
Create a new harvester. The harvester can be of any type supported by
GeoNetwork (see :ref:`harvesting_nodes` for a list). When a new harvester
instance is created, its status is set to inactive.
A call to the ``xml.harvesting.start`` service is
required to set the status to active and run the harvester at the scheduled
time.
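
The create-then-start sequence can be sketched as follows. The code only builds the HTTP requests and does not send them. It assumes that the services accept an XML POST body with an ``application/xml`` content type, that the new harvester id is an ``id`` attribute on the returned ``node`` element, and that requests use a ``request`` root element; the helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Base URL in the form used by the examples in this section.
BASE_URL = "http://localhost:8080/geonetwork/srv/eng"


def make_request(service, xml_body):
    """Build, but do not send, a POST request for a harvesting service."""
    return urllib.request.Request(
        f"{BASE_URL}/{service}",
        data=xml_body,
        # Assumption: the service expects an XML content type.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def start_request_for(add_response):
    """Build the xml.harvesting.start request for a freshly added harvester."""
    node = ET.fromstring(add_response)
    harvester_id = node.get("id")  # assumption: id is an attribute of <node>
    body = f"<request><id>{harvester_id}</id></request>".encode()
    return make_request("xml.harvesting.start", body)
```

Passing the resulting objects to ``urllib.request.urlopen`` would send them, provided the session is authenticated as an administrator, which is outside the scope of this sketch.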
Request
```````
The service requires an XML tree with all information about the harvesting node to be added. The common XML elements that must be in the tree are described at :ref:`harvesting_nodes`. Settings and example requests for each type of harvester in GeoNetwork are as follows:
- :ref:`geonetwork_harvesting`
- :ref:`webdav_harvesting`
- :ref:`csw_harvesting`
- :ref:`z3950_harvesting`
- :ref:`oaipmh_harvesting`
- :ref:`thredds_harvesting`
- :ref:`wfsfeatures_harvesting`
- :ref:`filesystem_harvesting`
- :ref:`arcsde_harvesting`
- :ref:`ogcwxs_harvesting`
- :ref:`geoportal_rest_harvesting`
Summary of features of the supported harvesting types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
===============  ==============  ================  ==========
Harvesting type  Authentication  Privileges        Categories
===============  ==============  ================  ==========
GeoNetwork       native          through policies  yes
WebDAV           HTTP digest     yes               yes
CSW              HTTP Basic      yes               yes
===============  ==============  ================  ==========
Response
````````
If the request succeeds and the harvester instance is created, then HTTP status code 200 is returned along with an XML document containing the definition of the harvester, as described in the response section of the ``xml.harvesting.get`` service above.
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Object not foundObjectNotFoundEx
.....
See :ref:`exception_handling` for more details.
.. index:: xml.harvesting.info
.. _xml_harvesting_info:
Get information for Harvester definition (xml.harvesting.info)
--------------------------------------------------------------
This service can be used to obtain information from the server that is relevant
to defining a harvester, e.g. harvester icons and stylesheets.
Request and Response
````````````````````
All requests must have a **type** parameter which defines the type of information required. The requests and responses for each value of the **type** parameter are:
.. _xml_harvesting_info&type=icons:
icons
^^^^^
Return the list of icons that can be used when creating a harvester instance. Icons are usually set in the **site/icon** harvester setting.

POST Request Example::

  <request>
    <type>icons</type>
  </request>

URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
wfp.gifunep.gifwebdav.gifgn20.gifthredds.gifwfs.gifcsw.giffilesystem.giffao.gifdefault.gifZ3950.gifoai-mhp.gifesri.gif
.. _xml_harvesting_info&type=importStylesheets:
importStylesheets
^^^^^^^^^^^^^^^^^
Return the list of stylesheets that can be used when creating a harvester instance. The ``id`` element in the response can be used in the **content/importxslt** harvester setting for those harvesters that support it.
POST Request Example::

  <request>
    <type>importStylesheets</type>
  </request>

URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
ArcCatalog8_to_ISO19115.xslArcCatalog8_to_ISO19115CDMCoords-to-ISO19139Keywords.xslCDMCoords-to-ISO19139Keywords
.....
.. _xml_harvesting_info&type=oaiPmhServer:
oaiPmhServer
^^^^^^^^^^^^
Request information about the sets and prefixes of an OAI-PMH server. This request requires an additional ``url`` attribute on the **type** parameter that identifies the OAI-PMH server to query.
POST Request Example::
oaiPmhServer
URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
iso19115fgdc-stdiso19139csw-recordiso19110dublin-coreoai_dcmapsdatasets
......
.. _xml_harvesting_info&type=wfsFragmentSchemas:
wfsFragmentSchemas
^^^^^^^^^^^^^^^^^^
Return the list of schemas that have WFS Fragment conversion stylesheets. These stylesheets are stored in the ``WFSToFragments`` directory in the ``convert`` directory of a metadata schema. For example, for schema iso19139 this directory would be ``GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/WFSToFragments``.

POST Request Example::

  <request>
    <type>wfsFragmentSchemas</type>
  </request>

URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
iso19139iso19139
.. _xml_harvesting_info&type=wfsFragmentStylesheets:
wfsFragmentStylesheets
^^^^^^^^^^^^^^^^^^^^^^
Return the WFS Fragment conversion stylesheets for a schema previously returned by the request type ``wfsFragmentSchemas`` described above. These stylesheets are stored in the ``WFSToFragments`` directory in the ``convert`` directory of a metadata schema. For example, for schema iso19139 this directory would be ``GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/WFSToFragments``.
POST Request Example::
iso19139wfsFragmentStylesheets
URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
deegree22_philosopher_fragments.xsldeegree22_philosopher_fragmentsiso19139geoserver_boundary_fragments.xslgeoserver_boundary_fragmentsiso19139
.. _xml_harvesting_info&type=threddsFragmentSchemas:
threddsFragmentSchemas
^^^^^^^^^^^^^^^^^^^^^^
Return the list of schemas that have THREDDS Fragment conversion stylesheets. These stylesheets are stored in the ``ThreddsToFragments`` directory in the ``convert`` directory of a metadata schema. For example, for schema iso19139 this directory would be ``GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/ThreddsToFragments``.

POST Request Example::

  <request>
    <type>threddsFragmentSchemas</type>
  </request>

URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
iso19139iso19139
.. _xml_harvesting_info&type=threddsFragmentStylesheets:
threddsFragmentStylesheets
^^^^^^^^^^^^^^^^^^^^^^^^^^
Return the THREDDS Fragment conversion stylesheets for a schema previously returned by the request type ``threddsFragmentSchemas`` described above. These stylesheets are stored in the ``ThreddsToFragments`` directory in the ``convert`` directory of a metadata schema. For example, for schema iso19139 this directory would be ``GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/ThreddsToFragments``.
POST Request Example::
iso19139threddsFragmentStylesheets
URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
netcdf-attributes.xslnetcdf-attributesiso19139thredds-metadata.xslthredds-metadataiso19139
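
The directory layout described for the WFS and THREDDS fragment stylesheets can be expressed as a small path helper. This is a sketch only: the function name and the ``/srv/gn-data`` data directory are hypothetical, and the directory names are taken from the descriptions above.

```python
from pathlib import PurePosixPath

# Directory names as given in the descriptions above.
FRAGMENT_DIRS = {"wfs": "WFSToFragments", "thredds": "ThreddsToFragments"}


def fragment_stylesheet_dir(data_dir, schema, source):
    """Return the directory holding fragment conversion stylesheets."""
    return (
        PurePosixPath(data_dir)
        / "config"
        / "schema_plugins"
        / schema
        / "convert"
        / FRAGMENT_DIRS[source]
    )


# "/srv/gn-data" stands in for GEONETWORK_DATA_DIR (hypothetical value).
print(fragment_stylesheet_dir("/srv/gn-data", "iso19139", "thredds"))
```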
.. _xml_harvesting_info&type=ogcwxsOutputSchemas:
ogcwxsOutputSchemas
^^^^^^^^^^^^^^^^^^^
Return the list of schemas that have GetCapabilities conversion stylesheets for a particular three-letter OGC service type code. These stylesheets are stored in the ``OGCWxSGetCapabilitiesto19119`` directory in the ``convert`` directory of a metadata schema. For example, for schema iso19139:

- the directory for these stylesheets would be ``GEONETWORK_DATA_DIR/config/schema_plugins/iso19139/convert/OGCWxSGetCapabilitiesto19119``
- if a conversion from the GetCapabilities statement of a particular OGC service to a metadata record of this schema exists, then a stylesheet for that serviceType will be present in the directory. For example, for schema iso19139 and serviceType ``WFS``, the conversion stylesheet name would be ``OGCWFSGetCapabilities-to-ISO19119_ISO19139.xsl``
POST Request Example::
ogcwxsOutputSchemasWFS
URL:
http://localhost:8080/geonetwork/srv/eng/xml.harvesting.info
Response Example::
iso19139iso19139
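
The stylesheet naming convention described for iso19139 makes it possible to work out which service types a schema supports from a directory listing. The sketch below assumes that every iso19139 stylesheet follows the pattern of the single ``WFS`` example given above; other schemas may use a different suffix, and the function name is hypothetical.

```python
import re

# Pattern derived from the one iso19139 example name in this section.
_NAME = re.compile(r"^OGC([A-Z]{3})GetCapabilities-to-ISO19119_ISO19139\.xsl$")


def supported_service_types(filenames):
    """Return the sorted service type codes that have a conversion stylesheet."""
    found = set()
    for name in filenames:
        match = _NAME.match(name)
        if match:
            found.add(match.group(1))
    return sorted(found)


listing = [
    "OGCWFSGetCapabilities-to-ISO19119_ISO19139.xsl",
    "OGCWMSGetCapabilities-to-ISO19119_ISO19139.xsl",  # hypothetical file
    "readme.txt",                                      # hypothetical file
]
print(supported_service_types(listing))
```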
Errors
``````
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
typeBadParameterEx
.....
See :ref:`exception_handling` for more details.
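
Validating the **type** parameter on the client side avoids a round trip that would end in ``BadParameterEx``. The following sketch builds the request body for the type values documented in this section. It assumes a ``request`` root element with a ``type`` child; the extra parameters needed by the stylesheet and ``ogcwxsOutputSchemas`` requests are deliberately left out, and ``http://example.org/oai`` is a placeholder address.

```python
import xml.etree.ElementTree as ET

# The type values documented in this section.
INFO_TYPES = {
    "icons", "importStylesheets", "oaiPmhServer",
    "wfsFragmentSchemas", "wfsFragmentStylesheets",
    "threddsFragmentSchemas", "threddsFragmentStylesheets",
    "ogcwxsOutputSchemas",
}


def build_info_request(info_type, url=None):
    """Return the XML body for xml.harvesting.info as bytes."""
    if info_type not in INFO_TYPES:
        raise ValueError(f"unknown type: {info_type!r}")
    if info_type == "oaiPmhServer" and url is None:
        raise ValueError("oaiPmhServer needs the url of the server to query")
    root = ET.Element("request")  # assumed root element name
    type_el = ET.SubElement(root, "type")
    type_el.text = info_type
    if url is not None:
        type_el.set("url", url)  # url is an attribute of the type element
    return ET.tostring(root)


print(build_info_request("icons"))
print(build_info_request("oaiPmhServer", url="http://example.org/oai"))
```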
Update a Harvester Instance (xml.harvesting.update)
---------------------------------------------------
This service can be used to change the parameters of a harvester instance.
.. note:: You cannot change the harvester type.
Request
```````
The simplest way to use this service is to:
#. use the ``xml.harvesting.get`` service to obtain the XML definition of the harvester that you want to update.
#. modify the parameters as required.
#. call this service with the modified XML definition of the harvester as the request.
The XML request is the same as that used in ``xml.harvesting.add``.
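
The three steps above can be sketched as a single helper. To keep the example independent of any HTTP library, the caller supplies a ``post(service, body)`` function that sends the XML bytes to the named service and returns the response bytes; that function, the helper name and the ``request``/``id`` request layout are assumptions.

```python
import xml.etree.ElementTree as ET


def update_harvester(post, harvester_id, modify):
    """Fetch a definition, let modify() edit it in place, then send it back.

    post(service, body) is a caller-supplied function that sends the XML
    bytes to the named service and returns the response bytes.
    """
    get_body = f"<request><id>{harvester_id}</id></request>".encode()
    node = ET.fromstring(post("xml.harvesting.get", get_body))  # step 1
    modify(node)                                                # step 2
    return post("xml.harvesting.update", ET.tostring(node))     # step 3
```

A ``modify`` callback would typically change a single setting, for example the text of one child element, and leave the rest of the definition untouched.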
Response
````````
If the update succeeded then HTTP status code 200 is returned along with an XML document containing the harvester definition as supplied in the request.
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Object not foundObjectNotFoundEx
.....
See :ref:`exception_handling` for more details.
.. index:: xml.harvesting.remove
.. index:: xml.harvesting.start
.. index:: xml.harvesting.stop
.. index:: xml.harvesting.run
Control or Remove a Harvester Instance (xml.harvesting.remove, xml.harvesting.start, xml.harvesting.stop, xml.harvesting.run)
-----------------------------------------------------------------------------------------------------------------------------
These services are described in one section because they share a common request
interface. Their purpose is to remove, start, stop or run a harvester:

#. **remove**: Remove a harvester. Deletes the harvester instance.
#. **start**: When created, a harvester is in the inactive state. This operation makes it active, which means it will be run at the next scheduled time.
#. **stop**: Makes a harvester inactive, so it will no longer be executed at the scheduled time. Note that this will *not* stop a harvester that is already performing a harvest.
#. **run**: Start the harvester now. Used to test the harvesting.
Request
```````
A set of ids to operate on. Example::

  <request>
    <id>123</id>
    <id>456</id>
    <id>789</id>
  </request>
Response
````````
Similar to the request but every id has a status attribute indicating the
success or failure of the operation. For example, the response to the
previous request could be::
123456789
The table below summarises, for each service, the
possible status values.
.. |ok| image:: button_ok.png
================  ======  =====  ====  ====
Status value      remove  start  stop  run
================  ======  =====  ====  ====
ok                |ok|    |ok|   |ok|  |ok|
not-found         |ok|    |ok|   |ok|  |ok|
inactive                               |ok|
already-inactive                 |ok|
already-active            |ok|
already-running                        |ok|
================  ======  =====  ====  ====
If the request has no id parameters, an empty response is returned.
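
Since per-harvester failures are reported through the status attribute rather than as exceptions, a client has to inspect each returned id. The sketch below collects the statuses and lists the ids that did not report ``ok``. The ``response`` root element name in the sample is an assumption; the code itself only relies on ``id`` elements carrying a ``status`` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical response document for three harvester ids.
SAMPLE = (
    '<response>'
    '<id status="ok">123</id>'
    '<id status="not-found">456</id>'
    '<id status="inactive">789</id>'
    '</response>'
)


def parse_statuses(xml_text):
    """Map each harvester id to the status reported for it."""
    root = ET.fromstring(xml_text)
    return {el.text: el.get("status") for el in root.iter("id")}


def failed(statuses):
    """Return the ids whose status is anything other than ok."""
    return sorted(hid for hid, status in statuses.items() if status != "ok")


statuses = parse_statuses(SAMPLE)
print(failed(statuses))
```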
Most errors relating to a harvester specified in the request (e.g. harvester id not found) are returned as status attributes in the response. However, exceptions can still occur, in which case HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Service not allowedServiceNotAllowedEx
.....
See :ref:`exception_handling` for more details.
.. index:: xml.harvesting.history
.. _xml.harvesting.history:
Retrieve Harvesting History (xml.harvesting.history)
----------------------------------------------------
This service can be used to retrieve the history of harvest sessions for a
specified harvester instance or all harvester instances. The harvester history
information is stored in the GeoNetwork database in the HarvestHistory table.
Request
```````
Called without an **id** parameter, this service returns the harvest history of all harvesters. The response can be sorted by harvest *date* or by harvester *type*. The sort order is specified in the parameter **sort**. Example::

  <request>
    <sort>date</sort>
  </request>

Otherwise, an **id** parameter can be specified to request the harvest history of a specific harvester instance. In this case the sort order is by *date* of harvest::

  <request>
    <id>123</id>
  </request>
Response
````````
If the request succeeded then HTTP status code 200 is returned along with an XML document containing the harvest history. The response for both types of requests is the same except that the response to a request for the history of a specific harvester will only have history details for that harvester. An example of the response is::
12013-01-01T19:24:54b6a11fc3-3f6f-494b-a8f3-35eaadced575test plajageonetworkn55000000
.....
.....
.....
date
Each **record** element in the embedded **response** element contains the details of a harvest session. The elements are:
- **id** - harvest history record id in harvesthistory table
- **harvestdate** - date of harvest
- **harvesteruuid** - uuid of harvester that ran
- **harvestername** - name of harvester (Site/Name parameter) that ran
- **harvestertype** - type of harvester that ran
- **deleted** - has the harvester that ran been deleted? 'y' - yes, 'n' - no
- **info** - results of the harvest. May contain one of the following elements:

  - **result** - details of the successful harvest (a harvester dependent list of results from the harvest)
  - **error** - an exception from an unsuccessful harvest - see :ref:`exception_handling` for content details of this element

- **params** - the parameters that the harvester had been configured with for the harvest
After the embedded **response** element, the currently configured harvesters are returned as **node** children of a **nodes** element - see :ref:`xml.harvesting.add` for references to each of the harvester types that can be returned here.
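
The record fields listed above map naturally onto a dictionary per harvest session. The following sketch parses them from a history document. The element names come from the list above; the outer ``root`` element and the sample document are assumptions, and the field values are taken from the example response.

```python
import xml.etree.ElementTree as ET

FIELDS = ("id", "harvestdate", "harvesteruuid", "harvestername",
          "harvestertype", "deleted")

# Hypothetical document; the name of the outer element is an assumption.
SAMPLE = """
<root>
  <response>
    <record>
      <id>1</id>
      <harvestdate>2013-01-01T19:24:54</harvestdate>
      <harvesteruuid>b6a11fc3-3f6f-494b-a8f3-35eaadced575</harvesteruuid>
      <harvestername>test plaja</harvestername>
      <harvestertype>geonetwork</harvestertype>
      <deleted>n</deleted>
      <info><result/></info>
    </record>
  </response>
  <nodes/>
</root>
"""


def parse_history(xml_text):
    """Return one dict per harvest session found in the document."""
    root = ET.fromstring(xml_text)
    sessions = []
    for record in root.iter("record"):
        session = {name: record.findtext(name) for name in FIELDS}
        info = record.find("info")
        # A session either succeeded (<result>) or failed (<error>).
        session["ok"] = info is not None and info.find("result") is not None
        sessions.append(session)
    return sessions


for session in parse_history(SAMPLE):
    print(session["harvestdate"], session["harvestername"], session["ok"])
```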
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Object not foundObjectNotFoundEx
.....
See :ref:`exception_handling` for more details.
.. index:: xml.harvesting.history.delete
Delete Harvesting History Entries (xml.harvesting.history.delete)
-----------------------------------------------------------------
This service can be used to delete harvester history entries from the harvesthistory table in the GeoNetwork database.
Request
```````
One or more **id** parameters can be specified to request deletion of the harvest history entries in the harvesthistory table. The **id** element values can be obtained from :ref:`xml.harvesting.history`::
12
Response
````````
If successful then HTTP status code 200 is returned along with an XML document with details of how many harvest history records were successfully deleted. An example of this response is::
2
.. note:: If records with the id specified in the parameters are not present, they will be quietly ignored.
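
One possible use of this service is to clean up history entries that belong to harvesters which no longer exist. The sketch below selects such entries and builds the delete request. It assumes each session is a dictionary with the ``id`` and ``deleted`` fields of a history record, and that the request uses a ``request`` root element with one ``id`` child per entry; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def orphaned_record_ids(sessions):
    """Ids of history records whose harvester has since been deleted."""
    return [s["id"] for s in sessions if s["deleted"] == "y"]


def build_history_delete_request(record_ids):
    """Return the XML body for xml.harvesting.history.delete as bytes."""
    root = ET.Element("request")  # assumed root element name
    for record_id in record_ids:
        ET.SubElement(root, "id").text = str(record_id)
    return ET.tostring(root)


# Hypothetical history records reduced to the two fields used here.
sessions = [
    {"id": "1", "deleted": "y"},
    {"id": "2", "deleted": "n"},
    {"id": "3", "deleted": "y"},
]
print(build_history_delete_request(orphaned_record_ids(sessions)))
```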
If an error occurred then HTTP status code 500 is returned along with an XML document which contains details of what went wrong. An example of such an error response is:
::
Service not allowedServiceNotAllowedEx
.....
See :ref:`exception_handling` for more details.