opendatahub.it is an indexing platform for open datasets (Open Data) available in Italy.
The index and the information about each dataset are compiled and kept up to date by Amaca, a search and data enrichment engine created by SciamLab.
Amaca uses Apache Hadoop to keep the catalogue update cycle within a few hours. Processing runs as MapReduce jobs, which apply algorithms and learning techniques to analyse the Italian text and to produce and enrich the dataset metadata automatically, making dataset search simpler and more effective for the end user.
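The map and reduce steps of such a job can be sketched in plain Python. This is only an illustration of the MapReduce pattern applied to keyword extraction, not Amaca's actual algorithm: the function names, the tiny stop-word list and the "most frequent terms" heuristic are all assumptions made for the example, and a real job would run on Hadoop rather than in a single process.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical, deliberately tiny Italian stop-word list (an assumption for this sketch).
STOP_WORDS = {"di", "del", "della", "delle", "dei", "e", "il", "la", "le", "i", "per", "in"}


def map_phase(dataset_id, description):
    """Map step: emit ((dataset_id, term), 1) for every meaningful term."""
    for term in re.findall(r"[a-zàèéìòù]+", description.lower()):
        if term not in STOP_WORDS and len(term) > 2:
            yield (dataset_id, term), 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each (dataset_id, term) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals


def top_keywords(records, n=3):
    """Run map then reduce and keep the n most frequent terms per dataset."""
    mapped = chain.from_iterable(map_phase(i, text) for i, text in records)
    per_dataset = defaultdict(list)
    for (dataset_id, term), count in reduce_phase(mapped).items():
        per_dataset[dataset_id].append((count, term))
    return {
        dataset_id: [t for _, t in sorted(terms, key=lambda ct: (-ct[0], ct[1]))[:n]]
        for dataset_id, terms in per_dataset.items()
    }
```

Calling `top_keywords([("ds1", "Elenco delle scuole ...")])` returns a dictionary from dataset identifier to candidate keywords, which could then be stored as enriched metadata.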
Amaca also extracts part of the metadata from Public Administrations and from public and private organizations and companies, using their public APIs where available or, where they are not, extracting the information directly from the HTML code.
In addition to the Amaca Platform core, the OpenDataHub project uses the specialized Amaca Open Data and Amaca Premium modules, which include connectors to the following realms/APIs:
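The "API first, HTML as a fallback" strategy can be sketched as follows. This is a minimal example under stated assumptions: `MetaExtractor` and `extract_metadata` are hypothetical names, and reading only `<title>` and `<meta name=... content=...>` tags is a simplification chosen for illustration, not a description of how Amaca parses pages.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Normalise the key so "Description" and "description" match.
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


def extract_metadata(html, api_record=None):
    """Prefer the API record when one exists; otherwise parse the HTML."""
    if api_record is not None:
        return dict(api_record)
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

Passing `api_record=None` models a source with no public API, in which case the page markup is the only input.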
|CKAN||CKAN API v1/v2||Full support|
|CKAN||CKAN API v3||Full support, including API used by CKAN extensions|
|Socrata||Socrata Open Data API (SODA)||Full support|
|Open Data Protocol||Open Data Protocol (OData)||Only OData Atom v4.0 is supported|
|Google API||Support for the following API:|
|RSS||RSS 2.0 Feed||Full support|
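As an illustration of what a connector for one of these realms involves, the sketch below targets the CKAN API v3 `package_search` action, which takes `q`, `rows` and `start` parameters and wraps its answer in a `success`/`result` envelope. The helper names and the choice of fields to keep are assumptions for the example; it does not reproduce the Amaca connector, and it leaves the HTTP request itself to the caller.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query="", rows=100, start=0):
    """Build the URL of a CKAN API v3 package_search call."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def parse_package_search(body):
    """Return (total_count, datasets) from a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    result = payload["result"]
    datasets = [
        {
            "id": package.get("id"),
            "title": package.get("title"),
            "notes": package.get("notes"),
            "tags": [tag["name"] for tag in package.get("tags", [])],
        }
        for package in result["results"]
    ]
    return result["count"], datasets
```

Paging through a whole portal then amounts to increasing `start` by `rows` until the number of datasets collected reaches the reported count.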
The internal data model employed by Amaca complies with the DCAT format and supports DCAT-AP, the application profile for interoperability between European data portals, which precisely defines the minimum set of information that must be present in the descriptive metadata of open datasets.
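A check for that minimum set can be sketched as below. The lists of mandatory properties are an assumption based on a common reading of DCAT-AP (title and description for a dataset, access URL for a distribution) and should be verified against the version of the profile in use. The dictionary layout and the function name are hypothetical and do not describe Amaca's internal model.

```python
# Assumed mandatory properties; verify against the DCAT-AP version in use.
MANDATORY_DATASET = ("dct:title", "dct:description")
MANDATORY_DISTRIBUTION = ("dcat:accessURL",)


def missing_properties(dataset):
    """List the mandatory properties that are absent or empty in a record."""
    missing = [prop for prop in MANDATORY_DATASET if not dataset.get(prop)]
    for index, distribution in enumerate(dataset.get("dcat:distribution", [])):
        missing += [
            f"dcat:distribution[{index}].{prop}"
            for prop in MANDATORY_DISTRIBUTION
            if not distribution.get(prop)
        ]
    return missing
```

An empty result means the record carries the assumed minimum; any other result names the properties an enrichment step would still need to fill in.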
The architecture of the Open Data Hub platform is illustrated in the figure below:
In addition to the Public Administrations, we added further sources, including content that is publicly available on the network but not necessarily classified as open data. Examples of publicly available data deliberately opened by those who created or published them are Web tables, Fusion Tables and others.