opendatahub.it is an indexing platform for open datasets (Open Data) available in Italy.

The index and the information about the datasets are compiled and kept up to date by Amaca, a search and data enrichment engine created by SciamLab.

Amaca uses Apache Hadoop to keep the catalogue update cycle within a few hours. The processing, which is done through MapReduce, employs algorithms and learning techniques to analyse Italian text and to automatically produce and enrich the dataset metadata, making dataset search simpler and more effective for the end user.
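As an illustration of the kind of MapReduce step described above, here is a minimal sketch in plain Python that derives keyword metadata from Italian dataset descriptions. It is not Amaca's actual code: the function names (`map_phase`, `reduce_phase`, `enrich`), the tiny stop-word list and the use of term frequency are all assumptions made for the example, and it runs in a single process rather than on Hadoop.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny Italian stop-word list (illustration only).
STOPWORDS = {"di", "del", "della", "dei", "e", "il", "la", "le", "per", "in", "a"}


def map_phase(dataset_id, text):
    """Emit (dataset_id, token) pairs for every token that is not a stop word."""
    for raw in text.lower().split():
        token = raw.strip(".,;:()\"'")
        if token and token not in STOPWORDS:
            yield dataset_id, token


def reduce_phase(pairs, top_n=3):
    """Group tokens by dataset and keep the most frequent ones as keywords."""
    grouped = defaultdict(Counter)
    for dataset_id, token in pairs:
        grouped[dataset_id][token] += 1
    return {
        dataset_id: [word for word, _ in counts.most_common(top_n)]
        for dataset_id, counts in grouped.items()
    }


def enrich(records, top_n=3):
    """Run the map and reduce phases over (dataset_id, description) records."""
    pairs = (pair for ds, text in records for pair in map_phase(ds, text))
    return reduce_phase(pairs, top_n)
```

Calling `enrich([("d1", "Bilancio del comune: bilancio di previsione")])` would return a dictionary mapping `"d1"` to its most frequent terms. On a real cluster the grouping by key would be done by the framework's shuffle step rather than by a `defaultdict`.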

Amaca also extracts part of the metadata from Public Administrations and from public and private organizations and companies, using their public APIs or, when no API is available, extracting the information directly from the HTML code.
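The "API first, HTML as fallback" strategy can be sketched as follows, using only Python's standard `html.parser` module. This is an assumption-laden example, not Amaca's implementation: `MetaExtractor` and `extract_metadata` are hypothetical names, and the choice of the `<title>` element and `<meta name=... content=...>` tags as metadata sources is made purely for illustration.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the page title and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.metadata["title"] = data.strip()


def extract_metadata(api_record, html):
    """Prefer the record returned by a public API; otherwise scrape the HTML."""
    if api_record:
        return dict(api_record)
    parser = MetaExtractor()
    parser.feed(html)
    return parser.metadata
```

Here `api_record` stands for whatever dictionary a source's API returned (or `None` when the source exposes no API), and `html` is the page content already downloaded by the caller.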

In the OpenDataHub project, in addition to the Amaca Platform core, the specialized Amaca Open Data and Amaca Premium modules have been used. These modules include connectors to the following realms/APIs:

Realm | API | Support Type
CKAN | CKAN API v1/v2 | Full support
CKAN | CKAN API v3 | Full support, including API used by CKAN extensions
Socrata | Socrata Open Data API (SODA) | Full support
Open Data Protocol | Open Data Protocol (OData) | Supported only OData Atom v4.0.
Google | Google API | Support for the following API:
RSS | RSS 2.0 Feed | Full support
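To give an idea of what a connector for one of these realms involves, the sketch below builds a CKAN API v3 `package_search` request and parses the standard `success`/`result` response envelope. It is only an example under stated assumptions: the helper names `ckan_search_url` and `parse_ckan_search` are hypothetical, the portal URL is a placeholder, and the actual Amaca connectors are not shown in this document.

```python
import json
from urllib.parse import urlencode


def ckan_search_url(base_url, query, rows=10):
    """Build the URL of a CKAN API v3 package_search call."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def parse_ckan_search(body):
    """Extract id and title of each dataset from a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [
        {"id": package.get("id"), "title": package.get("title")}
        for package in payload["result"]["results"]
    ]
```

The HTTP call itself is left to the caller (for instance with `urllib.request.urlopen`), which keeps the two functions easy to test without network access.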

The internal data model employed by Amaca complies with the DCAT format and supports the DCAT-AP application profile for interoperability between European portals, which precisely defines the minimum set of information that must be present in the descriptive metadata of open datasets.
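The idea of a "minimum set of information" can be illustrated with a small check. In DCAT-AP a dataset must carry at least a title (`dct:title`) and a description (`dct:description`). The sketch below maps a harvested record to a DCAT-style dictionary and reports which mandatory properties are missing. The input field names (`title`, `notes`, `tags`) and the function names are assumptions chosen for the example and do not describe Amaca's real internal model.

```python
# Properties that DCAT-AP marks as mandatory for a dcat:Dataset.
MANDATORY_DATASET_PROPERTIES = ("dct:title", "dct:description")


def to_dcat(record):
    """Map a harvested record (hypothetical field names) to DCAT-style keys."""
    return {
        "@type": "dcat:Dataset",
        "dct:title": record.get("title", ""),
        "dct:description": record.get("notes", ""),
        "dcat:keyword": list(record.get("tags", [])),
    }


def missing_mandatory(dataset):
    """Return the mandatory DCAT-AP properties that are absent or empty."""
    return [prop for prop in MANDATORY_DATASET_PROPERTIES if not dataset.get(prop)]
```

A record with a title but no description would yield `["dct:description"]`, signalling that the metadata needs enrichment before it satisfies the profile.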

The internal dataset model is easily interoperable with other platforms and allows Amaca to publish the information to the most common open data catalogues, such as CKAN, Socrata, DataPublic, etc.
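As a rough sketch of what publishing to an external catalogue might involve, the function below converts a dataset described with DCAT-style keys into a dictionary shaped like a CKAN package (`name`, `title`, `notes`, `tags`). The input keys and the names `slugify` and `to_ckan_package` are hypothetical, and the slug rule is a simplification: any run of characters outside lowercase ASCII letters, digits, `-` and `_` becomes a hyphen, so accented letters are dropped rather than transliterated.

```python
import re


def slugify(title):
    """Build a URL-safe identifier: lowercase ASCII letters, digits, '-' and '_'."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def to_ckan_package(dataset):
    """Convert a DCAT-style dataset dictionary into a CKAN-style package dictionary."""
    return {
        "name": slugify(dataset["dct:title"]),
        "title": dataset["dct:title"],
        "notes": dataset.get("dct:description", ""),
        "tags": [{"name": keyword} for keyword in dataset.get("dcat:keyword", [])],
    }
```

Sending the resulting dictionary to a catalogue (for example through CKAN's `package_create` action) requires authentication and is outside the scope of this example.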

The architecture of the Open Data Hub platform is illustrated in the figure below:

In addition to the Public Administrations, we added further sources, including content that is publicly available on the network but not necessarily classified as open data. Examples of publicly available data deliberately opened by those who created or published them are Web tables, Fusion Tables and others.