Volume 19, Number 5/6
Unlocking Open Educational Resources (OERs) Interaction Data
David Massart, Elena Shulman
Point of Contact: David Massart, firstname.lastname@example.org
Each time a teacher or a learner interacts with an Open Educational Resource (OER), these interactions produce data. This "interaction data" includes "artifact data" routinely captured during any online interaction by Web server logs (e.g., users' browsers, users' IP addresses) and "social data" created during Web 2.0-style interactions with resources (e.g., tags, comments, ratings, favorites). Interaction data can serve a number of purposes at a time of increased worldwide interest in OER quality and uptake. First, interaction data is a valuable source of analytics about OERs and their typical audience profiles. Second, combined with metadata, interaction data can enhance searching, ranking, and recommendations of learning resources. However, obtaining this data is not always easy, since OERs are generally dispersed among the different systems where the interactions between resources and their users take place. This paper describes approaches to unlocking, collecting, and aggregating this interaction data.
Open Educational Resources (OERs) are digital learning resources made freely available online under open licenses. Most OERs are educational content like simulations, animations, videos, lesson plans and educational games. Although they can be embedded into learning management systems, they are generally online objects that can be viewed, downloaded or played from a web server. Their increasing number and popularity have led to the creation of online catalogs that reference them. These catalogs consist of searchable collections of metadata (i.e., machine-readable descriptions of the OERs). Metadata aims at facilitating the management, discovery, and exchange of OERs and at allowing users of these resources (e.g., teachers, learners) to more easily evaluate their usefulness. Typical metadata is generated by indexers (humans, software, or a combination of both). They look at a resource and its context for information that describes it and use this information to create a metadata record. Metadata in a catalog is obtained either by creating new records (i.e., describing OERs) or by exchanging already existing records with other catalogs. These exchanges are enabled by the use of standard metadata formats (e.g., Dublin Core, IEEE LOM) and standard metadata exchange protocols (e.g., OAI-PMH, SPI).
Each time a teacher or a learner interacts with an OER, these interactions produce data. This "interaction data" includes "artifact data" routinely captured during any online interaction by Web server logs (e.g., users' browsers, users' IP addresses) and "social data" created during Web 2.0-style interactions with resources (e.g., tags, comments, ratings, favorites). This interaction data can serve a number of purposes. It is a very valuable source of analytics about OERs and typical audience profiles. Moreover, combined with metadata, interaction data can be used to enhance searching, ranking, and recommendations of OERs. However, obtaining this data is not always easy, since OERs are generally dispersed among different systems where the interactions between resources and their users take place. In addition, in most countries, interaction data is governed by privacy protection laws that restrict the way it can be collected, stored, exchanged, and used. Nevertheless, having a way to assess the quality of OERs by collecting data indicating their actual uptake, and to understand which OERs are most relevant for particular regions, is of critical importance for current OER initiatives worldwide.
This paper explores different methods for overcoming barriers to collecting and exchanging interaction data. From a strictly technical point of view, the exchange of interaction data requires the participating systems to agree on a common data format and data exchange protocol. The main limitation of such approaches is that they require the active collaboration of the systems where the interaction data is produced. Each of these systems has to capture interaction data, export it into the desired format, and publish or expose it using one or more of the available protocols. All of this is rather cumbersome and usually of limited interest to these systems, since they have little incentive to share interaction data with metadata catalogs of OERs.
Using the Learning Resource Exchange (LRE) as a case study, this paper explores ways for catalogs of OERs to collect and exchange interaction data. The approaches described minimize the data sharing burden on the systems where teachers and learners interact with educational resources (e.g., websites where the resources are hosted, Moodle instances).
This paper is divided into sections exploring the best ways to exchange two categories of interaction data: social data and artifact data. Section 1 provides an overview of the Learning Resource Exchange, which is used as an illustrative example for the discussion. Section 2 presents a typology of interaction data and specifications to encode it. Section 3 discusses possible mechanisms for capturing and exchanging social data among catalogs of OERs. Section 4 proposes a non-invasive way for catalogs to capture artifact data and briefly describes how to aggregate metadata and interaction data of different kinds using the facet mechanism of Information for Learning Object eXchange (ILOX), which is part of the IMS Learning Object Discovery and Exchange (LODE) specification.
1. Learning Resource Exchange
The Learning Resource Exchange (LRE) from European Schoolnet is a service that enables schools to find OERs from many different countries and providers. The principle upon which the LRE is based is very simple. The LRE collects descriptions (i.e., metadata) of OERs and compiles them into a searchable catalog that can be consulted by users of connected e-learning platforms.
Figure 1: Overview of the Learning Resource Exchange infrastructure
As depicted in Figure 1, the LRE relies on several approaches to obtain metadata. Metadata is acquired from existing metadata repositories, automatically generated, or manually produced by human indexers. This metadata is then compiled into the LRE catalog and, using various web services, exposed to educational portals, mobile platforms, virtual learning environments, interactive whiteboards, and to a widget that can be embedded in any HTML document (e.g., blog, email, web page). Connecting such platforms and tools to the LRE allows their users to consult the LRE catalog and to discover and obtain OERs in their native environments. It is important to note that the role of the LRE technical infrastructure is limited to the handling of metadata. This infrastructure is not involved in the actual exchange of OERs between their online locations (referenced in the LRE metadata) and the users' platforms. The LRE brings users to the providers, and all further actions involving the use of the resource, such as downloading, interacting with applets, or playing videos or games, occur in the content providers' or users' environments. Thus the LRE has no way to access interaction data, since this data is always produced outside the LRE, where users interact with educational resources. Lacking interaction data on the actual uptake of OERs in the LRE catalog makes it nearly impossible for the LRE to determine which resources are generating the most interest and whether some resources are of particular interest in one region rather than another. Understanding which resources are used (downloaded, played, etc.), and where, would better inform the curation and acquisition strategies for large multinational and multilingual catalogs such as the LRE.
2. Interaction Data and Its Aggregation
In the context of this paper, interaction data is data generated from an interaction between a user and an OER. Any interaction between a user (described in a user profile) and a resource (typically described in a metadata record) can be modeled as a relationship between the resource and its user. Instances of this relationship are atomic descriptions of interactions: "user X viewed resource Y on Fri. Mar. 22, 2013 at 2:54pm", "user Z tagged resource W with tag 'algebra' on Fri. Mar. 22, 2013 at 2:55pm". They can include more or less information about the context in which an interaction occurs.
Depending on one's objectives and interests, atomic interaction data can be either used as such, or be aggregated in different ways:
- By user/resource pair: "In March 2013, user X viewed resource Y 5 times";
- By user: "In March 2013, user X viewed 5 resources about geometry and 3 about algebra all in English for 6th grade";
- By resource: "In March 2013 resource W was bookmarked by 153 users."
Aggregating by resource is a way to anonymize data and to overcome some of the legal issues related to privacy.
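The three aggregation levels can be illustrated with a small Python sketch. It assumes a minimal atomic record of the form (user, resource, action, timestamp); the `Interaction` type and the helper names are hypothetical and not taken from CAM or any other specification. Note that only the per-resource result drops user identifiers.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interaction:
    """One atomic interaction (hypothetical minimal model)."""
    user: str
    resource: str
    action: str          # e.g. "viewed", "tagged", "bookmarked"
    timestamp: datetime


def in_month(events, year, month):
    """Keep only the interactions that happened in the given month."""
    return [e for e in events if (e.timestamp.year, e.timestamp.month) == (year, month)]


def _distinct_counts(events, action, key, value):
    """Group events with the given action by key(e), counting distinct value(e)."""
    groups = defaultdict(set)
    for e in events:
        if e.action == action:
            groups[key(e)].add(value(e))
    return {k: len(v) for k, v in groups.items()}


def by_pair(events, action):
    """'User X viewed resource Y 5 times': still personal data."""
    return Counter((e.user, e.resource) for e in events if e.action == action)


def by_user(events, action):
    """'User X viewed 5 resources': still personal data."""
    return _distinct_counts(events, action, key=lambda e: e.user, value=lambda e: e.resource)


def by_resource(events, action):
    """'Resource W was bookmarked by 153 users': user identifiers are dropped."""
    return _distinct_counts(events, action, key=lambda e: e.resource, value=lambda e: e.user)
```

A real profile-based aggregation (e.g., "5 resources about geometry") would additionally join each resource with its metadata; that step is left out here.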
Relatively few open specifications are available for modeling and exchanging interaction data. One approach is Contextualized Attention Metadata (CAM), which can be used to represent atomic interaction data. The CAM specification was originally proposed for capturing behavioral information about learners in learning contexts. However, as shown in the examples above, atomic interaction data is personal data that requires some kind of identification of both the user and the resource. Thus its capture and use potentially raise privacy issues, and, in practice, atomic interaction data is rarely exchanged.
In 2010, as part of its STEM Exchange initiative, the U.S. National Science Digital Library (NSDL) released a "paradata" data model to "capture a user activity related to a resource that helps to elucidate its potential educational utility" (i.e., interaction data aggregated by resource). The paradata model comes with an XML binding. In a paradata record, a resource can be identified using either an identifier or its location, which makes it possible to relate paradata and metadata records that refer to the same resource. As another option, the Learning Registry proposes to model interaction data as activities encoded using a JSON binding. The resulting JSON document is one of the categories of "resource data" that can be exchanged by submitting to and retrieving from the Learning Registry a "resource data description" that references it.
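Relating a paradata record to a metadata record "by identifier or location" can be sketched as a simple index lookup. This is an illustration under stated assumptions, not the NSDL binding: records are plain dictionaries with hypothetical `identifier` and `location` keys, identifiers take precedence over locations, and the URL normalization shown is deliberately light.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_location(url):
    """Light URL normalization (an assumption): lower-case scheme and host,
    drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def index_metadata(records):
    """Index metadata records under both their identifier and their location."""
    index = {}
    for rec in records:
        if rec.get("identifier"):
            index[("id", rec["identifier"])] = rec
        if rec.get("location"):
            index[("loc", normalize_location(rec["location"]))] = rec
    return index


def match_paradata(paradata, index):
    """Return the metadata record a paradata record refers to, or None.
    The identifier is tried first, then the location."""
    if paradata.get("identifier"):
        rec = index.get(("id", paradata["identifier"]))
        if rec is not None:
            return rec
    if paradata.get("location"):
        return index.get(("loc", normalize_location(paradata["location"])))
    return None
```

Matching on locations is fragile, since the same resource may be reachable through several URLs. This is one reason why reliable resource identification remains an open issue.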
Alternatively, Information for Learning Object eXchange (ILOX), which is part of the IMS Learning Object Discovery and Exchange (LODE) specification, provides an XML framework that combines metadata and paradata about a resource as different facets of a single ILOX record, which can therefore be handled as a whole. This latter solution is used by the LRE federation of learning resource repositories, which relies on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to exchange ILOX records between the participating repositories.
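The idea of one record carrying several facets can be shown with Python's `xml.etree.ElementTree`. This is **not** the ILOX schema. The namespace, the `record` and `facet` element names, and the `vocabulary` attribute are hypothetical placeholders chosen to show how metadata and paradata travel together and can be read back independently.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for this sketch; the real ILOX namespace and element
# names are defined by the IMS LODE specification and differ from these.
NS = "urn:example:facet-record-sketch"
ET.register_namespace("x", NS)


def build_record(resource_id, facets):
    """Serialize one record whose facets map a vocabulary name to simple fields."""
    record = ET.Element(f"{{{NS}}}record", {"resource": resource_id})
    for vocabulary, fields in facets.items():
        facet = ET.SubElement(record, f"{{{NS}}}facet", {"vocabulary": vocabulary})
        for name, value in fields.items():
            ET.SubElement(facet, f"{{{NS}}}{name}").text = str(value)
    return ET.tostring(record, encoding="unicode")


def read_facet(xml_text, vocabulary):
    """Return the fields of the facet with the given vocabulary, or None."""
    record = ET.fromstring(xml_text)
    for facet in record.findall(f"{{{NS}}}facet"):
        if facet.get("vocabulary") == vocabulary:
            # Strip the "{namespace}" prefix from each child tag.
            return {child.tag.split("}", 1)[1]: child.text for child in facet}
    return None
```

Because both facets live in a single document, a harvester such as an OAI-PMH client receives the metadata and the paradata together and does not have to reconcile them afterwards.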
3. Social Data Capture and Exchange
In addition to simple consultation of the LRE catalog of OERs, most portals connected to the LRE offer Web 2.0-like functionalities permitting users to personalize their information retrieval experience and interact with each other. For example, users bookmark and tag OER descriptions and thus exchange feedback with members of virtual communities.
Figure 2: Screenshot of the LRE for schools portal, resource detail and social data
The LRE for schools portal is an example of an educational portal that offers Web 2.0 functionalities. As shown in the screenshot of Figure 2, when consulting a resource description, users are offered various ways of personalizing search results and metadata descriptions. They can:
- Rate and comment on the resource (1) and consult comments and ratings left by others (2);
- Tag the resource, consult the tags left by others, and search using these tags (3);
- Favorite (bookmark) the resource (4), see who else has favorited it (5), and look at what other resources they have favorited;
- Send the resource description to a friend (6).
As shown in Figure 3, the social data generated by these types of Web 2.0 activities was previously stored locally by the portal where the activities took place. For example, if an LRE user rated an online game, the rating was stored locally in the LRE portal and not with the OER, and thus was not visible when the same game was accessed from another portal. Therefore, this kind of social data was only useful locally (in our example, in the LRE portal) to rank results during searches for resources (e.g., resources with the highest ratings appear at the top of the result list) or to recommend resources. However, neither other portals consuming LRE's metadata nor the OER's provider could benefit from this social data. This also made it more difficult to reach the critical mass of social data necessary to build useful rankings and recommendations for a wider community.
Figure 3: Social Data previously stored only locally where it was produced
To unlock this social data, we introduced an LRE Social Data Manager (SDM). It consists of a set of REST services to centrally manage social data on behalf of the LRE catalog's clients.
Figure 4: The LRE Social Data Manager
As suggested by the diagram of Figure 4, the role of the SDM is threefold:
- It is used by various LRE clients to store, manage, and retrieve their users' favorites, tags, ratings, and comments. This presents several advantages. For example, it makes it possible for people to interact with a larger community of users producing and exchanging more user-generated content. It also lets properly authenticated users access their favorite resources from any educational system connected to the LRE and its SDM;
- It indexes social data in order to provide additional search (e.g., search by tags) and sorting (e.g., highest ratings appear at the top of the result lists) criteria that can be combined with the metadata-based criteria supported by the LRE indexes;
- It aggregates social data per resource and makes it available to the LRE catalog.
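The three roles of the SDM can be summarized by an in-memory stand-in. The class and method names, the owner check, and the 1 to 5 rating scale are all assumptions made for illustration. The actual SDM is a set of REST services whose interface is not described here. Comments are omitted for brevity.

```python
from collections import defaultdict


class SocialDataManager:
    """Hypothetical in-memory stand-in for the SDM's three roles."""

    def __init__(self):
        self._ratings = {}                  # (user, resource) -> rating
        self._tags = defaultdict(set)       # (user, resource) -> set of tags
        self._favorites = defaultdict(set)  # user -> set of resources

    # Role 1: personal data, readable and writable only by its owner.
    def rate(self, auth_user, user, resource, value):
        self._check_owner(auth_user, user)
        if not 1 <= value <= 5:             # assumed rating scale
            raise ValueError("rating must be between 1 and 5")
        self._ratings[(user, resource)] = value

    def tag(self, auth_user, user, resource, tag):
        self._check_owner(auth_user, user)
        self._tags[(user, resource)].add(tag.lower())

    def favorite(self, auth_user, user, resource):
        self._check_owner(auth_user, user)
        self._favorites[user].add(resource)

    def favorites(self, auth_user, user):
        self._check_owner(auth_user, user)
        return sorted(self._favorites[user])

    # Role 2: index social data to offer extra search criteria.
    def search_by_tag(self, tag):
        return sorted({r for (_, r), tags in self._tags.items() if tag.lower() in tags})

    # Role 3: per-resource aggregates; no user identifier is returned.
    def aggregate(self, resource):
        ratings = [v for (_, r), v in self._ratings.items() if r == resource]
        return {
            "ratings": len(ratings),
            "mean_rating": sum(ratings) / len(ratings) if ratings else None,
            "favorites": sum(resource in favs for favs in self._favorites.values()),
        }

    @staticmethod
    def _check_owner(auth_user, user):
        if auth_user != user:
            raise PermissionError("only the owner may access this data")
```

Putting real-time, owner-restricted writes and search-oriented aggregation in one component is convenient but mixes very different access patterns.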
However, this all-in-one solution has drawbacks stemming from the differences between its functionalities and the management constraints associated with each of them. On the one hand, the SDM is used to manage users' personal data, which may only be modified by its authenticated owner and has to be created, retrieved, updated, or deleted by portals in real time. On the other hand, this data is aggregated per resource (which makes it anonymous) and used to create indexes that must support efficient searches.
These constraints required a new social data architecture. The solution is presented in Figure 5, which shows the Social Data Manager broken up into two different systems:
- A new LRE Social Data Manager specifically designed for efficiently managing social data and the real-time and authorized accesses to this data by the LRE portals and their users.
- A Social Data Collector optimized for importing anonymized data originating from both the Social Data Manager and external portals¹, and for indexing this merged data and using it to improve the LRE searching and ranking capabilities.
The Social Data Collector can also be used to expose social data produced by the LRE Social Data Manager and other learning environments to third parties (e.g., OER content providers).
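The Collector's merging step can be sketched as follows. It assumes that each source (the SDM or an external portal) delivers anonymized per-resource aggregates as a dictionary with hypothetical `ratings`, `mean_rating`, and `favorites` keys. The ranking policy in `rank` is also only an example.

```python
def merge_aggregates(*sources):
    """Merge anonymized per-resource aggregates coming from several sources.

    Each source maps resource -> {"ratings": n, "mean_rating": m, "favorites": f}.
    Mean ratings are recombined weighted by their rating counts.
    """
    merged = {}
    for source in sources:
        for resource, agg in source.items():
            acc = merged.setdefault(resource, {"ratings": 0, "rating_sum": 0.0, "favorites": 0})
            n = agg.get("ratings", 0)
            if n:
                acc["ratings"] += n
                acc["rating_sum"] += n * agg["mean_rating"]
            acc["favorites"] += agg.get("favorites", 0)
    return {
        r: {
            "ratings": a["ratings"],
            "mean_rating": a["rating_sum"] / a["ratings"] if a["ratings"] else None,
            "favorites": a["favorites"],
        }
        for r, a in merged.items()
    }


def rank(merged):
    """Order resources by mean rating, then favorites (a hypothetical policy)."""
    return sorted(
        merged,
        key=lambda r: (merged[r]["mean_rating"] or 0, merged[r]["favorites"]),
        reverse=True,
    )
```

Weighting each source's mean by its number of ratings gives the same result as averaging all individual ratings, whereas naively averaging the per-source means would over-represent small portals.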
Figure 5: Refactoring the LRE Social Data Manager
4. Collecting Artifact Data
The LRE is an infrastructure that makes it possible for users of autonomous systems connected to the LRE to discover and exchange OERs stored on systems that are also outside of the LRE. As a consequence, the LRE has no way of knowing when an LRE user interacts with a resource discovered via the LRE, and valuable information (i.e., artifact data typically collected by web server logs or Google Analytics) is lost. In order for the LRE to collect this artifact data, we introduced an "LRE Proxy" between the resources referenced in the LRE and their users. This proxy is very similar to URL shorteners such as goo.gl or tinyurl.com. LRE "short" URLs are used in the LRE metadata in place of the actual resource locations. Each time users choose to access a resource they discovered by consulting the LRE catalog, they contact the LRE Proxy, which captures artifact data and then redirects them to the actual resource. As suggested in the diagram of Figure 6, the artifact data collected using this mechanism is added to the LRE catalog, where it is encoded as NSDL paradata records.
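The redirect-and-log mechanism can be sketched as a tiny WSGI application that uses only the standard library. The short-ID table, the in-memory `ARTIFACT_LOG`, and the `/<short-id>` path scheme are hypothetical. A deployment would persist the log, read the mapping from the catalog, and later aggregate the entries into paradata.

```python
import time

# Hypothetical short-id -> actual resource location table.
SHORT_URLS = {"a1b2": "http://provider.example.org/resources/algebra-game"}
ARTIFACT_LOG = []  # stand-in for persistent storage


def lre_proxy(environ, start_response):
    """WSGI app: for /<short-id>, record artifact data, then redirect."""
    short_id = environ.get("PATH_INFO", "").lstrip("/")
    target = SHORT_URLS.get(short_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown resource"]
    # The kind of data a web server log would capture.
    ARTIFACT_LOG.append({
        "short_id": short_id,
        "time": time.time(),
        "ip": environ.get("REMOTE_ADDR"),
        "user_agent": environ.get("HTTP_USER_AGENT"),
        "referer": environ.get("HTTP_REFERER"),
    })
    start_response("302 Found", [("Location", target)])
    return [b""]
```

For local experiments, the application can be served with `wsgiref.simple_server.make_server("", 8000, lre_proxy).serve_forever()`. Because the user is redirected immediately, the content provider's site does not have to change at all.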
Figure 6: Capturing artifact data
A drawback of this approach is that it introduces a single point of failure and thus requires that sufficient resources be provided to guarantee the proxy's performance and availability. Note that this approach can also be used to improve users' experience. When there are multiple copies of the same resource at different locations, these locations can be tested for availability. This makes it possible for the proxy to direct users to a location where a copy is available and to avoid sending users to broken URLs.
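Choosing among several copies can be sketched with an availability test based on an HTTP `HEAD` request. The helper names are hypothetical. Some servers reject `HEAD`, and a production proxy would probably check mirrors periodically and cache the results rather than probe them on every click. The checker is injectable so that the selection logic can be tested without network access.

```python
import urllib.error
import urllib.request


def is_available(url, timeout=3.0):
    """Return True if a HEAD request to url succeeds with a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; ValueError covers malformed URLs.
        return False


def pick_location(copies, check=is_available):
    """Return the first copy that passes the availability check, or None."""
    for url in copies:
        if check(url):
            return url
    return None
```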
The LRE's application profile makes it possible to use the information generated by the LRE Proxy to assess which kinds of resources are popular in which regions. The diagram of Figure 7 summarizes the main mechanisms that the LRE has put in place to collect various types of information (e.g., metadata, social data, and artifact data) about the OERs described in its catalog. In the LRE, this information is organized according to the LRE Metadata Application Profile, which is based on the IMS LODE Information for Learning Object eXchange (ILOX) specification. ILOX can be viewed as a container in which multiple (meta)data specifications are organized according to a FRBR (Functional Requirements for Bibliographic Records) hierarchy². The combination of the LRE Proxy and the FRBR data model enables the production of useful analytics that can be used to compare the popularity and usage of different versions and formats of a given resource by different groups of users. This information will make it possible to identify collections exchanged outside of the LRE infrastructure that are in high demand and those that are typically passed over by users. Knowing that a collection is popular among target audiences can be of great help in ongoing curation and acquisition decision-making.
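The analytics enabled by combining the proxy with the FRBR hierarchy can be sketched as a roll-up. In this sketch, click counts recorded per item (copy) are summed up through manifestations (formats) and expressions (versions) to the work. The class and field names are hypothetical and do not reflect the ILOX binding.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    clicks: int = 0  # e.g. redirects counted by the proxy for this copy


@dataclass
class Manifestation:
    fmt: str  # e.g. "SCORM", "IMS CC"
    items: list[Item] = field(default_factory=list)

    def clicks(self):
        return sum(i.clicks for i in self.items)


@dataclass
class Expression:
    language: str  # e.g. "en", "fr"
    manifestations: list[Manifestation] = field(default_factory=list)

    def clicks(self):
        return sum(m.clicks() for m in self.manifestations)


@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)

    def clicks(self):
        return sum(e.clicks() for e in self.expressions)

    def clicks_by_language(self):
        return {e.language: e.clicks() for e in self.expressions}

    def clicks_by_format(self):
        totals = Counter()
        for e in self.expressions:
            for m in e.manifestations:
                totals[m.fmt] += m.clicks()
        return totals
```

Recording a region alongside each click, for example derived from the IP address captured by the proxy, would allow the same roll-up to be broken down by region.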
Figure 7: Collecting and integrating metadata and interaction data
There are many ways this kind of information can be used. For instance, interaction data makes it possible for content providers and repository owners to tailor the packaging and marketing of resources for particular regions. It makes it possible to track how well current efforts to improve OER uptake are succeeding, and in which regions. Having interaction data for OERs also makes it possible to identify the most popular OERs (worldwide, by continent, region, country, etc.), relying on crowdsourcing to sift out the best OERs from an ever-growing, global body of resources.
This paper presents tools and approaches to capture, collect, and exchange different types of interaction data that are usually out of reach for systems built by aggregating metadata from different sources. OERs rarely provide comprehensive data about their uses and users. Even when OERs are described in metadata catalogs such as the LRE, interaction data tends to remain isolated in the silos where it was produced (e.g., portals, learning management systems). This paper also presents solutions to unlock this interaction data, making it possible to exploit it for improved OER search experiences. It also describes how to combine interaction data and metadata in a consistent and semantically rich way using the IMS ILOX specification described in a previous paper.
Interaction data is produced when a user interacts with a resource. Importing and aggregating such data in a meaningful way requires mechanisms to unambiguously identify both the resources and their users, which raises obvious privacy issues beyond the technical and organizational challenges. The issues of reliably identifying resources and users, and of privacy, remain largely unresolved and pose highly complex challenges in a global OER exchange context, given the different laws in each jurisdiction.
Combined with metadata, interaction data enables catalogs to improve the curation, searching, ranking, and recommending of OERs. These tools and techniques have particular relevance today. Given the current international momentum to assess and promote the uptake of OERs, better data on which OERs are most likely to be used is required and highly desirable. The combination of interaction data and ILOX provides a valuable source of analytics on OERs' audience preferences and helps to identify quality resources by crowdsourcing. It also makes it possible to measure the impact of marketing campaigns for the uptake of OERs and to track shifts in educational policies on OERs globally.
1 The LRE catalog references collections of OERs that are also used by other (i.e., non-LRE) portals that, in some cases, share the anonymized social data generated by their users. Taking into account all the social data available enables more accurate searching, ranking, and recommendations for learning resources.
2 In FRBR, an educational resource, called a "work", can have several versions (e.g., an English version, a French version), called "expressions". Each expression can be available in multiple formats (e.g., as a SCORM package, as an IMS Common Cartridge package), called "manifestations". Finally, there may be multiple copies, called "items", of a manifestation. ILOX has a mechanism that makes it possible to describe each FRBR level with multiple facets (e.g., a facet for the main descriptive metadata, a facet for social data, a facet for artifact data).
 IEEE Standards Department. IEEE 1484.12.1-2002, Learning Object Metadata Standard. July 2002.
 C. Lagoze, H. Van de Sompel, M. Nelson, and S. Warner. The Open Archives Initiative Protocol for Metadata Harvesting, version 2.0. Document version 2004/10/12T15:31:00Z, 2002.
 S. Ternier, D. Massart, M. Totschnig, J. Klerkx, and E. Duval. The Simple Publishing Interface (SPI). D-Lib Magazine, 16(9/10), September/October 2010. http://dx.doi.org/10.1045/september2010-ternier.
 UNESCO. 2012 Paris OER Declaration.
 D. Massart, E. Shulman, N. Nicholas, N. Ward, and F. Bergeron. Taming the metadata beast: ILOX. D-Lib Magazine, 16(11/12), November/December 2010. http://dx.doi.org/10.1045/november2010-massart.
 Modeling Paradata and Assertions as Activities V2.0.1.
 D. Massart and E. Shulman. Learning Resource Exchange Metadata Application Profile version 4.7. European Schoolnet, October 2011.
 IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional requirements for bibliographic records: final report, volume 19 of UBCIM publications, new series. K.G. Saur, München, 1998.
About the Authors
David Massart is a technical strategist with a PhD in information modeling and object-oriented design. He is an expert in technical and semantic interoperability, leading several open standards groups for interoperable systems integration, authoring international API and metadata standards and specifications. Dr. Massart co-founded ZettaDataNet to work on innovative information architectures such as those described in this article.
Elena Shulman, PhD, MLIS, is an internationally recognized expert on interoperability, controlled vocabularies and digital asset management. She has extensive experience in managing premier information retrieval infrastructures both in Europe and the United States. Dr. Shulman has been guiding research, services and quality control since co-founding ZettaDataNet.