Model-Oriented Scientific Research Reports

D-Lib Magazine

May/June 2011
Volume 17, Number 5/6
Model-Oriented Scientific Research Reports

Robert B. Allen
College of Information Science and Technology, Drexel University
rba@boballen.info

doi:10.1045/may2011-allen

Abstract

While the familiar text-based scientific research report format has served well, it has shortcomings that would be ameliorated by a model-oriented approach. Although many aspects of scientific research reports are already structured, we propose extending structure as far as possible. Models which have been developed in computer and information science for structuring information would provide the framework for scientific research reports while text would play a smaller role. We introduce an approach to describing research design and provide a framework for conceptual process models. We also outline some possibilities for user interfaces and a library of research reports that leverage the model-oriented approach. Model-oriented research reports would improve the traditional IMRD format by providing much greater structure and specification of constraints.

1. From Text-Based Research Reports to Model-Oriented Research Reports

Scientific communication takes many forms but the research report holds a special place. Research reports are the primary record of research activities and are extensively used as teaching tools. However, while textual research reports allow a flexible and rich expressiveness, they have several limitations: (a) indexing for text depends on the terms which happen to be used; (b) textual research reports are not readily navigable by browsing; (c) information extraction from text is tedious and error-prone; (d) textual research reports do not readily support completeness and consistency checking; and (e) they must be translated across languages.

Early research reports were simply narrative descriptions. Structure has increasingly been used to address some of the limitations of that simple narrative text. By the 1950s, most research reports had adopted the IMRD (Introduction, Methods, Results, and Discussion) format [11]. Recently, structured abstracts (e.g., [1]), which impose specific topics to be addressed in the abstract, have been adopted in some fields. In addition to increasing structure in written reports, we see greater use of structure for data description and preservation (e.g., [7]). Similarly, workflows are increasingly common for describing and replicating aspects of scientific processing [12]; for instance, the Kepler Project allowed workflows to be specified for data analysis procedures.

Adding even more structure to research reports can further overcome some of the limitations of text and improve communication. We propose applying to scientific communication the modeling techniques that have been developed in computer and information science (CIS). A scientific research report has interlocking constraints from end to end; the research question affects the research design, and the research design in turn determines the data that are collected and the analysis of those data. A highly-structured research report (cf. [3]) would incorporate specification of the interlocking constraints, and the constraints could be highlighted for human users. Because the research report would be highly structured, some consistency checking would be possible, but we do not emphasize formal compilation or validation of the relationships. The constraints would be flexible, in the way that thesaurus terms form a simple knowledge organizing system, because it has often proved difficult to develop exact semantics for the real world. The limitations of text identified earlier could be ameliorated by this approach.

We explore the possibility of model-oriented research reports by outlining a framework for them. In Section 2, we outline some of the major features of our approach. In Section 3, we present a more complete example of a model-oriented research report. Section 4 discusses how the model-oriented research report approach might be deployed, from researchers' interaction with such reports to libraries of model-oriented research reports, and how it might be applied to other research paradigms. Section 5 is the conclusion. In the Technical Appendix, we describe a specific approach to conceptual process models.

2. Major Features and Implications of Model-Oriented Research Reports

There are several ways in which models could be used to enhance research reports. In this section, we consider two major areas of innovation that form the basis of the approach. In Section 3, we present a comprehensive scenario for model-oriented research reports. As illustration, we use Pasteur's classic experiment on the causes of spoilage of nutrient-rich solutions such as broth, beer, and wine [8].

2.1. Toward A Notation for Research Designs

Research designs are highly structured. As an extreme case, a complex analysis of variance is often described with a model for nesting, repeated measures, etc. We seek to capture such structures and other aspects of research methods and to incorporate them as a part of model-oriented research reports.

Campbell and Stanley [5] use a simple notation to describe the sequences of manipulation and observation in experimental and quasi-experimental designs. Their purpose was to illustrate differences in a set of prototypical research designs; we use this notation as the basis for a simple language for describing the research design and extend it with additional operators such as initialization, randomization, timing, grouping, conditionals, repetition (i.e., looping), and data recording.

We apply this notation to Pasteur's experiment. Pasteur showed that broth would not spoil when air could reach it but microbe-laden dust could not. Spoilage occurred when dust was allowed to come into contact with the broth. For Pasteur's experiment, the basic research design was straightforward: (1) initialize the conditions (boil broth, place it in a flask and create the swan neck for the flask), (2) wait, observe and record, (3) manipulate (tilt the flask so that some of the broth flows into the swan neck and comes into contact with the dust), and (4) wait, observe and record.
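The four steps above can be written down as a small sequence of operators. The following is a minimal sketch, not the notation actually proposed: the operator names, the one-letter codes other than Campbell and Stanley's X and O, and the `check_design` rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    """Design operators. Only X and O come from Campbell and Stanley;
    the other codes are hypothetical extensions for this sketch."""
    INITIALIZE = "I"
    WAIT = "W"
    OBSERVE = "O"
    MANIPULATE = "X"
    RECORD = "D"


@dataclass(frozen=True)
class Step:
    op: Op
    note: str = ""


def render(design):
    """Return the compact one-line form of a design."""
    return " ".join(step.op.value for step in design)


def check_design(design):
    """Assumed consistency rule: each RECORD must directly follow an OBSERVE."""
    problems = []
    for i, step in enumerate(design):
        if step.op is Op.RECORD and (i == 0 or design[i - 1].op is not Op.OBSERVE):
            problems.append(f"step {i}: record without a preceding observation")
    return problems


# Pasteur's swan-neck design, following the four steps in the text.
pasteur_design = [
    Step(Op.INITIALIZE, "boil broth, place it in a flask, create the swan neck"),
    Step(Op.WAIT), Step(Op.OBSERVE), Step(Op.RECORD),
    Step(Op.MANIPULATE, "tilt flask so broth contacts dust in the neck"),
    Step(Op.WAIT), Step(Op.OBSERVE), Step(Op.RECORD),
]

print(render(pasteur_design))
print(check_design(pasteur_design))
```

Because the design is data rather than prose, simple checks of this kind can be run over it, which is the sort of lightweight consistency checking the approach anticipates.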

2.2. Systematic Description of Conceptual Process Models

In general, conceptual models represent aspects of the world. They include any non-physical model and range greatly in detail. Some conceptual models simply describe entities and systems while other conceptual models incorporate interactions and processes. We are especially interested in the latter, which we term conceptual process models. The Technical Appendix elaborates on using entities/systems and interactions/processes as the bases for conceptual process models.

Articulating conceptual process models is an important, if sometimes implicit, activity for science [13], and scientific research attempts to establish, explore, or validate them. In research reports, the conceptual models being tested are typically presented in the introduction and described in a relatively predictable format. Swales [11] has described such a process of developing a hypothesis for testing as "creating a research space" and proposed a set of functional action units for accomplishing that. Swales's approach has been used to characterize textual genres. We propose that it be applied in conjunction with conceptual process models.

3. An Example of a More Complete Model-Oriented Research Report for Scientific Experiments

The previous section described distinctive features and implications of a model-oriented research report. A complete research report would comprise a unified wrapper with four components. Each component may have extensive textual commentary. These four components parallel the traditional IMRD framework but enhance it by providing much greater structure and specification of constraints. The elements of each of these components could be specified with a notation like XPath and XPointer which identify specific locations in XML documents.
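As a rough illustration of addressing elements inside such a wrapper, the sketch below encodes a toy four-component report as XML and selects parts of it with the XPath subset supported by Python's standard `xml.etree.ElementTree`. All element and attribute names (`report`, `context`, `models-and-design`, `supports`, and so on) are hypothetical; the article does not define a schema.

```python
import xml.etree.ElementTree as ET

# Toy wrapper with four components paralleling IMRD (names are assumptions).
REPORT_XML = """
<report id="pasteur-swan-neck">
  <context>
    <metadata><author>L. Pasteur</author></metadata>
    <question>Which entities and processes lead to spoilage of broth?</question>
  </context>
  <models-and-design>
    <model id="m1">Microbes carried on dust contaminate the broth.</model>
  </models-and-design>
  <execution-and-results>
    <observation after="initialize" spoiled="false"/>
    <observation after="manipulate" spoiled="true"/>
  </execution-and-results>
  <conclusions>
    <claim supports="m1">Spoilage requires contact with dust.</claim>
  </conclusions>
</report>
"""

root = ET.fromstring(REPORT_XML)

# XPath-style selection: observations in which spoilage was recorded.
spoiled = root.findall("./execution-and-results/observation[@spoiled='true']")

# A simple cross-component constraint: every claim must point at a declared model.
model_ids = {m.get("id") for m in root.findall("./models-and-design/model")}
dangling = [
    c.get("supports")
    for c in root.findall("./conclusions/claim")
    if c.get("supports") not in model_ids
]

print([o.get("after") for o in spoiled])
print(dangling)
```

The last check hints at how interlocking constraints between components could be surfaced to a reader or editor without requiring full formal validation.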

3.1. Metadata, Research Question, and Background

The first component of a model-oriented research report provides the context of the report — metadata, research questions, and background. The metadata describes attributes of the report itself such as the author and date. The research question would be framed as a model, but unlike typical CIS models, it would be incomplete, indicating the question to be researched.

The research question is elaborated by linking to previous research reports. It suggests areas which require further investigation, and proposes an investigation strategy. This process of critiquing previous results would be accomplished with citation links and, ultimately, the conceptual process models to be tested would be generated. For Pasteur's microbe experiment (as introduced in Section 2), the research question was to identify the entities and processes which lead to spoilage of food products. Earlier experiments had shown that there was no spontaneous generation for large organisms such as flies, but the source of the microbes which spoiled food products was still debated. The research question was incomplete as to processes.

3.2. Conceptual Process Models to be Tested, and Research Strategy and Design

The conceptual process models that are to be tested need to be elaborated, based on previous findings and on the specific research question. Further, the research strategy must be specified based on previous findings. For Pasteur, the critical issue was allowing air to enter the flask without also allowing microbes. Determining a suitable set of research manipulations can be a creative act, a form of abductive reasoning which is not automated in this approach but is documented along with the description of the components of the model.

3.3. Research Implementation/Execution, Results, and Analyses

Research strategies are high-level. The research implementation/execution provides the details of the research strategy and describes what was actually done in the research. For Pasteur, this includes descriptions of heating flasks and creating swan necks for the flasks. Observations about specific details of the implementation can be recorded. The results record data resulting from the research implementation/execution related to the research question. At this point, techniques from a wide variety of data preservation strategies could be introduced. For Pasteur's study, the data are primarily whether spoilage was observed or not. Research involving statistics might include data integration and/or analyses. Like text-based research reports, model-oriented research reports would also include observations that go beyond the immediate goals of the research but are still related to the conceptual model being tested.

3.4. Research Report Conclusions

Finally, the implications of the study would be examined and possibly extended. If no models had been proposed originally, then possible models consistent with the data could be explored at this point. For Pasteur's experiment, this section of a model-oriented research report might also include consideration of parameters for some of the treatments for contamination. For instance, the researcher might ask how much heating is needed to safely kill the microbes, as Pasteur did when he developed pasteurization.

4. Implications and Deployment

If it were implemented, this proposal would have many benefits and could substantially change the way that researchers access the scientific literature. Ultimately, we might imagine a set of services which allowed these reports to be part of a scientist's workbench.

4.1. Research Interaction with Model-Oriented Reports

One way the structure of the model-oriented reports could be used is to provide structure for browsing the reports. For instance, the models could be used for schematic visualization of the processes being researched. In the strong form of our approach, the models would be preeminent with minimal text. We could also imagine a hybrid approach in which both text and schematics of the conceptual process models were presented. The use of such schematics could make the reports more accessible for non-experts. Indeed, because they would have a systematic structure, the models might be used for tutorials in which levels of complexity were adapted to the users' backgrounds. User tools could also be developed for authoring model-oriented research reports and for browsing the library.

4.2. Research Report Library

Collected model-oriented research reports could form a highly interwoven library of research reports. The notion of a unified library of research literature is related to current digital libraries of articles linked by citations. Shum et al. [10] have proposed a library in which discourse claims of research articles would be linked. The approach proposed here, particularly the inclusion of process descriptions, should allow a much richer linking of research reports.

In our approach, the library would hold several types of content. First, it would archive completed, reviewed model-oriented research reports. Second, it would include master records for entities/systems and interactions/processes. As such, it might include a range of empirical values in addition to the value accepted by consensus. The library would include a record of research reports which relate to a particular entity as well as its properties and dimensions, states, and their observed values. Third, the library would hold annotations, abstractions, classifications, and conceptual frameworks. In addition, it could also contain theoretical studies that combine information from other sources in the library. Fourth, it would include standard descriptions of instruments and procedures. The primary attribute would be the functionality, that is, what the instrument measures. There would also be descriptions of standard research processes such as laboratory and analysis workflows.

Citations, which are links between or among objects in the library, would be first-class objects; that is, they would be stored separately from the reports they connect. Typically, they would link a report to previous reports in the library, which is comparable to a citation network among collections of text documents. Ambiguity should be reduced because of the relative systematicity of the formal notation. There could be several explicit citation types (e.g., [14]). Furthermore, the links could be multi-headed to allow components to be combined as needed. Indeed, a citation to a new methodology might link to the workflow model for implementing that method. Such citations should be more useful than traditional citations because the anchors and the roles would be explicitly defined.
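One way to picture typed, multi-headed citations stored apart from the reports is sketched below. It is only an illustration under stated assumptions: the class names, the citation type strings, and the report identifiers (`new-study`, `prior-study`, `method-workflow`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A specific component of a specific library object."""
    report_id: str
    component: str


@dataclass(frozen=True)
class Citation:
    """A typed link with one source anchor and one or more target anchors."""
    citation_type: str
    source: Anchor
    targets: tuple


class Library:
    def __init__(self):
        self.reports = set()
        self.citations = []  # stored separately from the reports themselves

    def add_report(self, report_id):
        self.reports.add(report_id)

    def cite(self, citation):
        # Reject links whose anchors point outside the library.
        for anchor in (citation.source, *citation.targets):
            if anchor.report_id not in self.reports:
                raise ValueError(f"unknown library object: {anchor.report_id}")
        self.citations.append(citation)

    def cited_by(self, report_id):
        return [
            c for c in self.citations
            if any(t.report_id == report_id for t in c.targets)
        ]


lib = Library()
for rid in ("prior-study", "method-workflow", "new-study"):
    lib.add_report(rid)

# A multi-headed citation: the new design draws on a prior model and a workflow.
lib.cite(Citation(
    citation_type="uses-method",
    source=Anchor("new-study", "design"),
    targets=(Anchor("prior-study", "model:m1"), Anchor("method-workflow", "steps")),
))

print(len(lib.cited_by("method-workflow")))
```

Because both ends of each link name a component rather than a whole document, the role of the citation is explicit in the data.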

The library would be updated as new observations and results are added and components are refined. As described below, special users may provide additional consistency checks and updates for the contents of the library. In addition, it should be possible for readers to annotate entries in the library.

4.3. Deployment

To the extent that model-oriented research reports ameliorate the limitations of text-based research reports, the approach would provide better service to researchers and students. The advantages of the model-oriented approach should encourage journals to adopt it, much the way that structured abstracts are being adopted. We expect deployment and adoption of model-oriented research reports to follow the familiar S-curve for new technologies.

The editorial processing of model-oriented research reports may be organized much the way editorial processing of text-based research reports is now organized, though the cleaner structure of model-oriented research reports and the ease of linking to other research reports may facilitate the review process. The library would be maintained through several channels. In many cases, there would be routine additions of new data when the library is updated upon the library editor's approval. In other cases, new findings might require substantially revising conceptual structures in the library. We also envision that theoreticians may recommend the addition or revision of entities/systems or interactions/processes within the library. There may be human "gardeners" who would participate in revisions, but eventually, sophisticated version management techniques would be required.

4.4. Applying the Model-Oriented Research Report Concept beyond Scientific Research Reports

Research is extremely varied. Nonetheless, model-oriented research reports might be used for other paradigms for scientific research and other types of research, beyond that exemplified by Pasteur's experiment.

Exploring Properties and States: Science often involves empirical observations of parameters for existing models, such as exploring melting points or mapping phase diagrams of substances. This is readily handled with the model-oriented approach by simply providing links to the entities whose properties are being studied and presumably to standard procedures for that type of investigation.

Describing Specific Natural Phenomena: Natural phenomena are often the result of combinations of processes (e.g., [4]). Analyses of such phenomena are often a form of forensics. In this case, the research often involves compiling and analyzing evidence in support of instantiations of conceptual processes. This process can be readily described by a model-oriented research report.

Induction of, Fitting, or Selecting Models from Observations: In some cases, so little is known about a phenomenon that there is no plausible conceptual process model to describe it that may be tested. In such cases, the best strategy may be simply to gather and then organize observations through induction. The model-oriented research report can provide a framework for describing such research.

Closely related to induction of process models is fitting models to data. Research employing structural equation models typically attempts to select among the possible models given the data. Usually, these structural equation models are applied to modeling systems. Structural equation models (e.g., [2]) are similar to conceptual process models and may be incorporated into model-oriented research reports. This fitting of models to data might also be done with a regression analysis which results in a regression equation, although this does not explicitly address the processes involved. A third related approach is simulation. While in some cases simulation is used simply to develop a visual animation of a system or phenomenon without full consideration for the underlying processes, in other cases those underlying processes are carefully modeled. Simulation can employ a broad range of specific models such as neural networks and autonomous agents.

Data-Driven Science: There has been considerable discussion of data-driven science in which data collection is separated from the research question. Typically, a large data set is made available as a resource for whatever questions may be presented. Descriptions from this research paradigm are easily handled within the model-oriented research report; there would simply be no conceptual process models to be tested and no manipulations in the research design.

Non-Scientific Research Procedures: Many activities beyond science that combine model-driven expectations, complex procedures, and rich data sets could be described in model-oriented research reports. For instance, medical tests are often conducted for the doctor to investigate specific hypotheses in a diagnosis about the patient's health. Presumably, the tests reflect instantiations of process models that the physician believes may be relevant.

5. Conclusion

Recently, there has been a great deal of emphasis on systematizing the description, organization, and preservation of data in scientific research. We extend that effort to systematizing the reporting of the entire research process. This outline of a model-oriented research report and research report library suggests that this approach deserves further exploration. Developing such an approach would ameliorate many of the limitations of current text-based research reports.

A. Technical Appendix

Developing the parameters of conceptual process models is the most complex part of the proposal for model-oriented research reports. There are several ways that the conceptual process models could be implemented. Here, we explore one possible approach. At the top level, the model is similar to a natural language statement with subject, verb, and object, which together form the basis of a process. The model is primarily a discrete (qualitative) rather than a continuous (quantitative) model, which is consistent with the extensive literature on qualitative models of cognition [6]. This model can be extended to include quantitative values, especially with respect to the interactions of components of a system.

A.1. Entities and Systems

In this approach, entities and systems are conceptual models (rather than conceptual process models). We define entity descriptions as frames having properties and dimensions. Properties have a single value. Some properties are defining properties. For instance, gold atoms have a specific number of protons; that number is a defining property. Dimensions allow sets of states. For instance, gold atom electrons have quantum levels, with quantum levels generally being the dimension and the specific level an electron is in being a state. An instance of a gold atom will have specific properties (e.g., location) and dimensions (e.g., be in a specific quantum level). Entities can be related to other entities in a variety of ways. For instance, both gold atoms and a collection of gold atoms would be entities. Collections of gold atoms will have a dimension of phase with possible states of solid, liquid, or vapor. The full entity description for collections of atoms would include a phase transition table. Isotopes and isomers may be seen as distinct entities and as sub-divisions of higher-level entities. Entities can be organized into classes with associated properties. For example, the periodic table is a two-way classification (by period and group) in which an element's position predicts its chemical properties. Other classification systems such as biological classifications are hierarchical and in those, properties may be inherited and/or more abstract.
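The frame idea can be made concrete with a small data structure. The following sketch assumes one possible encoding, in which an entity type carries defining properties and named dimensions with their allowed states; the class and field names are illustrative and are not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """Frame for a kind of entity (field names are assumptions)."""
    name: str
    defining_properties: dict
    dimensions: dict  # dimension name -> tuple of allowed states


@dataclass
class EntityInstance:
    """A particular instance with its own property values and current states."""
    entity_type: EntityType
    properties: dict = field(default_factory=dict)
    states: dict = field(default_factory=dict)

    def set_state(self, dimension, state):
        allowed = self.entity_type.dimensions.get(dimension)
        if allowed is None:
            raise KeyError(f"{self.entity_type.name} has no dimension {dimension!r}")
        if state not in allowed:
            raise ValueError(f"{state!r} is not a state of {dimension!r}")
        self.states[dimension] = state


# A collection of gold atoms: proton count is defining; phase is a dimension.
gold_collection = EntityType(
    name="collection of gold atoms",
    defining_properties={"protons_per_atom": 79},
    dimensions={"phase": ("solid", "liquid", "vapor")},
)

sample = EntityInstance(gold_collection, properties={"location": "crucible"})
sample.set_state("phase", "liquid")
print(sample.states)
```

Separating single-valued properties from dimensions lets a library record reject observations that name a state outside the declared set.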

We define systems as combinations of entities whose internal structure is not easily dissected. For instance, in Pasteur's experiment, when microbes reach the broth, it becomes a contaminated broth. The distinction between entities and systems can be subtle and is a matter of convention, community consensus, context, and editorial judgment. A system is sometimes a weak form of entity that also includes constituent entities. In some cases, for example, we can treat an atom as a distinct entity but in other cases, it is important to consider its constituents, in which case we focus on it as a system. In other cases, the system is a collection of related entities but is not itself considered a distinct entity. A black hole and its accretion disk form a natural system but not an entity. In still other cases, systems can be ad hoc, such as random mixtures of chemical solutions. A system can also be defined just by framing two or more apparently unconnected entities. For example, the gravitational interactions in the Earth-Jupiter system could be studied but we do not normally focus on that pairing. In such studies the components are often treated as entities. In some notable cases, such as relativity, the frame of reference itself can be re-conceptualized.

A.2. Interactions and Processes

Interactions are transitions involving entities and systems. They may include state transitions for a single entity or the creation, association, disassociation, or destruction of entity instances. For some interactions, there may be a functional relationship that determines the outcome. For instance, hydrogen and oxygen molecules interact to form water as well as release energy in an interaction that follows certain ratios. There may also be constraints on the entities involved for the interaction to occur. For instance, the entities may need to be in a certain state (gaseous when making water) and a spark (in the case of making water) or some other trigger or catalyst is needed for the interaction to occur. Or, as with the reaction of hydrogen and oxygen, the interaction may release energy. The effect of that released energy would generally be considered in the broader system in which the interaction takes place. There would be many cases where we explore systems within systems.

Just as types of entities can be related to each other, there are also families of interactions. Some of these are interactions which are common across a set of related entities. For instance, in chemistry there is a typology of interactions such as oxidation-reduction reactions or substitution reactions.

Processes are chains of interactions. The process of spoilage of a broth might be said to be composed of two interactions: first, the contamination of the broth by microbes, and, second, the growth of the microbe population spoiling the broth. As with the distinction between entities and systems, there is considerable flexibility in the distinction between interactions and processes. Many interactions could be viewed as a process and decomposed into more granular levels of interactions. For example, an infection causes a disease but the mechanics of infection may be decomposed into lower-level processes and interactions. Generally, as complex phenomena are explored more fully, their component processes are increasingly refined.
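The spoilage example can be sketched as a chain of interactions, each with preconditions and outcomes. This is a deliberately simple qualitative model under assumed names: the `Interaction` fields, the state labels, and the `run_process` helper are hypothetical and stand in for whatever representation a real implementation would choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """A transition that fires only when all required conditions hold."""
    name: str
    requires: frozenset
    produces: frozenset


def run_process(initial_conditions, interactions):
    """Apply interactions in order, stopping at the first one that cannot fire."""
    conditions = set(initial_conditions)
    trace = []
    for interaction in interactions:
        if not interaction.requires <= conditions:
            break
        conditions |= interaction.produces
        trace.append(interaction.name)
    return conditions, trace


contamination = Interaction(
    name="contamination",
    requires=frozenset({"broth", "microbes reach broth"}),
    produces=frozenset({"contaminated broth"}),
)
growth = Interaction(
    name="microbial growth",
    requires=frozenset({"contaminated broth"}),
    produces=frozenset({"spoiled broth"}),
)
spoilage = [contamination, growth]

# Swan neck intact: dust, and hence microbes, cannot reach the broth.
sealed, sealed_trace = run_process({"broth"}, spoilage)
# Flask tilted: broth contacts the dust in the neck.
exposed, exposed_trace = run_process({"broth", "microbes reach broth"}, spoilage)

print(sealed_trace)
print(exposed_trace)
```

Decomposing either interaction into finer steps would only mean replacing one list entry with several, which mirrors the flexibility between interactions and processes described above.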

Some behavior of systems could be described with methods such as those familiar from object-oriented data models. For example, animals of a species may have typical instinctive behaviors that could be included as a part of the dynamics of a system. In addition, interactions and processes may be modeled with the Unified Modeling Language (UML) that is widely used for modeling discrete processes in information systems. In particular, the activity diagrams within UML provide workflow models that would be particularly useful. Moreover, components of systems may have complex behavior, and formalisms from UML can be used to describe that behavior. In fact, UML has been proposed as a conceptual foundation for systems biology [9].

While UML can be useful, there are many complex systems for which it is inadequate. These systems may be modeled with techniques such as simulations. There are a variety of simulation techniques ranging from autonomous agents to numerical analysis of coupled differential equations. For some systems which show learning, such as natural selection in the evolution of species, the underlying representation may be modified.

Acknowledgment

Jordon Steele and Michael Zarro provided useful comments on earlier drafts of this paper.

References

[1] Ad Hoc Working Group for Critical Appraisal of the Medical Literature. A proposal for more informative abstracts of clinical articles. 1987, Annals of Internal Medicine, 106, 598-604.

[2] Aickin, M., 2002, Causal Analysis in Biomedicine and Epidemiology: Based on Minimal Sufficient Causation. Marcel Dekker Inc., New York.

[3] Allen, R. B., 2007, Highly Structured Scientific Publications. ACM/IEEE Joint Conference on Digital Libraries, 472. doi:10.1145/1255175.1255271

[4] Allen, R. B., Wu, Y. J., & Jun, L., 2005, Interactive Causal Schematics for Qualitative Scientific Explanations, ICADL (LNCS 3815/2005) 411-415. doi:10.1007/11599517_50

[5] Campbell, D. T. & Stanley, J. C., 1966, Experimental and Quasi-Experimental Designs. Chicago: Rand-McNally.

[6] Forbus, K. D., 1996, Qualitative Reasoning. CRC Handbook of Computer Science and Engineering. CRC Press.

[7] Hunter, J., 2006, Scientific Models — A User-Oriented Approach To The Integration Of Scientific Data And Digital Libraries, in Victorian Association for Library Automation.

[8] Pasteur, L., 1879, Studies on Fermentation: The Diseases of Beer, Their Causes, and the Means of Preventing Them. Translated by F. Faulkner and D.C. Robb. Macmillan, London.

[9] Roux-Rouquie, M., & Schuch da Rosa, D., 2006, Ten Top Reasons for Systems Biology to Get Into Model-Driven Engineering, ICSE.

[10] Shum, S. B., Motta, E., & Domingue, J., 2000, ScholOnto: An Ontology-Based Digital Library Server for Research Documents and Discourse, International Journal on Digital Libraries, 3(3), 237-248. doi:10.1007/s007990000034

[11] Swales, J. M., 1990, Genre Analysis: English in Academic and Research Settings. Cambridge University Press, Cambridge UK.

[12] Taylor, I. J., Deelman, E., Gannon, D. B., & Shields, M., 2007, Workflows for e-Science: Scientific Workflows for Grids. Springer, London.

[13] Thagard, P., 1992, Conceptual Revolutions. Princeton University Press: Princeton NJ.

[14] Trigg, R., 1983, A Network-Based Approach to Text Handling for the Online Scientific Community, PhD Dissertation, Department of Computer Science, University of Maryland.

About the Author

Robert B. Allen was a pioneer in the development of recommender systems. Recently, he has explored novel access techniques for digital history such as text extraction from collections of digitized historical newspapers and interactive timeline interfaces. Dr. Allen is at the iSchool at Drexel University. He has prepared a comprehensive online informatics textbook, "Information: A Fundamental Construct". Before joining Drexel, he was at the University of Maryland, a Senior Scientist at Bellcore, and a Member of Technical Staff at Bell Laboratories. His Ph.D. was in Social and Cognitive Experimental Psychology from UCSD. Dr. Allen was Editor in Chief of the ACM Transactions on Information Systems and Chair of the ACM Publications Board.