Proceedings of the First International Workshop on Model-Driven Interoperability (MDI 2010)
In conjunction with MoDELS 2010, Oslo, Norway, October 3-5, 2010
http://mdi2010.lcc.uma.es/

ACM International Conference Proceedings Series, ACM Press

Editors:
Jean Bézivin, INRIA & Ecole des Mines de Nantes, France
Richard Mark Soley, OMG, Needham, USA
Antonio Vallecillo, University of Málaga, Spain

ISBN: 978-1-4503-0292-0

The Association for Computing Machinery
2 Penn Plaza, Suite 701
New York, New York 10121-0701

ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750-4470 (fax).

Notice to Past Authors of ACM-Published Articles: ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you have written a work that was previously published by ACM in any journal or conference proceedings prior to 1978, or in any SIG Newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform permissions@acm.org, stating the title of the work, the author(s), and where and when published.

TABLE OF CONTENTS

Editorial to the MDI 2010 Workshop — Jean Bézivin, Richard M. Soley and Antonio Vallecillo (1)
Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project — Youness Lemrabet, David Clin, Michel Bigand, Jean-Pierre Bourey and Nordine Benkeltoum (3)
Semantic Interoperability of Clinical Data — Idoia Berges, Jesus Bermudez, Alfredo Goñi and Arantza Illarramendi (10)
A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering — Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler and Stefan Biffl (15)
Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars — Frank Hermann, Hartmut Ehrig, Ulrike Golas and Fernando Orejas (22)
Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities — Manuel Wimmer, Gerti Kappel, Angelika Kusel, Werner Retschitzegger, Johannes Schoenboeck and Wieland Schwinger (32)
Specifying Overlaps of Heterogeneous Models for Global Consistency Checking — Zinovy Diskin, Yingfei Xiong and Krzysztof Czarnecki (42)
Anticipating Unanticipated Tool Interoperability using Role Models — Mirko Seifert, Christian Wende and Uwe Assmann (52)
Aligning Business and IT Models in Service-Oriented Architectures using BPMN and SoaML — Brian Elvesæter, Dima Panfilenko, Sven Jacobi and Christian Hahn (61)
Domain-specific Templates for Refinement Transformations — Lucia Kapova, Thomas Goldschmidt, Jens Happe and Ralf Reussner (69)
Advanced Modelling Made Simple with the Gmodel Metalanguage — Jorn Bettin and Tony Clark (79)
Model-driven Rule-based Mediation in XML Data Exchange — Yongxin Liao, Dumitru Roman and Arne J. Berre (89)
Behavioural Interoperability to Support Model-Driven Systems Integration — Alek Radjenovic and Richard Paige (98)

LIST OF AUTHORS

Assmann, Uwe (52); Benkeltoum, Nordine (3); Berges, Idoia (10); Bermudez, Jesus (10); Berre, Arne J. (89); Bettin, Jorn (79); Biffl, Stefan (15); Bigand, Michel (3); Bourey, Jean-Pierre (3); Clark, Tony (79); Clin, David (3); Czarnecki, Krzysztof (42); Diskin, Zinovy (42); Ehrig, Hartmut (22); Elvesæter, Brian (61); Goñi, Alfredo (10); Golas, Ulrike (22); Goldschmidt, Thomas (69); Hahn, Christian (61); Happe, Jens (69); Hermann, Frank (22); Illarramendi, Arantza (10); Jacobi, Sven (61); Kapova, Lucia (69); Kappel, Gerti (32); Kusel, Angelika (32); Lemrabet, Youness (3); Liao, Yongxin (89); Moser, Thomas (15); Orejas, Fernando (22); Paige, Richard (98); Panfilenko, Dima (61); Radjenovic, Alek (98); Retschitzegger, Werner (32); Reussner, Ralf (69); Roman, Dumitru (89); Schoenboeck, Johannes (32); Schwinger, Wieland (32); Seifert, Mirko (52); Sunindyo, Wikan Danar (15); Wende, Christian (52); Wimmer, Manuel (32); Winkler, Dietmar (15); Xiong, Yingfei (42)

PROGRAM COMMITTEE

Patrick Albert, IBM, France
Uwe Assmann, Technische Universität Dresden, Germany
Colin Atkinson, University of Mannheim, Germany
Jorn Bettin, Sofismo AG, Switzerland
Jean Pierre Bourey, Laboratoire de Génie Industriel de Lille, France
Tony Clark, Middlesex University, UK
Robert Clarisó, Universitat Oberta de Catalunya, Spain
Gregor Engels, University of Paderborn, Germany
Jean Marie Favre, University of Grenoble, France
Robert France, Colorado State University, USA
Dragan Gasevic, Athabasca University, Canada
Sébastien Gérard, CEA LIST, France
Martin Gogolla, University of Bremen, Germany
Jeff Gray, University of Alabama, USA
Esther Guerra, Carlos III University, Spain
Tihamer Levendovszky, Vanderbilt University, USA
Richard Paige, University of York, UK
Alfonso Pierantonio, University of L'Aquila, Italy
Bernhard Rumpe, Aachen University, Germany
Jim Steel, Queensland University of Technology, Australia
Hans Vangheluwe, University of Antwerp, Belgium
Andrew Watson, OMG, Needham, USA
Jon Whittle, Lancaster University, UK
Manuel Wimmer, Vienna University of Technology, Austria

Additional reviewers: Fabian Buettner, Lars Hamann, Mirco Kuhlmann, Ivano Malavolta, Antonio Navarro Perez, Ingo Weisemoeller, Christian Wende, Claas Wilke.

Editorial to the Proceedings of the First International Workshop on Model-Driven Interoperability

Jean Bézivin, INRIA and Ecole des Mines de Nantes, 4 rue Alfred Kastler, F-44307 Nantes Cedex 3, France, +33 251 858 704, Jean.Bezivin@inria.fr
Richard Mark Soley, Object Management Group, Inc., Building A, Suite 300, 140 Kendrick Street, Needham, MA 02494, +1 781 444 0404, soley@omg.org
Antonio Vallecillo, Universidad de Málaga, Bulevar Louis Pasteur 35, 29071 Málaga, Spain, +34 952 132794, av@lcc.uma.es

ABSTRACT
This paper describes the scope, structure and contents of the First International Workshop on Model Driven Interoperability (MDI 2010), which was held on October 5, 2010, in conjunction with the MoDELS 2010 conference in Oslo, Norway.

Categories and Subject Descriptors: D.2.12 [Software Engineering]: Interoperability. I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies.
General Terms: Design, Standardization, Languages.
Keywords: Model-driven engineering, interoperability.
1. INTRODUCTION
Interoperability is the ability of separate entities, systems or artifacts (organizations, programs, tools, etc.) to work together. Although the need to achieve interoperability between heterogeneous systems and notations has always existed [1], the difficulties involved in overcoming their differences, the lack of consensus on common standards, and the shortage of proper mechanisms and tools have severely hampered this task.

Model-Driven Engineering (MDE) is an emergent discipline that advocates the use of (software) models as primary artifacts of the software engineering process. Beyond the initial goals of capturing user requirements and architectural concerns, and of generating code from them, models are proving to be effective for many other engineering tasks. New model-driven engineering approaches, such as model-driven modernization, models-at-runtime and model-based testing, are constantly emerging.

Model interoperability is much more complex than simply defining a local serialization format, e.g., XMI. That would just resolve the syntactic (or "plumbing") issues between models and modeling tools. Interoperability also involves further aspects, including behavioral specifications of models (which in turn describe the behavioral aspects of the systems being modeled) and other "semantic" issues [2] such as agreements on names, context-sensitive information, agreements on concepts (ontologies), integration conflict analysis (including, for example, automatic data model matching), semantic reasoning, etc. Furthermore, interoperability means not only being able to exchange information and to use the information that has been exchanged [3], but also being able to exchange services and functions so as to operate effectively together. All these interoperability issues and needs become clear in any complex system, as has recently become apparent in the HL7 and DICOM healthcare projects, for instance.

Models and MDE techniques (especially metamodeling and model transformations) can play a fundamental role in fully accomplishing these tasks. Thus, models can become cornerstone elements for enabling and achieving interoperability between all kinds of systems and artifacts, including data sets (in the presence of different data schemata, possibly at different levels of abstraction), services (despite their differences in data representation, access protocols and underlying technological platforms), event systems (with different complex types and origins), languages (that use different notations and may have different semantics), tools (with different data formats and semantic representations), technological platforms (with different notations, tools and semantics), etc. It should also be emphasized that the success of MDE has created accidental complexity, for example by generating a number of overlapping metamodels (UML, SysML, BPML, etc.), and this situation reveals itself in a number of contexts as an additional metamodel interoperability problem.

2. THE MDI 2010 WORKSHOP
The goal of the MDI 2010 workshop was to discuss the potential role of models as key enablers for interoperability, and the challenges ahead. The workshop aimed to provide a venue where researchers and practitioners concerned with all aspects of model and system interoperability could meet, disseminate and exchange ideas and problems, identify some of the key issues related to model-driven interoperability, and explore possible solutions together.
The MDI 2010 workshop was held on October 5, 2010, in conjunction with the MoDELS 2010 conference in Oslo, Norway. The workshop was a great success. An excellent Program Committee was assembled to help with the review process, including very well-known and respected experts in the topics of the workshop: Patrick Albert, Uwe Assmann, Colin Atkinson, Jorn Bettin, Jean Pierre Bourey, Tony Clark, Robert Clarisó, Gregor Engels, Jean Marie Favre, Robert France, Dragan Gasevic, Sébastien Gérard, Martin Gogolla, Jeff Gray, Esther Guerra, Tihamer Levendovszky, Richard Paige, Alfonso Pierantonio, Bernhard Rumpe, Jim Steel, Hans Vangheluwe, Andrew Watson, Jon Whittle and Manuel Wimmer.

In response to the call for papers, a total of 19 submissions were received. Each submission was formally peer-reviewed by three referees, and 12 papers were finally accepted for presentation at the workshop and publication in the proceedings, which have been published in the ACM Digital Library. Several external reviewers also helped the PC members to review the papers: Fabian Buettner, Lars Hamann, Mirco Kuhlmann, Ivano Malavolta, Antonio Navarro Perez, Ingo Weisemoeller, Christian Wende and Claas Wilke.

The accepted papers contribute to different aspects of the area of model-driven interoperability, from its foundations to the potential benefits it may bring to the emerging field of MDE. The workshop was organized in four sessions. The first three were dedicated to the presentation of the selected papers. The last session was dedicated to discussion among the participants of the open issues and topics identified during the paper presentations.

3. WORKSHOP PAPERS
The following 12 papers were presented at the workshop:

"Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project" by Youness Lemrabet, David Clin, Michel Bigand, Jean-Pierre Bourey and Nordine Benkeltoum.
"Semantic Interoperability of Clinical Data" by Idoia Berges, Jesús Bermudez, Alfredo Goñi and Arantza Illarramendi.
"A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering" by Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler and Stefan Biffl.
"Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars" by Frank Hermann, Hartmut Ehrig, Ulrike Golas and Fernando Orejas.
"Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities" by Manuel Wimmer, Gerti Kappel, Angelika Kusel, Werner Retschitzegger, Johannes Schoenboeck and Wieland Schwinger.
"Specifying Overlaps of Heterogeneous Models for Global Consistency Checking" by Zinovy Diskin, Yingfei Xiong and Krzysztof Czarnecki.
"Anticipating Unanticipated Tool Interoperability using Role Models" by Mirko Seifert, Christian Wende and Uwe Assmann.
"Aligning Business and IT Models in Service-Oriented Architectures using BPMN and SoaML" by Brian Elvesæter, Dima Panfilenko, Sven Jacobi and Christian Hahn.
"Domain-specific Templates for Refinement Transformations" by Lucia Kapova, Thomas Goldschmidt, Jens Happe and Ralf Reussner.
"Advanced Modelling Made Simple with the Gmodel Metalanguage" by Jorn Bettin and Tony Clark.
"Model-driven Rule-based Mediation in XML Data Exchange" by Yongxin Liao, Dumitru Roman and Arne J. Berre.
"Behavioural Interoperability to Support Model-Driven Systems Integration" by Alek Radjenovic and Richard Paige.

4. ACKNOWLEDGMENTS
We would like to thank the MoDELS 2010 organization for giving us the opportunity to organize this workshop, and especially the Workshop Chairs, Juergen Dingel and Arnor Solberg. Many thanks to all those who submitted papers, and particularly to the contributing authors. Our gratitude also goes to the paper reviewers and the members of the MDI 2010 Program Committee, for their timely and accurate reviews and for their help in choosing and improving the selected papers. Finally, we would like to acknowledge the research projects TIN2008-03107 and P07-TIC-03184, which helped support this workshop.
5. REFERENCES
[1] Wegner, P. Interoperability. ACM Comput. Surv. 28, 1 (March 1996), 285-287.
[2] Heiler, S. Semantic interoperability. ACM Comput. Surv. 27, 2 (June 1995), 271-273.
[3] Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. 1990.

Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project

Youness Lemrabet (+33 3 20 33 54 60, Youness.Lemrabet@centralienslille.org), Michel Bigand (+33 3 20 67 60 25, Michel.Bigand@ec-lille.fr), David Clin (+33 6 71 15 33 55, David.clin@ec-lille.fr), Jean-Pierre Bourey (+33 3 20 33 54 08, Jean-Pierre.Bourey@ec-lille.fr) and Nordine Benkeltoum (+33 3 20 67 60 25, nordine.benkeltoum@ec-lille.fr)
Univ Lille Nord de France, F-59000 Lille, France; LM2O, Ecole Centrale de Lille, BP 48, 59651 Villeneuve d'Ascq cedex, France

ABSTRACT
Problems of interoperability inside and outside organizations have recently been the subject of a considerable number of studies. Although the Model Driven Interoperability (MDI) and Service Oriented Architecture (SOA) approaches are widely accepted among scholars as means to improve interoperability, little is known about the ins and outs of combining these approaches in practice. This article is based on an industrial project called ASICOM, which aimed at building a platform that enables interoperability among industrial partners. It offers some preliminary evidence and issues for both theory and practice.

Categories and Subject Descriptors: [Software Engineering]: Interoperability. [Simulation and Modeling]: Model Development – Modeling methodologies.
General Terms: Experimentation, Languages.
Keywords: Model Driven Interoperability (MDI), Business Process Management (BPM), Business Process Modeling Notation (BPMN), Service Oriented Architecture (SOA), ATHENA Interoperability Framework (AIF).
1. INTRODUCTION
Interoperability is defined as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" [1]. It is an important issue for Information Systems (IS) practitioners, given the growing need to integrate heterogeneous IS. Enterprises, and more widely organizations, face problems that stem from a lack of interoperability. Enterprises have to adapt their functions and processes taking into consideration internal and external constraints. Thanks to this strategy they are able to take advantage of new business opportunities and improve their competitiveness by delivering high-quality products and services while keeping production costs as low as possible [2].

Recent studies show that Model Driven Interoperability (MDI) and a Service Oriented Architecture (SOA) can be combined to support interoperability [3]. The main research question of this article is the following: how can the MDI and SOA approaches be combined in a collaborative context to improve interoperability and the strategic alignment of IS? The paper reflects on aspects of enterprise interoperability within the framework of the ASICOM project.

The ASICOM project aimed at providing Small and Medium Enterprises (SMEs) from the trade and logistics sectors with a pragmatic and generic approach for setting up simplified, interoperable and adaptable solutions that improve communication with their partners through dematerialization. More precisely, the ASICOM project focuses on customer relations (firms from the retail industry and stockists) to make administrative procedures easier (e.g., the goods clearance procedure and the payment of customs duties). Furthermore, it allows SMEs to manage their customers' bonded warehouses, in which dutiable goods are stored and manipulated without payment of duty, and to communicate with French customs administration systems such as the Delt@D and Delt@C systems using dematerialized documents.

In the ASICOM project, the Service Oriented Architecture was chosen to guide and facilitate the alignment effort between business models and IT models. SOA provides the flexibility required to integrate new SMEs into the ASICOM project. Based on our own involvement in the ASICOM project, existing interoperability frameworks and modeling practices, we identified the elements that have to be taken into consideration in an enterprise interoperability project, especially in projects that use model-driven development and service-oriented architecture as a key solution to tackle the interoperability problem.
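To make the notion of a dematerialized document concrete, the sketch below shows the general shape such an exchange could take. It is purely illustrative: the element names, the namespace and the tariff code are invented for this example and do not correspond to the actual Delt@D or Delt@C message schemas.

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- Illustrative only: not the actual Delt@ message format -->
  <customsDeclaration xmlns="http://example.org/asicom/declaration">
    <declarant>ACME Logistics</declarant>
    <!-- bonded warehouse holding the dutiable goods -->
    <bondedWarehouse id="FR-59-0042"/>
    <goodsItem>
      <description>Cotton fabric, unbleached</description>
      <tariffCode>5208.11</tariffCode>
      <quantity unit="kg">1200</quantity>
    </goodsItem>
    <procedure>goods-clearance</procedure>
  </customsDeclaration>

Exchanging such structured documents instead of paper forms is what makes the clearance and duty-payment procedures automatable in the first place.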
To facilitate interoperability and communication at both the modeling and the technical levels, we assume the use of existing modeling practices and standard notations such as Model-Driven Architecture (MDA, http://www.omg.org/cgi-bin/doc?omg/03-06-01.pdf), the Business Motivation Model (BMM, http://www.omg.org/cgi-bin/doc?formal/08-08-02.pdf), the Business Process Modeling Notation (BPMN, http://www.omg.org/spec/BPMN/2.0), the Service oriented architecture Modeling Language (SoaML, http://www.omg.org/spec/SoaML/1.0/Beta2/), the Business Process Execution Language (BPEL, http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.pdf), the Unified Modeling Language (UML, http://www.omg.org/spec/UML/2.2/), the eXtensible Markup Language (XML, http://www.w3.org/TR/2008/REC-xml-20081126/) and the Web Service Description Language (WSDL).

The remainder of this paper is divided into four parts. The second section deals with the state of the art on MDI, SOA and SoaML. The third section introduces advantages and challenges for model-driven systems, reflects on the combination of the MDI and SOA approaches to support interoperability through the ATHENA Interoperability Framework (AIF), and describes evidence and issues from the ASICOM project. The paper closes with a conclusion and directions for further research.

2. RELATED WORK

2.1 Overview of MDI
Model-Driven Development (MDD), and in particular OMG's MDA, is emerging as a standard practice for developing model-driven applications and systems. Figure 1 presents the Reference Model for MDI.

Figure 1. Reference Model for MDI.

The Model Driven Interoperability (MDI) proposal [4] explains how a model-driven approach can be a useful way to solve interoperability problems. It introduces different abstraction levels to reduce the gap between enterprise models and the code level. The level definition is based on the three levels of MDA: CIM, PIM and PSM.

A considerable number of interoperability frameworks have evolved during the last 10 years [5]. Projects like ATHENA [6] provide interoperability frameworks that explain how MDD should be applied in software engineering practice to support business interoperability. The ATHENA Interoperability Framework (AIF) describes each system by enterprise models and different aspects. It focuses on the provided and required artifacts of the collaborating systems inside or outside an enterprise. In the AIF, interoperations take place at different viewpoints: enterprise/business, process, service and information/data. At each viewpoint, a model-driven interoperability approach is prescribed.

Figure 2. AIF conceptual framework (simplified view).

Figure 2 is derived from the ATHENA Interoperability Framework. It gives a simplified view of the reference model, indicating the required and provided artifacts of two collaborating enterprises. Each enterprise is described by enterprise models and different viewpoints (business, process, service, information) at different abstraction levels. For [7], interoperations are only meaningful when all aspects of an enterprise are addressed.
2.2 Overview of SOA
The expression service-oriented architecture (SOA) refers to a way of organizing and understanding organizations, communities and systems to maximize agility and scale. It is thus also seen as an architectural approach, guideline and pattern for realizing a system through a set of provided and required services. SOA is technology independent: the choice of technologies and tools is secondary, and various technologies might be used to support an SOA implementation. According to recent research [8], "to achieve its potential, an SOA needs to be business-relevant, thus driven by the business and implemented to support the business".

2.2.1 SOA infrastructure patterns
While SOA infrastructure is far from being sufficient to make SOA work, it is a necessary component that underlies any architectural approach [9]. It is crucial to understand the merits of each infrastructure pattern before choosing a style of infrastructure (see Section 3.3.2). It is also important to note that a discussion of the targeted SOA infrastructure patterns does not map to a specific vendor or open-source application infrastructure on a one-to-one basis; many products implement hybrid infrastructure patterns. According to [9] there are four SOA infrastructure patterns.

First, the service container pattern: the service is implemented on a "container" that provides a runtime environment and coordinates service interactions by marshalling requests to and from the service. In this type of infrastructure a service can, for example, be implemented as a servlet in an application server platform (see the sketch at the end of this subsection).

Second, the hub-and-spoke pattern, in which an integration middleware platform acts as the coordination point for all interactions between services; this coordination point interacts with the services through adapters. This pattern is known as Enterprise Application Integration.

Third, the centralized messaging pattern, which leverages message-oriented middleware and messaging infrastructure to coordinate messages between services (managing the messages matters more than managing the specific runtime endpoints). Rather than connecting to service endpoints through adapters or a hub-and-spoke approach, one simply instruments the endpoints to use a particular message bus or publish/subscribe infrastructure.

Fourth, the network intermediary pattern, in which the challenge is to use a single standard for system interoperability at the seventh layer of the OSI (Open Systems Interconnection) network model. An intelligent network can be used as SOA infrastructure to mediate the interactions between services. To perform this role, the seventh layer of the OSI network model must be more specific, intelligent, and enabled with respect to services [9].
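As a concrete illustration of the first (service container) pattern, the fragment below registers a hypothetical service implementation as a servlet in a standard Java EE web container, which then receives requests and dispatches them to the service. The class name and URL pattern are invented for the example; real deployments would typically delegate to a JAX-WS runtime rather than a hand-written servlet.

  <!-- web.xml: the container hosts the service endpoint and marshals requests to it -->
  <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
      <servlet-name>ClearanceService</servlet-name>
      <!-- hypothetical servlet wrapping the service implementation -->
      <servlet-class>org.example.asicom.ClearanceServlet</servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>ClearanceService</servlet-name>
      <url-pattern>/services/clearance</url-pattern>
    </servlet-mapping>
  </web-app>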
2.2.2 Overview of SoaML
The OMG standard Service oriented architecture Modeling Language (SoaML) is aimed at taking advantage of SOA. SoaML provides a new way of designing and modeling SOA solutions using the Unified Modeling Language (UML, http://www.omg.org/spec/UML/2.3/). It is a set of extensions to UML that define SOA concepts and support service modeling and design [10]. The goal of SoaML is also to support the automatic generation of SOA-derived artifacts following an MDA approach. SoaML offers several benefits, such as: allowing service interoperability at the model level; enabling a community or organization to work together using SOA services at a higher level of abstraction; addressing service interaction concerns at the architectural level by using architecture as the bridge between business requirements and automated IT solutions; and leveraging and integrating with existing OMG standards.

3. SOLUTION AND LESSONS LEARNED: SOA TO RATIONALISE MDI
Our work has been inspired by well-known existing frameworks. Several engineering methods and frameworks deal with the design, construction, implementation and governance of information systems, and with tools for their development. These methods belong to the following areas: (i) model-driven development (MDD) frameworks, (ii) enterprise architecture (EA) methodologies and frameworks, and (iii) service-oriented development methodologies and frameworks. The best-known enterprise modelling frameworks and architectures are the Zachman Framework [11], The Open Group Architecture Framework [12], the GERAM Framework from ISO IS 15704:2000 [13], the GIM architecture [14], the CIMOSA Framework [15] and the Praxeme methodology [16]. However, these frameworks do not give a special focus to interoperability problems. Other projects (SHAPE, http://www.shape-project.eu, and BSopt, http://www.bsopt.at/) aim to support the development of enterprise systems by developing a methodology backed by SOA concepts and a model-driven engineering tool set.

The ATHENA project is based on a multidisciplinary approach that combines three research fields to support the development of enterprise interoperability [7]: (i) enterprise modeling, which defines interoperability requirements and supports solution implementation; (ii) architectures and platforms, which provide the technological base of the interoperability system; and (iii) ontology, which identifies interoperability semantics in the enterprise. We do not take ontology into consideration, since we consider it to be outside the scope of this study; we rather focus on the enterprise modeling and the architectures and platforms areas.

The idea of interoperability is multi-faceted, and it is necessary to distinguish the interoperability concepts. Using the AIF and MDI approaches, we suggest a grid to capture good practices at each level of MDI (CIM, PIM, PSM) for each aspect defined in the AIF (business, process, service and information).

Table 1. MDI approach in each aspect of the AIF.

             CIM | PIM | PSM
  Business
  Process
  Service
  Data

Table 1 aims to give a holistic perspective on interoperability that allows each partner to analyze and understand its business needs and technical requirements. This grid defines interoperability components as a set of sub-domains: the intersection of a level (column) and an aspect (row) constitutes a sub-domain. The 12 sub-domains of interoperability make it easier to define areas of expertise among partners. However, the fulfillment of all sub-domains is not a sign of excellence or maturity. A partner is fully interoperable in the sense that new business relationships can be established at low cost [17].

This section does not address the full scope of each AIF aspect, but rather gives an overview of the main issues of this project. We give examples and describe each level of interoperability based on our experience in the ASICOM project. The following subsections give an overview of the central formalisms, concepts and methods of each cell of the matrix.
3.1 Business aspect
Interoperability at this level is seen as the organizational and operational ability of an enterprise to cooperate with external organizations in spite of different working practices, legislations, cultures and commercial approaches [18]. Cooperating partners must have a compatible vision and focus on the same elements [19]. Thus each partner must start by focusing on its business goals and project objectives using business modeling practices. BMM should be used to define clear goals and objectives for each partner. An industrial network is not a stable and permanent entity: the business objectives of each partner can change, and this evolution must be taken into account.

3.1.1 CIM level
At this level, partners have to find the factors that motivate the establishment of business plans and business perspectives, through interviews involving the relevant stakeholders and through workshops.

3.1.2 PIM level
The PIM specifies the elements of business plans and stresses the description of business goals, tactics and rules. It is necessary to define the interoperability approach and then to choose the project style that will be implemented. According to ISO 14258 (1990), there are three ways to establish interoperations between related systems (not detailed here) [20]: (i) the integrated approach, (ii) the unified approach and (iii) the federated approach.

3.1.3 PSM level
At this level each partner must first choose a style of architecture to implement (e.g., SOA), and then understand the various styles of projects to build. The Gartner Group identifies three styles of projects (not detailed here) [21][22]:

- Execution of a new and complete SOA approach: the primary objective of these projects is the design, creation and execution of new SOA artifacts.
- Composite applications and business process support: the primary objective of these projects is the assembly and deployment of composite applications and processes. Orchestration of services in support of an application process is important. The focus of these projects is on combining existing functionality rather than creating new business functionality.
- Application integration: the primary objective of these projects is the integration of the data and business logic of applications.

3.2 Process aspect
Business process models capture what has to be done in the business to achieve the business goals and vision [8]. The business analyst starts by distinguishing business processes from goals and models. The OMG specification BPMN can be used to capture the business processes that are shared between the stakeholders. BPMN is very expressive and provides a notation that is intuitive to business users. In this methodology, business processes are designed to cover many types of modeling and can be used at different levels of detail (CIM and PIM).

3.2.1 CIM level
BPMN choreography diagrams, which focus on the exchange of information between participants, can be used at this level to create the initial drafts of the processes. Nevertheless, these models must be further refined and related to other kinds of BPMN models. BPMN 2.0 specifies that implementations are not expected to directly support choreography modeling elements.

3.2.2 PIM level
The BPMN choreography processes can be refined using BPMN collaboration diagrams, which describe the collaboration between participants in detail (a minimal XML sketch of such a collaboration is shown below). First, the business analyst has to identify two types of business processes: (i) public business processes, which are involved in the interaction with the partners, and (ii) private business processes, under the ownership and control of each participant. Then the analyst has to identify the parts of the process to computerize.
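A BPMN 2.0 collaboration of this kind has a standard XML serialization, so the public part of a process can itself be exchanged between partners as a model. The fragment below is a minimal hand-written sketch with invented participant, task and flow names; an actual ASICOM collaboration model would be considerably larger.

  <definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
               targetNamespace="http://example.org/asicom/bpmn">
    <collaboration id="clearanceCollaboration">
      <!-- the public processes of the two partners -->
      <participant id="sme" name="SME" processRef="smeProcess"/>
      <participant id="customs" name="Customs"/>
      <messageFlow id="mf1" sourceRef="sendDeclaration" targetRef="customs"/>
    </collaboration>
    <process id="smeProcess" name="Submit declaration">
      <startEvent id="start"/>
      <sendTask id="sendDeclaration" name="Send customs declaration"/>
      <endEvent id="end"/>
      <sequenceFlow id="f1" sourceRef="start" targetRef="sendDeclaration"/>
      <sequenceFlow id="f2" sourceRef="sendDeclaration" targetRef="end"/>
    </process>
  </definitions>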
In the ASICOM project, none of the partners imposes its models, languages and methods of work: (i) the partners do not use a common format for all models (not integrated) and (ii) there is no common meta-model between partners (not unified). The chosen way to tackle the interoperability issue is therefore the federated interoperability approach. The BPMN models are then mapped to more technical models at the PSM level using the Business Process Execution Language (BPEL).

3.2.3 PSM level
The BPMN specification explicitly suggests that BPEL be used for the execution of business processes. After the description of the orchestration processes in BPMN, they can be formalized and refined with implementation details using BPEL, which describes how the partners collaborate (a minimal skeleton is sketched below).
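As an illustration, the skeleton below shows the kind of executable BPEL process such a mapping might produce: a mediator receives a request from one partner, invokes the other partner's service, and replies. The partner link types, operations and variables are invented for the example, and the corresponding WSDL definitions are assumed to exist.

  <process name="ClearanceOrchestration"
           targetNamespace="http://example.org/asicom/bpel"
           xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
           xmlns:tns="http://example.org/asicom/bpel">
    <partnerLinks>
      <!-- the requesting SME and the customs-facing service -->
      <partnerLink name="client" partnerLinkType="tns:clientPLT" myRole="orchestrator"/>
      <partnerLink name="customs" partnerLinkType="tns:customsPLT" partnerRole="customsService"/>
    </partnerLinks>
    <variables>
      <variable name="declaration" messageType="tns:declarationMsg"/>
      <variable name="clearance" messageType="tns:clearanceMsg"/>
    </variables>
    <sequence>
      <receive partnerLink="client" operation="submitDeclaration"
               variable="declaration" createInstance="yes"/>
      <invoke partnerLink="customs" operation="clearGoods"
              inputVariable="declaration" outputVariable="clearance"/>
      <reply partnerLink="client" operation="submitDeclaration" variable="clearance"/>
    </sequence>
  </process>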
3.3 Service aspect
The main concern at this aspect is to identify SOA services that can be used to enable business agility through business process reuse. This viewpoint bridges the gap between business requirements and a service-based solution. According to [23], a BPMN model does not contain all the information needed to implement an SOA. Consequently, service modeling can be supported by the SoaML formalism, which can be used to model services at the CIM level and then subsequently refine them towards a platform-specific implementation. SOA has been associated with a variety of approaches, such as Service-Oriented Analysis and Design (SOAD) [24], Service-Oriented Modeling and Architecture (SOMA) [25] and Praxeme [16], and with technologies such as the Enterprise Service Bus (ESB). These different approaches are intended to identify SOA services. The service-centric approach suggested by [5] argues that a goal-driven identification of services allows a better strategic alignment. In this approach, BMM and SoaML are used to describe the realization of interoperability through business services. It proposes to map BMM to business services instead of business processes, in order to reduce the complexity introduced by inter-organizational business processes.

3.3.1 CIM level
To work together, the participants must agree on a formalism for describing services at a high level of abstraction. SoaML can be used to model services at both the CIM and PIM levels. For more details, the MDSE methodology [26] and IBM [8] provide guidelines on how to use SoaML to define and specify a service-oriented architecture. SoaML concepts such as Capability, Participant, ServiceArchitecture and ServiceContract can be used at this level. These concepts give a top-level view and describe the communication between the different participants. They are used to express the business operations supported by the service-oriented architecture.

3.3.2 PIM level
Even if implementing SOA should not depend on an SOA platform strategy, enterprises have to define an SOA platform target and SOA infrastructure patterns (see Section 2.2.1). The partners do not have to choose a specific product at this level, but the discussion about the target SOA infrastructure patterns is very important: it is imperative for each partner to understand the advantages and drawbacks of each SOA infrastructure pattern. The definition of the SOA patterns must be strongly motivated by the interoperability approach chosen at the PIM level of the business aspect (see Section 3.1.2). At this level, SoaML models should be used to support IT concerns; the most used SoaML concepts are ServiceInterface and MessageType.

In the ASICOM project, a Mediation Information System (MIS) was chosen to support the mediation interoperability approach. The MIS is in charge of (i) information exchange, (ii) service sharing and (iii) behavior orchestration [27]. At a minimum, the MIS must implement the centralized messaging infrastructure pattern. In the ASICOM project a MIS seems to be a pertinent way of supporting interoperability for three reasons. Firstly, the members of the ASICOM project need to communicate through their own channels. Secondly, their systems are not adapted to exchanging information with each other. Thirdly, the collaborative processes have to be able to respond to changes in the customs regulations through a single shared business solution.

3.3.3 PSM level
Many products propose different infrastructure patterns or hybrid SOA infrastructural approaches. At this level, it is important to answer the question: what type of SOA solution should be implemented? The architect must care about two points: the targeted technologies for implementing the SOA services, and the application infrastructure to support the SOA solutions. (Gartner has defined a new category of software called "application infrastructure": "Application infrastructure includes the majority of runtime middleware, as well as application development and management tools that support the new generation of applications, based on service-oriented architecture (SOA), event-driven architecture (EDA) and business process management (BPM)" [21].) The architect must specify the implementation artifacts of the service-oriented architecture in the chosen technology, e.g., Web Services, Java Enterprise Edition (JEE), .NET or multi-agent systems (MAS). The architect also has to choose an adapted application infrastructure for implementing the SOA solutions; the diversity and heterogeneity of the application solutions, the business processes and the business context of each partner must be considered [7].

In the ASICOM project, two target infrastructures were tested to support SOA solutions: the open-source Petals ESB from the Petals SOA Suite (http://www.petalslink.com/en) and the Business Process Management solution BizAgi (http://www.bizagi.com/). The BizAgi BPM suite is very intuitive but suffers from technological limitations (it does not support the execution of BPEL, and it provides access to existing applications through Web services only). Consequently, we chose the standards-based integration platform Petals ESB as the application infrastructure. The implementation of each interoperability approach can be supported by one or many SOA infrastructure patterns (see Section 2.2.1). In the ASICOM project we chose Web Services and Java Enterprise Edition (JEE) technologies to build the SOA services. Thus WSDL and the XML Schema Definition Language (XSD, http://www.w3.org/TR/xmlschema-2/) are used to support syntactic interoperation.
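The fragment below sketches the shape of such a contract: a WSDL port type whose message is typed by an embedded XSD element. All names are invented for the example and do not come from the ASICOM deliverables.

  <definitions name="WarehouseService"
               targetNamespace="http://example.org/asicom/warehouse"
               xmlns="http://schemas.xmlsoap.org/wsdl/"
               xmlns:tns="http://example.org/asicom/warehouse"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <types>
      <xsd:schema targetNamespace="http://example.org/asicom/warehouse">
        <!-- payload: a stock movement in a bonded warehouse -->
        <xsd:element name="StockMovement">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="itemRef" type="xsd:string"/>
              <xsd:element name="quantity" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:schema>
    </types>
    <message name="recordMovementRequest">
      <part name="body" element="tns:StockMovement"/>
    </message>
    <portType name="WarehousePortType">
      <operation name="recordMovement">
        <input message="tns:recordMovementRequest"/>
      </operation>
    </portType>
  </definitions>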
3.4 Data aspect
The data models have to be studied in parallel with the process and service models. A traditional task in service and process modeling is to create and manipulate information. As said in Section 3.3, the data structure deficit is evident in BPMN: the concept of message flow is not supported by data models [25]. Data and information models are out of the scope of BPMN, and UML class diagrams can be used to describe messages. In this section we present a very simple example from the data model of the ASICOM warehouse management module, and we show how it can be refined to generate a physical model that represents the relational database concepts. The warehouse management module provides functionality to manage multiple, structured stock locations.

3.4.1 CIM level
UML has become widely used in object-oriented system modeling, for platforms such as J2EE and .NET. A first version of the conceptual data model can be produced at this level (Figure 3). This model makes it possible to identify the different entities and how they relate to one another.

Figure 3. Data model at the CIM level.

3.4.2 PIM level
The conceptual data model is refined at the PIM level: we add details to the logical model without worrying about how they will be implemented. For example, data types can be added to the diagram at this level (Figure 4).

Figure 4. Data model at the PIM level.
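Figures 3 and 4 themselves are not reproduced here. As a stand-in, the fragment below sketches what such a PIM-level refinement could look like when the logical model is rendered as an XML schema, with a data type now attached to each attribute. The entity and attribute names are guesses suggested by the warehouse-management context, not the project's actual model.

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              xmlns:tns="http://example.org/asicom/warehouse-pim"
              targetNamespace="http://example.org/asicom/warehouse-pim">
    <!-- PIM level: typed attributes, but still no platform (database) concerns -->
    <xsd:complexType name="StockLocation">
      <xsd:sequence>
        <xsd:element name="code" type="xsd:string"/>
        <xsd:element name="capacity" type="xsd:decimal"/>
        <!-- one location stores many items -->
        <xsd:element name="item" type="tns:Item" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="Item">
      <xsd:sequence>
        <xsd:element name="reference" type="xsd:string"/>
        <xsd:element name="quantity" type="xsd:int"/>
        <xsd:element name="dutiable" type="xsd:boolean"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>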
3.4.3 PSM level
The Unified Modeling Language has become a standard object-oriented system modeling language and is supported by major corporations; thus it can also be used for object-relational database modeling. There are many techniques for transforming UML models to object-relational database systems, as discussed in [28]. Those techniques focus on transformations and are suited to be used with the Model Driven Development (MDD) approach. At this level, details are added to the PIM models (UML class diagrams) to adapt them to a specific platform (e.g., a relational database). Figure 5 shows a very simple example from the database model diagram of the ASICOM project. This diagram includes concepts such as tables, columns, views and foreign keys.

Figure 5. Physical model at the PSM level.

4. CONCLUSION
In this paper we have introduced a new practical vision of interoperability based on Model Driven Architecture and the ATHENA Interoperability Framework. Our research goal is not to propose yet another approach, but to combine existing ones to resolve the interoperability problem. This initial work shows that a model-driven approach and a service-oriented architecture enhance interoperability. However, a number of challenges must be overcome.

The ASICOM project is based on a process-centered approach which associates methodologies, information technologies and governance. It aims to allow people from different backgrounds to collaborate on an interoperability project. Our next goal is to refine and investigate in further detail the relations between the aspects. We believe that being able to model service orchestration with BPMN and BPEL, and service details with SoaML, in order to generate SOA artifacts, is an important step towards solving the interoperability problem. We will continue to work on service modeling and transformation, in particular using the Software and Systems Process Engineering Meta-Model (SPEM) to define the development process of an interoperability project using a service-oriented architecture.

5. ACKNOWLEDGMENTS
This work was partially funded by the ASICOM project. This project, started in April 2008, was approved by two French poles of competitiveness: PICOM in the trade industries domain and Nov@log in the logistics domain.
6. REFERENCES
[1] IEEE. 1990. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries.
[2] Jean-Pierre Lorre, Yiannis Verginadis, Nikos Papageorgiou and Nicolas Salatge. 2010. Ad-hoc Execution of Collaboration Patterns using Dynamic Orchestration. Enterprise Interoperability IV, Part I, 3-12. DOI: 10.1007/978-1-84996-257-5_1.
[3] ATHENA. Model-Driven Interoperability (MDI) Framework. http://www.modelbased.net/mdi/framework.html
[4] Jean-Pierre Bourey, Reyes Grangel, Guy Doumeingts and Arne J. Berre. Report on Model Driven Interoperability. Technical Report, INTEROP, 2007. http://interop-vlab.eu/ei_public_deliverables/interop-noe-deliverables
[5] Fenglin Han, Espen Moller and Arne J. Berre. 2009. Organizational interoperability supported through goal alignment with BMM and service collaboration with SoaML. Interoperability for Enterprise Software and Applications (IESA '09), China (21-22 April 2009), 268-274.
[6] ATHENA. Advanced Technologies for Interoperability of Heterogeneous Enterprise Networks and their Applications, FP6-2002-IST-1, Integrated Project (April 2003).
[7] Arne-Jørgen Berre, Brian Elvesæter, Nicolas Figay, Claudia Guglielmina, Svein G. Johnsen, Dag Karlsen, Thomas Knothe and Sonia Lippe. 2007. The ATHENA Interoperability Framework. Enterprise Interoperability II, Part VI, 569-580. DOI: 10.1007/978-1-84628-858-6_62.
[8] Jim Amsden. Modeling with SoaML, the Service-Oriented Architecture Modeling Language (January 2010). http://www.ibm.com/developerworks/rational/library/09/modelingwithsoaml-1/index.html
[9] Ronald Schmelzer. 2007. SOA Infrastructure Patterns and the Intermediary Approach (July 2007). http://www.zapthink.com/2007/07/04/soa-infrastructure-patterns-and-the-intermediary-approach/
[10] Michael Stollberg. 2009. Integrated and tool-supported Methodology. Deliverable D2.2 – Initial Version – Work Package 2, SHAPE Project No 216408 (January 2009).
[11] Zachman. A Framework for Information Systems Architecture. IBM Systems Journal, vol. 31, no. 3, pp. 445-470, 1999.
[12] The Open Group Architecture Framework. 2009. TOGAF version 9. http://www.opengroup.org/
[13] IFIP-IFAC Task Force. 1999. GERAM: Generalized Enterprise Reference Architecture and Methodology, Version 1.6.2, Annex to ISO WD15704.
[14] Doumeingts G., Vallespir B., Zanettin M. and Chen D. 1992. GIM: GRAI Integrated Methodology. A methodology for designing CIM systems. GRAI/LAP, Université Bordeaux 1, version 1.0.
[15] AMICE. 1993. CIMOSA: Open System Architecture for CIM. 2nd extended revised version. Springer-Verlag, Berlin.
[16] Praxeme Institute. Version 2.0 (June 2006). http://www.praxeme
[17] Roland Jochem. 2010. Enterprise Interoperability Assessment. 8th International Conference of Modeling and Simulation, MOSIM'10, Hammamet, Tunisia (2010).
[18] ATHENA. 2005. D.A1.3.1: Report on Methodology description and guidelines definition, Version 1.0, ATHENA Integrated Project, Deliverable D.A1.3.1 (March 2005).
[19] Arne-Jørgen Berre and Brian Elvesæter. 2008. Model-based System Development, Part IV: MDI – Model Driven Interoperability. Notes for course material "Model Based System Development", INF5120 (2008).
[20] Chen, D., Dassisti, M. and Tsalgatidou, A. 2005. Interoperability Knowledge Corpus, An Intermediate Report, Deliverable DI.1, Workpackage DI (Domain of Interoperability), INTEROP NoE (November 2005).
[21] Hayward, Simon and Natis, Yefim V. 2006. 'Application Infrastructure' Reflects New Dynamics in the Software Market. Gartner (December 2006).
[22] Johan den Haan. 2008. Architecture requirements for Service-Oriented Business Applications (May 2008). http://www.theenterprisearchitect.eu/archive/2008/05/19/architecture-requirements-for-service-oriented-business-applications
[23] Jihed Touzi, Frédérick Bénaben, Hervé Pingaud and Jean-Pierre Lorré. 2009. A model-driven approach for collaborative service-oriented architecture design. International Journal of Production Economics, Volume 121, Issue 1, pages 5-20. Elsevier (September 2009).
[24] O. Zimmermann, P. Krogdahl and C. Gee. Elements of Service-Oriented Analysis and Design: An interdisciplinary modeling approach for SOA projects. IBM, 2 June 2004. http://www-128.ibm.com/developerworks/webservices/library/ws-soad1/
[25] Arsanjani, A. Service-oriented modeling and architecture: how to identify, specify, and realize services for your SOA. IBM whitepaper, 2004.
[26] Brian Elvesæter, Cyril Carrez, Parastoo Mohagheghi, Arne-Jørgen Berre and Svein G. [Johnsen]. Model-Based Development with SoaML. 2010. http://www.uio.no/studier/emner/matnat/ifi/INF5120/v10/undervisningsmateriale/MDSE-SoaML-INF5120.pdf
[27] Frédérick Bénaben, Jihed Touzi, Vatcharaphum Rajsiri, Sebastien Truptil, Jean-Pierre Lorré and Hervé Pingaud. 2008. Mediation Information System Design in a Collaborative SOA Context through a MDD Approach (June 2008).
[28] E.S. Grant, R. Chennamaneni and H. Reza. Towards analyzing UML class diagram models to object-relational database systems transformations. Proceedings of the 24th IASTED International Conference on Database and Applications, Innsbruck, Austria: ACTA Press, 2006, pp. 129-134.

Semantic Interoperability of Clinical Data

Idoia Berges (idoia.berges@ehu.es), Jesus Bermudez (jesus.bermudez@ehu.es), Alfredo Goñi (alfredo@ehu.es) and Arantza Illarramendi (a.illarramendi@ehu.es)
University of the Basque Country, P. Manuel de Lardizabal 1, Donostia-San Sebastian, Spain

ABSTRACT
The use of Electronic Health Records (EHRs) has brought multiple benefits to the healthcare domain. However, those advantages would be greater if seamless interoperability of EHRs between heterogeneous Health Information Systems were achieved. Nowadays, achieving that kind of interoperability is on the agenda of many national and regional initiatives, and in the majority of cases the problem is addressed through the use of different standards. In this paper we present a proposal that goes one step further and tackles the interoperability problem from a formal, ontology-driven perspective. Our proposal thus allows one system to interpret on the fly clinical data sent by another one, even when they use different representations. We present in the paper the three key components of the proposal: 1. An ontology that provides –in its upper level– a canonical representation of EHR statements, more precisely of medical observations, which can then be specialized –in the lower level– by health institutions according to their proprietary models.
2. A translator module that facilitates the definition of the lower level of the ontology from the particular EHR data storage structures, following a semi-automatic approach: first, a translation process from the underlying data structures into ontology elements described in OWL2, using –whenever possible– information about properties (functional dependencies, etc.); and next, an edition process where the health system administrators can define new axioms to adjust and enrich the result obtained in the semi-automatic process. 3. A mapping module that helps in the task of defining the links among the terms of the upper and lower levels of the ontology. It obtains a declarative mapping specified in OWL2 and puts a wide range of mapping scenarios within reach of health systems' administrators.

Categories and Subject Descriptors: D.2.12 [Software Engineering]: Interoperability.
General Terms: Design.

1. INTRODUCTION
Information Technologies are undoubtedly playing a relevant role in research on, and improvement of, the healthcare domain. In the case of Electronic Health Records, several advantages can be mentioned. First, legibility problems due to poor handwriting –which might lead to misunderstandings– are avoided. Moreover, EHRs hold great clinical decision support potential, by translating practice guidelines into automated reminders and actionable recommendations [10], which can lead to safer, less error-prone, less expensive and higher-quality care. Finally, another advantage is the possibility of exchanging EHRs among different organizations. A patient is likely to receive medical attention from several institutions over his lifetime, so it seems reasonable for each institution to have unrestricted access at any time to the previously recorded patient data. The authors in [4] have identified certain problems that can be avoided thanks to an effective exchange of EHRs: communicating vital information such as adverse drug reaction histories can prevent deaths and other serious consequences; providing clinicians with easy access to patients' previous test results eliminates unnecessary duplication of tests; and monitoring chronically ill patients –which usually requires great costs and collaboration between many professionals at distinct points of care– becomes easier.

As beneficial as EHR interoperability may seem, it is nowadays still an unreached goal (see the Epsos project in the European Community [6]), mainly because the Health Information Systems used within medical institutions have been developed independently, which results in a high number of heterogeneous proprietary models for representing and recording EHR information. One of the most recurring approaches to solving interoperability issues is the use of standards. In the case of EHR interoperability, several standards are under development for this purpose, such as openEHR [16], CEN-13606 [5] and HL7-CDA [9]. The openEHR standard follows a dual-model approach for representing EHRs: the Reference Information Model (RIM) contains basic and generic structures for representing EHR information; terms such as list, table or entry are described at this level.
It is a stable model which is not expected to change over time. However, since the RIM is composed of a small number of classes that are too general to describe the semantics that clinical terms require, another model is necessary: the archetype model. The archetype model describes knowledge elements, such as Heart Rate or Barthel Index, that are created by using and restricting components of the RIM. The CEN standard also follows the aforementioned dual-model approach and provides by now a quite simple RIM and a few archetypes based on those of openEHR. Finally, HL7-CDA has been developed by HL7 and also follows the layered-model approach; more precisely, it provides a RIM and a draft template specification, where a template represents the same idea as the openEHR archetypes.

Although the idea of using a standard may seem suitable for the desired goal, the interoperability problem remains unsolved unless these standards merge into a single one. Moreover, in [11] three different levels of interoperability that can be considered for EHRs are described: level 1 refers to syntactic interoperability, level 2 to partial semantic interoperability and level 3 to full semantic interoperability. The authors also state that the research effort should nowadays be oriented to the development of mechanisms for achieving full semantic interoperability, in which case neither language nor technological differences will prevent Health Information Systems from seamlessly integrating the received EHRs into the local model. In general, semantic interoperability is defined as the ability of one computer system to receive some information and interpret it in the same sense as intended by the sender system, without prior agreement on the nature of the exchanged data.

In this paper we present a proposal to move towards the notion of full semantic interoperability of EHRs of medical observations, based on semantic web technologies, and more precisely on OWL2 [17] ontologies and the corresponding reasoners. These technologies facilitate semantic interoperation between heterogeneous information systems ([15]; [2]), as opposed to other formats for interchanging data –such as XML– which do not deal with the semantics of the exchanged data [7]. Two general approaches for interoperability among systems are described in [12]: using a canonical model to which the particular systems are linked, or aligning the particular models two by two. The proposal presented in this paper follows the former approach and additionally makes the following novel contributions:

• The development of the EHROnt ontology, which represents at different levels the definitions of clinical terms that appear in EHRs. At the Canonical level, it contains ontological definitions of EHR statements (in particular of medical observations), and at the Application level it contains the specializations of the definitions of the Canonical level, according to the standards mentioned previously or to proprietary models of health institutions (this favors extensibility to different models).

• The management of a reasoning mechanism that, using axioms stated in the ontology, infers knowledge that allows the discovery of more relationships among the different models used by the different Health Information Systems (this decreases the need for human intervention).
• The provision of one module that facilitates the task of obtaining the definitions of the lower level of EHROnt from the particular EHRs' data storage structures, and another module that facilitates the task of linking definitions of the lower level to definitions of the upper level (this facilitates the seamless adaptation of existing Health Information Systems). In the area of EHR interoperability a certain number of related works can be found at present. Among those works closest to our proposal we can mention the following: the authors of [13] provide a solution to achieve interoperability between systems that have been developed under the HL7 RIM. However, this proposal requires that the source system has some prior knowledge about the target system and, moreover, it does not tackle the communication between systems that use proprietary EHR specifications. In [3] ontology mappings are proposed between pairs of archetype-based models. Moreover, in [14] a software architecture that transforms one openEHR archetype into a CEN-13606 archetype is presented; ontologies that describe the archetype models of both standards, in addition to an integrated ontology, are used in the process. Notice that in those works, the features of extensibility and the lower degree of user intervention provided by our framework are not supported. In summary, in this paper we show a proposal that allows one system to interpret on the fly clinical statements sent by another one –even when they use proprietary formats. We support our claim on the following techniques: • Logic-based descriptions: Representations of clinical statements considered by particular Health Information Systems, described using standards as well as proprietary models, are expressed in our approach by using OWL2 ontology axioms. Moreover, terms in those axioms are related with canonical ontology terms that focus their descriptions on language- and technology-independent aspects. This approach increases the opportunities of solving the interoperability issue since it relies mainly on semantic aspects. • Automated reasoning: All ontology descriptions, as well as the mappings among elements of the ontology, are expressed in the same formalism, OWL2. This uniform representation allows the use of well-known reasoners in order to derive new axioms from the existing ones. Furthermore, the formalism mismatch problem is avoided and automatic integration is facilitated. • The use of formal ontologies as the canonical conceptual model, which allows us to focus on aspects that are independent of the languages or technologies used to describe EHRs (this favors the notion of semantic interoperability).
• Transfer mechanism: A process, guided by the previous two items, is implemented to transform a particular clinical statement from a health institution into a corresponding clinical statement for another health institution.

In the rest of the paper we present, first, the main features of the EHROnt ontology developed for representing different kinds of medical observations. Then, the main characteristics of the translator and mapping modules are presented in sections 3 and 4, respectively. We finish with some conclusions.

2. CANONICAL REPRESENTATION OF MEDICAL OBSERVATIONS
In general, an EHR includes clinical statements such as observations, laboratory tests, diagnostic imaging reports, treatments, therapies, administered drugs and allergies. The different standards mentioned in the previous section reflect those kinds of statements in one way or another. Formally, a clinical statement is an expression of a discrete item of clinically related information that is recorded because of its relevance to the care of a patient [8]. In this paper we focus on the exchangeability of medical observation statements, which are used to record all notionally objective observations of phenomena and patient-reported phenomena, such as physical examinations, laboratory results or basic information about the patient (weight, sex, ...). We advocate representing those observations in one ontology called EHROnt. That ontology is made up of two layers (Canonical layer and Application layer) that collect observation statements at different levels of abstraction. This division into layers allows a clearer visualization of the ontology, but it does not imply a technical division of it. The elements of the Canonical layer should be designed by experts in the medical field and they should be considered as a framework agreement. Moreover, each element of the Canonical layer may be associated to its corresponding SNOMED code [19]. The elements of the Application layer describe the medical observations as they are understood in the specific e-health systems. While the Canonical layer will be the same in all versions of EHROnt, the Application layer will be proper to each system. Thus, each health institution will be responsible for creating this layer and relating it to the Canonical layer, using the tools that we have developed to help in this process, which will be described in sections 3 and 4, respectively. The representation of the statements described in the EHR standards also belongs to the Application layer. In the EHROnt ontology, the elements that compose EHRs are described as classes and properties using the OWL2 language. Moreover, in the Canonical layer we propose a subdivision of medical observations into two groups depending on their complexity: simple observations and composite observations. Simple observations have a single value and unit of measurement. Additionally, we have also identified three properties that may be relevant when characterizing an observation: the protocol, which records information about how the observation process was carried out, either by indicating a particular clinical protocol (e.g. the Balke protocol for treadmill graded exercise testing) or the medical device used for taking the measurement (e.g. a stethoscope); the anatomical site, to indicate the specific body location in which the observation was taken; and the state of the patient, which is intended to record the state of the subject of the observation during the observation process. On the other hand, composite observations are composed of two or more observations, either simple or composite. They are intended to represent observations of phenomena such as the Glasgow Coma Scale (GCS) value –which is calculated as the sum of the values obtained from three simple observations: the Eye Response (EyeR), the Motor Response (MotorR) and the Verbal Response (VerbalR)– or the more complex Revised Trauma Score (RTS), a physiological scoring system for predicting death that takes into account three measures: the aforementioned Glasgow Coma Scale value, the Systolic Blood Pressure (SysBP) and the Respiration Rate (RespRate). Below, we present some OWL2 axioms that represent classes of medical observations.
Observation ≡ Simple_Obs ⊔ Comp_Obs
Simple_Obs ≡ =0 comp
Simple_Obs ⊑ =1 value ⊓ ≤1 unit ⊓ ≤1 protocol.Protocol ⊓ ∀state.State ⊓ =1 site.AnatomicalSite
Comp_Obs ≡ ≥2 comp.Observation
RTS ≡ Comp_Obs ⊓ ∃comp.GCS ⊓ ∃comp.SysBP ⊓ ∃comp.RespRate
GCS ≡ Comp_Obs ⊓ ∃comp.EyeR ⊓ ∃comp.VerbalR ⊓ ∃comp.MotorR
EyeR ⊑ Simple_Obs    VerbalR ⊑ Simple_Obs    MotorR ⊑ Simple_Obs
SysBP ⊑ Simple_Obs    RespRate ⊑ Simple_Obs

Additional axioms may exist that associate classes of medical observations to SNOMED codes:

RTS ≡ owl:hasValue snomed.{'273885003'} (1)
EyeR ≡ owl:hasValue snomed.{'281395000'} (2)

In addition to the EHROnt ontology, our framework also uses three auxiliary domain ontologies. As pointed out previously, there are three relevant properties that often characterize observations: the protocol, the anatomical site and the state of the patient. As a result, one Protocol ontology is necessary to represent this information in a controlled way. We advocate using an ontology that comprises classes from the Device and Procedure categories of SNOMED-CT. Moreover, in order to represent anatomical information, the Foundational Model of Anatomy ontology [18] is suggested. Finally, one ontology has been developed for describing information about the state of the patient, such as the level of exertion (low, medium, high intensity) or the position of the patient (standing, sitting, ...). It is up to the particular systems whether to use these same auxiliary ontologies or to choose other ones. In the latter case, mappings with the proposed auxiliary ontologies should be created. Finally, our ontology-driven approach presents some similarities with the Knowledge Discovery Metamodel (KDM) notion used in Architecture Driven Modernization (ADM) [1]; in our case, knowledge is obtained from existing data sources.

3. TRANSLATOR MODULE
Each health institution has its own information system and in the majority of cases it deals with a proprietary EHR representation. However, the interoperability opportunities increase if an ontological representation of the proprietary representations is obtained, because the shared logic-based representation allows formal inference of implicit knowledge. For that reason we have developed a translator module that is in charge of building the Application layer of the EHROnt ontology for each proprietary information system. In many cases this module will receive as input a relational database schema, but in other cases it may receive schemata for semi-structured data sources or plain files. The output of the translator module is a description mapping D = ⟨S, O, M⟩ that consists of a source schema S, a set of OWL2 axioms O that comprises the Application layer corresponding to the source S, and a valid mapping M. The set of ontology axioms O is the semantic description of source S, and the third component M is a set of correspondences of the form ⟨C, CS⟩, ⟨P, PS⟩, where C and P are class and property names appearing in O, and CS and PS are sentences, expressed in an appropriate language for the source schema S, that define sets of ground values. We can consider a universal domain of interpretation Δ and then an extension function ε that associates a set CS^ε ⊆ Δ to every CS sentence, and a correspondence PS^ε ⊆ Δ × Δ to every PS sentence. The universal domain Δ represents the real-world objects of an actual extension of the considered source S.
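To make the structure of a description mapping concrete, the following is a minimal sketch in Python; the class name, the SQL correspondence sentences and the field layout are hypothetical illustrations, not part of the framework described here.

from dataclasses import dataclass, field

@dataclass
class DescriptionMapping:
    """A description mapping D = <S, O, M> (hypothetical encoding).

    S: identifier of the source schema; O: OWL2 axioms (as strings);
    M: correspondences from class/property names in O to sentences
    (here SQL queries) that define sets of ground values over S.
    """
    source_schema: str
    axioms: list[str] = field(default_factory=list)
    class_corr: dict[str, str] = field(default_factory=dict)   # C -> CS
    prop_corr: dict[str, str] = field(default_factory=dict)    # P -> PS

# Example instance for the RTS registration schema used below.
d = DescriptionMapping(
    source_schema="rts_db",
    axioms=["sa:RTS SubClassOf sa:Observation"],
    class_corr={"sa:RTS": "SELECT code FROM RTS_Table"},
    prop_corr={"sa:hasGCS": "SELECT code, GCS FROM RTS_Table"},
)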
Given some basic correspondences of the form ⟨C, CS⟩, ⟨P, PS⟩ (let us write M(C) = CS, M(P) = PS), it is straightforward to define compositionally the correspondences for class expressions Cexp and property expressions Pexp (let us write M(Cexp) and M(Pexp)), following the same technique as interpretation definitions in description logics. Then, we say that a set of correspondences M satisfies an OWL2 axiom C ⊑ Cexp if M(C)^ε ⊆ M(Cexp)^ε, and analogously for P ⊑ Pexp. Notice that any equivalence axiom (using ≡) can be expressed as a pair of subsumption axioms (using ⊑ and ⊒). We say that M is a valid mapping if its correspondences satisfy the axioms in O for any possible extension of the source schema S. The translation process is divided into two main steps: a semi-automatic one and an edition one.

Semi-automatic process. We present this step for the case of having a relational database schema as input; in fact this case is the most complete from the translation perspective. First of all, relations of the relational schema are translated into OWL2 classes, and attributes into properties that have as domain the class related to the relation in which they are defined and as range the type of the attribute. Moreover, integrity constraints are translated into descriptions associated with the properties. Once the previous task is accomplished, the next one involves enriching the obtained descriptions by using information about dependencies (inclusion, exclusion and functional dependencies), null values and semantic properties (that correspond to domain information for attribute values). This type of information is provided most of the time by the health systems' administrators, because it is rarely available in the database system. Health systems' administrators are supposed to be technically prepared people who have a deep knowledge of the source information system. All the previous types of properties are applied in the following sequence: first inclusion properties; then, when the input relational schema is not in second or third normal form, functional dependencies are used to create new classes; next exclusion dependencies are exploited; and last, integrity constraints and domain information for attribute values are considered. For example, a particular registration for Revised Trauma Score values may consist of two relational tables according to the following schema:

RTS-Table(code, RR, SBP, GCS, total)
GCS-Table(code, ER, MR, VR)
RTS-Table.GCS ⊆ GCS-Table(code)

Then, some axioms obtained using the mentioned inclusion property, for the Application layer of that information system, are the following:

sa:RTS ≡ ∃sa:hasRR.sa:RR ⊓ ∃sa:hasSBP.sa:SBP ⊓ ∃sa:hasGCS.sa:GCS ⊓ ∃sa:hasTotal.float
sa:GCS ≡ ∃sa:hasER.sa:ER ⊓ ∃sa:hasMR.sa:MR ⊓ ∃sa:hasVR.sa:VR

Edition process. The goal of this step is to permit the health system administrator to create the Application layer of the ontology in a flexible way. The administrator can choose to start from scratch or from the ontological definitions obtained using the semi-automatic module. In either case the health system administrator can add new axioms to obtain the desired result. For example, the edition process can be used to assign SNOMED codes to the classes created by the semi-automatic process:

sa:RTS ≡ owl:hasValue sa:hasSnomed.{'273885003'} (3)
sa:ER ≡ owl:hasValue sa:hasSnomed.{'281395000'} (4)
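To make the schema-driven part of the semi-automatic step concrete, the following minimal sketch derives class axioms of the above shape from table definitions; the encoding and the helper function are hypothetical simplifications (the real module also exploits dependencies, null values and domain information, and the edition step then refines the output).

# Hypothetical sketch: derive OWL2 class axioms (in the textual notation
# used above) from relational tables, as in the RTS example.

def table_to_axiom(table: str, columns: list[str], prefix: str = "sa") -> str:
    """Translate a relation into an OWL2 class defined by one
    existential restriction per attribute (simplified)."""
    parts = [f"∃{prefix}:has{c}.{prefix}:{c}" for c in columns]
    return f"{prefix}:{table} ≡ " + " ⊓ ".join(parts)

schema = {
    "RTS": ["RR", "SBP", "GCS", "Total"],
    "GCS": ["ER", "MR", "VR"],
}

for table, cols in schema.items():
    print(table_to_axiom(table, cols))
# sa:RTS ≡ ∃sa:hasRR.sa:RR ⊓ ∃sa:hasSBP.sa:SBP ⊓ ∃sa:hasGCS.sa:GCS ⊓ ∃sa:hasTotal.sa:Total
# sa:GCS ≡ ∃sa:hasER.sa:ER ⊓ ∃sa:hasMR.sa:MR ⊓ ∃sa:hasVR.sa:VR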
In summary, the translator module obtains semantic descriptions of the proprietary formats used to represent EHRs, and it has to capture –with the health system administrator's collaboration– semantics that are hidden, in order to make them explicit.

4. MAPPING MODULE
This module is in charge of managing the mappings between the terms of the Application layer and the terms of the Canonical layer. In our context, an integration mapping is a structure I = ⟨O, G, M⟩ where O is a set of OWL2 axioms that comprises the Application layer corresponding to a health care institution, G is the set of OWL2 axioms of the Canonical layer, and M is a set of mapping axioms of the form C ⊑ Gexp, C ⊒ Gexp, C ≡ Gexp, where C is a class name from O and Gexp is an OWL2 class expression using only terms from G. Furthermore, M may include generalized property inclusion axioms as provided by OWL2, as well as path mappings, which relate one path in the Application layer with another path in the Canonical layer. A path is a valid composition of properties. The Mapping module receives as input a set of basic mapping axioms, specifically defined by the system administrator, that relate classes or properties of both layers, such as:

sa:hasSnomed ≡ snomed (5)

These basic mapping axioms are incorporated into the ontology and, with the help of a reasoner, new relationships between terms in the Application layer and those in the Canonical layer are inferred. For instance, applying the basic mapping axiom (5) to axioms (3) and (4) infers:

sa:RTS ≡ owl:hasValue snomed.{'273885003'} (6)
sa:ER ≡ owl:hasValue snomed.{'281395000'} (7)

and consequently, applying axioms (1) and (2) from the Canonical layer (see section 2), the equivalence mappings sa:RTS ≡ RTS and sa:ER ≡ EyeR are obtained. All those mappings are expressed through OWL2 axioms, which put a wide range of mapping scenarios within reach of health systems' administrators. Continuing with the process, the Mapping module checks whether some path mappings may exist. It is captured from the definition of sa:RTS in the Application layer that there is a path sa:hasGCS·sa:hasER from class sa:RTS to class sa:ER. Moreover, it is captured from the Canonical layer that there is a path comp·comp from class RTS to class EyeR. Since the Mapping module has already discovered an equivalence mapping between the source classes of both paths (sa:RTS ≡ RTS) and also another equivalence mapping between their target classes (sa:ER ≡ EyeR), the Mapping module suggests that there may be a path mapping between those paths. The system administrator may then either accept or delete the suggested path mapping.
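The path-mapping suggestion step can be pictured with the following minimal sketch; the dictionary encoding and the function name are hypothetical, and a real implementation would enumerate paths from the ontology and reasoner output rather than from hand-written tables.

from itertools import product

# Property paths per layer: (source class, target class) -> list of paths,
# each path being a sequence of property names (hypothetical encoding).
app_paths = {("sa:RTS", "sa:ER"): [["sa:hasGCS", "sa:hasER"]]}
can_paths = {("RTS", "EyeR"): [["comp", "comp"]]}

# Class equivalences already inferred by the reasoner (section 4).
equiv = {"sa:RTS": "RTS", "sa:ER": "EyeR"}

def suggest_path_mappings():
    """Suggest a path mapping whenever two paths connect equivalent
    source classes to equivalent target classes."""
    suggestions = []
    for (a_src, a_tgt), a_list in app_paths.items():
        c_list = can_paths.get((equiv.get(a_src), equiv.get(a_tgt)), [])
        for ap, cp in product(a_list, c_list):
            suggestions.append((ap, cp))
    return suggestions

print(suggest_path_mappings())
# [(['sa:hasGCS', 'sa:hasER'], ['comp', 'comp'])]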
5. CONCLUSION
The use of Electronic Health Records has brought several advantages to the healthcare domain. However, there is still much work to do regarding certain issues such as EHR interoperability. We have presented an approach that supports the notion of interoperability of medical observations sustained in two techniques: one, logic-based ontology descriptions of EHR statements as well as of the mappings defined among elements of the ontologies; and two, automated inference on the ontology descriptions.

6. ACKNOWLEDGMENTS
This work is supported by the Spanish Ministry of Education and Science (TIN2007-68091-C02-01) and the Basque Government (IT-427-07). The work of Idoia Berges is also supported by the Basque Government (Programa de Formación de Investigadores del Departamento de Educación, Universidades e Investigación).

7. REFERENCES
[1] Architecture-Driven Modernization, 2010. Available at http://adm.omg.org.
[2] I. Berges, J. Bermudez, A. Goñi, and A. Illarramendi. Semantic Web Technology for Agent Communication Protocols. In Proceedings of the 5th European Semantic Web Conference (ESWC 2008), pages 5–18, Tenerife, Spain, 2008.
[3] V. Bicer, O. Kilic, A. Dogac, and G. B. Laleci. Archetype-Based Semantic Interoperability of Web Service Messages in the Health Care Domain. Int'l Journal on Semantic Web & Information Systems, 1(4):1–22, 2005.
[4] L. Bird, A. Goodchild, and Z. Z. Tun. Experiences with a Two-Level Modelling Approach to Electronic Health Records. Journal of Research and Practice in Information Technology, 35(2):121–138, 2003.
[5] EN 13606-1: Electronic Health Record Communication, 2007.
[6] The Epsos project. http://www.epsos.eu/.
[7] J. Hefflin and J. Hendler. Semantic Interoperability on the Web. In Proceedings of Extreme Markup Languages 2000, pages 111–120. Graphic Communications Association, 2000.
[8] HL7 Version 3 Standard: Clinical Statement Pattern, Release 1. Available at http://www.hl7.org/v3ballot/html/domains/uvcs/uvcs.htm.
[9] HL7-CDA, 2009. Available at http://www.hl7.org.
[10] L. Hoffman. Implementing Electronic Medical Records. Communications of the ACM, 52(11):18–20, Nov. 2009.
[11] D. Kalra, P. Lewalle, A. Rector, J. M. Rodrigues, K. A. Stroetmann, G. Surjan, B. Ustun, M. Virtanen, and P. E. Zanstra. Semantic Interoperability for Better Health and Safer Healthcare. Technical report, European Commission, Jan. 2009.
[12] V. Kashyap and A. P. Sheth. Semantic and schematic similarities between database objects: A context based approach. The Very Large Databases Journal, 5(4):276–304, 1996.
[13] O. Kilic and A. Dogac. Achieving Clinical Statement Interoperability using R-MIM and Archetype-based Semantic Transformations. IEEE Transactions on Information Technology in Biomedicine, to appear, 2009.
[14] C. Martínez-Costa, M. Menárguez-Tortosa, R. Valencia-García, J. Maldonado, and J. T. Fernández-Breis. Transformación Automática de Arquetipos UNE-EN 13606 y openEHR para Facilitar la Interoperabilidad Semántica. In Inforsalud 2009, Madrid, Spain, Mar. 2009.
[15] L. Obrst. Ontologies for Semantically Interoperable Systems. In Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management, pages 366–369, New Orleans, Louisiana, USA, Nov. 2003. ACM.
[16] openEHR, 2009. Available at http://www.openehr.org.
[17] OWL2 Web Ontology Language. http://www.w3.org/TR/2009/REC-owl2-overview-20091027/.
[18] C. Rosse and J. L. V. Mejino. A Reference Ontology for Biomedical Informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics, 36:478–500, 2003.
[19] SNOMED, 2009. Available at http://www.ihtsdo.org/snomed-ct/.

A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering
Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler, Stefan Biffl
Christian Doppler Laboratory for Software Engineering Integration for Flexible Automation Systems, Vienna University of Technology, Favoritenstrasse 9-11/188, 1040 Vienna, Austria, +43 588 01 - 18801
{wikan,moser,winkler,biffl}@ifs.tuwien.ac.at

ABSTRACT
In automation systems engineering, signals are considered as common concepts for linking information across different engineering disciplines, such as mechanical, electrical, and software engineering.
Signal engineering is facing tough challenges in managing the interoperability of the heterogeneous data tools and models of each individual engineering discipline, e.g., to make signal handling consistent, to integrate signals from heterogeneous data models/tools, and to manage the versions of signal changes across engineering disciplines. Currently, signal changes across engineering disciplines are managed primarily manually, which is costly and error-prone. The main contribution of this paper is the signal change management process model as an input for the semantic integration of engineering tools and models to support (semi-)automated signal change management. A major result was that the process model discovery approach supports the discovery of semantic integration requirements across heterogeneous engineering tools and models more efficiently than manual signal change management.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management – software configuration management, software process models (e.g., CMM, ISO, PSP). D.2.12 [Software Engineering]: Interoperability. I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies.

General Terms
Management, Design.

Keywords
Signal Change Management, Model Interoperability, Automation Systems Engineering.

1. INTRODUCTION
Complex automation systems, like power plants or car manufacturing workshops, typically involve several different engineering disciplines, e.g., mechanical engineering, electrical engineering, and software engineering, that should collaborate to achieve their goals. In such complex automation systems, stakeholders from different engineering fields usually apply individual and discipline-specific tools and models for task execution. Nevertheless, information sharing, collaboration across disciplines and data exchange are pre-conditions for successful project execution. Thus, there is a need for interoperability between the different tools and models of such complex automation systems. Currently, a lot of research is done on achieving interoperability between heterogeneous systems and notations [6, 9, 13]. However, most of the approaches are still facing the difficulties involved in overcoming their differences, the lack of consensus on common required standards, and the shortage of proper mechanisms and tools [7, 11]. The results of our observations in industry identified signals as common concepts in complex automation systems that link information across different engineering disciplines, e.g., mechanical interfaces, electrical signals (wiring), and software I/O variables. The application field called "signal engineering" deals with managing signals from different engineering disciplines and faces some important challenges, e.g., (1) to make signal handling consistent, (2) to integrate signals from heterogeneous data models/tools, and (3) to manage versions of signal changes across engineering disciplines. To overcome these challenges, one needs to define an interoperability model that illustrates the signal data models and tools of each engineering field as well as their interactions. However, the manual design of an interoperability model covering different engineering fields is costly and error-prone. In manual model design, all models and required information have to be collected from the domain experts of each engineering field. Then, the domain expert needs to create the model and its interactions based on the different models collected, and cross-check with each stakeholder whether the model is correct and whether the interactions between the different engineering fields are correct as well.
One should do this work and refinement repetitively to obtain conflict-free models. Sometimes it is quite hard to get a final model that fulfills the requirements of every party, since the requirements themselves may change over time. The main contribution of this paper is the proposition of a process model discovery approach to identify the process model of an exemplary signal change management process and to find out the requirements for semantic integration between heterogeneous data models and tools. By using this approach we are able to discover the interoperability model based on the actual data. This model can be useful for illustrating the interactions between engineering fields and detecting the needs for semantic integration in signal change management. Major results show that by using the process model discovery approach, the requirements for semantic integration across heterogeneous tools and data models from different engineering fields can be discovered efficiently. This model can support further semantic integration and interoperability of the models, e.g., by using the Engineering Knowledge Base (EKB) approach [4, 12].

The remainder of this paper is structured as follows. Section 2 summarizes related work on signal change management, semantic integration, and process modeling and analysis. Section 3 identifies the research issues. Section 4 develops the solution approach to discover a model for signal change management in complex automation systems. Section 5 describes the evaluation based on signal change management processes. Section 6 discusses benefits and limitations of the model discovery approach; finally, section 7 concludes the paper.

2. RELATED WORK
This section summarizes related work on signal change management, semantic integration technologies and process analysis approaches as ways to build models for heterogeneous engineering areas.

2.1 Signal Change Management
According to the Merriam-Webster dictionary (http://www.merriam-webster.com), a signal can be defined as an object used to transmit or convey information. In this paper we define a signal as a common concept for linking information between disciplines. Thus, signals are not limited to electrical signals (wiring) in electrical engineering, but also include mechanical interfaces in mechanical engineering and software I/O variables in software engineering. In complex automation systems, we define relationships between different kinds of signals from different engineering fields and use them to collaborate and communicate. Formerly, domain experts used manual change management approaches like in [1] to manage signal changes between different engineering fields. Manual approaches use documents to manage changes between the different engineering fields in the system. In a primarily manual approach, the researchers collect the signal lists from each engineering field and then connect relationships between different engineering fields manually. If there is any signal change in one document, then the change has to be mapped to the relationship document, and all relevant stakeholders have to find out which other signals in different engineering fields could be affected by this change. Manual change handling is costly and error-prone; thus, signal change handling automation is a promising research area to improve product and process quality. Research on signal change management in the product lifecycle management (PLM) context is done by, e.g., Horvath and Rudas [11]. They propose a virtual intelligent space for engineering (VISE) to manage signal changes and enhance the decision-making characteristics of PLM. VISE is a highly integrated application of recent CAD/CAM, human-computer, collaborative, product data management, Internet portal, and intelligent information processing techniques in a PLM system. The authors introduce the concept of change affect zones (CAZ). A CAZ comprises a set of engineering objects on which a change may have an effect. Objects in an affect zone may be both inside and outside of a virtual space. So, new changes/modifications or conflicts will be handled in the CAZ before they are executed.
2.2 Process Modeling and Analysis
Process analysis approaches focus on analyzing (engineering) process data collected during system operation. They have been applied to several types of complex systems, for example workflow management systems, Enterprise Resource Planning (ERP) systems, and Customer Relationship Management (CRM) systems. Van der Aalst et al. [16] used workflow technologies to illustrate the structure of the operational processes of a system. Workflow technology provides event data that can be useful for process analysis in software engineering (SE) by enabling particular models that link basic tool events to process/workflow events [16]. Van der Aalst et al. [16] also used stored events, which refer to tasks and process cases originating from people/tools/systems, to monitor and analyze real workflows with respect to designed workflows. This approach is called process mining and can be used for process discovery, performance analysis, and conformance checking. The approach has been implemented in the open source tool ProM (http://www.processmining.org), which can be used to discover the process model based on an available event log, analyze the performance of the processes, and suggest possible process improvement candidates. Ferreira and Ferreira [8] proposed a reusable workflow engine based on Petri net theory as a basis for workflow management. They introduced the workflow kernel, a prototype implementation of common workflow functionality which can be abstracted and reused in systems or embedded in applications intended to become workflow-enabled. The workflow kernel is based on common workflow functionality from several workflow engines, while Petri net theory can be used as a process representation language for process analysis. Sunindyo et al. [15] proposed an approach to monitor, analyze, and improve tool-based engineering processes. The main idea is to generate an interoperability model based on event-based process analysis activities to link heterogeneous software engineering tools.
2.3 Semantic Integration
Semantic integration is an approach to solve the problems that arise from the intention to share data across disparate and semantically heterogeneous data sources [9]; these include (a) the detection of duplicate entries, (b) the matching of ontologies or schemas, (c) the modeling of complex relations in different data sources, and (d) the reconciliation of inconsistencies [13]. One of the most important and most actively studied problems in semantic integration is how to establish semantic correspondences (mappings) between the vocabularies of different data sources [6]. Hence, the application of ontologies as semantic web technologies to manage knowledge in specific domains is inevitable. There are five reasons to develop an ontology, i.e., (a) to make domain assumptions more explicit, (b) to share a common understanding of the structure of information among software agents or people, (c) to enable reuse of domain knowledge, (d) to analyze domain knowledge, and (e) to separate domain knowledge from operational knowledge [14]. Moser et al. [12] introduced the Engineering Knowledge Base (EKB) framework as a semantic web technology approach to address challenges of data heterogeneity, applied in the production automation domain [12]. Biffl et al. [4] also used the EKB framework for solving a similar problem in the context of Open Source Software projects. The EKB framework is applicable to solve semantic heterogeneity problems in other automation engineering systems.

3. RESEARCH ISSUES
Complex automation systems, like power plants, need to handle a high amount of data, e.g., up to 40,000 signals originating from different engineering fields. Stakeholders need to manage these signals to enable signal data consistency within the project.
Thus, efficient and effective signal data management approaches are required to handle signal changes properly. In addition, individual engineers may not pay attention to signal data management but keep focused on their individual engineering work within their discipline, i.e., engineers from different fields should not have to deal with new tools and data formats that make their work even more difficult.

Figure 1. Challenges in Signal Engineering.

Other challenges in signal change management include how to integrate the signal data originating from heterogeneous data models and tools. Figure 1 shows the requirements of mechanical engineers, electrical engineers and software engineers to share related signal data. The mechanical engineer uses different data formats than the electrical engineer and the software engineer do. The challenge is how to integrate signals from heterogeneous data models/tools (1). By using a so-called "virtual common data model" [12], the different engineers can share their related data, from electrical to mechanical signals and to the software variables. The "virtual common data model" becomes a foundation for mapping proprietary tool-specific engineering knowledge and more generic domain-specific engineering knowledge to support transformation between related engineering tools. It is "virtual" because there is actually no need to provide a separate repository to store the common data model; the management of the common data model with respect to the different engineering fields is done via a specified mapping mechanism. The mechanism of the "virtual common data model" approach includes 5 steps: (a) extraction of tool data from each engineering field; (b) storage of the extracted tool data in its own model; (c) description of the tool knowledge for each engineering field's tool; (d) description of the common domain knowledge; (e) mapping of the tool knowledge to the common domain knowledge. This work should be done carefully to obtain a complete list of signal mappings between the electrical, mechanical and software engineers. In real systems, stakeholders could also include people from other engineering fields. This semantic integration challenge can be solved, for example, by applying semantic integration approaches like the Engineering Knowledge Base (EKB) framework [4, 12]. Other challenges are to manage the versions of signal changes across engineering disciplines and to manage common concepts based on the semantic integration (2). The research question is: how can the process model be discovered from the actual data provided by heterogeneous engineering fields? Based on this research question, we can discover the structure across heterogeneous data models/tools and their interactions, and we can identify the need for semantic integration to link heterogeneous data models and tools. Linking heterogeneous disciplines can enable a so-called end-to-end test (see Figure 2) to trace signals from hardware sensors to software variables across system borders. This approach supports defect detection during development and changes.

Figure 2. Interaction between different engineering fields.

Figure 2 shows the interaction between different engineering fields in managing signal changes. Three different engineers, namely the mechanical engineer, the electrical engineer, and the software engineer, typically share a lot of signals that are connected to each other. These relationships should be maintained in an Engineering Knowledge Base, such that when changes happen in one engineering field, they can be propagated to the other engineering fields automatically or semi-automatically.

4. USE CASE
To show how to manage interoperability between engineering tools in complex automation systems, we use a signal change management use case involving mechanical, electrical and software engineers. Figure 3 illustrates how to merge different signals (and changes) and resolve conflicts between signals coming from different disciplines manually.

Figure 3. Manual Signal Change Management.

The conceptual steps are as follows: (1) The mechanical engineer executes changes in the mechanical plan that will also affect the tool data. (2) The mechanical engineer manually makes a difference analysis for the interaction with other engineering tools, to check whether there is any conflict with the data of the other engineering tools. (3) The mechanical engineer manually propagates the changes to the electrical engineering tools and software engineering tools. (4) The electrical engineer and the software engineer execute changes in their electrical plan and software development environment.
5. RESULTS
To discover the interoperability model for signal change management processes at design time and runtime of complex automation systems, we collect process event data from each engineering field, e.g., mechanical, electrical, and software engineering. By using the ProM tool, we conduct an analysis to discover the underlying process model by applying the Alpha Algorithm [5] to the collected data. The Alpha Algorithm is based on discovering transitions that are causally related between different event traces. From the collected event log data as input, we can discover a set of related transitions from all event traces. For each tuple (A,B) in this set, each transition in set A causally relates to all transitions in set B, and no transitions within A (or B) follow each other in some firing sequence. We refine the set by taking only the largest elements with respect to set inclusion. The output is a workflow net that connects each event trace to other related event traces via transitions [5]. The evaluation is done by comparing the manual signal change management process and the automated/semi-automated signal change management process after applying the process model discovery approach to reveal semantic requirements in engineering processes across different engineering fields.

Figure 4. Signal Change Management Processes Model.

Figure 4 shows the results of the model discovery analysis using the Alpha Algorithm [5]. Here we have 4 different scenarios in the process model of the signal change management process. (1) no conflict: the mechanical engineer executes changes and performs a manual difference analysis towards the other engineering fields via the interaction between the mechanical engineering plan, the electrical plan, and the software development environment. The mechanical engineer manually propagates the changes to the other tools. The electrical engineer and software engineer execute changes in their environments. (2) normal conflict: after the manual difference analysis, the mechanical engineer starts managing conflicts and resolves them by replacing the old signal with the new signal. Once the conflicts are resolved, the mechanical engineer transforms the change to the other engineering fields. (3) critical conflict: almost the same as the normal conflict; the difference lies in the action taken after conflict management is over. The mechanical engineer has to remove the signal and send a notification to the electrical engineer and the software engineer. The electrical engineer and software engineer will consider this a critical conflict and decide whether to accept or reject the signal removal. (4) looping condition: if the electrical engineer and software engineer reject the signal removal, there is the option to argue for a signal change on the electrical engineer's side. Hence, the situation loops back to the condition before the change is transformed to the other engineering fields.
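The causal-relation step of the Alpha Algorithm used above can be illustrated with a minimal sketch; the event traces are invented stand-ins for the signal change logs, and a full implementation (as in ProM) would also construct the final workflow net.

from itertools import product  # not needed below; traces suffice

# Invented example traces of a change-management log; each trace is
# the sequence of activities observed for one signal change case.
traces = [
    ["execute_change", "difference_analysis", "propagate", "apply_change"],
    ["execute_change", "difference_analysis", "manage_conflict", "propagate"],
]

def direct_succession(traces):
    """Pairs (a, b) where b directly follows a in some trace."""
    return {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}

def causal_relations(traces):
    """Alpha Algorithm footprint: a -> b iff a > b but not b > a."""
    succ = direct_succession(traces)
    return {(a, b) for (a, b) in succ if (b, a) not in succ}

for a, b in sorted(causal_relations(traces)):
    print(f"{a} -> {b}")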
From Figure 4, we can suggest improvements to the signal change management process by collecting and integrating the heterogeneous signal data models and tools from the different engineering fields using the Automation Service Bus (ASB) [3] and the EKB [4, 12]. The ASB technically integrates the heterogeneous tools, while the EKB semantically integrates the heterogeneous data models of the electrical, mechanical, and software engineers. The result of the signal change management improvement can be seen in Figure 5. It shows the usage of the ASB and EKB to improve the signal change propagation from the mechanical engineer to the electrical engineer and the software engineer. (1) The mechanical engineer executes a change in his mechanical plan. (2) The mechanical engineer checks in the change and makes a difference analysis by using the ASB and EKB. (3a & 3b) The electrical engineer and the software engineer check out the changes from the ASB and EKB.

Figure 5. Signal Change Management by using ASB & EKB.

The ASB is an approach similar to the "Enterprise Service Bus" in the business IT context [10], for complex automation systems engineering. The current "Enterprise Service Bus" approach is applied in the business IT context, and most of its implementations make some design assumptions, e.g., that services will always be online and that resources (computing, network bandwidth, memory) are not the main issues of the design. These assumptions do not suit the requirements of signal change management. Thus, the ASB has to be designed to be more lightweight and able to bridge technical gaps between engineering processes, models and tools for quality and process improvements [2]. Engineering components are connected to the ASB via connector components, which allows addressing all deployed components as services via the ASB. The ASB integrates components in both office-like design environments and onsite environments with a common integration architecture but different implementations [3]. In signal change management, the different tools that manage the different signals from heterogeneous engineering fields are connected to the ASB via connector components. Each tool is treated as a component. The communication between components is also managed by the ASB, so when a signal is changed in one tool, the change is communicated via the ASB and distributed to the other tools automatically.

The EKB is a semantic-web-based framework which supports the efficient integration of information originating from different expert domains without a complete common data schema [12]. The EKB framework stores the engineering knowledge in ontologies and provides semantic mapping services to access design-time and run-time concepts and data. The EKB framework aims at making tasks which depend on linking information across expert domain boundaries more efficient [12]. The EKB is connected to the other tools via the ASB. In signal change management, the EKB provides the semantic integration between the different signal data from the heterogeneous engineering fields. Each signal is stored, together with its relationships to other signals, in the ontology underlying the EKB. Changing a signal in the ontology means modifying the signal entity in the ontology and its relationships.
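The connector-based propagation over the ASB can be pictured with a minimal publish/subscribe sketch; the class and method names are hypothetical illustrations and not part of the ASB implementation described in [3].

# Hypothetical sketch of ASB-style change propagation: each tool is a
# component attached to the bus via a connector; a signal change that is
# checked in by one tool is distributed to all other connected tools.

class AutomationServiceBus:
    def __init__(self):
        self.connectors = []

    def connect(self, tool):
        self.connectors.append(tool)

    def check_in(self, source, signal, change):
        # Distribute the change to every tool except the source.
        for tool in self.connectors:
            if tool is not source:
                tool.check_out(signal, change)

class Tool:
    def __init__(self, name):
        self.name = name

    def check_out(self, signal, change):
        print(f"{self.name}: received change {change!r} for signal {signal}")

bus = AutomationServiceBus()
mech, elec, soft = Tool("MechanicalCAD"), Tool("ElectricalPlan"), Tool("SoftwareIDE")
for t in (mech, elec, soft):
    bus.connect(t)

bus.check_in(mech, "S1", "rename to Motor_Start")
# ElectricalPlan: received change 'rename to Motor_Start' for signal S1
# SoftwareIDE: received change 'rename to Motor_Start' for signal S1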
6. DISCUSSION
In this section, we discuss the benefits and limitations of the model discovery approach compared to the manual approach. The benefits of the model discovery approach are as follows. (1) The model obtained from the model discovery approach is more precise and accurate, because it is generated from actual event data of the different engineering processes. (2) The model is easier to maintain and change: if some modifications of the system happen, we can collect the new event log data and run the process mining tool to get the latest model. (3) The model can be used to learn and understand the whole signal change management process in the system. It also supports model-driven interoperability for other purposes, e.g., decision making and signal defect detection. The limitations of this approach are as follows. (1) We have to provide complete event log data from each engineering process for model discovery. (2) The ProM tool has limitations regarding input formats, so we have to transform the process event log data into the ProM format (Mining XML, MXML). From this discussion, it is possible for other model-driven interoperability systems to adapt the model discovery approach to obtain their process model immediately, rather than building it from scratch and improving it later via several iterations. The alternative to the process model discovery approach is to conduct interview sessions with each engineer from the different engineering fields to acquire the requirements for building a model. This model would have to be discussed among the engineers to obtain an integrated view on the model from the different engineering perspectives that supports interoperability between the different engineering fields.

7. CONCLUSION AND FURTHER WORK
Collaboration and interaction between different engineering fields are critical issues in heterogeneous engineering environments, because the individual disciplines apply different tools and data models. This heterogeneity hinders efficient collaboration and interaction between the various stakeholders, e.g., mechanical, electrical, and software engineers. Semantic integration based on the proposed model enables data exchange based on common concepts, e.g., signals, and increases collaboration efficiency and effectiveness. In addition, process observation based on event data is a promising approach for (a) identifying the current (real) process workflow, (b) obtaining measurement data, and (c) providing the foundation for process analysis and improvement. In this paper, we have explained the usage of a process model discovery approach to derive the model immediately from the actual engineering process data and identified improvement options for increasing process quality. We applied a signal change management process to illustrate (a) the basic concepts, (b) semantic integration approaches, and (c) process improvement based on collected and analyzed event data. We found that this approach is easier to adapt in already-running systems which consist of different tools and data models for each engineering area. The approach can also be adapted and generalized to other model-driven interoperability systems. Future work will include the application of the model discovery approach to other problem domains and exploring how to detect defects in signal change management and how to make decisions on signal changes based on prior experience. We will develop a framework to prepare process model discovery for signal change management in different engineering fields, such that process model discovery and other process analysis approaches can be implemented more effectively and more efficiently.

8. ACKNOWLEDGMENTS
This work has been supported by the Christian Doppler Forschungsgesellschaft and the BMWFJ, Austria. This work has been partially funded by the Vienna University of Technology, in the Complex System Design and Engineering Lab.

9. REFERENCES
[1] Akerblom, R. A management system for quality development. Requirements, methods and traps. In Proceedings of the 19th International Telecommunications Energy Conference (INTELEC 97), 19-23 Oct. 1997.
[2] Biffl, S. and Schatten, A. A Platform for Service-Oriented Integration of Software Engineering Environments. In Proceedings of the Eighth Conference on New Trends in Software Methodologies, Tools and Techniques (SoMeT 09), 2009. IOS Press.
[3] Biffl, S., Schatten, A. and Zoitl, A. Integration of heterogeneous engineering environments for the automation systems lifecycle. In Proceedings of the 7th IEEE International Conference on Industrial Informatics (INDIN 2009), 23-26 June 2009.
[4] Biffl, S., Sunindyo, W. D. and Moser, T. Semantic Integration of Heterogeneous Data Sources for Monitoring Frequent-Release Software Projects. In Proceedings of the 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010), 2010. IEEE Computer Society.
[5] de Medeiros, A. K. A., van Dongen, B. F., van der Aalst, W. M. P. and Weijters, A. J. M. M. Process Mining: Extending the alpha-algorithm to Mine Short Loops. Eindhoven University of Technology, Eindhoven, 2004.
[6] Doan, A., Noy, N. F. and Halevy, A. Y. Introduction to the special issue on semantic integration. SIGMOD Record, 33(4), 2004, 11-13.
[7] Elvesæter, B., Hahn, A., Berre, A.-J. and Neple, T. Towards an Interoperability Framework for Model-Driven Development of Software Systems. 2006.
[8] Ferreira, D. M. R. and Ferreira, J. J. P. Developing a reusable workflow engine. Journal of Systems Architecture, 50(6), 2004, 309-324.
[9] Halevy, A. Why Your Data Won't Mix. Queue, 3(8), 2005, 50-58.
[10] Hohpe, G. and Woolf, B. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Professional, 2003.
[11] Horvath, L. and Rudas, I. J. Information Content Orientated Product Model Assisted Change Management. In Proceedings of the 5th International Symposium on Intelligent Systems and Informatics (SISY 2007), Subotica, 24-25 Aug. 2007.
[12] Moser, T., Biffl, S., Sunindyo, W. D. and Winkler, D. Integrating Production Automation Expert Knowledge Across Engineering Stakeholder Domains. In Proceedings of the 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010), Krakow, Poland, 2010. Andrzej Frycz Modrzewski Cracow College.
[13] Noy, N. F., Doan, A. H. and Halevy, A. Y. Semantic Integration. AI Magazine, 26(1), 2005, 7-10.
[14] Noy, N. F. and McGuinness, D. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory, 2001.
[15] Sunindyo, W. D., Moser, T., Winkler, D. and Biffl, S. Foundations for Event-Based Process Analysis in Heterogeneous Software Engineering Environments. In Proceedings of the 36th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2010), Lille, France, 1-3 September 2010. IEEE Computer Society.
[16] van der Aalst, W. M. P., Weijters, A. J. M. M. and Maruster, L. Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering, 16(9), 2004, 1128-1142.

Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars
Frank Hermann, Hartmut Ehrig, Ulrike Golas
Department of Theoretical Computer Science and Software Technology, Technische Universität Berlin, Berlin, Germany
frank(at)cs.tu-berlin.de, ehrig(at)cs.tu-berlin.de, ugolas(at)cs.tu-berlin.de
Fernando Orejas
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
orejas(at)lsi.upc.edu

ABSTRACT
Triple Graph Grammars are a well-established, formal and intuitive concept for the specification and analysis of bidirectional model transformations. In previous work we have already formalized and analyzed termination, correctness, completeness, local confluence and functional behaviour. In this paper, we show how to improve the efficiency of the execution and analysis of model transformations in practical applications by using triple rules with negative application conditions (NACs).
In addition to specification NACs, which improve the specification of model transformations, the generation of filter NACs improves the efficiency of the execution and of the analysis of functional behaviour, supported by the critical pair analysis of the tool AGG. We illustrate the results for the well-known model transformation from class diagrams to relational database models.

Categories and Subject Descriptors
D.2.1 [Software Engineering]: Requirements/Specifications; D.2.12 [Software Engineering]: Interoperability; I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies

General Terms
Theory, Design, Verification

Keywords
Model Transformation, Triple Graph Grammars, Functional Behaviour

1. INTRODUCTION
Model transformations based on triple graph grammars (TGGs) have been introduced by Schürr in [19]. Operational rules are automatically derived from the triple rules and used to define various bidirectional model transformation and integration tasks that are mainly focused on model-to-model transformations. Since 1994, several extensions of the original TGG definitions have been published [20, 17, 10], and various kinds of applications have been presented [22, 11, 16]. Besides model transformation, TGGs are also applied for model integration [1] and model synchronization [8] in order to support model-driven interoperability. For source-to-target model transformations, so-called forward transformations, forward rules are derived which take the source graph as input and produce a corresponding target graph. Similarly, backward rules are used for target-to-source transformations, making the transformation approach bidirectional. Major properties expected to be fulfilled by model transformations are termination, correctness, completeness, efficient execution and –for several applications– functional behaviour. Termination, completeness and correctness of model transformations have already been studied in [6, 3, 7, 4]. Functional behaviour of model transformations based on triple graph grammars has been analyzed for triple rules without application conditions in [15], using forward translation rules that carry additional translation attributes for keeping track of the elements that have been translated so far. The main aim of this paper is to extend the analysis techniques for functional behaviour in [15] to the case of triple rules with negative application conditions (NACs) and to improve the efficiency of the analysis and execution of TGG model transformations studied in [3, 4, 7, 15]. For this purpose, we distinguish between specification NACs and filter NACs. Specification NACs have already been introduced in [7, 4], where triple rules and the corresponding derived source and forward rules have been extended by NACs in order to improve the modeling power. Exemplarily, we show that NACs improve the specification of the model transformation CD2RDBM from class diagrams to relational database models presented in [6, 3].
Therefore, we extend the forward translation rules introduced in [15] by corresponding NACs and show that model transformations based on forward translation rules with NACs are equivalent to the model transformations studied in [7, 4], such that the main results concerning termination, correctness and completeness can be transferred to our new framework (see Thm. 1). In order to analyze functional behaviour we can use general results for local confluence of transformation systems with NACs in [18]. But in order to improve efficiency in the context of model transformations we introduce so-called filter NACs. They filter out several misleading branches considered in the standard analysis of local confluence using critical pairs. In our second main result (see Thm. 2) we show how to analyze the functional behaviour of model transformations based on forward translation rules by analyzing critical pairs for forward translation rules with filter NACs. Moreover, we introduce a strong version of functional behaviour, including model transformation sequences. In our third main result (see Thm. 3) we characterize strong functional behaviour by the absence of "significant" critical pairs for the corresponding set of forward translation rules with filter NACs. In Sec. 2 we introduce model transformations based on TGGs with specification NACs and show the first main result on termination, correctness, and completeness. In Sec. 3 we introduce forward translation rules with filter NACs and present our main results on functional and strong functional behaviour. Based on these main results, we discuss in Sec. 4 efficiency aspects of analysis and execution. Related work and a conclusion are presented in Sections 5 and 6. The full proofs of the main results are given in [14].

2. MODEL TRANSFORMATIONS BASED ON TRIPLE GRAPH GRAMMARS WITH NACS
Triple graph grammars [19] are a well-known approach for bidirectional model transformations. Models are defined as pairs of source and target graphs, which are connected via a correspondence graph together with its embeddings into these graphs. In this section, we review the main constructions and results of model transformations based on [20, 4, 15] and extend them to the case with NACs. A triple graph G = (GS ←sG− GC −tG→ GT) consists of three graphs GS, GC, and GT, called source, correspondence, and target graphs, together with two graph morphisms sG : GC → GS and tG : GC → GT. A triple graph morphism m = (mS, mC, mT) : G → H between triple graphs G and H consists of three graph morphisms mS : GS → HS, mC : GC → HC and mT : GT → HT such that mS ∘ sG = sH ∘ mC and mT ∘ tG = tH ∘ mC. A typed triple graph G is typed over a triple graph TG by a triple graph morphism typeG : G → TG.
Given a triple graph morphism m : L → G, a triple graph transformation (TGT) step G ⇒_{tr,m} H (right of Fig. 2) from G to a triple graph H is given by a pushout of triple graphs with comatch n : R → H and transformation inclusion t : G ↪ H. A grammar TGG = (TG, S, TR) consists of a triple type graph TG, a triple start graph S = ∅ and a set TR of triple rules.

Example 2. Triple Rules: The triple rules in Fig. 3 are part of the rules of the grammar TGG for the model transformation CD2RDBM. They are presented in short notation, i.e. the left and right hand side of a rule are depicted in one triple graph. Elements which are created by the rule are labeled with green "++" and marked by green line colouring. The rule "Class2Table" synchronously creates a class with name "n" together with the corresponding table in the relational database. Accordingly, subclasses are connected to the tables of their superclasses by the rule "Subclass2Table". Attributes with type "t" are created together with their corresponding columns in the database component via the rule "Attr2Column".

[Figure 3: Rules for the model transformation CD2RDBM, Part 1. The triple rules Class2Table(n:String), Subclass2Table(n:String) and Attr2Column(n:String, t:String) in short notation, with created elements marked "++".]
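As a toy illustration of how such a non-deleting rule builds up all three components synchronously, the following sketch (ours; it reuses the TripleGraph class from the snippet above, and all helper names are invented) mimics one application of a Class2Table-like rule:

```python
import itertools
_fresh = itertools.count()

def apply_class2table(g, name):
    """Mimic one TGT step of a Class2Table-like rule on a TripleGraph g:
    a class, a correspondence node and a table are added synchronously;
    nothing is deleted, reflecting that triple rules are non-deleting."""
    cls, corr, table = f"Class({name})", f"CT{next(_fresh)}", f"Table({name})"
    g.gs.nodes.add(cls)                  # ++ :Class (name = n)
    g.gc.nodes.add(corr)                 # ++ :CT
    g.gt.nodes.add(table)                # ++ :Table (name = n)
    g.s[corr], g.t[corr] = cls, table    # extend the correspondence morphisms
    return cls, table
```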
From each triple rule tr we derive a source rule tr_S for the construction or parsing of a model of the source language and a forward rule tr_F for forward transformation sequences (see Fig. 4). By TR_S and TR_F we denote the sets of all source and forward rules derived from the set of triple rules TR. Analogously, we derive a target rule tr_T and a backward rule tr_B for the construction and transformation of a model of the target language, leading to the sets TR_T and TR_B.

[Figure 4: Derived operational rules of a TGG. For a triple rule tr, the source rule tr_S = ((L_S ← ∅ → ∅) → (R_S ← ∅ → ∅)) keeps only the source component, the target rule tr_T = ((∅ ← ∅ → L_T) → (∅ ← ∅ → R_T)) only the target component, and the forward rule tr_F = ((R_S ← L_C → L_T) → (R_S ← R_C → R_T)) has the complete source component R_S already in its left hand side.]

A set of triple rules TR and the start graph ∅ generate a visual language VL of integrated models, i.e. models with elements in the source, target and correspondence components. The source language VL_S and target language VL_T are derived by projection to the triple components, i.e. VL_S = proj_S(VL) and VL_T = proj_T(VL). The set VL_S0 of models that can be generated or parsed by the set of all source rules TR_S is possibly larger than VL_S, and we have VL_S ⊆ VL_S0 = {G_S | ∅ ⇒* (G_S ← ∅ → ∅) via TR_S}. Analogously, we have VL_T ⊆ VL_T0 = {G_T | ∅ ⇒* (∅ ← ∅ → G_T) via TR_T}.

According to [7, 4] we present negative application conditions for triple rules. In most case studies of model transformations, source-target NACs, i.e. either source or target NACs, are sufficient, and we regard them as the standard case. They prohibit the existence of certain structures either in the source or in the target part only, while general NACs may prohibit both at once.

Definition 1. Triple Rules with Negative Application Conditions: Given a triple rule tr = (L → R), a negative application condition (NAC) (n : L → N) consists of a triple graph N and a triple graph morphism n. A NAC with n = (n_S, id_{L_C}, id_{L_T}) is called source NAC, and a NAC with n = (id_{L_S}, id_{L_C}, n_T) is called target NAC. A match m : L → G is NAC consistent if there is no injective q : N → G such that q ∘ n = m for each NAC (n : L → N). A triple transformation G ⇒* H is NAC consistent if all matches are NAC consistent.

Example 3. Triple Rules with NACs: Figure 5 shows the remaining two triple rules for the model transformation CD2RDBM and additionally a derived forward translation rule as explained in Ex. 4. NACs are specified in short notation using the label "NAC" with a frame and red line colour within the frame. A complete NAC is obtained by composing the left hand side of a rule with the red marked elements within the NAC frame. The rule "Association2ForeignKey" creates an association between two classes and the corresponding foreign key, and the NAC ensures that there is only one primary key at the destination table. The parameters "an" and "cn" are used to set the names of the association and column nodes. The rule "PrimaryAttr2Column" extends "Attr2Column" by additionally creating a link of type "pkey" for the column and by setting "is_primary = true". Furthermore, there is a source and a target NAC, which ensure that neither a primary attribute nor a primary column is currently present.

[Figure 5: Rules for the model transformation CD2RDBM, Part 2. The triple rules Association2ForeignKey(an:String, cn:String) and PrimaryAttr2Column(n:String, t:String) with their NACs, and the derived forward translation rule PrimaryAttr2ColumnFT(n:String, t:String), in which the source elements carry translation attributes tr and tr_a changing from F to T.]

The extension of forward rules to forward translation rules is based on additional attributes, called translation attributes, that control the translation process by keeping track of the elements which have been translated so far.
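The NAC consistency check of Def. 1 amounts to searching for a forbidden injective extension of a match. A naive sketch for plain graphs (ours; real tools such as AGG use far more efficient matching), with a graph given as a node set plus (source, label, target) edges:

```python
from collections import namedtuple
from itertools import permutations

# A NAC carries the forbidden graph N (nodes and edges) and the embedding
# n : L -> N, given as a dict from L-nodes to N-nodes.
NAC = namedtuple("NAC", ["n_nodes", "n_edges", "embedding"])

def nac_consistent(match, nacs, g_nodes, g_edges):
    """Def. 1: m is NAC consistent iff no NAC extends to an injective
    q : N -> G with q . n = m."""
    return not any(_extends(nac, match, g_nodes, g_edges) for nac in nacs)

def _extends(nac, match, g_nodes, g_edges):
    fixed = {nac.embedding[l]: match[l] for l in nac.embedding}  # forces q . n = m
    free = [v for v in nac.n_nodes if v not in fixed]
    candidates = [v for v in g_nodes if v not in set(fixed.values())]
    for image in permutations(candidates, len(free)):            # injective choices
        q = {**fixed, **dict(zip(free, image))}
        if all((q[a], lab, q[b]) in g_edges for (a, lab, b) in nac.n_edges):
            return True                                          # forbidden pattern found
    return False
```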
While in this paper the translation attributes are inserted into the source models, they can also be kept separate as an external pointer structure in order to keep the source model unchanged, as shown in Sec. 5 of [13].

The new concept of forward translation rules as introduced in [15] extends the construction of forward rules by additional translation attributes in the source component. The translation attributes keep track of the elements that have been translated so far, which ensures that no element in the source graph is translated twice. The rules are deleting on the translation attributes, and thus the triple transformations are extended from a single (total) pushout to the classical double pushout (DPO) approach [2]. We call these rules forward translation rules, because pure forward rules need to be controlled by additional control conditions, such as the source consistency condition in [6, 4].

Definition 2. Graph with Translation Attributes: Given an attributed graph AG = (G, D) and a subgraph G_0 ⊆ G, we call AG′ a graph with translation attributes over AG if it extends AG with one boolean-valued attribute tr_x for each element x (node or edge) in G_0 and one boolean-valued attribute tr_x_a for each attribute a associated to such an element x in G_0. This means that we have a partition of the items (nodes, edges, or attributes) of G_0 into I_1 and I_2 s.t. AG′ = AG ⊕ Att^T_{I_1} ⊕ Att^F_{I_2}, where Att^T_{I_1} and Att^F_{I_2} denote the translation attributes with value T for I_1 and value F for I_2. Moreover, we define Att^v(AG) := AG ⊕ Att^v_G for v ∈ {T, F}. In any case we require that there is at most one translation attribute tr_x or tr_x_a for each item.

Definition 3. Forward Translation Rules with NACs: Given a triple rule tr = (L → R), the forward translation rule of tr is given by tr_FT = (L_FT ←l_FT− K_FT −r_FT→ R_FT), defined as follows using the forward rule (L_F −tr_F→ R_F) and the source rule (L_S −tr_S→ R_S) of tr, where we assume w.l.o.g. that tr is an inclusion:
• L_FT = L_F ⊕ Att^T_{L_S} ⊕ Att^F_{R_S \ L_S}
• K_FT = L_F ⊕ Att^T_{L_S}
• R_FT = R_F ⊕ Att^T_{L_S} ⊕ Att^T_{R_S \ L_S} = R_F ⊕ Att^T_{R_S}
• l_FT and r_FT are the induced inclusions.
Moreover, for each NAC n : L → N of tr we define a forward translation NAC n_FT : L_FT → N_FT of tr_FT as inclusion with N_FT = (L_FT +_L N) ⊕ Att^T_{N_S \ L_S}.

Remark 1. Note that (L_FT +_L N) is the union of L_FT and N with shared L, and for a target NAC n the forward translation NAC n_FT does not contain any translation attributes, because N_S = L_S.

Example 4. Forward Translation Rule with NACs: Fig. 5 shows in its lower part the forward translation rule with NACs "PrimaryAttr2ColumnFT". According to Def. 3 the source elements of the triple rule "PrimaryAttr2Column" are extended by translation attributes, which are changed by the rule from "F" to "T" if the owning elements are created by the triple rule. Furthermore, the additional elements in the NACs are extended by translation attributes set to "T". Thus, the source NACs concern only elements that have been translated so far.

From the application point of view, model transformation rules should be applied along matches that are injective on the structural part. But it would be too restrictive to require injectivity of the matches also on the data and variable nodes, because we must allow two different variables to be mapped to the same data value. For this reason we use the notion of "almost injective matches" [15], which requires that matches are injective except for the data value nodes. This way, attribute values can still be specified as terms within a rule and matched non-injectively to the same value.

Next, we define model transformations based on forward translation rules using complete forward translation sequences.

Definition 4. Completely Translated Graphs and Complete Sequences: A forward translation sequence G_0 ⇒*_{tr_FT} G_n with almost injective matches is called complete if G_n is completely translated, i.e. all translation attributes of G_n are set to true ("T").
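A small sketch of Defs. 2 and 4 (ours, not AGG's data model): the source part of a model is wrapped with translation attributes that are initially "F" and flipped to "T" as forward translation rules translate the corresponding items:

```python
def init_translation_attributes(source_items):
    """Att^F(G_S): one translation attribute per node, edge and attribute."""
    return {item: "F" for item in source_items}

def translate(tr_attrs, matched_items):
    """Effect of a forward translation rule on the translation attributes:
    the matched items are flipped from F to T, so no source element can
    be translated twice."""
    for item in matched_items:
        assert tr_attrs[item] == "F"
        tr_attrs[item] = "T"

def is_completely_translated(tr_attrs):
    """Def. 4: the sequence is complete once every attribute is T."""
    return all(v == "T" for v in tr_attrs.values())
```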
Definition 5. Model Transformation Based on Forward Translation Rules: A model transformation sequence (G_S, G_0 ⇒*_{tr_FT} G_n, G_T) based on forward translation rules with NACs consists of a source graph G_S, a target graph G_T, and a complete TGT-sequence G_0 ⇒*_{tr_FT} G_n with almost injective matches, G_0 = (Att^F(G_S) ← ∅ → ∅) and G_n = (Att^T(G_S) ← G_C → G_T). A model transformation MT : VL_S0 ⇛ VL_T0 based on forward translation rules with NACs is defined by all model transformation sequences as above with G_S ∈ VL_S0 and G_T ∈ VL_T0. All these pairs (G_S, G_T) define the model transformation relation MTR ⊆ VL_S0 × VL_T0. The model transformation is terminating if there are no infinite TGT-sequences via forward translation rules and almost injective matches starting with G_0 = (Att^F(G_S) ← ∅ → ∅) for some source graph G_S.

Now we are able to state our first main result concerning termination, correctness and completeness of model transformations.

Theorem 1. Termination, Correctness and Completeness: Each model transformation MT : VL_S0 ⇛ VL_T0 based on forward translation rules is
• terminating, if each forward translation rule changes at least one translation attribute from "F" to "T",
• correct, i.e. for each model transformation sequence (G_S, G_0 ⇒*_{tr_FT} G_n, G_T) there is G ∈ VL with G = (G_S ← G_C → G_T), and it is
• complete, i.e. for each G_S ∈ VL_S there is G = (G_S ← G_C → G_T) ∈ VL with a model transformation sequence (G_S, G_0 ⇒*_{tr_FT} G_n, G_T).

Proof Idea. The proof (see [14]) is based on a corresponding result in [15] for the case without NACs and a fact showing the equivalence of (1) source- and NAC-consistent TGT-sequences based on forward rules and (2) complete NAC-consistent TGT-sequences based on forward translation rules.

Applying a rule according to the DPO approach involves, in general, a check of the gluing condition. However, in the case of forward translation rules and almost injective matches the gluing condition is always satisfied. This means that the condition does not have to be checked, which simplifies the analysis of functional behaviour in Sec. 3.

Fact 1. Gluing Condition for Forward Translation Rules: Let tr_FT be a forward translation rule and m_FT : L_FT → G an almost injective match; then the gluing condition is satisfied, i.e. there is the transformation step G ⇒_{tr_FT, m_FT} H.

Proof Idea. Since only attribution edges are deleted, there are no dangling points, and almost injective matching ensures that there are no identification points (see [14] for the full proof).
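The termination condition of Thm. 1 can be checked purely syntactically: a forward translation rule flips a translation attribute exactly when the underlying triple rule creates a source element, i.e. R_S \ L_S ≠ ∅ (this check reappears in Sec. 4). A one-line sketch, assuming hypothetical rule objects that expose their source components as sets:

```python
def ensures_termination(triple_rules):
    """Thm. 1 / Sec. 4: the derived forward translation rules terminate if
    every triple rule creates at least one source element, since then each
    rule application flips at least one translation attribute from F to T."""
    return all(rule.rhs_source - rule.lhs_source for rule in triple_rules)
```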
3. ANALYSIS OF FUNCTIONAL BEHAVIOUR
Functional behaviour of a model transformation means that each model of the source language L_S ⊆ VL_S is transformed into a unique model of the target language. This section presents new techniques especially developed to show functional behaviour of correct and complete model transformations based on TGGs.

Definition 6. Functional Behaviour of Model Transformations: A model transformation MT based on forward translation rules has functional behaviour if each execution of MT starting at a source model G_S of the source language L_S ⊆ VL_S leads to a unique target model G_T ∈ VL_T. The execution of MT requires backtracking if there are terminating TGT-sequences (Att^F(G_S) ← ∅ → ∅) ⇒*_{tr_FT} G′_n with G′_n,S ≠ Att^T(G_S).

The standard way to analyze functional behaviour is to check whether the underlying transformation system is confluent, i.e. all diverging derivation paths starting at the same model finally meet again. In the context of model transformations, confluence only needs to be ensured for transformation paths which lead to completely translated models. For this reason, we introduce so-called filter NACs that extend the model transformation rules in order to avoid misleading paths that cause backtracking. The overall behaviour w.r.t. the model transformation relation is preserved. Filter NACs are based on the following notion of misleading graphs, which can be seen as model fragments that are responsible for the backtracking of a model transformation.

Definition 7. Translatable and Misleading Graphs: A triple graph with translation attributes G is translatable if there is a transformation G ⇒* H such that H is completely translated. A triple graph with translation attributes G is misleading if every triple graph G′ with translation attributes and G′ ⊇ G is not translatable.

Example 5. Misleading Graph: Consider the transformation step shown in Fig. 6. The resulting graph G is misleading according to Def. 7, because the edge S2 is labeled with a translation attribute set to "F", but there is no rule which may change this attribute in any larger context at any later stage of the transformation. The only rule which changes the translation attribute of a "parent" edge is "Subclass2TableFT", but it requires that the source node "S3" is labeled with a translation attribute set to "F". However, forward translation rules do not modify translation attributes that are already set to "T", and additionally they do not change the structure of the source component.

[Figure 6: Step G_0 ⇒_{Class2Table_FT} G with misleading graph G. The class S3 becomes translated (tr = T) while its outgoing parent edge S2 keeps tr = F and can never be translated afterwards.]

Definition 8. Filter NAC: A filter NAC n for a forward translation rule tr_FT : L_FT → R_FT is given by a morphism n : L_FT → N such that there is a TGT step N ⇒_{tr_FT,n} M with M being misleading. The extension of tr_FT by some set of filter NACs is called forward translation rule tr_FN with filter NACs.

Example 6. Forward Translation Rule with Filter NACs: The rule in Fig. 7 extends the rule Class2TableFT by a filter NAC obtained from graph G_0 of the transformation step G_0 ⇒_{Class2Table_FT} G in Fig. 6, where G is misleading according to Ex. 5. In Ex. 7 we extend the rule by a further similar filter NAC with "tr = T" for node "S2".

[Figure 7: A forward translation rule with filter NAC: Class2TableFN. The filter NAC forbids an untranslated (tr = F) incoming edge of type parent at the class node matched by the rule.]

A direct construction of filter NACs according to Def. 8 would be inefficient, because the size of the graphs to be checked is unbounded. For this reason we now present efficient techniques which support the generation of filter NACs, and we can bound the size without losing generality.
At first we present a static technique for a subset of filter NACs, and thereafter a dynamic generation technique leading to a much larger set of filter NACs. The first procedure, given in Fact 2 below, is based on a sufficient criterion for checking the misleading property.

Fact 2. Static Generation of Filter NACs: Given a triple graph grammar, the following procedure, applied to each triple rule tr ∈ TR, generates filter NACs for the derived forward translation rules TR_FT, leading to forward translation rules TR_FN with filter NACs:
• Outgoing Edges: Check the following conditions:
  – tr creates a node (x : T_x) in the source component and the type graph allows outgoing edges of type "T_e" for nodes of type "T_x", but tr does not create an edge (e : T_e) with source node x.
  – Each rule in TR which creates an edge (e : T_e) also creates its source node.
  – Extend L_FT to N by adding an outgoing edge (e : T_e) at x together with a target node, and add a translation attribute for e with value F. The inclusion n : L_FT → N is a NAC-consistent match for tr.
  For each node x of tr fulfilling the above conditions, the filter NAC (n : L_FT → N) is generated for tr_FT, leading to tr_FN.
• Incoming Edges: Dual case, this time for an incoming edge (e : T_e).
• TR_FN is the extension of TR_FT by all filter NACs constructed above.

Proof Idea. Each generated NAC (n : L_FT → N) for a node x in tr with an outgoing (incoming) edge e in N \ L defines a transformation step N ⇒_{tr_FT,n} M, where edge e is still labeled with "F" but x is labeled with "T". By the structure of forward translation rules it follows that edge e cannot be labeled with "T" at any later model transformation step for any given source model G_S. The full proof is given in [14].

Concerning our example, this static generation leads to the filter NAC shown in Fig. 7 for the rule Class2TableFT for an incoming edge of type "parent".
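The static procedure of Fact 2 lends itself to a direct implementation. The following sketch (ours; the rule and type-graph interface is invented for illustration) generates the "outgoing edges" filter NACs, the incoming case being dual:

```python
def static_filter_nacs(triple_rules, type_graph):
    """Fact 2 (outgoing case): for every source node x : Tx created by a
    rule, forbid an untranslated outgoing edge of a type Te that the rule
    itself does not create, provided every rule creating a Te-edge also
    creates the edge's source node. Such an edge could never be translated
    later, so any extension of the resulting graph is misleading."""
    nacs = []
    for tr in triple_rules:
        for x, tx in tr.created_source_nodes:            # pairs (node, node type)
            for te in type_graph.outgoing_edge_types(tx):
                if te in tr.created_source_edge_types:
                    continue                             # rule handles Te itself
                creators = [r for r in triple_rules
                            if te in r.created_source_edge_types]
                if all(r.creates_source_node_of_edge(te) for r in creators):
                    # schematic filter NAC: "no F-labelled Te-edge at x"
                    nacs.append((tr, x, te, "tr=F"))
    return nacs
```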
The following dynamic technique for deriving relevant filter NACs is based on the generation of critical pairs, which define conflicts of rule applications in a minimal context. By the completeness of critical pairs (Lemma 6.22 in [2]) we know that for each pair of parallel dependent transformation steps there is a critical pair which can be embedded into it. For this reason, the generation of critical pairs can be used to derive filter NACs: a critical pair either directly specifies a filter NAC or a conflict that may lead to non-functional behaviour of the model transformation. For the dynamic generation of filter NACs we use the tool AGG [23] to generate the critical pairs of a plain graph transformation system. For this purpose, we first perform the flattening construction for triple graph grammars presented in [3, 15], extended to NACs using the flattening construction for morphisms. A critical pair P_1 ⇐_{tr_1,FT} K ⇒_{tr_2,FT} P_2 consists of a pair of parallel dependent transformation steps. If a critical pair contains a misleading graph P_1, we can use the overlapping graph K as a filter NAC of the rule tr_1,FT. However, checking the misleading property needs human assistance, so the generated critical pairs can only be seen as filter NAC candidates. But we are currently working on a technique that uses a sufficient criterion to check the misleading property automatically, and we are confident that this approach will provide a powerful generation technique.

Fact 3. Dynamic Generation of Filter NACs: Given a set of forward translation rules, generate the set of critical pairs P_1 ⇐_{tr_1,FT, m_1} K ⇒_{tr_2,FT, m_2} P_2. If P_1 (or, similarly, P_2) is misleading, we generate a new filter NAC m_1 : L_1,FT → K for tr_1,FT, leading to tr_1,FN, such that K ⇒_{tr_1,FN} P_1 violates the filter NAC. Hence, the critical pair for tr_1,FT and tr_2,FT is no longer a critical pair for tr_1,FN and tr_2,FT. But this construction may lead to new critical pairs for the forward translation rules with filter NACs. The procedure is repeated until no further filter NAC can be found or validated. This construction, starting with TR_FT, always terminates if the structural part of each graph of a rule is finite.

Proof. The constructed NACs are filter NACs, because the transformation step K ⇒_{tr_1,FT, m_1} P_1 contains the misleading graph P_1. The procedure terminates, because the critical pairs are bounded by the number of possible pairwise overlappings of the left hand sides of the rules. The number of overlappings can be bounded by considering only constants and variables as possible attribute values.

For our case study the dynamic generation terminates already after the second round, which is typical for practical applications, because the number of already translated elements in the new critical pairs usually decreases. Furthermore, the number of NACs can be reduced by combining similar NACs that differ only in some translation attributes. The remaining critical pairs that do not specify filter NACs show effective conflicts between transformation rules, and they can be provided to the developer of the model transformation to support the design phase.
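The iteration in Fact 3 is a simple fixpoint loop around the critical pair analysis. A sketch of that loop (ours; 'critical_pairs' stands in for AGG's analysis of the flattened grammar, and 'is_misleading' for the partly manual misleading check):

```python
def dynamic_filter_nacs(rules_ft, critical_pairs, is_misleading):
    """Fact 3: repeatedly turn misleading critical pairs into filter NACs
    until no further filter NAC can be found or validated."""
    rules = set(rules_ft)
    while True:
        progress = False
        for cp in critical_pairs(rules):      # cp: P1 <= K => P2 with matches m1, m2
            if is_misleading(cp.p1):
                cp.rule1.add_filter_nac(cp.m1)   # m1 : L1,FT -> K becomes a NAC
                progress = True                  # this kills cp, but new critical
        if not progress:                         # pairs may appear in the next round
            return rules
```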
The filter NACs introduced in this paper on the one hand support the analysis of functional behaviour, and on the other hand they also improve the efficiency of the execution. By definition, the occurrence of a filter NAC at an intermediate model means that the application of the owning rule would lead to a model that cannot be translated completely, i.e. the execution of the model transformation would have to backtrack at a later step. This way, a filter NAC cuts off possible backtracking paths of the model transformation. As presented in Fact 2, some filter NACs can be generated automatically, and using Fact 3 a larger set of them can be obtained based on the generation of critical pairs. Finally, by Thms. 2 and 3 we can completely avoid backtracking if TR_FN has no significant critical pair or, alternatively, if all critical pairs are strictly confluent.

As shown by Fact 4 below, filter NACs do not change the behaviour of model transformations. The only effect is that they filter out derivation paths which would lead to misleading graphs, i.e. to backtracking in the computation of the model transformation sequence. This equivalence is used on the one hand for the analysis of functional behaviour in Thms. 2 and 3, and furthermore for improving the efficiency of the execution of model transformations, as explained in Sec. 4.

Fact 4. Equivalence of Transformations with Filter NACs: Given a triple graph grammar TGG = (TG, ∅, TR) and a triple graph G_0 = (G_S ← ∅ → ∅) typed over TG, let G′_0 = (Att^F(G_S) ← ∅ → ∅). Then the following are equivalent for almost injective matches:
1. ∃ a complete TGT-sequence G′_0 ⇒* G′ via forward translation rules TR_FT.
2. ∃ a complete TGT-sequence G′_0 ⇒* G′ via forward translation rules with filter NACs TR_FN.

Proof Idea. Sequence 1 consists of the same derivation diagrams as Sequence 2. The additional filter NACs in Sequence 2 only prevent a transformation rule from creating a misleading graph. Both sequences lead to completely translated models, such that we know that the matches in Sequence 1 also fulfil the filter NACs of the rules in Sequence 2. The full proof is given in [14].

Theorem 2. Functional Behaviour: Let MT be a model transformation based on forward translation rules TR_FT, and let TR_FN extend TR_FT with filter NACs such that TR_FN is terminating and all critical pairs are strictly confluent. Then MT has functional behaviour. Moreover, the model transformation MT′ based on TR_FN does not require backtracking and defines the same model transformation relation, i.e. MTR′ = MTR.

Proof Idea. The proof (see [14]) is based on a decomposition theorem for triple rule sequences into match-consistent TGT-sequences based on source and forward rules with NACs in [7]. The latter are equivalent to complete TGT-sequences based on forward translation rules without NACs in [15], and with NACs by Fact 1 in [14]. Finally, by Fact 4, complete TGT-sequences via forward translation rules with and without filter NACs are equivalent.

Remark 2. TR_FN is terminating if TR_FT is terminating, and a sufficient condition is given in Thm. 1. Termination of TR_FN together with strict confluence of the critical pairs implies unique normal forms by the Local Confluence Theorem in [18].

If the set of generated critical pairs of a system of forward translation rules with filter NACs TR_FN is empty, we can directly conclude from Thm. 2 that the corresponding system with forward translation rules TR_FT has functional behaviour.

From an efficiency point of view, model transformations should be based on a compact set of rules, because large rule sets usually require more matching attempts before a valid match is found. In the optimal case, the rule set ensures that each transformation sequence of the model transformation is itself unique up to switch equivalence. For this reason, we introduce the notion of strong functional behaviour.

Definition 9. Strong Functional Behaviour of Model Transformations: A model transformation based on forward translation rules TR_FN with filter NACs has strong functional behaviour if for each G_S ∈ L_S ⊆ VL_S there is a G_T ∈ VL_T and a model transformation sequence (G_S, G_0 ⇒*_{tr_FN} G_n, G_T), and each two terminating TGT-sequences G′_0 ⇒*_{tr_FN} G′_n and G′_0 ⇒*_{tr_FN} G′_m are switch-equivalent up to isomorphism.

Remark 3.
1. That the sequences are terminating means that no rule in TR_FN is applicable any more; it is not required that the sequences are complete, i.e. that G′_n and G′_m are completely translated.
2. Strong functional behaviour implies functional behaviour, because G′_n and G′_m being completely translated implies that G′_0 ⇒*_{tr_FN} G′_n and G′_0 ⇒*_{tr_FN} G′_m are terminating TGT-sequences.
3. Two sequences t1 : G_0 ⇒* G_1 and t2 : G_0 ⇒* G_2 are called switch-equivalent, written t1 ≈ t2, if G_1 = G_2 and t2 can be obtained from t1 by switching sequentially independent steps according to the Local Church-Rosser Theorem with NACs [18]. The sequences t1 and t2 are called switch-equivalent up to isomorphism if t1 : G_0 ⇒* G_1 has an isomorphic sequence t1′ : G_0 ⇒* G_2 (using the same sequence of rules) with an isomorphism i : G_1 → G_2, written t1′ = i ∘ t1, such that t1′ ≈ t2. This means in particular that the rule sequence in t2 is a permutation of that in t1.
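Switch equivalence up to isomorphism implies, in particular, that two sequences apply the same multiset of rules and end in isomorphic graphs. This necessary condition is cheap to test; a sketch (ours; a full check would additionally have to verify sequential independence of the switched steps, which is omitted here):

```python
from collections import Counter

def necessary_switch_equivalence(seq1, seq2, isomorphic):
    """Necessary (not sufficient) condition from Remark 3: the rule
    sequence of one run must be a permutation of the other's, and the
    final graphs must be isomorphic."""
    same_rules = Counter(s.rule for s in seq1) == Counter(s.rule for s in seq2)
    return same_rules and isomorphic(seq1[-1].result, seq2[-1].result)
```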
The third main result of this paper shows that strong functional behaviour of model transformations based on forward translation rules with filter NACs can be completely characterized by the absence of "significant" critical pairs.

Definition 10. Significant Critical Pair: A critical pair P_1 ⇐_{tr_1,FN} K ⇒_{tr_2,FN} P_2 for TR_FN is called significant if it can be embedded into a pair of parallel dependent transformation steps G′_1 ⇐_{tr_1,FN} G′ ⇒_{tr_2,FN} G′_2 such that there is G_S ∈ VL_S and G′_0 ⇒*_{tr_FN} G′ with G′_0 = (Att^F(G_S) ← ∅ → ∅).

Theorem 3. Strong Functional Behaviour: A model transformation based on terminating forward translation rules TR_FN with filter NACs has strong functional behaviour and does not require backtracking iff TR_FN has no significant critical pair.

Proof Idea. The proof (see [14]) is based on that of Thm. 2 and the fact that, in the absence of critical pairs, two terminating sequences with the same source can be shown to be switch-equivalent up to isomorphism using the Local Church-Rosser and Parallelism Theorem with NACs in [18].

4. EFFICIENT ANALYSIS AND EXECUTION
Our approach to model transformations based on triple graph grammars (TGGs) with NACs will now be discussed with respect to efficiency, for both the analysis of properties and the execution.

Correctness and Completeness: As shown by Thm. 1, based on [7, 4], model transformations based on TGGs with NACs are correct and complete with respect to the language of integrated models VL generated by the triple rules. Thus, correctness and completeness are ensured by construction.

Termination: As presented in [4], termination is essentially ensured if all triple rules are creating on the source component. This property can be checked statically, automatically and efficiently by checking (R_S \ L_S) ≠ ∅. In Thm. 1 we have given an explicit condition for the forward translation rules to be terminating.

Functional Behaviour: The new concept of filter NACs introduced in this paper provides a powerful basis for reducing the analysis effort w.r.t. functional behaviour. Once termination is shown as explained above, functional behaviour of model transformations based on forward translation rules TR_FT can be checked by generating the critical pairs of the transformation system with AGG [23] and showing strict confluence. The static and dynamic generation of filter NACs (Facts 2 and 3) allows us to eliminate critical pairs. In the best case, all critical pairs disappear, showing the functional behaviour of the model transformation immediately. The new notion of strong functional behaviour of a system based on transformation rules TR_FN with filter NACs is completely characterized by the absence of "significant" critical pairs, such that we can ensure for each source model that the transformation sequence is unique up to switch equivalence. Furthermore, the critical pairs generated by AGG can be used to find the conflicts between the rules which may cause non-functional behaviour of the model transformation. The modeler can decide whether to change the rules or to keep the non-functional behaviour.

[Figure 8: Triple graph instance. A class diagram G_S (classes Company, Person and Customer with a parent edge, an association "employee" and a primary attribute cust_id of type int) together with its corresponding relational model G_T (tables Company and Person, a foreign key with column employee_cust_id and a primary key column cust_id), connected via CT, AC and AFK correspondence nodes.]

Example 7. Functional Behaviour: We analyze the functional behaviour of the model transformation CD2RDBM with the triple rules TR given in Figs. 3 and 5. First of all, CD2RDBM is terminating according to Thm. 1. For analyzing local confluence we can use the tool AGG [23] for the generation of critical pairs. We use the extended rule Class2TableFN as shown in Fig. 7 and extend it by a further filter NAC obtained by the static generation according to Fact 2.
AGG detects two critical pairs showing a conflict of the rule "PrimaryAttr2Column" with itself for an overlapping graph with two primary attributes. Both critical pairs lead to additional filter NACs by the dynamic generation of filter NACs in Fact 3, resulting in a system of forward translation rules with filter NACs without any critical pair. Thus, we can apply Thm. 3 and show that the model transformation based on the forward translation rules with filter NACs TR_FN has strong functional behaviour and does not require backtracking. Furthermore, by Thm. 2 we can conclude that the model transformation based on the forward translation rules TR_FT without filter NACs has functional behaviour and does not require backtracking. As an example, Fig. 8 shows the resulting triple graph (translation attributes are omitted) of a model transformation starting with the class diagram G_S.

Table 1: Benchmark, Tool: AGG [23] (model transformation sequences of CD2RDBM)

  Model size     | without filter NACs        | with filter NACs
  [elements 2)]  | time 1) [ms]  success [%]  | time 1) [ms]  overhead [%]  success [%]
   11            |   143.75        42.86      |   158.33        10.14        100.00
   25            |   302.75        16.84      |   335.45        10.80        100.00
   53            |   672.68         3.94      |   742.62        10.40        100.00
  109            | 1,481.43         0.17      | 1,584.86         6.98        100.00

  1) Average time of 100 successful model transformation sequences
  2) Nodes and edges

Efficient Execution: Filter NACs not only improve the analysis of functional behaviour of a TGG, but also the execution of the model transformation process, by forbidding the application of misleading transformation steps that would lead to a dead end, thereby eliminating the need for backtracking in these cases. Table 1 shows execution times using the transformation engine AGG [23]. The additional overhead caused by filter NACs is fairly small and lies in the area of 10% for the examples in the benchmark, which is based on the average execution times of 100 executions for models with 11, 25, 53 and 109 elements (nodes and edges), respectively. The first model with 11 elements is the class diagram presented in the source component of Fig. 8. We explicitly do not compare the execution times of the system with filter NACs with one particular system with backtracking, because these times can vary heavily depending on the techniques used for partial order reduction and on the chosen examples. Instead we present the computed success rates for the system without NACs, which show that backtracking will cause a substantial overhead in any case. Thus, the listed times concern successful execution paths only, i.e. those executions that lead to a completely translated model; times for the unsuccessful executions, which appear in the system without filter NACs, are not considered. The success rate for transformations without filter NACs decreases fast when considering larger models.
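The overhead column of Table 1 is simply the relative slowdown caused by the additional NAC checks; recomputing it from the listed times (a quick sanity check, ours) reproduces the reported 10.14, 10.80, 10.40 and 6.98 percent:

```python
# Average times in ms from Table 1: size -> (without filter NACs, with filter NACs)
times = {11: (143.75, 158.33), 25: (302.75, 335.45),
         53: (672.68, 742.62), 109: (1481.43, 1584.86)}

for size, (t_plain, t_filter) in sorted(times.items()):
    overhead = (t_filter - t_plain) / t_plain * 100
    print(f"{size:>4} elements: {overhead:5.2f}% overhead")
```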
However, in order to ensure completeness, the system without filter NACs needs backtracking. This backtracking overhead is in general exponential, and in our case study misleading graphs appear already at the beginning of many transformation sequences, implying that backtracking is costly. Backtracking is reduced by filter NACs and avoided completely in the case that no "significant critical pair" remains (see Thm. 3), which we have shown to hold for our example. The additional overhead of about 10% for filter NACs is in most cases much smaller than the effort for backtracking.

Moreover, in order to perform model transformations using highly optimized transformation machines for plain graph transformation, such as Fujaba and GrGen.Net [21], we have presented how the transformation rules and models can be equivalently represented by plain graphs and rules. First of all, triple graphs and morphisms are flattened according to the construction presented in [3, 15], which can be extended to NACs using the flattening construction for morphisms. Furthermore, we presented in this paper how forward rules with NACs are extended to forward translation rules with NACs, such that the control condition "source consistency" [6] and also the gluing condition (Fact 1) are ensured automatically for complete sequences, i.e. they do not need to be checked during the transformation. Summing up, the presented results allow us to combine the easy, intuitive and formally well-founded specification of model transformations based on triple graph grammars with NACs with the best available tools for executing graph transformations, while still ensuring correctness and completeness.

In the following we discuss how the presented results can be used to meet the "Grand Research Challenge of the TGG Community" formulated by Schürr et al. in [20]. The main aims are "Consistency", "Completeness", "Expressiveness" and "Efficiency" of model transformations. The first two effectively require correctness and completeness w.r.t. the triple language VL and, additionally, termination and functional behaviour; they are ensured as shown in Sec. 3. While we considered functional behaviour w.r.t. unique target models, the more general notion in [20], regarding some semantical equivalence of target models, will be part of further extensions of our techniques. "Expressiveness" requires suitable control mechanisms like NACs, which are used extensively in this paper, and we further extend the technique by additional control mechanisms. In [9] more general application conditions [12] are considered, but functional behaviour is not yet analyzed. In general, the overall usage of complex control structures should be kept low, because they may cause complex computations. Finally, we discussed in Sec. 4 that our approach can be executed efficiently based on efficient graph transformation engines. In particular, model transformations fulfilling the conditions of Thm. 3 do not need to backtrack, which bounds the number of transformation steps by the number of elements in the source model, as required in [20].

5. RELATED WORK
Since 1994, several extensions of the original TGG definitions [19] have been published [20, 17, 10], and various kinds of applications have been presented [22, 11, 16]. The formal construction and analysis of model transformations based on TGGs was started in [6] by analyzing information preservation of bidirectional model transformations, and continued in [3, 5, 4, 7, 15]: model transformations based on TGGs are compared with those based on plain graph grammars in [3], TGGs with specification NACs are analyzed in [7], and an efficient on-the-fly construction is introduced in [4]. A first approach to analyzing functional behaviour was presented for restricted TGGs with distinguished kernels in [5], and a more general approach, however without NACs, based on forward translation rules in [15]. The results in this paper for model transformations based on forward translation rules with specification and filter NACs build on the results of all these papers except [5].

In [6] a similar case study based on forward rules is presented, but without using NACs. As a consequence, more TGT-sequences are possible; in particular, an association can be transformed into a foreign key with one primary key even if there is a second primary attribute that will be transformed into a second primary key at a later stage. This behaviour is not desired from the application point of view. Thus, the grammar with NACs in this paper handles primary keys and foreign keys in a more appropriate way. Furthermore, the system has strong functional behaviour as shown in Sec. 3.
6. CONCLUSION
In this paper we have studied model transformations based on triple graph grammars (TGGs) with negative application conditions (NACs) in order to improve the efficiency of analysis and execution compared with previous approaches in the literature. The first key idea is that model transformations can be constructed by applying forward translation rules with NACs, which can be derived automatically from the given TGG-rules with NACs. The first main result shows termination under weak assumptions, correctness and completeness of model transformations in this framework, which is equivalent to the approach in [7].

The second key idea is to introduce filter NACs in addition to the NACs in the given TGG-rules, which in contrast are called specification NACs in this paper. Filter NACs are useful to improve the analysis of functional behaviour for model transformations based on critical pair analysis (using the tool AGG [23]) by filtering out backtracking paths and, this way, some critical pairs. The second main result provides a sufficient condition for functional behaviour based on the analysis of critical pairs for forward translation rules with filter NACs. If we are able to construct filter NACs such that the corresponding rules have no more "significant" critical pairs, then the third main result shows that we have strong functional behaviour, i.e. not only are the results unique up to isomorphism, but also the corresponding model transformation sequences are switch-equivalent up to isomorphism. Surprisingly, we can show that the condition "no significant critical pairs" is not only sufficient, but also necessary for strong functional behaviour. Finally, we discuss efficiency aspects of analysis and execution of model transformations and show that our sample model transformation CD2RDBM based on TGG-rules with NACs has strong functional behaviour.

The main challenge in applying our main results on functional and strong functional behaviour is to find suitable filter NACs, such that we have a minimal number of critical pairs for the forward translation rules with filter NACs. For this purpose, we provide static and dynamic techniques for the generation of filter NACs (see Facts 2 and 3). The dynamic technique includes a check that certain models are misleading.
In any case, the designer of the model transformation can also specify some filter NACs directly, provided the filter NAC property can be ensured. Furthermore, we can avoid backtracking completely by Thms. 2 and 3 if TR_FN has no significant critical pair or, alternatively, if all critical pairs are strictly confluent.

In future work we will study further static conditions for checking whether a model is "misleading", because this allows misleading execution paths to be filtered out. In addition, we are currently developing extensions to layered model transformations and amalgamated rules, which allow backtracking to be reduced further in general cases and the underlying rule sets to be simplified. Moreover, we study applications to model transformations that only partially relate two DSLs, where some node types are irrelevant for the model transformation.

7. REFERENCES
[1] Ehrig, H., Ehrig, K., Hermann, F.: From Model Transformation to Model Integration based on the Algebraic Approach to Triple Graph Grammars. In: Ermel, C., de Lara, J., Heckel, R. (eds.) Proc. GT-VMT'08. EC-EASST, vol. 10. EASST (2008)
[2] Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic Graph Transformation. EATCS Monographs, Springer (2006)
[3] Ehrig, H., Ermel, C., Hermann, F.: On the Relationship of Model Transformations Based on Triple and Plain Graph Grammars. In: Karsai, G., Taentzer, G. (eds.) Proc. GraMoT'08. ACM (2008)
[4] Ehrig, H., Ermel, C., Hermann, F., Prange, U.: On-the-Fly Construction, Correctness and Completeness of Model Transformations based on Triple Graph Grammars. In: Schürr, A., Selic, B. (eds.) Proc. ACM/IEEE MODELS'09. LNCS, vol. 5795, pp. 241–255. Springer (2009)
[5] Ehrig, H., Prange, U.: Formal Analysis of Model Transformations Based on Triple Graph Rules with Kernels. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) Proc. ICGT'08. LNCS, vol. 5214, pp. 178–193. Springer (2008)
[6] Ehrig, H., Ehrig, K., Ermel, C., Hermann, F., Taentzer, G.: Information preserving bidirectional model transformations. In: Dwyer, M.B., Lopes, A. (eds.) Proc. FASE'07. LNCS, vol. 4422, pp. 72–86. Springer (2007)
[7] Ehrig, H., Hermann, F., Sartorius, C.: Completeness and Correctness of Model Transformations based on Triple Graph Grammars with Negative Application Conditions. In: Heckel, R., Boronat, A. (eds.) Proc. GT-VMT'09. EC-EASST, vol. 18. EASST (2009)
[8] Giese, H., Wagner, R.: From model transformation to incremental bidirectional model synchronization. Software and Systems Modeling 8(1), 21–43 (2009)
[9] Golas, U., Ehrig, H., Hermann, F.: Enhancing the Expressiveness of Formal Specifications for Model Transformations by Triple Graph Grammars with Application Conditions. In: Proc. Int. Workshop on Graph Computation Models (GCM'10) (2010)
[10] Guerra, E., de Lara, J.: Attributed typed triple graph transformation with inheritance in the double pushout approach. Tech. Rep. UC3M-TR-CS-2006-00, Universidad Carlos III, Madrid, Spain (2006)
[11] Guerra, E., de Lara, J.: Model view management with triple graph grammars. In: Corradini, A., Ehrig, H., Montanari, U., Ribeiro, L., Rozenberg, G. (eds.) Proc. ICGT'06. LNCS, vol. 4178, pp. 351–366. Springer (2006)
[12] Habel, A., Pennemann, K.H.: Correctness of high-level transformation systems relative to nested conditions. Mathematical Structures in Computer Science 19, 1–52 (2009)
[13] Hermann, F., Ehrig, H., Golas, U., Orejas, F.: Formal Analysis of Functional Behaviour for Model Transformations Based on Triple Graph Grammars - Extended Version. Tech. Rep. 2010-8, TU Berlin, Fak. IV (2010)
[14] Hermann, F., Ehrig, H., Golas, U., Orejas, F.: Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars - Extended Version. Tech. Rep. 2010-13, TU Berlin, Fak. IV (2010)
[15] Hermann, F., Ehrig, H., Orejas, F., Golas, U.: Formal Analysis of Functional Behaviour of Model Transformations Based on Triple Graph Grammars. In: Proc. Int. Conf. on Graph Transformation (ICGT'10). LNCS, vol. 6372, pp. 155–170. Springer (2010)
[16] Kindler, E., Wagner, R.: Triple graph grammars: Concepts, extensions, implementations, and application scenarios. Tech. Rep. TR-ri-07-284, Department of Computer Science, University of Paderborn, Germany (2007)
[17] Königs, A., Schürr, A.: Tool Integration with Triple Graph Grammars - A Survey. In: Proc. SegraVis School on Foundations of Visual Modelling Techniques. ENTCS, vol. 148, pp. 113–150. Elsevier Science (2006)
[18] Lambers, L.: Certifying Rule-Based Models using Graph Transformation. Ph.D. thesis, Technische Universität Berlin (2009)
[19] Schürr, A.: Specification of Graph Translators with Triple Graph Grammars. In: Tinhofer, G. (ed.) Proc. WG'94. LNCS, vol. 903, pp. 151–163. Springer (1994)
[20] Schürr, A., Klar, F.: 15 years of triple graph grammars. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) Proc. ICGT'08. LNCS, pp. 411–425. Springer (2008)
[21] Taentzer, G., Biermann, E., Bisztray, D., Bohnet, B., Boneva, I., Boronat, A., Geiger, L., Geiß, R., Horvath, A., Kniemeyer, O., Mens, T., Ness, B., Plump, D., Vajk, T.: Generation of Sierpinski Triangles: A Case Study for Graph Transformation Tools. In: Schürr, A., Nagl, M., Zündorf, A. (eds.) Proc. AGTIVE'07. LNCS, vol. 5088, pp. 514–539. Springer (2008)
[22] Taentzer, G., Ehrig, K., Guerra, E., de Lara, J., Lengyel, L., Levendovszky, T., Prange, U., Varro, D., Varro-Gyapay, S.: Model Transformation by Graph Transformation: A Comparative Study. In: Proc. MoDELS 2005 Workshop MTiP'05 (2005)
[23] TFS-Group, TU Berlin: AGG (2009), http://tfs.cs.tu-berlin.de/agg

Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities*

M. Wimmer, TU Vienna, wimmer@big.tuwien.ac.at
G. Kappel, TU Vienna, gerti@big.tuwien.ac.at
A. Kusel, JKU Linz, kusel@bioinf.jku.at
W. Retschitzegger, JKU Linz, werner@bioinf.jku.at
J. Schoenboeck, TU Vienna, schoenboeck@bioinf.jku.at
W. Schwinger, JKU Linz, wieland@jku.at

* This work has been funded by the Austrian Science Fund (FWF) under grant P21374-N13.

ABSTRACT
A crucial prerequisite for the success of Model Driven Engineering (MDE) is the seamless exchange of models between different modeling tools, demanding mappings between tool-specific metamodels. The resolution of heterogeneities between these tool-specific metamodels is thereby a ubiquitous problem representing the key challenge. Nevertheless, no comprehensive classification of potential heterogeneities is available in the domain of MDE. This hinders the specification of a comprehensive benchmark explicating requirements wrt. the expressivity of mapping tools, which provide reusable components for resolving these heterogeneities. Therefore, we propose a feature-based classification of heterogeneities, which accordingly adapts and extends existing classifications. This feature-based classification builds the basis for a mapping benchmark, thereby providing a comprehensive set of requirements concerning the expressivity of dedicated mapping tools. In this paper a first set of benchmark examples is presented by means of metamodels and conforming models, acting as an evaluation suite for mapping tools.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability

General Terms
Measurement

Keywords
Classification of Heterogeneities, Mapping Benchmark

1. INTRODUCTION
With the rise of MDE, models become the main artifacts of the software development process [3]. Hence, a multitude of modeling tools is available supporting different tasks, such as model creation, model simulation, model checking, model transformation, and code generation. Seamless exchange of models among different modeling tools increasingly becomes a crucial prerequisite for effective MDE. Due to the lack of interoperability, however, it is often difficult to use tools in combination, and thus the potential of MDE cannot be fully exploited. For achieving interoperability in terms of transparent model exchange, current best practices comprise creating model transformations between different tool metamodels (MMs), with the main drawback of having to deal with all the intricacies of a certain transformation language. In contrast to that, first mapping tools [6, 18] have been proposed, allowing a transformation to be specified on a more abstract level by means of reusable components.
Out of the resulting mapping definitions, corresponding executable transformation code can be generated. In the definition of a mapping between MMs, the resolution of heterogeneities represents the key challenge. Heterogeneities result from the fact that semantically similar metamodeling concepts (M2) can be defined with different meta-metamodeling concepts (M3), leading to differently structured metamodels. As a simple example, Fig. 1 shows two metamodels of fictitious¹ domain-specific tools administrating publications. Whereas the MM of Tool1 models the type of a publication by the attribute Publication.kind (e.g., conference, workshop or journal), the MM of Tool2 represents the same semantics using the class Publication, which refers to a class Kind to determine the kind of the publication.

[Figure 1: Two Heterogeneous Tool Metamodels. MM of Tool1: class Publication with attributes name:String and kind:Integer; MM of Tool2: class Publication with attribute name:String and a reference kind (1..1) to a class Kind with attribute name:String.]

¹ For reasons of comprehensibility, examples comprising ontological concepts have been preferred over examples comprising linguistic concepts.

In order to resolve such heterogeneities, mapping tools provide certain reusable components. Nevertheless, it is still unclear which kinds of reusable components are required to provide the necessary expressivity. Therefore this paper provides a systematic classification of heterogeneities occurring in the domain of MDE between object-oriented MMs,
thereby adapting and extending existing classifications [2, 4, 10, 11, 12, 13, 15, 17]. Moreover, this classification is used to derive an evaluation suite constituting an expressivity benchmark for mapping tools, of which a first set of examples is presented in this paper. Additional heterogeneity examples can be downloaded from our homepage², complementing the expressivity benchmark.

² www.modeltransformation.net

The remainder of this paper is structured as follows. In Section 2 we present the design rationale behind our classification as well as the feature-based classification itself. In Sections 3-5 we discuss heterogeneities by example, presenting six examples of our expressivity benchmark. Related work is discussed in Section 6, and finally Section 7 concludes the paper together with an outlook on future work.

2. TOWARDS A SYSTEMATIC CLASSIFICATION OF HETEROGENEITIES
This section presents the design rationale behind the proposed classification of heterogeneities as well as the classification itself. Since the classification targets the domain of MDE, it is based on object-oriented MMs, in contrast to existing classifications from the domain of data engineering, which are based either on the relational or the XML data model. To make the interconnections between heterogeneities explicit, we build our classification on a feature model [5].
2.1 Deriving Heterogeneities from Ecore
Heterogeneities result from the fact that semantically similar concepts can be defined with different metamodeling concepts (e.g., Ecore³), leading to differently structured tool metamodels. To exemplify this, Fig. 2 depicts the MMs of Fig. 1 as Ecore instances. Thereby several heterogeneities arise; e.g., the MM of Tool1 represents the publication kind by an EAttribute, whereas the MM of Tool2 utilizes an EReference, an EClass and an EAttribute to represent the semantically equivalent information.

³ http://www.eclipse.org/modeling/emf/

[Figure 2: Tool Metamodels as Instances of Ecore. Both MMs of Fig. 1 in concrete syntax and as instances of the Ecore abstract syntax: EClass, EAttribute, EReference and EDataType objects connected via eStructuralFeatures, eAttributeType and eReferenceType links.]

To gain a systematic classification of the different kinds of syntactic heterogeneities, we investigated potential variation points between two Ecore-based metamodels (cf. Fig. 3). Ecore has been used since it is the prevalent meta-metamodel in MDE and since it comprises the core concepts of semantic data models [9], namely classes, attributes, references and inheritance. Therefore the proposed classification can also be applied to other data models comprising these common core concepts, e.g., OWL⁴.

⁴ http://www.w3.org/TR/owl-features/

[Figure 3: Variation Points in Ecore-based MMs. The relevant Ecore extract, ENamedElement (name), ETypedElement (ordered, lowerBound, upperBound), EClassifier, EClass (abstract, eSuperTypes), EStructuralFeature, EAttribute, EReference (containment) and EDataType, annotated with the resulting difference kinds, e.g., naming, order, multiplicity, concreteness, inheritance type, breadth, depth, context, datatype, direction, containment and constraint differences.]

In this respect, Fig. 3 depicts the relevant extract of the Ecore meta-metamodel for mappings. When comparing two Ecore-based metamodels, different cases can be distinguished, namely (i) that in the left-hand side (LHS) MM and in the right-hand side (RHS) MM the same Ecore concept is used, whereby differences wrt. the owned attribute settings can arise, e.g., if two EClasses are used, one can be set abstract whereas the other is not, leading to a concreteness difference. Moreover, (ii) in the LHS MM and in the RHS MM different Ecore concepts may be used, e.g., an EAttribute in the LHS MM and an EReference, an EClass and an EAttribute in the RHS MM (cf. example in Fig. 2). Finally, (iii) both cases get more complex if the number of Ecore concepts used for modeling a certain MM concept differs. A simple example in this respect is that one MM uses two EAttributes firstName and lastName, whereas in the other MM this information is contained in just one EAttribute name.
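Case (ii) is exactly the heterogeneity of Figs. 1 and 2. A hypothetical sketch (ours; the integer-to-kind-name table and all identifiers are invented for illustration) of a reusable component resolving this attribute-to-class difference when transforming Tool1 models into Tool2 models:

```python
# Tool1 represents the publication kind as an integer attribute, Tool2 as
# a reference to a Kind object; the mapping below materializes one Kind
# instance per value and rewires each publication to it.

KIND_NAMES = {0: "conference", 1: "workshop", 2: "journal"}  # assumed encoding

def tool1_to_tool2(publications):
    kinds = {}                                    # reuse one Kind per name
    result = []
    for pub in publications:                      # pub: {"name": str, "kind": int}
        kind_name = KIND_NAMES[pub["kind"]]
        kind = kinds.setdefault(kind_name, {"name": kind_name})
        result.append({"name": pub["name"], "kind": kind})   # 1..1 reference
    return result
```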
Thus, semantic heterogeneities cannot be derived from the syntax (since in both cases the MMs can be syntactically equal) but only by incorporating interpretation, i.e., an assignment of a meaning to each piece of data [8].

2.2 Classification of Heterogeneities
Based on this design rationale, we introduce a classification of heterogeneities. It is expressed using the feature model formalism [5], which makes it possible to clearly point out the interconnections between the different kinds of heterogeneities (e.g., xor features modeling mutually exclusive features versus or features allowing several features to be picked at once). Heterogeneities are divided into the two main classes of (i) semantic heterogeneities, i.e., heterogeneities wrt. what is represented by a MM, and (ii) syntactic heterogeneities, i.e., heterogeneities wrt. how it is represented (cf. Fig. 4); these two classes may occur jointly, as modeled by the or relationship in between.

Figure 4: Heterogeneity Feature Model

Semantic Heterogeneities. Concerning semantic heterogeneities, as mentioned above, two main cases can be distinguished, namely (i) differences in the number of valid instances and (ii) differences in the interpretation of the instance values. With respect to the first case, all the set-theoretic relationships (intersection, subset, superset, disjointness) might occur, as modeled by the corresponding sub-features. Regarding the second case, diverse modifications of the values might be necessary to translate the values of one MM into values that conform to the interpretation of the other MM.

Syntactic Heterogeneities. With respect to syntactic heterogeneities, we distinguish between simple naming differences (i.e., a difference in the value of the name attribute of ENamedElement, cf. Fig. 3) and more challenging structural differences. Although names play an important role when deriving the semantics of a certain concept, names alone do not allow one to conclude on the semantics automatically. Two cases can be distinguished: (i) same semantics but different naming, i.e., synonyms, and (ii) different semantics but same naming, i.e., homonyms. With respect to structural differences, again two main cases can be distinguished, namely core concept differences and inheritance differences. Core concept differences are differences that occur due to the different usage of classes, attributes and references between two MMs.
In addition, these two main categories can be further divided into same metamodeling concept heterogeneities and different metamodeling concept heterogeneities, depending on whether or not the same Ecore concepts have been used in the LHS MM and in the RHS MM. In the context of core concept differences, additionally a different number of concepts may have been used in the two MMs, leading to different source-target-concept cardinalities.

In the following sections a first set of benchmark examples is given, divided into three main packages comprising (i) core concept heterogeneities with same metamodeling concepts, (ii) core concept heterogeneities with different metamodeling concepts, and (iii) inheritance heterogeneities. Due to space limitations, only a subset of all potential heterogeneities is explained in detail by means of concrete metamodels and corresponding model instances, but examples from each main category are given. The benchmark examples are described uniformly, comprising (i) a short description, (ii) the main challenges, (iii) the example description, and (iv) a discussion of resolution strategies. Complementary benchmark examples are presented on our collaborative homepage, which invites the community to participate in adding and discussing benchmark examples.

3. CORE CONCEPT HETEROGENEITIES – SAME CONCEPTS
Same metamodeling concept heterogeneities are heterogeneities that occur although the same modeling concept has been used in the LHS MM as well as in the RHS MM, as mentioned above. In this respect, two main differences might emerge: either the concepts exhibit different attribute settings (cf. Fig. 3) or a different number of concepts has been used in the MMs to express the same semantic concept (cf. Source-Target-Concept Cardinality in Fig. 4). In the following, two examples of this category are given.

3.1 Benchmark Example 1
This first example (cf. Fig. 5) only exhibits differences wrt. different attribute settings (cf. the optional features of A(ttribute)2A(ttribute) and R(eference)2R(eference) in Fig. 4) as well as semantic heterogeneities. The main challenges in this example can be summarized as follows:
1. EAttribute Professor.dateOfBirth – EAttribute Prof.bornIn: A2A, Multiplicity Difference, Datatype Difference
2. EAttribute Professor.salary – EAttribute Prof.salary: A2A, Semantic Heterogeneity (Interpretation of Instance Values Difference)
3. EReference Professor.publications – EReference Prof.journals: R2R, Multiplicity Difference
4. EClass Publication – EClass Journal: C2C, Semantic Heterogeneity (Number of Instances Difference)

Figure 5: Benchmark Example 1 – Same Metamodeling Concept Heterogeneities

Example Description. This first benchmark example (cf. Fig. 5) exhibits four main challenges. With respect to the first challenge, a multiplicity difference as well as a datatype difference between the EAttributes Professor.dateOfBirth and Prof.bornIn arise.
Concerning the second challenge, a semantic heterogeneity between the EAttributes Professor.salary and Prof.salary emerges, since Professor.salary is encoded in Dollar whereas Prof.salary is encoded in Euro, i.e., a difference in the interpretation of the values. Regarding the third challenge, a multiplicity difference between the EReferences Professor.publications and Prof.journals exists. Finally, the fourth challenge again incorporates a semantic heterogeneity, but this time a difference in the number of valid instances. For resolving the differences of the first three challenges, corresponding functions are required which are able either to generate values or to transform values. In contrast, for resolving the heterogeneity of the fourth challenge a corresponding condition is needed that filters those instances that are still valid in the context of the RHS EClass.

Discussion of Resolution Strategies. When taking a look at the example instances, one can see that a resolution strategy has been chosen that minimizes information loss and achieves valid instances only: instance P2 has been kept in the RHS model although it does not reference any journal publication in the LHS model. Another potential resolution strategy would be to keep only those Professor instances that actually exhibit a journal publication; in that case, a semantic heterogeneity between the EClasses Professor and Prof would also exist, since the valid instance sets would potentially differ. Another interesting point in this example is that the RHS MM is more restrictive than the LHS MM, since the EAttribute Prof.bornIn always requires a value and since each instance of Prof requires at least one link to a journal publication. Since these restrictions do not exist in the LHS MM, instances of the LHS MM may not fulfill them. Therefore some resolution strategy is needed, either auto-generating values or incorporating user interaction, in order to produce valid instances of the RHS MM.
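To make the required resolution functions concrete, the following minimal Python sketch replays the chosen strategy on plain-dictionary stand-ins for the models of Fig. 5. The conversion rate, the year-extraction rule and the placeholder values are illustrative assumptions (chosen so that the outputs reproduce the instance values shown in the figure), not part of the benchmark itself.

# Hypothetical resolution of Benchmark Example 1: Professor -> Prof.
# Models are plain dictionaries; rate and defaults are illustrative.
USD_TO_EUR = 0.794  # assumed conversion rate behind the salaries in Fig. 5

def professor_to_prof(professor):
    # Challenge 1: dateOfBirth [0..1] (Date) -> bornIn [1..1] (Integer).
    # Extract the year if a date is present; otherwise a value must be
    # auto-generated (or requested via user interaction) to satisfy the
    # RHS multiplicity.
    date = professor.get("dateOfBirth")
    born_in = int(date.split(".")[-1]) if date else 2000  # placeholder year
    # Challenge 2: salary values are reinterpreted from Dollar to Euro.
    salary = round(professor["salary"] * USD_TO_EUR)
    # Challenges 3/4: keep only journal publications; if none remain,
    # a placeholder Journal is generated to satisfy the 1..* multiplicity
    # (cf. the auto-generated P0:Journal in Fig. 5).
    journals = [p["name"] for p in professor["publications"]
                if p["type"] == "Journal"]
    if not journals:
        journals = ["TODO"]
    return {"name": professor["name"], "bornIn": born_in,
            "salary": salary, "journals": journals}

p2 = {"name": "Prof2", "dateOfBirth": None, "salary": 3000,
      "publications": [{"name": "Paper2", "type": "Journal"}]}
print(professor_to_prof(p2))
# {'name': 'Prof2', 'bornIn': 2000, 'salary': 2382, 'journals': ['Paper2']}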
3.2 Benchmark Example 2
In contrast to the first example, which restricts itself to source-target-concept cardinalities of 1:1, this example (cf. Fig. 6) additionally contains differences wrt. the number of concepts (cf. Source-Target-Concept Cardinality in Fig. 4). The main challenges in this example can be summarized as follows:
1. EAttribute Publication.title, EAttribute Publication.subtitle – EAttribute Publication.name: Source-Target-Concept Cardinality n:1, A2A
2. EClass Publication, EClass Kind – EClass Publication: Source-Target-Concept Cardinality n:1, C2C
3. EAttribute Kind.name – EAttribute Publication.kind: A2A, Context Difference

Figure 6: Benchmark Example 2 – Same Metamodeling Concept Heterogeneities

Example Description. This benchmark example (cf. Fig. 6) possesses three challenges. Concerning the first challenge, there is an n:1 source-target-concept cardinality between the EAttributes title, subtitle and name. In order to resolve this heterogeneity, merging functionality is needed, which in this case is basically a concatenation function. Concerning the second challenge, again an n:1 source-target-concept cardinality exists, but this time between the EClasses Publication, Kind and Publication. Therefore, again merging functionality is needed, allowing objects to be merged under a certain condition. Finally, the third challenge consists in a context difference between the EAttributes Kind.name and Publication.kind. For its resolution, the assignment of values across object boundaries is needed.

Discussion of Resolution Strategies. When taking a look at the example instances in Fig. 6, one can see that for each combination of a Publication object and the referenced Kind object a RHS Publication object should be generated. Concerning the merge of the attributes, different strategies could be followed; in this case a simple concatenation has been chosen. Other strategies comprise another concatenation order, and in case of other datatypes (e.g., numbers) arbitrary calculations could be incorporated.
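As an illustration of the merging functionality, the following Python sketch (again over plain dictionaries, with an assumed separator mirroring the instances of Fig. 6) concatenates the attribute values and assigns the kind across object boundaries.

# Hypothetical resolution of Benchmark Example 2 (n:1 cardinalities).
# The concatenation separator is an assumption mirroring the RHS
# instances in Fig. 6 (e.g., 'P1 - S1').
def merge_publication(publication, kind):
    # Challenge 1 (A2A, n:1): concatenate title and subtitle into name.
    name = f"{publication['title']} - {publication['subtitle']}"
    # Challenge 2 (C2C, n:1): the Publication object and its Kind object
    # are merged into a single RHS Publication object.
    # Challenge 3 (context difference): Kind.name is assigned across
    # object boundaries to Publication.kind.
    return {"name": name, "kind": kind["name"]}

k1 = {"name": "Journal"}
p1 = {"title": "P1", "subtitle": "S1", "kind": k1}
print(merge_publication(p1, p1["kind"]))  # {'name': 'P1 - S1', 'kind': 'Journal'}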
4. CORE CONCEPT HETEROGENEITIES – DIFFERENT CONCEPTS
Different metamodeling concept heterogeneities result from expressing the same semantic concept with different modeling concepts in the LHS MM and in the RHS MM. In our classification, potential heterogeneities were derived by systematically combining the identified core concepts of semantic data models. To exemplify these heterogeneities, two benchmark examples are discussed in the following.

4.1 Benchmark Example 3
The third example (cf. Fig. 7) deals with the fact that a concept is modeled in the LHS MM by means of an EAttribute whereas the RHS MM models this concept explicitly by means of an EClass. The main challenges in this example can be summarized as follows:
1. EAttribute Publication.kind – EClass Kind: A2C
2. EClass Publication, EAttribute Publication.kind – EReference Publication.kind: CA2R

Figure 7: Benchmark Example 3 – Different Metamodeling Concept Heterogeneities (A2C, CA2R)

Example Description. The first challenge is that the kind of the publication is represented by means of the EAttribute Publication.kind in the LHS MM, whereas the RHS MM makes the kind explicit by means of the EClass Kind; this heterogeneity is therefore classified as A(ttribute)2C(lass) in Fig. 7. In order to link publications with the publication kind, the RHS MM provides the EReference Publication.kind, for which there is no corresponding counterpart in the LHS MM, i.e., the RHS links have to be generated, representing the second challenge of the example. In order to establish such additional links in the RHS, the information is needed in which relation the concepts to be linked stood in the LHS MM. With respect to this example, the source of the EReference Publication.kind is represented in the LHS MM by means of the EClass Publication and the target of the EReference by means of the EAttribute Publication.kind. Therefore, this heterogeneity is classified as C(lass)A(ttribute)2R(eference), whereby the first letter denotes the LHS concept used for the source of the reference to be generated and the second letter the LHS concept used for its target.
Discussion of Resolution Strategies. When taking a look at the example instances, one can see that the desired intention of an A2C heterogeneity is that a Kind object should be generated only for each distinct Publication.kind attribute value. Therefore, the RHS model exhibits only a single object named Journal (cf. K1 in Fig. 7), which is referenced by the Publication objects P1 and P2.
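A minimal Python sketch of this resolution strategy, assuming plain-dictionary models, creates one Kind object per distinct attribute value and generates the required kind links:

# Hypothetical resolution of Benchmark Example 3 (A2C, CA2R).
# For each distinct Publication.kind value exactly one Kind object is
# created; the kind links are then generated from the LHS attribute values.
def lift_kind_to_class(publications):
    kinds = {}  # one Kind object per distinct attribute value
    for p in publications:
        kinds.setdefault(p["kind"], {"name": p["kind"]})
    # LHS name maps to RHS title (naming difference); the generated
    # reference implements the CA2R heterogeneity.
    rhs_pubs = [{"title": p["name"], "kind": kinds[p["kind"]]}
                for p in publications]
    return rhs_pubs, list(kinds.values())

lhs = [{"name": "P1", "kind": "Journal"},
       {"name": "P2", "kind": "Journal"},
       {"name": "P3", "kind": "Conference"}]
pubs, kinds = lift_kind_to_class(lhs)
print([k["name"] for k in kinds])  # ['Journal', 'Conference'] - K1 is shared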
4.2 Benchmark Example 4
Whereas the previous example exhibited the heterogeneity that a LHS concept is modeled by means of an EAttribute and the corresponding RHS concept by means of an EClass, the following example (cf. Fig. 8) exhibits the heterogeneity that a LHS concept is modeled by means of an EReference whereas the equivalent RHS concept is again represented by an EClass. The main challenges in this example are:
1. EReference Professor.publications – EClass DBLPEntry: R2C
2. EReference Professor.publications – EAttribute DBLPEntry.id: R2A
3. EClass Professor, EReference Professor.publications – EReference Professor.entries: CR2R
4. EReference Professor.publications, EClass Publication – EReference DBLPEntry.publication: RC2R

Figure 8: Benchmark Example 4 – Different Metamodeling Concept Heterogeneities (R2C, R2A)

Example Description. Whereas the EClass Professor in the LHS MM in Fig. 8 has a direct EReference Professor.publications, the RHS MM offers this information only indirectly by means of the EClass DBLPEntry and its EReference DBLPEntry.publication, representing the first challenge in this example (cf. the R(eference)2C(lass) feature value in Fig. 4). Concerning the second challenge, values for the EAttribute DBLPEntry.id have to be generated. Since the containing RHS EClass is generated on the basis of the LHS EReference Professor.publications, the corresponding EAttribute also has to be generated on the basis of this EReference (cf. the R(eference)2A(ttribute) feature value in Fig. 4). With respect to the third and fourth challenges, the corresponding links have to be established. For this, again the information is needed in which relation the concepts to be linked stood in the LHS MM, as described above. Concerning the Professor.entries EReference, the source of the EReference (Professor) is generated on the basis of the LHS EClass Professor and the target of the EReference (DBLPEntry) on the basis of the EReference Professor.publications; thus this heterogeneity is classified as C(lass)R(eference)2R(eference). A similar situation occurs for the RHS EReference DBLPEntry.publication, but in this case the source of the EReference is based on an EReference and the target on an EClass, a heterogeneity classified as R(eference)C(lass)2R(eference).

Discussion of Resolution Strategies. The challenge in this benchmark example is to obtain objects conforming to the RHS EClass DBLPEntry (cf. the example instances in Fig. 8). These RHS objects have to be created on the basis of the LHS links, since these links encode the information which publications belong to which professor, which is also the task of the DBLPEntry objects. Therefore, Fig. 8 depicts four DBLPEntry objects which originate from the four LHS Professor.publications links. To set the DBLPEntry.id value, a function is needed which generates an appropriate id, whereby again for every LHS link a corresponding RHS value should be created.
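The following Python sketch illustrates this link-reification strategy on plain dictionaries; the sequential id-generation scheme is an assumption, as the benchmark only requires that one value be created per LHS link.

# Hypothetical resolution of Benchmark Example 4 (R2C, R2A, CR2R, RC2R).
# Every LHS publications-link becomes one DBLPEntry object; ids are
# auto-generated per link (an assumed id-generation strategy).
def reify_links(professors):
    entries, next_id = [], 1
    for prof in professors:
        prof_entries = []
        for pub in prof["publications"]:
            entry = {"id": next_id, "publication": pub}  # R2C + R2A + RC2R
            next_id += 1
            prof_entries.append(entry)   # CR2R: Professor.entries
            entries.append(entry)
        prof["entries"] = prof_entries
        del prof["publications"]         # the direct reference disappears
    return professors, entries

p10, p11, p12 = {"name": "P1"}, {"name": "P2"}, {"name": "P3"}
profs = [{"name": "Prof1", "publications": [p10, p11]},
         {"name": "Prof2", "publications": [p11, p12]}]
profs, entries = reify_links(profs)
print(len(entries))  # 4 - one DBLPEntry per LHS link (cf. D1-D4 in Fig. 8)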
5. INHERITANCE HETEROGENEITIES
In the previous sections we discussed potential heterogeneities considering the metamodeling concepts of classes, attributes and references. Finally, heterogeneities might also be caused by the concept of inheritance. In this respect, we again distinguish between heterogeneities that might occur although both MMs use inheritance (cf. same metamodeling concept inheritance differences in Fig. 4) and heterogeneities that occur if only one MM makes use of inheritance (cf. different metamodeling concept inheritance differences in Fig. 4). Similar to the aforementioned same metamodeling concept differences (cf. Section 3), same metamodeling concept inheritance differences occur due to different attribute values or links in the Ecore MMs (cf. Fig. 3), whereas the latter heterogeneities occur if an inheritance hierarchy in one MM is expressed by other concepts (i.e., classes, attributes, and references) in the other MM. In the following, one example per category is given.

5.1 Benchmark Example 5
This example (cf. Fig. 9) belongs to the same metamodeling concept category, and therefore both MMs make use of inheritance. Nevertheless, certain heterogeneities occur, comprising breadth differences, depth differences and concreteness differences. The main challenges in this example can be summarized as follows:
1. EClass FullProf, EClass AssistantProf – EClass FullProf: I2I, Breadth Difference
2. EClass Assistant – EClass Assistant: I2I, Concreteness Difference, Depth Difference
3. EClass PrePhd, EClass PostPhd – No corresponding EClass: I2I, Breadth Difference

Figure 9: Benchmark Example 5 – Same Metamodeling Concept Heterogeneities

Example Description. Concerning the first challenge, a breadth difference between the LHS EClasses FullProf, AssistantProf and the RHS EClass FullProf exists, since the number of sibling classes in the context of a certain parent class differs. For resolving breadth differences, the strategy can be applied to map instances of a class existing only in the LHS MM to a concrete parent class in the RHS MM. Nevertheless, since the parent classes of the EClass AssistantProf are abstract, instances of AssistantProf get lost. With respect to the second challenge, a concreteness difference as well as a depth difference occurs between the two EClasses Assistant. This is because the EClass Assistant in the LHS MM is set abstract whereas the corresponding EClass Assistant in the RHS MM is concrete. Additionally, a depth difference exists, since the longest path of subclasses in the context of the EClass Assistant is 1 in the LHS MM but 0 in the RHS MM. For resolving the concreteness difference no strategy is needed in this example, since the LHS class is abstract and therefore no instances of it can exist.
The situation would be different if it were inverse: then instances might be lost if no concrete class in the RHS MM could be found to accommodate them. For resolving the depth difference, the strategy can be pursued to map instances of the classes existing only in the LHS MM to some concrete parent class in the RHS MM. Therefore, in this case the instances of the EClasses PrePhd and PostPhd result in instances of the parent EClass Assistant in the RHS MM. Finally, regarding the third challenge, a breadth difference between the EClasses PrePhd and PostPhd and the non-existing RHS classes exists. Since in this case the breadth difference overlaps with the depth difference of challenge 2 (the EClass Assistant in the RHS MM exhibits no subclasses at all), no additional resolution strategy is needed here.

Discussion of Resolution Strategies. When taking a look at the chosen resolution strategies, one can see that a strategy has been chosen that tries to minimize instance loss and thus information loss. Therefore, instances of a class that exists only in the LHS MM should be kept by mapping them to some concrete parent class, exploiting the is-a relationship between the classes. Nevertheless, the explicit type information and additional features owned only by the subclass are lost. Therefore, a strategy that omits these instances might sometimes be useful as well.

5.2 Benchmark Example 6
This example (cf. Fig. 10) belongs to the different metamodeling concept category, and therefore only one MM makes use of inheritance. The main challenge in this example can be summarized as follows:
1. EAttribute ResearchStaff.kind – EClasses ResearchStaff, Professor, Assistant and FullProf in an inheritance hierarchy: A2I

Figure 10: Benchmark Example 6 – Different Metamodeling Concept Heterogeneities (A2I)

Example Description. With respect to the main challenge in this example, an A(ttribute)2I(nheritance) heterogeneity between the EAttribute ResearchStaff.kind and the EClasses ResearchStaff, Professor, Assistant and FullProf occurs. For resolving this kind of heterogeneity, a condition is needed to divide the instances of the EClass ResearchStaff according to the values of the EAttribute kind, in order to instantiate the corresponding RHS classes. Thereby the problem may arise that the EAttribute of the LHS MM comprises values that do not correspond to any (concrete) EClass in the RHS MM. This is the case in the example with the instance R1, since the corresponding EClass Professor in the RHS MM is abstract and can thus not be instantiated, causing information loss.

Discussion of Resolution Strategies. Concerning the resolution strategy chosen in this example, again information loss should be prevented whenever possible. Nevertheless, as already discussed above, this may not always be possible.
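A minimal Python sketch of this A2I resolution, assuming plain-dictionary instances and a hard-coded set of concrete RHS classes, makes the filtering condition and the resulting information loss explicit:

# Hypothetical resolution of Benchmark Example 6 (A2I): instances of
# ResearchStaff are split according to the kind attribute. Values whose
# corresponding RHS class is abstract cannot be instantiated, causing
# information loss (cf. instance R1 in Fig. 10).
CONCRETE_CLASSES = {"FullProf", "Assistant"}  # Professor is abstract

def split_by_kind(staff_instances):
    result, lost = [], []
    for s in staff_instances:
        if s["kind"] in CONCRETE_CLASSES:
            result.append({"class": s["kind"], "name": s["name"]})
        else:
            lost.append(s)  # no concrete RHS class for this kind value
    return result, lost

staff = [{"name": "staff1", "kind": "Professor"},
         {"name": "staff2", "kind": "FullProf"},
         {"name": "staff3", "kind": "Assistant"}]
kept, lost = split_by_kind(staff)
print([i["class"] for i in kept], [i["name"] for i in lost])
# ['FullProf', 'Assistant'] ['staff1']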
6. RELATED WORK
In the following, two threads of related work are considered. First, our feature-based classification is compared to existing classifications. Second, our mapping benchmark is related to existing mapping benchmarks. In both respects, the most closely related area of model engineering is examined first, followed by the more widely related areas of data engineering and ontology engineering.

Existing Classifications. Model Engineering. Although model transformations, and thus the resolution of heterogeneities between MMs, play a vital role in MDE, to the best of our knowledge no dedicated survey examining potential heterogeneities exists.

Data Engineering. In contrast, in the area of data engineering a plethora of literature has existed for decades, highlighting different aspects of heterogeneities in the context of database schemata. A first classification of semantic and structural heterogeneities arising when integrating two different schemas was presented by Batini et al. [2]. A systematic classification of possible variations in a SQL statement was presented by Kim et al. [11], detailing Table-Table and Attribute-Attribute heterogeneities, e.g., wrt. cardinalities. The classification of Kashyap et al. [10] provides a broad overview of possible heterogeneities in a data integration scenario, comprising semantic heterogeneities and conflicts occurring between same modeling concepts. The work of Blaha et al. [4] describes patterns resolving syntactic heterogeneities, comprising same metamodeling concept heterogeneities as well as different metamodeling concept heterogeneities. Finally, the classification of Legler [13] presents a systematic approach to attribute mappings by combining possible attribute correspondences with cardinalities.

Ontology Engineering. Concerning the domain of ontology engineering, pattern collections as well as classifications exist. A pattern collection has been presented by Scharffe et al. [14], in which correspondence patterns for ontology alignments are presented, but on a rather coarse-grained level, e.g., conditional patterns dealing with attribute differences and transformation patterns vaguely dealing with different metamodeling concept heterogeneities. With respect to existing classifications, Visser et al. [17] and Klein [12] provide a comprehensive list of semantic heterogeneities; however, they have a strong focus on semantic heterogeneities, neglecting syntactic ones.

Summarizing, although several classifications are available, none explicitly focuses on the domain of MDE. Therefore we systematically analyzed variation points in the Ecore meta-metamodel in order to extend and adapt existing classifications. In this respect, we on the one hand aligned terms of existing classifications, e.g., most classifications introduced terms for the heterogeneities summarized in our classification as same metamodeling concept heterogeneities. On the other hand, we introduced new heterogeneities stemming from the explicit concepts of references and inheritance in object-oriented metamodels, in contrast to existing classifications based either on the relational or the XML data model. Finally, current classifications fail to explicate how different types of heterogeneities relate to each other, which we formalized by means of a feature model.

Existing Benchmarks. Model Engineering. To the best of our knowledge, no benchmark for mapping systems in the area of MDE exists. Nevertheless, a benchmark for evaluating the performance of graph transformations [16] has been proposed.

Data Engineering. In the area of data engineering, Alexe et al. propose in [1] a first benchmark for mapping systems, presenting a basic suite of mapping scenarios which should be readily supported by any mapping system focusing on information integration. In this respect, ten examples are discussed for which the actual transformation functions are given in terms of XQuery (http://www.w3.org/TR/xquery/) expressions. Additional examples are presented on their homepage (http://www.stbenchmark.org/). Although the benchmark provides a first set of mapping scenarios, it remains unclear how the scenarios have been obtained and whether they provide full coverage in terms of expressivity. Although XQuery expressions are given to define the semantics, some of the XQuery functions assume the availability of custom functions which are not provided. Since no RHS models are given either, it is hard to determine the actual outcome of the transformations. Finally, some scenarios are not clearly specified by the given query (cf. scenarios 2 and 17 on their homepage). A further benchmark, called THALIA, is presented by Hammer et al. in [7]. It provides researchers with a collection of twelve benchmark queries given in XQuery, focusing on the resolution of syntactic and semantic heterogeneities in a data integration scenario. For every query a so-called reference schema (i.e., a global schema) and a challenge schema (i.e., the schema to be integrated) are provided together with instances. Although the paper claims a systematic classification of semantic and syntactic heterogeneities leading to the presented queries, it is merely an enumeration of heterogeneities whose underlying rationale is left unclear.

Ontology Engineering. With respect to the area of ontology engineering, no dedicated mapping benchmark exists. Nevertheless, effort has been spent on the evaluation of matching tools, i.e., tools for automatically discovering alignments between ontologies, resulting in an ontology matching benchmark (http://oaei.ontologymatching.org/2010/); these examples could be of interest for a dedicated mapping benchmark as well.
Summarizing, although both benchmarks from the area of data engineering provide useful scenarios in the context of XML, they do not provide a systematic classification resulting in a systematic set of benchmark examples to evaluate the expressivity of a certain mapping system.

7. CONCLUSION AND FUTURE WORK
In this paper we presented a systematic classification of heterogeneities occurring between Ecore-based MMs. Nevertheless, this classification of heterogeneities can also be applied to other semantic data models comprising the common core concepts on which the classification is based. Moreover, a first set of benchmark examples has been proposed, stating the requirements a mapping tool should fulfill. Additionally, these benchmark examples can be used to compare solutions realized with ordinary transformation languages.
Future work comprises the completion of the benchmark examples to fully cover the classification. However, the success of a benchmark heavily depends on the agreement of the community; thus our collaborative homepage invites discussion. Finally, a tool evaluation on the basis of this benchmark is envisioned, comparing and evaluating mapping tools from diverse engineering domains wrt. their expressivity.

8. REFERENCES
[1] B. Alexe, W.-C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. Proc. VLDB Endow., 1(1):230–244, 2008.
[2] C. Batini, M. Lenzerini, and S. B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Comput. Surv., 18(4):323–364, 1986.
[3] J. Bézivin. On the Unification Power of Models. Journal on SoSyM, 4(2):31, 2005.
[4] M. Blaha and W. Premerlani. A catalog of object model transformations. In Proc. of the 3rd Working Conference on Reverse Engineering (WCRE'96), pages 87–96, 1996.
[5] K. Czarnecki, S. Helsen, and U. Eisenecker. Staged Configuration Using Feature Models. In Proc. of the Third Software Product Line Conference, pages 266–283, 2004.
[6] M. Del Fabro and P. Valduriez. Towards the efficient development of model transformations using model weaving and matching transformations. Journal on SoSyM, 8(3):305–324, July 2009.
[7] J. Hammer, M. Stonebraker, and O. Topsakal. THALIA: Test harness for the assessment of legacy information integration approaches. In Proc. of the Int. Conf. on Data Engineering (ICDE), pages 485–486, 2005.
[8] D. Harel and B. Rumpe. Meaningful modeling: What's the semantics of "semantics"? Computer, 37:64–72, 2004.
[9] R. Hull and R. King. Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Comput. Surv., 19(3):201–260, 1987.
[10] V. Kashyap and A. Sheth. Semantic and schematic similarities between database objects: A context-based approach. The VLDB Journal, 5(4):276–304, 1996.
[11] W. Kim and J. Seo. Classifying Schematic and Data Heterogeneity in Multidatabase Systems. Computer, 24(12):12–18, 1991.
[12] M. Klein. Combining and relating ontologies: an analysis of problems and solutions. In Proc. of the Workshop on Ontologies and Information Sharing (IJCAI'01), 2001.
[13] F. Legler and F. Naumann. A Classification of Schema Mappings and Analysis of Mapping Tools. In Proc. of the GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW'07), 2007.
[14] F. Scharffe and D. Fensel. Correspondence Patterns for Ontology Alignment. In Proc. of the 16th Int. Conf. on Knowledge Engineering (EKAW'08), pages 83–92, 2008.
[15] A. P. Sheth and J. A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Comput. Surv., 22(3):183–236, 1990.
[16] G. Varro, A. Schürr, and D. Varro. Benchmarking for graph transformation. In Proc. of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05), pages 79–88, 2005.
[17] P. R. S. Visser, D. M. Jones, T. J. M. Bench-Capon, and M. J. R. Shave. An analysis of ontological mismatches: Heterogeneity versus interoperability. In Proc. of the AAAI 1997 Spring Symposium on Ontological Engineering, 1997.
[18] M. Wimmer, G. Kappel, A. Kusel, W. Retschitzegger, J. Schönböck, and W. Schwinger. Surviving the Heterogeneity Jungle with Composite Mapping Operators. In Proc. of the 3rd Int. Conf. on Model Transformation (ICMT 2010), pages 260–275, 2010.

Specifying Overlaps of Heterogeneous Models for Global Consistency Checking
Zinovy Diskin, Yingfei Xiong, and Krzysztof Czarnecki
University of Waterloo, Waterloo, ON, Canada
{zdiskin, yingfei, kczarnec}@gsdlab.uwaterloo.ca

ABSTRACT
Software development often involves a set of models defined in different metamodels, each model capturing a specific view of the system. We call this set a multimodel, and its elements partial or local models. Since partial models overlap, they may be consistent or inconsistent wrt. a set of global constraints. We present a framework for specifying overlaps between partial models and defining their global consistency. An advantage of the framework is that heterogeneous consistency checking is reduced to the homogeneous case, yet merging partial metamodels into one global metamodel is not needed. We illustrate the framework with examples and sketch a formal semantics for it based on category theory.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability

General Terms
Design, Languages, Theory, Verification.

1. INTRODUCTION
Software development often involves a set of heterogeneous models, such as use cases, process models, UML design models, and code. These models are defined by different metamodels, and are often built by different teams, but collectively represent a single system. Due to possible overlaps between the models, individually consistent models may be globally inconsistent when taken together. Many existing approaches focus on checking the consistency of a single model [25] or of a pair of models [9]. However, individual consistency or pairwise consistency does not guarantee global consistency. For example, Fig. 1 shows three UML class diagrams D1,2,3, where the classes connected by a dashed line are considered to be the same class (though named differently). Each of the three diagrams is consistent, and each pair of them is consistent, but taken together the three diagrams are inconsistent: there is a cycle in the inheritance chain.

Figure 1: Three globally inconsistent models

The example shows two issues in checking global consistency. First, we need to specify the models' overlap. For models like code and UML class diagrams extracted from code, we may know their overlap by matching elements by name. But for models in the conceptual stage, we cannot deduce their overlap automatically. For example, an entity "Person" created by a business analyst and a table "Employee" existing in a legacy database may refer to the same concept even though they have different names. Second, once we have an overlap specification, we need an approach to check global consistency.

Sabetzadeh et al. [22] proposed to check the global consistency of homogeneous models by merging them. First, the models' overlap is specified by a correspondence diagram: a set of auxiliary models and mappings "in-between" the local models, which declare some elements in different local models as being actually the same. Then all local models are merged into one model modulo the correspondence, i.e., elements of local models declared the same in the correspondence diagram become one element. Finally, the consistency of the merged model is checked. Thus, verifying global consistency amounts to checking the consistency of a single model. However, the approach was developed for the case of homogeneous models only.

The goal of this paper is to adapt the consistency-checking-by-merging (CCM) idea to the heterogeneous situation. A straightforward solution is to first merge all involved metamodels so that all local models become instances of the same global metamodel; then we can merge the models and check the result wrt. the constraints in the global metamodel. Though theoretically possible, in practice this approach leads to dealing with huge models and metamodels resulting from the merge, which is cumbersome and ineffective. We present another approach, in which metamodel merging is reduced to an unavoidable minimum and model merging is reduced to merging only the models' relevant parts.
Briefly, we find common views between metamodels, project the related models to spaces of instances (overlaps) determined by those views, and then apply the CCM approach to the homogeneous set of projections.

We formulate the framework in a general way, based on category theory. This makes it applicable to a wide class of models and metamodels, whose carrier structures are graphs, attributed graphs, or general graph-like structures. By the latter we mean systems of sets (nodes, arrows, arrows between arrows, ...) interrelated by source and target functions. Realizing the approach requires several challenging issues to be solved: type-safe model matching, specification of indirect overlap between metamodels, and inter-metamodel constraints. We discuss these issues in more detail in Section 3, after briefly outlining the basics of the CCM approach in Section 2. The rest of the paper is structured as follows. Section 4 describes our main techniques with simple examples. Section 5 presents general definitions and constructions in a semi-formal way. Relation to other work is discussed in Section 6. Section 7 concludes.

2. BACKGROUND: HOMOGENEOUS OVERLAP AND CONSISTENCY
We briefly review the basics of the CCM approach, and also show how to manage conflicts between values.

2.1 Software models are typed graphs
We consider metamodels as pairs M = (GM, CM), with GM a graph and CM a set of constraints. A model (an instance of M) is a graph typed over M, i.e., a pair A = (GA, tA) with GA a graph (typically much bigger than GM) and tA : GA → GM a graph mapping (preserving the incidence relationship between arrows and nodes) such that all constraints in the set CM are satisfied. For example, Fig. 2 shows how to represent a UML class diagram A as a typed graph: GM is the graph representing the metamodel of UML class diagrams, GA is the graph representing the diagram A, and tA is the type mapping. UML classes, attributes, primitive values and generalization relations are represented as nodes; their relationships are captured by arrows. The value of the mapping tA at an element e is given after a colon, e.g., the expression "10:Class" means tA(10) = Class for node 10. Identifiers of some elements are omitted, e.g., for all arrows.

Figure 2: Graph Representation

To refer to the elements, we use the following notation: if N is the name of an element e, let &N be the slot (owned by e) in which the name is held, and &&N be e itself. For example, &'Order' = 11 and &&'Order' = 10. In its turn, graph GM is typed over the meta-metamodel graph GMM. Any UML class diagram can be represented by a typed graph as above, but not conversely: to ensure that a typed graph is a correct diagram, constraints must be declared and added to the metamodel. For example, (C1) a class has only one name; (C2) a class has only one parent class (we assume that multiple inheritance is prohibited); (C3) classes with stereotype 'singleton' cannot be instantiated with more than one object. Note that constraints can either be imposed by a particular metamodeling technique, e.g., constraints (C1) and (C2), or be user-defined, e.g., (C3), in a suitable language like OCL. In this paper we do not distinguish these two types and consider both abstractly as constraints over graphs.
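To make the formalism concrete, the following minimal Python sketch (our own illustration, not part of the paper's formal development) encodes graphs as node sets plus arrows with source and target, and checks that a typing tA : GA → GM preserves incidence:

# A minimal sketch of the typed-graph formalism; names are illustrative.
class Graph:
    def __init__(self, nodes, arrows):
        self.nodes = set(nodes)     # e.g., {'10', '11', ...}
        self.arrows = dict(arrows)  # arrow id -> (source, target)

def is_graph_mapping(t, g, m):
    """Check that the typing t : g -> m preserves arrow incidence."""
    for arrow, (src, tgt) in g.arrows.items():
        t_src, t_tgt = m.arrows[t[arrow]]
        if t[src] != t_src or t[tgt] != t_tgt:
            return False  # incidence not preserved
    return True

# Metamodel graph GM: Class --attr--> Name --type--> String
GM = Graph({"Class", "Name", "String"},
           {"attr": ("Class", "Name"), "type": ("Name", "String")})
# Model graph GA for class Order (cf. Fig. 2), typed over GM
GA = Graph({"10", "11", "'Order'"},
           {"a1": ("10", "11"), "a2": ("11", "'Order'")})
tA = {"10": "Class", "11": "Name", "'Order'": "String",
      "a1": "attr", "a2": "type"}
print(is_graph_mapping(tA, GA, GM))  # True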
2.2 Matching models via spans
Suppose two business analysts independently build two UML diagrams, A1 and A2 in Fig. 3. To check their global consistency, we first need to specify the overlap between the diagrams. Suppose we know that class 'OnlineOrder' in diagram A1 and class 'Order' in A2 refer to the same class, and that their 'price' attributes refer to the same attribute. We could write the following two informal equations:

OnlineOrder@A1 = Order@A2
price@A1 = price@A2

Note that these equations conform to the type system of class diagrams: we match a class to a class and an attribute to an attribute. Hence, we can represent the set of equations by a class diagram A0, shown in the middle of Fig. 3. The question mark indicates that the name of the class is unknown and the corresponding slot is empty. That is, the slot node (:Name) in the graph representing model A0 does not have any arrow (:type) adjoint to it (see the auxiliary top-rightmost box in the figure). Nevertheless, it is convenient to denote the slot and its owner by &'?' and &&'?' as if '?' were a name.

Since elements of model A0 represent pairs of elements (e1, e2) with ei ∈ Ai, i = 1, 2, we have two inter-model mappings fi : A0 → Ai. Formally, these mappings are functions between the corresponding graphs; e.g., f1 acts on GA0's nodes as follows:

f1(&&'?') = &&'OnlineOrder', f1(&'?') = &'OnlineOrder',
f1(&&'price') = &&'price', f1(&'price') = &'price', f1('price') = 'price'.

Its action on arrows is evident. Mapping f2 is defined similarly. Importantly, both mappings preserve the types of elements, i.e., they commute with the typing mappings of the corresponding graphs. In Fig. 3 we specify mappings in a shortened way, but precise formal specifications like the one above will be needed when we consider merging.

We call a pair of mappings with a common source a (binary) span. The source (model A0) is called the head of the span, the mappings fi are its legs, and their targets (models Ai) are its feet. Thus, an overlap of two homogeneous models is specified by a correspondence span over the same metamodel. An overlap of n models is described by an n-ary span with n legs and feet.
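The type-safety requirement on the legs can be phrased as a commutativity check, sketched below in Python over the same dictionary encoding; the element names are illustrative simplifications of the &/&& notation:

# A minimal sketch of a binary correspondence span; a leg f : A0 -> Ai
# is type-safe if it commutes with typing, i.e., ti(f(e)) == t0(e)
# for every element e of the head A0.
def is_type_safe_leg(f, t0, ti):
    return all(ti[f[e]] == t0[e] for e in f)

# Head A0: one class with unknown name '?' and one attribute 'price'
t0 = {"&&?": "Class", "&price": "Attribute"}
# Legs f1 into A1 and f2 into A2 (cf. Fig. 3), with their typings
f1 = {"&&?": "&&OnlineOrder", "&price": "&price@A1"}
f2 = {"&&?": "&&Order",       "&price": "&price@A2"}
t1 = {"&&OnlineOrder": "Class", "&price@A1": "Attribute"}
t2 = {"&&Order": "Class",       "&price@A2": "Attribute"}
print(is_type_safe_leg(f1, t0, t1) and is_type_safe_leg(f2, t0, t2))  # True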
2.3 Merging and conflicts
After specifying the overlap by a correspondence span, we merge the two models into one and check whether it satisfies all constraints defined in the metamodel.

Figure 3: Homogeneous Model Matching

The merge procedure consists of two parts. We first merge the graphs underlying the models disjointly, and then glue together the elements declared to be the same by the span. The result is shown as diagram AΣ in Fig. 3, in which the merged graph has five rather than six class nodes because of the gluing. Class &&{OnlineOrder, Order} has one name slot, because the two local name slots were also glued; but this slot holds two names, since the names are not (and cannot be) equated in the head. (A precise formal specification of the mechanism can be found in [6].) Besides the graph AΣ, merging also produces two graph mappings gi : Ai → AΣ that show how the local models are embedded into the merge. The merge procedure is fully automatic and can be precisely formalized in terms of the colimit operation developed in category theory; a detailed explanation and examples of how colimit works can be found in [21] or [6]. It follows from general properties of colimits that the merged graph GAΣ is correctly typed over graph GM (with M denoting the metamodel of class diagrams).

After we have built the merged graph, we can check whether it satisfies all constraints defined in the metamodel (say, with a checking tool). In our example, we find two violations: class {OnlineOrder, Order} has (i) two names and (ii) two parent classes.
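The gluing step can be sketched operationally with a union-find structure, shown below in Python; this is a hand-rolled illustration of the colimit construction on plain element sets, not the paper's formal definition (which also handles arrows and typing):

# Sketch of the merge: disjoint union of the local elements followed by
# gluing the pairs equated by the span, via union-find.
def merge(elements1, elements2, equations):
    # Disjoint union: tag every element with its model of origin.
    pool = [("A1", e) for e in elements1] + [("A2", e) for e in elements2]
    parent = {x: x for x in pool}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for e1, e2 in equations:  # glue elements declared equal by the span
        parent[find(("A1", e1))] = find(("A2", e2))

    merged = {}
    for x in pool:  # each equivalence class becomes one merged element
        merged.setdefault(find(x), set()).add(x[1])
    return list(merged.values())

a1 = ["Order", "OfflineOrder", "OnlineOrder"]
a2 = ["OnlineService", "Order", "Game"]
print(merge(a1, a2, [("OnlineOrder", "Order")]))
# five classes, with OnlineOrder/Order glued into {'OnlineOrder', 'Order'}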
3. FROM HOMO- TO HETEROGENEOUS MULTIMODELING: THE PROBLEMS
Existing CCM approaches [22] handle the homogeneous case well, but in practice software models are often heterogeneous. Business analysts, database experts, and object-oriented software designers all work with different models in different languages, say, BPMN, ER, UML. For instance, Fig. 4 presents three different UML models of a system developed independently by three different teams: a class diagram cd, a statechart sc, and a sequence diagram sd, whose simplified metamodels are shown in the right half of the figure. Since the models are developed independently, synonymy and homonymy of names, and other similarities and discrepancies between the models, are quite possible. For example, the classes Order in the class diagram and in the sequence diagram may refer to the same or to different classes of the system. If they refer to the same class, we need to check whether message settled@sd refers to operation setSettled@cd. If this is the case, we have a naming conflict (synonymy) between the models; in addition, the parameters of the message and of the operation it refers to are named differently (homonymy): 'd' in cd and 'date' in sd. Such conflicts are fixable by renaming, but we also need to take the statechart into account.

Figure 4: Motivating Example

There may be more serious discrepancies between the models. Suppose, for example, that the sequence diagram states that parameter 'date' is of type String while the class diagram declares a different type for the same parameter. This discrepancy violates the condition that an operation parameter has a single type. This condition is stated in both metamodels (of class and sequence diagrams), but message settled does not belong to a class diagram and operation setSettled is not in a sequence diagram. There are also semantically motivated constraints that directly regulate the interaction between models defined in different metamodels. For example, we may require that the interaction described by the sequence diagram is allowed by the statechart's state machine.

Thus, specifying the overlap and checking the global consistency of heterogeneous models gives rise to several specific problems caused by heterogeneity.
A) Type safety is important for overlap specification. In the homogeneous situation, we allow only elements of the same type to be matched, to ensure type safety. However, in heterogeneous cases different models are declared in different metamodels, and hence their elements have disjoint types. We need a new method to ensure type safety in overlap specifications.
B) Indirect overlap often occurs in heterogeneous multimodeling. For example, in class diagrams operations are linked to their owning classes. Such linking also exists, but is implicit, in sequence diagrams (through consecutively linking Classes, Objects, Lifelines, Messages, and MsgTypes). Hence, we cannot use direct matching to describe the overlap between the sets of Class-Operation links in class diagrams and Class-MsgType links in sequence diagrams.
C) Inter-metamodel constraints (like conformance of traces to statecharts) are important for heterogeneous multimodeling. These constraints regulate the interaction of partial models, and hence are not captured by the metamodel of any one of them. Such constraints are inherently global and should be explicitly specified.
D) Metamodel inter-relations become crucial as soon as we consider type safety a fundamental requirement. The latter implies that model interaction should be coherent with metamodel interaction, and hence "the metamodel" of a heterogeneous multimodel is a system of metamodels together with their relationships rather than a discrete set of isolated metamodels. To address this new dimension of multimodeling, we need a language for specifying systems of interacting metamodels.
4. HETEROGENEOUS OVERLAP AND CONSISTENCY BY EXAMPLES

In this section we incrementally introduce our approach. We consecutively consider very simple examples addressing the principal points: (i) building overlap metamodels to ensure type-safe matching, (ii) the necessity of derived elements, (iii) inter-model constraints, and (iv) n-ary multimodeling with a non-trivial correspondence diagram.

4.1 From heterogeneous to homogeneous overlaps and type safety

Consider the overlap between class diagram cd and sequence diagram sd in Fig. 4. Suppose we know that class Order together with methods addItem and setSettled in cd refers to the same elements in the system as class Order together with message types addItem and settled in sd. However, if we take the type discipline strictly, direct linking of these elements is prohibited because their types reside in different metamodels. Hence, before matching the models we need to match their metamodels, mmCD and mmSD, as shown in Fig. 5. Namely, we state that metaclasses Class@mmCD and Class@mmSD refer to the same concept, and that metaclasses Operation@mmCD and MsgType@mmSD are also synonyms. These declarations can be presented by a span in the middle of Fig. 5. The head of this span is a new overlap metamodel mmCA, and two legs m1, m2 map it to the two metamodels we are matching.

Note that the overlap metamodel can be considered as a common view between mmCD and mmSD, and the mappings m1, m2 as the corresponding view definitions. The view definition m1 : mmCA → mmCD can be executed for any instance of mmCD (i.e., for any class diagram) by extracting its mmCA-portion and respectively changing its type mapping. For example, the class diagram cd shown in the upper left corner of Fig. 6 (we have slightly simplified the class diagram from Fig. 4 to save space) will be translated into diagram cd2CA typed over metamodel mmCA. We write cd2CA = getm1(cd), with getm1 denoting the operation of view execution (getView) determined by view definition m1 (in figures we omit the superscript). We will also say that model cd is projected into the overlap space of mmCA, and call model cd2CA the mmCA-projection of cd. Since the ownership between classes and actions is not specified in the overlap, the cd2CA-view of cd will be just a discrete set of named elements. Note also that the view is computed along with a traceability mapping m1 : cd2CA → cd.

Similarly, sequence diagram sd in the top right corner of Fig. 6 is translated into a discrete set sd2CA = getm2(sd) of named elements also typed over mmCA, along with its traceability mapping m2. Since both views are instances of the same metamodel, we can type-safely match them and build a span (ca1, f1, f2). This span and the corresponding merge (colimit) are shown in the middle part of Fig. 6. They reveal a conflict between the models: actions setPaid@cd2CA and paid@sd2CA are linked but their names are different (in the merge model cd+sd, the action with two names is shown by ?).

4.2 Indirect overlap

A closer inspection of the original models cd and sd shows that the conflict above is spurious: message 'paid' is actually an operation of class OrderManager rather than of Order. The error occurred because our overlap model does not capture the relationship between classes and actions (operations). To build a better overlap, we need to match the ownership edge Class-Operation@mmCD with the similar edge Class-MsgType@mmSD. However, the latter is not directly included in the metamodel mmSD.
Nevertheless, the concepts of MsgType and Class are related indirectly via a sequence of intermediate edges: a message ends at a lifeline, which belongs to an object, which belongs to a class. We can compose these three edges into a new — derived — edge Class-MsgType, shown in the metamodel mmSD+ (Fig. 7) with a dashed line. In addition, we use UML stereotypes and prefix the names of derived elements with a slash. In more detail, we augment metamodel mmSD with a new element mtp (read "messageType") coupled with its definition, i.e., a specification of the operation computing the instances of the derived element. In our case, the operation is the sequential composition of the association links leading, consecutively, from instances of Class to instances of MsgType. It can be written in OCL as follows:

    context Class inv:
      self.mtp = self.objects.lifeline.messages.type

Now we declare the sameness of associations oper@mmCD and mtp@mmSD+ by placing an association act into the head of the span as shown in Fig. 7, and defining m1(act) = oper, m2(act) = /mtp. Since the mappings m1, m2 in Fig. 7 define richer views than the earlier mappings m1, m2 in Fig. 5, the projections cd2CA and sd2CA in Fig. 8 are also richer than in Fig. 6 and include links between classes and operations. We see at once that matching setPaid@cd2CA and paid@sd2CA is illegal, and the corresponding "equation" must be removed from the span. The result of merging models cd2CA and sd2CA modulo the new span ca1 is shown in the middle bottom of Fig. 8. It is a correct mmCA model satisfying the constraints of mmCA: an element may have only one name, and different actions owned by a class are named differently.

Figure 5: Example of metamodel overlap

Figure 6: Example of model overlap over the respective metamodel overlap (see Fig. 5 for view definitions)
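The derived edge is just a navigational query. As a rough illustration (our code; the link names follow the OCL constraint above), its instances can be computed by composing the association traversals from Class to MsgType:

    # Illustrative computation of the derived edge /mtp in mmSD+ (names
    # assumed from the OCL above): traverse Class -> Object -> Lifeline ->
    # Message -> MsgType and collect the reachable message types per class.

    def derive_mtp(classes):
        """classes: list of dicts with nested links, mirroring an sd instance."""
        mtp = {}
        for cls in classes:
            types = []
            for obj in cls['objects']:
                lifeline = obj['lifeline']
                for msg in lifeline['messages']:
                    types.append(msg['type'])
            mtp[cls['name']] = types
        return mtp

    # Tiny sd instance: one Order object whose lifeline receives two messages.
    order = {'name': 'Order',
             'objects': [{'lifeline': {'messages': [{'type': 'addItem'},
                                                    {'type': 'settled'}]}}]}
    print(derive_mtp([order]))  # {'Order': ['addItem', 'settled']}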
Figure 7: Matching basic and derived meta-elements

Figure 8: Matching basic and derived elements (see Fig. 7 for view definitions)

The next section will show more interesting cases of using derived elements in overlap specification.

4.3 Inter-metamodel constraints

So far we have only checked the constraints declared in the head of the correspondence span (mmCA in our examples). These constraints are common to both feet metamodels (mmCD and mmSD). However, as discussed in Section 3, there may be important constraints that reside in neither of the feet metamodels. For example, traces of actions exhibited by a sequence diagram must conform to the state machine specified by the corresponding statechart. We will denote this constraint by t#sm, meaning "Traces are to conform to the StateMachine". Declaring the constraint t#sm requires elements from both metamodels, mmSD and mmSC, and so it cannot be done in either of them. Hence, a new metamodel has to be built in which t#sm can be specified. In this section we first show how to build such a metamodel, and then show how to project the partial models sd and sc to the space of this metamodel's instances, in which the projections can be matched, merged and checked against t#sm.

To declare t#sm, we need a metamodel encompassing metaclasses for Classes, Traces (sequences of actions), StateMachines, and the related notions of States, Transitions and Events, as specified by metamodel mmCTrSM in the middle of Fig. 9. The upper half of this metamodel is "taken" from the sequence diagram metamodel mmSD, as specified by mapping m1 in Fig. 9. Note that m1 maps class Trace@mmCTrSM to the derived class /Trace@mmSD, whose instances are the sequences of actions described by the sequence diagram and hence can be computed by a suitable query. The lower half of mmCTrSM is taken from the statechart metamodel mmSC, as specified by mapping m2 in Fig. 9 (and we again use derived elements). Having built metamodel mmCTrSM, we declare in it the constraint t#sm with its intended semantics. We call the configuration (m1, mmCTrSM, m2) a partial span because the mappings m1 and m2 are partially defined (on the upper and lower halves of mmCTrSM, respectively). In Fig. 9 and the other figures below, a semi-arrow head indicates partiality of the mapping.

The next step is to project models sd and sc to the metamodel mmCTrSM.
We cannot directly execute the view definitions mj (j = 1, 2) because they are partial, but we can execute them in three steps.

Step 0. We explicitly specify the domains mmCTr and mmCSM of the mappings mj (j = 1, 2; see Fig. 10), on which they become totally defined mappings m!j; inclusion mappings ij embed the domains into the head of the span.

Step 1. The total view definitions m!j (j = 1, 2) are executed for models sd and sc and produce views sd2CTr and sc2CSM over metamodels mmCTr and mmCSM, respectively.

Step 2. Because the two latter metamodels are included in mmCTrSM, we may consider their instances as "partial" instances of mmCTrSM. Formally, we compose the typing mappings of models sd2CTr and sc2CSM with the inclusion mappings ij (j = 1, 2) and obtain typing mappings into mmCTrSM. In Fig. 10, these new typing mappings are marked by ∗.

The three steps are performed automatically and may be hidden from the user, who observes the projection mappings getm1 and getm2 as if the mappings mj were ordinary total view definitions.

Now we have two models sd2CTr and sc2CSM over the same metamodel mmCTrSM. To finish consistency checking, the user must match the models and build a correspondence span, say, (f1, ca2, f2). The head of the span is denoted by ca2 because it is, in fact, an instance of the metamodel mmCA built in Section 4.2 (this can be formally proved). After that, the system merges the models modulo the span and checks the result against the constraints in mmCTrSM, including the inter-metamodel constraint t#sm. The entire procedure is well seen in the right half of Fig. 10: data provided by the user are shown with bullet nodes and solid arrows (and are black); automatically computed data are shown with blank nodes and dashed arrows (and are blue).

Figure 9: Specifying inter-metamodel constraints

Figure 10: Verifying inter-metamodel constraints
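In effect, executing a partial view definition is an ordinary view execution followed by a retyping along an inclusion. A minimal sketch (ours, with illustrative names; models are again flattened to element-to-type maps):

    # Sketch of executing a partial view definition m = (i_m, f_m): first
    # apply the total part as a view (select and rename types), then retype
    # the result along the inclusion into the bigger metamodel.

    def get_view(model, view_def):
        """Keep only elements whose type occurs in the view; rename types."""
        return {e: view_def[t] for e, t in model.items() if t in view_def}

    def retype(model, inclusion):
        """Compose the typing with an inclusion into a larger metamodel."""
        return {e: inclusion[t] for e, t in model.items()}

    # Step 0: total part m! of the partial mapping, and the inclusion i.
    m_total = {'Class': 'Class', 'MsgType': 'Action'}   # mmSD types -> mmCTr
    i = {'Class': 'Class', 'Action': 'Action'}          # mmCTr -> mmCTrSM

    sd = {'Order': 'Class', 'addItem': 'MsgType', 'Lifeline1': 'Lifeline'}
    sd2CTr = get_view(sd, m_total)       # Step 1: view execution
    sd2CTrSM = retype(sd2CTr, i)         # Step 2: retyping into mmCTrSM
    print(sd2CTrSM)  # {'Order': 'Class', 'addItem': 'Action'}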
4.4 N-ary multimodeling and metamodel schemas

In this subsection we consider our full example involving all three models: cd, sd and sc. First we build a ternary span (mmCA, m1, m2, m3) specifying correspondences between operations, messages and transitions in cd, sd and sc, respectively, as shown in Fig. 11; a dashed frame indicates that a metamodel is augmented with derived elements defined by queries. The ternary span mmCA is a straightforward extension of the binary span mmCA built in Section 4.2 with a new leg towards sc.

Figure 11: Metamodel schema of the example in Fig. 4

Projecting the three models to the head, matching them with a ternary correspondence span, say, ca3 (see Fig. 12), merging the projections modulo ca3, and finally checking the constraints against the merge can be done in exactly the same way as in Section 4.2. A minor distinction is that the leg ca3 → getm2(sd) is partial, because there are binary (rather than ternary) correspondences, like (setPaid@cd, paid@sc), that do not involve sd's elements; the colimit operation consumes such correspondences as well.

The second point of consistency checking is at the span (mmCTrSM, m4, m5), where the constraint t#sm is to be checked as explained in Section 4.3. However, when we consider all three models, the correspondence span ca2 between the projections getm4(sd) and getm5(sc) can be derived from the span ca3 rather than specified independently. Indeed, we have a mapping m6 that sends the nodes Class and Action and the edge act between them to the corresponding elements in mmCTrSM. By applying the retyping procedure explained in Section 4.3, we project the span ca3 into mmCTrSM and obtain a span ca2 as shown in Fig. 12 (where the block arrow rtpm6 denotes the retyping operation). After the span ca2 is computed, we proceed exactly as described in Section 4.3 and check the constraint t#sm.

Figure 12: Global consistency checking of the example in Fig. 4

An important property of the metamodel schema in Fig. 11 is the commutativity of its two triangle diagrams (note the two =-labels): (=)m: m6;m4 = m2 and m6;m5 = m3. Because view execution and retyping preserve metamodel mapping composition (we will formalize these properties in Section 5), we have commutativity for the view execution mappings as well: (=)get: getm4;getm6 = getm2 and getm5;getm6 = getm3. Hence, we have only one projection of sequence diagram sd to the instance space of mmCA, and only one projection of sc to the same space.

The simple example above shows how local model interaction is governed by the multimodel schema specifying the metamodels' inter-relationships. The example also demonstrates that N-ary multimodeling may exhibit rather complex metamodel schemas bearing their own constraints, such as commutativity.

5. MAKING MULTIMODELING PRECISE: A GENERAL FRAMEWORK

The three basic ingredients of our approach are (i) metamodels and their mappings, (ii) models and their mappings, and (iii) a mechanism of model translation from one metamodel to another. We build a (minimal, in a sense) mathematical framework allowing us to define these concepts and their inter-relations in Section 5.1. In Section 5.2 we show that global consistency checking can indeed be realized in this framework. In Section 5.3 we show how the abstract framework of Section 5.1 can be implemented with constructs close to modeling practice: typed structures, query and constraint languages. Due to space limitations, the presentation is very brief and semi-formal: we show how the concepts could be formally defined rather than present real formal definitions. We use simple category theory concepts without explanation, and refer to basic concepts of institution theory [14], an abstract framework for logic and model theory.
5.1 Abstract multimodeling framework

An abstract multimodeling framework Fabstr is a tuple of constructs defined below.

1) A category MMod whose objects are called metamodels and whose arrows are metamodel mappings.

2) Each metamodel M is assigned two categories, one a subcategory of the other: [[ M ]] ⊂ [[ M ]]?. Intuitively, the objects of [[ M ]]? are structures properly typed over M but perhaps violating M's constraints (hence the question mark); we call them structural instances. The objects of [[ M ]] are (legal) models: structural instances of M satisfying, in addition, all constraints in M. We require all categories [[ M ]]? to be closed under colimits (merging). This is the case for many classes of structures carrying metamodels and models, such as graphs or attributed graphs. But we do not require this property of [[ M ]]: our examples above show that in practically interesting situations [[ M ]] is not closed under colimits.

3) Any metamodel mapping m : M → N in MMod is assigned a getView functor getm : [[ N ]] → [[ M ]] that maps in the opposite direction (think of m as a view definition and getm as its view execution). Moreover, if m = 1M is the identity mapping of metamodel M, then getm is the identity functor on [[ M ]], and for two consecutive mappings m1 : M → N and m2 : N → O, getm1;m2 = getm2 ; getm1 : [[ O ]] → [[ M ]] (a sequentially composed view definition is executed consecutively).

4) A subcategory MModinc ⊂ MMod of inclusion mappings is fixed: it has the same objects but fewer mappings than MMod. A formal inclusion mapping i : M → N in MModinc is to be thought of as the inclusion of metamodel M into a bigger metamodel N. Any inclusion i : M → N is assigned a retyping functor rtpi : [[ M ]]? → [[ N ]]? (think of the retyping described in Sections 4.3 and 4.4). Note that in contrast to the operation get, rtp maps structural instances (in particular, models) to structural instances (not necessarily models): even if an instance A is an M-model, we cannot guarantee that rtpi(A) satisfies all constraints in N. Similarly to get, we require rtp1M to be the identity functor on [[ M ]]?, and for two consecutive inclusions m1, m2 as above, rtpm1;m2 = rtpm1 ; rtpm2 : [[ M ]]? → [[ O ]]?.

We will write an abstract multimodeling framework in short form as a triple Fabstr = (MMod, get, rtp), assuming that the [[ ]]-part of the construction is "included" in get, and the [[ ]]? and MModinc parts are "included" in rtp.

Operations get and rtp together provide model translation over partial mappings. A partial mapping m : M ⇀ N between metamodels (note the semi-arrow head) is, formally, a diagram M ← Dm → N, where Dm ⊂ M is a metamodel called the domain of m (while M is its source), im : Dm → M is the corresponding inclusion, and fm : Dm → N is an ordinary (total) metamodel mapping (the function of m). Evidently, the sequential composition getfm ; rtpim provides a functor from [[ N ]]? to [[ M ]]? translating N's structural instances and their mappings into M's. We will denote this composition by getm (so that the actual meaning of getm depends on whether m is a total or a partial mapping).

5.2 Multimodels and their consistency

Let Fabstr = (MMod, get, rtp) be an abstract multimodeling framework. A homogeneous multimodel over Fabstr is a pair (M, A) with M ∈ MMod a metamodel and A a diagram in [[ M ]]; the latter can be thought of as a family of models together with a system of correspondence spans. A multimodel is consistent if its colimit AΣ =def ΣA (which always exists in [[ M ]]?) satisfies M's constraints, i.e., AΣ ∈ [[ M ]].

A heterogeneous multimodel is a tuple AA = (A1:M1, ..., An:Mn) with Mi ∈ MMod and Ai a homogeneous multimodel over Mi, i = 1..n.
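Although the paper states this purely categorically, the shape of the framework can be summarised as an interface. A rough Python rendering (ours, not the paper's; categories and functors are flattened to plain functions on instances):

    # Illustrative interface for an abstract multimodeling framework
    # F_abstr = (MMod, get, rtp). All names are ours.
    from typing import Callable, Dict

    Model = Dict[str, str]           # element -> type (a structural instance)

    class MetamodelMapping:
        """A mapping m: M -> N, carrying its two translation operations."""
        def __init__(self,
                     get: Callable[[Model], Model],   # [[N]] -> [[M]]
                     rtp: Callable[[Model], Model]):  # [[M]]? -> [[N]]?
            self.get = get
            self.rtp = rtp

    def compose(m1: MetamodelMapping, m2: MetamodelMapping) -> MetamodelMapping:
        # get reverses direction: get_{m1;m2} = get_{m2} ; get_{m1},
        # while rtp composes covariantly: rtp_{m1;m2} = rtp_{m1} ; rtp_{m2}.
        return MetamodelMapping(
            get=lambda n_instance: m1.get(m2.get(n_instance)),
            rtp=lambda m_instance: m2.rtp(m1.rtp(m_instance)))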
Consistency of a heterogeneous multimodel is much more involved than in the homogeneous situation, and we begin with the simpler case of discrete multimodels, in which each diagram Ai is actually a set of models without mappings between them.

The algorithm for checking global consistency of a discrete heterogeneous multimodel AA is as follows. We begin by specifying a system of common views (overlaps) between the metamodels Mi. For simplicity, we assume that such a system amounts to a set M of total and partial spans, like the one shown in Fig. 11 if we remove the mapping m6 between the spans themselves. Global consistency of AA is checked at the heads of these spans. That is, for each span S in M we perform the following procedure. Let H be S's head. First, we project to the space [[ H ]]? of structural H-instances all models Ai whose metamodels Mi are reachable from H by the legs of the span. If the span is total, projecting is provided by the view mechanism; if the span is partial, projecting needs both view execution and model retyping, as explained above. In this way we obtain a set of instances AH ⊂ [[ H ]]?. Second, the instances in AH are matched by a correspondence diagram EH (think, for example, of the spans ca2 or ca3 in our examples). Note that the EH-data are provided by the user and are, in fact, part of the multimodel's state. Third, all instances in AH are merged modulo the correspondence diagram EH into a structural instance (AΣ)H =def ΣAH/EH ∈ [[ H ]]?. Finally, we check whether (AΣ)H ∈ [[ H ]], i.e., whether it satisfies all constraints declared in H.

The general case, with the Ai being diagrams rather than sets, can be treated similarly. The key is that the translation operations get and rtp are functors, that is, they translate not only instances but also instance mappings, and hence correspondence diagrams as well. Then the projection AH ⊂ [[ H ]]? will be a diagram rather than a set of instances, and the diagram EH will provide a second-level correspondence structure. As the colimit operation consumes any sort of input diagram, the algorithm works for the general case too. Another generalization of the algorithm, in which the metamodel schema is more complicated than a set of spans, is harder and is work in progress.
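Operationally, the per-span check is a short loop over the four steps just listed. A schematic sketch (our names; the projection, merge and constraint-checking operations are assumed to be supplied by the concrete framework):

    # Schematic global consistency check for a discrete heterogeneous
    # multimodel. span.project(), span.merge() and the constraint sets are
    # assumed given; only the control flow of the four steps is fixed here.

    def check_global_consistency(spans, models, correspondences, constraints):
        violations = []
        for span in spans:                      # one check per span head H
            head = span.head
            # 1) project every model reachable from H into [[H]]?
            projections = [span.project(m) for m in models
                           if span.reaches(m.metamodel)]
            # 2) user-supplied correspondence diagram E_H over the projections
            e_h = correspondences[head]
            # 3) merge modulo E_H (the colimit) into one structural instance
            merged = span.merge(projections, e_h)
            # 4) check the merge against all constraints declared in H
            violations += [c for c in constraints[head] if not c.holds(merged)]
        return violations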
5.3 Concrete multimodeling framework

In a nutshell, a concrete multimodeling framework consists of three components: (i) a base category G of graph-like structures to be thought of as the carriers of metamodels and models, (ii) a constraint language C together with binary relations |= of satisfaction of a constraint by a model, and (iii) a query language Q together with operations of query execution over a model. In more detail (but still very briefly, with many important conditions skipped), a concrete framework is given by the following constructs.

1) G-objects are to be thought of as graphs, many-sorted (colored) graphs, or attributed graphs [11]. The key point is that they are definable by a metametamodel that is itself a graph with, perhaps, a set of equational constraints. In precise categorical terms, we require G to be a presheaf topos [3], hence possessing limits, colimits, and other important properties. We will call G-objects "graphs". For a "graph" G thought of as a metamodel, an instance of G is a pair A = (DA, tA) with DA another "graph" and tA : DA → G a mapping (an arrow in G) to be thought of as typing. An instance mapping f : A → B is a "graph" mapping f : DA → DB commuting with typing: f ; tB = tA. This defines a category [[ G ]] of G-instances. Any mapping m : G′ → G determines a functor pbm : [[ G ]] → [[ G′ ]] built with the pullback operation in the standard way (see, e.g., [15, p. 48]).

2) Constraints are defined exactly as in institution theory. We postulate a functor C : G → Sets and a binary relation |=G ⊂ [[ G ]] × C(G) for every "graph" G. For an instance A ∈ [[ G ]] and a constraint c ∈ C(G), we write A |=G c for (A, c) ∈ |=G.

3) Queries are an original part of the definition. We begin with a functor Q : G → G of query specifications. For a "graph" G ∈ G, the "graph" Q(G) ⊃ G is to be thought of as G augmented with definitions of derived elements. (Actually we require Q to be a monad [3].) The functor Q also acts on constraints: for a "graph" G and a set of constraints C ⊂ C(G) over G, there is a set Q(C) ⊂ C(Q(G)) of constraints derived from C. The semantics of query specifications is given by an operation [[ Q ]] that maps each G-instance A = (DA, tA) to a Q(G)-instance [[ Q ]](A) with carrier D[[Q]](A) ⊇ DA and typing t[[Q]](A) : D[[Q]](A) → Q(G) extending tA (the derived arrows and the derived node in the figure were shown dashed and underlined, respectively). We require the square formed by the inclusions DA ⊆ D[[Q]](A) and G ⊆ Q(G) together with the typings tA and t[[Q]](A) to be a pullback, which means that "graph" DA is the inverse image of "graph" D[[Q]](A); that is, the original data are not changed by the query execution. To ensure that derived instances satisfy derived constraints, we require the following to hold for any instance A:

(QC) A |=G C implies [[ Q ]](A) |=Q(G) Q(C).

Finally, we require the operation [[ Q ]] to act also on instance mappings: for any injective arrow f : A → B in [[ G ]], there is an injective arrow [[ Q ]]f : [[ Q ]](A) → [[ Q ]](B) in [[ Q(G) ]]. In the database literature this property of a query language is called monotonicity, and it is known that queries without negation are monotonic [18].

From these data we can derive an abstract framework Fabstr along the following lines. We first fix a subcategory G◦ ⊂ G of finite "graphs" to be the carriers of metamodels. A metamodel is a pair M = (GM, CM) with GM ∈ G◦ a carrier graph and CM ⊂ C(GM) a set of constraints. Structural instances of M are instances of GM, i.e., [[ M ]]? =def [[ GM ]], and models of M are GM's instances satisfying CM. Metamodel mappings are G-arrows of the form m : GM → Q(GN) (Kleisli arrows of the monad Q) that are compatible with constraints: C(m)(CM) ⊂ Q(CN). Any such mapping determines a functor getm =def [[ Q ]] ; pbm : [[ N ]] → [[ M ]], which satisfies the conditions postulated in the definition of the abstract framework. The retyping functors rtp are defined by composition (as in the example of Section 4.3) and also satisfy the necessary conditions. With accurate formal definitions, it can be proved that every concrete multimodeling framework gives rise to an abstract multimodeling framework. Hence, the algorithm of global consistency checking can be used with a concrete framework as well.

6. RELATED WORK

Specifying overlaps of homogeneous models by correspondence spans has been known for a long time [13, 5, 4, 17]. Close relations between consistency checking and model merging were noticed in [7] for behavioral models, and in [22] for structural ones. A large body of work in this direction was done in databases in the context of view integration, mainly with ER diagrams [23]; a generalization to a much more expressive graph-based language was developed in [5]. A serious limitation of these approaches is that they work for the homogeneous case only, because it was unclear how to merge heterogeneous models.

Consistency of heterogeneous models is a central issue of the living-with-inconsistency frameworks [20, 24, 19, 10]. Their basic idea is to translate all models and constraints into a common logical formalism and check whether there are conflicts between the logical formulas.
Although these approaches handle many cases of heterogeneous multimodeling, the configuration of model overlap (which may be very intricate, as our examples show) is flattened and hidden in arrays of formulas. As a result, none of the approaches fully covers heterogeneous multimodeling: they mainly handle well-defined cases where elements are matched by names, or only pairwise cases. In contrast, in our approach the structure of inter-model relationships is made visible and essentially used.

Several approaches also transform models to aid model merging and consistency management. Egyed [8] proposes a flexible framework based on model transformation and mapping; however, it is the user's responsibility to use them correctly. Ehrig et al. [12] use graph transformation to derive views from a reference model and integrate modified views using colimit; compared to ours, their approach requires users to define the transformation manually. Jurack and Taentzer [16] consider multimodeling (they say composite models) in a distributed environment; their setting is mainly operational and is based on graph transformations. None of these approaches handles inconsistent views.

Many researchers focus on discovering traceability links between heterogeneous models [2] and discovering differences between homogeneous models [1]. Their results can be integrated into our approach as a means for automated construction of correspondence spans.

7. CONCLUSION

The paper describes a general approach to global consistency checking of heterogeneous multimodels. The approach is based on finding common views between the metamodels of the models involved, projecting all models to these views, merging the projections, and checking the result against the constraints specified in the views. We have shown that type-safe matching, indirect model overlap, and inter-metamodel constraints can be uniformly managed along the lines described. The approach gives rise to a novel framework for heterogeneous multimodeling, in which a network of interrelated metamodels — the metamodel schema — plays the central role.
The framework has a number of advantages. First, heterogeneous consistency checking is reduced to homogeneous checking with a minimal amount of metamodel merging; the latter is unavoidable if we want to treat inter-metamodel constraints, yet we work as locally as possible. Second, the framework is applicable to a wide class of models and metamodels satisfying the not-too-restrictive conditions formulated in Section 5. Third is the adaptability of the framework to the living-with-inconsistencies paradigm: conflicts between models can be recorded in the heads of the correspondence spans and resolved later. Fourth, heterogeneous multimodeling becomes directly related to institution theory, and hence to a source of important (and hard to prove) mathematical results about the interrelation of logical theories and their models.

However, the approach still needs practical, and in part also theoretical, validation. On the practical side, the main question is how effectively a multimodeling tool based on the framework could be implemented. On the theoretical side, the cornerstone of the approach is the default assumption that our "as local as possible" consistency checking is equivalent to direct global consistency checking. By the latter we mean merging all metamodels into one global metamodel MM, so that all partial models become partial instances of MM, whose joint consistency can be checked by a homogeneous CCM algorithm. There are strong formal arguments justifying this assumption, but an accurate formal proof is still to be completed.

An important theoretical line of future work is to develop a useful classification of heterogeneous multimodels. We may classify multimodels by the type of their metamodel schema: whether it is a plain collection of spans, or there are spans over spans over spans, or perhaps even more complex configurations. The types of mappings in the metamodel schema are also essential: whether they are plain projections or complex views involving non-trivial queries. The complexity of the queries involved in the metamodel schema of a multimodel is an important property of the multimodel, and many useful results can be found in the database literature.
Defining multimodeling in abstract mathematical terms along the lines described in the paper would allow useful interaction of the two fields.

8. REFERENCES

[1] M. Alanen and I. Porres. Difference and union of models. In UML, pages 2–17, 2003.
[2] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering traceability links between code and documentation. IEEE Transactions on Software Engineering, 28(10):970–983, 2002.
[3] M. Barr and C. Wells. Category Theory for Computing Science. Prentice Hall, 1995.
[4] P. Bernstein and R. Pottinger. Merging models based on given correspondences. In VLDB, 2003.
[5] B. Cadish and Z. Diskin. Heterogeneous view integration via sketches and equations. In ISMIS, pages 603–612, 1996.
[6] Z. Diskin. Model synchronization, mappings, tile algebra, and categories. In GTTSE'09. Springer. To appear.
[7] S. M. Easterbrook and M. Chechik. A framework for multi-valued reasoning over inconsistent viewpoints. In ICSE, pages 411–420, 2001.
[8] A. Egyed. Heterogeneous view integration and its automation. PhD thesis, University of Southern California, 2000.
[9] A. Egyed. Instant consistency checking for the UML. In ICSE, pages 381–390, 2006.
[10] A. Egyed. Fixing inconsistencies in UML design models. In ICSE, pages 292–301, 2007.
[11] H. Ehrig, K. Ehrig, U. Prange, and G. Taentzer. Fundamentals of Algebraic Graph Transformation. Springer, 2006.
[12] H. Ehrig, R. Heckel, G. Taentzer, and G. Engels. A combined reference model- and view-based approach to system specification. Int. Journal of Software and Knowledge Engineering, 7:457–477, 1997.
[13] J. L. Fiadeiro and T. S. E. Maibaum. Interconnecting formalisms: Supporting modularity, reuse and incrementality. In SIGSOFT FSE, pages 72–80, 1995.
[14] J. Goguen and R. Burstall. Institutions: Abstract model theory for specification and programming. Journal of the ACM, 39(1):95–146, 1992.
[15] B. Jacobs. Categorical Logic and Type Theory. Elsevier Science Publishers, 1999.
[16] S. Jurack and G. Taentzer. Towards composite model transformations using distributed graph transformation concepts. In MoDELS, pages 226–240, 2009.
[17] H. Liang, Z. Diskin, J. Dingel, and E. Posse. A general approach for scenario integration. In MoDELS, pages 204–218, 2008.
[18] H. Liefke and S. Davidson. View maintenance for hierarchical semistructured data. In DaWaK, pages 114–125, 2000.
[19] C. Nentwich, W. Emmerich, and A. Finkelstein. Consistency management with repair actions. In ICSE, pages 455–464, 2003.
[20] B. Nuseibeh, J. Kramer, and A. Finkelstein. ViewPoints: meaningful relationships are difficult! In ICSE, pages 676–683, 2003.
[21] M. Sabetzadeh and S. M. Easterbrook. View merging in the presence of incompleteness and inconsistency. Requirements Engineering, 11(3):174–193, 2006.
[22] M. Sabetzadeh, S. Nejati, S. Liaskos, S. M. Easterbrook, and M. Chechik. Consistency checking of conceptual models via model merging. In RE, pages 221–230. IEEE, 2007.
[23] S. Spaccapietra and C. Parent. View integration: A step forward in solving structural conflicts. IEEE Trans. Knowl. Data Eng., 6(2):258–274, 1994.
[24] R. Van Der Straeten, T. Mens, J. Simmonds, and V. Jonckers. Using description logic to maintain consistency between UML models. In UML, pages 326–340, 2003.
[25] J. Warmer and A. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison-Wesley, 2000.

Anticipating Unanticipated Tool Interoperability using Role Models

Mirko Seifert, Christian Wende, Uwe Aßmann
Technische Universität Dresden, Software Technology Group, Dresden, Germany
mirko.seifert@tu-dresden.de, c.wende@tu-dresden.de, uwe.assmann@tu-dresden.de

ABSTRACT

The interoperability of tools heavily relies on their ability to exchange shared data. While the definition of standardised metamodelling languages such as the Essential Meta Object Facility (EMOF) [23] has substantially simplified the task of reading and persisting arbitrary domain data, there are still open issues concerning the integration of the domain abstractions (metamodels) used by different tools. For example, accessing common data through shared metamodels is limited because of the lack of first-class support for metamodel composition. Data that is processed using multiple tools must either be stored in a common abstraction—which introduces a strong coupling of the involved tools—or be replicated (e.g., represented in different tool formats)—which introduces the need for tedious synchronisation. In this paper we present how role-based metamodelling can overcome these limitations and provide a formalism to enable tool interoperability by role composition. Based on a running example, the implications of the current problems of tool integration are shown and their resolution based on role modelling is discussed.

Categories and Subject Descriptors

D.2.12 [Software Engineering]: Interoperability—Data mapping; I.6.5 [Simulation and Modeling]: Model Development—Modeling methodologies

General Terms

Design

Keywords

role modelling, tool integration

1. INTRODUCTION

The interoperability of heterogeneous tools has been studied for a long time, as it is vital to increase the productivity gained from software systems. Interoperability can be divided into two aspects. First, to enable interoperability, software must be able to access shared information. Second, the behaviour of different systems needs to be integrated. To address the former issue—sharing data across tools—both the syntax and the semantics of the data that tools operate on must be known and formally defined. In this context, metamodelling languages (e.g., EMOF) have significantly improved the specification of data structures. By using a standardised formalism to describe and exchange the abstract syntax of domain data, a big obstacle to tool interoperability has vanished. The explicit representation of domain concepts in metamodels allows tool integrators to understand the data they are dealing with.
While this may appear to be a simple requirement, it is not fulfilled by tools that use internal or implicit metamodels. Besides the explicit representation of concepts using metamodels, standardised serialisation allows data to be retrieved and persisted uniformly. The mapping from the abstract to the concrete syntax (i.e., the actual representation of data in files) is defined by the metamodelling facilities rather than by individual tools. The tedious task of reading custom file formats to obtain the contained data becomes superfluous.

Despite all these benefits, metamodelling does not solve all problems related to tool interoperability. This is particularly caused by the need to explicitly anticipate future interoperability between tools. If this concern is neglected during the creation of metamodels, later integrations are hard to accomplish. Object-oriented metamodelling languages like EMOF do allow tool interoperation to be handled proactively, by designing metamodels in such a way that future extension and interoperation are anticipated. Distinct patterns exist that allow the extension of metamodels [7]. However, the existence of these patterns does not force tool developers to use them. As a result, the decision whether to ease future interactions between tools is completely left to the tool designer. Usually this yields limited anticipation of extension.

In an ideal world all future tool integrations would be known beforehand, which would substantially ease the task of anticipating every interaction between tools. However, new tools are created and existing ones evolve dynamically, which renders perfect anticipation impossible. One can also say that tools must anticipate the unanticipated.

One approach to handle unanticipated tool interoperation retroactively is to apply transformations. Transformations convert data created by one tool to a different representation that is understood by another tool. However, in this case each tool keeps its own local version of the data, which leads to data replication. This can in turn lead to inconsistencies and increased memory consumption. Instead of transformations, which copy data, one can also use views, which convert data as needed without keeping replicated copies. Unfortunately, such interoperability is often achieved in an ad-hoc manner: adaptors are implemented to present existing data in a specific view that can be used by some tool. Thus, the structure of this view (i.e., the interface required by a tool to access data) is not separated from the binding of this interface to some data source. This lack of separation of concerns renders tool integration fragile.
If one is not aware of the presence of an access interface, because it is not explicit, changes are applied carelessly and all existing adaptations need to be adjusted.

To overcome the limitations outlined above and to simplify tool interoperability, we propose to use role models as an extension of object-oriented metamodelling. The concept of roles was originally presented in [18] and applied to the design of frameworks in [19]. In [18] a role model is defined as a unit to isolate an area of concern. To establish interoperability, the concern that each tool deals with is captured by a role model.

The contributions of this paper are the following. First, a set of requirements for flexible and durable tool integration is formulated. Second, object-oriented metamodel integration and model transformation—two common techniques for proactive and retroactive tool integration—are analysed w.r.t. these requirements. In particular, the problems induced by the requirements that are not met by the two approaches are depicted. Third, a role-based approach for the specification and integration of metamodels is presented, which overcomes the discovered limitations.

This paper is organised as follows. After presenting a running example in Sect. 2, we analyse the problems of existing proactive and retroactive tool integration techniques in Sect. 3. The usage of role models to leverage tool integration is discussed in Sect. 4. We compare our approach with related work in Sect. 5 and conclude with Sect. 6.
2. RUNNING EXAMPLE

We illustrate our approach with a simplified, exemplary scenario of tools we would like to make interact. Suppose we want to create, visualise and validate state machines. To achieve this, we want to integrate three tools that use tool-specific data abstractions. This scenario is depicted in Fig. 1.

Figure 1: Example scenario

First, we use a textual editor for state machines. This editor is aware of the domain concepts of state machines (i.e., states and transitions). We restrict ourselves here to a simplified representation of the state machine domain. Second, we want to visualise our textual state machines graphically, which is why a tool that can lay out and render two-dimensional shapes is employed. This tool was not specifically designed to render state machines—it rather uses the concepts of shapes, coordinates and colours. Third, we want to check our state machines statically against certain well-formedness criteria. For this purpose, a generic graph analysis tool shall be used. This tool is also not aware of concepts specific to the state machine domain, but is based on nodes and edges. Since the goal of the analysis tool is to find invalid elements, nodes carry a boolean attribute invalid. Using the graph analysis tool we would like to check that initial states do not have incoming transitions and that final states do not have outgoing ones. We might also restrict the total number of transitions for one state to 10, because higher numbers indicate a bad design of our state machine model (see [9] for more details about graph constraints and [27] for a similar example).

One can see that each of the tools owns a specific metamodel, which captures the domain abstraction that is appropriate for the task of the tool. The depicted metamodels are independent of each other. This gives tool developers the flexibility to choose their domain abstractions freely. Also, changes made to one metamodel have no implications for other metamodels. From the tool developer's perspective, these are nice properties of the design depicted in Fig. 1. However, our goal is to integrate the three tools. We want to render state machines, which is why states and transitions need to be shapes as well. We would also like to apply the graph analysis tool to state machines; therefore, states need to represent nodes and transitions need to be handled like edges. In other words, our tools need to share common data. Besides data that is shared across all tools (i.e., states and transitions), there is also data that is required by only a subset of the tools. For example, we might want to show errors stemming from the analysis of a state machine in red colour when rendering state machines. Thus, the tool integration must allow the analysis tool to change the colour of problematic elements.

The question raised by this example is how the three tools should be built in order to allow their interoperability while preserving a high degree of independence. Interoperability includes the interactions mentioned above as well as other interactions that were not anticipated at development time. The goal of this work is to illustrate how one can incorporate the tool interoperability concern at tool development time and thereby ease unforeseen integrations in the future.

3. PROBLEM ANALYSIS

From the previously discussed example we derive the following requirements for tool implementation and integration:

R1 Appropriate Abstraction: Efficient tool implementation requires an appropriate data abstraction. Therefore, each tool should operate on a tool-specific data abstraction (tool metamodel).

R2 Tool Independence: Tools should be unaware of each other and reusable in different constellations. Thus, individual tool metamodels should be loosely coupled.

R3 Shared Data: Tools should be able to access and manipulate shared data. This requires means for integrating and adapting different data abstractions.

R4 Tool Interaction: Tools should be able to interact if needed. Interaction is required when functionality of one tool relies on data provided by another tool. Again, loose coupling is preferred.

These requirements seem to contradict each other: tool implementations should be independent (R2) and rely on suitable abstractions (R1), but their interoperation requires the sharing of common data (R3) and tool interaction (R4). As discussed for the example in the previous section, metamodelling enables the implementation of adequate, tool-specific metamodels. However, regarding tool integration, current metamodelling approaches have several drawbacks. In the following we analyse and discuss two forms of tool integration: proactive tool integration, where the metamodels of the tools to integrate are directly connected at tool development time, and retroactive tool integration, where the metamodels are untouched and external transformations are implemented during tool integration to synchronise data between the tools.
3.1 Proactive Tool Integration

A common approach to accessing shared data is to directly integrate the metamodel of one tool into the metamodel of another. Such approaches have also been studied by other communities, e.g., to integrate XML schemas [13]. We call these approaches proactive tool integration, as they require the implementation, or at least the anticipation, of integration during the development of the integrated tools. Object-oriented metamodelling supports two ways for such invasive integration of metamodels: delegation and inheritance. With delegation one can import and reference existing metaclasses and thereby decorate existing elements with additional data. With inheritance one can add new subclasses to imported classes and thereby access, refine and reuse the data of the original classes.

Delegation and inheritance are also used in various patterns that prepare anticipated metamodel extension. The pattern presented in [7] works with association classes to link metamodels that shall be integrated. These are inherited from abstract association classes to provide a generic protocol for navigating associations, which enables the flexible addition of new links and, thus, flexibility for later metamodel extension. In [11] Emerson et al. differentiate further patterns like metamodel interfacing or class refinement for proactive metamodel integration.

As depicted in Fig. 2, both delegation and inheritance tightly couple the integrating metamodel to the integrated one. In the following we discuss the details of this integration approach with regard to the introduced requirements:

R1 Appropriate Abstraction: The two techniques for proactive tool integration require an adaptation of the integrating tool metamodel. Integration by delegation imposes no structural restrictions on the integrating metamodel and, thus, does not impair the abstraction used in the tool metamodel. Integration by inheritance integrates the abstraction of the integrated metamodel into the integrating one. This is a more invasive influence on the structure of the integrating tool metamodel. For example, to integrate the shape metamodel from Sect. 2 with the state machine metamodel, one could define shapes as subtypes of states and transitions. However, this enriches shapes with source and target references, which are not part of the abstraction used by the shape rendering tool.

R2 Tool Independence: Both delegation and inheritance introduce a strong coupling between the integrating metamodel and the integrated one. Consequently, the integrating tool can hardly be reused in different contexts and strongly depends on the integrated tool. Even in the basic scenario where a domain feature is renamed (e.g., changing condition to guardExpression), all the integrating metamodels and tools must be changed.

R3 Shared Data: Both techniques provide means to access and manipulate data among tools. Delegation can be used to access and adapt data from the integrated metamodel. Integration using inheritance is beneficial for adapting a shared abstraction from the outside, but only at predefined places; thus, it only allows for anticipated extension. For example, if a tool requires floating-point coordinates for shapes, changing the type of the attributes x and y of class Shape is not possible.

R4 Tool Interaction: Both techniques only allow for tool integration that is implemented or anticipated at tool development time. Delegation can be used to navigate from an integrating metamodel to the integrated one. Navigation in the opposite direction is not possible, as the integrated metamodel cannot be extended from the outside. While metamodel integration by inheritance can be used for anticipated extension of the integrated metamodel, it is not applicable for sharing data between several tools, as subclasses are hard to propagate from one tool metamodel to another. For example, if two subclasses—BorderedShape and FilledShape—are introduced by two different tools, the tools cannot process each other's shapes, since the objects cannot be cast to the opposite class.

In addition, these integration mechanisms mix two facets of tool integration. The tool metamodels are implemented to present shared data in a tool-specific view. Thus, they intermingle the structure of this view (i.e., the interface required by a tool to access data) with the binding of this interface to the shared data structure. This lack of separation of concerns renders tool metamodels integrated with delegation and inheritance very fragile w.r.t. changes in the shared abstraction.
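The coupling problem can be seen in a few lines of code. A contrived sketch (ours, not from the paper; the classes loosely follow Fig. 1): with inheritance the shape abstraction inherits state machine details, while with delegation the integrating class hard-wires the integrated feature names, so renaming a feature breaks it.

    # Contrived illustration of the two proactive integration styles and
    # the coupling they create. All class and attribute names are ours.

    class State:                          # integrated metamodel (state machines)
        def __init__(self, name, condition=None):
            self.name = name
            self.condition = condition    # renaming this breaks integrators

    # Inheritance: Shape becomes a subtype of State and inherits concepts
    # (name, condition) that are alien to the rendering abstraction.
    class ShapeByInheritance(State):
        def __init__(self, name, x, y):
            super().__init__(name)
            self.x, self.y = x, y

    # Delegation: Shape wraps a State and hard-codes its feature names.
    class ShapeByDelegation:
        def __init__(self, state, x, y):
            self.state = state
            self.x, self.y = x, y

        @property
        def label(self):
            return self.state.name        # breaks if 'name' is ever renamed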
3.2 Retroactive Tool Integration

A second approach to tool integration is to implement the metamodels independently and to apply transformations to synchronise data from one representation to another. This technique is non-invasive w.r.t. the tools' metamodels and is thus typically used to integrate existing tools retroactively. As depicted in Fig. 2, retroactive tool integration does not introduce a dependency between the integrated metamodels.

Figure 2: Analysis of existing Tool Integration Techniques

                             Proactive (inheritance, delegation)           Retroactive (transformation)
    Appropriate Abstraction  - tool metamodel needs to be adapted          + tool metamodels unaffected
    Tool Independence        - strong coupling of tools                    + tools are not coupled
    Shared Data              + shared data among all integrated tools      - data replication; synchronisation necessary
    Tool Interaction         +/- only anticipated interaction supported    - transformations hinder interaction among several tools

Figure 3: Example of Role-based Tool Metamodels

Regarding our requirements for tool integration, retroactive integration has the following properties:

R1 Appropriate Abstraction: As no tool metamodel is affected by retroactive tool integration, this requirement is satisfied.

R2 Tool Independence: Model transformations provide flexible means to connect arbitrary tool metamodels, while the involved tools stay completely independent.

R3 Shared Data: Model transformations copy and replicate data from one metamodel to another. If they are implemented in a bi-directional fashion, data sharing can be emulated. This is a very flexible coupling mechanism: tools can easily be integrated by providing new transformations. However, if several tools are involved, all transformations need to be kept synchronised. Furthermore, transformations are typically performed on whole models and not on individual model elements, which discourages this integration mechanism for scenarios where tools should concurrently and interactively operate on shared data. For example, if transformations are employed to realise the scenario from Sect. 2, one has to wait for the graph analysis to finish before the rendering tool can draw state machines including the red colouring for erroneous elements.
If both tools shared data physically, the analysis could run in parallel, triggering a repaint of the incorrect elements once the analysis has finished.

R4 Tool Interaction: When transformations are used for tool integration, the interrelation of shared data and tool-specific data is implicitly defined in the transformation. Due to this non-invasive data integration, it is hard to track the relationships between shared data and the corresponding tool-specific data, which impedes multilateral tool interaction. To comprehend this drawback, consider a trilateral interaction of state machines, graphs and 2D rendering. First a state machine is transformed to a graph representation. Once the resulting graph is analysed, the problematic nodes can be tagged invalid. But to render invalid states in red, the rendering tool needs to know which state was transformed to a node that was tagged invalid. Tracking and using the relation between states and nodes in the rendering tool unnecessarily complicates the realisation of such tool interaction.

3.3 Conclusion of Problem Analysis

In Fig. 2 we summarised the most important characteristics of both integration approaches w.r.t. the requirements defined above. Given this characterisation, one can conclude that neither approach satisfies all requirements. While proactive tool integration is more appropriate for shared data and tool interaction, retroactive integration better addresses the need for tool independence and does not interfere with tool-specific abstractions. In the following we introduce an approach for role-based tool integration that combines the benefits of proactive and retroactive integration by providing means for both tool independence and tool interaction.

4. ROLE-BASED TOOL INTEGRATION

To overcome the limitations outlined above and to increase the flexibility of tool interoperability, we propose to use role models as an extension of current object-oriented metamodelling. The concept of roles was originally presented in [18] and applied to the design of frameworks in [19]. Conceptually, a role model captures an area of concern [18]. This motivates its application to achieve tool independence. In addition, role modelling introduces the technique of role composition [2] to integrate several role models and object-oriented system specifications into an interacting system implementation. In the following we introduce role-based metamodelling and elaborate how role modelling and role composition help to achieve tool independence and tool interaction, respectively.

4.1 Role Models for Tool Independence

In [18] a role model is defined to capture an "archetypical pattern of objects" that enables the specification of a specific concern of system behaviour. Thus, role models can be used to define a specific abstraction to implement a particular tool. Each individual role defines a type of object required to achieve the tool's functionality.
To integrate the state machine metamodel with the metamodel of the Graph Analysis Tool, Transitions are bound to the Edge role and States are bound to the Node role. These bindings are completed by the role composition specification depicted in Listing 1. The language used to specify these bindings can be either declarative or imperative (e.g., using Java). Depending on the implementation of the role bindings (see Sect. 4.3), the bindings need to be translated to the target platform of the integration or interpreted at runtime. Up to now, the physical representation of shared data has not been specified. To do so, we introduced the concept of RoleGrounding (cf. Fig. 6) to role composition. Grounding is used to identify roles and role features that physically materialise data. In contrast to proactive integration where newly developed tools need to adapt to an existing and fixed data materialisation or retroactive integration where data is replicated and materialised in various representations, this allows to postpone the decision on data materialisation until tool integration time. Graph Analysis Tool Kind CIRCLE kind RECTANGLE LINE Node invalid:Bool colour Colour source Edge target WHITE BLACK RED Textual State Machine Editor State name: String Transition from condition: String to StateType type PLAIN INITIAL FINAL Grounding Notation Name name: Type Grounded Role Grounded Attribute Binding Notation name Grounded Reference Role Binding Figure 5: Example of Role Composition for Tool Interaction integrate statemachine , 2 dShapes , graph { State plays Shape { label : name kind : if ( player . type == PLAIN ) return RECTANGLE else return CIRCLE colour : if ( player . type == INITIAL ) return WHITE else return BLACK } Fig. 3 depicts a role-based version of the tool metamodels introduced in Sect. 2. Instead of concrete class-based data structures these metamodels use role types for all concepts of a tool metamodel. To model data required to implement tool functionality, role types can define role attributes that provide primitive values or references that connect several role types. Role models can, thus, be considered as a mechanism for defining the required interface of a specific tool. Compared to current metamodelling approaches role models do not bind this interface to a concrete data source or representation. Thus, in contrast to using interfaces and classes, roles do not distinguish between data that is physically represented and data that is obtained from another object. This distinction is made by the grounding specification, which will be discussed soon. As depicted in Fig. 3 for the 2D Shape Renderer, we use role features to specify that every player of Shape role needs to provide a label and the kind of shape it is represented by. In the metamodel of the Graph Analysis Tool role references are used to provide the source and target Node of an Edge. The role-model metamodelling language used to define the metamodels is depicted in Fig. 4. It provides means to define RoleModels consisting of several Roles. Each role contributes a set of RoleFeatures that are either Attributes with a PrimitiveType or references with a complex Type. In addition to Roles, Enums are allowed as complex Types. 4.2 Transition plays Shape { label : condition kind : return LINE colour : return BLACK } State plays Node {} Transition plays Edge { source : from target : to } ground State { name , type } ground Transition { condition , from , to } } Listing 1: Role Composition Specification Fig. 
5 and Listing 1 depict the RoleGroundings and RoleFeatureGroudings used to specify data materialisation for our example. The example shows that grounding complements the flexibility of role binding as it allows for an integration-specific adjustment of data materialisation. Data that shall be used by both the Graph Analysis Tool and the Renderer is materialised and shared using the roles in the state machine role model. Data which is only required within a specific tool (e.g., the x-, y, and size-coordinates of a Shape) is materialised in the respective role model. Such flexible means to specify data sharing and material- Role Composition for Tool Interaction Tool interaction can now be achieved by specifying a RoleBinding relationship between Roles of several tool RoleModels (cf. Fig. 5 and 6). Each role binding connects a role of one role model to the role it plays in another role model. For each role binding RoleFeatureBindings can be used by the role player to bind attributes and references of the role. This makes role composition suitable for the specification of data sharing. Fig. 5 depicts the bindings used to integrate the tool meta- 56 role-composition Role Binding Implementation RolePlayer plays Role { roleFeature: playerFeature } groundings * RoleGrounding RoleFeatureGrounding * featureGroundings GenericRoleTypeInterface hasRoleType() getRoleByType() * bindings Composition RoleBinding RoleFeatureBinding Role roleFeature: Type * featureBindings role-model models * role player Role Model Role roles * role RoleTypeInterface getRoleFeature() setRoleFeature() binds RoleFeature * roleFeature RolePlayer playerFeature: Type grounds role getRoleFeature() setRoleFeature() Figure 6: Composition Language for Role-based Metamodelling getPlayerFeature() player setPlayerFeature() getRoleFeature() { return player.getPlayerFeature(); } setRoleFeature(value) { player.setPlayerFeature(value); } isation redeems tool developers from the challenge of anticipating extensibility required for future tool integration and avoids problems of data replication and synchronisation. 4.3 RolePlayer RoleTypeImpl Grounding Implementation ground Role { roleFeature } Implementing Role Composition Role roleFeature: Type Up to now, the languages to specify both the data abstractions used by tools (i.e., the role models) and the possible interactions across them (i.e., the role bindings) have been presented. The former specification did also include the decision about which data needs to be represented physically (i.e., the grounding). To actually use role-based metamodelling for tool interoperability, the question how to implement both role bindings and groundings needs to be answered. As there is no single answer to this, because roles can be implemented in various ways (see [20] for an overview), we will present one feasible solution here. The aim of this is to show that the presented approach can easily be implemented based on classic object-oriented technology. It does not raise the claim to be a universal solution. Our mapping of roles, role bindings and groundings is depicted in Fig. 7. The most simple mapping is the one for grounded roles and grounded features (i.e., attributes and references). These are basically mapped to plain classes and features respectively. As grounded roles and features are selected to be the part of the role model that is physically represented, this mapping is straight forward. 
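To make this grounding mapping concrete, the following Java sketch shows the kind of plain classes that could be generated for the ground clauses of Listing 1 (ground State { name, type } and ground Transition { condition, from, to }). The class and accessor names are our own illustration and are not prescribed by the approach.

    // Hypothetical generated classes: grounded roles become plain classes,
    // grounded attributes and references become plain fields with accessors.
    enum StateType { PLAIN, INITIAL, FINAL }

    class State {
        private String name;       // grounded attribute "name"
        private StateType type;    // grounded attribute "type"
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public StateType getType() { return type; }
        public void setType(StateType type) { this.type = type; }
    }

    class Transition {
        private String condition;  // grounded attribute "condition"
        private State from;        // grounded reference to the source state
        private State to;          // grounded reference to the target state
        public String getCondition() { return condition; }
        public void setCondition(String condition) { this.condition = condition; }
        public State getFrom() { return from; }
        public void setFrom(State from) { this.from = from; }
        public State getTo() { return to; }
        public void setTo(State to) { this.to = to; }
    }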
To explain the mapping of role types, one must keep in mind that role types can only exist in collaboration with one or more role players, where each player is connected to the role by a role binding. We map this relation to object-oriented models as shown in Fig. 7. For each role type a dedicated interface (RoleTypeInterface) is introduced, which defines the features of the role. This interface is a supertype of both the implementation of the role type (RoleTypeImpl) and the player type. The latter inheritance relation reflects the fact that players need to provide all features that are expected by clients of the role. This pattern is quite similar to the Role Object Pattern [8]. We merely added an additional interface, GenericRoleTypeInterface, to provide reflection facilities which can be used to check whether objects play a certain role (hasRoleType()) or to obtain the roles played by an object by role type (getRoleByType()).

Figure 7: Patterns for Object-oriented Implementation of Role Binding and Grounding. (A role binding RolePlayer plays Role { roleFeature: playerFeature } is implemented by a RoleTypeImpl whose getRoleFeature() returns player.getPlayerFeature() and whose setRoleFeature(value) calls player.setPlayerFeature(value); a grounding ground Role { roleFeature } is implemented by a plain class with a plain feature.)

The pattern can be applied to a pair of types in both directions. Consider for instance a pair of types A and B; then it is possible to have one role binding where A is the player and B is the role, and a second binding where B is the player and A is the role. Only the bound role features need to be distinct and free of cycles. Consequently, our approach does not distinguish between an integrated and an integrating metamodel, but supports different directions for individual bindings and, thus, flexible data propagation in both directions.

The role binding specification presented before (cf. Listing 1) is used to fill the implementation of the RoleTypeImpl class. If the role binding is a declarative one (e.g., label: name), appropriate code needs to be generated to delegate calls to the methods getLabel and setLabel to getName and setName, respectively. For bindings that use the imperative style (e.g., the binding for the colour feature), the code needs to be translated to the target implementation language. Depending on the style of the binding (i.e., either declarative or imperative), role bindings can be information preserving or not. Whether this is a requirement depends on the concrete domains that are bound to each other. Additionally, if the imperative style is used, one should make sure that the get and set operations are inverse to each other.

A complete mapping of all role bindings, feature bindings and groundings results in a plain object-oriented model. Such a model implements the integration defined by the role model and the role binding specification. Moreover, this object-oriented model can be derived fully automatically. If a different tool integration is needed, the role binding and the grounding can be changed, leaving the involved tools untouched. To obtain a sound integration of all tools, all role types and features must be either bound or grounded. This can easily be explained by the fact that the data required by tools must either be derived (by evaluating role bindings) or available in materialised form (in accordance with the groundings).

It is important to realise that even though the implementation sketched above is based on inheritance, it is different from plain inheritance-based approaches to interoperability. The patterns that employ inheritance to achieve integration are automatically derived in our approach, whereas using inheritance "as is" does not force tool developers to apply the patterns correctly.
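In Java, the generated code for the binding State plays Shape { label: name } of Listing 1 could look as follows, using the interfaces named in Fig. 7. This is a hand-written sketch of what such a generator might emit, not code from the paper; it reuses the grounded State class sketched above, and the class name StateAsShape is our own.

    // Reflection facilities shared by all role players (cf. GenericRoleTypeInterface in Fig. 7).
    interface GenericRoleTypeInterface {
        boolean hasRoleType(Class<?> roleType);
        <R> R getRoleByType(Class<R> roleType);
    }

    // Role type interface for the Shape role of the 2D Shape Renderer.
    interface Shape extends GenericRoleTypeInterface {
        String getLabel();
        void setLabel(String label);
    }

    // Generated role implementation: delegates the bound role feature "label"
    // to the player feature "name", following the binding "label: name".
    class StateAsShape implements Shape {
        private final State player;
        StateAsShape(State player) { this.player = player; }
        public String getLabel() { return player.getName(); }
        public void setLabel(String label) { player.setName(label); }
        public boolean hasRoleType(Class<?> roleType) {
            return roleType == Shape.class;
        }
        @SuppressWarnings("unchecked")
        public <R> R getRoleByType(Class<R> roleType) {
            return hasRoleType(roleType) ? (R) this : null;
        }
    }

With such a generated class in place, a rendering client can work purely against the Shape interface, while an analogous generated class can bind the same State to the Node role of the Graph Analysis Tool.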
4.4 Contributions of Role-based Metamodelling

In the following we summarise the contributions of role-based metamodelling w.r.t. the requirements for tool implementation and integration defined in Sect. 3.

R1 Appropriate Abstraction. Role models are meant to decompose systems into units of concern. Thus, our role-based metamodelling approach satisfies the requirement to provide data abstractions customised for tool-specific needs. As role composition contributes advanced means to integrate different role models, the need for tool interaction does not interfere with the design of tool-specific abstractions.

R2 Tool Independence. Each role model can be specified independently of other role models. Consequently, tools can be implemented independently. Role composition is performed in a separate step and provides a technique to loosely couple tools. This resolves the tool interdependence issues experienced with proactive integration approaches. Thus, changes (renaming, extension, refactorings) that result from the evolution of any involved tool can be supported by adapting the composition specification and do not interfere with the metamodels used in other tools.

R3 Shared Data. Role composition takes several role models and provides means to compose them into an integrated system specification. The composition process integrates role players and the roles they play in accordance with the role feature bindings. These bindings implement the desired data sharing between the role models of several tools. Role grounding provides means to precisely and flexibly specify the physical data representation at integration time. Data synchronisation is supported in any direction, and duplication as found in retroactive integration approaches is avoided.

R4 Tool Interaction. Role composition can be used to share data and, thus, enable interaction of the corresponding tools. Furthermore, our role composition approach allows reflective access to the role bindings of a role player, which allows for more advanced forms of tool interaction. Consider for instance the interaction of the 2D Shape Renderer and the Graph Analysis Tool introduced previously: every time the Node a State plays is marked invalid by the graph analysis tool, the corresponding Shape should be drawn in a RED colour. This interaction can be achieved by the role composition defined in Listing 2, which is based on the reflection facilities introduced by the pattern in Fig. 7. This leverages multilateral tool integration while still preserving the independence of each individual tool.

    integrate statemachine, 2dShapes, graph {
      State plays Shape {
        colour: if (player.hasRoleType(Node)
                    && player.getRoleByType(Node).isInvalid()) return RED
                else if (player.type == INITIAL) return WHITE
                else return BLACK
      }
    }

Listing 2: Reflection on Role Bindings

5. RELATED WORK

Tool integration and interoperability is a broad field and has received a lot of attention in the literature (see [26] for an exhaustive survey). With the increasing popularity of model-based technology, the problem has been tackled from a different angle. Armed with tools such as standardised metamodelling languages (e.g., EMOF), unified data formats (e.g., XML Metadata Interchange (XMI) [24]) and model transformation languages (e.g., Query/View/Transformation (QVT) [25]), substantial improvements have been made.

Transformation-based approaches have been employed to synchronise various kinds of data [10, 22]. In principle, any model transformation language can be employed for this task. However, some transformation languages are more suitable than others. For example, a very prominent approach in the field (i.e., Triple Graph Grammars [17]) was implemented in the MOFLON tool suite [1] and used to integrate heterogeneous tools. Furthermore, high-level specification languages were proposed to ease the specification of relations between domain models [12]. All these transformational approaches do, however, still suffer from the drawbacks mentioned before: data is replicated and tools cannot operate concurrently on the same data. Neither drawback applies to role-model based tool integration, which is an advantage of our approach. Nonetheless, transformations are a good choice in the presence of existing metamodels. If the domain abstraction used by a tool has not been captured in a role model and cannot be changed anymore, our approach is not applicable.

Metamodel integration approaches import and reuse metamodels to establish interoperability rather than transforming the data required by heterogeneous tools. For example, in [3] a common metamodel is proposed to establish tool interoperability. While this is a feasible approach if one is focused on a specific, closed domain, it strongly couples the involved tools to the common metamodel. The classical mechanisms for reuse (i.e., delegation and inheritance) have been subject to an analysis in this paper. Based on these two mechanisms, a pattern that allows extensible metamodels to be created was presented in [7]. This approach focuses on the level of metamodels (M2). It is similar to our approach, but we use roles as a first-class concept during modelling, instead of emulating them using a modelling pattern. Using a dedicated concept for integration makes interfaces explicit and avoids errors compared to applying a pattern. Besides the classical mechanisms for extension and reuse, several others were proposed in the past [11]. Not all these mechanisms have been investigated with regard to tool interoperability. To the best of our knowledge, this paper is the first to investigate how to use roles as a metamodel integration concept. Evaluating other mechanisms is subject to future work.

A general problem related to the integration of metamodels is their semantics, or their relations in general. To find correct mappings between metamodels, the use of ontologies has been proposed [16]. Also, to capture relations across metamodels, megamodels can be used [5]. While ontologies may certainly help in identifying common or related concepts, we leave this task to the tool integrator. The derivation of metamodel mappings based on ontologies is independent of its realisation using a particular integration technique. Depending on the chosen technique (e.g., transformations or metamodel integration), the resulting integrations still suffer from the drawbacks identified in Sect. 3.

Tool integration may also span multiple technological spaces, which requires aligning the concepts of the spaces (M3) first, before aligning the concepts of tools that reside in one particular space. An example of such an integration can be found in [6]. There, tools participating in an interoperability scenario need to provide explicit metamodels, which are then aligned with each other. In contrast, we assume tools to reside in the same technological space, but focus on the flexible adaptation of tools with different domain abstractions.

The integration of heterogeneous tools and the adaptation to different repositories has also been tackled by the ModelBus project [14]. However, that project aimed at providing infrastructure to integrate tools based on existing technology (e.g., model transformation languages). This can ease the task of tool integration to some degree, but the problem of integrating metamodels and creating transformation specifications is not resolved by this infrastructure. After all, the separation of tools (i.e., their functionality) and the data they process (i.e., their repository) is a crucial point within the topic of tool integration. This has been coined in [4] as the service vision. The work presented in this paper is along this line and may therefore be conceived as one approach to model tools as services.

6. CONCLUSION

This paper was motivated by the need to integrate tools in order to reuse their functionality. To allow such reuse, tools must exchange the data that is relevant to perform certain tasks. While standardised metamodelling languages have made this substantially easier, there are still plenty of open problems. To identify some of them, we have carefully analysed existing techniques for sharing and synchronising data based on models (i.e., object-oriented metamodel integration and model transformation). We have identified drawbacks of both approaches. While proactive tool integration does not allow tools to be independent of each other and interferes with tool-specific abstractions, retroactive tool integration using transformations does not support data sharing and impedes multilateral tool interaction. In the latter case, interoperability was not anticipated at all, while in the former it is anticipated only to a certain degree. This limited anticipation of tool interoperability at tool design time is what makes the lives of tool integrators so hard.

To explicitly anticipate arbitrary future interactions between tools, we propose to use role models to specify the view on the data required by tools. These views are bound to other views or to physical data representations at tool integration time. Thus, decisions about interoperability, which were previously made by tool developers, are left to the tool integrators. This increases the degree of freedom at integration time and therefore eases establishing tool interoperability. The use of role models to specify tool interfaces decouples the specification of the data required by a tool from the binding of this interface to a concrete data source or representation. The latter is established by role bindings, which are created by the tool integrator, as she is the one who knows which data needs to be shared by which tools. Still, designing metamodels using roles is similar to using classic object-oriented metamodelling facilities.

In the future, further investigations are needed to prove the feasibility of the approach in industrial scenarios. An implementation based on the Eclipse Modeling Framework (EMF) [21] is planned. This will allow us to use the existing EMF infrastructure and to test the integration of various tools that are based on EMF. Based on such an implementation, the validation of role compositions, to increase the reliability of composition specifications, could be performed. Also, the full potential of model-based specifications of tool integrations can be exploited. While we do not foresee conceptual problems in this regard, the implementation and validation on a representative case study need to be performed; in this paper we sketched the idea and illustrated its benefits on a small toy example only.

Another important issue that needs to be addressed is the migration of existing data. Once the landscape of integrated tools changes (e.g., new tools are included), the grounding of data may change and existing data needs to be transferred to the new physical representation. To achieve such a migration, an analysis of the changes made to the role bindings and grounding is needed.

The main limitation of the role-based metamodelling approach is that it needs to be applied at tool design time to prepare tools for interoperability. It cannot be applied as is to integrate existing tools. Nonetheless, we strongly believe that the benefits gained at tool integration time outweigh the cost of requiring tool developers to design their tools for interaction. The fact that the proposed modelling approach anticipates all potential future integration scenarios by construction frees tool developers from the burden of anticipating the unanticipated.

7. REFERENCES
[1] C. Amelunxen, F. Klar, A. Königs, T. Rötschke, and A. Schürr. Metamodel-based Tool Integration with MOFLON. In W. Schäfer, M. B. Dwyer, and V. Gruhn, editors, ICSE, pages 807–810. ACM, 2008.
[2] E. P. Andersen. Conceptual Modeling of Objects: A Role Modeling Approach. Ph.D. Thesis, University of Oslo, Oslo, Norway, 1997.
[3] A. Baumgart. A common meta-model for the interoperation of tools with heterogeneous data models. In Hein and Wagner [15].
[4] J. Bézivin, H. Brunelière, J. Cabot, G. Doux, F. Jouault, and J.-S. Sottet. Model Driven Tool Interoperability in Practice. In Hein and Wagner [15].
[5] J. Bézivin, F. Jouault, and P. Valduriez. On the Need for Megamodels. In Proceedings of the OOPSLA/GPCE: Best Practices for Model-Driven Software Development workshop, 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2004.
[6] H. Bruneliere, J. Cabot, C. Clasen, F. Jouault, and J. Bézivin. Towards Model Driven Tool Interoperability: Bridging Eclipse and Microsoft Modeling Tools. In T. Kühne, B. Selic, M.-P. Gervais, and F. Terrier, editors, ECMFA, volume 6138 of Lecture Notes in Computer Science, pages 32–47. Springer, 2010.
[7] S. Burmester, H. Giese, J. Niere, M. Tichy, J. P. Wadsack, R. Wagner, L. Wendehals, and A. Zündorf. Tool Integration at the Meta-Model Level within the FUJABA Tool Suite. International Journal on Software Tools for Technology Transfer (STTT), 6(3):203–218, August 2004.
[8] D. Bäumer, D. Riehle, W. Siberski, and M. Wulf. The Role Object Pattern. In Proceedings of the 4th Pattern Languages of Programming Conference (PLoP'97), Washington University Dept. of Computer Science, Tech. Report wucs-97-34, 1997.
[9] H. Ehrig, K. Ehrig, A. Habel, and K.-H. Pennemann. Constraints and Application Conditions: From Graphs to High-Level Structures. In H. Ehrig, G. Engels, F. Parisi-Presicce, and G. Rozenberg, editors, ICGT, volume 3256 of Lecture Notes in Computer Science, pages 287–303. Springer, 2004.
[10] K. Ehrig, G. Taentzer, and D. Varró. Tool Integration by Model Transformations based on the Eclipse Modeling Framework. EASST Newsletter, June 2006.
[11] M. Emerson and J. Sztipanovits. Techniques for Metamodel Composition. In OOPSLA – 6th Workshop on Domain Specific Modeling, pages 123–139, October 2006.
[12] M. D. D. Fabro, J. Bézivin, and P. Valduriez. Model-Driven Tool Interoperability: An Application in Bug Tracking. In R. Meersman and Z.
Tari, editors, OTM Conferences (1), volume 4275 of Lecture Notes in Computer Science, pages 863–881. Springer, 2006.
[13] W. Fan and P. Bohannon. Information Preserving XML Schema Embedding. ACM Transactions on Database Systems, 33(1), 2008.
[14] C. Hein, T. Ritter, and M. Wagner. Model-Driven Tool Integration with ModelBus. In Workshop Future Trends of Model-Driven Development, 2009.
[15] C. Hein and M. Wagner, editors. 3rd Workshop on Model-Driven Tool and Process Integration, Co-located with ECMFA 2010, 16th June 2010, Paris, France, 2010.
[16] E. Kapsammer, H. Kargl, G. Kramler, T. Reiter, W. Retschitzegger, and M. Wimmer. On Models and Ontologies – A Layered Approach for Model-based Tool Integration. In Proceedings of the Modellierung 2006 (MOD2006), pages 11–27, 2006.
[17] A. Königs and A. Schürr. Tool Integration with Triple Graph Grammars – A Survey. Electronic Notes in Theoretical Computer Science, 148(1):113–150, 2006.
[18] T. Reenskaug, P. Wold, and O. Lehne. Working with Objects: The OOram Software Engineering Method. Manning Publications, Greenwich, CT, 1996.
[19] D. Riehle and T. R. Gross. Role Model Based Framework Design and Integration. In Proceedings of the 13th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA '98), pages 117–133, 1998.
[20] F. Steimann. On the Representation of Roles in Object-oriented and Conceptual Modelling. Data & Knowledge Engineering, 35(1):83–106, 2000.
[21] D. Steinberg, F. Budinsky, M. Paternostro, and E. Merks. Eclipse Modeling Framework (2nd Edition). Pearson Education, 2009.
[22] Y. Sun, Z. Demirezen, F. Jouault, R. Tairas, and J. Gray. A Model Engineering Approach to Tool Interoperability. In D. Gasevic, R. Lämmel, and E. V. Wyk, editors, Software Language Engineering, First International Conference, SLE 2008, Toulouse, France, September 29–30, 2008, Revised Selected Papers, volume 5452 of Lecture Notes in Computer Science, pages 178–187. Springer, 2008.
[23] The Object Management Group. Meta Object Facility (MOF) Core Specification, version 2.0. Technical report, January 2006.
[24] The Object Management Group. Meta Object Facility (MOF) 2.0 XMI Mapping Specification, v2.1.1. Technical report, December 2007.
[25] The Object Management Group. Meta Object Facility (MOF) 2.0 Query/View/Transformation Specification. Technical report, April 2008.
[26] M. N. Wicks and R. G. Dewar. Controversy Corner: A new research agenda for tool integration. Journal of Systems and Software, 80(9):1569–1585, 2007.
[27] J. Winkelmann, G. Taentzer, K. Ehrig, and J. M. Küster. Translation of Restricted OCL Constraints into Graph Constraints for Generating Meta Model Instances by Graph Grammars. Electronic Notes in Theoretical Computer Science, 211:159–170, April 2008.

Aligning Business and IT Models in Service-Oriented Architectures using BPMN and SoaML

Brian Elvesæter, SINTEF ICT, P. O. Box 124 Blindern, N-0314 Oslo, Norway, +47 22 06 76 74, brian.elvesater@sintef.no
Dima Panfilenko, DFKI IWi, Stuhlsatzenhausweg 3, Campus D3.2, D-66123 Saarbruecken, Germany, +49 681 85775 7777, dima.panfilenko@dfki.de
Sven Jacobi and Christian Hahn, Saarstahl, Bismarckstraße 57-59, D-66333 Voelklingen, Germany, +49 6898 10 3476, {sven.jacobi | christian.hahn}@saarstahl.com

ABSTRACT
In this paper, we introduce the new Service oriented architecture Modeling Language (SoaML) and describe how the language can be used to align business models and IT models. In particular, we provide a mapping specification from BPMN models to SoaML models.

Categories and Subject Descriptors
D.2.11 [Software Architectures]: Service-oriented architectures. D.2.12 [Software Engineering]: Interoperability.

General Terms
Design, Standardization, Languages, Theory.

Keywords
Business modelling, service modelling, business and IT alignment, BPMN, SoaML.

1. INTRODUCTION
There is an industrial interest in ensuring a good connection and mapping between business models as expressed in enterprise architectures and IT models as expressed in technical system architectures, which are commonly realised as service-oriented architectures (SOAs). The increasing popularity of the SOA paradigm relies on its closeness to business models, in particular business processes. The concepts of SOA apply to business architectures as well as to system architectures. From a business perspective the SOA describes the business-critical processes, contracts, information and capabilities of the enterprise. From an IT perspective the SOA describes the software components, their service interfaces and how these components can be coupled to form a technical system architecture that supports the business requirements of the enterprise.

Although SOA concepts, business models and service technologies have been a hot topic in the last few years, the alignment of business and IT models still remains a challenge. Furthermore, although modelling is now an integral part of software engineering approaches, standardised modelling languages to support SOA have been lacking. SHAPE (Semantically-enabled Heterogeneous Service Architecture and Platforms Engineering) (ICT-2007-216408) (http://www.shape-project.eu/) is a European research project under the 7th Framework Programme that has developed an infrastructure for model-driven engineering (MDE) for SOA with support for various technology platforms [1]. The SHAPE technologies revolve around the new Service oriented architecture Modeling Language (SoaML) specification [2] from the Object Management Group (OMG). SoaML aims at providing a common modelling language to business and system architects. In the SHAPE project we have defined an MDE approach to SOA that incorporates the use of business modelling formalisms such as BPMN and provides mappings to SoaML to help the business and system stakeholders align their business requirements and IT system implementations.

This paper is structured as follows: In Section 2 we give an overview of the SoaML language. Section 3 describes our requirements, mapping rules and tool support for the business and IT alignment between BPMN and SoaML. In Section 4 we present an illustrative example taken from one of the industrial use cases in the SHAPE project. Section 5 discusses our results and findings. Finally, Section 6 concludes this paper.

2. SoaML
The Service oriented architecture Modeling Language (SoaML) specification [3] defines a UML profile and a metamodel for the design of services within a service-oriented architecture. The goals of SoaML are to support the activities of service modelling and design and to fit into an overall model-driven development approach. The SoaML profile defines extensions to UML to support the range of modelling requirements for service-oriented architectures, including the specification of systems of services, the specification of individual service interfaces, and the specification of service implementations.
This is done in such a way as to support the automatic generation of derived artefacts following an MDA-based approach.

According to the specification, SoaML has been designed to support both an IT and a business perspective on SOA. Our experiences with the SoaML language, in the context of tool and method implementation in the industrial use case, have suggested that a clearer separation of the business-level and IT-level concepts is needed. In the context of SHAPE we have made these levels more explicit. Figure 1 illustrates the separation.

Figure 1. Business and IT concepts of SoaML. (Business perspective on SOA: business goals, business processes and participants, services architecture, service contracts, capabilities; IT perspective on SOA: service interfaces, service choreographies, interfaces and messages, components and ports; both connected through business and IT alignment.)

In the business perspective on SOA we suggest integrating the use of the SoaML language with the Business Motivation Model (BMM) language [4] to define business motivation models and the Business Process Model and Notation (BPMN) language [5] to define business processes. Motivation models and business processes are important aspects to be included when modelling the business perspective on SOA. The SoaML specification defines relationships to BMM, and the BMM specification defines relationships to BPMN, which allows for this integration of languages. In this paper we focus on the relation between BPMN and SoaML in order to align process models from BPMN with service models for SOA.

The language constructs from SoaML that are most suitable at the business level are participant, services architecture, service contract and capability (see Figure 2).

Figure 2. UML extensions for business concepts

Services architectures are used to define how a set of participants works together for some purpose by providing and using services. A services architecture describes how participants work together by providing and using services expressed as service contracts.

The language constructs from SoaML that are most suitable at the IT level are service interface and its behaviour (i.e. service choreography), interface, message type, components (i.e. participants) and service and request ports (see Figure 3).

Service interfaces are used to describe the operations provided and required to complete the functionality of a service. A service interface can be used as the protocol for a service port or a request port. Service data are used to describe service messages and message attachments. The message type is used to specify the information exchanged between service consumers and providers. An attachment is a part of a message that is attached to rather than contained in the message.

It should be noted that some of the language constructs defined in SoaML fit on both the business and the IT level. In particular this applies to participants, which are used to define the service providers and consumers in a system.
At the business level the participants typically represent business organization units or roles, whereas on the IT level the participants typically represent IT systems or software components. When a participant acts as a provider it contains service ports, and when a participant acts as a consumer it contains request ports.

Service contracts are used to describe interaction patterns between service entities. A service contract is used to model an agreement between two or more parties. Each service role in a service contract has an interface that usually represents a provider or a consumer.

Participants are used to define the service providers and consumers in a system. A participant may play the role of service provider, consumer or both.

Capabilities represent an abstraction of the ability to affect change. Capabilities identify or specify a cohesive set of functions or resources that a service provided by one or more participants might offer. Capabilities can be used by themselves or in conjunction with participants to represent general functionality or abilities that a participant must have.

SoaML is agnostic to the choice of modelling formalisms to define behaviour. The specification states that any UML behavioural constructs can be used to describe behaviour such as service choreographies, but other formalisms such as BPMN can also be used.

Figure 3. UML extensions for IT concepts

3. ALIGNING THE BUSINESS AND IT PERSPECTIVES ON SOA

3.1 Requirements
For the support of the different roles in a collaborative modelling project one can think of appropriate modelling formalisms. In the alignment of the business and IT perspectives there are obviously at least two roles to be considered – the business architect and the system architect. Both are experts in their area, but they do not necessarily use the same notations for representing the same concepts. For that reason, the two formalisms that can be used by these respective users for modelling are described in the following.

Business users may use a business process modelling formalism such as Event-driven Process Chain (EPC) [6] to represent their workflows. Process chains describe the sequencing and interaction between data, process steps, IT systems, organisational structure and products. An EPC always starts and ends with events, which define the state or condition under which a process starts and the state under which it ends. An event may initiate multiple functions at the same time; similarly, a function may result in multiple events. To represent these branches and processing loops in an EPC, a connector (or rule) is used. However, instead of acting simply as graphical connections, the connectors also define the logical links between objects, such as "and" or "either/or". EPCs are typically used at the higher levels of the process hierarchy. If more technical details of business processes need to be described, other methods, such as BPMN, UML, or BPEL, are used instead of EPCs. The reference models provided by SAP are also defined using the EPC methodology. EPCs offer a variety of ways to analyse processes and identify both quantitative and qualitative improvement options.

The Business Process Management Initiative (BPMI) (http://www.bpmi.org/) developed an initial standard called Business Process Modelling Notation (BPMN) that was adopted by the OMG and renamed Business Process Model and Notation (BPMN) [5]. The primary goal of BPMN is to provide a notation that is readily understandable by all business users, from the business analysts who create the initial drafts of the processes, to the technical developers responsible for implementing the technology that will perform those processes, and finally, to the business people who will manage and monitor those processes. Thus, BPMN creates a standardised bridge for the gap between business process design and process implementation. Another goal, but no less important, is to ensure that XML languages designed for the execution of business processes, such as BPEL4WS (Business Process Execution Language for Web Services), can be visualised with a business-oriented notation. Furthermore, it is possible to create organisational units: with pools and lanes one can manage the organisational view of the process, and communication between pools and lanes can be modelled.

In general, the BPMN and SoaML models can be seen as different architectural viewpoints on the enterprise model, coupled to the enterprise and information viewpoints and the computational viewpoint, respectively, of the Reference Model for Open Distributed Processing (RM-ODP) [7-10]. Indeed, BPMN focuses on the enterprise processes and information, whereas SoaML primarily describes the structure of the service architecture. The models we create with the BPMN and SoaML standards can be seen as architectural viewpoints according to IEEE 1471 [11], which suggests a viewpoint-based modelling approach for supporting different stakeholders in the system development process.

3.2 BPMN to SoaML Mapping Rules
In this section the mapping rules for the model transformation between BPMN and SoaML are presented. The challenge lies in transforming BPMN models to SoaML in order to generate the appropriate system-relevant constructs for SoaML according to the generic business context at the computation independent model (CIM) level. The tool support for this is implemented within CIMFlexMT (see Section 3.3), which in its initial version supports model-to-model transformation by making use of the Atlas Transformation Language (ATL) [12]. First the simple one-to-one rules are presented, and then patterns for recognising the SoaML service contracts are introduced.

Mapping Rule 0: Process to Services Architecture
A services architecture has components at two levels of granularity. The community services architecture is a "top level" view of how independent participants work together for some purpose. The services architecture of a community does not assume or require any one controlling entity or process. A participant may also have a participant services architecture, which specifies how parts of that participant (e.g., departments within an organization) work together to provide the services of the owning participant. Participants that realize this specification must adhere to the architecture it specifies. The services architecture is aligned with the business process, and the participants and service contracts can be derived from the pools or lanes and the activities in the business processes, respectively, following these guidelines:
• Identify public and collaborative business processes that involve interactions and potential usage of software services between different business organizations. These processes are candidates for public community-level services architectures in SoaML that describe the service contracts between the business organizations.
• Identify private business processes for the business entities under your ownership control that are involved in the services architecture under consideration. These processes are candidates for private participant-level services architectures in SoaML that describe the service contracts between the internal organizational roles or units within the business organization.

Mapping Rule 1: Task to UML Action
A task describes an activity that possibly provides a useful output that can be consumed by the participants of the process. It can be most closely assigned to an action construct in UML, as it gives the abstract interface for the job done while not further specifying the workflow implementing this task. In the CIM manufacturing example this means all three tasks "Prepare Order", "Purchase" and "Receive Order" are mapped to actions. Table 1 illustrates the mapping of the notation.
Table 1. Task to UML Action – BPMN construct: Task; SoaML construct: Action.

Mapping Rule 2: Sub-Process to Services Architecture
A sub-process represents a more complex process than a simple task, but can still be seen as a whole. It can be assigned to a lower-level, e.g. participant-level, services architecture that details the roles and tasks of the sub-process. It should be mentioned, though, that this services architecture is not necessarily the bottom level and can be subdivided further (through roles). Table 2 illustrates the mapping of the notation.
Table 2. Sub-Process to Services Architecture – BPMN construct: Sub-process; SoaML construct: Services Architecture (<<ServicesArchitecture>>).

Mapping Rule 3: Pool to Participant (Community-level)
A pool in BPMN stands for a business entity or a participant of a process, on the one hand. On the other hand, it can also be structured with respect to further participants of the process, thus creating a participant hierarchy. These two points together motivate mapping the pool onto a role in a community-level services architecture that has a participant type matching the pool. Table 3 illustrates the mapping of the notation.
Table 3. Pool to Participant (Community-level) – BPMN construct: Pool; SoaML constructs: <<Participant>> Pool and a role typed by it (Role: Pool) in a community-level <<ServicesArchitecture>>.

Mapping Rule 4: Lane to Participant (Participant-level)
A lane represents a participant or a department in BPMN and is situated in a pool, thus showing the two-tier hierarchy. In order to allow for further subdivision (which is also under discussion in the current BPMN 2 proposals), the lane is mapped to a role in a participant-level services architecture that has a participant type matching the lane. The participant-level services architecture must adhere to the community-level services architecture to which the corresponding pool participant (see rule 3) belongs. Table 4 illustrates the mapping of the notation.
Table 4. Lane to Participant – BPMN construct: Lane; SoaML constructs: one <<Participant>> per lane and roles (Role1: Lane1, Role2: Lane2) in a participant-level <<ServicesArchitecture>> for the pool.

Mapping Rule 5: Message "Begin" to Service
The beginning point of each message in BPMN has the following semantics: it is the starting end of the data channel between two participants or pools. The service port in SoaML has exactly this meaning, which is reflected in this mapping. Participants in SoaML use this construct to provide services to other participants in the modelled architecture. Table 5 illustrates the mapping of the notation.
Table 5. Message "Begin" to Service – BPMN construct: Message "Begin"; SoaML construct: Service (service port).

Mapping Rule 6: Message "End" to Request
The ending point of each message in BPMN has semantics very similar to the message beginning point, but is situated on the other end of the communication channel. The similar semantics of the request port in SoaML suggests mapping it to the message end from BPMN. The aim of this mapping is to reflect the target of the data channel in the service consumption of the modelled architecture. Table 6 illustrates the mapping of the notation.
Table 6. Message "End" to Request – BPMN construct: Message "End"; SoaML construct: Request (request port).

Mapping Rule 7: Process fragment (pattern) to Service Contract
There is no single construct in BPMN that resembles a service contract. One needs to analyse the BPMN processes and identify process fragments that can be mapped to service contracts. A service contract defines a service specification that defines the roles each participant plays in the service, and the interfaces they implement to play that role in the service. We can, however, define a pattern of BPMN constructs that can be mapped to a service contract.

Figure 4. Rule 7 Transformation Pattern

The pattern (see Figure 4) describes a task sequence connected by a sequence flow, where the participants are represented through different lanes in the same pool. The two tasks that belong to a service contract also share a data object. Table 7 illustrates the mapping of the notation.
Table 7. Process fragment (pattern) to Service Contract – BPMN constructs: Lane1 and Lane2 with connected tasks sharing a data object; SoaML constructs: <<ServiceContract>> Lane1_Lane2 with roles Lane1_Role: Lane1_Task1_Interface and Lane2_Role: Lane2_Task2_Interface, where <<interface>> Lane1_Task1_Interface and <<interface>> Lane2_Task2_Interface are derived from the tasks.

3.3 Tool Support
In the last section we provided mapping rules for high-level CIM service modelling with the aid of BPMN. This notation is well known and has been established since the beginning of the 21st century; moreover, it has been standardised and there are more than 50 products, both commercial and open-source, implementing the standard [13]. The particular considerations with respect to modelling services by business users are that, on the one hand, there is little awareness of services among CIM-level users, and, on the other hand, even if there were, the BPMN notation offers no direct constructs for describing services at the CIM level anyway. The upcoming BPMN 2.0 [14] standard does include service modelling and the corresponding constructs, but it only resolves the second, more technical problem, and not the first one – understanding.

For the solution of this problem we propose a semi-automated approach based on a model-to-model (M2M) transformation from CIM-level BPMN models to PIM-level SoaML-based models. The models on the higher abstraction level in BPMN are analysed through a set of mapping rules and result in a service model containing the constructs and architectures needed for a comprehensive PIM-level model, which serves as the basis for the further transformation to the PSM level. The remainder of this section comprises the manufacturing example and the mapping rules identified and needed for the services mapping from CIM- to PIM-level models. In addition, technical details of the transformation are presented for the BPMN to SoaML mapping set, giving a short insight into the serialisation of the models during transformation.

As an example of the technical solution we consider mapping rule 7 for the service contract (also see Figure 4). In the following we show how the pattern identified for the recognition of the service contract at the CIM level is technically transformed into the corresponding PIM-level construct. We consider the specific function names in the ATL transformation file out of scope and concentrate on the XML representation of the source and target models. Through rule 7, eight objects of the BPMN model are translated into six objects of the SoaML model (see Table 8). The graphical representation of the SoaML input models is taken from the SoaML Editor developed in the SHAPE project.
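Before turning to the XML serialisation summarised in Table 8, the following Java sketch illustrates the intent of mapping rules 3 and 4 on simple hand-rolled model classes. This is our own illustration under assumed names (Pool, Participant, ServicesArchitecture, BpmnToSoaml, mapPool); it is not the CIMFlexMT/ATL implementation used in SHAPE.

    // Toy source and target model elements (hypothetical, for illustration only).
    record Pool(String name, java.util.List<String> lanes) {}
    record Participant(String name) {}
    record ServicesArchitecture(String name, java.util.List<Participant> roles) {}

    class BpmnToSoaml {
        // Rule 3: a pool becomes a participant playing a role in a
        // community-level services architecture named after the pool.
        static ServicesArchitecture mapPool(Pool pool) {
            var roles = new java.util.ArrayList<Participant>();
            roles.add(new Participant(pool.name()));
            // Rule 4: each lane becomes a participant in a participant-level
            // services architecture; both levels are flattened here for brevity.
            for (String lane : pool.lanes()) {
                roles.add(new Participant(lane));
            }
            return new ServicesArchitecture(pool.name() + "Architecture", roles);
        }
    }

Applied to the Manufacturing pool with the lanes Customer and Manufacturer from the example in Section 4, this sketch would yield a ManufacturingArchitecture with Customer and Manufacturer participants, matching the output model discussed there.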
The directions in which Associations are showing are of no importance, they should only connect the two Tasks in the different Lanes with a DataObject. The objects possess hierarchy structure relations, in particular CollaborationUse containts a reference to the according ServiceContract, Properties a reference to the according Participant, Dependencies a reference to the according Properties. The transformed ServiceContract element according to the rule 7 can be seen in SHAPE SoaML Editor, which shows not only the structure of the transformed element and accompanying relations and properties but also the SoaML stereotypes applied during the transformation (see Figure 5). Figure 5. Hierarchy of objects in SHAPE SoaML editor 4. ILLUSTRATIVE EXAMPLE The CIMFlex editor is a tool developed in SHAPE project. The CIMFlex editor allows the user to create and refine a semi-formal model of a business process, an organisational structure, a data structure or business rules based on the input coming from the domain users. The editor is able to create, change and store these types of models in EPC or BPMN notation. As storage format XML files are generated. The target users of this component are Figure 6. Manufacturing process – input model After the transformation application the following model would emerge through the rules described before: 66 < < Part icipa nt > > Cu st om er service contracts. This is a business design choice which ultimately depends on the people involved and how they best understand the business operations. < < Part icipa nt > > Man ufa ctu re r < <Serv icesArc hit ec t ur e> > Manufac t ur ingArc hit ect ure The overall approach presented by SHAPE is how to model your processes starting on CIM, over PIM down to PSM yielding to some system which reflects the processes described on CIM level. For green field projects this ‘top-down approach’ might be a suitable approach. In the Saarstahl use case they benefited from improved practices for business and IT modelling to improve communication and synchronisation between business requirements and IT solutions. However, Saarstahl also noted that most companies have already an existing IT landscape and running systems modelling their processes. A reverse engineering or bottom-up approach should be investigated to cover this missing part. c ust omerPart : Cust omer c ust omerRole o rd erin g: Man u fa ct u rin g Co n t ra ct manufac t ur erRole ma n uf act u re rPar t : Ma n uf act ure r < <int er face> > Cust om er Int er face < <int er face> > Manufac t ur er Int er face 6. CONCLUSION AND FUTURE WORK In this paper we have presented an overview of the SoaML modelling language and its application for describing both a business and IT perspective on SOA. Furthermore, we have defined a set of model transformation rules that can be used to map BPMN models to SoaML models. The application of these mapping rules have been tested in industrial use cases in the SHAPE project with the objective of aligning business and IT models. The SHAPE technologies improved practices for business and IT modelling and improved communication between business requirements and IT solutions. < <Serv iceCont ract > > Manufac t ur ingCont r ac t c ust omerRole: Cust omerInt erface manufac t ur erRole: Manufac t ur erI nt erfac e As we can see, the lanes constructs from the BPMN notation example are translated into the participants constructs in SoaML (rule 4). 
At the same time a pattern identified by the rule 7 translates the interaction between Customer and Manufacturer into a service contract within the services architecture. One aspect of our guidelines that requires further work is to identify and describe additional patterns and guidelines for mapping to service contracts. In particular better support for multi-tier service contracts requires additional work. Furthermore, the mapping rules defined must also be updated and aligned with the ongoing BPMN 2.0 specification, which introduces some new process and service language constructs. 5. DISCUSSION 7. ACKNOWLEDGMENTS Figure 7. SoaML services architecture – output model There is an industrial interest in ensuring a good connection and mapping between business models as expressed in enterprise architectures and IT models as expressed in technical system architectures, which are commonly realised as service oriented architectures (SOAs). The gap between these models is not trivial to close and we believe this stems from the fact that this is not only a technical task, but also one that requires collaborations and decisions to be made by both business and system stakeholders. Obviously, modelling guidelines, mapping rules and software tools, as those developed in SHAPE, to model and execute semiautomated model transformations can be used in the alignment of business and IT models, in particular for simple one-to-one mappings. This research was co-funded by the European Union in the frame of the SHAPE FP7 project (ICT-2007-216408). The authors would like to express their acknowledgments to SHAPE colleagues. 8. REFERENCES [1] M. Stollberg (ed.), "SHAPE Project Whitepaper", SHAPE STREP, 9 June 2009. http://www.shape-project.eu/wpcontent/uploads/2008/01/shape_whitepaper.pdf [2] OMG, "Service oriented architecture Modeling Language (SoaML), FTF Beta 2", Object Management Group, OMG Document ptc/2009-12-09, December 2009. http://www.omg.org/spec/SoaML/1.0/Beta2/PDF/ However, for more complex mappings, as evident in the mapping to service contracts, it is more of a business and IT design choice. Although we have presented a pattern for identifying service contracts from analyzing BPMN processes, the choice of which tasks to include into a service contract is still not clear. This relates to the service choreography that defines the behaviour of the service contract. The issue is to include all tasks and all interactions that make up a suitable choreography. This choreography may include several interactions and passing of messages across two or more pools in the case of multi-tier [3] OMG, "Service oriented architecture Modeling Language (SoaML), FTF Beta 1", Object Management Group, OMG Document ptc/2009-04-01, April 2009. http://www.omg.org/spec/SoaML/1.0/Beta1/PDF/ [4] OMG, "Business Motivation Model, Version 1.0", Object Management Group (OMG), OMG Document formal/200808-02, August 2008. http://www.omg.org/spec/BMM/1.0/PDF/ 67 [10] ITU-TS, "Basic Reference Model of Open Distributed Processing - Part 4: Architectural Semantics", Rec.X904 (ISO/IEC 10746-4), 1995. [5] OMG, "Business Process Model and Notation (BPMN), Version 1.2", Object Management Group (OMG), OMG Document formal/2009-01-03, January 2009. http://www.omg.org/spec/BPMN/1.2/PDF [11] IEEE, "IEEE Std 1471-2000: IEEE Recommended Practice for Architectural Description of Software-Intensive Systems", IEEE, October 2000. [6] A.-W. Scheer and M. Nüttgens, "ARIS Architecture and Reference Models for Business Process Management", 2000. 
[12] INRIA & LINA, "ATLAS Transformation Language (ATL) Project Documentation". http://www.eclipse.org/gmt/am3/doc/ (last visited 2010).
[13] BPMI, "Current Implementations of BPMN", Business Process Management Initiative (BPMI). http://bpmn.org/BPMN_Supporters.htm (last visited 2010).
[14] OMG, "Business Process Model and Notation (BPMN), FTF Beta 2 for Version 2.0", Object Management Group (OMG), OMG Document dtc/2010-05-03, May 2010. http://www.omg.org/spec/BPMN/2.0/Beta2/PDF

Domain-specific Templates for Refinement Transformations

Lucia Kapova⋆, Thomas Goldschmidt†, Jens Happe‡, Ralf H. Reussner⋆,†
⋆ Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Email: {kapova, reussner}@ipd.uka.de
† FZI Research Center for Information Technology, 76131 Karlsruhe, Germany
Email: goldschmidt@fzi.de
‡ SAP Research CEC, 76131 Karlsruhe, Germany
Email: jens.happe@sap.com

ABSTRACT
Model transformations are a major instrument of model-driven software development. Especially in declarative transformation approaches, the structuring of transformations depends to a large extent on the structure of the source models and the generated artefacts. In many cases, similar code is written for transformations that deal with the same source or target metamodel. Writing such transformations can be simplified significantly if re-occurring parts within the transformation rules can be specified in a reusable way. Current approaches to transformation development include means for transformation reuse as well as inheritance. However, modularisation along the boundaries of different parts of domain metamodels is still lacking. Furthermore, the possibilities to reuse transformation fragments that re-occur in multiple transformations are limited. In this paper, we introduce domain-specific templates for refinement transformations with well-defined variation points. Transformation templates are based on known design patterns and enable a modular specification of refinement transformations; they thus yield a simpler definition of transformations that can be grasped more easily and developed more efficiently. In addition, we present a real-world case study of transformation templates in the context of component-based software architectures. The case study gives insight into the application of the presented approach.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability; I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies

General Terms
Design, Performance

Keywords
Software Architecture, Refinement Transformations, Templates, Higher-Order Transformations

1. INTRODUCTION
The OMG's Model Driven Development (MDD) enables developers to design and implement software systems at a high level of abstraction. Routine work is delegated to tools as far as possible in order to increase efficiency in software development. Model transformations are the major instrument of model-driven software development for this purpose. They are heavily used in the development and refinement of models.
The target model of a transformation is a refinement of the source model if the transformation preserves large parts of the source model and adds additional information. Such transformations are called refinement transformations [1]. Transformations are mainly determined by the source and target domains on which they operate. Especially in declarative transformation approaches, the structuring of transformations depends to a large extent on the structure of the source models and the generated artefacts. In many cases, similar code is written for transformations that deal with the same source or target metamodel. The re-occurrence of domain-specific patterns for the creation of the refined model leads to large parts of duplicated transformation code. This holds especially for refinement transformations, which mostly operate on a single metamodel.

Refinement transformations often require an annotation phase [1, 2] in which software engineers attach information to individual elements of a model. The annotations specify which elements are to be refined by the subsequent transformation. Such annotations and the underlying model are then transformed into a refined model [1].
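As a minimal sketch of this annotation-driven style (the element names Component and Annotation are our assumptions for illustration, not the notation of [1, 2]; MoM is the middleware element used in the listings later in this paper), a relation might refine only those elements that carry a particular annotation:

  transformation AnnotationRefinement(source : CBSE, target : CBSE) {
    -- Sketch: only components annotated with 'MessagingChannel' are
    -- refined into a middleware element; all other elements would be
    -- copied unchanged by a surrounding copy transformation.
    top relation RefineAnnotatedComponent {
      n : String;
      checkonly domain source c : Component {
        name = n,
        annotation = a : Annotation { key = 'MessagingChannel' }
      };
      enforce domain target m : MoM {
        name = n
      };
    }
  }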
Writing such refinement transformations can be simplified significantly if re-occurring parts within the transformation rules can be specified in a reusable way. However, there is little experience available about how to design and implement refinement transformations using modern relational transformation languages. One reason for this is the fact that model transformations are written in model transformation languages of very recent date (e.g., QVT Version 1.0 was published in 2008) [3]. Therefore, a broad basis of formalised knowledge and experience with model transformation development is not yet available. First initiatives for transformation design template specification focused on generic patterns [4] for model transformations. Although these patterns define a foundation, they do not exploit domain-specific knowledge of the transformation's source and target models. For example, they do not make use of design patterns that are often part of software models.

This work is a reflection of our experience with the implementation of model transformations used for customised software development. We build on previous work introduced in [5], where we presented an approach to develop architectural refinement transformations based on model annotations. These annotations express configurations that depend mostly on architectural design decisions. We found that the model refinements resulting from these configurations follow certain patterns. Similarly, as each software system and its model address a particular domain or a combination of domains, these refinements address the domain defined by a known metamodel. However, these refinements may come from different domains and be expressed in conformance with different metamodels. To support interoperability between different domains, and to handle requirements on refinements originating from different domains, we have to identify common refinement actions that can be expressed as templates and mapped to the domain of origin. This way we increase the level of abstraction in our automated generation process and can build refinement transformations over a number of different domains. Towards this goal, we have to build a library of templates and propose a method supporting the reuse of such transformation parts.

In this paper, we address this issue and present an approach that takes advantage of this possibility to reuse transformation parts and automates the development of refinement transformations even further than proposed in [5]. We present parametrised templates for the definition of domain-specific transformations. The templates increase the reusability of already formalised domain knowledge. Reusability and modularisation are the main advantages of transformation templates. Once defined, transformation templates can be reused to create new, configurable transformations for their respective source and/or target models. The contributions of this paper are (I) an extension of the transformation generation process (see [5] for details) to support parametrised transformation templates allowing easy variation of transformations, (II) higher-order transformations for template instantiations, and (III) domain-specific templates for architectural refinements.

The remainder of this paper is structured as follows. An example-driven overview of the process for defining and integrating reconfigurable transformation templates is presented in Section 2. Section 3 introduces the template metamodel as well as the transformation generation process. Related work is discussed in Section 4. Section 5 concludes the paper and presents future work.

2. REFINEMENT TRANSFORMATION DEVELOPMENT PROCESS
In this section, we present an overview of the development process of refinement transformations using domain-specific templates. In general, the presented process is similar to processes in product line engineering. Both share the common goals of reusability and customisability. Our process is focused on the reuse of process artefacts. The goal of the process is to automatically generate a refinement transformation based on certain design decisions, expressed as configurations. These configurations could express architectural design decisions, such as the usage of Message Oriented Middleware (MoM) for communication, the usage of thread pools, etc. In the following, we first introduce the basic concepts needed.

Figure 1: Refinement Transformation Development

Figure 1 provides a high-level overview of our approach. The process yields a Refinement Transformation for a particular configuration. The resulting Refinement Transformation is a composition of a Frame and Custom Rules. We use so-called copy transformations introduced in [6] as Frame. Copy transformations have been motivated by the lack of support for a higher-order operator to specify the copying of whole sub-models in QVT-Relations [3]. In [6], we used a Higher-Order Transformation (HOT) to synthesise default copy transformations in QVT-Relations for a given metamodel. Consequently, the Frame is a transformation that copies the unchanged parts of a model; a minimal sketch of the shape of such a generated copy relation is given below.
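The following fragment is only meant to convey the shape of a generated copy relation; the actual generator of [6] emits one such relation per metamodel class, covering all of its attributes and references (CBSE and Component are names taken from the listings later in this paper):

  transformation CopyCBSE(source : CBSE, target : CBSE) {
    -- Generated default copy rule (sketch): reproduces a Component
    -- with the same name in the target model.
    top relation Copy_Component {
      n : String;
      checkonly domain source c : Component { name = n };
      enforce domain target cCopy : Component { name = n };
    }
  }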
In this paper, we go beyond the mere creation of a copy transformation and realise a transformation variation based on a configuration. In our process, the selection of Custom Rules for a composition depends on a configuration that allows software engineers to customise a refinement transformation. We use feature models [7] to capture the variabilities of refinement transformations and to specify valid combinations of features. An instance of a feature model represents a specific (feature) configuration, i.e., the actual configuration of a specific transformation. It is defined by selecting or deselecting certain elements of the feature model. The feature configuration determines the actually needed Custom Rules. In our approach, Custom Rules are implemented in QVT-Relations. The insertion of Custom Rules according to a given feature configuration is again achieved by HOTs [5], composing Frame and Custom Rules. HOTs are transformations that themselves operate on transformations. They are mainly used to generate (or transform) transformation specifications. Providing HOTs for QVT-Relations can be done elegantly based on its abstract syntax model. QVT-Relations, with its precondition and postcondition dependency network between mapping relations, can be understood quite well [8] when it comes to transformations that create the abstract syntax model. In this case, relations are used to generate the model of other relations. This way, the complex refinement relations are generated. In the following, we first give a running example for feature models and feature configurations, and then further elaborate the concepts of transformation templates and their instantiation as Custom Rules.

Figure 2: Running Example: Feature Model (the MoM feature model with the features Messaging, Message Channel, Point-to-Point Channel, Publish-Subscribe Channel, Sender, Receiver, Transactional Client, Transaction Size, Guaranteed Delivery, Competing Consumers, Pool Size, Selective Consumer and Durable Subscriber; legend: exclusive OR, mandatory feature, optional feature)

Figure 4: MoM: Transformation Illustration ((a) annotated model element; (b) result of the transformation, with the components Message Sender Adapter, Message Receiver Adapter, Message Oriented Middleware and Consumer Pool Manager and the interfaces IMessageSender, IMessageReceiverAdapter, IConsumerPoolRequirer and IPoolManager)

Running Example. To illustrate the application of the presented approach, we use a real-world case study of message-based systems introduced in [9] as a running example. In Figure 2, a feature model describes the possible configurations of the Message-oriented Middleware (MoM). The MoM feature model captures possible configurations for a Messaging system. The configuration includes the type of Messaging Channel as well as characteristics of the Sender and Receiver. For example, a Messaging Channel can be configured as a Point-to-Point Channel if only a single Receiver is needed. The Transaction Size is a property of the Sender and expresses the amount of data (N × Message Size) transferred in a transaction. Furthermore, the number of Competing Consumers at the Receiver's side can be specified. The choice of either of these features results in a change of the architectural model. The complexity of these changes varies from setting a parameter, through structural changes, to globally changing the deployment of a whole system.
In our case study, we consider a feature configuration with the selected features Competing Consumers, a Pool Size of 4, Transactional Client, a Transaction Size of 100 messages, and a Message Size of 1 kilobyte.

Custom Rules. In our approach, we define "transformation fragments" as additional information attached to feature models. These fragments are concrete implementations of Custom Rules and are composed into a transformation by a HOT, depending on the selection of features. The transformation fragments are implemented in QVT-Relations [3]. Basically, these Custom Rules are fragments of model-to-model transformations that are attached to the individual features of the feature model. A fragment of a transformation is activated if its feature is selected in the configuration. Based on the selected combination of features, a refinement transformation is generated. The explicit binding of fragments to features reduces the complexity of transformations and, thus, alleviates their development and evolution. Our previous work explains this concept in detail [5].

Running Example. Figure 4 illustrates the structural changes of a model resulting from a specific feature model configuration. The goal of our approach is to generate a transformation that yields this model automatically. Therefore, each feature is annotated with transformation fragments, as illustrated by Figure 3.

So far, we have introduced the general concept that can be used to generate refinement transformations for architectural models with respect to specific feature configurations. The goal of this work is to ease the development of the transformation fragments (Custom Rules).
This is achieved through the instantiation of Transformation Templates from a Template Library. In the following section, we extend the concept by configurable transformation templates that further facilitate the specification of transformation fragments. This will fill in the unexplained pieces of Figure 6.

Running Example. In our running example, the transformation fragments sketched in Figure 3 express the effects of a feature selection on the architectural model to which the refinements are to be applied. In this example, the simplified fragments include a middleware subsystem in the model and add transaction handling for transactions of size 100. However, we identified that many of these transformation fragments follow certain patterns. We therefore extend the refinement transformation generation process by steps that automate the development of the transformation fragments. The main contribution of this work is illustrated by the bold framing in Figure 1. We synthesise a template library of domain-specific refinement patterns. This is done on the basis of the supported metamodel, which defines the types of the possible elements refining the model. Additionally, based on the domain knowledge, we can identify more complex refinement patterns. Taking advantage of QVT's graphical syntax [3], we can easily represent parametrised templates (with variation points) graphically. The instances of these templates specify concrete Custom Rules.

A deeper view into the process of transformation generation (cf. Figure 6) shows the dependencies and connections between the concepts introduced above. The process depends on the specification of several inputs for Higher-Order Transformations. The first input is a Feature Model with attached Transformation Fragments (Custom Rules). These fragments are used by a Higher-Order Transformation for the actual refinement transformation generation. The second input is the actual Feature Configuration, which defines which features are selected as well as the values of the feature attributes. In contrast to an in-place transformation, a refinement transformation may also be specified to create a new model in which the refinements are applied. In this case, the refinement transformation extends a copy transformation (Frame). The Higher-Order Transformation includes the Transformation Fragments in the generated copy transformation. The result of the Higher-Order Transformation is a Refinement Transformation that, when applied to an Architectural Model, generates the corresponding Refined Architectural Model. The line Meta-Level Boundary separates the generation of the transformation from its application.

Figure 3: Running Example: Transformation Fragments (the fragments are attached to the features of the MoM feature model; the feature attribute Transaction Size is bound to the fragment via TC_size.varSize = size). The fragments attached to Transactional Client and Messaging read:

  top relation T_Client {
    varSize : Integer;
    checkonly domain in p : Component {};
    enforce domain out s : TP {
      size = varSize;
    };
    when { TC(p, s); }
    where { varSize = 100; -- default
    }
  }

  top relation Messaging {
    checkonly domain in p : Component {};
    enforce domain out s : MoM {};
  }

3. CONFIGURATION-AWARE TRANSFORMATION TEMPLATES
The automated generation of refinement transformations presented in the previous section significantly reduces the effort needed to specify such transformations. However, the Custom Rules still tend to contain a large set of similar elements, especially for architectural models. Therefore, we propose transformation templates as an additional means to ease the specification of refinement transformations. Transformation templates are stored in a Template Library (cf. Figure 6). New Custom Rules can be specified by instantiating and composing the existing Templates. Furthermore, templates are configurable by a set of parameter values. Based on the template and its configuration, the Template Instantiation Transformation (a HOT) creates Template Instances and adds the necessary rules to the refinement transformation.

Each template represents a configurable specification of a domain-specific pattern re-occurring in a transformation. Figure 5 illustrates the set of patterns we have identified so far for the running example. A Coupled Adaptor allows sender and receiver to use the same message-oriented middleware. This pattern could be used in the case of refinement by coupled actions, such as encryption and decryption, or composition and decomposition. The Lock Requirer is used when a component has to acquire a lock before accessing a certain service and release the lock when finished. The same refinement pattern can be observed in the case of dependent actions. In the example, this pattern is used for the Message Receiver Adaptor component to acquire locks through the IConsumerPoolRequirer interface. An Active Component pattern is used to model a component with a complex internal behaviour. This pattern refines the model with an element introducing an independent behaviour branch. An additional wrapper is provided for the functionality defined as an internal action of the component behaviour. To provide a queue for competing consumers, the Lock Manager pattern is used in the ConsumerPoolManager component.
This pattern could be used when introducing a state-holding element into the model. The Controller pattern is applied to the Clock component to provide a wrapper for simple monitor functionality. The last pattern introduces a new functionality into the refined model and could be independently required by already existing model elements.

Figure 5: Templates ((a) pattern illustration based on the case study, showing the Coupled Adaptor, Lock Requirer, Active Component and Lock Manager patterns around the Message Sender Adapter, Message Receiver Adapter, Message Oriented Middleware and Consumer Pool Manager components; (b) additional patterns: Controller (Clock) and Delegator)

Figure 6: Transformation Process (the Higher-Order Transformation chain: a Feature Model with attached Transformation Fragments and a Feature Configuration serve as inputs; the Copy Higher-Order Transformation produces the Frame, the Template Instantiation Transformation produces Template Instances from the Template Library, and the Refinements Integration Higher-Order Transformation composes Frame and Custom Rules; the resulting Refinement Transformation, applied to an Architectural Model, generates the Refined Architectural Model; the Meta-Level Boundary separates transformation generation from application)

In the following section, we describe the adaptor pattern, as a representative, in more detail. For the description of transformation patterns, we use a standard description schema for patterns defined in [10] and [4]. This includes the following information: the name of the pattern, the goal of the change, the motivation for the pattern, the specification of the template using the QVT-Relations language, and an example for the pattern.

3.1 The Adaptor template
In this section, we illustrate the concepts introduced above with the example of the adaptor pattern [10]. For the application within a refinement transformation, further details concerning the specific metamodel are necessary.

3.1.1 Goal:
Change the provided or required service interface.

3.1.2 Motivation:
When new functionality is needed in an architecture (for example filtering), its configuration could result in a change of a service's signature (input or return parameters). The change of the adapted interface is considered a configurable change and allows developers to define the changed attributes without the need to reimplement the whole transformation for the integration.
3.1.3 Specification:
The adaptor pattern is specified by a template that creates an Adaptor component which requires the interface provided by the adapted component and provides the interface required by the calling component. Additionally, based on a designer-defined method mapping, it requires or provides a modified interface to another component in the system. As illustrated in Listing 1, an adaptorComponent is created with the modified interface targetInterface in the target domain.

  transformation CBSE_Adaptor(source : CBSE, target : CBSE) {
    top relation Adaptor_template_CreateAdaptor {
      checkonly domain source sourceInterface : { -- adapted interface
        <fromInterface : TemplateVariationPoint>
      };
      checkonly domain source targetInterface : {
        <toInterface : TemplateVariationPoint>
      };
      enforce domain target adaptorComponent : {
        name = <adaptorName : LiteralExpVariationPoint> -- name
        requiredRoles = reqRole : RequiredRole {
          requiredInterface = sourceInterface
        }
        providedRoles = provRole : ProvidedRole { -- modified interface
          providedInterface = targetInterface
        }
        serviceEffectSpecifications = -- behaviour specification
          seff : ServiceEffectSpecification { ... }
      };
    }
  }

Listing 1: Template Specification of the Adaptor pattern.

3.1.4 Example:
An example of an Adaptor is shown in Figure 5. This Adaptor provides an interface to the message receiver and adapts its required interface to communicate with the messaging middleware in use.

3.1.5 Applicability:
The applicability of a pattern defines constraints for the usage of a template. For the Adaptor template, such a constraint is defined by the requirement that a variation point should be of type interface. Additional examples illustrating the variation point approach for model transformation templates are given in Table 1. The variation point types map to known element types for the specification of component-based architectures (e.g. components, interfaces, signatures, resources, etc.).

Table 1: Transformation Templates

Template                  | Goal                                                                                                                         | Variation Point Type
Delegator                 | Provides a wrapper for a required or provided interface and delegates additional information without adjusting the signature. | Interface
Coupled Adaptor/Delegator | Adapts two interfaces, allowing their communication; or, in the case of delegation, allows them to use a communication connection together without changing their signatures. | Interface
Lock Requirer             | Provides an interface requiring a software resource (thread pool, queue or semaphore).                                        | Interface
Lock Manager              | Models a component providing a passive software resource (thread pool, queue, semaphore).                                     | Passive Resource
Active Component          | Provides a component with its own, independent control flow thread.                                                           | Component
Controller                | Adds a mutex to all method calls, allowing only a single thread to access the component at one time.                          | Internal action
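To make the instantiation of such a template concrete, consider a hypothetical binding of the Adaptor template of Listing 1 against the running example: the variation points fromInterface, toInterface and adaptorName are replaced by concrete model elements. The interface and component names below are taken from Figure 4; the binding itself is our illustration, not a listing from the tool:

  -- Hypothetical instance of the Adaptor template (Listing 1) with
  -- all variation points bound to concrete model elements:
  transformation CBSE_Adaptor(source : CBSE, target : CBSE) {
    top relation CreateMessageReceiverAdapter {
      checkonly domain source sourceInterface : Interface {
        name = 'IMessageReceiver'        -- bound fromInterface
      };
      checkonly domain source targetInterface : Interface {
        name = 'IMessageReceiverAdapter' -- bound toInterface
      };
      enforce domain target adaptorComponent : Component {
        name = 'MessageReceiverAdapter', -- bound adaptorName literal
        requiredRoles = reqRole : RequiredRole {
          requiredInterface = sourceInterface
        },
        providedRoles = provRole : ProvidedRole {
          providedInterface = targetInterface
        }
      };
    }
  }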
3.2 Template Definition Metamodel
To define a framework supporting the definition and configuration of transformation templates, we need to describe them and their variations in a general way. This description is provided by means of a metamodel, introduced in this section and illustrated by Figure 7. As the main element of the transformation templates metamodel, we introduce the Template element. This element represents the concept of a transformation template in our terminology and defines a reconfigurable and reusable transformation fragment for the model transformation generation. The Description of a template contains a definition of the Goal of the template as well as a textual Motivation for the Template definition. Each Template defines its applicability, or usage scenarios, by specifying an OCL Constraint. To be able to apply a template in a certain context, this constraint needs to evaluate to true. The Template element refers to a set of Relations from the QVT-Relations metamodel. These relations form the basis of the template, as they will be parametrised by VariationPoints as defined below.

Furthermore, the Template definition contains a set of VariationPoints. These variation points define possibilities for variations within the basic relations. A VariationPoint is defined by a reference to either a template expression (TemplateExp) or a relation domain (RelationDomain). These points are defined by the subclasses of VariationPoint named TemplateVariationPoint, DomainVariationPoint, and LiteralVariationPoint (for the specification of variable literals within a template). The association dependencies of the Template class expresses dependencies between transformation templates. Defined transformation templates depend on each other, and therefore these constructs need access to the results of the required transformation templates.

The binding of a template to an actual transformation fragment is done as soon as the template is referenced within an actual transformation fragment that is defined for a concrete feature model. The actual application of the transformation template is defined by the TemplateConfig. For each defined VariationPoint, the template configuration includes VariationPointInstances, which bind the VariationPoint to actual template or relation domain specifications. VariationPointInstances can be assigned to multiple VariationPoints stemming from different transformation templates. This yields the possibility to combine transformation templates to build more complex model transformations.

Figure 7: The Templates Metamodel

3.3 Template Instantiation
The instantiation process presented in Listing 2 is realised using a HOT. It merges the transformations using the templates and creates a transformation based on the actual configuration given by the template configuration model. The first step of the TemplateInstantiation is the creation of a copy of the relations that were specified within the template. For this purpose, we use a generated copy transformation for the QVT-Relations metamodel¹. The TemplateInstantiation transformation extends this copy transformation.

¹ The Mark_QVTRelation_Relation relation that is used here is a part of this generated transformation. Using this, it is possible to retrieve the copied instance of a given original relation. For each class in the corresponding metamodel such a relation exists.
See [6] for a detailed description of the copy transformation and how it works. top relation AddTypedModels { checkonly domain source templateRep: templateRepository { modelParameter =mm: QVTBase: :TypedModel {} }; 26 27 28 29 30 enforce domain target t : QVTRelation: : RelationalTransformation { modelParameter = mmCopy: QVTBase: :TypedModel {} }; 31 32 33 34 1 2 35 3 when { Repository2Transformation(templateRep, t ); Mark QVTBase TypedModel(mm, mmCopy);} 36 37 6 7 top relation IntegrateRelations { n:EString; 40 41 8 9 42 10 checkonly domain source templateConfig: templateDefinition : :templateConfig{ instanceOf = template:templateDefinition : : template { name = n, templateRelations = templateRel : QVTRelation: :Relation {}} }; 43 44 45 46 47 48 49 50 12 13 15 16 when { not(sourceObjectTemplateExp=variationTemplate); } 17 18 19 53 54 55 56 20 where { Mark QVTTemplate ObjectTemplateExp( sourceObjectTemplateExp, targetObjectTemplateExp);} 21 22 23 24 57 } 25 when { MarkTargetTransformation(t ); Mark QVTRelation Relation(templateRel,targetRelation); } 58 59 60 61 63 [...] Listing 4: Overriding Copy Rules. } [...] 62 3.4 } Examples of Transformation Templates 3.4.1 The Delegator Template Listing 2: Higher-order transformation for instantiating templates. Goal. Provide a wrapper for a required or provided interface and delegate its functionality based on the unchanged signature. top relation BindTemplateVariationPoint { n:EString; variationPointBindings:OrderedSet(VariationPoint); 4 9 enforce domain target targetObjectTemplateExp: QVTTemplate: :ObjectTemplateExp{}; 14 enforce domain target targetRelation: QVTRelation: :Relation { name = n + ’ template ’ + templateRel.name , transformation = t : QVTBase: :Transformation{} }; 52 10 checkonly domain source sourceObjectTemplateExp: QVTTemplate: :ObjectTemplateExp{}; 11 51 8 checkonly domain source variationPoint: templateDefinition : :TemplateVariationPoint{ template = variationTemplate : QVTTemplate: :TemplateExp {} }; 5 39 7 −−Override the Generated Copy Rule: top relation Copy QVTTemplate ObjectTemplateExp overrides Copy QVTTemplate ObjectTemplateExp{ 4 } 38 [...] Listing 3: Binding of template variation points. 24 25 } 33 relation MarkTargetTransformation { checkonly domain target t :QVTRelation: : RelationalTransformation{}; } 20 6 where { Mark QVTTemplate TemplateExp(variationTemplate, targetTemplate);} 29 16 5 when { Mark QVTTemplate TemplateExp(instanceTemplate, targetTemplate); variationPointBindings−>includes(variationPoint);} 24 11 3 enforce domain target targetTemplate: QVTTemplate: :TemplateExp {}; 21 checkonly domain source templateLib:templateLibrary{ domain = n }; 8 2 checkonly domain config variationPointInstance : templateDefinition : :TemplateVariationPointInstance{ bindsTo = variationPointBindings, template = instanceTemplate : QVTTemplate: :TemplateExp {} }; 14 transformation templateInstantiation(source:templateDefinition, config :templateDefinition, target :QVTRelation) extends CopyQVTRelation{ 5 1 }; 12 4 64 QVTTemplate: :TemplateExp {} 11 Motivation. A delegator can be used for example, when for each request a semaphore lock should be asked to allow access the semaphore provider service before allowing the request to reach the interface. 
3.4 Examples of Transformation Templates

3.4.1 The Delegator Template
Goal. Provide a wrapper for a required or provided interface and delegate its functionality based on the unchanged signature.

Motivation. A delegator can be used, for example, when for each request a semaphore lock should be requested from the semaphore provider service before the request is allowed to reach the interface.

Specification. This template is specified by a relation that creates a Delegator component that requires or provides a delegated interface to other components in the system. Additionally, a Delegator could request services from other components. This template could be used to generate the initial structures for this.

  transformation CBSE_Delegator(source : CBSE, target : CBSE) {
    top relation Delegator_template_CreateDelegator {
      checkonly domain source delegatedInterface : {};
      enforce domain target delegatorComponent : {
        name = <delegatorName : LiteralExpVariationPoint>
        requiredRoles = reqRole : RequiredRole {
          requiredInterface = delegatedInterface
        }
        providedRoles = provRole : ProvidedRole {
          providedInterface = delegatedInterface
        }
        serviceEffectSpecifications = seff : ServiceEffectSpecification { ... }
      };
    }
  }

Listing 5: Template Specification of the Delegator template.

Example. An example of a Delegator is shown in Figure 5 as an additional pattern. This Delegator provides interfaces to the message receiver with the same interface signature.

Applicability. For the Delegator template, it is required that the variation point is not of type interface.

3.4.2 The Coupled Adaptor/Delegator template
Goal. To adapt two interfaces and to allow their communication; or, in the case of delegation, to allow them to use a communication connection together without changing their provided functionality.

Motivation. When it is needed to build a connector between two communicating components, or to build a chain of delegators to access certain external functionality in a certain state of message delivery.

Specification. This template is specified by a relation that creates two Delegator or Adaptor components that mirror their adapted or delegated interface.

Example. An example of a Coupled Adaptor is shown in Figure 5. This construct allows sender and receiver to use the same message-oriented middleware.

Applicability. For the Adaptor/Delegator template, it is required that the variation point should (respectively should not) be of type interface.
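No listing is given in the paper for this template; by analogy with Listing 5, a coupled variant might be sketched as follows (our illustration only; the relation creates the two mirrored components in one step):

  -- Illustrative sketch, modelled on Listing 5: two coupled Delegator
  -- components mirroring the delegated interface on both sides of a
  -- shared middleware.
  transformation CBSE_CoupledDelegator(source : CBSE, target : CBSE) {
    top relation CoupledDelegator_template_CreateCoupledDelegator {
      checkonly domain source delegatedInterface : {};
      enforce domain target senderSide : {
        name = <senderDelegatorName : LiteralExpVariationPoint>
        requiredRoles = reqRoleS : RequiredRole {
          requiredInterface = delegatedInterface
        }
        providedRoles = provRoleS : ProvidedRole {
          providedInterface = delegatedInterface
        }
      };
      enforce domain target receiverSide : {
        name = <receiverDelegatorName : LiteralExpVariationPoint>
        requiredRoles = reqRoleR : RequiredRole {
          requiredInterface = delegatedInterface
        }
        providedRoles = provRoleR : ProvidedRole {
          providedInterface = delegatedInterface
        }
      };
    }
  }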
3.4.3 The Lock Requirer template
Goal. To provide an interface requiring a software resource (thread pool, queue or semaphore).

Motivation. When a component has to acquire a lock before accessing a certain service and release the lock when finished.

Specification. This template is specified by a relation that extends an already existing component in the model with an interface requiring an external service that provides acquire() and release() on a lock resource held by the called component. This specification implies the existence of a Lock Manager in the system.

  transformation CBSE_LockRequirer(source : CBSE, target : CBSE) {
    top relation LockRequirer_template_CreateLockRequirer {
      checkonly domain source synchronizedInterface : {};
      enforce domain target lockRequirerComponent : {
        name = <lockRequirerName : LiteralExpVariationPoint>
        requiredRoles = reqRole : RequiredRole {
          requiredInterface = synchronizedInterface,
          requiredInterface = <lockName : TemplateVariationPoint>
        }
        providedRoles = provRole : ProvidedRole {
          providedInterface = synchronizedInterface
        }
        serviceEffectSpecifications = seff : ServiceEffectSpecification { ... }
      };
    }
  }

Listing 6: Template Specification of the Lock Requirer template.

Example. An example of a Lock Requirer is shown in Figure 5 and illustrated by the extension of the Message Receiver Adaptor component with an IConsumerPoolRequirer interface.

Applicability. For the Lock Requirer template, it is required that the variation point is of type LockManagerReference.

3.4.4 The Active Component template
Goal. To provide a wrapper for a functionality defined as an internal action of a component behaviour.

Motivation. When it is needed to model a component with a complex internal behaviour.

Specification. This template is specified by a relation that creates an Active Component that requires or provides a delegated interface to other components, depending on the developer's specification. In the case of this template, the template is only a frame for the implementation; it is the most complex template, with no restrictions on its VariationPoints.

Example. The example of an Active Component is shown in Figure 5 and illustrated by the Message-Oriented Middleware.

Applicability. There are no restrictions for this template. Consequently, this template requires more user interaction to implement.

3.4.5 The Lock Manager template
Goal. To model a component providing a passive software resource (thread pool, queue, semaphore).

Motivation. When a synchronisation mechanism based on a lock strategy is used in the system.

Specification. This template is specified by a relation that creates a LockManager component that provides an interface with the two signatures acquire() and release() on its internal passive resource.

  transformation CBSE_LockManager(source : CBSE, target : CBSE) {
    top relation LockManager_template_CreateLockManager {
      checkonly domain source appRepository : {};
      enforce domain target lockManagerComponent : {
        name = <lockManagerName : LiteralExpVariationPoint>
        requiredRoles = reqRole : RequiredRole {}
        providedRoles = provRole : ProvidedRole {
          providedInterface = lockInterface
        }
        serviceEffectSpecifications =
          acquireLock_seff : ServiceEffectSpecification { ... }
        serviceEffectSpecifications =
          releaseLock_seff : ServiceEffectSpecification { ... }
        passiveResource = lock {
          <lock : TemplateVariationPoint>
        }
      };
    }
  }

Listing 7: Template Specification of the Lock Manager template.

Example. An example of a LockManager is shown in Figure 5. The ConsumerPoolManager provides a queue for competing consumers.

Applicability. For the LockManager template, it is required that the variation point is of type passiveResource.
3.4.6 The Controller template
Goal. To provide a wrapper for simple monitor functionality.

Motivation. When it is needed to model a component that only gathers and stores data, or provides some timing control; for example, a clock component required by a connector or by components accessing the middleware, providing a control interface externally to set the clock and providing an interface internally for the other components in the assembly to query the clock.

Specification. This template is specified by a relation that creates a Controller component that requires or provides a delegated interface to another component. This component has only a simple internal action defined and creates a processing delay through computation.

Example. An example of a Controller is shown in Figure 5 as an additional pattern. This Controller provides a clock for a connector.

Applicability. For the Controller template, it is required that the variation point is of type internal action.
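The paper provides no listing for this template; by analogy with Listings 5–7, it might be sketched as follows (our illustration; the internal-action detail of the behaviour specification is omitted):

  -- Illustrative sketch only, modelled on Listings 5-7: a Controller
  -- component wrapping a controlled interface with a single internal
  -- action that creates processing delay.
  transformation CBSE_Controller(source : CBSE, target : CBSE) {
    top relation Controller_template_CreateController {
      checkonly domain source controlledInterface : {};
      enforce domain target controllerComponent : {
        name = <controllerName : LiteralExpVariationPoint>
        requiredRoles = reqRole : RequiredRole {
          requiredInterface = controlledInterface
        }
        providedRoles = provRole : ProvidedRole {
          providedInterface = controlledInterface
        }
        serviceEffectSpecifications = seff : ServiceEffectSpecification { ... }
      };
    }
  }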
Future work will deal with the automatic derivation of templates from example models as it was proposed in [17]. This would greatly ease the development of templates as the manual extraction from an instance model to the transformation can be shortened significantly. Additionally, further development of the retainment policies introduced in [18] to support migration of manual changes on the models is needed. [15] [16] [17] [18] 6. REFERENCES [1] M. Girschick, T. Kühne, and F. Klar, “Generating systems from multiple levels of abstraction,” in Conference on Trends in Enterprise Application Architecture, 2006. [2] M. Moriconi, X. Qian, and R. A. Riemenschneider, “Correct architecture refinement,” IEEE Trans. Softw. Eng., vol. 21, no. 4, 1995. [3] MOF 2.0 Query/View/Transformation, version 1.0, 2008. [4] M.-E. Iacob, M. Steen, and L. Heerink, “Reusable model transformation patterns,” Enschede: Freeband, 2008. [5] L. Kapova and T. Goldschmidt, “Automated feature model-based generation of refinement transformations,” in EUROMICRO (SEAA). IEEE, 2009. [6] T. Goldschmidt and G. Wachsmuth, “Refinement transformation support for QVT Relational transformations,” in MDSE, 2008. [7] K. Czarnecki and U. W. Eisenecker, Generative Programming - Methods, Tools and Applications. Addison-Wesley, 2000. [8] L. Kapova, T. Goldschmidt, S. Becker, and J. Henss, 78 “Evaluating Maintainability with Code Metrics for Model-to-Model Transformations,” in QoSA. J. Happe, H. Friedrich, S. Becker, and R. H. Reussner, “A Pattern-Based Performance Completion for Message-Oriented Middleware,” in WOSP. ACM, 2008. E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, 1995. D. Wagelaar, “Composition techniques for rule-based model transformation languages,” in Conference on Model Transformation - Theory and Practice of Model Transformations, 2008. J. Oldevik, “Transformation composition modelling framework,” in Distributed Applications and Interoperable Systems, 2005. R. Marvie, “A transformation composition framework for model driven engineering,” LIFL Ű IRCICA University of Lille, Tech. Rep., 2004. I. Malavolta, H. Muccini, and P. Pelliccione, “Dually: A framework for architectural languages and tools interoperability,” Conference on Automated Software Engineering (ASE), 2008. A. Agrawal, A. Vizhanyo, Z. Kalmar, F. Shi, A. Narayanan, and G. Karsai, “G: Reusable idioms and patterns in graph transformation languages,” in GraBaTs. Elsevier, 2004. E. D. Willink and P. J. Harris, “The side transformation pattern: Making transforms modular and re-usable,” ENTCS, 2005. D. Varró, “Model transformation by example,” in 9th International Conference on Model Driven Engineering Languages and Systems (MODELS), 2006. T. Goldschmidt and A. Uhl, “Retainment Rules for Model Transformations,” in Workshop on Model Co-Evolution and Consistency Management, 2008. Advanced Modelling Made Simple with the Gmodel Metalanguage Jorn Bettin Tony Clark Sofismo Lenzburg, Switzerland School of Engineering and Information Sciences University of Middlesex, London, UK http://www.eis.mdx.ac.uk/staffpages/tonyclark/ t.n.clark@mdx.ac.uk http://www.sofismo.ch/ jbe@sofismo.ch ABSTRACT The development of Gmodel relates to the second objective of the KISS initiative [3], and builds on the KISS results that have been achieved in 2009 [4]. 
Gmodel is a metalanguage that has been designed from the ground up to enable the specification and instantiation of modelling languages. Although a number of metalanguages can be used for this purpose, most provide no or only limited support for modular specifications of sets of complementary modelling languages. Gmodel addresses modularity and extensibility as primary concerns, and is based on a small number of language elements that have their origin in model theory and denotational semantics. This article illustrates Gmodel's capabilities in the area of model-driven integration by showing that the Eclipse Modeling Framework Ecore language can easily be emulated. Gmodel offers support for unlimited multi-level instantiation in the simplest possible way, and any metalanguage emulated in Gmodel can optionally be equipped with this functionality.

Categories and Subject Descriptors
D.2.2 [Software Engineering]: Design Tools and Techniques; D.2.12 [Software Engineering]: Interoperability

General Terms
Binding times, Denotational semantics, Domain analysis, Graphs, Instantiation semantics, Metamodels, Model-driven integration, Model theory, Modularity, Multi-level modelling, Scope management, Value chain modelling

1. INTRODUCTION
In order to increase awareness about the role that domain-specific modelling languages can play in capturing, preserving, and exploiting knowledge in virtually all industries, it is necessary to:

1. Reach a consensus on fundamental values and principles for designing and using domain-specific languages
2. Progress towards interoperability between tools
– KISS Initiative, 2009 [4]

The development of Gmodel relates to the second objective of the KISS initiative [3], and builds on the KISS results that have been achieved in 2009 [4]. In particular, Gmodel represents an attempt to provide explicit tool support for the full set of KISS principles:

1. The DSL must be meaningful to users
2. The DSL should be cognitively efficient
3. The DSL should have multiple notations where necessary
4. DSLs should offer mechanisms for modularising and integrating models
5. The DSL should be supported by appropriate tooling
6. There must be an economic imperative for the development of a DSL
7. The DSL must not be polluted with implementation features
8. Model processing must always be based on a formal DSL definition
9. DSLs should be kept small through modularisation and integration

Since the design of Gmodel rests on mathematical concepts from model theory and from the theory of denotational semantics, Gmodel can tap into established mathematical terminology, and the target audience for Gmodel includes modellers in all disciplines. Consistent with denotational semantics and with the third KISS principle, Gmodel completely separates the concern of representation from the concern of naming. This means that, in contrast to most programming language specifications, the specification of Gmodel does not include a text-based concrete syntax.

The authors of Gmodel believe that modelling has the greatest value when performed by domain experts, and if modelling language design takes into account established domain notations. The challenge consists in providing a metalanguage that enables the most experienced domain experts to define the notation for modelling in their field, whilst at the same time providing tool support for enforcing (and ideally guaranteeing) the adherence to KISS modelling language design principles.
This paper starts with a brief introduction of the relevant terminology, and then presents the Gmodel metalanguage in the context of model-driven interoperability based on practical examples:

1. Introduction of Gmodel kernel concepts
2. Outline of Gmodel's contribution to model-driven interoperability
3. Representation of the Eclipse Modeling Framework Ecore language in Gmodel, and outline of the design of a bidirectional bridge between the two technologies
4. Description of advanced modelling techniques for scope management, modularisation, and interoperability
5. Comparison of Gmodel with other technologies
6. Conclusions

2. TERMINOLOGY
Modellers are not in the business of inventing new terminology; they are in the business of identifying concepts and links between concepts that are useful for a particular community of people – usually scientists or professionals in a particular field. This approach to modelling is consistent with the Oxford American dictionary definition of modelling:

to model: devise a representation, especially a mathematical one, of (a phenomenon or system)

2.1 Model Theory
The mathematical definitions from model theory make use of the concepts of sets and graphs, and provide a mathematical basis for reasoning about models:

structure A structure A is a set that contains the following four sets:
1. A set called the domain of A, written as dom(A)
2. A set of elements of A called constant elements, each of which is named by one or more constants
3. For each positive integer n, a set of n-ary relations on dom(A), each of which is named by one or more n-ary relation symbols
4. For each positive integer n, a set of n-ary operations on dom(A), each of which is named by one or more n-ary function symbols

signature The signature of a structure A is specified by giving the set of constants of A, and for each separate n > 0, the set of n-ary relation symbols and the set of n-ary function symbols. The symbol L is used to represent signatures and languages; if A has a signature L, A is also called an L-structure.

Beyond these two definitions, model theory defines the following concepts that every modelling language designer should be familiar with: substructure, term, formula, variable, language, cardinality, sentence, theory, model [7]. This terminology and the associated mathematical theory have heavily influenced the design of Gmodel.
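In standard model-theoretic notation, these two definitions can be written compactly as follows (a summarising sketch in our notation, added here only to restate the definitions above):

\[
\mathcal{A} \;=\; \bigl(\, \mathrm{dom}(A),\ \{c^{A} : c \in C\},\ \{R^{A} : R \in \mathcal{R}_n,\ n \geq 1\},\ \{f^{A} : f \in \mathcal{F}_n,\ n \geq 1\} \,\bigr)
\]

where the signature is \( L = (C, (\mathcal{R}_n)_{n \geq 1}, (\mathcal{F}_n)_{n \geq 1}) \), each n-ary relation symbol \( R \in \mathcal{R}_n \) names a relation \( R^{A} \subseteq \mathrm{dom}(A)^n \), and each n-ary function symbol \( f \in \mathcal{F}_n \) names an operation \( f^{A} : \mathrm{dom}(A)^n \to \mathrm{dom}(A) \).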
2.2 Denotational Semantics
The second source of influence on Gmodel is denotational semantics, in particular the concepts of semantic domain and semantic identity [11].

One advantage of using established mathematical terminology to describe Gmodel is a low risk of terminological confusion with concepts from the Meta Object Facility (MOF), a popular metalanguage that is steeped in object orientation, and with concepts from related implementations such as the Eclipse Modeling Framework Ecore language. This benefit immediately becomes apparent when discussing the representation of Ecore in Gmodel. A second advantage of using the above terminology is the ability to reason about Gmodel in mathematical terms, without the need for any linguistic gymnastics.

2.3 Natural Language and Exchange of Artefacts
In addition to mathematics, Gmodel terminology draws on concepts that have shaped the development of natural language, and the way in which humans perform work and exchange artefacts – including abstract ideas. In relation to the latter, and in accordance with the second KISS principle, the design of Gmodel takes into account human cognitive abilities and limitations [10].

language artefact A container of information that:
1. is created by a specific actor (human or a system)
2. is consumed by at least one actor (human or system)
3. represents a natural unit of work (for the creating and consuming actors)
4. may contain links to other language artefacts
5. has a state and a life-cycle

model artefact A language artefact that meets the following criteria:
1. It is created with the help of a software program that enforces specific instantiation semantics (quality-related constraints)
2. The information contained in a model artefact can be easily processed by software programs (in particular transformation languages)
3. Referential integrity between model artefacts is preserved at all times with the help of a software program (otherwise the necessary level of completeness and consistency is adequate neither for automated processing nor for domain experts making business decisions based on artefact content)
3. THE GMODEL KERNEL

The Gmodel kernel [figure 1] is a semantic domain consisting of a set of semantic identities that reify the concepts of ordered pair, ordered set, and graph – the latter consisting of a set of vertices and a set of edges. To facilitate extensibility and multi-level instantiation, the encoding of the Gmodel kernel is entirely expressed in Gmodel semantic identities, and each semantic identity in the kernel is defined as an instance of itself, and as a sub set of the next simpler semantic identity in the kernel.

The generic term to refer to any semantic identity that is expressed in Gmodel is the set. The simplest semantic identity is the ordered pair. Ordered pairs are used to define ordered sets and graphs. Much of the power and simplicity of Gmodel has its origin in the specific encoding chosen for graphs. Instead of consisting of a set of vertices and a set of edges, a Gmodel graph is encoded as a set of vertices and several complementary ordered sets of links:

edges – Links between two sets with a dedicated edge end for each connected set
super set references – Directed links from a sub set to a super set
visibilities – Directed links from one sub graph to another sub graph
edge traces – Directed links from one edge to another edge

The Gmodel vertex and all four types of links are encoded as sub sets of graph. In order to serve as a metalanguage, edge ends are decorated with variables for minimum cardinality and maximum cardinality, as well as variables that represent the direction of navigability of edges and a notion of containment relating to the connected set.

Figure 1: The Gmodel kernel within a typical usage context
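The self-referential encoding of the kernel can be summarised as follows (our notation, reading ":" as "is an instance of"; the paper itself does not use this shorthand):

\[
\mathsf{graph} \subseteq \mathsf{orderedSet} \subseteq \mathsf{orderedPair},
\]
\[
\mathsf{orderedPair} : \mathsf{orderedPair}, \quad \mathsf{orderedSet} : \mathsf{orderedSet}, \quad \mathsf{graph} : \mathsf{graph}.
\]

Each kernel identity is thus an instance of itself and a sub set of the next simpler identity, which is what removes the need for a fixed number of metamodel layers.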
3.1 Instantiation

A modeller may use the instantiation function of the Gmodel kernel to create representations of vertices and links. Since vertices are encoded as a sub set of graph – and hence enable the representation of nested abstractions – vertices are well positioned to serve as the unit of modularity in Gmodel. Using the terminology introduced above, vertices play the role of model artefacts, and in the context of Gmodel (modelling), are simply referred to as artefacts. Links between artefacts are also encoded as a sub set of graph, and therefore are also capable of representing nested abstractions by containing sets of vertices and sets of links.

Links between two artefacts are always contained in the artefact that contains the first of the two artefacts connected by the link, which is one of the constraints that allows Gmodel to fulfil the fourth criterion of the definition of model artefact – effectively enforcing much stronger rules regarding modularity than the minimum expectations set by KISS principles (4) and (9).

The most powerful feature of Gmodel instantiation is the ability to decorate any Gmodel artefact with instantiation semantics (or concretisation semantics) relating to representations of less abstract (or more concrete) sets, such that the artefact becomes instantiable. The instantiation semantics available in the Gmodel kernel are specified via the variables for cardinalities, navigability, and containment that are part of all edge ends of Gmodel edges.

Gmodel does not mandate a layered metamodel architecture. Our modelling experience in software intensive industries has taught us that the model pattern known as the power type pattern in object orientation occurs pervasively in highly configurable systems. The power type pattern is a technical kludge that forces the fragmentation of semantic identities, and it clearly demonstrates the limits of the object oriented paradigm – which is currently still treated as dogma by many software engineers. By allowing multi-level instantiation, the need for the power type pattern is eliminated, and the fragmentation of semantic identities can be avoided.

Thus, on the one hand, by excluding any circular links between artefacts, Gmodel imposes heavy constraints on the models that can be created, but on the other hand, Gmodel allows an unlimited degree of freedom with respect to the number of instantiation levels.

3.2 Surface notation

The name Gmodel is motivated by the graph concept, and all notations for visualising graphs are good candidates for a concrete syntax for Gmodel artefacts. In contrast, purely text-based representations are only practical for representing Gmodel artefacts with certain characteristics, such as artefacts with a low ratio of edges to vertices. Additionally, given that the main target audience for Gmodel consists of modellers in general – as opposed to software engineers with a strong preference for working with formal text-based languages – there is no urgent need for developing a human readable purely text-based syntax. The Gmodel open source project has no intention of reinventing XML or burdening the world with yet another XML-based but not-quite-human-readable syntax. The Gmodel API can easily be used to build graphical editors for Gmodel artefacts that are complemented with appropriate form-based representations of variables and their values. At this point in time Gmodel provides two complementary graphical surface notations for visualising model artefacts.

The first graphical surface notation for artefacts is based on boxes and lines, and uses Unified Modeling Language style syntax elements for edges (like UML associations), visibilities (like UML dependencies), as well as generalisation references (like UML generalisations). All symbols that constitute the decoration of an artefact with instantiation semantics are coloured in red [figures 4 and 5].

The second graphical surface notation for artefacts is based on nested boxes that precisely mirror the set containment structure of an artefact. A generic graphical editor that allows artefacts to be created and modified is under development.

3.3 Model artefact storage

Gmodel internally uses a serialisation format that is not intended for human consumption, and it provides a binding of this serialisation format to relational database technologies. In particular Gmodel fulfils criterion (3) of the definition of model artefacts, and provides explicit support for the semantics of unknown and the semantics of not applicable. As needed, the serialisation format can be bound to alternative persistence mechanisms such as file systems, object databases or cloud database technologies.

In order to work with model artefacts, Gmodel includes a repository API that currently offers basic artefact search functionality, which will be significantly enhanced in future releases.

3.4 Interoperability mechanisms

There are two main ways of achieving interoperability between Gmodel and other modelling technologies. This article focuses on the level of profound semantic interoperability with other metalanguages that can be achieved by making use of multi-level instantiation to emulate "foreign" technologies. Gmodel also offers an alternative for partial and superficial interoperability via file based information exchange. Out of the box Gmodel includes integration with the Eclipse integrated development environment, and with the openArchitectureWare Xpand template/transformation engine, putting text or code generation at the user's fingertips.
4. CONTRIBUTION TO MODEL-DRIVEN INTEROPERABILITY

To a significant degree the development of Gmodel was motivated by a lack of adequate technologies for formal modelling beyond the realm of software engineering and programming languages, and by a lack of interoperability between existing tools for domain-specific modelling. Gmodel simplifies model-driven interoperability in the following areas:

1. Modularity – The implementation of the artefact concept prevents users from constructing circular dependencies between modules. In contrast to other technologies, Gmodel allows the modelling of links between primary model artefacts and derived artefacts [figure 6], which amounts to an in-built infrastructure for orchestrating model transformations.

2. Simplicity – Since the Gmodel kernel treats links between concepts as first-class constructs, all kinds of graph structures (including undirected graphs) can be represented without compromise, and in the preferred terminology of the user. As a result, representations of modelling languages within Gmodel tend to be highly compact [figure 6], and the complexity of any required model transformations is reduced accordingly.

3. Multi-level-modelling – Gmodel is not limited to the four layered metamodel architecture. This opens up new approaches with respect to interoperability [5] [2], since types – and therefore interoperability patterns – can be encoded in Gmodel to any level of complexity. As illustrated in this article, multi-level instantiation is a prerequisite for emulating "foreign" modelling technologies. We are not aware of any other multi-level modelling technology that is ready for industrial use.

4. Scope management – Gmodel has an explicit feature for scope management that is universally available within all modelling languages expressed in Gmodel [figure 9]. This gives designers of modelling languages and system architects an unprecedented amount of control over the artefacts that language users can instantiate. In the experience of the authors, such functionality is essential for managing the dependencies between languages and between components in large-scale software intensive systems.

5. Separation of the concern of modelling from the concern of naming – Right down to the core, Gmodel functionality is expressed in semantic identities, and these identities can be referenced from as many representations (models) as needed. In practical terms this allows Gmodel to incorporate custom terminology and jargons at all meta levels.

6. Portability – In contrast to many other modelling technologies Gmodel makes no assumption about the implementation and legacy technologies that modellers are going to drive from their model artefacts. The Gmodel kernel is highly portable. It is articulated using the concepts presented in this paper, and makes use of the Java programming language to bootstrap the nine kernel concepts of ordered pair, ordered set, graph, vertex, edge, edge end, super set reference, visibility, and edge trace – but without exposing the Java type system in the core API, whilst restricting internal use of Java types to a handful: boolean, int, List, Iterator, UUID, and String.
5. EMULATING ECORE IN GMODEL

Gmodel clearly distinguishes between semantic domains and models. The former simply contain sets of semantic identities, whereas the latter contain representations of semantic identities from the view point of a particular actor.

5.1 Representing the Ecore metamodel

In Gmodel no model can be constructed without referencing elements in the relevant underlying semantic domains.

5.1.1 Defining the Ecore semantic domain

In Ecore the most generalised element is the EObject, and all other elements are part of a generalisation/specialisation hierarchy that starts with EObject. To represent Ecore in Gmodel, the first step consists of instantiating the semantic domain EcoreDomain, which contains all the semantic identities that appear in Ecore [figure 2]. This step will be perceived as somewhat unusual by all those who are only familiar with the definition of text-based languages using EBNF-style grammars, as the concern of representation and the concern of naming are one and the same in such specifications.

The number of semantic identities required to represent Ecore is significantly larger than the number of elements that appear in the Ecore generalisation/specialisation hierarchy. Every instance of an EDataType, every instance of an EReference, every instance of an EAttribute, etc. that occurs in the encoding of Ecore in itself requires a corresponding semantic identity. Loosely speaking, everything that has a name in the encoding of Ecore in itself corresponds to a semantic identity.

5.1.2 Representing the representation of Ecore

To prepare for the representation of Ecore in itself (the metametamodel level in the classical four layered metamodel architecture) in Gmodel, we instantiate a model artefact (with meta element vertex) based on the semantic identity Ecore that has been defined as part of the EcoreDomain in the previous step. Loosely speaking, we now have an empty model artefact called Ecore. We can then proceed to add contained artefacts to the Ecore artefact that correspond to the Ecore generalisation/specialisation hierarchy that starts with EObject. Once this is done we can represent the entire Ecore generalisation/specialisation hierarchy in the Ecore artefact using super set references as shown in figure 3, and we can represent all instances of EReferences in the Ecore artefact within Gmodel as illustrated in figure 4.

Lastly we add all relevant variables to the elements of the Ecore artefact, making use of appropriate semantic identities from the EcoreDomain. The whole process of representing Ecore in Gmodel is straightforward modelling in Gmodel, and requires no coding in a programming language.
5.2 Representing Ecore models

The representation of Ecore models (the metamodel level in the classical four layered metamodel architecture) in Gmodel follows the same pattern as the representation of Ecore in itself in Gmodel. First, appropriate semantic identities must be defined, and then the Ecore model artefact can be instantiated to obtain an empty model artefact. Note that above we instantiated a vertex to obtain a model artefact with the Ecore semantic identity, and now we are instantiating this model artefact.

Just as above, the next step consists of adding contained artefacts to the model artefact, this time however the meta elements of the contained artefacts correspond to Ecore concepts. Up to this point there is nothing special about using Gmodel. We could turn the table and proceed with very similar steps in Ecore to obtain a reasonable representation of Gmodel – "reasonable", because Ecore actually lacks one instantiation level to provide a precise representation of Gmodel edges. But instead of delving into the encoding details of Gmodel edges, the following step in encoding Ecore models is straightforward to follow, and clearly illustrates where multi-level instantiation plays a critical role.

In Gmodel we can proceed to represent all instances of EReferences as demanded by the Ecore model we are emulating, and we can use the edges that Gmodel uses to represent EReference instances to record the cardinalities pertaining to the instantiability of the model artefact. In a metalanguage without multi-level instantiation we would already have hit rock-bottom at this point. We would have been able to express links between elements (which, depending on the metalanguage, may be called "references", "associations", "relationships", "connections", "edges" or similar – the name is immaterial), but we would not have been able to decorate these links with cardinalities etc., which constitute essential instantiation semantics for the next level of instantiation or concretisation.

Figure 2: The Ecore semantic domain
Figure 3: Encoding of Ecore super types
Figure 4: Encoding of Ecore references
Figure 5: Encoding of an entity relationship modelling language in the Ecore emulation
Figure 6: Native encoding of an entity relationship modelling language in Gmodel

5.3 Representing instances of Ecore models

Given the explanations above, it is obvious how to proceed to instantiate Ecore models (the model level in the classical four layered metamodel architecture) in Gmodel, such as the example shown in figure 5. The comparison with figure 6 illustrates the complexity introduced on the one hand by the emulation, and on the other hand by the Ecore encoding of links between concepts in the form of EReferences.

5.4 Representing instances of instances of Ecore models

In Gmodel there is no reason to stop modelling at the "model" level. If the modeller has invested in decorating a model artefact with instantiation semantics, Gmodel is capable of applying these semantics – regardless of the level of instantiation or concretisation [figure 7].

In practical terms multi-level instantiation allows the modeller to instantiate operational data right down to the concrete level (the instance level in the classical four layered metamodel architecture) – where Joe Bloggs owns life insurance policy number 123456 [figure 8].

Given that industrial-strength relational database technology is the default storage format used by Gmodel, navigating and maintaining large databases or data warehouses is simply a matter of using the Gmodel repository for navigation, and of using Gmodel's instantiation function.
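The resulting chain of instantiation levels can be pictured as follows (our sketch, with ":" read as "is an instance of"; only the policy example is from the paper, the intermediate names are illustrative):

\[
\underbrace{\text{policy }123456}_{\text{instance}} \,:\, \underbrace{\mathsf{LifeInsurancePolicy}}_{\text{model}} \,:\, \underbrace{\mathsf{EntityType}}_{\text{metamodel}} \,:\, \underbrace{\mathsf{vertex}}_{\text{Gmodel kernel}}
\]

and, as section 6.2 argues, nothing forces the chain to stop at the left end.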
5.5 Interoperability between Ecore and Gmodel

With the native encoding of Ecore emulated by Gmodel artefacts, building a bi-directional bridge between the two technologies has become a trivial task. The Ecore API can be used to systematically read EMF models (at the metamodel level and the model level in the classical four layered metamodel architecture), and the retrieved in-memory representations can be mechanically mapped to corresponding in-memory representations in the Ecore emulation within Gmodel.

Gmodel is a technology that allows the construction of model-driven systems on a new scale, whereas EMF Ecore is a technology with an established user base and a vast array of useful transformation and generator components that facilitate the binding to popular Java implementation technologies. A bridge between Ecore and Gmodel can be driven by an event-based mechanism to create dynamic interoperability between the two technologies, opening up interesting avenues for model-driven systems that exploit the strengths of both technologies.

6. ADVANCED MODELLING TECHNIQUES

Modularity and scope management go hand in hand. One without the other is of very little value.

6.1 Scope management via visibilities

Gmodel requires users to be explicit about scope. A model artefact may not reference any element in other model artefacts unless these artefacts have been declared to be visible from the first artefact [figure 9]. In contrast to most programming languages, declarations of visibility are not part of an artefact, but they are part of the parent artefact in the so called artefact containment tree. The parent artefact has the responsibility of providing the architectural context for all the artefacts that it contains. The authors of Gmodel consider it to be good modelling practice to associate every artefact with a producer, and to identify and name the binding time that is associated with the instantiation of an artefact. Experience from many large-scale software system development initiatives has consistently confirmed the usefulness of this approach to system analysis and modularisation.

The encoding of Ecore in Gmodel required the declaration of a small number of visibilities, but there are much better practical examples that can be used to demonstrate the value of scope management via visibilities – a topic that goes beyond the scope of this paper. Visibilities offer significant value to intensive users of EMF, as Ecore lacks a corresponding facility. By switching from the native implementation of Ecore to the Gmodel Ecore emulation, EMF users gain access to the use of visibilities, and hence obtain a powerful tool for actively managing/restricting the dependencies in large-scale Java component architectures.

Figure 7: Snippet from an entity relationship model of a CRM application
Figure 8: Joe Bloggs' life insurance policy number 123456
Figure 9: Example of visibility declarations

6.2 Applications of multi-level instantiation

6.2.1 The bottomless pit of abstractions

Gmodel incorporates the insight from experienced modellers that there is no absolute rock-bottom concrete level of models. Life insurance policy number 123456 only looks like an instance from the view point of the average policy holder. From the view point of the insurer a specific version of the policy that is active for a certain interval is a more appropriate perception of instance. If, in 2020, Joe Bloggs decides to shift his entire life into n virtual worlds (given the track record of software technology, who would want to put all eggs in one basket), his view point will shift. Life insurance policy number 123456 in Second Life may be considered to be one instance, and the corresponding policy representation in Third Life may be considered to be a different instance – perhaps the currency in which premiums are being paid is different in each of the virtual worlds.

6.2.2 Value chain modelling and mass customisation

If the above sounds far fetched, analysing the typical evolution of technology products over a period of several years provides further motivation for multi-level instantiation.
Since the 1970s software has been used as a tool to not only automate industrial production, but also to extend the degree to which technology products can be configured and customised without having to resort to manual manufacturing techniques. Mass customisation has become commonplace in many industries. The evolution of a product over longer stretches of time can be modelled as a series of instantiation levels. Adding a new set of configuration options equates to adding additional variables to an artefact that used to be perceived as an instance. What used to be called a product morphs into a product line, and the new products are the instances of the product line, where each of the variables takes on concrete values. The view point of the customer usually remains unaffected; she still buys instances of a product.

Within a non-trivial value chain, the variables associated with a product line tend to be replaced by concrete values in a series of stages, so called binding times. Each binding time is associated with a specific actor that is responsible for making decisions regarding the values relating to a specific set of variables. In our experience multi-level instantiation is by far the simplest modelling technique for representing non-trivial value chains.

The alternative of using a purely object oriented design, in combination with the classical power type pattern, leads to system designs that are much more complex and much less maintainable than they could be. In particular the traditional distinction between design-time and run-time is a dangerous over-simplification that distracts from the need of proper value chain analysis (also known as domain analysis in the discipline of software product line engineering).

6.3 Applications of denotational semantics

Since all model artefacts in Gmodel are constructed from semantic identities, and since semantic identities are the only Gmodel elements that have names, semantic identities offer a one-stop-shop for dealing with all aspects of naming. This greatly facilitates any required translation between different terminologies, and it even enables users to replace the names of the semantic identities in the Gmodel kernel. If a user prefers to call a vertex a node, or if she prefers to rename TRUE to FALSE and FALSE to TRUE, so be it. The role of modelling is representation and not naming.

It is worthwhile to note that semantic identities are not only applicable at the atomic level to define identities such as TRUE and FALSE, but are just as applicable to statements such as minimum cardinality = 1 or to aggregates such as the entire Ecore model artefact.

Separating the concern of modelling from the concern of naming adds value precisely because good terminology is so important. Each pair of collaborating actors in a value chain tends to have a preferred terminology or jargon for their specific interactions, and such jargon is often a valuable tool for disambiguation. Without the systematic use of semantic identities, establishing interoperability across an entire value chain is significantly complicated. Names end up being used in the definition of protocols and artefacts, and the reliability of links between the participants in the value chain and communication across the links suffers accordingly.
7. OTHER TECHNOLOGIES

The level of interoperability between current domain-specific modelling tools is comparable to the level of interoperability between CASE tools in the 90s. To increase the popularity of model based approaches, this needs to change. The assumption that all parties in a global software supply chain will use identical tooling is simply not realistic.

7.1 Research prototypes

We are aware of at least three research prototypes with some form of multi-level instantiation capability [1], [8], [9], [6]. It would be extremely interesting to compare the design of Gmodel with the design of these technologies.

7.2 Eclipse Modeling Framework Ecore

In this paper we have illustrated how Gmodel can be used to emulate the Ecore technology, and conversely we have highlighted some of the limits of Ecore, in particular the lack of support for multi-level instantiation.

7.3 MetaEdit+

MetaEdit+ is a mature metamodelling and modelling tool that compares favourably with the Eclipse Modeling Framework. In particular the metametamodel of MetaEdit+ is simpler than the metametamodel used by Ecore, without any sacrifice in expressive power. But just as Ecore, MetaEdit+ follows the four layered metamodel architecture dogma and does not offer multi-level instantiation. As a result, MetaEdit+ runs into the same limitation that Ecore runs into when attempting to emulate "foreign" modelling technologies. Similar to Gmodel, MetaEdit+ relies on database technology rather than a file system for the storage of model artefacts, enabling modellers to build large-scale model-driven systems, but without explicit scope management facilities.

7.4 Unified Modelling Language tools

The main target audience of UML consists of software professionals who have an interest in visualising code, especially object oriented code. Most UML tools only offer very limited – if any – functionality for instantiating models that users have created. Since UML is based on the Meta Object Facility (and on Ecore or similar implementations), UML tools are affected by the kinds of limitations discussed in this paper in relation to Ecore.

7.5 Programming languages

There are several programming languages that offer multi-level instantiation, and there are also a number of programming languages that are based on denotational semantics, such as LISP or REBOL. Whilst these languages have expressive power that is comparable to Gmodel, they don't offer the limitations and constraints that have consciously been built into Gmodel. Programming language designers approach language design from a view point that differs significantly from the view point of a modelling language designer.

1. A programming language is designed to be executable on a specific platform. The platform represents the solution space, and the implementations of programming languages are optimised with respect to using the resources offered by the platform.

2. Since most programming languages are general purpose languages, they have to offer features that cover the needs of a big range of different users.
As a result programming languages offer many features that are not strictly needed by the majority of users. These features lead to additional degrees of freedom in solution designs, and consequently lead to variations in implementation that are induced by personal design preferences of individual software engineers. In the small this may not matter, but in the large these variations are known as spurious complexity.

3. A modelling language is designed for the representation of specific kinds of problems. As outlined in this article, problem spaces are best modularised along the lines of the actors that participate in a value chain, and each actor must be equipped with modelling languages that have a clear focus on the specific context and view point – all other details must be abstracted away. The result is a design force that pulls in the opposite direction of the design force that drives the development of most programming languages. The most valuable modelling languages are not only domain-specific, they are company specific.

Gmodel is a metalanguage that strives to provide expressive power in those areas that matter to modellers, and at the same time it strives to restrict those expressive powers that may lead to non-maintainable artefacts.

8. CONCLUSIONS

Although Gmodel is a brand new metalanguage, it embodies the collective lessons from many experienced modellers. The specific constraints that have been built into Gmodel have a track record of many years in industrial practice. Up to now best practices for scope management and modularity had to be applied manually, in the form of conventions. This worked up to a point, but it posed limits to the scalability of modelling technology in large environments. Without appropriate tool support, designing and maintaining advanced model-driven systems requires a large number of highly skilled modellers and system architects, and often the required level of expertise is simply not available.

We hope that Gmodel offers the missing stepping stone that allows a much larger group of organisations to reap the benefits of formal modelling, by significantly reducing the number of concepts and technologies that a designer of modelling languages needs to be familiar with, and by offering features – such as multi-level instantiation – that lead to simpler and clearer designs. Interoperability with EMF Ecore as outlined in this article is currently being refined, and a bi-directional bridge between Gmodel and Ecore will be a feature in an upcoming release of Gmodel.

No modelling tool can ever replace the need for domain analysis, but Gmodel is ideally positioned to record the results of domain analysis. On the one hand Gmodel provides domain-specific modelling support for all participants in a value chain, and on the other hand it serves as a front end for model transformation and code generation technologies that allow models to be glued to existing technologies and legacy systems.

9. REFERENCES

[1] Colin Atkinson, Matthias Gutheil, and Bastian Kennel. A flexible infrastructure for multilevel language engineering. IEEE Trans. Softw. Eng., 35(6):742–755, 2009.
[2] Jorn Bettin and Tony Clark. Gmodel, a language for modular meta modelling. In Australian Software Engineering Conference, KISS Workshop, 2009.
[3] Jorn Bettin and Tony Clark. The knowledge industry survival strategy initiative (KISS), 2009.
[4] Jorn Bettin, William Cook, Tony Clark, and Steven Kelly. Knowledge industry survival strategy (KISS): fundamental principles and interoperability requirements for domain specific modeling languages. In OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, pages 709–710, New York, NY, USA, 2009. ACM.
[5] Tony Clark, Paul Sammut, and James Willans. Applied metamodelling: A foundation for language driven development, 2008.
[6] Tony Clark, Paul Sammut, and James Willans. Superlanguages: developing languages and applications with XMF, 2008.
[7] Wilfrid Hodges. A shorter model theory. Cambridge University Press, New York, NY, USA, 1997.
[8] A. Laarman. An ontology-based metalanguage with explicit instantiation, March 2009.
[9] Alfons Laarman and Ivan Kurtev. Ontological metamodeling with explicit instantiation. In M. van den Brand, D. Gašević, and J. Gray, editors, Software Language Engineering, volume 5969 of Lecture Notes in Computer Science, pages 174–183, Heidelberg, January 2010. Springer Verlag.
[10] M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll. Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28:675–691, 2005.
[11] David A. Schmidt. Denotational semantics: a methodology for language development. William C. Brown Publishers, Dubuque, IA, USA, 1986.
Model-driven Rule-based Mediation in XML Data Exchange

Yongxin Liao, Dumitru Roman, and Arne J. Berre
SINTEF ICT, Forskningsveien 1, Oslo, Norway
yongxinliao@gmail.com, {firstname.lastname}@sintef.no

ABSTRACT

XML data exchange has become ubiquitous in Business to Business (B2B) collaborations. Automating as much as possible the exchange of XML data between enterprise systems is a key requirement for ensuring agile interoperability and scalability in B2B collaborations. The lack of standardized XML canonical models or schemas in B2B data exchange, as well as semantic differences and inconsistencies between conceptual models of those that want to exchange XML data, implies that XML data cannot be directly and fully automatically exchanged between B2B systems. We are left with the option of providing techniques and tools to support humans in reconciling the differences and inconsistencies between the data models of the parties involved in a data exchange. In this paper we introduce such a technique and tool for XML data exchange. Our approach is based on a lifting mechanism of XML schemas and instances to an object-oriented model, and the design and execution of data mediation at the object-oriented level. We use F-logic – an object oriented rule language – together with its Flora2 engine as the underlying mechanism for providing an abstract, object-oriented model of XML schemas and instances, as well as for specification and execution of the mappings at the model level. This provides us with a fully-fledged tool for design- and run-time data mediation, by focusing on the actual semantic models behind the XML schemas, rather than having to deal with the technicalities of XML in the data mediation process. Finally, we present the architecture of the current data exchange system and report on preliminary evaluation of our system.

1. INTRODUCTION

Providing techniques and tools to improve the level of automation of XML data exchange in B2B collaborations is widely regarded as a key enabler for agile interoperability and scalability in B2B collaborations [1]. In this paper we introduce a technique and tool for design- and run-time support of XML data exchange. Before we give a brief overview of the approach, let us define in more detail the problem of XML data exchange in the context of B2B collaborations.

Since we assume the data sent and received by parties in a B2B collaboration to be in XML, we face the problem of XML data transformation. Figure 1 provides an overview of the elements involved in XML data transformation and the process by which an XML document is transformed into another document. Company X (depicted on the left side of the picture) wants to send the Source XML document (e.g. an invoice) to Company Y. The Source XML document is compliant with an XSD schema (Source XSD) made available by Company X such that the receivers of its XML documents can understand the structure and meaning of such documents. Company Y (on the right side of the figure) processes XML documents (in our case Target XML) according to its own schema Target XSD.
If Target XSD differs from Source XSD, then Company Y is faced with the problem of having to process the Source XML document, which it does not understand. Therefore, the core challenge is to generate the Target XML document from the Source XML document, given the Source XSD schema and the Target XSD schema. A Transformation Layer is usually designed to address this challenge by providing means to map the Source XSD to the Target XSD at design time, and by providing an engine that implements the schema mappings at run time when the Target XML needs to be generated from Source XML.

Figure 1. Generic design-time and run-time XML data transformation. (Design time: Source XSD of Company X, Schema Transformation, Target XSD of Company Y; run time: Source XML, Instances Transformation in the Transformation Layer, Target XML.)

Categories and Subject Descriptors: D.2.12 [Software Engineering]: Interoperability; D.2.2 [Design Tools and Techniques]; H.2.5 [Heterogeneous Database]

General Terms: Algorithms, Design, Experimentation, Languages

Keywords: XML Data Exchange, Data mediation, Semantic mapping

Since the transformation cannot be fully automated, the core question is how to design the transformation layer in such a way that the human intervention in the specification and execution of mappings is kept at a minimum. XSD is well known to be a complex language, and designing mappings between XSD schemas is nothing short of a challenge. It is our belief that the mapping designer should focus on the mappings at the semantic level between the conceptual models behind the XSD schemas that need to be mapped, rather than having to deal with technicalities of XSD. Therefore, in the paper we rely on the lifting of XSD schemas to more abstract, object-oriented models, and the specification of the mapping at this more abstract layer. This will not only ease the specification of the mappings by the mappings creator, but would also enable other kinds of schemas, not only XSDs, to be mapped to or from XSD schemas.

In this paper we chose F-logic – a rule-based object-oriented logical language – as the language to represent the semantic models behind the XSD schemas. We use F-logic not only for specifying the semantic models, but also for specifying the mappings between them. Furthermore, the use of the Flora2 engine (http://flora.sourceforge.net/) – a reasoning engine for F-logic [3] – allows us to perform run-time mediation. In this way, we use F-logic/Flora2 as a platform independent model according to the OMG MDA architecture. We argue for two benefits of our approach to XML data exchange:

1. It allows the mappings creator to focus on the semantic, object-oriented model behind the XSD schemas and specify the mappings at a more abstract, semantic level, rather than having to deal with technicalities of XSD schemas.

2. It allows both specification and execution of data mappings (i.e. design- and run-time mapping) in a single, unifying framework.

The remainder of this paper is organized as follows. Section 2 provides a brief introduction to F-logic/Flora2. Section 3 presents our mapping approach for lifting of XSD schemas to object-oriented models, mapping specification and run-time execution. Section 4 provides an overview of the architecture of our data exchange system together with some preliminary performance results. Section 5 concludes this paper, together with some relevant related work and potential extensions.

2. BRIEF OVERVIEW OF FLORA2

In order to realize data mediation at a more abstract, semantic level, we need a higher level of abstraction for the representation of XML schemas and instances. Our approach is based on using object-oriented representations to abstract XML schemas and instances and then to perform mapping between a source and a target at the object-oriented level. It is easier to focus on the semantics of data if it is represented in an object-oriented form rather than a tree-like structure as in XSD. With our solution, mapping rules, schemas and instances will all be in the object-oriented form.

In this paper, F-logic, together with its Flora2 implementation, is used as the object-oriented language for formalizing schemas and instances, as well as the mappings. Flora2 is a sophisticated object-oriented knowledge base language and application development environment platform [3]. Flora2 is implemented as a set of run-time libraries and a compiler that translates a unified language of F-logic, HiLog and Transaction Logic into tabled Prolog code. The core motivation for choosing Flora2 is that it is a rule based object-oriented logical language which provides support for flexible specification of schemas, instances, and mapping rules, and at the same time it can be used to execute mapping rules on instance data. Flora2 comes with an XML package which supports loading and parsing XSD/XML documents, converting them to sets of Flora2 objects stored in user-specified Flora2 modules. It also provides equivalent entities for XSD and XML, features that are used in our framework for data mediation.

Figure 2 presents an example of Flora2 schema and object descriptions, rules and queries, as well as loading files into modules. For example, in the specification of schemas '=>' is used to specify the types of the attributes of a class and '*' is used for inheritable attributes; in the specification of objects '->' is used to specify the values of the object's attributes; '>>Mod' means load a program into a module Mod ('@Mod' means query the value in module Mod). The reader is referred to [3] for further details of the syntax and semantics of F-logic/Flora2.

  Schema description:
    person[name*=>string, children*=>person].
  Object description:
    John:person[name -> 'John Doe', children -> {Bob, Mary}].
    Mary:person[name -> 'Mary Doe', children -> {Alice}].
  Rules:
    ?X:human :- ?X:person.
  Queries (whose child is Bob in module Mod):
    ?X:person@Mod, ?X[name -> ?Y, children -> Bob]@Mod.
  Output result:
    ?X = 'John', ?Y = 'John Doe'
  Loading programs into modules:
    ?- ['path/filename.flr'>>Mod]
    #include "path/filename.flr"

Figure 2. Flora2 examples: objects, rules, queries.

3. MAPPING APPROACH

Our proposed solution, called FloraMap, is based on logical rules for specifying mappings at the schema level and executing those mappings at the instance level.
The choice for logical rules is motivated by their declarative and procedural semantics, making them a powerful tool for declaratively specifying and at the same time executing mappings. Logical rules cannot work directly with XSDs, and therefore proper abstraction mechanisms need to be developed for abstracting XSD schemas, on top of which mappings can be designed and executed. Our choice for such abstractions is the use of object-oriented techniques for representing XSD and XML, on top of which mapping rules can be more easily specified.

Figure 3 below gives an overview of the mapping approach. We can separate the mapping in two parts: design-time and run-time.

Design-time:
1. The Source XSD and Target XSD are represented as source and target Flora2 object-oriented schemas.
2. Logical rules are used to specify the mappings between the source Flora2 schemas and target Flora2 schemas.

Run-time:
3. The Source XML is represented as Flora2 objects of the source Flora2 schema.
4. Logical rules from step 2 are executed for the source Flora2 objects and target Flora2 objects are generated.
5. The target Flora2 objects are serialized in target XML instances.

Figure 3. Mapping Approach – Overview. (Design time: Source XSD and Target XSD are lifted to Flora2 schemas related by a semantic mapping, specification and execution; run time: Source XML is lifted to Flora2 objects, transformed by a Transform Engine into target Flora2 objects, and serialized as Target XML.)

The rest of this section will give an overview of how abstraction is achieved (mapping XML schemas and instances to Flora2 representations), how mappings are specified and executed (i.e. mapping Flora2 source objects to Flora2 target objects), and how the resulting Flora2 objects are serialized in XML (i.e. mapping Flora2 objects to XML instances).

To exemplify these steps we will use the exchange of an XML invoice between a company X (source) and a company Y (target). The schemas of the invoices of companies X and Y are presented in Figure 4, together with the following mappings:

1. Bizszam in source is the same as InvoiceNumber in target
2. Bizkelt in source is the same as InvoiceDate in target
3. City in source is the same as DeliveryAddress.city in target
4. Zip in source is the same as DeliveryAddress.zip in target
5. Street in source is the same as DeliveryAddress.street in target
6. AccDate in target is a concatenation of Ev in the source, a delimiter, Kanyvho in the source, a delimiter, and the string '01', i.e. AccDate = (Ev+'_'+Kanyvho+'_'+'01')
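To make mapping 6 concrete before the machinery is introduced, the following is a minimal Flora2 sketch of that single mapping, written by us in the style of the generated program shown later in Figure 9 (the module name SourceInstances and the use of flora_concat_items follow that figure; everything else is simplified):

  // Load the lifted source instances into module SourceInstances.
  ?- ['InvoiceCompanyX.flr'>>SourceInstances].

  // For each source invoice, create a target invoice whose AccDate
  // is the concatenation Ev + '_' + Kanyvho + '_01'.
  ?- ?h:InvoiceCompanyX@SourceInstances,
     newoid{?t},
     insert{ ?t:InvoiceCompanyY[AccDate -> ?acc] |
             flora_concat_items([?h.Ev@SourceInstances, '_',
                                 ?h.Kanyvho@SourceInstances, '_01'],
                                ?acc)@_plg(flrporting) }.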
(a) Source XSD: Company X
  <xs:element name="InvoiceCompanyX">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Bizszam" type="xs:string"/>
        <xs:element name="Ev" type="xs:string"/>
        <xs:element name="Kanyvho" type="xs:string"/>
        <xs:element name="Bizkelt" type="xs:string"/>
        <xs:element name="city" type="xs:string" minOccurs="0"/>
        <xs:element name="zip" type="xs:int" minOccurs="0"/>
        <xs:element name="street" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

(b) Target XSD: Company Y
  <xs:element name="InvoiceCompanyY">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="InvoiceNumber" type="xs:string"/>
        <xs:element name="AccDate" type="xs:string"/>
        <xs:element name="InvoiceDate" type="xs:string"/>
        <xs:element name="DeliveryAddress" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="city" type="xs:string" minOccurs="0"/>
              <xs:element name="zip" type="xs:string" minOccurs="0"/>
              <xs:element name="DoorNo" type="xs:string" minOccurs="0"/>
              <xs:element name="street" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Figure 4. XML Schemas and mappings example.

3.1 XSD2OO

The technique we designed for abstracting XML schemas to object-oriented models will generate two Flora2 models for each XSD: one Flora2 model (Abstract) contains the "clean" conceptual model of the schema (without any technicalities of XSD, but focusing on the semantics of the elements), and the other one (Special) contains XSD specific information (sequence, choice, etc.) which will be used for generating the structure of target XML instances.

In most cases, XSD elements can find a natural representation in Flora2. For example, if a job element in XSD is specified as <element name="job" type="string" minOccurs="0" maxOccurs="5"/>, it can be transformed in Flora2 as [job {0:5}*=>string]. The {0:5} cardinality is equivalent to minOccurs="0" and maxOccurs="5" in XSD.

Due to length restrictions, we do not provide the reader with a complete mapping of XSD to Flora2 schemas. Nevertheless, Table 1 provides three examples of how top-level elements in XSD are mapped to Flora2 representations.

Table 1. Example of XSD elements to Flora2 schema mapping

Situation 1: Top-level Element with BaseType
  XSD:
    <element name="name" type="string" maxOccurs="2"/>
  Abstract:
    name[name {1:2} *=>string].
  Special:
    none

Situation 2: Top-level Element with ComplexType
  XSD:
    <element name="name">
      <complexType>
        <sequence>
          <element name="firstname" type="string"/>
          <element name="lastname" type="string"/>
        </sequence>
      </complexType>
    </element>
  Abstract:
    name[firstname {1:1} *=>'string'].
    name[lastname {1:1} *=>'string'].
  Special:
    Elements[name->firstname].
    Elements[name->lastname].
    Sequences[name->[firstname,lastname]].

Situation 3: Top-level Element with SimpleType
  XSD:
    <element name="age">
      <simpleType>
        <restriction base="int">
          <maxInclusive value="200"/>
        </restriction>
      </simpleType>
    </element>
  Abstract:
    age[base *=>'int'].
    age[maxInclusive-> 200].
  Special:
    none

Attribute and Element are different things in XSD, but we abstract them as the same in Abstract and identify the difference in Special.

XSD import and include have natural equivalents in Flora2 modules. For example, "filename.xsd" is included in an XSD file as <include schemaLocation="filename.xsd"/>. It can be transformed as #include "filename_Abstract.flr" in the Flora2 Abstract file and #include "filename_Special.flr" in the Flora2 Special file. For XSD import, the following steps can be used for the mapping:

1. ['filename_Abstract.flr'>>namespace] in the Flora2 Abstract file
2. ['filename_Special.flr'>>namespace] in the Flora2 Special file
3. Keep the element name and replace the ":" with "_" in the type

Table 2 below exemplifies the way XSD import and include are handled in Flora2 schemas.

Table 2. XSD import and include to Flora2 mapping

Situation 1: XSD Import
  XSD:
    <schema xmlns:ccts="abcd">
      <import namespace="abcd" schemaLocation="../Information.xsd"/>
      <element name="person">
        <complexType>
          <sequence>
            <element name="name" type="ccts:Type"/>
            <element ref="ccts:age"/>
            <element name="work">
              <complexType>
                <simpleContent>
                  <extension base="ccts:workType"/>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>
    </schema>
  Abstract:
    ?- ['path/Information_Abstract.flr'>>ccts]
    person[name {1:1} *=> ccts_nameType].
    person['ccts:age' {1:1} *=> ccts_age].
    person[work {1:1} *=> personwork].
    personwork['ccts:workType' {1:1} *=> ccts_workType].
  Special:
    ?- ['path/Information_Special.flr'>>ccts]
    Elements[person -> name].
    Elements[person -> 'ccts:age'].
    Elements[person -> work].

Situation 2: XSD Include
  XSD:
    <include schemaLocation="person.xsd"/>
  Abstract:
    #include "path/person_Abstract.flr"
  Special:
    #include "path/person_Special.flr"

The result of applying the XSD to Flora2 transformation to the XSD schema of Company X (Figure 4.a) is depicted in Figure 5, and the result of applying the transformation to the XSD schema of Company Y (Figure 4.b) is depicted in Figure 6. These Flora2 Abstract and Special parts represent the source and target XSDs and will be used as input in the design-time mapping and in the run-time target XML instance generation.

Flora2 Abstract (Company X):
  Namespace[value->'xs:'].
  InvoiceCompanyX[Bizszam{1:1}*=>'xs:string'].
  InvoiceCompanyX[Ev{1:1}*=>'xs:string'].
  InvoiceCompanyX[Kanyvho{1:1}*=>'xs:string'].
  InvoiceCompanyX[Bizkelt{1:1}*=>'xs:string'].
  InvoiceCompanyX[city{0:*}*=>'xs:string'].
  InvoiceCompanyX[zip{0:*}*=>'xs:int'].
  InvoiceCompanyX[street{0:*}*=>'xs:string'].

Flora2 Special (Company X):
  Sequences[InvoiceCompanyX ->['Bizszam','Ev','Kanyvho','Bizkelt','city','zip','street']].
  Elements[InvoiceCompanyX ->Bizszam].
  Elements[InvoiceCompanyX ->Ev].
  Elements[InvoiceCompanyX ->Kanyvho].
  Elements[InvoiceCompanyX ->Bizkelt].
  Elements[InvoiceCompanyX ->city].
  Elements[InvoiceCompanyX ->zip].
  Elements[InvoiceCompanyX ->street].

Figure 5. Flora2 schema representation of the Company X XSD schema (Figure 4.a)

Flora2 Abstract (Company Y):
  Namespace[value->'xs:'].
  InvoiceCompanyY[InvoiceNumber{1:1}*=>'xs:string'].
  InvoiceCompanyY[AccDate{1:1}*=>'xs:string'].
  InvoiceCompanyY[InvoiceDate{1:1}*=>'xs:string'].
  InvoiceCompanyY[DeliveryAddress{0:*}*=>CompanyYDeliveryAddress].
  CompanyYDeliveryAddress[city{0:*}*=>'xs:string'].
  CompanyYDeliveryAddress[zip{0:*}*=>'xs:string'].
  CompanyYDeliveryAddress[DoorNo{0:*}*=>'xs:string'].
  CompanyYDeliveryAddress[street{0:*}*=>'xs:string'].

Flora2 Special (Company Y):
  Sequences[InvoiceCompanyY ->['InvoiceNumber','AccDate','InvoiceDate','DeliveryAddress',TheOrderEnd]].
  Elements[InvoiceCompanyY ->InvoiceNumber].
  Elements[InvoiceCompanyY ->AccDate].
  Elements[InvoiceCompanyY ->InvoiceDate].
  Elements[InvoiceCompanyY ->DeliveryAddress].
  Sequences[CompanyYDeliveryAddress->['city','zip','DoorNo','street']].
  Elements[CompanyYDeliveryAddress ->city].
  Elements[CompanyYDeliveryAddress ->zip].
  Elements[CompanyYDeliveryAddress ->DoorNo].
  Elements[CompanyYDeliveryAddress ->street].

Figure 6. Flora2 schema representation of the Company Y XSD schema (Figure 4.b)

3.2 XML2OO

The technique we designed for abstracting XML instances to object-oriented models will generate one Flora2 model. Flora2 provides natural equivalences between object entities and XML instances. For example, if an instance of a job element is represented in XML as <job>Programmer</job>, then it can be transformed to obj_1:person[job->'programmer'] in Flora2. obj_1 is a unique object name, and obj_1:person means obj_1 is one of the instances of person. To transform the XML instance to Flora2 objects, the following high-level steps are devised:

1. Parse XML instance files in Flora2, resulting in a Flora2 tree.
2. Load Flora2 Abstract source files in Flora2.
3. Generate the Flora2 object structure according to the Flora2 Abstract and query the values from the Flora2 tree; object names are constructed by concatenating "obj_" + a unique number (e.g. 1_1_2) generated from the unique location in the tree.

Step 1 is performed by the Flora2 engine itself, which is not part of our implementation (the Flora2 XML package provides XML parsing support). It stores XML instances in a Flora2 tree automatically when XML files are parsed. FloraMap uses this package to load XML files and uses the Flora2 tree to query the values. Steps 2 and 3 are performed by FloraMap: it generates the Flora2 object structure according to the Flora2 Abstract and queries the values from the Flora2 tree.
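Taken together, the two job fragments above amount to the following self-contained Flora2 sketch (our assembly; the person class and the query are ours, the job schema and object are from the text):

  // Abstract schema lifted from:
  //   <element name="job" type="string" minOccurs="0" maxOccurs="5"/>
  person[job {0:5}*=>string].

  // Object lifted from the XML instance <job>Programmer</job>,
  // where obj_1 is the generated unique object name.
  obj_1:person[job -> 'Programmer'].

  // Query: which objects have which jobs?
  ?- ?P:person, ?P[job -> ?J].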
Figure 7 shows the generation of a Flora2 object from an XML instance example of Company X. On the upper part are X's XML instance and Flora2 Abstract. The output is the Flora2 object obj.

Source XML (Company X):
  <InvoiceCompanyX>
    <Bizszam>I_001</Bizszam>
    <Ev>2010</Ev>
    <Kanyvho>05</Kanyvho>
    <Bizkelt>2010-05-18</Bizkelt>
    <city>Oslo</city>
    <zip>1234</zip>
    <street>First Street</street>
  </InvoiceCompanyX>

Flora2 object (Company X):
  obj:InvoiceCompanyX['Bizszam'->'I_001'].
  obj:InvoiceCompanyX['Ev'->'2010'].
  obj:InvoiceCompanyX['Kanyvho'->'05'].
  obj:InvoiceCompanyX['Bizkelt'->'2010-05-18'].
  obj:InvoiceCompanyX['city'->'Oslo'].
  obj:InvoiceCompanyX['zip'->'1234'].
  obj:InvoiceCompanyX['street'->'First Street'].

Figure 7. XML to Flora2: Company X

3.3 OO2OO

The core part of data mediation is the specification and execution of the mappings in Flora2, a process which takes as input the Flora2 Abstract schemas of the source and target and the mappings between them, together with the Flora2 source objects, and generates Flora2 target objects according to the specification of the mappings. This phase can be separated in three steps:

1. Specification of the design-time mappings between the source and target Flora2 Abstract schemas.
?-?h: CompanyX@SourceInstances,newoid{?t},newoid{?t_4}, insert{ ?t: InvoiceCompanyY[InvoiceNumber->?t_1], ?t: InvoiceCompanyY [AccDate->?t_2], ?t: InvoiceCompanyY [InvoiceDate->?t_3], ?t: InvoiceCompanyY [DeliveryAddress->?t_4], ?t_4: InvoiceCompanyYDeliveryAddress[city->?t_4_1], ?t_4: InvoiceCompanyYDeliveryAddress[zip->?t_4_2], ?t_4: InvoiceCompanyYDeliveryAddress[street->?t_4_4] | ?t_1=?h.Bizszam@SourceInstances, flora_concat_items([?h.Ev@SourceInstances,_, ?h.Kanyvho@SourceInstances,_01],?t_2)@_plg(flrporting), ?t_3=?h.Bizkelt@SourceInstances, ?t_4_1=?h.city@SourceInstances, ?t_4_2=?h.zip@SourceInstances, ?t_4_4=?h.street@SourceInstances}. OneToOne([source],[target]). OneToMany([source],[[target1],[target2],…],[n1,m1,n2,m2,..]). ManyToOne([[source1], [source2], [source3],…],[target]). OneToOne means that a class or attribute in the source schema corresponds to a class or attribute in the target schema. OneToMany means that a class or attribute in the source schema corresponds to more than one class or attributes in the target. ManyToOne means that more than one class or attribute in the source schema correspond to one class or attribute in the target. [source] is the path of the source class or attribute. [target] is the path of the target class or attribute. [n1,m1,n2,m2…] are values to identify substrings, first substring is from n1 to m1, second substring is from n2 to m2 and so on. Figure 9. Fora2 executable program (run-time mappings) Figure 8 shows the Flora2 specification of correspondences/mappings between the Flora2 Abstract source and target schemas from Figures 5 and 6, respectively. The mapping information is taken from our running example in Figure 4. In step 3, Flora2 system is used as the underlying reasoning engine to execute the Flora2 program on source instances. Figure 10 shows the result of applying the executable mapping program on an instance of Company X invoice (obj) and the resulting instance of the Company Y invoice (obj1). Flora2 source object (Company X) OneToOne([InvoiceCompanyX],[ InvoiceCompanyY]). OneToOne([InvoiceCompanyX,Bizszam], [ InvoiceCompanyY,InvoiceNumber ]). OneToOne([InvoiceCompanyX,Bizkelt], [InvoiceCompanyY,InvoiceDate ]). OneToOne([InvoiceCompanyX,City], [InvoiceCompanyY,DeliveryAddress, city]). OneToOne([InvoiceCompanyX,Zip], [InvoiceCompanyY,DeliveryAddress, zip]). OneToOne([InvoiceCompanyX,Street], [InvoiceCompanyY,DeliveryAddress, street]). ManyToOne([[InvoiceCompanyX,EV],‘_’, [InvoiceCompanyX,KANYVHO],‘_’,‘01’]], [InvoiceCompanyY, AccDate]). obj: InvoiceCompanyX ['Bizszam'->'I_001']. obj: InvoiceCompanyX ['Ev'->'2010']. obj: InvoiceCompanyX ['Kanyvho'->'05']. obj: InvoiceCompanyX ['Bizkelt'->'2010-05-18']. obj: InvoiceCompanyX ['city'->'Oslo']. obj: InvoiceCompanyX ['zip'->'1234']. obj: InvoiceCompanyX ['street'->'First Street']. Executable Mapping Program (Fig 9) Figure 8. Design-time correspondences between the Flora2 schemas of company X and Y For step 2 we have devised a mechanism that takes as input the Flora2 source and target schemas, the design-time correspondences between them, and generates a Flora2 program that represents the executable mappings. This can be achieved in Flora2 in a rather intuitive and straightforward way: for each object instances in source generate new objects (using the newoid primitive defined in Flora2), assign the values to the new objects according to the design-time correspondences rules, and store the new objects in a target knowledge base (using the transactional feature insert of Flora2). 
Figure 9 shows the generated executable mapping program for our running example:

?- ['InvoiceCompanyX.flr'>>SourceInstances].
?- ?h: CompanyX@SourceInstances, newoid{?t}, newoid{?t_4},
   insert{
     ?t: InvoiceCompanyY[InvoiceNumber->?t_1],
     ?t: InvoiceCompanyY[AccDate->?t_2],
     ?t: InvoiceCompanyY[InvoiceDate->?t_3],
     ?t: InvoiceCompanyY[DeliveryAddress->?t_4],
     ?t_4: InvoiceCompanyYDeliveryAddress[city->?t_4_1],
     ?t_4: InvoiceCompanyYDeliveryAddress[zip->?t_4_2],
     ?t_4: InvoiceCompanyYDeliveryAddress[street->?t_4_4]
   | ?t_1=?h.Bizszam@SourceInstances,
     flora_concat_items([?h.Ev@SourceInstances,_,?h.Kanyvho@SourceInstances,_01],?t_2)@_plg(flrporting),
     ?t_3=?h.Bizkelt@SourceInstances,
     ?t_4_1=?h.city@SourceInstances,
     ?t_4_2=?h.zip@SourceInstances,
     ?t_4_4=?h.street@SourceInstances}.

Figure 9. Flora2 executable program (run-time mappings)

In step 3, the Flora2 system is used as the underlying reasoning engine to execute the Flora2 program on the source instances. Figure 10 shows the result of applying the executable mapping program to an instance of the Company X invoice (obj) and the resulting instance of the Company Y invoice (obj1).

Flora2 source object (Company X):

obj: InvoiceCompanyX['Bizszam'->'I_001'].
obj: InvoiceCompanyX['Ev'->'2010'].
obj: InvoiceCompanyX['Kanyvho'->'05'].
obj: InvoiceCompanyX['Bizkelt'->'2010-05-18'].
obj: InvoiceCompanyX['city'->'Oslo'].
obj: InvoiceCompanyX['zip'->'1234'].
obj: InvoiceCompanyX['street'->'First Street'].

Flora2 target object (Company Y):

obj1: InvoiceCompanyY[InvoiceNumber->'I_001'].
obj1: InvoiceCompanyY[AccDate->'2010_05_01'].
obj1: InvoiceCompanyY[InvoiceDate->'2010-05-18'].
obj1: InvoiceCompanyY[DeliveryAddress->{obj_4}].
obj_4: CompanyYDeliveryAddress[city->'Oslo'].
obj_4: CompanyYDeliveryAddress[zip->'1234'].
obj_4: CompanyYDeliveryAddress[street->'First Street'].

Figure 10. Run-time mapping of Flora2 objects

3.4 OO2XML
Flora2 to XML mapping is the last process in the FloraMap execution and is concerned with the serialization of the generated Flora2 objects into XML instances. This process takes as input the target schema (both the Flora2 Abstract and the Flora2 Special target schemas) and the Flora2 target objects, and generates the target XML instances. In the XSD to Flora2 lifting process, FloraMap generated two Flora2 models: Flora2 Abstract (containing the conceptual model of the schema) and Flora2 Special (containing the XSD-specific information). These two Flora2 files are used to generate the structure of the target XML instances. Note that the Flora2 Special target schema plays a key role in the serialization of the objects, because it specifies the technical details of the XML instances that should be generated. The Flora2 to Flora2 mapping process produced the Flora2 objects, which are queried for the values of each class and attribute. Figure 11 depicts the Flora2 to XML process in our running example.

Flora2 object (Company Y):

obj1: InvoiceCompanyY[InvoiceNumber->'I_001'].
obj1: InvoiceCompanyY[AccDate->'2010_05_01'].
obj1: InvoiceCompanyY[InvoiceDate->'2010-05-18'].
obj1: InvoiceCompanyY[DeliveryAddress->{obj_4}].
obj_4: CompanyYDeliveryAddress[city->'Oslo'].
obj_4: CompanyYDeliveryAddress[zip->'1234'].
obj_4: CompanyYDeliveryAddress[street->'First Street'].

Target XML (Company Y):

<?xml version="1.0"?>
<InvoiceCompanyY>
  <InvoiceNumber>I_001</InvoiceNumber>
  <AccDate>2010_05_01</AccDate>
  <InvoiceDate>2010-05-18</InvoiceDate>
  <DeliveryAddress>
    <city>Oslo</city>
    <zip>1234</zip>
    <DoorNo></DoorNo>
    <street>First Street</street>
  </DeliveryAddress>
</InvoiceCompanyY>

Figure 11. Serialization of Flora2 objects to XML instances
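As an illustration of how the Sequences information in the Flora2 Special schema can drive this serialization step, the following Python sketch (ours, with simplified names; FloraMap performs this in Flora2) orders child elements according to a sequence table and emits empty elements for absent optional values such as DoorNo.

import xml.etree.ElementTree as ET

# Sequences from the Flora2 Special target schema drive element order;
# names that appear as keys in the table denote nested complex elements.
SEQUENCES = {
    'InvoiceCompanyY': ['InvoiceNumber', 'AccDate', 'InvoiceDate',
                        'DeliveryAddress'],
    'DeliveryAddress': ['city', 'zip', 'DoorNo', 'street'],
}

def serialize(name, obj):
    root = ET.Element(name)
    for child in SEQUENCES.get(name, []):
        value = obj.get(child)
        if isinstance(value, dict):            # nested object: recurse
            root.append(serialize(child, value))
        else:                                  # leaf: text (may be empty)
            ET.SubElement(root, child).text = value or ''
    return root

obj1 = {'InvoiceNumber': 'I_001', 'AccDate': '2010_05_01',
        'InvoiceDate': '2010-05-18',
        'DeliveryAddress': {'city': 'Oslo', 'zip': '1234',
                            'street': 'First Street'}}
print(ET.tostring(serialize('InvoiceCompanyY', obj1), encoding='unicode'))
# Produces the Target XML (Company Y) instance shown in Figure 11,
# including the empty <DoorNo /> element for the missing optional value.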
4. System Architecture, Implementation, and Experimental Results
The techniques outlined in the previous section have been implemented in FloraMap, a set of modules implemented in Flora2 which can be used to parse and transform XML schemas and instances into Flora2 schemas and objects, and to execute the mediation rules specified at the Flora2 level.

At design-time, FloraMap takes as input the source and target XML schemas and generates the object-oriented models of the schemas. The mappings creator then specifies the correspondences/mappings between the schemas (similar to the example given in Figure 8) and generates the executable mapping program (similar to the example given in Figure 9) that will be used to execute the mediation on source instances.

At run-time, FloraMap takes as input the XML source instances, the Flora2 source and target schemas, and the executable mapping rules produced at design-time. Based on these inputs, FloraMap transforms the XML source instances into Flora2 objects, executes the mappings on these source objects to generate target objects, and finally serializes the target objects into XML target instances.

Figure 12 presents a high-level overview of the FloraMap modules and the interactions between them. The core modules of FloraMap are:
• XSD to Flora2: transforms the input XSDs into Flora2 schema models
• XML to Flora2: transforms the input XML instances into Flora2 objects
• Flora2 to Flora2: specifies the mappings between the source and target Flora2 models (OO level)
• Flora2 to XML: serializes the Flora2 objects into XML instances

Figure 12. FloraMap: core modules and interactions (source and target XSDs feed XSD to Flora2; the resulting Flora2 schemas feed Flora2 to Flora2; source XML feeds XML to Flora2; target XML is produced by Flora2 to XML)

Several experiments have been performed on the current implementation to test the scalability of FloraMap. The experiments have been carried out on a commodity computer (Intel(R) Core(TM) 2 Duo CPU P8600 @ 2.4GHz, 4GB RAM, Windows Vista 32-bit OS). Two types of experiments have been performed:
1. Transformation of XSDs of various sizes and complexities into Flora2 schemas.
2. End-to-end data exchange with an increasing number of instances for the running example presented in the section above.

For the first type of experiment we used XSDs of various sizes and complexities to test the scalability of generating Flora2 object-oriented models from XML schemas. The XSDs ranged from simple schemas, such as those presented in this paper (in Figure 4), to very complex schemas, such as the Northern European Subset of UBL (NES, http://www.nesubl.eu/). The times needed to generate object-oriented models from the XSDs are reported in Figure 13.

Figure 13. Performance results: generation of Flora2 models from XML schemas
The results show that mapping large and complex schemas such as NES is a time-consuming task (about 7 minutes); however, this is not an issue, since the generation needs to be done only once, at design time. Once the Flora2 representations of the XSDs have been produced, they can be loaded and processed rather quickly by FloraMap for run-time mediation.

For the second type of experiment, where we tested the end-to-end data exchange, we used increasing numbers of synthetically generated instances of the source schema presented in Figure 4 to generate instances of the target schema (also presented in Figure 4). This experiment included the complete mapping of source instances to target instances through an intermediary schema (not presented here), meaning that we had three schemas and two sets of mappings. The time needed for a complete transformation of increasing numbers (1 to 4000) of invoice instances of the Company X XSD into instances of the Company Y XSD is reported in Figure 14.

Figure 14. Performance results: end-to-end data mediation

These results show that the larger the number of instances, the more time is needed for end-to-end processing, with the growth lying somewhere between linear and exponential. Whereas in some applications this can be acceptable (e.g., processing 4000 instances in about 15 minutes, as our results showed), in other applications it might not be reasonable.

5. Related Work, Conclusions, and Outlook
The problem of mapping between data structures has been studied extensively for decades, and schema mapping is well established as a research field [6,2]. Nevertheless, the use of rule-based logical systems for data mapping/exchange has not yet been widely investigated in the community. With this paper we provided a solution to the end-to-end data exchange problem based on the use of F-logic/Flora2 as a logical framework, which we used for the high-level, abstract specification of schemas and of the mappings between them, as well as for the run-time execution of the mappings. Our approach allows the mappings creator to focus on the semantic, object-oriented model behind the XSD schemas and to specify the mappings at a more abstract, semantic level, rather than having to deal with the technicalities of XSD schemas. The proposed approach allows both the specification and the execution of data mappings (i.e., design- and run-time mapping) in a single, unifying framework, providing an end-to-end solution to the problem of XML data exchange.

There are several works that can be related to our approach. For example, [4] presents algorithms to represent XML and XSD in a mainstream object-oriented programming language. It develops two mappings: one uses a set of rules that map an XSD schema into its object-oriented schema, and the other maps XML instances that conform to an XSD schema to their representation as objects. This is directly related to our generation of Flora2 object-oriented models from XML schemas and instances; however, the representation in [4] does not seem to be complete (e.g., it is unclear how XSD import/include statements are handled). Furthermore, our approach targets the specification of mediation as well as its run-time execution, whereas [4] focuses only on an object-oriented representation of XML schemas. Another relevant work is [5], which focuses on the generation of XML from object-oriented models. This can be related to our serialization of Flora2 objects into XML but, as was the case with [4], the scope of our work is much broader.

In a wider context, the work presented in this paper is related to MDE model transformation techniques and languages [7,8], such as the ATLAS Transformation Language (ATL, http://www.eclipse.org/atl/). Whereas model transformation languages can be applied to the XML data exchange problem addressed in this paper, it is unclear how suitable and easy it is to apply such general-purpose languages to the specific case of XSD/XML. A thorough analysis of the model transformation techniques developed in the MDE community is needed in order to judge their suitability for XML data exchange. Furthermore, a systematic comparison of model transformation techniques and logical rule-based approaches to data exchange is needed in order to understand their similarities and differences, and to obtain a clear picture of their respective advantages and disadvantages for data exchange.

The FloraMap mapping technique proposed in this paper is promising, and its implementation and the accompanying experiments showed that run-time mediation is possible and feasible with a logic-based rule approach. However, there are still several directions that can be considered to further enhance FloraMap:
1. Extensions for handling end-to-end n:m mappings, where multiple sources and multiple targets can exchange data.
2. Design and implementation of a consistency check technique at design time. Inconsistent mappings may lead to errors during the run-time data exchange, so such a technique would significantly improve the mapping process. It is expected that the underlying reasoning mechanism provided by F-logic will contribute significantly to the automated detection of inconsistencies between mapping rules, thereby making logical rule-based approaches even more attractive for data exchange.
3. Design and implementation of a graphical interface for design-time mapping. In its current implementation, FloraMap does not come with a graphical editor for Flora2 models and mappings. The reuse of open-source tools, such as those emerging in the context of the OpenII project (http://www.openintegration.org/), could be relevant here.
4. Support for other types of schemas. FloraMap has been designed for XML data mapping; however, since the approach works at an expressive model level, it should be fairly simple to extend it to handle other types of schemas, such as relational schemas. This would enable the exchange of data that conform to different schematic representations (e.g., relational schemas, XML schemas, etc.).
5. (Semi-)automated generation of executable mapping rules. Approaches for the automated generation of rules in the areas of ontologies and MDE model transformation, such as [9,10], as well as ideas from semantic Web service matchmaking, such as [11], can be employed here to provide sophisticated support for the (semi-)automated generation of mapping rules.
6. More comprehensive validation. Whereas we provided some initial experimental results on the scalability of FloraMap, other aspects of our approach need to be analyzed in a more systematic way. For example, analyzing the complexity of the specification of mapping rules, compared, for example, to the complexity of specifying mapping rules using model transformation techniques, would be another potential direction for future work.

ACKNOWLEDGMENTS
This work is partly funded by the EU projects "A Semantic Service-oriented Private Adaptation Layer Enabling the Next Generation, Interoperable and Easy-to-Integrate Software Products of European Software SMEs (EMPOWER)" (http://empower-project.eu/) and "Environmental Services Infrastructure with Ontologies (ENVISION)" (http://www.envision-project.eu/).

6. REFERENCES
[1] Christoph Bussler. B2B Integration. Springer, 2003. ISBN 3540434879.
[2] Ken Smith, Peter Mork, Len Seligman, et al. The Role of Schema Matching in Large Enterprises. CIDR Perspectives, 2009.
[3] Guizhen Yang, Michael Kifer. FLORA-2: User's Manual. 2008.
[4] Suad Alagic, Philip A. Bernstein. Mapping XSD to OO Schemas. Microsoft Research, 2008.
[5] R. Xiao, Tharam S. Dillon, E. Chang, Ling Feng. Modeling and Transformation of Object-Oriented Conceptual Models into XML Schema. Database and Expert Systems Applications, 795-804.
[6] Bernstein, P. A. and Melnik, S. Model Management 2.0: Manipulating Richer Mappings. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (Beijing, China, June 11-14, 2007).
[7] Czarnecki, K. and Helsen, S. Classification of Model Transformation Approaches. In Proceedings of the OOPSLA'03 Workshop on Generative Techniques in the Context of Model-Driven Architecture, Anaheim, California, USA.
[8] Mens, T. and Van Gorp, P. A Taxonomy of Model Transformation. Electronic Notes in Theoretical Computer Science, Volume 152, 27 March 2006, Pages 125-142.
[9] Stephan Roser, Bernhard Bauer. Automatic Generation and Evolution of Model Transformations Using Ontology Engineering Space. J. Data Semantics 11: 32-64 (2008).
[10] Gerti Kappel, Elisabeth Kapsammer, Horst Kargl, Gerhard Kramler, Thomas Reiter, Werner Retschitzegger, Wieland Schwinger, Manuel Wimmer. Lifting Metamodels to Ontologies: A Step to the Semantic Integration of Modeling Languages. MoDELS 2006: 528-542.
[11] Klusch, M. and Kaufer, F. WSMO-MX: A Hybrid Semantic Web Service Matchmaker. Web Intelligence and Agent Systems 7, 1 (Jan. 2009), 23-42.
Behavioural Interoperability to Support Model-Driven Systems Integration

Alek Radjenovic
The University of York
Department of Computer Science
York YO10DD, United Kingdom
+44 1904 567836
alek@cs.york.ac.uk

Richard F. Paige
The University of York
Department of Computer Science
York YO10DD, United Kingdom
+44 1904 343242
paige@cs.york.ac.uk

ABSTRACT
Software system integration is a process in which the target system is synthesised from discrete components (subsystems) whilst ensuring they function together as a system and are able to deliver the required functionality. System integration is particularly important in projects in which new technologies must integrate with legacy systems. In such scenarios, this process can be broadly divided into two stages: interoperability checking and composition. Model-based approaches are promising since they allow us to carry out some of this process earlier (thus identifying problems earlier in the development lifecycle, when they are easier to rectify). In this paper we describe a generic model-based platform for system integration, applicable to different modelling languages, that supports both interoperability checking (at different levels of abstraction) and composition; our presentation focuses on the platform's support for interoperability checking. The approach, which consists of a language and a simulation tool, is presented, and its use is illustrated in a simple example of interoperability checking involving architectural models enriched with behaviour.

Categories and Subject Descriptors
I.6.4 [Simulation and Modeling]: Model Validation and Analysis

General Terms
Design, Verification.

Keywords
Model analysis, model integration, model consistency, behaviour modelling, simulation.

1. INTRODUCTION
Software system integration is a process in which the target system is synthesised from discrete components (subsystems) whilst ensuring they function together as a system and are able to deliver their intended functionality. These software elements are typically developed separately. Indeed, many software-intensive and software-dependent projects, whilst taking advantage of next-generation technologies as well as 'ready-made' third-party components, are required to reuse existing legacy software. In such scenarios, integration introduces risk because the interoperability between the various parts cannot be ascertained until late stages of the development process (i.e. during the system integration phase).

Modern software projects often use model-based development (employing various modelling platforms and notations), where models are created prior to the development of executable code. Even when models are not available (e.g., in legacy systems), system architects can use tools to generate component and architectural models automatically from source code. Increasingly, component models are described using heterogeneous modelling languages and tools. Thus, there is a substantive technical problem to be addressed in model integration. We argue that the identification of model integration mechanisms at the software architecture level is highly desirable. In particular, interoperability checking at the model level is key to identifying system integration problems early on.

Interoperability checking represents the necessary first step in model integration. Incompatibilities may arise in two different planes: structural (mainly observed at the syntax level) and behavioural (mainly observed at the semantics level). Our framework tries to address both of these issues.
Many current approaches focus on one or the other, or are not sufficiently generic to support all modelling languages because they focus on specific standards, such as those of the OMG.

Our solution, which we call SMILE, is a framework within which we can (amongst other things) attach semantics relevant to behaviour to various structural model elements and execute the specified behavioural model through simulation. Consequently, we are able to identify undesired behaviours of the combined models either through post-simulation analysis of the simulation trace or, actively, by formulating undesired conditions which cause the simulation to halt when they are detected.

SMILE is a platform capable of manipulating models specified in different modelling languages and of checking different behavioural paradigms. We achieve this by means of transformation and simulation. First, the relevant behavioural information from the input models is extracted to create a SMILE behavioural model comprising behavioural types. Second, these types are instantiated as simulation objects used in the simulation. Thus, SMILE is essentially an interchange platform for exploring behaviours in combined models. Although SMILE is a generic platform, applicable to arbitrary modelling languages, in this paper we illustrate the principles behind it using UML and its State Machine diagrams, and we use this exemplar to show how it can be used in interoperability checking.

The rest of the text is organised as follows. Section 2 describes related work. Section 3 provides an overview of our approach. Section 4 first introduces the case study and then presents the compatibility checking results. In closing, section 5 draws conclusions and suggests future directions.

2. RELATED WORK
2.1 Model Compatibility
Various organizations and companies (OMG, IBM, Microsoft, etc.) have proposed environments to support Model Driven Engineering (MDE).
Among these, the OMG MDA (Model Driven Architecture) [22] is the most prominent; it focuses on the identification of basic MDE principles, their practical characteristics (direct representation, automation, and open standards), original scenarios, and discussions of suitable tools and methods. System functionality is defined as a platform-independent model (PIM) through an appropriate domain-specific language (DSL). Given a platform definition model (PDM) corresponding to a particular software technology (such as CORBA [25] or .NET [20]), the PIM is then translated to one or more platform-specific models (PSMs) for the actual implementation.

One of the key obstacles to model-based interoperability, and hence to system integration, is the incompatibility of models, evident mainly at the syntactical level. In order to resolve this issue, it was suggested that a unifying meta-model, to which all modelling languages concerned would conform, would be required. OMG's UML profile for Enterprise Application Integration (EAI) is defined as a complete MOF-based [27] metamodel that provides facilities for modelling the integration architecture, focusing on connectivity, composition and behaviour. The EAI UML profile also defines a MOF-based standardised data format intended for use by different systems to exchange data during integration. Data exchange is achieved by defining an EAI application metamodel that handles interfaces and metamodels for programming languages (such as C, C++, PL/I, and COBOL) to aid the automation of transformation. While standardising on MOF is a step in the right direction, in practice there are various problems, such as the lack of widespread support for MOF by various tools, and the differences between the versions of XML Metadata Interchange (XMI) [26] supported by tools [3]. MOF is currently the only standard that attempts to cut across the different modelling and implementation platforms.

2.2 Model Transformation
In an ideal situation, during model transformation the syntax is changed to the target modelling language whilst the semantics is preserved [12]. The overall majority of model transformation techniques, however, are defined at the metamodel level (various taxonomies can be found in [7]). In terms of breadth of usage, three of the more successful model transformation approaches are ATL, ETL and VIATRA.

ATL (ATLAS Transformation Language) [13] is a model transformation language and toolkit which provides a means to produce a set of target models from a set of source models. Developed on top of the Eclipse platform, the ATL Integrated Development Environment (IDE) provides a number of standard development tools (syntax highlighting, a debugger, etc.) that aim to ease the development of ATL transformations. ATL also includes a library of ATL transformations and has been defined to perform general transformations within the MDA framework. There are currently over 100 defined transformations in the online library. The language itself appears to be somewhat cumbersome, which is reflected in the supplied transformation examples. This may partly be due to its substantially declarative nature, because some transformations are not necessarily best expressed in this fashion (e.g., transformations that involve iterations over complex structures). However, ATL's tool support is some of the most robust in the MDE community.

ETL (Epsilon Transformation Language) [14] provides model-to-model transformation capabilities to Epsilon and can be used to transform an arbitrary number of input models into an arbitrary number of output models specified in different modelling languages. ETL, like ATL [13] and QVT [24], has a mixture of declarative and imperative language characteristics. Declarative transformation languages are generally limited to scenarios where the source and target metamodels are structurally similar to each other, and the transformation is thus a matter of simple mapping. Imperative languages, in addition, include operations, but they operate at a low abstraction level: users have to manually address issues such as tracing and resolving target elements from their source counterparts, and orchestrating the transformation execution. Hybrid languages provide both a declarative rule-based execution scheme and imperative features for handling complex transformation scenarios. ETL is firmly in the hybrid camp, and thus targets both mapping transformations (where the source/target metamodels are similar) and more complex transformation scenarios. Like ATL and QVT, ETL reuses a portion of OCL for navigating model elements. Unlike ATL and QVT, ETL includes imperative constructs (such as loops, assignment statements, and sequencing of statements) that make iterative transformations much easier to express.
The VIATRA (VIsual Automated model TRAnsformations) [9] framework is the core of a transformation-based verification and validation environment for improving the quality of systems designed using UML by automatically checking consistency, completeness, and dependability requirements. Its main objective is to provide general-purpose support for the entire lifecycle of engineering model transformations, including the specification, design, execution, validation and maintenance of transformations within and between various modelling languages and domains. VIATRA intends to complement existing model transformation frameworks by providing: a model space for the uniform representation of models and meta-models; a transformation language (with both declarative and imperative features, based on the popular formal mathematical techniques of graph transformation (GT) and abstract state machines (ASM)); and a high-performance transformation engine (which supports incremental model transformations, but also trigger-driven live transformations, where complex model changes may trigger the execution of transformations), with its main target application domains being both model-based tool integration and model analysis transformations. Importantly, VIATRA considers scalability a key factor and claims to be able to handle well over 100,000 model elements.

openArchitectureWare (oAW) [1] is a modular MDA generator framework implemented in Java and based on the Eclipse platform. oAW can parse arbitrary models, and it has a family of languages to check and transform models as well as to generate code from them. oAW has strong support for EMF (Eclipse Modelling Framework) based models but can work with other models too (e.g. UML2, XML or simple JavaBeans). oAW is based around a workflow engine which allows the definition of generator mechanisms. Various pre-built workflow components can be used to read and instantiate models, check for constraint violations, perform transformations into other models, or generate code. openArchitectureWare has also submitted to Eclipse a project proposal called the Textual Modeling Framework (TMF). TMF focuses on textual DSLs and Eclipse IDE integration. One of its two initial contributions will be Xtext, a framework and generator that provides a specialised Eclipse editor and an EMF meta-model from a simple EBNF-style grammar. Its focus will be on very short turnarounds, and it is hoped to provide powerful abstractions for the development of textual DSLs.
2.3 Model Composition
The process of model composition consists of four distinct phases: comparison, conformance checking, merging, and reconciliation (or restructuring) [5,28]. In the comparison phase, the correspondences between equivalent elements of the two models are identified, ensuring that the elements in question are not duplicated in the merged model. In the conformance checking phase, the elements matched in the previous phase are examined for conformance with each other in order to identify potential conflicts that would render merging impossible. The majority of the proposed approaches (e.g., [18]) address conformance checking of models through their compliance with the same meta-model. For the merging phase, a number of approaches have been proposed, such as graph-based algorithms [19,28] or an interactive process for merging UML 2.0 models [18]. The limitations of these approaches are that they either only address the merging of models of the same (specific) metamodel, or use an inflexible merging algorithm with no means of extension or customisation. In the reconciliation and restructuring phase, the inconsistencies in the target model are fixed.

Next, some of the key approaches to model composition are described.

The AMW (ATLAS Model Weaver) [2] is a tool for establishing relationships between models. These links are stored in a weaving model which conforms to a weaving metamodel. Weaving models may be used in several application scenarios, such as meta-model comparison, traceability, model matching, model annotation, and tool interoperability. AMW provides a base weaving meta-model (enabling the creation of links between model elements and of associations between those links) which may be extended with further mapping semantics, providing a mechanism for creating variable mapping languages dedicated to specific application requirements.

The EML (Epsilon Merging Language) [14] adds model merging capabilities to the Epsilon platform.
More specifically, EML can be used to merge an arbitrary number of input models of potentially diverse metamodels and modelling technologies. The key motivation for EML was to have a mechanism that would enable automatic model merging over a set of established correspondences. This has a number of applications in MDE. For example, EML can be used to unify two complementary, but potentially overlapping, models that describe different views of the same system. EML can also be used to merge a core model with an aspect model (potentially conforming to different meta-models). This is discussed in [21], where a core Platform Independent Model (PIM) is merged with a Platform Definition Model (PDM) (which contributes platform-specific aspects) to form a Platform Specific Model (PSM). This has been particularly useful for, e.g., performance analysis, where different system configurations (corresponding to platform-specific performance data) have been merged with system models and the result used for simulation. When combined with other features of the Epsilon platform, this merging capability can be applied iteratively, thus allowing the batch generation of arbitrarily large numbers of simulation models and simulation results. EML has also been used successfully for managing versions of models.

2.4 Multi-paradigm Modelling
Multi-paradigm modelling (MPM) combines three orthogonal research fields: multi-formalism modelling (using different languages while modelling a system), model abstraction (the relationship between models at different levels of abstraction), and metamodelling (the construction of the collection of concepts that highlight the properties of the modelling language) [29]. The advocates of MPM recognise that the design of systems increasingly requires representations in various languages (formalisms) and at different abstractions, where these representations must be "coupled, combined, integrated, and transformed" [33].

In [11], the authors explore various multi-paradigm modelling techniques and evaluate them based on two criteria: 1) their level of support for an open set of modelling languages, and 2) their support for formal verification of properties. With respect to the first criterion, they draw three key conclusions. Firstly, the platforms under observation (GME [8], the Eclipse Modeling Project [10], and AToM3 [17]) allow the automatic generation of tool support for user-defined modelling languages, but they are limited by their dependency on the underlying metamodels. Secondly, the composition of modelling languages is highly dependent on the syntax and semantics being expressed in a given format. And thirdly, the task of adding support for an additional modelling language can be very difficult (the order of magnitude corresponds to describing the semantics of a modelling language). With respect to the second criterion, the authors found that reasoning formally about properties, at a global level and over a set of heterogeneous models, represents quite a challenge.

AToM3 [16] is a tool which has received much attention from the research arena. It implements model transformation techniques based on graph rewriting. Here, input models are represented (internally) using graphs, while the transformations are specified by graph grammars which spell out the rewriting rules. The authors claim that AToM3 can potentially support a wide range of modelling languages, provided that their abstract syntax is described by a metamodel and that a transformation can be written between the source and target metamodels. This may be particularly difficult for certain types of languages [11]. The key limitation is that the number of transformations that one needs to design increases exponentially with the number of participating languages. Other approaches either lack mature tool support (e.g. Rosetta [15]) or support only a limited range of semantics for describing behaviours (e.g. BIP [4] uses labelled transition systems).
2.5 Model Interoperability
The general notion of interoperability between systems is defined in [30] as the ability of one system to communicate with and access the functionality of the other system. The concept of interoperability can also be characterised as a certain degree of compatibility [6], where the levels of compatibility include coexistence, interconnection, interworking, interoperation and interchangeability, while the relevant system features that define the compatibility level comprise communication protocols and interfaces, data access and types, parameter semantics, application functionality and dynamic behaviour.

Furthermore, two or more models are interoperable if they are related to one another in one of the following ways:
• Integrated: diverse models are interpreted in a standard format which must be as rich as any of the constituent system models
• Unified: there is a common meta-level structure across the constituent models which provides a way to establish semantic equivalence
• Federated: models have to be dynamically accommodated rather than conform to a predetermined metamodel (this assumes that concept mapping is done at the semantic level)

Such a view clarifies the difference between full integration and interoperability: integrated systems are interoperable, while interoperable systems are not necessarily integrated.

2.6 Summary
A large majority of the existing approaches to model-based system integration lack one or more of the 'ingredients' we discussed in this section. Even those which support a more or less full set of model management techniques are typically tightly integrated with either the Eclipse Modeling Framework (EMF) or one or more of the OMG standards. Consequently, checking of model interoperability, particularly at the behavioural level, is often too dependent on, or skewed towards, Java, Ecore and/or MOF. An approach in which we could reason about model behaviours and model interoperability in a generic fashion, away from the underlying meta-models, is highly desirable.
In particular, we use models specified in different versions of UML using different UML tools. This scenario is commonly present in projects of large organisations where various software components are typically a mixture of legacy code, new code, and third party (supply chain), off-theshelf, components. SMILE-X architecture is illustrated in Figure 3. Two or more input models are converted to SMILE trees (this functionality is part of the SMILE-S component). A behavioural template is then used to extract the relevant information from the trees and create a behavioural model (e.g. state machines) which essentially consists of a set of types that describe particular behaviours of model components. Next, a configuration is applied by means of a manual intervention and with the help of other artefacts (such as class and object diagrams) in order to instantiate the behavioural types into simulation objects and to create a simulation model. Finally, we define or select a specific schedule before we can perform simulation. Each simulation run provides a trace as an output. We can then analyse the trace in order to identify undesired behaviours in the system. Alternatively, by formulating undesired conditions through the definition of triggers we can cause the simulation to halt as soon as these conditions are detected. SMILE-X provides a framework where elements of input models can be mapped to (or matched against) the specified behavioural model. This behavioural model is provided in the form of a template (meta-model) which enables us to attach semantics to structural model elements and which describes a particular behavioural paradigm (or, a related family of behaviours) that we are interested in analysing. The chosen template transforms the input models into an integrated SMILEX model which describes system's behaviour in the form of a collection of elements and maps that convey information about interactions within the system. SMILE-X transparently glues elements together either fully automatically or with additional information entered interactively by the user. Thus, SMILE-X facilitates a mechanism through which we can integrate behaviours of input models based on the chosen perspective, and consequently perform simulations on the integrated system. BEHAVIOUR TEMPLATE MAP The template used in this paper is that of state machines which takes an approach that is based on a modified discrete event system specification (DEVS) [23]. The fact that there is a significant overlap between behavioural models (such as sequence, communication and state machine diagrams) on one hand, and structural models (e.g. class, object, or component diagrams) on the other, enables us to have an uncomplicated extension to the work on structural compatibility in models. The structural elements are further enhanced with concepts that add semantics, such as: event, time, action and state. Consequently, state machine models in SMILE-X are described in terms of a well-known state transition system with actions and guards. The behaviours specified must be regarded as specifications of the actual behaviours which can be both deterministic as well as non-deterministic. These behaviours are characterised by state variables whose evolution is specified by transitions. The transitions are triggered by events, guarded by conditions, and enriched by actions. However, we reiterate that SMILE-X is not restricted to state machine models; these are used here only as one example. 
SMILE-X provides a framework where elements of the input models can be mapped to (or matched against) a specified behavioural model. This behavioural model is provided in the form of a template (meta-model) which enables us to attach semantics to structural model elements and which describes a particular behavioural paradigm (or a related family of behaviours) that we are interested in analysing. The chosen template transforms the input models into an integrated SMILE-X model which describes the system's behaviour in the form of a collection of elements and maps that convey information about the interactions within the system. SMILE-X transparently glues elements together, either fully automatically or with additional information entered interactively by the user. Thus, SMILE-X provides a mechanism through which we can integrate the behaviours of input models based on the chosen perspective, and consequently perform simulations on the integrated system.

The template used in this paper is that of state machines, which takes an approach based on a modified discrete event system specification (DEVS) [23]. The fact that there is a significant overlap between behavioural models (such as sequence, communication and state machine diagrams) on the one hand, and structural models (e.g. class, object, or component diagrams) on the other, enables an uncomplicated extension of our work on structural compatibility in models. The structural elements are enhanced with concepts that add semantics, such as event, time, action and state. Consequently, state machine models in SMILE-X are described in terms of a well-known state transition system with actions and guards. The behaviours specified must be regarded as specifications of the actual behaviours, which can be deterministic as well as non-deterministic. These behaviours are characterised by state variables whose evolution is specified by transitions. The transitions are triggered by events, guarded by conditions, and enriched by actions. However, we reiterate that SMILE-X is not restricted to state machine models; these are used here only as one example.

There are two key parts to the SMILE-X language specification. The first part (the behavioural templates) enables generic descriptions of the input models' behaviours. The second helps in defining the rules (as triggers) which aid in the detection of behavioural incompatibilities. SMILE-X builds on SMILE-S by reusing all structural component declarations and adding semantics in the form of a 'behavioural layer' to the specification. The behavioural templates also enable the user to specify sequential execution behaviour (so UML sequence, communication and state machine diagrams can all be input to the SMILE-X tool).

SMILE-X not only allows behaviour descriptions to be attached to instances of types (objects) but also to types, in which case all instances of such types behave the same way, according to the provided specification. The users of SMILE-X can also specify whether a particular behaviour description specified on a type is to be applied to any types that descend from that type (the so-called 'loose mode') or just to that particular type ('strict mode'). In loose mode, if a descendant type has its own specific behaviour description attached to it, that description overrides the behaviour inherited from the super-type. SMILE-X also supports concurrency, i.e. multiple threads of execution or multiple devices.

The state machine template adopted is well known, describing a set of modelling artefacts sufficient to describe the behaviour of a model representing a reactive software system. It is depicted in the form of a UML class diagram in Figure 4, showing the key classes and the relationships between them. The Element class is a SMILE-S structural meta-model class with which the described behaviour is associated.

Figure 4. The state machine behavioural template (UML class diagram)
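For illustration, the template's core concepts can be rendered as plain data structures. The sketch below (Python, with our own names; the actual template is a UML class diagram realised in SMILE-X XML) encodes a transition as an event, a source and target state, an optional guard, and actions, in exactly the sense of the state transition system described above.

from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Transition:
    # Triggered by an event, guarded by a condition, enriched by actions.
    event: str
    source: str
    target: str
    guard: Optional[Callable[[dict], bool]] = None
    actions: List[Callable[[dict], None]] = field(default_factory=list)

@dataclass
class BehaviourType:
    element: str          # the SMILE-S structural element it annotates
    initial: str
    transitions: List[Transition]

    def step(self, state: str, event: str, ctx: dict) -> str:
        for t in self.transitions:
            if (t.source == state and t.event == event
                    and (t.guard is None or t.guard(ctx))):
                for action in t.actions:
                    action(ctx)       # actions may update state variables
                return t.target
        return state                  # no enabled transition: stay put

door = BehaviourType('Door', 'Closed',
                     [Transition('OPEN', 'Closed', 'Open'),
                      Transition('AUTO_CLOSE', 'Open', 'Closed')])
assert door.step('Closed', 'OPEN', {}) == 'Open'

Keeping the template as data rather than code is what allows the same simulation engine to execute behaviour extracted from heterogeneous input models.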
4. EXAMPLE
4.1 Case Study
This case study describes a real-time software application which is installed to control M lifts in a building with N floors. The problem concerns the logic required to move lifts between floors according to the following constraints:
• Each lift has a set of N buttons, one for each floor. These illuminate when pressed and cause the lift to visit the corresponding floor. The illumination is cancelled when the lift visits the corresponding floor.
• Each floor has one (top and bottom floors) or two (all other floors, to indicate the intended direction of travel: up or down) floor buttons to request a lift to come to the floor. A button illuminates when pressed; the illumination is cancelled when a lift visits the floor.
• Upon the arrival of a lift at any floor, the door opens and remains open for a fixed period of time after the infrared beam has last been cut by people or objects moving in and out of the lift. After the expiry of that fixed period of time, the door automatically closes. In this case study, the infrared beam component is not modelled, but it is assumed that the door will close a fixed period of time after it has been opened.
When a lift has no requests, it remains at its current floor with its door closed.

4.2 Use cases
There are two main use cases. The 'Calling a lift' use case describes the following scenario: (1) Passenger presses a floor button; (2) Lift system detects the floor button press; (3) Lift moves to the floor; (4) Lift doors open. The 'Travelling in a lift' use case consists of the following sequence of events: (1) Passenger gets in and presses a lift button; (2) Lift system detects the lift button press; (3) Lift closes the doors if they are open; (4) Lift travels to the required floor; (5) Lift doors open; (6) Passenger gets out; (7) Lift doors close.

4.3 Class diagram
The system class diagram is presented in Figure 5. The Controller class represents the lift system, and there is a single instance of this class in the target software system. This class directly controls one or more Lift objects and two or more Floor objects. Each Floor object has one or two FloorButton objects, as explained in the introduction above. Each floor also has one or more Doors (depending on the number of lifts in the building). Each Lift object has two or more LiftButton objects. The FloorButton and LiftButton classes share common features embodied in their Button superclass.

Figure 5. Lift system (UML class diagram)

4.4 Sequence and state machine diagrams
The system has two sequence diagrams, which correspond to the two use cases explained above. Each class also has a separate state machine diagram (apart from FloorButton and LiftButton, which inherit their behaviour from the Button class). Due to space limitations these diagrams are not all presented here, but some are illustrated in the section on compatibility checking (below).

4.5 The development process
It is assumed that the behaviour of each of the five main classes, Lift, Button, Floor, Door, and Controller, is specified by a different team, using different UML tools which use different UML versions. This is an attempt to replicate a real-world software lifecycle where development is distributed and the tools and platforms are potentially heterogeneous.
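A configuration for this case study has to instantiate simulation objects according to the multiplicities in the class diagram. The following Python fragment is a hypothetical sketch of that instantiation step (the real configuration is expressed in the SMILE-X XML syntax; all identifiers are ours).

# Hypothetical configuration: one Controller, M lifts, N floors, one set of
# lift buttons per lift, one door per lift per floor, and one or two floor
# buttons depending on whether the floor is at the top/bottom of the building.
M, N = 2, 6
objects = {'controller': 'Controller'}
for lift in range(M):
    objects[f'lift{lift}'] = 'Lift'
    for floor in range(N):
        objects[f'lift{lift}.button{floor}'] = 'LiftButton'   # N per lift
        objects[f'floor{floor}.door{lift}'] = 'Door'          # one per lift
for floor in range(N):
    objects[f'floor{floor}'] = 'Floor'
    n_buttons = 1 if floor in (0, N - 1) else 2   # up/down except at the ends
    for b in range(n_buttons):
        objects[f'floor{floor}.fbutton{b}'] = 'FloorButton'
print(len(objects), 'simulation objects')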
By focusing solely on the third and by identifying incompatibilities, we are able to demonstrate that both the system characteristics used in behaviour descriptions, as well as the approximations used in modelling, are inaccurate and need readjustment. The criterion we used in detecting behavioural incompatibilities was incorrect and/or unpredictable behaviour. From a state machine perspective, this means that: • • • all states can be (have been) reached during the simulation run all state combinations are valid (i.e. invalid state combinations do not occur) • all events have been used ('fired') at least once • all actions are performed successfully • • • Figure 5. Explicit compound state The purpose of these definitions is to ensure that a particular set of components within the system are not concurrently in states which are mutually prohibiting. As an example extracted from the case study, we declare an explicit compound state (Figure 6) to ensure that the lift is not moving at the same time that one of its floor doors is open. all guard conditions are satisfied at least once there aren't subsystems that are disconnected from the rest of the system there aren't deadlocks (multiple objects waiting for a resource simultaneously and thus preventing a state change) relevant properties hold true In our tool, many operations are done automatically at first (e.g. model parsing), but some manual assistance is often needed (e.g. mapping of elements to state machines, or definition of element dependencies with respect to model behaviour) where an interactive process is employed. Next, we describe each type of behaviour incompatibility detected, providing an example related to the case study above. Figure 6. Door state machine The definition of the compound state DoorsLeftOpen (Figure 6) enables the detection of such scenarios. An addition of a simple guard condition in the Controller state machine would, for example, rectify this design fault. 4.7 Invalid state combinations We define the compound state as the union of the current state of one element and the current states of all its sub-elements within the structural component hierarchy. This is not to be confused with the UML definition of a composite state which is different and more complex. We also define the concept of an explicit compound state as a union of current states of an arbitrary (user-defined) set of elements. 4.8 Unused events Another type of analysis which reveals behavioural incompatibilities is the search for unused events. This is achieved by analysing the trace obtained from a simulation run, either by manually inspecting it or by applying a search filter. A simplified version of the Door state machine (without guard conditions or actions) is illustrated in Figure 7. The AUTO_CLOSE event is an internal timed event generated upon For example, two state machines are shown in Figure 8 and Figure 9 (Floor and Controller). When a passenger wishes to call 104 error in design. Whatever the case, the occurrence of such circumstances requires design modification. We have just demonstrated how an event (LIFT_REQUEST) may never be used. The Controller state machine in Figure 9 was purposely simplified. In particular, the Active state can be made more specific by introducing substates (e.g. NoRequests and RequestsPending, as in Figure 10). Assuming a successful power up, Controller is in Active state and NoRequests substate. In this scenario, a LIFT_REQUEST event is processed in the context of the current substate. 
Assuming that any guards are satisfied and any declared actions performed, a QUEUE event is generated (internally to the Controller subsystem) and the substate changes to RequestsPending. However, because a LIFT_REQUEST event never arrives, as explained in the previous section, the RequestPending substate can never be reached. After a simulation run under these circumstances, the SMILE-X tool can also report on all unreachable states. a lift to the floor they are on, they press the floor button. This is an external user event, which can be specified in SMILE-X language and subsequently included in the simulation. This event is received by the Button object and converted into a BUTTON_PRESS event sent to the Floor object (Figure 8). The event is processed if Floor is in the Vacant state. Ignoring guard conditions and actions (not shown in the diagram) as they are irrelevant to this scenario, the Floor object eventually moves to the Waiting state and sends the SERVICE_REQUEST to the Controller object. 4.10 Disconnected subsystems In the absence of sequence (or communication) diagrams, the state automatons are initially disconnected and there is no communication between various subsystems. SMILE-X provides a means of mapping events generated by an element to the intended target element. For example, the Floor objects generate DOOR_OPEN, DOOR_CLOSE and SERVICE_REQUEST events for its environment (Figure 8). The first two are intended for the Door objects, while the third is for the Controller object. Figure 11 is a screenshot from the SMILE-X tool showing how this mapping is done in a straightforward fashion. Figure 7. The Floor state machine Figure 8. The Controller state machine This request, however, never gets serviced. Quick scrutiny of the Controller state machine reveals that a LIFT_REQUEST event, rather than SERVICE_REQUEST, is expected from the Floor object. Hence, the LIFT_REQUEST event is never used and as a consequence the lift never arrives at the floor which issued the request. The ability to detect and report unused events is not a direct language capability of SMILE-X, but that of the SMILE-X simulation tool. Nevertheless, it is the SMILE-X language metamodel which indirectly enables the tool to expose unused events. Figure 9. Controller state machine with substates 4.9 Unreachable states Figure 10. Event mapping in SMILE-X The tool can also detect unreachable states which may sometimes prove to be simply superfluous or otherwise signify an 105 We have identified deficiencies and fragmentation in the current approaches and have provided an initial framework that checks for incompatibilities both at the structural as well as the behavioural level. Here, we have focused on the latter and described how we have identified some of the more important incompatibilities in behavioural descriptions of models. We directed our attention to a specific system type - the reactive system - which represents a large group of complex, large-scale software systems today. Behaviour can be described in various ways. UML defines several different kinds of behaviour diagrams, two of which are most commonly used: the sequence diagrams and the state machine diagrams. In this paper, we have focused on exactly these types of behavioural descriptions. 4.11 Out of sequence messages Sequence diagrams show how objects (or, processes) operate with one another and in what order. 
4.11 Out of sequence messages
Sequence diagrams show how objects (or processes) operate with one another and in what order. In SMILE-X, sequence diagrams are primarily used to extract the dependency information between the various objects in the system, in order to create a 'source object -> event -> target object' map that identifies which elements generate which events as well as which elements are the recipients of those events. Once extracted from the input models, the sequence (order) of the messages (events) exchanged between the elements is recorded internally in the SMILE-X notation. Manual adjustment of the timing parameters (service times) of the elements' actions, as well as of the priority of events, is part of the normal analysis and refinement process that occurs between simulation runs. In particular scenarios (using a particular set of simulation parameters), the sequence of messages exchanged may become different from the intended one. SMILE-X can detect situations like this. Moreover, the SMILE-X notation allows the event sequences to be specified manually by the tool user. It is not necessary for all events to be described, just the key ones. This enables designers to ensure that particular events appear in order. For example, we may want to make certain that the DOOR_OPEN event always occurs after the LIFT_ARRIVED event.

4.12 Deadlocks
Deadlock is a condition where two or more software objects (processes) are waiting for each other to release a resource, or are waiting for resources in a circular fashion. Typically, deadlocks are a widespread problem in multiprocessing, where multiple processes share a specific type of mutually exclusive resource known as a lock or soft lock. Deadlocks can be identified in state machines by an object which cannot leave a particular state even though it is receiving events that should cause a transition. This typically happens if a guard condition does not hold true, either every time that the event arrives or for a long succession of event arrivals. SMILE-X can detect and report on these situations.
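Both of these checks can likewise be phrased over the recorded event sequence. The sketch below (Python, our encoding, not SMILE-X syntax) shows an ordering rule for the DOOR_OPEN/LIFT_ARRIVED example of section 4.11 and a simple stuck-object heuristic of the kind just described for identifying deadlock candidates.

def occurs_in_order(event_log, earlier, later):
    # Every occurrence of `later` must be preceded by some `earlier`.
    seen = False
    for name in event_log:
        if name == earlier:
            seen = True
        elif name == later and not seen:
            return False          # out-of-sequence message detected
    return True

def stuck_objects(trace, threshold=3):
    # An object that keeps receiving events without ever changing state is a
    # deadlock candidate (e.g. a guard that never holds). Each trace entry is
    # ((event name, target object), states, changed).
    counts = {}
    for (name, target), states, changed in trace:
        counts[target] = 0 if changed else counts.get(target, 0) + 1
    return [obj for obj, n in counts.items() if n >= threshold]

assert occurs_in_order(['LIFT_ARRIVED', 'DOOR_OPEN'],
                       'LIFT_ARRIVED', 'DOOR_OPEN')
assert not occurs_in_order(['DOOR_OPEN', 'LIFT_ARRIVED'],
                           'LIFT_ARRIVED', 'DOOR_OPEN')
print(stuck_objects([(('MOVE', 'lift0'), {}, False)] * 3))  # ['lift0']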
4.12 Deadlocks

Deadlock is a condition in which two or more software objects (processes) are waiting for each other to release a resource, or are waiting for resources in a circular chain. Deadlocks are a widespread problem in multiprocessing, where multiple processes share a specific type of mutually exclusive resource known as a lock or soft lock. In state machines, a deadlock can be identified by an object that cannot leave a particular state even though it is receiving events that should cause a transition. This typically happens when a guard condition fails to hold, either on every arrival of that event or over a long run of successive event arrivals. SMILE-X can detect and report these situations.

4.13 Properties that do not hold

Often, we would like to reason about invariants in the context of state transitions, in the form of a guard condition that holds true throughout the entire transition. Such guards are evaluated at the following points in time: (i) at the arrival of the event; (ii) after each transition action is executed; and (iii) just before the change of state. SMILE-X can monitor these situations, and detect and report when guards of this kind fail to hold.
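The following sketch (hypothetical Python once more, offered only to make the three evaluation points concrete) monitors a transition invariant in the way just described: the guard is checked on event arrival, after each transition action, and just before the state change, and every point at which it fails is reported.

```python
# Hypothetical sketch: evaluate a transition invariant at the three points
# described in the text -- (i) event arrival, (ii) after each action,
# (iii) just before the state change -- and report any failures.

from typing import Callable

def fire_transition(
    guard: Callable[[], bool],          # the invariant to monitor
    actions: list[Callable[[], None]],  # transition actions, run in order
    enter_state: Callable[[], None],    # commits the change of state
) -> list[str]:
    failures = []
    if not guard():                     # (i) at the arrival of the event
        failures.append("on event arrival")
    for i, action in enumerate(actions):
        action()
        if not guard():                 # (ii) after each transition action
            failures.append(f"after action {i}")
    if not guard():                     # (iii) just before the change of state
        failures.append("before state change")
    enter_state()
    return failures

# Illustrative invariant: an internal request queue never becomes full.
queue: list[str] = []
failures = fire_transition(
    guard=lambda: len(queue) < 2,
    actions=[lambda: queue.append("LIFT_REQUEST"),
             lambda: queue.append("QUEUE")],
    enter_state=lambda: None,
)
print(failures)  # ['after action 1', 'before state change']
```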
5. CONCLUSIONS AND FUTURE WORK

We believe that model integration at the architectural level is of great importance. It provides a way of detecting and resolving issues that would not otherwise become apparent until very late in a project, during the system integration phase. This approach may not only reduce the typical risks associated with the integration of systems whose components are developed in a distributed fashion, but can also substantially reduce development costs.

We have identified deficiencies and fragmentation in the current approaches, and have provided an initial framework that checks for incompatibilities at both the structural and the behavioural level. Here, we have focused on the latter and described how we identified some of the more important incompatibilities in behavioural descriptions of models. We directed our attention to a specific system type, the reactive system, which represents a large group of today's complex, large-scale software systems. Behaviour can be described in various ways. UML defines several kinds of behaviour diagram, two of which are the most commonly used: sequence diagrams and state machine diagrams. In this paper, we have focused on exactly these types of behavioural description.

We have extended our existing SMILE platform to include SMILE-X, the behavioural component. SMILE-X builds on the earlier structural component (SMILE-S) by providing a mechanism for adding semantics to the existing structural elements. SMILE-X comprises a language and a tool. The language champions interchange between differing model descriptions of system behaviour using the standard XML format. SMILE-X models are not compiled but simulated. Our tool provides a way of loading the input models, refining and extending behavioural descriptions in the SMILE-X format, and specifying various simulation parameters; it also includes an execution engine that enables the user to run simulations.

We have identified seven generic behavioural incompatibilities related to state machine descriptions and to the sequential communication between the components of a system. We used a case study that models system components in different, vendor-specific versions of UML. In this proof-of-concept study, we transformed the input models into the SMILE-X interchange format, and manipulated, refined and glued these models together in order to perform meaningful simulations.

There are a number of interesting directions in which to go next. One of them would be to identify, through the observation of particular behavioural attributes such as events or states, general-purpose patterns for automatically detecting some of the behavioural incompatibilities. Another opportunity is to investigate whether causal paths can be uncovered. These would ideally (in the state machine scenario) include paths that lead to the same states and events, as well as the ability to show alternative paths between states, if such paths exist.

6. ACKNOWLEDGMENTS

This work was undertaken at the SSEI (Software Systems Engineering Initiative), an MOD (Ministry of Defence) funded strategic initiative intended to enhance through-life capability management for software-intensive defence systems.