In recent years, there have been a number of discussions around the subject of orchestration as a key enabler for different Cloud technologies.
The ETSI NFV Management and Orchestration (MANO) working group is defining the main interfaces for resource orchestration, a fundamental layer in management.
It is important to define standard interfaces, but it is equally important to understand the main capabilities of an orchestration (or choreography) solution. We can gain more insight by revisiting previous work, particularly in the domain of Grid computing.
Personally, I found the work done by Ian Foster and Steven Tuecke around IT as a Service (back in 2005, 9 years ago!) still extremely relevant. It is fascinating to see how applicable this work continues to be, apart perhaps from the replacement of general SOA services by REST services in particular. We should pay special attention to their definition of Grid Infrastructure: "enable the horizontal integration across diverse physical resources". I see their work as applicable beyond the physical layer, to logical resources and their composition into services. Quoting the paper, the Grid Infrastructure's capabilities should be:
Two questions come to mind: (1) how have requirements changed in these 9 years? (2) how (if at all) should we update these definitions to reflect the advances in infrastructure? It is clear that the new Cloud/NFV scenarios require increased scalability, even though the targets are somewhat obscure, along with support for an increasing diversity of resources and services that have more complex relationships (virtualization, composition and interactions with legacy infrastructure). The new infrastructure needs to respond to the state changes of resources much faster, in order to fulfill more stringent SLAs in a more scalable and diverse environment. This creates new challenges for the assurance applications (network, application and service assurance).
In recent years the industry has focused extensively on the provisioning capability, pushed by the need for automation and helped by technology advancements in OpenStack, network controllers, and "DevOps" tools such as Chef, Puppet, Cloudify, etc. However, to address the challenges coming from the new use cases, a more balanced focus on all the capabilities mentioned by Foster and Tuecke will be required. In most cases, the key enabler to deliver on all these areas is the use of advanced analytics to match supply and demand for resources, similar to a "just in time" production model that goes beyond resources to services and business processes.
What do you think? What has changed since Foster and Tuecke's publication?
Special thanks to Gary Berger, Frank Van Lingen and Marco Valente for their reviews of this text.