Internet-Draft | Middle Ware Facilities | July 2023 |
Yuan & Zhou | Expires 8 January 2024 | [Page] |
This draft proposes a method to perceive and process the running status of computing resources by introducing a logical Middle Ware facility. The goal is to avoid directly exposing continuous, dynamic computing resource status to the network domain, to match service requirements with instance conditions, and ultimately to achieve computing-aware traffic engineering applicable to various possible scheduling strategies.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
With computing resources continuously migrating to the edge, distributed services are increasingly delivered in a dynamic way. More fine-grained scheduling strategies that are aware of service SLA requirements and current computing status are urgently required.¶
A framework to fulfill computing-status-aware traffic steering and service provisioning is illustrated in related works, [I-D.ldbc-cats-framework] for instance. Since collecting information on network conditions and computing status is a prerequisite for properly steering traffic, a concise and effective learning and processing scheme is required.¶
Unlike the collection of network attributes, the learning of computing status has unique characteristics, features, and objectives, which impose incremental requirements:¶
Currently, the perception and detection of computing resources are commonly achieved by several schemes, listed in part as follows:¶
Thus, this draft proposes a computing resource perception and processing method based on a logical Middle Ware facility to solve the problems mentioned above and satisfy the corresponding requirements.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
According to the requirements of computing status perception analyzed in the previous sections, a framework of metadata collection and processing based on Middle Ware Facilities is proposed.¶
                     +-------------+
           +---------| Middle Ware |--------+
           |         +-------------+        |
           |                |               |
  Network Attributes    Network Attributes + Computing Status
           |                |               |
      +----------+      +-------+       +-------+
      | Network  |      |Service|       |Service|
      |Controller|      | Agent |       | Agent |
      +----------+      +-------+       +-------+
           |                |               |
           |           -----------     -----------
       (-------)      ( Instances )   ( Instances )
      ( Network )-----(           )   (           )
       (-------)       -----------     -----------
           |           Cloud Site      Cloud Site
           |                                |
           +--------------------------------+
A Middle Ware as proposed here is a logical facility that has knowledge of the computing status and network conditions, and thus the ability to process them. In a specific physical implementation, Middle Wares can be mapped to multiple physical entities or combinations of them. The entities involved may include a network controller, a superior orchestrator, a distributed database, distributed devices, an introduced application monitoring system, constructed service agents, etc. The logical modules of a Middle Ware are organized and defined as follows:¶
                              | NorthBound
                              | Interface
+-------------------------------------------------------------------------+
|                              Middle Ware                                |
|   +--------------+                             +--------------+         |
|   |   Service    |                             |  Scheduling  |         |
|   | Registration |-----------------------------|   Strategy   |         |
|   | & Management |                             | Configuration|         |
|   +--------------+                             +--------------+         |
|          |              +---------------+             |                 |
|          +--------------| ORChestration |-------------+                 |
|                         +---------------+                               |
|                                |               Other Modules:           |
|                          +-----------+         OAM, AI, ...             |
|                          | Network & |                                  |
|          +---------------| Computing |---------------+                  |
|          |           +---|  Status   |---+           |                  |
|          |           |   | DataBase  |   |           |                  |
|          |           |   +-----------+   |           |                  |
|  +---------------+ +-----------+ +-----------+ +---------------+        |
|  |    Network    | |  Network  | | Computing | |   Computing   |        |
|  | Configuration | |  Status   | |  Status   | | Configuration |        |
|  |   & Control   | | Collector | | Collector | |   & Control   |        |
|  |               | |           | |           | |               |        |
|  |  +---------+  | |+---------+| |+---------+| |  +---------+  |        |
|  |  | Protocol|  | || Protocol|| || Protocol|| |  | Protocol|  |        |
|  |  | Service |  | || Service || || Service || |  | Service |  |        |
|  |  +---------+  | |+---------+| |+---------+| |  +---------+  |        |
|  +---------------+ +-----------+ +-----------+ +---------------+        |
|          |                |             |                |              |
+-------------------------------------------------------------------------+
                              | SouthBound
                              | Interface
   (--------------------)                   (--------------------)
  (                      )                 (                      )
  (       Network        )-----------------(       Service        )
  (       Domain         )                 (       Domain         )
  (                      )                 (                      )
   (--------------------)                   (--------------------)
The logical modules and components are designed with the following respective functions and abilities:¶
With the functions defined, the workflow in the control plane to fulfill computing aware traffic engineering and service routing is described as follows:¶
Referring to [I-D.ldbc-cats-framework] and [I-D.yao-cats-ps-usecases], this draft proposes incremental requirements for the CATS framework:¶
The NSC and NCC mentioned above are relatively similar or identical to existing subfunctions of a network controller and thus will not be discussed further in this draft, while the detailed design of the SRM, SSC, NCSDB, and ORC functions is illustrated as Parts 1 to 3 in the following sections.¶
Service clients propose service requests and receive responses, including corresponding service identifications issued by the administration plane. For instance, a Service ID representing a globally unique service semantic identification is defined in [I-D.ma-intarea-identification-header-of-san]. With the issued Service IDs, the constraints and sensitive attributes should be considered to generate a corresponding modelling and evaluation method for each service represented by a Service ID. The generation patterns of the modelling methods include but are not limited to:¶
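As a non-normative illustration, a per-service registration record kept by the SRM module could be sketched as follows; all field names, attribute keys, and values here are assumptions chosen for the example, not part of this specification.¶

```python
# A minimal sketch of a service registration record kept by the SRM
# module. Field names and values are illustrative, not normative.

def register_service(registry, service_id, constraints, evaluation):
    """Store per-service SLA constraints and the chosen evaluation
    method under a globally unique Service ID."""
    registry[service_id] = {
        "constraints": constraints,   # sensitive attributes and bounds
        "evaluation": evaluation,     # name of the evaluation function
    }
    return registry[service_id]

registry = {}
register_service(
    registry,
    "ServiceID1",
    constraints={"delay_ms": ("<", 50), "loss": ("<", 0.001),
                 "load": ("<", 0.8)},
    evaluation="Function1",
)
register_service(
    registry,
    "ServiceID2",
    constraints={"delay_ms": ("<", 100), "jitter_ms": ("<", 15),
                 "cpu_cores": (">", 6)},
    evaluation="Function2",
)
```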
The metadata of network and computing status can be summarized as the following typical scheduling attributes:¶
According to the scheduling attributes above, the typical scheduling strategies performed can be summarized as:¶
Based on the specified scheduling strategies, corresponding evaluation methods are determined. With the metadata calculated through specific functions, the most appropriate instance or all satisfying instances can be identified. Then, a preferred or balanced strategy can be performed, which selects a single entry or a set of entries to distribute.¶
+----------------+------------------+------------------+-----+
|                |   Service ID1    |   Service ID2    | ... |
+----------------+------------------+------------------+-----+
|End-to-end Delay|      <50ms       |      <100ms      |     |
+----------------+------------------+------------------+-----+
|     Jitter     |                  |      <15ms       |     |
+----------------+------------------+------------------+-----+
|      Loss      |      <0.1%       |                  |     |
+----------------+------------------+------------------+-----+
|     ......     |                  |                  |     |
+----------------+------------------+------------------+-----+
|    CPU Cores   |                  |       >6C        |     |
+----------------+------------------+------------------+-----+
|      Load      |       <80%       |                  |     |
+----------------+------------------+------------------+-----+
|     ......     |                  |                  |     |
+----------------+------------------+------------------+-----+
|                |  Resource first  | Experience first |     |
|     Metric=    |                  |                  |     |
|   Function()   | Function1(Delay, | Function2(Delay, |     |
|                |    Loss,Load)    |   Jitter,CPU)    |     |
+----------------+------------------+------------------+-----+
As shown above, a typical evaluation and modelling method is displayed, and the functions to calculate a metric value can be defined as follows. A to F are preliminary functions that process metadata, while Function1() and Function2() are evaluation functions.¶
      A(Delay)            B(Loss)             C(Load)
      ^                   ^                   ^
      |                   |                   |
   MAX|     +----      MAX|     +----      MAX+        +----
      |     |             |     |             |       /
      |     |             |     |             |      /
   MIN+-----+          MIN+-----+          MIN|----+
      |                   |                   |
      +------------->     +------------->     +------------->
         50  Delay          0.1%  Loss         40%  80%  Load

                           MAX, if max{A(Delay),B(Loss)}=MAX,
Function1(Delay,Loss,Load)={
                           C(Load), others.

      D(Delay)            E(Jitter)           F(Cores)
      ^                   ^                   ^
      |                   |                   |
   MAX|        +----   MAX|        +----   MAX+----+
      |       /           |       /           |     \
      |      /            |      /            |      \
   MIN+----+           MIN+----+           MIN|       +----
      |                   |                   |
      +------------->     +------------->     +------------->
        20   100 Delay      5    15 Jitter      6    12 Cores

                             MAX, if max{D(Delay),E(Jitter),F(Cores)}=MAX,
Function2(Delay,Jitter,CPU)={
                             Average[D(Delay),E(Jitter),F(Cores)], others.
The design of the functions also correlates with the semantics of the calculated metric value. As indicated above, if any requirement registered with the service is not satisfied, for instance if the end-to-end delay reaches 100ms in Function2(), the overall function value reaches MAX, which indicates that the corresponding entry fails to satisfy the service SLA represented by Service ID2. Also, a smaller metric value represents better performance. Therefore, through a single metric, the performance of instances can be easily displayed.¶
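The evaluation functions above can be sketched in code. This is a non-normative illustration: the concrete values MIN=0 and MAX=100, and the exact linear interpolation between the breakpoints, are assumptions made for the example, while the breakpoints themselves (50ms, 0.1%, 40%/80%, 20/100ms, 5/15ms, 6/12 cores) follow the figure.¶

```python
# A sketch of the preliminary functions A-F and the evaluation
# functions from the figure above. MIN=0 and MAX=100 are arbitrary.

MIN, MAX = 0.0, 100.0

def A(delay):    # step: MIN below 50 ms, MAX at or above
    return MIN if delay < 50 else MAX

def B(loss):     # step: MIN below 0.1%, MAX at or above
    return MIN if loss < 0.001 else MAX

def C(load):     # MIN up to 40% load, rising linearly to MAX at 80%
    if load <= 0.40:
        return MIN
    if load >= 0.80:
        return MAX
    return MIN + (MAX - MIN) * (load - 0.40) / 0.40

def D(delay):    # MIN up to 20 ms, rising linearly to MAX at 100 ms
    if delay <= 20:
        return MIN
    if delay >= 100:
        return MAX
    return MIN + (MAX - MIN) * (delay - 20) / 80

def E(jitter):   # MIN up to 5 ms, rising linearly to MAX at 15 ms
    if jitter <= 5:
        return MIN
    if jitter >= 15:
        return MAX
    return MIN + (MAX - MIN) * (jitter - 5) / 10

def F(cores):    # MAX up to 6 cores, falling linearly to MIN at 12
    if cores <= 6:
        return MAX
    if cores >= 12:
        return MIN
    return MAX - (MAX - MIN) * (cores - 6) / 6

def function1(delay, loss, load):
    if max(A(delay), B(loss)) == MAX:
        return MAX                     # a hard constraint is violated
    return C(load)

def function2(delay, jitter, cores):
    parts = [D(delay), E(jitter), F(cores)]
    if max(parts) == MAX:
        return MAX                     # a hard constraint is violated
    return sum(parts) / len(parts)
```

Any violated constraint drives the metric to MAX, so a failing entry can never outrank a satisfying one, and among satisfying entries the smaller metric wins.¶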
Based on the set of all subscribed services and the configured sensitive attributes of each service in the set, a set of attributes that require status update collection is summarized. The CSC then queries or subscribes to the service agents responsible for meta-information collection at each cloud site.¶
Due to the varying sensitivity and tolerance of different services to changes in computing status, as well as the differentiated priorities among services, their requirements for metadata collection and update frequency differ from one another. The frequency of collecting a type of meta information should be no less than the maximum frequency required by any of the subscribed services.¶
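The rule above reduces to taking a maximum over per-service requirements; a minimal sketch, assuming hypothetical per-service update frequencies expressed in Hz:¶

```python
# A minimal sketch: the collection frequency for one attribute must
# satisfy the most demanding requirement among all services that are
# sensitive to it. Service names and frequencies are illustrative.

def collection_frequency_hz(per_service_requirements):
    """Return the collection frequency for one attribute, given each
    sensitive service's required update frequency in Hz."""
    return max(per_service_requirements.values())

reqs = {"ServiceID1": 0.2, "ServiceID2": 1.0, "ServiceID3": 0.5}
```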
With the metadata collected by the CSC, the information is further organized and stored in the NCSDB. A distributed database is introduced here as a sample physical entity that fulfills the functions of the corresponding logical module. A distributed database has the advantages of advanced performance, high availability, and simple extensibility. It is highly partitionable and allows horizontal scaling, which suits practical scenarios with large numbers of service instances. Also, both keys and values can be anything from simple objects to complex compound objects, so heterogeneous computing resources can be described and stored.¶
As shown below, the status of computing resources is modeled as a collection of key-value pairs.¶
                       (----------------)
   +---------+        (  +------------+  )
   |   PE1   |--------(  | Instance 1 |  )
   +---------+        (  +------------+  )
                      (  +------------+  )
                      (  | Instance 2 |  )
                      (  +------------+  )
                       (----------------)
                          Cloud Site 1

                       (----------------)
                      (  +------------+  )
                      (  | Instance 3 |  )
   +---------+        (  +------------+  )
   |   PE2   |--------(  +------------+  )
   +---------+        (  | Instance 4 |  )
                      (  +------------+  )
                      (  +------------+  )
                      (  | Instance 5 |  )
                      (  +------------+  )
                       (----------------)
                          Cloud Site 2

 +----+------------+---------+-----------------------------------+
 | ID | Instance   | Gateway |    Computing Status Index(1-n)    |
 +----+------------+---------+-----------+-----------+-----------+
 | 01 | Instance 1 |   PE1   |   CPU 1   | Memory 1  |   O/I 1   |
 +----+------------+---------+-----------+-----------+-----------+
 | 01 | Instance 4 |   PE2   |   CPU 4   | Memory 4  |   O/I 4   |
 +----+------------+---------+-----------+-----------+-----------+
 | 01 | Instance 5 |   PE2   |   CPU 5   | Memory 5  |   O/I 5   |
 +----+------------+---------+-----------+-----------+-----------+
 | 02 | Instance 2 |   PE1   |   CPU 2   | Memory 2  |   O/I 2   |
 +----+------------+---------+-----------+-----------+-----------+
 | 02 | Instance 3 |   PE2   |   CPU 3   | Memory 3  |   O/I 3   |
 +----+------------+---------+-----------+-----------+-----------+
With the introduction of a distributed database, the data of the computing resources can be stored in hierarchically organized directories. A typical form for obtaining the information of interest is described below:¶
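Such hierarchical storage can be modeled with directory-like key prefixes, as in common distributed key-value stores. The sketch below is illustrative only; the key layout and values are assumptions for the example, not a normative schema.¶

```python
# A sketch of hierarchically keyed storage in the NCSDB. Keys follow
# a directory-like prefix scheme; all names and values are made up.

db = {
    "/ServiceID1/Instance1/CPU": "8C",
    "/ServiceID1/Instance1/Memory": "32G",
    "/ServiceID1/Instance4/CPU": "16C",
    "/ServiceID2/Instance2/CPU": "4C",
}

def get_prefix(db, prefix):
    """Return all key-value pairs stored under a directory prefix,
    i.e. one 'directory' of the hierarchy and everything below it."""
    return {k: v for k, v in db.items() if k.startswith(prefix)}
```

A prefix query such as `get_prefix(db, "/ServiceID1/Instance1")` then retrieves the full status of one instance in a single range read.¶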
NCSDB can also enable incremental functions. For instance, a pub-sub scheme and a 'Watch' mechanism can be introduced to fulfill service OAM and service protection.¶
+-------------------------+
|     Involved Modules    |
+-------------------------+

+-------------------------+                       +-----------------------+
|+-------------+          |                       |          +-----------+|
||Network      |          |                       |          | Computing ||
||Configuration|          |                       |          |  Status   ||
||& Control    |+--------+|     +-----------+     |+--------+| Collector ||
|| +---------+ ||DB-Agent||     | Network & |     ||DB-Agent||+---------+||
|| | Protocol| |+--------+|     | Computing |     |+--------+|| Protocol|||
|| | Service | |          |     |  Status   |     |          || Service |||
|| +---------+ |          |     | Database  |     |          |+---------+||
|+-------------+          |     +-----------+     +-----------------------+
+-------------------------+           |                       |
            |                         |                       |
            |  Watch                  |                       |
            |  prefix                 |                       |
            |------------------------>|                       |
            |                         |        Write          |
            |                         |     (/Service         |
            |                         |     Instance 1/       |
            |                         |     CPU Load 70)      |
            |                         |<----------------------|
            |  Notify                 |                       |
            |  updates                |       Notify          |
            |<------------------------|       updates         |
            |                         |---------------------->|
            |                         |                       |
The procedure of learning and processing updated computing resource status is described as follows:¶
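The watch-and-notify procedure can be sketched as a tiny in-memory pub-sub database. This is a non-normative illustration of the mechanism, not an API of any particular database; the key names mirror the example write (/Service Instance 1/ CPU Load 70) in the figure.¶

```python
# A minimal sketch of the 'Watch' mechanism: modules register a
# watch on a key prefix, and a write into the database notifies
# every watcher whose prefix matches the written key.

class StatusDB:
    def __init__(self):
        self.data = {}
        self.watchers = []            # list of (prefix, callback)

    def watch(self, prefix, callback):
        """Register interest in all keys under a prefix."""
        self.watchers.append((prefix, callback))

    def write(self, key, value):
        """Store an update and notify matching watchers."""
        self.data[key] = value
        for prefix, callback in self.watchers:
            if key.startswith(prefix):
                callback(key, value)  # 'Notify updates' in the figure

updates = []
db = StatusDB()
# e.g. the Network Configuration & Control module watches a prefix:
db.watch("/Service/Instance1/", lambda k, v: updates.append((k, v)))
# the Computing Status Collector writes a fresh status value:
db.write("/Service/Instance1/CPULoad", 70)
```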
The Middle Ware processes the metadata collected from the network domain and multiple cloud sites at the ORC, following the procedures below:¶
End-to-End Delay = Delay1 + Delay2 + Delay3 + Delay4

                  Delay1
+-----------+               +---------+
+Ingress PE +---------------+Egress PE|
+-----------+               +----+----+
                                 |
                                 | Delay2
                                 |
                             ----+-----
                           (   +-+--+   )
                           (   | LB |   )
                           (   +-+--+   )
                           (     |Delay3)
                           (  +--+-----+)
                           (  |Instance|)
                           (  +--------+)
                           (   Delay4   )
                            ------------
                            Cloud Site
Service ID1  Instance1  SRv6 Policy1  Metric=15
Service ID1  Instance3  BE Path       Metric=30
Service ID1  Instance2  SRv6 Policy2  Metric=10
Service ID2  Instance4  SRv6 Policy3  Metric=25
Service ID2  Instance5  SRv6 Policy4  Metric=20
Service ID2  Instance6  BE Path       Metric=30

                  Control Plane
-------------------------------------------------
                 Forwarding Plane

+-------------+-----------+--------------+
|    Index    | Next Hop  |  Interface   |
+-------------+-----------+--------------+
| Service ID1 | Instance2 | SRv6 Policy2 |
+-------------+-----------+--------------+
| Service ID2 | Instance5 | SRv6 Policy4 |
+-------------+-----------+--------------+
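The selection step behind the figure above, choosing one forwarding entry per Service ID from the control-plane candidates, can be sketched as follows; the candidate values are taken from the figure, and the tie-breaking behavior (first candidate wins on equal metrics) is an assumption of this sketch.¶

```python
# A sketch of per-Service-ID entry selection: among the candidate
# (instance, path, metric) tuples known in the control plane, the
# entry with the smallest metric is installed in the FIB.

candidates = [
    ("Service ID1", "Instance1", "SRv6 Policy1", 15),
    ("Service ID1", "Instance3", "BE Path",      30),
    ("Service ID1", "Instance2", "SRv6 Policy2", 10),
    ("Service ID2", "Instance4", "SRv6 Policy3", 25),
    ("Service ID2", "Instance5", "SRv6 Policy4", 20),
    ("Service ID2", "Instance6", "BE Path",      30),
]

def build_fib(candidates):
    """Select, per Service ID, the candidate with the lowest metric."""
    fib = {}
    for sid, instance, interface, metric in candidates:
        if sid not in fib or metric < fib[sid][2]:
            fib[sid] = (instance, interface, metric)
    return fib
```

Running `build_fib(candidates)` reproduces the forwarding-plane table shown in the figure.¶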
With the aforementioned logical functions and modules designed into a Middle Ware, the incremental requirements raised by a learning process of computing status can be satisfied:¶
TBA.¶
TBA.¶
TBA.¶