The Next-Generation Logic Orchestration Engine NopTaskFlow Built from Scratch

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168

    #1

    With the popularity of low-code concepts and products, many are considering introducing the idea of logic orchestration into their projects—offloading logic traditionally produced via hand-crafted hard coding to a logic orchestration engine that can be flexibly configured. In this article, I will introduce the design philosophy of the logic orchestration engine NopTaskFlow in the Nop platform and analyze the mathematical inevitability of its design. At the end, I will explain why NopTaskFlow is a next-generation logic orchestration engine and what typical characteristics this so-called next generation possesses.


    I. What Exactly Does Logic Orchestration Orchestrate?

    When we program using traditional programming languages and frameworks, we are essentially following certain constraint specifications defined by the language, which can be seen as a kind of best practice. However, when we start from scratch to write a very flexible, very low-level logic organization framework, it’s easy to break the previously built-in formal specifications of the language, thereby deviating from the implicit best practice pattern.


    What is the minimal logical unit that can be flexibly organized? The answer in traditional programming languages is now standard: functions. So what essential characteristics do functions have?

    1. Functions have clearly defined inputs and outputs
    2. Function calls can be nested
    3. Variables used within functions follow lexical scoping rules, which can be complex


    If we further study the structure of functions, we will find more complex features, such as:

    1. Are function parameters passed by value, by reference, or by name (CallByValue, CallByRef, CallByName)?
    2. Are functions supported as parameters, i.e., so-called higher-order functions?
    3. Is there an exception handling mechanism independent of the return value?
    4. Is asynchronous return supported?


    Of course, there is the most important aspect: functions are not only the minimal unit for recognizing and organizing logic, but also the minimal unit for abstraction. We can reuse existing functions to define new functions.


    So why did functions become the most fundamental unit of logical organization in programming languages? And when we write a logic orchestration engine today, do we still need to base it on the function abstraction, or is there a better one? To answer this question, we need to understand a bit of history.


    First, we need to be clear that the concept of functions did not originally exist in computer programming languages; the establishment of the function concept was no trivial matter.


    ==== The following is created by Zhipu Qingyan AI=====


    Early Programming Languages (1950s–1960s):
    • Assembly Language: In assembly languages, the concept of functions is not apparent; programmers typically use jump instructions to execute code blocks.
    • Fortran: First released in 1957. FORTRAN II (1958) added subroutines and functions, which can be regarded as an early form of today's functions: a SUBROUTINE hands results back through its arguments, while a FUNCTION returns a value.


    Rise of High-Level Programming Languages (1960s–1970s):
    • ALGOL 60: Released in 1960, ALGOL 60 introduced modern function concepts with support for return values and proposed block structure (local variable scope), which is an important milestone in programming language development.
    • Lisp: Developed in 1958, Lisp treats functions as first-class citizens, meaning functions can be passed, stored, and returned as data—core features of functional programming languages.


    Structured Programming (1970s):
    • The concept of structured programming was first proposed by Edsger W. Dijkstra in his 1968 paper “Go To Statement Considered Harmful,” in which he advocated restricting or eliminating goto statements to improve program structure. The core idea of structured programming is to decompose programs into modular parts and use structures such as sequence, selection (if-then-else), and loops (while, for) to control the flow of programs.
    • C Language: Released in 1972, C descended from B and BCPL and also drew on ideas from ALGOL 68; its function definitions are concise and support recursive calls. It natively supports structured programming constructs, and its popularity greatly promoted the structured programming paradigm.


    ==== End of Zhipu Qingyan's creation=====


    Object-oriented programming rose during the 1980s and came to dominate; functions declined in status and became subordinate to objects. In Java, we cannot even define a function outside of a class. After 2000, functional programming gradually revived, popularizing immutability and so-called pure functions with no side effects. With the rise of multi-core parallel programming, distributed messaging systems, and big data processing systems, the concept of functions has continued to expand and deepen; many modern programming languages now include async/await as a standard mechanism.


    Let’s analyze the implicit assumptions brought by the concept of functions.


    First, functions are an inevitable result of information hiding. Information hiding necessarily divides the world into an inside and an outside. If the small internal environment is to exist independently of the outside (meaning the same function can be called in different external environments without having to perceive the differences), then the association between inside and outside must be restricted to the boundary. Information the inside obtains from the outside is called Input, and information the outside obtains from the inside is called Output. The dimensionality of the boundary is generally much smaller than that of the overall system structure (just as a three-dimensional ball is bounded by a two-dimensional sphere), which is what allows functions to reduce external complexity.

    • If a function always reads and writes global variables internally, then we are actually using the procedure abstraction rather than the function abstraction.
    • Exposing logic as a service (service-ization) is equivalent to agreeing that both Input and Output are serializable value objects.
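
    The two points above can be illustrated with a minimal Python sketch. All names here (counter, increment_procedure, increment_function, increment_service) are hypothetical and chosen only for illustration; JSON stands in for whatever serialization format a real service would use.

```python
import json

# Procedure abstraction: communicates with the outside through shared global state.
counter = 0

def increment_procedure() -> None:
    """Reads and writes a global, so its effect depends on the outer environment."""
    global counter
    counter += 1

# Function abstraction: everything crosses the boundary as Input or Output.
def increment_function(value: int) -> int:
    """Same logic, but the only contact with the outside is the parameter and the return value."""
    return value + 1

# Service-style wrapper: Input and Output are serializable value objects (JSON text here).
def increment_service(request_json: str) -> str:
    request = json.loads(request_json)
    return json.dumps({"value": increment_function(request["value"])})

increment_procedure()            # silently mutates `counter`
result = increment_function(41)  # the caller decides where the Output goes
```

    The function version can be called from any environment without knowing about counter, which is exactly the independence from the outside described above.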


    Second, functions automatically introduce the following causal ordering:

    1. Evaluate expressions to obtain the function’s Input parameters
    2. Execute the function
    3. Receive the function’s Output


    Before calling the function, the values of Input will be determined, and Output will only be produced after the function executes successfully. If a function call fails, we won’t receive output variables at all. In particular, if a function has multiple outputs, we always either get all Outputs or none—there is no situation where only part of the Outputs are observed.
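
    The causal ordering and the all-or-nothing nature of Outputs can be sketched as follows. This is only an illustration in Python under assumed names (call_step, divide, and the dict-based scope are hypothetical), not how any particular engine implements it.

```python
from typing import Callable, Dict

Scope = Dict[str, object]

def call_step(fn: Callable[..., Scope],
              input_exprs: Dict[str, Callable[[Scope], object]],
              scope: Scope) -> Scope:
    # 1. Evaluate every input expression before the function runs.
    inputs = {name: expr(scope) for name, expr in input_exprs.items()}
    # 2. Execute the function. If it raises, nothing below this line runs.
    outputs = fn(**inputs)
    # 3. Only after success do the outputs become visible, all at once.
    scope.update(outputs)
    return scope

def divide(x: int, y: int) -> Scope:
    # Two outputs that are produced together or not at all.
    return {"quotient": x // y, "remainder": x % y}

input_exprs = {"x": lambda s: s["a"], "y": lambda s: s["b"]}

scope: Scope = {"a": 7, "b": 2}
call_step(divide, input_exprs, scope)

failed_scope: Scope = {"a": 7, "b": 0}
try:
    call_step(divide, input_exprs, failed_scope)
except ZeroDivisionError:
    pass  # the failed call left no partial output behind
```

    After the failed call, failed_scope still contains only a and b: the caller never observes a quotient without a remainder.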


    Some logic orchestration frameworks now expose certain intermediate results during step execution as Output Endpoints—for example, exposing the loop index during iteration as an Endpoint and continuously outputting such temporary Output during the loop. This approach deviates from the function abstraction.


    Third, functions have independent variable scopes (namespaces). No matter what a variable is named outside the function, once it is passed as a parameter into the function, we will always use the local input variable name inside the function to refer to it. Meanwhile, temporary variables used inside the function will not be observed externally. Named entities are the mother of all things. Any large-scale, systematic reuse demands avoidance of name conflicts; local names must be used.


    Fourth, when functions are composed, information transfer is achieved indirectly through the current scope. For example:

    output = f(g(input))
    // Actually equivalent to
    output1 = g(input)
    output = f(output1)







    A variable must exist within a scope. When function g returns, its internal scope is conceptually destroyed, and before executing function f, its internal scope does not yet exist. Therefore, g’s return value must first be stored in an outer scope and then forwarded to function f.


    Fifth, functions imply paired, bidirectional information flow. We all know that goto is harmful because a goto is often a one-way trip: nobody knows when control will come back, or where it will come back to. A function call, by contrast, is a highly disciplined and extremely predictable way of organizing information flow. Input passes information into the function, and Output will definitely return, and it will return at the original call site (in a mathematical sense, the "go to" and the "go back" are strictly paired). In traditional programming languages, function calls are synchronous and have blocking semantics: they automatically block the current instruction flow (equivalent to the timeline of the program world). For asynchronous calls, returning from the function does not mean Output is available, so callback functions have to be used to express logical dependencies, resulting in the so-called callback hell of nested callbacks. The async/await syntax in modern programming languages essentially adds blocking semantics to asynchronous functions, so that the order of function calls in the source code can still be read as the order of events on the timeline.
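
    The contrast between callbacks and async/await can be shown with a small Python sketch. The functions below (fetch_user, fetch_orders, and their _cb variants) are hypothetical stand-ins for real asynchronous operations; the callback variants invoke their callback immediately just to keep the example short.

```python
import asyncio

# Callback style: the dependency "orders need the user" is expressed by nesting.
def fetch_user_cb(user_id, on_done):
    on_done({"id": user_id})

def fetch_orders_cb(user, on_done):
    on_done([f"order-{user['id']}-1", f"order-{user['id']}-2"])

def load_orders_cb(user_id, on_done):
    fetch_user_cb(user_id, lambda user: fetch_orders_cb(user, on_done))

collected = []
load_orders_cb(1, collected.append)  # the result arrives through the callback

# async/await style: source order is again the order on the timeline.
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0)  # stands in for real asynchronous work
    return {"id": user_id}

async def fetch_orders(user: dict) -> list:
    await asyncio.sleep(0)
    return [f"order-{user['id']}-1", f"order-{user['id']}-2"]

async def load_orders(user_id: int) -> list:
    user = await fetch_user(user_id)   # suspends here until the Output exists
    return await fetch_orders(user)    # and only then moves to the next line

orders = asyncio.run(load_orders(1))
```

    Both versions compute the same result, but only the second one lets the reader follow the flow of information top to bottom, with each Output returning to its call site.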


    In distributed architectures, the most flexible organization method is undoubtedly event sending and listening. Essentially, it is one-way information transmission, similar in spirit to goto. It is flexible, but what is the cost?


    The Wisdom of Pioneers Lost in History

    As the saying goes, the only lesson humans learn from history is that humans cannot learn any lessons from history. Twenty years is a generation, and the next generation facing new problems will forget the wisdom of their predecessors; everything still starts with gut feeling.


    Goto is bad; structured programming is good. This idea is instilled in us from the moment we start learning to program. But why? If asked in an interview, I believe many programmers could talk at length, even citing Dijkstra's classic paper "Go To Statement Considered Harmful" (also Dijkstra's most famous paper). But how many people have actually read it carefully? Frankly, I knew of it early on but never read it closely. When I finally did, recently, I discovered that Dijkstra's reasons for opposing goto are fundamentally different from the ones we had filled in for ourselves; we had long since completely forgotten Dijkstra's wisdom.


    At the beginning of the paper, Dijkstra states that he had long observed that the goto statement would harm software quality, but only recently did he find a reason that could explain why in scientific terms.


    "The unbridled use of the go to statement has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress."


    "... a programmer independent coordinate system can be maintained to describe the process in a helpful and manageable way."


    The core idea of Dijkstra’s paper is that if we write code according to the ideas of structured programming, our code text will form an objectively existing coordinate system (independent of the Programmer). With this coordinate system, we can intuitively establish a correspondence between static programs (expanded in the text space) and dynamic processes (unfolded in time) in our minds, whereas goto destroys such a naturally existing coordinate system.


    ==== The following is KimiChat AI’s summary===


    The coordinates of this so-called coordinate system refer to a set of values used to uniquely determine the program’s execution state.

    1. Textual Index. When a program consists only of a series of concatenated instructions, a “textual index” can be determined by pointing to a position between two consecutive action descriptions. This index can be regarded as a position in the program text, pointing to a specific statement in the program. Without control structures (such as conditional statements, loops, etc.), the textual index is sufficient to describe the program’s execution progress.
    2. Dynamic Index. When loops (such as while or repeat statements) are introduced into the program, a single textual index is no longer sufficient. Loops may cause the program to repeatedly execute the same code block, so additional information is needed to track the current loop’s iteration count, which is the so-called “dynamic index.” The dynamic index is a counter recording the number of iterations currently in the loop.
    3. Description of Program Execution. The program’s execution state depends not only on the position in the program text (textual index), but also on the program’s dynamic depth, i.e., the current level of nested calls. Therefore, the program’s execution state can be uniquely determined through a combination of a series of textual and dynamic indices.


    == End of KimiChat AI’s creation===
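
    To make the notions of textual index and dynamic index concrete, here is a small Python sketch. It is my own illustration under stated assumptions, not code from Dijkstra's paper: the statement numbers 1 to 4 are labels assigned by hand, and the function name run_with_coordinates is hypothetical.

```python
def run_with_coordinates(n: int):
    """Sum 0..n-1 while recording (textual index, dynamic index) for each executed statement."""
    trace = []
    total, i = 0, 0
    trace.append((1, None))            # statement 1: initialisation, outside any loop
    iteration = 0                      # dynamic index: how many times the loop body was entered
    while i < n:
        iteration += 1
        total += i
        trace.append((2, iteration))   # statement 2: accumulate, in iteration k
        i += 1
        trace.append((3, iteration))   # statement 3: advance, in iteration k
    trace.append((4, None))            # statement 4: return, outside the loop
    return total, trace

total, trace = run_with_coordinates(2)
# trace: [(1, None), (2, 1), (3, 1), (2, 2), (3, 2), (4, None)]
```

    The textual index alone repeats (statement 2 occurs twice), but each pair of textual and dynamic index occurs exactly once, so the pair identifies a point in the process unambiguously.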


    Reversible Computation theory can be regarded as a further deepening of Dijkstra's coordinate system idea. Using Domain Specific Languages (DSLs), we can establish domain-specific coordinate systems (not just a general, objectively existing one). These serve not only understanding: we can go further and define reversible Delta operations on such a coordinate system, so that it is actually put to work in the software construction process.


    The design of NopTaskFlow follows the general design of the Nop platform’s XDSL. Each step/input/output has a name attribute as a unique identifier, forming a fully domain-coordinate-based logical description. We can use the x:extends operator to inherit such a description, and then use the Delta mechanism to customize modifications.
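
    Why unique names matter for Delta customization can be sketched in a few lines of Python. This is not the Nop platform's actual merge algorithm or data model: apply_delta, the list-of-dicts representation, and the "remove" marker are all assumptions made for illustration.

```python
from typing import Dict, List

Step = Dict[str, object]

def apply_delta(base: List[Step], delta: List[Step]) -> List[Step]:
    """Merge `delta` into `base`, matching steps by their unique `name` coordinate."""
    merged = {step["name"]: dict(step) for step in base}  # copies; keeps base order
    for change in delta:
        name = change["name"]
        if change.get("remove"):           # hypothetical removal marker
            merged.pop(name, None)
        elif name in merged:
            merged[name].update(change)    # override only the attributes listed
        else:
            merged[name] = dict(change)    # a new step is appended
    return list(merged.values())

base_flow = [
    {"name": "step1", "source": "a + 1"},
    {"name": "step2", "source": "a * 2"},
]
delta_flow = [
    {"name": "step2", "source": "a * 3"},  # customize an existing step
    {"name": "step3", "source": "a - 1"},  # add a new one
]
customized = apply_delta(base_flow, delta_flow)
```

    Because every step is addressable by name, the delta only has to mention what changes; base_flow itself is left untouched.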


    It should be noted that NopTaskFlow assumes that Input and Output are single deterministic values, rather than a Flow object that can continuously produce items. Flows are more complex and conflict with the analysis of the function concept in the previous section. In the Nop platform’s planning, modeling for flows will be accomplished via the NopStream framework.


    From the perspective of coordinate systems, we can consider that the flow system introduces a special assumption: the spatial coordinates are frozen, while the time coordinates are flowing (spatial coordinates determine the topology of the flow system). This special assumption brings a special simplification; therefore, it also deserves a dedicated framework to fully exploit the value of this assumption.


    II. The Minimal Logical Organizational Unit: TaskStep

    NopTaskFlow’s design goal is to provide a structured logic decomposition scheme that supports Delta operations. Sticking closely to the function concept in programming languages is undoubtedly the most worry-free choice, and if high-performance compiled execution is needed in the future, it’s also easier to translate orchestration logic into ordinary function implementation code.


    The minimal logical organizational unit in NopTaskFlow is the so-called TaskStep, whose execution logic is shown below:

    for each inputModel
        inputs[inputModel.name] = inputModel.source.evaluate(parentScope)

    outputs = await step.execute(inputs)

    for each outputModel
        parentScope[outputModel.exportAs] = outputs[outputModel.name]







    This is conceptually very similar to function calls in general programming languages:

    var { a: aName, b: bName } = await fn({ x: exprInput1, y: exprInput2 })
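
    The TaskStep execution logic described above can be turned into a runnable Python sketch. The classes InputModel, OutputModel, and TaskStep and the function run_step are hypothetical simplifications rather than NopTaskFlow's real API; in particular, falling back to the output's own name when no export name is given is my assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional

Scope = Dict[str, object]

@dataclass
class InputModel:
    name: str
    source: Callable[[Scope], object]   # expression evaluated against the parent scope

@dataclass
class OutputModel:
    name: str
    export_as: Optional[str] = None     # assumed to default to `name` when omitted

@dataclass
class TaskStep:
    inputs: List[InputModel]
    outputs: List[OutputModel]
    execute: Callable[[Scope], Awaitable[Scope]]

async def run_step(step: TaskStep, parent_scope: Scope) -> None:
    # Evaluate all inputs in the parent scope before the step body runs.
    inputs = {m.name: m.source(parent_scope) for m in step.inputs}
    # The step body only sees its own inputs, never the parent scope.
    outputs = await step.execute(inputs)
    # Export the outputs back into the parent scope, optionally renamed.
    for m in step.outputs:
        parent_scope[m.export_as or m.name] = outputs[m.name]

async def add_and_double(inputs: Scope) -> Scope:
    total = inputs["x"] + inputs["y"]
    return {"a": total, "b": total * 2}

step = TaskStep(
    inputs=[InputModel("x", lambda s: s["m"] + 1), InputModel("y", lambda s: s["n"])],
    outputs=[OutputModel("a", "aName"), OutputModel("b", "bName")],
    execute=add_and_double,
)
scope: Scope = {"m": 1, "n": 3}
asyncio.run(run_step(step, scope))
```

    After the call, the parent scope holds aName and bName next to m and n, while the step's local names x, y, a, and b never leak out, mirroring the destructuring assignment above.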







    Let’s look at a concrete example of sequential calls:






    name="parentStep">
    name="a"/>

    name="step1">

    name="a"/>


    return a + 1


    name="a2">
    a*2



    name="step2" libName="test.MyTaskLib" stepName="myStep">

    name="a">
    RESULT + 1


    name="b" exportAs="b2"/>


    name="b2"/>

