The core of the FBP principles is the holy grail of truly componentized software architecture.
Instead of losing yourself in ever more complex syntactic convolutions, as has happened in a lot of functional programming, you make the components (even long-running, stateful ones) self-contained, with ports for data input and output as the sole means of communicating with them, over buffered channels that allow asynchronous computation - and, most importantly, you keep the network definition separate.
This idea in itself is just brilliant. Hats off to Mr Morrison for that!
It lets you decouple complex software into reusable components, without clever FP syntax.
(Though FP is perfect for implementing the components themselves. In my experience it just doesn't scale all that well to whole-program architecture.)
One point to note: the visual component of many FBP systems is completely optional, as is the idea of using novel DSLs. You can just as well define your networks and components in pure code. See GoFlow (https://github.com/trustmaster/goflow) and my own little experiment FlowBase (https://flowbase.org) for examples of that, in Go.
I successfully built a rather complex little app to convert from the Semantic Web RDF format to the (Semantic) MediaWiki XML dump format in two weeks of linear development time: for each component (of ca 7), implement, test, move on to the next (see https://github.com/rdfio/rdf2smw).
The same implementation in procedural PHP took months, and still doesn't have all the bugs and strange behaviours ironed out.
> you make the components self contained with ports for data input and output as the sole means of communicating with them, over buffered channels to allow asynchronous computation
This concept sounds exactly like actor systems like Erlang/OTP and Akka, only with a different set of terminology.
The submitted site and your comment don't mention those anywhere.
Are there appreciable differences between actor systems and FBP?
Actor systems differ in their means of communication. While actor systems use the actors themselves as the main units that you connect, with fire-and-forget messaging to each actor's inbox, FBP proposes communication between named ports over channels with bounded buffers, very much like the CSP implementation in Go (except that you have long-running components, as in actor systems - something that can easily be implemented in Go with structs that have channels in fields for the ports, as shown in the FlowBase library I mentioned in another comment).
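To make that concrete, here's a minimal sketch of such a component in plain Go - just a struct with channel fields for the ports. The names (Doubler, In, Out) are illustrative, not FlowBase's actual API:

    package main

    import "fmt"

    // A long-running FBP-style component: ports are channel fields.
    type Doubler struct {
        In  chan int // input port
        Out chan int // output port
    }

    func NewDoubler() *Doubler {
        // Buffered ports allow asynchronous, backpressured communication.
        return &Doubler{In: make(chan int, 16), Out: make(chan int, 16)}
    }

    func (d *Doubler) Run() {
        defer close(d.Out)
        for v := range d.In {
            d.Out <- v * 2
        }
    }

    func main() {
        d := NewDoubler()
        go d.Run()

        // The "network definition" lives here, separate from the component.
        go func() {
            for i := 1; i <= 3; i++ {
                d.In <- i
            }
            close(d.In)
        }()

        for v := range d.Out {
            fmt.Println(v) // 2, 4, 6
        }
    }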
The FBP model provides implicit backpressure, because of the bounded buffers on the channels. The actor model, on the other hand, is more loosely coupled and flexible.
In practice this means FBP systems are slightly better suited to efficient within-one-node, in-memory parallelisation, whereas actor systems shine more for distributed systems.
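A tiny sketch of that implicit backpressure, assuming nothing beyond standard Go: the producer below is throttled as soon as the bounded buffer fills, because the send blocks until the slow consumer frees a slot:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        ch := make(chan int, 2) // bounded buffer: capacity 2

        go func() {
            for i := 0; i < 5; i++ {
                ch <- i // blocks once the buffer is full: backpressure
                fmt.Println("sent", i)
            }
            close(ch)
        }()

        for v := range ch {
            time.Sleep(100 * time.Millisecond) // slow consumer
            fmt.Println("received", v)
        }
    }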
> The FBP model provides implicit backpressure, because of the bounded buffers on the channels. The actor model on the other hand is more loosely coupled and flexible.
Note that this is why people recommend using GenServer's "call" instead of "cast" in Elixir/Erlang by default - it applies a sort of backpressure, because call waits for the process to return a result rather than being fire-and-forget.
It's not exactly the same thing, or as configurable as Go channels, but it is an option.
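For comparison, here's a rough Go analogue of that call-vs-cast distinction (the names are illustrative, not any library's API): attaching a reply channel to the request and waiting on it makes the send synchronous, like call; dropping the wait would be cast:

    package main

    import "fmt"

    type request struct {
        arg   int
        reply chan int // the caller blocks on this, which applies backpressure
    }

    func server(reqs <-chan request) {
        for r := range reqs {
            r.reply <- r.arg * 2
        }
    }

    // call sends a request and waits for the answer ("call", not "cast").
    func call(reqs chan<- request, arg int) int {
        reply := make(chan int)
        reqs <- request{arg: arg, reply: reply}
        return <-reply
    }

    func main() {
        reqs := make(chan request)
        go server(reqs)
        fmt.Println(call(reqs, 21)) // 42; the sender waited for the result
    }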
What's curious is that CSP started with named ports as well, but evolved to become anonymous.
> Programs in the original CSP were written as a parallel composition of a fixed number of sequential processes communicating with each other strictly through synchronous message-passing. In contrast to later versions of CSP, each process was assigned an explicit name, and the source or destination of a message was defined by specifying the name of the intended sending or receiving process
I wonder what gives? Why this return to an older form?
What you are programming is the graph of how these machines connect and flow data (keypunch cards) through each other, while the machines themselves and their function are abstracted out. Data does not stay "at rest"; it's presumed to reach a terminating point where it exits the graph. A buildup of unprocessed data in a machine's inbox results in an overflow.
It's a very useful model for making a debuggable asynchronous system, since it imposes static constraints everywhere that you can map to your real hardware resources, versus the emphasis on dynamism seen in the actor model (actors hold private data, modify their state, create new actors - all explicitly disallowed in FBP).
Having studied and implemented FBP systems in the past, one major takeaway I've gleaned is that most automation problems start off as a linear sequence of processes, so the branching graph of FBP looks unwieldy and superfluous. But this is deceptive; you probably don't want a huge number of branches in the design, but you will want them as an optimization step or as a way to combine inputs.
So it's useful to design with FBP in mind but with a linear interface as the entry point.
Another aspect of this is that FBP graphs are static, but you may need to reconfigure them frequently; that is, you may want a graph compilation step drawn from a source language, rather than wiring the graph up manually.
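As a minimal sketch of what "network definition as data" could look like - in Go, with entirely hypothetical names, not any particular FBP library's API, and a linear pipeline rather than a full graph for brevity - the network below is just a list of process names that gets compiled into channel wiring:

    package main

    import "fmt"

    // A process maps an input channel to an output channel.
    type proc func(in <-chan int, out chan<- int)

    func main() {
        procs := map[string]proc{
            "double": func(in <-chan int, out chan<- int) {
                defer close(out)
                for v := range in {
                    out <- v * 2
                }
            },
            "addOne": func(in <-chan int, out chan<- int) {
                defer close(out)
                for v := range in {
                    out <- v + 1
                }
            },
        }

        // The network definition, described as data.
        pipeline := []string{"double", "addOne"}

        // "Compile" the definition into channel wiring.
        src := make(chan int, 4)
        cur := src
        for _, name := range pipeline {
            next := make(chan int, 4)
            go procs[name](cur, next)
            cur = next
        }

        go func() {
            for i := 1; i <= 3; i++ {
                src <- i
            }
            close(src)
        }()

        for v := range cur {
            fmt.Println(v) // 3, 5, 7
        }
    }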
A way of making that graph compilation more than just syntax is to include a formal constraint solver: Excel, for example, flows the data after determining a solution for how the cells relate to each other. The power of the spreadsheet paradigm really lies in these combinations of concepts.
Lastly, there isn't really any magic in the algorithmic/implementation aspects of FBP. It grew out of 1960s mainframe types of problems, so it can be implemented in a low-level way with static pools of memory and pieces of assembly code. But it remains conceptually just as relevant to today's massive distributed systems.
The visual aspect of FBP is also brilliant, and helps you navigate and comprehend an inherently parallel execution flow. The same complex network expressed in linear text would require more effort to grasp. It also lets you leverage your brain's visual capacity, which is quite powerful - humans are visual creatures (https://www.seyens.com/humans-are-visual-creatures). In Excel, for example, the update-dependency network is completely hidden, which makes maintaining complex Excel models really hard.
For the same reasons we started looking at FBP and functional reactive programming to simplify the design and maintenance of complex interactive UIs, and ended up implementing Kelp (https://kelp.app), with a visual FBP editor and a reactive framework (https://kefirjs.github.io/kefir/).
Your flowbase.org domain redirects to a different domain. I thought that perhaps you had a typo in the URL or forgot to renew your registration of the domain, but upon a closer look I think it’s probably just a misconfiguration in your webserver config causing it to redirect to another one of your projects.
My guess is that you forgot to put the definition of the domain sans www in your webserver config.
If my guess is correct then this link should work:
FP decouples the AST (symbolic functions with input ports and an output port) from the evaluation context, which might be async or sync, backpressured, stateful, exceptional, incremental/reactive... mix and match whatever behaviors you want, all for the same abstract AST.
I actually didn't want to make this too much of an FP-bashing comment, as I find great use for many FP concepts, daily.
I have yet to see an FP concept for composition, though, that is as simple and generic in its implementation as the FBP principles.
I have sometimes thought FBP networks provide roughly the same function (pun not intended) as a monad, although I never seem to fully grasp what a monad is, so I can't tell for sure :o)
Yes, dataflow with dynamic topologies (self-adjusting computation) forms a monad. If the "flowchart" is merely static, it forms an applicative. https://blog.janestreet.com/breaking-down-frp/ In my opinion, understanding the monad is essential to understanding how to fully separate an AST from its evaluation behavior. That is basically what a monad is: the computational structure that does exactly that.
I think most FP languages don't decouple the AST from the evaluation context. E.g. I cannot perform abstract interpretation on a function's AST, nor rewrite a function's AST for incremental computation, nor perform explicit partial evaluation during function composition by composing the ASTs. I can only access an opaque `Input -> Output`.
Also, FP algorithms developed for one evaluation context cannot easily be ported to another. They are implicitly entangled with assumptions. For example, I cannot evaluate a lazy algorithm in an eager evaluation context without paying a huge price. Adding back-pressure for list processing where a program was not designed for it would easily result in deadlock. Adding exceptions without adding unwind (or bracket, try/finally, etc.) to the existing program expressions will easily result in buggy code.
I think your assertion might be true for some specific integrations of FP nodes into FBP. Is this what you mean?
Today-era functional languages (so basically Haskell and Scala) don't do that at the "host PL" level, but you can model it in-band, and modeling embedded PLs is what most of the cool FP research of the last 5 years is all about. Free monads, tagless-final interpreters, etc. You make a DSL with abstract capabilities and then code your application in that. Any reified behaviors end up encoded into the type. I hear Facebook is using algebraic effects to model and track data access.
FP is relatively convenient for metaprogramming, yes. GADTs and tagless-final encodings (or as I think of them, Church-encoded ASTs) are especially useful for this.
But it wouldn't make a big difference if the host language were heavily imperative, so long as it favors immutable data structures. The FP aspect of this exploration of DSLs is much more social and cultural than technical.
I think that if we really want to decouple the AST from the evaluation context, the main tool we'd need is to deconflate module 'import' into separate steps that 'load' a module's AST and 'integrate' its definitions, allowing for intermediate processing. This requires some host-language changes, elimination of module-global state, etc.
Sorry about this, I was typing in a hurry. The links should redirect to https://github.com/flowbase/flowbase but seem to be misconfigured for https. Looking into it.
I don't mean to derail the conversation, but this really does remind me of the game Factorio, though sort of in reverse.
In Factorio, you build a larger and larger factory out of pre-established functional components (assemblers, labs, chemical plants, etc.) that take in a limited set of inputs and produce (usually) a single output. Your challenge is not to define the core functional processes, but instead to wire those components together by connecting their inputs and outputs in ever-more-automated fashion: starting by hand, then using simple belts (pipes) that eventually allow arbitrary load balancing via "splitters", and eventually through to trains (with forking and load balancing happening via backpressure in the train system) and robots (where everything is managed essentially as a single state database of requests, and backpressure is provided by output limitations, usually per functional component).
Naively, I think that someday a decent chunk of programming might actually look like this, with parts even represented visually (though in my opinion likely still defined formally as text). Only I think programmers will continue to write the functional components themselves, unlike in Factorio. They'll just live at different levels of the "codebase", and the "pipes" level will likely be a lot more abstracted than it is in Factorio.
As a software developer, I find this paradigm to map very well to serverless architectures, because you generally want to think a level higher than the per-machine basis. It does require a willingness to forgo handy and well-established tools like the filesystem and Unix pipes in favor of higher level abstractions around transfer and storage of data.
You have, from first principles, reconstructed a huge portion of my thought process for building https://refinery.io
Factorio and Minecraft automation mods are a big inspiration! Check out InfiniFactory too :)
Bridging existing applications to the Serverless paradigm is far from simple. That's one of the biggest struggles I've experienced trying to build a Flow-based software platform.
Learn more every day though. Thank you for the interesting comment!
I took a look at your site (refinery.io), and the "watch demo" is really a "read introduction"... I was expecting to sit back and let you show me 10-15 minutes of video that sells the idea.
It looks very professionally done, but reading screens isn't as easy for me as it used to be; a video is better.
Another video game that works similarly is Oxygen Not Included.
One of the most compelling arguments for dataflow/flow-based programming is the mental model and the visualization aspect of it. This opens up opportunities for monitoring and for visual, data-driven programming, and it is whiteboard-friendly.
In Elements of Clojure[0], the author discusses the concept of "principled components and adaptive systems", and flow-based design reminds me of exactly that. The semantics of composition and communication are well-defined and universal, but internally the components can (and should) be specific and concrete.
Something similar can be said about Smalltalk. A primary aspect of its design was the mental model, understanding, and learning. The core idea was that learners (especially children) understand things in terms of their operational semantics.
> I don't mean to derail the conversation, but this really does remind me of the game Factorio, though sort of in reverse.
So no, I don't think this is a derailment, but likely one of the most important aspects of paradigms like this.
Thank you for making this connection - I had recently been reading Elements of Clojure and noticed the concept of principled components, and I had been slowly working my way towards this realization/analogy myself.
We're building a tool, Estuary Flow, which seeks to be an end-to-end realization of practical, configuration-driven, scale-out flow-based programming -- with an important twist.
The central concept is a "collection", an append-only set of schematized documents, which can be captured and materialized into other systems (e.g. pub/sub, S3 buckets, etc.). "Derivations" are collections defined in terms of source collections and the stateful transformations/joins/aggregations applied to them.
A key twist is that collections are simultaneously a batch dataset (backed by cloud-storage) and also a real-time stream.
They unify the current dichotomy of "historical" vs "streaming" data into a single addressable concept. Declare a new derivation, and it automatically backfills over history right from S3, then seamlessly transitions to live data.
If this sounds interesting, check out our docs [0]. We're early, but love feedback!
There has been quite a lot of IoT automation / rule engine stuff done in NoFlo in the last couple of companies I’ve worked at, but obviously I can’t really go into specifics there.
As I was reading this I was enthusiastically agreeing with the idea, but something felt super familiar about it. Then I realized this is basically the same concept as Unix pipes.
I have my little individual programs, grep, awk, sed, jq, etc, and then I can endlessly mix and match those different components to do what I want.
The limitation I have seen with Unix pipes isn't often the ability to process or manage the data; it's that they only work if all the data is set up the proper way. The longer I've been in the industry, the more it seems to me that most code that gets written isn't about actual computing, but centered around importing and transforming data that is expressed in different ways.
Is there something that makes reading in disparate data records easier, so I can focus on the computing part of a computer program and spend less time parsing data?
You aren't wrong, but I wanted to point out that with `tee` and `cat` you can split or merge pipelines, and with FIFOs you can awkwardly create graph cycles.
I'm super intrigued by Luna/Enso. Only, every time I've tried it, the editor has had serious stability problems. I wonder if more traditional tooling and editor support wouldn't be more successful for wide adoption?
Side effects are forbidden by the structure, flows can be monitored in a GUI/debugger, and as a result components can be tested as units instead of as a whole system. I love it!
It is easier to design digital circuits when you have a whole catalog of 7400 and 4000 series gates, than it is using individual transistors. It is easier to wire a house when you're not making wires and switches with a hammer and a forge.
I welcome this new higher level of abstraction, and am willing to pay the cost in terms of CPU and Memory to get there, just as I'm willing to waste transistors or copper wire to have something done and working.
I recently used Node-RED for some process automation and it was absolutely fantastic for quick prototyping, dashboarding, sensor integration, etc. Highly recommended!!!
General comment about the Wikipedia article - https://en.wikipedia.org/wiki/Flow-based_programming : given all the interesting discussion on the 'Net, including this thread, it might be time to update the Wikipedia article.
The most substantive addition in recent years is Akaigoro's comment about actors, added Jan. 2020 (@guitarvydas, care to jump in?!), preceded, I think, by Joe Witt's reference to NiFi in mid-2018. This general lack of activity, I feel, might lead readers to assume that FBP is an outdated concept, whereas this thread proves it's definitely alive and kicking! I, personally, am not allowed to update the article, due to the WP ban on self-promotion, so I would like to encourage people to add topics, controversies, anything, to the WP article... Freshen it up a bit, as it were! Thanks, and stay safe, everyone!
This is a similar concept to what I do by default when programming in F#/OCaml. You create your data types and then they "flow" through functions. Each function is `input -> output`, but bigger workflows are also input -> output.
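Roughly the same idea sketched in Go (assuming Go 1.18+ generics; the function names are made up for illustration): each step is a plain `input -> output` function, and composing two of them yields another `input -> output` function:

    package main

    import (
        "fmt"
        "strconv"
    )

    // compose wires two input -> output functions into a bigger one.
    func compose[A, B, C any](f func(A) B, g func(B) C) func(A) C {
        return func(a A) C { return g(f(a)) }
    }

    func main() {
        double := func(x int) int { return x * 2 }
        show := func(x int) string { return strconv.Itoa(x) + "!" }

        pipeline := compose(double, show) // int -> string, built from parts
        fmt.Println(pipeline(21))         // "42!"
    }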
It might not tick all of the boxes for being a complete software development tool, but Excel strikes me as a dataflow programming model. I wonder if that's a reason why it's easy for laypeople to learn.
Apache Beam seems to be an implementation of this idea. It works well when what you have to do matches its logic, but it gets tricky if you need to do things iteratively or recursively.
Beam has some overlap, but in my understanding it has a rather involved syntax for defining the data flows, quite far from the simple list of connections between in- and out-ports that you see in FBP systems following J. Paul Morrison's principles more closely.
Apache NiFi comes quite a lot closer, with the main difference that it only has a single in-port instead of separate named ones. It also seems to be among the heavier and more complex implementations (for good and bad).
"They" meaning "FBP-like" or "FBP-inspired systems", to use Joe Witt's terminology... They are usually control-flow oriented, and to my mind do not yield the productivity and maintainability advantages of the "classical" FBP paradigm shift. I recently asked Node-RED's Nick O'Leary why Node-RED only allows one input port, and the answer was something like "because we have never run into the requirement"... a) this is not something that can easily be added after the fact, without totally rethinking the product architecture, and b) trying to build complex systems without that feature would be, to me, like trying to hang wallpaper with only one hand! One litmus test I use is that of concatenating data streams using the two different paradigms - I tried to describe this in an article a year or so ago, recently updated: https://jpaulm.github.io/fbp/concat.html Cheers!
In dataflow programming (or something like that), how do you program a decision that depends on the result of some component?
It would seem like I need to suspend my computation, "ask" for the result from some component, get the result back, and then alter my computation based on that result.
Pure flow-forward would not seem to support this easily. Or can it? Or does it come down to always needing BOTH sync and async functions? (Unless of course we limit the problem domain.)
For dataflow programming, you introduce structures. For example, in LabVIEW, one has many types of structures such as for loops, while loops, case structures (for making decisions), event structures (for listening to and responding to user events), type specialization structures (for reacting, at development/compile time, to types in a dynamic way as a type of development/compile time polymorphism), conditional compile structures (for conditionally compiling different parts of code), and disable structures (for disabling certain parts of code).
This is really no different from the type of dataflow you have in other functional languages like F# or Racket. Values flow into special syntax forms, such as if or match, that allow branching.
The simplest way is with when/unless vertices, which take a boolean input and some other inputs, and either pass the other inputs through (when the boolean is true, or false for unless) or output nothing.
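A minimal sketch of such a when vertex, written here in Go with illustrative names (an unless vertex would just test for false instead):

    package main

    import "fmt"

    // when forwards a data value only if the matching boolean is true.
    func when(cond <-chan bool, data <-chan int, out chan<- int) {
        defer close(out)
        for c := range cond {
            v, ok := <-data
            if !ok {
                return
            }
            if c {
                out <- v // boolean true: pass the input through
            }
            // boolean false: output nothing for this value
        }
    }

    func main() {
        cond := make(chan bool, 3)
        data := make(chan int, 3)
        out := make(chan int, 3)

        for i, c := range []bool{true, false, true} {
            cond <- c
            data <- i * 10
        }
        close(cond)
        close(data)

        go when(cond, data, out)
        for v := range out {
            fmt.Println(v) // 0, 20
        }
    }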
This gets unwieldy quickly. Consider the ajax pattern:
    b = a
        .filter1()
        .filter2()
        .filter3()
        .filter4();

    if (b.foo()) {
        return a.filter5();
    } else {
        return a.filter6();
    }
For visual programming, AFAICT you have to feed `a` downstream twice: once to filter5 and once to filter6 in preparation for the conditional branch. That requires one of the following:
* intersecting lines, which are way more visually complicated to read than the code above (and can quickly create visual ambiguities which as a class do not exist in a text-based language)
* one line which loops under the bottom of one of the downstream objects, which is visually distracting (especially if you have more than two branches)
If a is a constant, you can repeat it as an input to when and unless. If a is a value output by a vertex further back, e.g. the output of filter0, then yes, you do need edges to when and unless. But the output could be labelled, automatically causing identical labelling at the destinations of the edges, and the lines corresponding to the edges could be hidden until the mouse pointer hovers over either the output of filter0, the input at when, or the input at unless. It's better demonstrated than described in text, but I hope you follow it.
I'm aware that text is more compact, and I understand the various user interface issues (e.g. crossing lines), but there are advantages to representing programs as graphs as opposed to text. One of these is the possibility of very strong type checking while a function is being defined, and I'm currently working on that. Another is that it should be possible to take advantage of homoiconicity: programs are graphs, and graphs are the most general data structures. Also, graphs (i.e. data) can double as programs.
Tutorials for my language can be found here: http://www.fmjlang.co.uk/fmj/tutorials/TOC.html
These are out of date, so suffer from the problem you hinted at above, and they also don't explain my aims. I'll write updates to them when the new type system is properly working.
> It's better demonstrated than described in text, but I hope you follow it.
I do understand that. I thought about implementing something like that for Pd, but in reverse. Programs there tend to look like write-only spaghetti, so it would be helpful to have something like a "spotlight" feature that temporarily removes (or blurs, etc.) the spaghetti that isn't part of the currently selected chain of objects. That would make the spaghetti at least navigable, with a mouse and some time.
Anyhow, I've never found a dataflow or visual programming environment that really deals with the ergonomics of writing the code. At some point I just want to do a little math, or a quick conditional, or some such conceptually simple business for which the graph just gets in the way. At that point it's easiest to drop down to some kind of text-based DSL inside a single box and be done with it.
You can call Lisp from my language, and call my language from Lisp. But it will still need a lot more work before I start using it for tasks I currently prefer Lisp for.
That gets to the core of my puzzlement. In dataflow terms, if 'b' is an 'agent', then b.foo() is a message we send to it, and we then use the result of b.foo() for further decision-making. But in a typical dataflow diagram, data flows in one direction only; there is no link in the diagram from "b" to "result of b.foo()".
A diagram could not (easily?) describe the fact that we get 'b' from one component and then send the message "foo" to that component 'b'. The data path from this component to "b" could hardly show up in the diagram, because "b" is the response to a previous request, which could be one of many possible alternative responses.
I'm not sure I can express or even think about this very clearly, but questions like this make me doubt the generality of dataflow as an alternative to current-day programming practice.
Researchers at the University of Manchester and elsewhere built computers which implemented dataflow at the hardware level. The main reason we don't have them is Moore's Law: wait a few years and a single CPU will be just as fast as multiple dataflow CPUs.
There are historical reasons why we're programming in text in mostly imperative languages: graphics terminals were either not very good or very expensive, and computers had single CPUs, until quite recently.
Yes, a boolean function, but wouldn't that have to be a synchronous function, one that the component doing the decision-making calls?
I understand that there can be loops in the dataflow, and that feedback can alter the processing logic of a component at some point in the future - since processing is async, the alteration can only happen in the future.
But when a component needs to make a decision - not alter its processing logic, but simply make a choice - can it somehow 'ask a question' of some other component which is part of the dataflow, before it decides which branch of the dataflow to send data to?
My question is simply: is it possible and/or practical to do general programming with only (asynchronous) dataflow between the components/functions/units-of-computation?
Or do we need both synchronous and asynchronous components/functions/agents etc.?
Dataflow doesn't require synchronization because vertices wait until there are values with the same tag on all their inputs before executing.
In pure tagged dataflow systems, asking a question doesn't happen: programs are directed graphs and like most other programming systems vertices have to wait for values. However, at a higher level, you could have vertices sending messages to one another and receiving answers back.
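A minimal sketch of that waiting behavior in Go (tags omitted for brevity, so values are simply paired positionally; the names are illustrative): the join vertex fires only once a value is present on every input:

    package main

    import "fmt"

    // join waits for a value on every input before firing.
    func join(a, b <-chan int, out chan<- int) {
        defer close(out)
        for {
            x, okA := <-a
            y, okB := <-b
            if !okA || !okB {
                return // an input has dried up; the vertex can no longer fire
            }
            out <- x + y // both inputs present: the vertex fires
        }
    }

    func main() {
        a := make(chan int, 2)
        b := make(chan int, 2)
        out := make(chan int, 2)

        a <- 1
        a <- 2
        b <- 10
        b <- 20
        close(a)
        close(b)

        go join(a, b, out)
        for v := range out {
            fmt.Println(v) // 11, 22
        }
    }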
So computation cannot start before I have all the inputs, right? But when I - a vertex - am doing a computation, I would like to ask for more data from some other component so that I can complete my calculation. So how could I get such data except as one of my inputs? But I must already have all my inputs, else I would not be executing.
This would then seem to be at the core of the difficulty with dataflow programming. Once the computation of a node has been activated, it cannot ask for help from anybody, because it must have all its inputs present before it starts its calculation. Does this make sense???
Right, but spreadsheets are hardly used for "general-purpose programming". They are good for the tasks they are used for, but "systems" are not implemented with spreadsheets.
That's what I would suspect. Dataflow programming would seem to hold great promise in making programs more understandable. But there would seem to be limits to where it can be applied (for/if, etc.). It is not a universal model for practical programming, I assume.
It's a bit like "pure functional programming": we can't really do that in practice, because we need IO. The best we can do is divide the program into two parts, purely functional and imperative. Similarly, I assume we could (and should try to) divide programs into a "dataflow part" and a "synchronous part".
You can write pretty interesting data pipelines using Akka Streams, with a very similar idea to FBP. It's probably not technically FBP, but the whole reactive-streams implementation is similar in spirit, while allowing for things like disparate data-source speeds without blowing out part of the flow.
A nice and powerful C++ flow-based programming framework is DSPatch [1]. It is definitely worth mentioning here; check out the examples in the audio domain, too.
This is wildly underdeveloped, mathematically, but eventually, with enough FP and category theory and the like, we will get back there.
The problem is that everyone wants to write little nodes and "just" wire them up. But all the real complexity is not in the nodes; it's in the nature of the wires and their composition.
The analogy to soft-drink bottling seems... off. I would imagine there's a contract of sorts between stations - on rate of output/input, spacing/placement of bottles, etc. - that makes it not very asynchronous.
This is a good article. In the history or multithreading sections it would have been nice to mention the various old hardware implementations of this model, such as the Transputer or the Connection Machine.
Neat to see this here! I've recently become convinced that Flow-Based Programming is the ideal that other programming paradigms are reaching for. But in practice there's a lot that's not obvious to me about how it would actually work - so it's great to have a bunch of new reading suggestions!
This type of programming is actually a subset of functional programming called point-free programming. It's equivalent to programming using only combinators as the fundamental unit.
All programs are really pipelines of data flowing from a low-entropy state to a high-entropy state, with IO and state as the endpoints of the pipes. Using the point-free style or flow-based programming makes this entropy, and the pipelined nature of all programs, more explicit.
With OOP or regular programming, the pipelined nature of data flowing to and from state and IO becomes less evident and more convoluted.
Fascinating. This is how I build my programs; I never knew the style had a particular name. I just assumed it was functional-style programming.
It allows me to build ever greater programs that do very complicated things and run to thousands of lines of code and logic, but everything gets boiled down to small, bite-sized pieces.
It's a more data-oriented approach, and it seems to follow an assembly-line model.
I liken it to building code pyramids, where I keep stacking and chaining one function onto another to build ever bigger pyramids.
At the end, all the code is heavily unit tested, and I have a high degree of trust in the fidelity of the codebase.
Yeah, when Java 8 introduced the Function interface I started to program this way as well. I sometimes get pushback from those not initiated into the functional style, because I have a ton of little functions that each do one thing. Unfortunately, in Java this means a bunch of class files.
The other thing I noticed when I started using this style is that I was able to ditch unit-testing mocking frameworks. This made me realize that when the Function interface was introduced, it was a backdoor way to get folks back to coding to an interface, which we should have been doing all along.