
Techniques like this can be used to make debugging better. There's a lot of work in this direction, but there are some difficulties in making its use widespread.

Some interesting related work is Delta Debugging. Here, an automated tool runs a set of tests and uses slicing to narrow down the statements in the program causing the bug. The gist is to look at the statements which occur on every failing execution and then present these to the end user. More tests triggering the error will usually lead to a smaller explanation.
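
As a rough illustration of the "statements common to all failing runs" idea (a made-up C snippet, not from any of the papers):

    #include <stdio.h>
    #include <stdlib.h>

    int scale(int x, int mode) {
        int r = x;              /* executed on every run                   */
        if (mode == 1)
            r = x * 2;          /* executed only on the passing run        */
        else
            r = 100 / (x - 7);  /* executed on both failing runs: the bug  */
        return r;
    }

    int main(int argc, char **argv) {
        int x    = (argc > 2) ? atoi(argv[1]) : 3;
        int mode = (argc > 2) ? atoi(argv[2]) : 1;
        printf("%d\n", scale(x, mode));  /* "3 1" passes; "7 0" and "7 2" fail */
        return 0;
    }

The statements executed by both failing runs, minus those exercised only by the passing run, point straight at the division.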

The paper here presents a dynamic slicing technique. The problem with dynamic slicing is runtime overhead. The most naive approach is to log the operands and output of every instruction in the program, which is quite expensive: traces for real-world programs will probably be many gigabytes (I don't have any numbers off the top of my head). So you could not deploy the instrumented version of the program to your users without degrading performance. Intel recently added hardware tracing to their processors, which might offset this.
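
To get a feel for why, here is a minimal sketch of what naive instrumentation looks like (the LOG_DEF macro is something I made up, not the paper's actual instrumentation):

    #include <stdio.h>

    /* Log every definition together with its value; a real tracer would
       record something like this for every executed instruction. */
    #define LOG_DEF(var, val) \
        fprintf(stderr, "%s:%d %s = %ld\n", __FILE__, __LINE__, (var), (long)(val))

    int main(void) {
        long acc = 0;
        LOG_DEF("acc", acc);
        for (long i = 0; i < 1000000; ++i) {
            acc += i;
            LOG_DEF("acc", acc);        /* one trace record per iteration */
        }
        printf("%ld\n", acc);
        return 0;
    }

A one-million-iteration loop already produces a million trace records; scale that to every instruction of a real program and the gigabytes add up quickly.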

The alternative is static slicing, where the program is not actually executed. Here, you give the slicer the statement of interest (e.g., a line of code where a segfault occurred) and it gives you back the statements involved in the computation, same as in dynamic slicing. The problem here is accuracy. It's difficult and expensive to know what a program does before executing it (e.g., what memory locations could be accessed by this statement? What is the value of the index used to access this array?). So the slice is often too large, or the slicing algorithm is too complicated and fails to terminate in a reasonable time on a real program.
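
A tiny hypothetical C example of where that precision gets lost:

    #include <stdio.h>

    int main(void) {
        int a = 1;                  /* in the backward slice: flows into x      */
        int b = 2;                  /* not in the slice: only affects y         */
        int x = a + 10;             /* in the slice                             */
        int y = b * 3;              /* not in the slice                         */
        int arr[4] = {0, 0, 0, 0};

        int i = 0;
        if (scanf("%d", &i) != 1)   /* i is unknown before execution            */
            i = 0;
        if (i < 0 || i > 3)
            i = 0;

        arr[i] = x;                 /* may or may not write arr[0], so a sound  */
                                    /* static slicer must assume it does        */

        printf("%d\n", arr[0]);     /* slicing criterion: what influences this? */
        printf("%d\n", y);
        return 0;
    }

A dynamic slicer sees the concrete value of `i` and can drop `arr[i] = x` (and everything feeding it) when `i != 0`; a static slicer has to keep it, and real programs compound this imprecision through pointers and indirect calls.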

Of course, for hard-to-diagnose bugs you could use techniques such as this, and maybe it would be beneficial. It's hard to weigh the cost of wrangling with a slicing tool versus just staring at the code and trying to figure it out yourself. There are many more similar debugging techniques which are also quite interesting (e.g., program synthesis to automatically find and repair bugs, or combining slicing with test-case generation (e.g., fuzzing, symbolic execution) to focus on particular code regions), but these are probably even more research-y and hard to use in practice.


>It's hard to weigh the cost of wrangling with a slicing tool versus just staring at the code and trying to figure it out yourself.

I cannot provide figures, but for me, when debugging code that is not my own, I find that even something as simple as a call tree is often a big help (it is complementary to a stack trace, as you often want to figure out what was going on before the failure).

My guess is that more sophisticated and powerful tools for understanding software will do more to improve its development than yet another language.


I'm curious about how you use this during debugging. Would you mind elaborating on which language(s) you're using the call graph with?

I'm wondering because I've been working on building automated program reasoning tools and have found that even just creating the call graph is difficult: I imagine the end result potentially confusing the user because it is either incomplete or overly pessimistic. For example, the LLVM call graph (apart from not being in the language directly used by the developer) does not include edges from indirect calls, because their targets cannot be resolved (so it is incomplete). Alternatively, when you don't know which function is called, you can just add an edge to every function in the program (pessimistic).
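
For a concrete (made-up) C example of the indirect-call problem:

    #include <stdio.h>

    static int add(int a, int b) { return a + b; }
    static int sub(int a, int b) { return a - b; }

    /* The call through `op` is indirect: a call-graph builder either omits
       the edge (incomplete) or adds edges to every function with a matching
       type (pessimistic), because the target depends on runtime input. */
    static int apply(int (*op)(int, int), int a, int b) {
        return op(a, b);
    }

    int main(int argc, char **argv) {
        int (*op)(int, int) = (argc > 1) ? add : sub;
        printf("%d\n", apply(op, 4, 2));
        return 0;
    }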

The same story goes for the dependency graph.

I'm a bit pessimistic myself about more powerful software understanding tools, probably because I've been working on the problem for too long. It's particularly difficult because the languages end up being too difficult to analyze, and there is a huge engineering effort in creating an analyzer which is tunable enough to handle various language features efficiently. But if you have a particular problem and a particular subset of the language, it ends up working nicely.


I have done this for C code. I will sometimes print off sections of a call tree and make notes on it to keep track of what I have found about what updates what as I work backwards from the point of failure.

I have also used the 'call hierarchy' and 'find usages' features in IntelliJ to help me navigate around Java code.

I appreciate your point that more dynamic and feature-rich languages are much more difficult (if not impossible) to analyze statically.


A call tree and deps graph are mandatory for me.


Overall, one way to look at formal methods is that the tools range from fully automated (e.g., abstract interpretation, model checking) to manual (e.g., interactive theorem proving). Along this same range, the complexity of the properties which can be proved also moves from simple (e.g., absence of buffer overflows) to complex (e.g., termination of parameterized systems).

As a concrete example, interactive theorem provers can prove the termination of Paxos for an arbitrary number of nodes (i.e., a system of `n` nodes will reach consensus, where `n` is a positive integer). But, automatically generating such a proof for something as complicated and parameterized as Paxos is an open problem ([1] describes an automatic tool working toward that goal).

Another thing to keep in mind is the goal of the tool: verification versus bug hunting. Verification aims to prove that a property will never be violated, while bug hunting aims to find a property violation. Here's a list of some types of tools off the top of my head.

Bounded model checkers such as CBMC [2] essentially search the entire state space of the program; you can conceptually think of this as searching through a graph which has some start node, where you'd like to find a path from the start to an error. This is a bug-hunting technique, since the software/hardware may have an infinite state space (e.g., `while (true) ++i`), hence the search space must be bounded (e.g., unroll loops). For finite-state systems this can generate proofs of correctness. For infinite-state systems, it can generate proofs up to a certain bound (e.g., correct for some number of clock cycles, correct for some number of loop iterations). The benefit of bounded model checkers is that when they find an error they generate a counter-example, which is essentially a set of inputs to the program causing it to fail. So, you could use the counter-example as a concrete test case to fix the issue. Techniques such as counter-example-guided abstraction refinement, and more recently IC3/property-directed reachability, make use of these counter-examples to generate proofs for infinite-state systems.
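
For a flavor of what that looks like in practice, here is a small harness in the style CBMC accepts (the `nondet_` prefix and `__CPROVER_assume` are CBMC conventions for nondeterministic input and assumptions; the loop, bound, and assertion are made up by me):

    #include <assert.h>

    int nondet_int(void);   /* CBMC treats nondet_* functions as arbitrary inputs */

    int main(void) {
        int n = nondet_int();
        __CPROVER_assume(n >= 0 && n <= 3);   /* bound the input (CBMC built-in)  */

        int sum = 0;
        for (int i = 0; i < n; ++i)           /* unrolled up to the bound          */
            sum += i;

        assert(sum < 3);                      /* fails when n == 3 (0+1+2 == 3)    */
        return 0;
    }

Running it with something like `cbmc harness.c --unwind 4` should report a counter-example trace containing a concrete value of `n` that violates the assertion.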

A complementary technique is abstract interpretation, and numerical abstract interpretation in particular. One successful tool is Astree [3], which, though closed source, has been used to verify properties of software on Airbus planes. Abstract interpretation complements something like bounded model checking since it can be used to verify properties of infinite-state systems. Numerical abstract interpretation can generate numerical facts such as "on line 10, variable `x` is in the range `[-10, 10]`", or constraints between variables such as `x - y >= 10`. A difficult part of abstract interpretation, however, is that when it reports an error it may be a false alarm that doesn't actually exist in the program. You can think of numerical abstract interpretation as typical "compiler algorithms" such as constant propagation, but keeping track of much more complicated information.
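
As a hypothetical example of both the kind of fact an interval analysis infers and the kind of false alarm it raises:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int a = (argc > 1) ? atoi(argv[1]) : 0;
        if (a < 0)  a = 0;
        if (a > 10) a = 10;      /* interval analysis infers: a in [0, 10]   */

        int b = 10 - a;          /* intervals alone give: b in [0, 10]       */

        /* In every execution a + b == 10, but a non-relational interval
           domain only knows a + b is in [0, 20], so it flags this assertion
           as possibly failing: a false alarm. A relational domain (tracking
           constraints like a + b == 10 or x - y >= 10) is needed to prove it. */
        assert(a + b == 10);

        printf("%d %d\n", a, b);
        return 0;
    }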

Symbolic/concolic execution tools, such as KLEE [4], Pex and Moles [5], and their successor IntelliTest [6], are sort of a middle ground between model checking and traditional unit testing (i.e., hand-writing inputs to the program to test it). The main goal is to automatically generate such tests in order to exercise behavior in the program (e.g., generate a set of test inputs that achieves 100% code coverage). The modern versions of these techniques make use of dynamic information (i.e., they actually execute the program under test) and use it to make decisions about future executions. More concretely, you can imagine the program executing past some branch `if (c)` which was not taken (i.e., `c == false`); the analysis then asks: "what do the program inputs need to be such that `c == true`?" An answer to this question generates new inputs and allows another test to be generated and explored.
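
A minimal harness in the style of KLEE's tutorials (the branch and constants are made up); `klee_make_symbolic` marks the variable as a symbolic input:

    #include <klee/klee.h>

    int check(int x) {
        if (x > 100) {
            if (x * 2 == 246)   /* the solver must find x == 123 to get here */
                return -1;      /* hard-to-reach "buggy" branch               */
            return 1;
        }
        return 0;
    }

    int main(void) {
        int x;
        klee_make_symbolic(&x, sizeof(x), "x");   /* x becomes symbolic */
        return check(x);
    }

KLEE explores each feasible path through `check` and can emit one concrete input per path, which is exactly the "generate a new test for the untaken branch" loop described above.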

Stateless model checking is another automated dynamic technique [7,8,9] used to explore non-determinism in concurrent programs. Essentially, it automatically generates "test cases" (i.e., new thread schedules) to efficiently explore the state space caused by a non-deterministic scheduler.
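
The classic kind of program this targets is an unsynchronized read-modify-write, shown here as a hypothetical pthreads example. Whether the final assertion fails depends entirely on the interleaving, and a stateless model checker enumerates the interleavings instead of hoping a normal run hits the bad one:

    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    int balance = 100;

    /* Unsynchronized read-modify-write: a lost update is possible. */
    void *withdraw(void *arg) {
        int tmp = balance;    /* read        */
        tmp -= 60;            /* modify      */
        balance = tmp;        /* write back  */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, withdraw, NULL);
        pthread_create(&b, NULL, withdraw, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Holds only when the two withdrawals do not interleave; under the
           lost-update schedule balance ends up as 40 and this fires. */
        assert(balance == -20);
        printf("%d\n", balance);
        return 0;
    }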

And there are a bunch of other techniques out there too (e.g., symbolic model checking). There's also a huge amount of work on practical test input generation for hardware. On top of all these techniques, there are applications of them to specific problems, and heuristics to make them more practical and scalable. Overall, most of these techniques are niche and not widely used, but they have been applied in various areas.

[1] https://www7.in.tum.de/~gleissen/papers/sharpie.pdf

[2] http://www.cprover.org/cbmc/

[3] http://www.astree.ens.fr/

[4] https://klee.github.io/

[5] https://www.microsoft.com/en-us/research/project/pex-and-mol...

[6] https://www.visualstudio.com/vs/release-notes/#Testing

[7] https://github.com/markus-kusano/SysTest

[8] http://www.srl.inf.ethz.ch/papers/oopsla15-modelchecking.pdf

[9] http://plrg.eecs.uci.edu/software_page/42-2/


I was just trying to get Nix working after needing a version of base (GHC) newer than what I had installed.

The situation seems to have gotten a bit simpler with newer versions of the Nix tools (this is no fault of the author of this post; the post is more than a year old, and their blog has great Haskell stuff!).

Anyway, for Haskell development you can just:

   1. Write cabal file (e.g., ./project.cabal)
   2. cabal2nix --shell . >shell.nix
   3. nix-shell
After step 3, all the required packages, including stuff like GHC, will be installed. You can use cabal however you normally do to build the project:

   4. cabal sandbox init
   5. cabal build


Is step 4 really required? nix-shell already gives you a sandboxed environment.

Also their user manual[0] is really good now.

[0]https://hydra.nixos.org/job/nixpkgs/trunk/manual/latest/down...


At least since 1935 [1, 2], people have been trying to formalize all of mathematics with set theory. I always thought this was fairly interesting, similar to how the real numbers can be modeled in a language with dependent types like Coq.

However, I always found Hoare logic, and its concurrent extension rely-guarantee from Jones, to be quite easy to understand. The more interesting part is how to do this automatically for the user. Abstract interpretation is one way to do this, but it necessarily requires mapping some programming language onto your mathematical model. However, determining the formal semantics of mature programming languages, even C, is still open research (e.g., see the papers on semantics at PLDI 2015 [3]).
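
For reference, a Hoare triple {P} S {Q} just says: if precondition P holds and statement S terminates, then postcondition Q holds afterwards. Written as comments on a trivial C fragment (my own illustration, not the notation of the posted paper):

    int bounded_increment(int x) {
        /* { 0 <= x && x <= 9 }      precondition P   */
        x = x + 1;
        /* { 1 <= x && x <= 10 }     postcondition Q  */
        return x;
    }

Rely-guarantee extends this by attaching to each thread's proof a rely (the interference it tolerates from other threads) and a guarantee (the interference it may cause), which is what makes the reasoning compositional for concurrent programs.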

TLDR: verification is hard.

[1] https://en.wikipedia.org/wiki/Nicolas_Bourbaki

[2] Topoi, the categorial analysis of logic. Goldblatt, Robert. http://digital.library.cornell.edu/cgi/t/text/text-idx?c=mat...

[3] http://conf.researchr.org/track/pldi2015/pldi2015-papers#pro...


    At least since 1935 people have been trying to formalize all of mathematics with set theory.
Cantor invented set theory for this purpose in the 1870s. By 1935, Goedel's incompleteness theorems had shown the limitations of this (and any other) foundation of mathematics.

Hoare logic is not a foundation of maths, but an approach towards specification and verification of programs. As such, it is in the same space as Curry-Howard based programming languages with dependent types, like Idris and Agda. The main differences between the two are that (1) Hoare logic is not restricted to constructive reasoning and (2) Hoare logic is not integrated with the programming language, so programming and proving are distinct activities. Depending on your philosophical outlook, either (1) or (2) or both can be seen as a major advantage or disadvantage.

    how to do this automatically
What do you mean by "this"? Automatically specifying programs semantics, or automatically proving program properties?

   determining the formal semantics ... is still open research 
This is somewhat misleading. Every compiler gives a perfectly formal semantics to the programming language it compiles. What you probably mean is something like nice abstract accounts of memory models for C that at the same time capture all the optimisations modern C-compilers want to do, while still being implementable on existing CPUs. That is indeed an open problem [1].

[1] https://www.cl.cam.ac.uk/~pes20/cpp/c_concurrency_challenges...


    What you probably mean is something like nice abstract accounts of memory models for C that at the same time capture all the optimisations modern C-compilers want to do, while still being implementable on existing CPUs. That is indeed an open problem [1].
True. Although CompCert [1] is a nice effort toward that goal: a formally verified C compiler that covers almost all of C99, with many optimizations implemented and proved sound.

[1] http://compcert.inria.fr/compcert-C.html


CompCert uses tame optimisations, and compiles to mainstream architectures, nothing exotic. Most of all, CompCert does not deal with concurrency as far as I'm aware. Concurrency is where memory models matter.

The problem with memory models is not so much verifying it in a mechanised way, but inventing something suitable at all.


  The problem with memory models is not so much verifying it in a mechanised way, but inventing something suitable at all.
Agreed.

I'm aware of some work done a few years ago by people working on weak memory models and extending CompCert with some concurrency primitives: http://www.cl.cam.ac.uk/~pes20/CompCertTSO/doc/


Thanks for the interesting history of set theory. However, if you read [1] or the first chapter of [2] in my reply, you'll see that the strides to formalize mathematics started in 1935, so something does not add up.

I did not intend to say that Hoare logic is a foundation of mathematics; this was unclear in my post since I just abruptly changed topics. However, I would be delighted to read someone attempting to do just that.

What I intended to say is that the posted paper is basically a formulation of Hoare logic using set theory.

By this I meant both specifying program semantics and proving program properties. As in, the automated construction of Hoare pre/post annotations or, for concurrency, rely-guarantee annotations. To me, this is bringing languages without dependent types closer to similar kinds of correctness guarantees.

Thanks for the clarifications on semantics. You hit the nail on the head.


I went ahead and skimmed the first chapter of [2] (again; I own it) and don't see it justifying that this stuff began in 1935. Indeed, it suggests, e.g., that a major difficulty of the program to formalize all of math in set theory, Russell's Paradox, was discovered in 1901 and then immediately notes that set theory itself was begun by Cantor decades before.

It also then mentions how set theory was formalized and fixed through the introduction of formal classes, which fixed the size concerns discovered by Russell by outlawing them (Russell himself proposed a system which did exactly this, his "ramified hierarchy", almost immediately after discovering his paradox; it is the foundational model of the Principia Mathematica, published in three volumes in 1910, 1912, and 1913), and that these were in use by the mid 1920s.

In fact, the first mentions of dates in the 1930s occur with Hilbert's finitist attempts to prove consistency and Goedel's destruction of that program.

What you're probably referring to is Bourbaki's work. This wasn't an initial attempt by any means, but instead a broad-reaching program to popularize the set-theoretic methods that had previously been established. This was the moment of catastrophic phase change when all old mathematical concepts were suddenly shown to be suspect if they could not be axiomatized set-theoretically, so it is certainly when the bulk of the work occurred.

But all said, it is certainly the case that the set-theoretic foundation of mathematics was an 1870-1930 affair.


I agree, both for parsing via another program and for parsing error messages in my brain (not sure which you intended).

Perhaps it is because compiler messages suck, but my brain has been trained to quickly pick out the line numbers and file names.


I don't know Node, but, conceptually, Arrows [1] map to the stream concept very well (as always in Haskell, they are more general). The sugar in Haskell for using them, similar to do-notation, is really a joy to use.

I was exposed to them in the Haskell music library Euterpea in the Haskell School of Music [2]. The signal processing API uses arrows.

[1] Generalizing Monads to Arrows. John Hughes. http://www.cse.chalmers.se/~rjmh/Papers/arrows.pdf

[2] http://haskell.cs.yale.edu/?post_type=publication&p=112


I'd be interested in this as well; all I could find on their site is that it uses "patented techniques." Considering the first separation logic paper was published in 2001 [1] and Coverity was founded in 2002 [2], they probably did not (at least originally) use the same techniques.

I'm guessing their site is designed to sell to people and not have the details.

[1] http://fbinfer.com/docs/separation-logic-and-bi-abduction.ht...

[2] https://en.wikipedia.org/wiki/Coverity


Testing seems to be somewhat of a niche field. Using manual unit testing and random testing gets good results and seems to be the standard, but you certainly could not devote an entire semester-long course to them.

Beyond that, more advanced program analysis techniques, such as model checking and abstract interpretation, use some tricky math and don't seem to be widely used outside of some niche fields (critical software).

Testing in VLSI design seems much more commonplace (people usually attribute this to the Intel Pentium FDIV bug). I don't know how much hardware design is done in a typical CS degree.


Nathan Myhrvold in Modernist Cuisine writes about a similar technique: sous vide first, followed by a brief time in the oven, and finished on a smoker. This seems to be the reverse of the crutch. (The oven is used to prime the meat for smoking and is apparently based on a German technique for making sausages.)

Of course, if you cringe at wrapping your smoked food in foil you'll hate this way too. I'm also not sure if this is the technique he used to win his BBQ medals; perhaps that's a secret.


While I agree with you that it is better to focus on practical things rather than pedantic naming, you really notice the difference when you look at the foundation of Go's language-level concurrency paradigm.

Concurrency in Go is based on Communicating Sequential Processes (CSP) by Hoare, originally published in 1978. Reading CSP, you see that all the ideas are abstractly formalized in mathematical terms: there really is no notion of multiple streams of execution occurring at the same time (i.e., multiple hardware threads). So, a distinction is made between things cooperating (concurrency, in their parlance) and things running at the same time (parallelism).

To continue this idea, parallelism, again using their terms, really is an implementation of the more abstract concept of concurrency. That is, given a description of concurrent processes communicating you can run them in parallel (and reason about their correctness).

If you do not make the distinction, regardless of the terms you use, you remove the layers of abstraction necessary to reason about complex systems (separation of high level concepts and their implementation).

