Intel’s automated code debugging tool ControlFlag is now open source and available for developers to access for free – a move that will come as a relief to many who are tired of spending hours scrutinizing their software programs in search of a potential anomaly.
Now available via GitHub, ControlFlag taps machine learning to automatically identify bugs in software and firmware code, saving developers the time-consuming task of manually debugging the programs they write.
Announced for the first time at the end of last year, ControlFlag has until now only been used internally by Intel, to spot anomalies in the company’s own software development. By opening the tool up to external developers and letting them build on it, Intel is expecting to push the limits of what the system can do to streamline the process of writing code.
Debugging is critical to program development: almost all large-scale software includes accuracy, performance or security bugs that need to be mitigated. What’s more, every update to those programs, for example the launch of a new feature, introduces another opportunity for an anomaly to appear.
But for the vast majority of developers, the process is a time-consuming and still largely manual chore. This is because identifying most bugs, assessing their root cause and mitigating them requires semantic analysis, which even state-of-the-art debugging systems are incapable of carrying out effectively.
“Historically, such semantic analyzers were simply software developers,” Justin Gottschlich, principal AI scientist at Intel Labs, tells ZDNet. “As such, this is a key reason why debugging remains a largely human-driven process.”
The last decades have seen advances in automated debugging, but existing tools are no match for software bugs that are only becoming more complex. This is why developers dislike debugging so much, says Gottschlich: it can take days, weeks and even months to fix a single software defect. In fact, it is estimated that up to 50% of all software development time is dedicated to debugging.
This comes at a cost for companies, too. According to Intel, the IT industry spent an estimated $2 trillion in 2020 in software development costs associated with debugging code, which represents about half of the average IT budget.
ControlFlag was designed to address this gap, through a capability known as anomaly detection. The tool learns from previous examples to detect normal coding patterns, and can therefore identify anomalies that are likely to cause a bug, regardless of the programming language.
Intel’s team determined that an unsupervised learning approach would be necessary to allow ControlFlag to detect bugs in a wider range of repositories. The system learned coding patterns from over one billion lines of unlabeled source code, which enabled it to reach a high degree of accuracy, and even adapt to a developer’s style to differentiate a software anomaly from a stylistic variation in a programming language.
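To make the idea of learning "normal" patterns from unlabeled code concrete, here is a toy sketch in Python. It is not ControlFlag's implementation: the focus on `if`/`while` conditions, the helper names (`abstract`, `condition_patterns`, `learn`, `find_anomalies`), the tiny corpus and the 5% threshold are all assumptions chosen for illustration. It counts how often each abstracted condition shape appears in a corpus and flags shapes that are rare.

```python
import ast
from collections import Counter


def abstract(node):
    """Reduce an expression to a coarse shape, dropping names and literal values."""
    if isinstance(node, ast.Name):
        return "VAR"
    if isinstance(node, ast.Constant):
        return "CONST"
    if isinstance(node, ast.Compare):
        ops = " ".join(type(op).__name__ for op in node.ops)
        parts = [abstract(node.left)] + [abstract(c) for c in node.comparators]
        return f"Compare({ops}: {', '.join(parts)})"
    if isinstance(node, ast.NamedExpr):
        # Assignment inside a condition, e.g. `if (x := 0):`
        return f"Assign({abstract(node.value)})"
    # Fallback: use the node type name as the shape
    return type(node).__name__


def condition_patterns(source):
    """Return the abstracted shape of every if/while condition in the source."""
    tree = ast.parse(source)
    return [
        abstract(n.test)
        for n in ast.walk(tree)
        if isinstance(n, (ast.If, ast.While))
    ]


def learn(corpus):
    """Count condition shapes across unlabeled source snippets."""
    counts = Counter()
    for source in corpus:
        counts.update(condition_patterns(source))
    return counts


def find_anomalies(source, counts, threshold=0.05):
    """Flag condition shapes whose relative frequency in the corpus is below threshold."""
    total = sum(counts.values())
    flagged = []
    for pattern in condition_patterns(source):
        freq = counts[pattern] / total if total else 0.0
        if freq < threshold:
            flagged.append((pattern, freq))
    return flagged


# Hypothetical miniature corpus standing in for a large body of unlabeled code
CORPUS = [
    "if a == 0:\n    pass\n",
    "if b == 1:\n    pass\n",
    "while n > 0:\n    n = n - 1\n",
    "if c == 2:\n    pass\n",
]
counts = learn(CORPUS)

# An assignment used as a condition never appears in the corpus, so it is flagged
print(find_anomalies("if (x := 0):\n    pass\n", counts))
```

In this sketch, a comparison such as `if z == 5:` matches a common shape and passes, while the assignment-in-condition is reported as unusual. A rare shape is only a candidate defect, not a confirmed one, which is why false positives are a natural concern for this kind of approach.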
Since it was introduced last year, Intel has tested the machine-learning tool on various software systems, with promising results. “When we originally designed the system, we didn’t anticipate that it would be able to find highly complex defects,” says Gottschlich. “However, given its self-supervised design, ControlFlag has stunned even us, the ones who built it, in its ability to find highly complex, nuanced software defects.”
Using ControlFlag on just two proprietary software repositories, says Gottschlich, resulted in identifying over 300 defects in production-quality, deployed programs. For example, last year ControlFlag detected a code anomaly in a computer software project named Client URL (cURL), which transfers data using various network protocols over one billion times a day. After Intel reported the anomaly to the cURL team, the maintainers agreed with ControlFlag’s findings and redesigned their code to patch the issue.
The past year has also come with a fair share of learning points as Intel’s team worked to develop ControlFlag. Two key areas for improvement, according to Gottschlich, are to reduce the number of false positives reported by the tool (reported defects that aren’t actual bugs) and to integrate an even more advanced semantic analyzer into ControlFlag’s reasoning.
As a system that is set to become one of the flagship products of Intel’s machine programming suite of tools, however, ControlFlag is set to keep evolving. “It’s unlikely that advances of ControlFlag will ever halt,” says Gottschlich. “This is largely because as software programming languages, hardware description languages, and computing devices evolve, ControlFlag will need to evolve too to keep pace with them.”
The system is part of Intel’s Machine Programming Research (MPR) project, which aims to reduce the time that it takes to develop software by a factor of 1,000 thanks to automation. One of the areas that Gottschlich’s team is investigating, for example, is eventually expanding the abilities of ControlFlag so that it can automatically repair the bugs that it detects.
In parallel, Intel’s MPR team is working on a handful of projects that focus on making software development easier. Last year, for instance, the company released a tool co-developed with MIT’s labs, which can study snippets of code to understand what a piece of software intends to do. Called MISIM (Machine Inferred Code Similarity), the system uses a catalog of pre-existing code to understand the intent behind a new algorithm and help engineers working on software by suggesting other ways to program, or offering options to make the code more efficient.
Gottschlich anticipates that MISIM will one day work alongside ControlFlag. “When properly fused together, we envision a more powerful new system that will be capable of detecting all the defects ControlFlag currently can, as well as hundreds of defects it currently cannot detect due to their underlying complexity,” he says.
In the meantime, developers who are interested in getting started with the tool can now access ControlFlag on GitHub.