In June 2020, OpenAI released version 3 of its Generative Pre-trained Transformer (GPT-3), a natural language transformer that took the tech world by storm with its uncanny ability to generate text seemingly written by humans. But GPT-3 was also trained on computer code, and recently OpenAI released a specialized version of its engine, named Codex, tailored to help, or perhaps even replace, computer programmers.

In a series of blog posts, we explore different aspects of Codex and assess its capabilities with a focus on the security aspects that affect not only regular developers but also malicious users. This is the second part of the series. (Read the first part here.)

Because it is based on GPT-3, Codex has benefited from being trained on a massive codebase taken from all over the internet, in virtually every programming language available. However, natural languages and programming languages do not share the same properties. Natural languages benefit from the human mind's flexibility in interpreting concepts, whereas programming languages are bound to more rigid constructs and less forgiving principles, dictated by their respective interpreters or compilers.

For high-level programming languages, it is reasonable to expect that the same statistical modeling that works so well in GPT-3 for natural languages would bear similar advantages for its code generation capabilities. Codex, however, lacks some of the essential constructs required for a real "understanding" of programming languages, such as their abstract syntax tree or the computational architecture of the target machine.
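To make the notion of an abstract syntax tree concrete, here is a minimal sketch using Python's standard `ast` module. It contrasts the flat sequence of text pieces that a statistical language model works from with the nested structure a parser recovers. The one-line program and the helper names `flat_view` and `describe` are hypothetical, chosen only for illustration; this is not a description of how Codex processes code internally.

```python
import ast

# Hypothetical one-line program, chosen only for illustration.
SOURCE = "total = price * quantity + tax"


def flat_view(source: str) -> list[str]:
    """Whitespace-separated pieces: roughly the flat sequence a text model sees."""
    return source.split()


def describe(node: ast.expr) -> str:
    """Render an expression tree with explicit nesting, e.g. Add(Mult(a, b), c)."""
    if isinstance(node, ast.BinOp):
        op = type(node.op).__name__
        return f"{op}({describe(node.left)}, {describe(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    # Fallback for node types this sketch does not handle explicitly.
    return type(node).__name__


tree = ast.parse(SOURCE)
assignment = tree.body[0]               # the single Assign statement
structure = describe(assignment.value)  # right-hand side of the assignment

print(flat_view(SOURCE))  # ['total', '=', 'price', '*', 'quantity', '+', 'tax']
print(structure)          # Add(Mult(price, quantity), tax)
```

The flat view carries no information about operator precedence, while the tree records that the multiplication is evaluated before the addition. A compiler or interpreter always works from the second form; a purely statistical model has to infer that structure from patterns in the text.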

So, how far can we push code generation while still being effective?

The imitation game: Codex's ability to understand low-level code

To assess how deep Codex's understanding of its generated code is, we tested its understanding of assembly language. We chose this to get the farthest away from natural language and the closest to the machine. We tested Codex by giving it assembly code samples, which we asked it to translate to ordinary language.

Disclaimer

Trend Micro Inc. published this content on 14 January 2022 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 14 January 2022 14:21:04 UTC.