G-Code: A poorly defined language

Many computer-controlled (CNC) machines use NC-code / RS-274D / G-Code / Gerber to describe their programs. This language appears to be very loosely defined. To save others the search, I am summarising some sources and observations about it.
The language appears to have evolved over the years from work originally done by Fanuc. In general, the instructions (words) of the language are well defined, but the file format tends not to be specified. The variations appear to be concentrated in the handling of comments and in the definition of the file structure.

Line Numbers



Line numbers are optional and consist of an N followed by an unsigned integer.
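As a rough illustration of the rule above, the following Python sketch peels an optional line number off the front of a block. The name split_line_number is made up for this example, and accepting a lower-case n and spaces after the N are assumptions; a particular controller may be stricter.

```python
import re

# Hypothetical helper, not taken from any controller or interpreter.
# Assumption: the N may be lower case and may be followed by spaces.
_LINE_NUMBER = re.compile(r"^\s*[Nn]\s*(\d+)")

def split_line_number(block):
    """Return (line_number or None, remainder of the block)."""
    match = _LINE_NUMBER.match(block)
    if match is None:
        # No N word at the start: the whole block is the remainder.
        return None, block.strip()
    return int(match.group(1)), block[match.end():].strip()
```

For example, split_line_number("N10 G1 X5") separates the number 10 from the remainder "G1 X5", while a block without an N word comes back with None in the first position.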

Words



A word is a letter followed by a real value. In many implementations the letter is not case-sensitive.
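The word structure can be sketched as a small tokenizer. This is only a sketch under assumptions: parse_words is a hypothetical name, letters are folded to upper case, values are plain decimals with an optional sign (no expressions or parameters), and comments are assumed to have been removed already.

```python
import re

# Assumption: a value is an optionally signed decimal such as 1, -2, 10.5 or .5
_WORD = re.compile(r"([A-Za-z])\s*([+-]?(?:\d+\.?\d*|\.\d+))")

def parse_words(block):
    """Return a list of (LETTER, value) pairs found in a block."""
    return [(letter.upper(), float(value))
            for letter, value in _WORD.findall(block)]
```

Folding the letter to upper case is one way to get the case-insensitive behaviour; an implementation that is case-sensitive would simply skip that step.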

Comments



One particular sticking point is the comment delimiters, which vary between implementations.
Fanuc uses:

( - control out
) - control in

In effect, this excludes the contents of the parentheses from the control flow.

The NIST RS274NGC Interpreter - Version 3 (http://www.nist.gov/manuscript-publication-search.cfm?pub_id=823374) uses the Fanuc convention.

Although the strict definition of the Fanuc convention allows embedded comments, some machines apparently don’t handle them well, or as expected.

Roland’s NC Code (http://support.rolanddga.com/docs/Documents/departments/Technical%20Services/Manuals%20and%20Guides/MDX-PRO2_PRO_NC-CODE_EN_R1.pdf, NC Codes Reference Manual) adopts the Fanuc convention.

RepRap (http://reprap.org/wiki/G-code) treats everything from a semicolon to the end of the line as a comment.
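A reader that has to cope with files from several sources could accept both the parenthesis and the semicolon conventions described above. The Python sketch below is one possible way to do that, not the behaviour of any particular controller. The name strip_comments is hypothetical, and it assumes that parentheses do not nest, that a semicolon inside parentheses is part of the comment, and that an unclosed parenthesis comments out the rest of the line.

```python
def strip_comments(block):
    """Remove (...) comments and anything from ';' to the end of the line."""
    kept = []
    in_paren = False
    for ch in block:
        if in_paren:
            # Inside a comment: discard everything up to the closing ')'.
            if ch == ")":
                in_paren = False
            continue
        if ch == "(":
            in_paren = True
        elif ch == ";":
            # Semicolon convention: the rest of the line is a comment.
            break
        else:
            kept.append(ch)
    return "".join(kept).strip()
```

Because machines differ on comments in the middle of a block, it may be safer to strip comments before sending a file to a machine than to rely on the controller to ignore them.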

Data Start / Data End



Some systems require a data start and a data end block (i.e. a % on a line by itself at both the start and the end of the program); others don’t. Some are white-space tolerant.
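A tolerant reader can accept files with or without the % lines. The following Python sketch shows one way to do so. The name program_body is made up for this example, and the policy is an assumption rather than a documented rule: if the first non-blank line is a %, the program is taken to run up to the next % line; otherwise every non-blank line is kept.

```python
def program_body(lines):
    """Return the program blocks, without % markers or blank lines."""
    # Stripping each line tolerates white space around a lone %.
    body = [line.strip() for line in lines if line.strip()]
    if body and body[0] == "%":
        body = body[1:]
        if "%" in body:
            # Ignore anything that follows the data end marker.
            body = body[:body.index("%")]
    return body
```

Going the other way, a writer that targets a machine which insists on the markers can add a % line before and after the program.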

White Space



White space is generally not required between words.
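One consequence is that a block can be split into words without relying on spaces at all. The Python sketch below removes spaces and tabs and then cuts the block before each letter. The names compact and split_words are hypothetical, and the sketch assumes comments have already been removed, since a comment would otherwise be cut up as if it were words.

```python
import re

def compact(block):
    """Remove all spaces and tabs from a block."""
    return re.sub(r"[ \t]+", "", block)

def split_words(block):
    """Split a block into words, starting a new word at each letter."""
    return re.findall(r"[A-Za-z][^A-Za-z]*", compact(block))
```

With this approach "G1 X10 Y20" and "G1X10Y20" split into the same three words.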