Before a compiler or interpreter can be written for a language, the language must be precisely specified, or defined.
A programming language is defined in a document, which usually mixes formal notation with informal description.
Some language definitions are sanctioned by an official standards organization (such as ISO, IEC, or ANSI), while others are never formally standardized.
Usually a language is defined by considering its:
Syntax (structure)
Semantics (meaning)
Pragmatics (usage)
A language's syntax is normally given in a grammar notation such as Backus-Naur Form (BNF), Extended Backus-Naur Form (EBNF), or syntax diagrams.
These forms are all equivalent. They describe exactly the class of context-free languages.
Note that syntax rarely, if ever, refers to context-sensitive aspects, such as the requirement that an identifier be declared before it is used. We could, in theory, employ context-sensitive grammars, but they are hard for humans to read, and efficient recognizers for them are hard to write.
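To make the context-free idea concrete, here is a minimal sketch of a recognizer for a tiny expression language. The grammar in the comment, the function name `recognize`, and the restriction to single digits with no whitespace are all assumptions made for illustration; they are not taken from any real language definition. The nesting of parentheses is the part that needs a context-free grammar rather than a regular expression.

```python
# Hypothetical grammar, written in an EBNF-like notation:
#   Exp  = Term { "+" Term } ;
#   Term = DIGIT | "(" Exp ")" ;

def recognize(text: str) -> bool:
    """Return True if the whole of text can be derived from Exp."""
    pos = 0  # index of the next unread character

    def term() -> bool:
        nonlocal pos
        if pos < len(text) and text[pos] in "0123456789":
            pos += 1
            return True
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not exp():
                return False
            if pos < len(text) and text[pos] == ")":
                pos += 1
                return True
        return False

    def exp() -> bool:
        nonlocal pos
        if not term():
            return False
        # Zero or more repetitions of: "+" Term
        while pos < len(text) and text[pos] == "+":
            pos += 1
            if not term():
                return False
        return True

    # The input is legal only if Exp matches and nothing is left over.
    return exp() and pos == len(text)
```

With this sketch, `recognize("1+(2+3)")` is `True` and `recognize("(1+2")` is `False`. The recognizer judges structure only; it has no way to express a rule that depends on what appeared elsewhere in the program.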
Syntax is usually (but not always) divided into:
Macrosyntax: in which the leaves of the derivation trees of the language's grammar are tokens (i.e., words), not individual characters.
Microsyntax: in which it is specified how strings of characters are grouped into tokens. The microsyntax deals with issues like whitespace, comments, and case sensitivity.
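The microsyntax side of this split can be sketched as a small tokenizer. Everything here is assumed for illustration: the token kinds (`NUM`, `ID`, `OP`), the `#` line-comment convention, and the names `TOKEN_SPEC` and `tokenize` are hypothetical. A macrosyntax grammar would then be written over the resulting token kinds rather than over characters.

```python
from __future__ import annotations

import re

# Hypothetical microsyntax: each token kind is described by a regular expression.
TOKEN_SPEC = [
    ("SKIP", r"[ \t\n]+|#[^\n]*"),        # whitespace and '#' line comments
    ("NUM",  r"[0-9]+"),                  # integer literals
    ("ID",   r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers (case is preserved)
    ("OP",   r"[+*()=]"),                 # single-character operators
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str) -> list[tuple[str, str]]:
    """Group characters into (kind, text) tokens, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 42 + Y  # trailing comment")` yields the tokens for `x`, `=`, `42`, `+`, and `Y`. The comment and the spaces never reach the parser, and `Y` stays distinct from `y` because this sketch is case-sensitive.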
A language's semantics is specified by mapping its syntactic clauses (or abstract syntax tree fragments) into their meaning. Common approaches include operational, denotational, and axiomatic semantics.
A hugely important distinction is that between:
Static Semantics: which deals with legality rules, i.e., things you can check "at compile time".
Dynamic Semantics: which deals with the "run-time" execution behavior.
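The distinction can be illustrated with a toy expression language. The sketch below assumes a hypothetical tuple-based abstract syntax tree with four node kinds (`num`, `var`, `div`, `let`); the functions `check` and `evaluate` are illustrative names, not part of any real compiler. `check` applies a legality rule without running the program, while `evaluate` maps each tree fragment to the value it denotes.

```python
# Hypothetical AST node shapes:
#   ("num", 3)   ("var", "x")   ("div", left, right)   ("let", name, value, body)

def check(node, declared=frozenset()):
    """Static semantics: list legality errors without executing anything."""
    kind = node[0]
    if kind == "num":
        return []
    if kind == "var":
        return [] if node[1] in declared else [f"undeclared identifier: {node[1]}"]
    if kind == "div":
        return check(node[1], declared) + check(node[2], declared)
    if kind == "let":
        _, name, value, body = node
        # The bound name is visible in the body, but not in its own initializer.
        return check(value, declared) + check(body, declared | {name})
    raise ValueError(f"unknown node kind: {kind}")

def evaluate(node, env=None):
    """Dynamic semantics: compute the value an AST fragment denotes."""
    env = {} if env is None else env
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "div":
        return evaluate(node[1], env) / evaluate(node[2], env)
    if kind == "let":
        _, name, value, body = node
        return evaluate(body, {**env, name: evaluate(value, env)})
    raise ValueError(f"unknown node kind: {kind}")

# let x = 6 in x / 3        (legal, and runs fine)
ok = ("let", "x", ("num", 6), ("div", ("var", "x"), ("num", 3)))
# y / 2                     (rejected statically: y is never declared)
bad_static = ("div", ("var", "y"), ("num", 2))
# let x = 0 in 1 / x        (legal, but fails at run time)
bad_dynamic = ("let", "x", ("num", 0), ("div", ("num", 1), ("var", "x")))
```

Here `check(bad_static)` reports the undeclared identifier before anything runs, whereas `check(bad_dynamic)` returns no errors and the problem only appears when `evaluate(bad_dynamic)` raises `ZeroDivisionError`.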
Pragmatics does not affect the formal specification of programming languages. However, pragmatic concerns must guide your design of a programming language, if you want it to be easy to read, easy to write, and able to be implemented efficiently.