C++ Coding Convention－青蛙的綠色池塘

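A coding convention (or, more narrowly, a coding style) is a set of guidelines to follow when writing code. It covers, but is not limited to, naming, syntax, and formatting. The benefits: deliberately designed naming rules reduce name collisions; specific syntax rules prevent accidental logic errors that are hard to spot at a glance; the whole code base (especially one developed by several people) gets a more consistent style; and the code becomes easier to read, among others.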

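Despite these benefits, a coding convention is not easy to roll out in practice. Everyone has their own writing habits and preferences, and a coding convention asks everyone on the team to give up some of them. Moreover, "what counts as good style" may be defined differently by each person, so deciding whose opinion should prevail when drawing up the convention can be a challenge; I have been told off over this myself XD... Then there is the question of how detailed the convention should be. Most programmers simply cannot remember a long list of rules. If most of the time spent writing code goes into watching the convention, development efficiency and enthusiasm may suffer, or, because nobody can remember every rule, the convention ends up existing in name only. I once went through a team spending two or three months drafting proposals, discussing the details, and holding an open vote to decide on a coding convention. By then a pile of project code had already been written, and the final convention was so detailed and fragmented that hardly anyone followed it...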

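What follows is the C++ coding convention I currently use at work. Because the team is small, we saved a lot of time on communication. I want to stress again that which coding style is better is highly subjective. Even though this convention works for my team, it will not necessarily work elsewhere; if you want to use it as a reference, please adjust it to your own environment. :-)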

1. Terminology

1.1 All Capitalized

CUSTOM_NAME

1.2 All Lowercase

custom_name

1.3 Camel Case

A word with the first letter lowercase, and the first letter of each subsequent word-part capitalized.

customName

1.4 Hungarian Notation

An identifier naming convention in which the name of a variable indicates its type.

nCount   // prefix n for number
strName  // prefix str for string
pObject  // prefix p for pointer
m_nRound // prefix m_ for class member and n for number

1.5 Identifier

A developer-defined token used to uniquely name a declared object or object instance.

1.6 Pascal Case

A word with the first letter capitalized, and the first letter of each subsequent word-part capitalized.

CustomName

2. Naming Conventions

Identifier	Notation	Example
Class/Struct	Pascal case	class FileLogger
Class member field	All lowercase with prefix _	std::string _log_file_name
Struct member field	All lowercase	std::string file_name
Method	Pascal case	bool LoggerActive()
Parameter	All lowercase	bool WriteLog(const char *log_msg)
Local or global variable	All lowercase	unsigned int log_level
Enum value and preprocessor constant	All capitalized	LOG_LEVEL_INFO
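
To see the rows of this table side by side, here is a minimal sketch that applies each of them once. MAX_LOG_MSG_LENGTH, LogLevel, LogRecord, and the bodies of the methods are made up for illustration; only the notations come from the table.

```cpp
#include <cstring>
#include <string>

// Preprocessor constant: all capitalized (hypothetical limit for this sketch).
#define MAX_LOG_MSG_LENGTH 256

// Enum values: all capitalized.
enum class LogLevel
{
    INFO,
    WARNING
};

// Struct: Pascal case. Struct member fields: all lowercase, no prefix.
struct LogRecord
{
    std::string file_name;
    unsigned int line_number = 0;
};

// Class: Pascal case.
class FileLogger
{
public:
    // Parameter: all lowercase. The _ prefix keeps the member field and
    // the parameter from colliding in the initializer list.
    explicit FileLogger(const char *log_file_name) :
        _log_file_name(log_file_name)
    {
    }

    // Method: Pascal case.
    bool LoggerActive() const
    {
        return !_log_file_name.empty();
    }

    bool WriteLog(const char *log_msg)
    {
        // Local variable: all lowercase.
        std::size_t msg_length = std::strlen(log_msg);
        return (msg_length < MAX_LOG_MSG_LENGTH);
    }

private:
    // Class member field: all lowercase with prefix _.
    std::string _log_file_name;
};
```

With these definitions, FileLogger logger("app.log"); followed by logger.WriteLog("hello") returns true, since the message is shorter than the assumed limit.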

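Why do we need naming conventions?
- Consistent naming rules make code read more smoothly and spare readers from constantly switching between different styles
- A small set of simple rules avoids most name collisions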

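Why not use Hungarian notation?
Hungarian notation was originally meant to let readers infer a variable's type directly from its name, but in practice it can create more confusion, for example:
- If a variable's type changes but its name is not updated, the code becomes harder to understand
- "A certain prefix letter stands for a certain type" is itself ambiguous: b could be byte or bool, and f could be flag or float
- As more types come into use, choosing prefixes gets harder: how would you distinguish int, int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, ...?
- C++ classes make choosing and reading prefixes even more of a headache: do vector<string> and vector<int> share the same prefix? What prefix should a user-defined type such as CMyClass get?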

Why prepend _ to class member fields?
Prefixing class member field names with _ is a concise way to avoid name collisions inside class member methods. A common case is a constructor that stores a caller-supplied parameter in a class member field:

FileLogger::FileLogger(const char *log_file_name) :
    _log_file_name(log_file_name)
{
}

Why not prepend _ to struct member fields?
A struct usually defines only member fields and no member methods, and it is usually used simply as a collection of data. A struct member field is therefore generally accessed together with its struct object, for example:

item.name /* struct member */ = name /* local variable */

Name collisions are unlikely under this usage pattern for structs, so no extra prefix is needed.

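Why are the naming rules for parameters and local variables identical?
Inside a method, parameters can be regarded as part of the local variables, so there should be no name collisions between them (if there are, the same problem would exist among the local variables themselves, and a naming convention is not the tool to solve it).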

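Why are the naming rules for local and global variables identical?
Local and global variables should differ in what they fundamentally represent, and that difference should be expressed by the variable name itself rather than by a prefix. For example, a global variable may carry the notion of accumulation, in which case its name can include words such as total or sum.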

3. Coding Style

3.1 Always use curly braces { and } in conditional and flow control statement.

// Bad
if (_enabled_level < requested_level) return;
 
// Bad
if (_enabled_level < requested_level)
    return;
 
// Good
if (_enabled_level < requested_level)
{
    return;
}

Why?
Braces mark the scope of each conditional and flow control statement more clearly, and they prevent logic errors caused by forgetting to add {} during later modifications:

if (_enabled_level < requested_level)                // original code
    printf("requested level = %d", requested_level); // new code
    return;                                          // original code, now unconditional

WriteLog();                                          // original code, now dead code!

3.2 Always use parentheses to make clauses in an expression apparent.

// Bad
if (val1 > val2 && val1 > val3)
{ ... }
 
// Bad
if (val & FLAG_FILE_OPENED != 0)
{ ... }
 
// Good
if ((val1 > val2) && (val1 > val3))
{ ... }
 
// Good
if ((val & FLAG_FILE_OPENED) != 0)
{ ... }

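Why?
The order of evaluation becomes explicit, and unexpected logic errors are avoided. For example, because != has higher precedence than &, val & FLAG_FILE_OPENED != 0 is actually evaluated as val & (FLAG_FILE_OPENED != 0).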
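
The difference can be checked with a small sketch. The flag value 0x02 and the two function names are assumptions made for this example. The "buggy" function spells out what the compiler evaluates for the unparenthesized expression.

```cpp
// Hypothetical flag value; any flag other than bit 0 shows the problem.
#define FLAG_FILE_OPENED 0x02u

// What the compiler evaluates for `val & FLAG_FILE_OPENED != 0`:
// (FLAG_FILE_OPENED != 0) is true, which converts to 1, so this
// tests bit 0 of val instead of the intended flag bit.
bool IsFileOpenedBuggy(unsigned int val)
{
    return (val & static_cast<unsigned int>(FLAG_FILE_OPENED != 0)) != 0;
}

// The intended check: mask first, then compare with 0.
bool IsFileOpenedCorrect(unsigned int val)
{
    return (val & FLAG_FILE_OPENED) != 0;
}
```

For val = 0x02 the correct version reports the flag as set while the buggy one does not; for val = 0x01 it is the other way around.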

3.3 Declare each variable independently - not in the same statement.

// Bad
int i = 0, j = 100;
 
// Good
int i = 0;
int j = 100;

3.4 Write only one statement per line.

// Bad
if (_enabled_level < requested_level)
{
    ++_drop_log_count; return;
}
 
// Good
if (_enabled_level < requested_level)
{
    ++_drop_log_count;
    return;
}

3.5 Prefer < and <= to > and >=.

// may take more time to figure out that LOWER_BOUND <= size < UPPER_BOUND
if ((size >= LOWER_BOUND) && (size < UPPER_BOUND))
{ ... }
 
// the relationship is clearer
if ((LOWER_BOUND <= size) && (size < UPPER_BOUND))
{ ... }

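Why?
Always keeping the smaller value on the left and the larger one on the right matches the relative positions of the values on a number line.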

3.6 Place curly braces ({ and }) on a new line to make the boundaries of functional blocks clear.

for (vector<Logger*>::iterator cur_logger_iter = _loggers.begin();
    cur_logger_iter != _loggers.end();
    ++cur_logger_iter) { // <- hard to locate this open curly brace quickly
    Logger *cur_logger = *cur_logger_iter;
    ...
}
 
for (vector<Logger*>::iterator cur_logger_iter = _loggers.begin();
    cur_logger_iter != _loggers.end();
    ++cur_logger_iter)
{ // <- functional block starts from here more clearly
    Logger *cur_logger = *cur_logger_iter;
    ...
}

4. Language Usage

4.1 Try to initialize class member fields and local variables where you declare them.

// Bad
int i;
int j;
i = 0;
j = 100;
 
// Good
int i = 0;
int j = 100;

4.2 Avoid assignment within conditional statements.

// Bad
if (0 == (count = GetCount()))
{ ... }
 
// Good
count = GetCount();
if (count == 0)
{ ... }

Why?
Assignment (=) inside a condition is legal, but a reader cannot easily tell whether it is a mistyped equality test (==) or the original intent.

4.3 Prefer (bool) true or false to (BOOL) TRUE or FALSE.

// Bad
BOOL is_successful = FALSE;
 
// Good
bool is_successful = false;

4.4 Prefer nullptr to NULL for pointer.

// Bad
const char *file_name = NULL;
 
// Good
const char *file_name = nullptr;

4.5 Prefer enum class to unscoped enum.

// Bad
enum ReturnCode
{
    SUCCEEDED,
    FAILED
};
 
// Good
enum class ReturnCode
{
    SUCCEEDED,
    FAILED
};

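Why?
The newer enum class provides safer type checking and prevents accidental logic errors. In addition, different enum classes can define enumerators with the same name, for example: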

// Old enum
enum BuildConfig
{
    DEBUG = 0,
    RELEASE,
    ...
};
 
enum LogLevel
{
    LOG_LEVEL_CRITICAL = 0,
    ...
    // DEBUG, // invalid identifier due to redefinition
    LOG_LEVEL_DEBUG
};
 
LogLevel log_level;
// read log level value from somewhere
 
// comparing a LogLevel with a BuildConfig value is a logic error,
// but the C++ compiler does not treat it as an error (at most a warning).
if (log_level == DEBUG)
{
}

// New enum class
enum class BuildConfig
{
    DEBUG = 0,
    RELEASE,
    ...
};
 
enum class LogLevel
{
    CRITICAL = 0,
    ...
    DEBUG, // same name could be defined in different enum class
    ...
};
 
LogLevel log_level;
// read log level value from somewhere
 
// comparing a LogLevel with a BuildConfig value is a logic error,
// and the C++ compiler reports it as a compile error.
if (log_level == BuildConfig::DEBUG)
{
}
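
The stricter type checking can also be seen at compile time. The sketch below is an illustration, not part of the convention: OldLogLevel and ToInt are hypothetical names. It uses std::is_convertible to show that an unscoped enum converts to int implicitly, while an enum class needs an explicit static_cast.

```cpp
#include <type_traits>

enum OldLogLevel
{
    OLD_LOG_LEVEL_CRITICAL = 0,
    OLD_LOG_LEVEL_DEBUG
};

enum class LogLevel
{
    CRITICAL = 0,
    DEBUG
};

// An unscoped enum converts to int implicitly, which is what lets
// comparisons between unrelated enums compile.
static_assert(std::is_convertible<OldLogLevel, int>::value,
              "unscoped enum converts to int implicitly");

// An enum class does not; the conversion has to be spelled out.
static_assert(!std::is_convertible<LogLevel, int>::value,
              "enum class does not convert to int implicitly");

// Hypothetical helper: explicit conversion for the rare case it is needed.
int ToInt(LogLevel level)
{
    return static_cast<int>(level);
}
```

If either static_assert were false, the file would not compile, so a successful build is itself the check.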

4.6 Always declare class destructor with virtual keyword.

// Bad
class Logger
{
public:
    ~Logger();
};
 
// Good
class Logger
{
public:
    virtual ~Logger();
};

Why?
Declaring the destructor as a virtual method avoids the resource leaks that can otherwise occur once a derived class (sub-class) is defined, for example:

class Logger
{
public:
    ~Logger(); // <- non-virtual destructor
};
 
class MyLogger : public Logger
{
public:
    ~MyLogger();
};
 
Logger *logger = new MyLogger();
delete logger; // formally undefined behavior; in practice only ~Logger() is invoked because of the pointer type, so the resource allocated by MyLogger will be leaked.
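
For contrast, here is a minimal sketch of the well-behaved case with a virtual destructor. The two global counters and the function CreateAndDestroyThroughBasePointer are hypothetical, added only so the destructor calls can be observed.

```cpp
// Counters used only to observe which destructors run (hypothetical names).
int logger_destroyed_count = 0;
int my_logger_destroyed_count = 0;

class Logger
{
public:
    virtual ~Logger()
    {
        ++logger_destroyed_count;
    }
};

class MyLogger : public Logger
{
public:
    ~MyLogger() override
    {
        ++my_logger_destroyed_count;
    }
};

void CreateAndDestroyThroughBasePointer()
{
    Logger *logger = new MyLogger();
    delete logger; // virtual destructor: ~MyLogger() runs first, then ~Logger()
}
```

After one call to CreateAndDestroyThroughBasePointer(), both counters are 1, which shows that the derived destructor is no longer skipped.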

4.7 Use override keyword instead of virtual keyword to override virtual method in sub-class.

class Logger
{
protected:
    virtual bool WriteLog(const char *msg) = 0;
};
 
// Bad
class MyLogger : public Logger
{
    virtual bool WriteLog(const char *msg);
};
 
// Good
class MyLogger : public Logger
{
    bool WriteLog(const char *msg) override;
};

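Why?
Both virtual and override are legal in a derived class when overriding a virtual method. With the virtual keyword, however, a mistyped method name or parameter type may accidentally create a new virtual method instead of overriding the existing one, for example: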

class Logger
{
protected:
    virtual bool WriteLog(const char *msg) = 0;
};
 
class MyLogger : public Logger
{
    // create a new virtual method instead of overriding the existing one
    // (method name does not match due to Write'l'og)
    virtual bool Writelog(const char *msg);
 
    // create a new virtual method instead of overriding the existing one
    // (parameter type does not match due to 'unsigned'char)
    virtual bool WriteLog(const unsigned char *msg);
};

With override, the compiler checks that the method really overrides a virtual method declared in a base class, and reports a compile error if it does not.
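
A compilable sketch of the same idea follows. Log, LastMsg, and _last_msg are hypothetical additions so the override can be exercised through the base class. The two commented-out declarations are the mistakes that override would turn into compile errors.

```cpp
#include <string>

class Logger
{
public:
    virtual ~Logger() = default;

    // Public entry point (hypothetical) that forwards to the virtual method.
    bool Log(const char *msg)
    {
        return WriteLog(msg);
    }

protected:
    virtual bool WriteLog(const char *msg) = 0;
};

class MyLogger : public Logger
{
public:
    const std::string &LastMsg() const
    {
        return _last_msg;
    }

protected:
    bool WriteLog(const char *msg) override
    {
        _last_msg = msg;
        return true;
    }

    // Either of these would be rejected by the compiler, because neither
    // overrides anything declared in Logger:
    // bool Writelog(const char *msg) override;
    // bool WriteLog(const unsigned char *msg) override;

private:
    std::string _last_msg;
};
```

Calling Log("hello") through a Logger reference bound to a MyLogger reaches MyLogger::WriteLog, so LastMsg() then returns "hello".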

4.8 Prefer using to #define and typedef to define a type alias

// Bad
#define LoggerList std::vector<Logger*>

// Bad
typedef std::vector<Logger*> LoggerList;

// Good
using LoggerList = std::vector<Logger*>;

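Why not use #define?
Because of how the preprocessor works, providing a type alias through #define can cause identically named tokens elsewhere in the code to be replaced by accident, for example: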

#define LoggerList std::vector<Logger*>

LoggerList logger_list; // OK => std::vector<Logger*> logger_list;

list<Logger*> LoggerList; // Failed => list<Logger*> std::vector<Logger*>;

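Why prefer using to typedef?
Whereas #define blindly replaces every token with the same name, type aliases created by typedef and using only take effect where a type name is syntactically expected. However, typedef syntax is easy to mix up: in a definition such as typedef MyType OtherType, it is not obvious whether MyType is an alias of OtherType or OtherType is an alias of MyType. By contrast, using OtherType = MyType makes it easier to see that OtherType is an alias of MyType (MyType exists first, then OtherType). In terms of readability, then, defining type aliases with using is friendlier to readers of the code.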
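
The readability gap tends to widen with compound types such as function pointers. The sketch below is only an illustration. LogFilterOld, LogFilter, IsNonEmptyMsg, and PassesFilter are hypothetical names, and the same function pointer alias is written both ways.

```cpp
#include <type_traits>

// typedef: the new name is buried in the middle of the declaration.
typedef bool (*LogFilterOld)(const char *log_msg);

// using: the new name is on the left, the aliased type on the right.
using LogFilter = bool (*)(const char *log_msg);

// Both spellings name exactly the same type.
static_assert(std::is_same<LogFilterOld, LogFilter>::value,
              "typedef and using define the same alias");

// Hypothetical filter: accepts only non-empty messages.
bool IsNonEmptyMsg(const char *log_msg)
{
    return (log_msg != nullptr) && (log_msg[0] != '\0');
}

// Hypothetical helper that takes the alias as a parameter type.
bool PassesFilter(LogFilter filter, const char *log_msg)
{
    return filter(log_msg);
}
```

PassesFilter(IsNonEmptyMsg, "hello") returns true and PassesFilter(IsNonEmptyMsg, "") returns false.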

4.9 Be careful with the side effect of logical and (&&) / or (||) operators.

// logical AND short-circuit evaluation - CheckAndDoTask1 will never be performed
if (AlwaysRetFalse() && CheckAndDoTask1())
{ ... }
 
// logical OR short-circuit evaluation - CheckAndDoTask2 will never be performed
if (AlwaysRetTrue() || CheckAndDoTask2())
{ ... }

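Side effect of logical AND
When calling other functions inside a logical AND (&&) expression, remember that if the left-hand side of && evaluates to false, the function on the right-hand side is not executed (once the left-hand side is false, the result of && is false regardless of the right-hand side). So avoid placing code that must always run inside a logical AND expression.

Side effect of logical OR
When calling other functions inside a logical OR (||) expression, remember that if the left-hand side of || evaluates to true, the function on the right-hand side is not executed (once the left-hand side is true, the result of || is true regardless of the right-hand side). So avoid placing code that must always run inside a logical OR expression.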
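
One way to keep a mandatory call from being skipped is to make the calls first and combine the results afterwards. The sketch below is one possible approach, not the only one. The counter task_run_count and the functions CheckAndDoTask, RunShortCircuited, and RunBothThenCombine are hypothetical names for this example.

```cpp
// Counts how many times the task actually ran (hypothetical, for observation).
int task_run_count = 0;

bool AlwaysRetFalse()
{
    return false;
}

bool CheckAndDoTask()
{
    ++task_run_count;
    return true;
}

// The task is skipped: && never evaluates its right-hand side here.
bool RunShortCircuited()
{
    return AlwaysRetFalse() && CheckAndDoTask();
}

// The task always runs: both calls are made before the results are combined.
bool RunBothThenCombine()
{
    bool precondition_ok = AlwaysRetFalse();
    bool task_ok = CheckAndDoTask();
    return precondition_ok && task_ok;
}
```

Both functions return false, but only RunBothThenCombine() increments task_run_count.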

4.10 Use additional curly braces to limit the scope of an RAII variable.

{
    unique_lock<mutex> lock(_msg_counter_mutex); // <- RAII lock acquired for _msg_counter_mutex here
    ++_msg_counter;
    _total_msg_length += log_msg.length();
} // <- RAII lock released here automatically once leaving its defining scope
 
{
    unique_lock<mutex> lock(_logs_mutex); // <- another RAII lock acquired for _logs_mutex here
    _pending_log_msgs.push(log_msg);
} // <- RAII lock released here automatically
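
To make the snippet above compilable, here is a sketch of a class that could contain it. The class name LogQueue and the accessors TotalMsgLength and PendingCount are assumptions. It also uses std::lock_guard instead of unique_lock, which is enough when the lock is simply held until the end of the scope.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>

class LogQueue
{
public:
    void Push(const std::string &log_msg)
    {
        {
            std::lock_guard<std::mutex> lock(_msg_counter_mutex);
            ++_msg_counter;
            _total_msg_length += log_msg.length();
        } // _msg_counter_mutex released here

        {
            std::lock_guard<std::mutex> lock(_logs_mutex);
            _pending_log_msgs.push(log_msg);
        } // _logs_mutex released here
    }

    std::size_t TotalMsgLength()
    {
        std::lock_guard<std::mutex> lock(_msg_counter_mutex);
        return _total_msg_length;
    }

    std::size_t PendingCount()
    {
        std::lock_guard<std::mutex> lock(_logs_mutex);
        return _pending_log_msgs.size();
    }

private:
    std::mutex _msg_counter_mutex;
    std::size_t _msg_counter = 0;
    std::size_t _total_msg_length = 0;

    std::mutex _logs_mutex;
    std::queue<std::string> _pending_log_msgs;
};
```

Because each lock lives in its own block, the two mutexes are never held at the same time, which keeps each critical section as short as possible.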

4.11 Use static method in base class to realize factory pattern.

enum class LoggerType
{
    CONSOLE,
    FILE
};

class Logger
{
public:
    static Logger *GetLogger(LoggerType type);
};

class ConsoleLogger : public Logger
{ ... };

class FileLogger : public Logger
{ ... };

// factory method of logger implementations
Logger *Logger::GetLogger(LoggerType type)
{
    switch (type)
    {
    case LoggerType::CONSOLE:
        return new ConsoleLogger();
    case LoggerType::FILE:
        return new FileLogger();
    default:
        return nullptr;
    }
}

Factory pattern
The factory pattern relies on a virtual interface defined in the base class together with object-oriented polymorphism.
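
A self-contained sketch of that combination follows. The virtual method Name() is a hypothetical interface added for illustration. Returning std::unique_ptr instead of a raw pointer is a substitution made here so the caller does not have to delete the logger manually; it is not part of the convention above.

```cpp
#include <memory>
#include <string>

enum class LoggerType
{
    CONSOLE,
    FILE
};

class Logger
{
public:
    virtual ~Logger() = default;

    // Virtual interface (hypothetical) that every implementation provides.
    virtual std::string Name() const = 0;

    static std::unique_ptr<Logger> GetLogger(LoggerType type);
};

class ConsoleLogger : public Logger
{
public:
    std::string Name() const override
    {
        return "console";
    }
};

class FileLogger : public Logger
{
public:
    std::string Name() const override
    {
        return "file";
    }
};

// Factory method: the caller only ever sees the Logger interface.
std::unique_ptr<Logger> Logger::GetLogger(LoggerType type)
{
    switch (type)
    {
    case LoggerType::CONSOLE:
        return std::unique_ptr<Logger>(new ConsoleLogger());
    case LoggerType::FILE:
        return std::unique_ptr<Logger>(new FileLogger());
    default:
        return nullptr;
    }
}
```

Logger::GetLogger(LoggerType::FILE)->Name() dispatches to FileLogger::Name() even though the caller holds only a pointer to Logger.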

4.12 Use static local variable in static method of a class to realize singleton pattern.

class ConsoleLogger
{
public:
    static ConsoleLogger &GetInstance();
};

// singleton getter method
ConsoleLogger &ConsoleLogger::GetInstance()
{
    static ConsoleLogger singleton;
    return singleton;
}

Singleton pattern
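Another common way to implement a singleton is to use a static or global pointer to record and manage the created instance. Its drawbacks: in a multithreaded environment you have to add your own synchronization (around creation and destruction of the object), you have to find a place where users can release the singleton object, and you have to rely on users remembering to release it. By contrast, with a singleton implemented as a static local variable, C++ itself (since C++11) guarantees that initialization of the static variable is handled properly in a multithreaded environment, and users bear no risk of a memory leak from forgetting to release the singleton object.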

4.12.1 Declare constructor and destructor with private access specifier.

class ConsoleLogger
{
private:
    ConsoleLogger(); // <- private constructor
    virtual ~ConsoleLogger(); // <- private destructor
 
public:
    static ConsoleLogger &GetInstance();
};

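Why?
When using the singleton pattern, declare both the constructor and the destructor of the class as private members. This prevents users from accidentally bypassing the singleton getter method and creating other instances themselves (via the constructor), and from accidentally destroying the singleton object (via the destructor).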

4.12.2 Remove default copy constructor and copy assignment operator.

class ConsoleLogger
{
private:
    ConsoleLogger();
    virtual ~ConsoleLogger();

public:
    ConsoleLogger(const ConsoleLogger&) = delete; // <- disable copy constructor
    ConsoleLogger& operator=(const ConsoleLogger&) = delete; // <- disable copy assignment

    static ConsoleLogger &GetInstance();
};

Why?
When using the singleton pattern, remove both the copy constructor and the copy assignment operator of the class, so that users cannot accidentally copy a new instance out of the singleton they obtained, for example:

ConsoleLogger &logger = ConsoleLogger::GetInstance(); // correct usage
ConsoleLogger logger1(ConsoleLogger::GetInstance()); // mistake by copy constructor
ConsoleLogger logger2 = ConsoleLogger::GetInstance(); // mistake: copy-initialization, which also invokes the copy constructor
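
Putting 4.12, 4.12.1, and 4.12.2 together, a complete singleton might look like the sketch below. WriteLog, WrittenCount, and _written_count are hypothetical members added only to show that every caller shares the same state. The static_assert lines check at compile time that the class cannot be copied.

```cpp
#include <type_traits>

class ConsoleLogger
{
public:
    ConsoleLogger(const ConsoleLogger&) = delete;            // no copy construction
    ConsoleLogger &operator=(const ConsoleLogger&) = delete; // no copy assignment

    static ConsoleLogger &GetInstance();

    // Hypothetical methods, present only to show that state is shared.
    void WriteLog(const char *log_msg)
    {
        (void)log_msg; // a real logger would print the message here
        ++_written_count;
    }

    int WrittenCount() const
    {
        return _written_count;
    }

private:
    ConsoleLogger() = default;  // only GetInstance() can construct
    ~ConsoleLogger() = default; // users cannot destroy the instance

    int _written_count = 0;
};

// singleton getter method
ConsoleLogger &ConsoleLogger::GetInstance()
{
    static ConsoleLogger singleton; // constructed once, on the first call
    return singleton;
}

static_assert(!std::is_copy_constructible<ConsoleLogger>::value,
              "singleton must not be copy constructible");
static_assert(!std::is_copy_assignable<ConsoleLogger>::value,
              "singleton must not be copy assignable");
```

Two references obtained from GetInstance() have the same address, so a WriteLog call made through one is visible through the other.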