Smart Game Format

From HexWiki
Revision as of 14:16, 8 January 2023 by Selinger (Talk | contribs)

The Smart Game Format (SGF) is a file format for game records of 2-player board games. It's a text-only, tree-based format that was originally designed for the game of Go, but has been adapted for a number of other games including Hex. Games stored in the SGF format can easily be emailed, posted or processed with text-based tools. SGF files usually use the filename extension .sgf.

The main purpose of SGF is to store records of completed games, and to provide features for storing games that have been analyzed and annotated (e.g., board markup, comments, and variations).

Description of the file format

SGF is a text-based file format (not a binary format). The format can be described in two steps:

1. Syntax rules. These govern how SGF files are parsed. At this level, the SGF format provides a very generic representation of abstract trees of key-value dictionaries. It can be used for many kinds of tree-like data and is not necessarily limited to game trees.

2. Semantic rules. These govern how specific keys and values should be interpreted, often in game-specific ways.

The official SGF specification mixes syntactic and semantic concepts; for example, it specifies how property values must be parsed depending on what kind of property they belong to. By contrast, here we give an (equivalent) description that strictly separates syntax from semantics. This allows the file format to be parsed without any semantic knowledge, and it allows semantic properties to be checked without any knowledge of parsing.

Lexical structure

When reading SGF, the text is first converted to a sequence of lexical tokens. There are 8 different kinds of token:

  • left parenthesis '('
  • right parenthesis ')'
  • semicolon ';'
  • left square bracket '['
  • right square bracket ']'
  • colon ':'
  • a property name, which is a sequence of one or more upper-case ASCII letters
  • a literal string, which is any sequence of:
    • any characters except ':', ']', and '\'
    • two-character escape sequences, which consist of '\' followed by any character

A literal string starts immediately after a '[' or ':' token, and extends until the next unescaped ']' or ':'. No characters except ':', ']', and '\' have special syntactic meanings in literal strings, and in particular, if '[', '(', ')', or ';' appear in a literal string, they are not interpreted as separate tokens. The escape character is '\', and any character following '\' is added to the literal string unchanged, even if that character is ':', ']', or '\'. The only exception is that if '\' is immediately followed by a newline, both are removed. Literal strings may contain whitespace characters, including newlines, and these are preserved. Whitespace is also preserved at the beginning or end of literal strings.

All of the tokens are expressed in the ASCII character set, except for literal string data, which can use any character set. Whitespace before or after tokens is ignored except when it is part of a literal string. Newlines can be encoded as NL, CR, CRNL, or NLCR. The interpretation of literal string data, including what kinds of strings can be used in specified contexts, is further defined by semantic rules, but plays no role in parsing.

In current applications, property names consist of one or two upper-case ASCII letters, and some legacy implementations may not recognize property names that are longer than 2 letters.
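The lexical rules above can be sketched as a small tokenizer. This is only an illustration, not a reference implementation: the function name tokenize and the token kinds "NAME" and "STRING" are made up for this example, and it assumes that a literal string token (possibly empty) is always emitted after '[' or ':'.

```python
def tokenize(text):
    """Yield (kind, value) pairs for SGF text (illustrative sketch)."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1                      # whitespace between tokens is ignored
        elif ch in "();]":
            yield (ch, ch)
            i += 1
        elif ch in "[:":
            yield (ch, ch)
            i += 1
            # A literal string starts immediately after '[' or ':' and
            # runs until the next unescaped ']' or ':'.
            chars = []
            while i < n and text[i] not in "]:":
                if text[i] == "\\":
                    i += 1
                    if i >= n:
                        raise ValueError("escape character at end of input")
                    if text[i] in "\r\n":
                        # Escaped newline: drop it, including the second
                        # character of a two-character newline.
                        if i + 1 < n and text[i + 1] in "\r\n" and text[i + 1] != text[i]:
                            i += 1
                        i += 1
                        continue
                chars.append(text[i])   # escaped characters are kept unchanged
                i += 1
            yield ("STRING", "".join(chars))
        elif "A" <= ch <= "Z":
            j = i
            while j < n and "A" <= text[j] <= "Z":
                j += 1
            yield ("NAME", text[i:j])   # one or more upper-case ASCII letters
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
```

Note that the tokenizer never looks at property names to decide how to read a value, which is the separation of syntax from semantics described earlier.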

Syntactic structure

An SGF file describes one or more finitely branching ordered trees. Moreover, each node of the tree holds a dictionary, which is a mapping from property names to certain kinds of structured values. We begin by describing the encoding of dictionaries.

Dictionaries

A tuple consists of the token '[', zero or more literal strings that are separated by ':', and the token ']'. Examples of tuples are:

[]
[value]
[value1:value2]
[value1:value2:value3]
[Values may be arbitrary strings of characters,
including newlines and   other whitespace. 
Be aware\: the characters '\:', '\]', and '\\' must be escaped.
Other characters \m\a\y be escaped but this is optional. Sequences
such as \n have no special meaning; this is just another way to 
write the letter n.
]

A binding consists of a property name followed by one or more tuples. Examples are:

FF[4]
AP[HexGui:0.9]
AB[a1][a2][a3]
C[This is a comment!]

Where a property name is followed by more than one tuple, the data is intended to be unordered. In other words, the following are two ways of expressing exactly the same data:

AB[a1][a2]
AB[a2][a1]

The data within each tuple is ordered. For example, the following are distinct:

AP[name:version]
AP[version:name]

The semantic rules place further restrictions on how many tuples are allowed after certain property names, and how many components are allowed in certain tuples.

A dictionary consists of zero or more bindings. The property names in any one dictionary must be distinct, and their ordering is not significant (they may and often will be reordered by an application). Example:

AP[HexGui:0.9]FF[4]GM[11]SZ[11]
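One possible in-memory representation that respects these ordering rules is sketched below. The function name make_dictionary and the choice of a frozenset are assumptions made for illustration; a frozenset also collapses repeated tuples, which may or may not be desirable in a real application.

```python
def make_dictionary(bindings):
    """Build a dictionary from (name, list_of_tuples) pairs.

    Each tuple is an ordered sequence of literal strings; the tuples
    belonging to one property name are stored as an unordered set.
    """
    result = {}
    for name, tuples in bindings:
        if name in result:
            # Property names in one dictionary must be distinct.
            raise ValueError(f"duplicate property name: {name}")
        result[name] = frozenset(tuple(t) for t in tuples)
    return result


# AB[a1][a2] and AB[a2][a1] compare equal; binding order does not matter either.
d1 = make_dictionary([("AB", [["a1"], ["a2"]]), ("FF", [["4"]])])
d2 = make_dictionary([("FF", [["4"]]), ("AB", [["a2"], ["a1"]])])
same = (d1 == d2)

# AP[name:version] and AP[version:name] remain distinct.
p1 = make_dictionary([("AP", [["name", "version"]])])
p2 = make_dictionary([("AP", [["version", "name"]])])
different = (p1 != p2)
```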

Tree structure

A node in the tree consists of the token ';' followed by a dictionary (remember that dictionaries can be empty). Here are some examples of nodes:

;AP[HexGui:0.9]FF[4]GM[11]SZ[11]
;B[i3]
;
;AB[f4][g2]PL[B]

A tree is given by the following grammar:

tree ::= '(' node+ tree* ')'

Here, node+ means a sequence of one or more nodes, and tree* means a sequence of zero or more trees. Trees are interpreted as follows: the tree

( node₁ node₂ node₃ ... nodeₙ tree₁ ... treeₖ )

has a single root node₁ with a single child node₂, which has a single child node₃ and so on until nodeₙ, which has k children tree₁ ... treeₖ:

Tree1.png

Note that it is possible that k = 0, in which case nodeₙ is a leaf; it is also possible that n = 1, in which case the entire tree is a leaf. Here are some examples:

Code Tree
(a) Tree2.png
(a b c) Tree3.png
(a (b (c))) Tree3.png
(a (b) (c)) Tree4.png
(a b (c) (d e (f (g) (h)) (i))) Tree5.png

Finally, an SGF file holds a sequence of one or more trees. (The idea is that a single file may hold more than one game record, each with its own root. However, in practice, most SGF files contain exactly one tree, and most software that reads SGF files will ignore all but the first tree.)

To conclude this section, here is an example of a syntactically (but not semantically) well-formed SGF file representing the tree

Tree5.png.

Each node has a dictionary with a single property NN holding the node's label.

(;NN[a];NN[b](;NN[c])(;NN[d];NN[e](;NN[f](;NN[g])(;NN[h]))(;NN[i])))
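As a rough sketch, the grammar above can be turned into a recursive-descent parser that needs no semantic knowledge. The names Node and parse_sgf are hypothetical, and the sketch assumes that [] is read as a tuple holding one empty string.

```python
class Node:
    def __init__(self, props):
        self.props = props        # property name -> list of tuples of strings
        self.children = []        # child nodes, in file order


def parse_sgf(text):
    """Parse SGF text and return the list of root nodes, one per tree."""
    pos = 0

    def peek():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1              # whitespace between tokens is ignored
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def parse_tuple():
        nonlocal pos
        expect("[")
        parts, chars = [], []
        while True:
            if pos >= len(text):
                raise ValueError("unterminated tuple")
            ch = text[pos]
            pos += 1
            if ch == "\\":
                if pos >= len(text):
                    raise ValueError("escape character at end of input")
                nxt = text[pos]
                pos += 1
                if nxt in "\r\n":     # escaped newline is removed
                    if pos < len(text) and text[pos] in "\r\n" and text[pos] != nxt:
                        pos += 1
                else:
                    chars.append(nxt)
            elif ch == ":":
                parts.append("".join(chars))
                chars = []
            elif ch == "]":
                parts.append("".join(chars))
                return tuple(parts)
            else:
                chars.append(ch)

    def parse_node():
        nonlocal pos
        expect(";")
        props = {}
        while "A" <= peek() <= "Z":
            start = pos
            while pos < len(text) and "A" <= text[pos] <= "Z":
                pos += 1
            name = text[start:pos]
            if name in props:
                raise ValueError(f"duplicate property name: {name}")
            values = [parse_tuple()]  # a binding has one or more tuples
            while peek() == "[":
                values.append(parse_tuple())
            props[name] = values
        return Node(props)

    def parse_tree():
        # tree ::= '(' node+ tree* ')'
        expect("(")
        first = last = parse_node()
        while peek() == ";":
            nxt = parse_node()
            last.children.append(nxt)
            last = nxt
        while peek() == "(":
            last.children.append(parse_tree())
        expect(")")
        return first

    trees = [parse_tree()]
    while peek() == "(":
        trees.append(parse_tree())
    if peek():
        raise ValueError(f"unexpected character at offset {pos}")
    return trees
```

Applied to the example file above, parse_sgf returns a single root labeled a, whose only child b has the two children c and d.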

Semantic rules

Each property accepts specific types of values that are described below. Some properties may only appear in root nodes, and other properties may appear in any node. Some properties are mutually exclusive, i.e., cannot be used together in the same node.

A tuple with 1 component, such as [11], is referred to as a simple value, and a tuple with more than 1 component, such as [name:version], is referred to as a composite value. The SGF format does not permit composite values with more than 2 components.

As a special case, if a property requires a simple value, but a composite value is specified, it is converted to a simple value by concatenating all of its literal strings into a single string separated by ':'. This is because the SGF specification stipulates that ':' may appear unescaped in literal strings for properties whose semantics expects a simple value. Since semantic information is not available at parsing time, we re-construct such values during semantic interpretation.
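A minimal sketch of this reconstruction step, assuming tuples are represented as sequences of already-unescaped strings (the function name as_simple_value is made up for this example):

```python
def as_simple_value(components):
    """Rebuild a simple value from a tuple that was split at unescaped ':'."""
    return ":".join(components)


# C[Note: good move] parses as a tuple with two components; since C expects
# a simple value, the components are joined back into one string.
comment = as_simple_value(("Note", " good move"))
```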

Some common value types are:

  • Number. Example: [11].
  • Point. This is the name of a cell on the Hex board. Following standard Hex conventions, a cell is named by a column label (one or more letters) followed by a row label (one or more digits). If there are more than 26 columns, they are labeled by base-26 alphabet numbers, i.e., the next columns after 'z' are 'aa', 'ab', 'ac', etc. Cell names are case insensitive. Examples: [a1], [f6], [ab28].
  • Move. This is either a point, or one of the special moves 'swap-sides', 'swap-pieces', 'pass', 'resign', 'forfeit'.
  • Text. This is arbitrary text, except that all whitespace characters other than newlines (example: tab, page break) are converted to spaces.
  • Simpletext. This is arbitrary text, except that all whitespace characters including newlines are converted to spaces.
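The point, text, and simpletext conventions could be handled along the following lines. This is a sketch only: the function names are hypothetical, coordinates are returned as 1-based (column, row) pairs, and treating a two-character newline as a single newline in simpletext is an assumption.

```python
import re

_POINT = re.compile(r"([a-z]+)([0-9]+)")


def parse_point(s):
    """Convert a cell name such as 'f6' or 'ab28' to (column, row), 1-based."""
    m = _POINT.fullmatch(s.lower())          # cell names are case insensitive
    if m is None:
        raise ValueError(f"not a valid cell name: {s!r}")
    col = 0
    for ch in m.group(1):
        # 'a'..'z' are 1..26, then 'aa' = 27, 'ab' = 28, and so on.
        col = col * 26 + (ord(ch) - ord("a") + 1)
    return col, int(m.group(2))


def normalize_text(s):
    """Text: whitespace other than newlines becomes a space."""
    return re.sub(r"[^\S\r\n]", " ", s)


def normalize_simpletext(s):
    """Simpletext: all whitespace, including newlines, becomes a space."""
    # Assumption: a two-character newline counts as one newline.
    return re.sub(r"\r\n|\n\r|\s", " ", s)
```

For example, parse_point("ab28") yields column 28 and row 28.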

The SGF format defines a large number of property names, but many are rarely used, not relevant to Hex, or not supported by current software. We only list the most common properties. Others can be found in the official SGF specifications.

Users and applications are permitted to define their own private property names, as long as they do not clash with existing ones. A useful convention is for private property names to start with 'X'. Applications that read SGF files should ignore property names that they do not know about, and if possible, should preserve them (i.e., when writing the same file again).

In SGF, the players are always called B and W (black and white), regardless of which actual player colors were used in the original game. See conventions for more information on player colors and cell numbering.

Root properties

The following properties may appear at root nodes. They describe global attributes of the game, such as its board size.

  • AP. Value: composite name : version. This identifies the name and version of the software application that generated the SGF file. Example: AP[HexGui:0.10].
  • FF. Value: integer. This identifies the version of the SGF file format, currently 4. Example: FF[4].
  • GM. Value: integer. This identifies the game. The value for Hex is 11. Example: GM[11].
  • SZ. Value: integer, or composite integer : integer. This identifies the board size. For non-square boards, the number of columns is given before the number of rows, e.g. SZ[6:7] for a board with 6 columns and 7 rows (i.e., the distance between the white edges is smaller than the distance between the black edges). If the number of rows and columns is equal, it must be given as a single integer, e.g. SZ[11].
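As an illustration of the SZ rule, the following sketch converts an SZ tuple into a (columns, rows) pair. The function name parse_size and the strict flag are assumptions for this example; a lenient reader might prefer to accept values such as SZ[7:7].

```python
def parse_size(components, strict=True):
    """Interpret the components of an SZ tuple as (columns, rows)."""
    if len(components) == 1:
        n = int(components[0])               # square board, e.g. SZ[11]
        cols, rows = n, n
    elif len(components) == 2:
        cols, rows = int(components[0]), int(components[1])  # e.g. SZ[6:7]
        if strict and cols == rows:
            # Square boards must be written as a single integer.
            raise ValueError("square board size must be a single integer")
    else:
        raise ValueError("SZ takes one or two integers")
    if cols < 1 or rows < 1:
        raise ValueError("board dimensions must be positive")
    return cols, rows
```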

The following properties are not currently supported by HexGui:

  • PB, PW. Value: simpletext. The name of the black player and white player, respectively. Example: PB[Bill LeBoeuf].
  • RE. Value: simpletext. The result of the game. If given, it must be one of the following: 'B+' or 'W+' for a black or white win, respectively, 'Void' for no result (such as suspended play), '?' for an unknown result. Optionally, the method of winning can be specified after '+', as follows: 'B+R', 'B+Resign', 'W+R', or 'W+Resign' for win by resigning, 'B+T', 'B+Time', 'W+T', 'W+Time' for a win on time, 'B+F', 'B+Forfeit', 'W+F', or 'W+Forfeit' for a win by forfeit.
  • DT. Value: date. The date on which the game was played, in the format 'YYYY-MM-DD'. There is support for partial dates and date ranges; see the official specifications site for details.
  • EV. Value: simpletext. The name of the event, e.g., tournament. Example: EV[2022 Mind Sports Olympiad]
  • GC. Value: text. Background information on the game, or a summary of the game itself. This free-form text is not usually interpreted by software.
  • SO. Value: simpletext. The source of the game (e.g., book). This can be used to identify the website and table number for games played online. Example: SO[BGA 123456789]
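The permitted RE values listed above are regular enough to be checked with a single pattern. The following is a sketch covering only the values named in this article; the function name is_valid_result is hypothetical.

```python
import re

# 'B+' or 'W+', optionally followed by the method of winning,
# or 'Void' for no result, or '?' for an unknown result.
_RESULT = re.compile(r"[BW]\+(R|Resign|T|Time|F|Forfeit)?|Void|\?")


def is_valid_result(value):
    """Return True if value is one of the RE values described above."""
    return _RESULT.fullmatch(value) is not None
```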

Node properties

The following properties may appear at any node in a game tree. They describe attributes of the particular move or node. There are two kinds of nodes: move nodes and setup nodes. A move node holds a single move by one player, including special moves such as 'swap-sides' or 'resign'. A setup node exists to set up a board position, for example, a special starting position for a game or puzzle, or a position that is used to explain some point in a game comment. A node is a setup node if it does not contain the property B or W. The root node is always a setup node.

  • B, W. Value: move. A move by the black, respectively white, player. There can be at most one B or W property at a given node. Examples: B[a3], W[swap-pieces], B[resign]. A node that has no B or W property is a setup node.
  • AB, AW, AE. Value: list of cells. These properties cannot be combined with the B or W properties. In other words, they are only permitted in setup nodes (including the root node of the game). The values of AB, AW, and AE are lists of cells to be occupied by black, white, or empty, respectively. The cell contents overwrite whatever was there before. In particular, AE can be used to empty a previously occupied cell. A setup node usually also has a PL property to define whose turn it is. Example: AB[e7][e8][e9]AW[a6][b6]AE[f8][g6][g7]PL[B].
  • PL. Value: 'B' or 'W'. Sets the player whose turn it is after the current move or setup. This is especially useful in conjunction with setup nodes, but can also be used for move nodes, say in certain handicap situations where a player gets two moves in a row. The SGF format does not enforce that moves are alternating, nor that the player who makes the next move is actually the player whose turn it is. The PL property is mostly used as a display hint, for example, to set the color of the cursor, or to tell the user of a Hex puzzle whose turn it is.
  • C. Value: text. A human-readable comment for the given node. Comments are free-form, but it is good style to avoid referring to physical board directions since it is not known how the board is oriented for the viewer. So instead of "left edge" or "bottom right corner", it might be better to refer to the "A-edge" or the "k11 corner". It probably makes sense to refer to the players as Black and White, regardless of what the original game colors were, since most SGF viewers tend to use black and white. Example: C[White is already connected to the K-edge.]
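The distinction between move nodes and setup nodes might be applied to a board as in the sketch below. Several simplifications are assumed: a node is a plain dict from property names to lists of simple string values, the board is a dict from lower-case cell names to 'B' or 'W', "at most one B or W property" is read as forbidding both in the same node, and the special moves are recognized but leave the board unchanged because their effect is not described here.

```python
SPECIAL_MOVES = {"swap-sides", "swap-pieces", "pass", "resign", "forfeit"}


def is_setup_node(props):
    """A node is a setup node if it contains neither B nor W."""
    return "B" not in props and "W" not in props


def apply_node(board, props):
    """Return a new board after applying one node (illustrative only)."""
    new_board = dict(board)
    if is_setup_node(props):
        for name, colour in (("AB", "B"), ("AW", "W")):
            for cell in props.get(name, []):
                new_board[cell.lower()] = colour   # overwrites previous contents
        for cell in props.get("AE", []):
            new_board.pop(cell.lower(), None)      # empty the cell
        return new_board
    if any(name in props for name in ("AB", "AW", "AE")):
        raise ValueError("AB, AW, AE cannot be combined with B or W")
    if "B" in props and "W" in props:
        raise ValueError("a node may hold at most one move")
    colour = "B" if "B" in props else "W"
    move = props[colour][0]
    if move not in SPECIAL_MOVES:
        new_board[move.lower()] = colour           # ordinary move: place a stone
    return new_board
```

Since SGF does not enforce alternating moves, the sketch does not check whose turn it is.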

The following property is only partially supported by HexGui:

  • LB. Value: list of composite cell : simpletext. This assigns (preferably short) labels to cells. Example: LB[a1:x][a2:y][a3:z]

Example

Here is a small but complete example of a game with two branches, some comments, and a setup node:

(;AP[HexGui:0.10.GIT]FF[4]GM[11]SZ[7]C[Example game]
 ;B[c5]C[This opening is too strong. White will definitely swap it.]
 ;W[swap-pieces];B[c4];W[c5];B[a6];W[c6]C[Good.]
 ;B[a7];W[b5];B[a5];W[b3]
 (;B[d2]C[See the next variation for what happens if Black plays b4.]
  ;W[b4];B[d4];W[e5];B[resign];)
 (;B[b4];W[d2]
  ;AB[a2][b2][c1][d1][d4][d5][e1][e5][f1][f5][g5]
  C[Note that White is connected by templates, requiring only the area shown.])
)

External links

You can find more info in the Official Specifications Site.