Arena epd file

TomLynch
Posts: 7
Joined: Sun May 01, 2011 4:45 am
Real Name: Tom Lynch

Arena epd file

Post by TomLynch » Fri Jan 20, 2012 3:34 am

I need to create an epd file in Arena so a number of positions can be loaded into epd. (The object is to have the engine take each position one move forward.) First step calls for originating an arbitrary file with .epd at its end. How is this done please?
I would then like to enter epd and work exclusively with game board(s) as opposed to what I believe are called FEN strings. I've seen epd referred to as an alternative to algebraic notation. Thanks.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: Arena epd file

Post by User923005 » Wed Feb 22, 2012 6:10 am

TomLynch wrote:I need to create an epd file in Arena so a number of positions can be loaded into epd. (The object is to have the engine take each position one move forward.) First step calls for originating an arbitrary file with .epd at its end. How is this done please?
This is from the PGN standard by Steven J. Edwards:
16.2: EPD

EPD is "Extended Position Description"; it is a standard for describing chess
positions along with an extended set of structured attribute values using the
ASCII character set. It is intended for data and command interchange among
chessplaying programs. It is also intended for the representation of portable
opening library repositories.

A single EPD uses one text line of variable length composed of four data fields
followed by zero or more operations. The four fields of the EPD specification
are the same as the first four fields of the FEN specification.

A text file composed exclusively of EPD data records should have a file name
with the suffix ".epd".
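
As an illustration (not part of the standard's text): an EPD file is only a plain text file whose name ends in ".epd", with one record per line, so any text editor or short script can create it. Below is a minimal Python sketch. The names write_epd, read_epd and positions.epd are hypothetical, and the two records are made up for the example. How Arena then loads the file is not covered here.

```python
from pathlib import Path

# Two illustrative EPD records: four position fields, then operations.
RECORDS = [
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - id "demo.001";',
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 id "demo.002";',
]


def write_epd(path, records):
    """Write one EPD record per line to a plain ASCII text file."""
    Path(path).write_text("\n".join(records) + "\n", encoding="ascii")


def read_epd(path):
    """Return the non-empty lines (records) of an EPD file."""
    text = Path(path).read_text(encoding="ascii")
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    write_epd("positions.epd", RECORDS)
    print(len(read_epd("positions.epd")), "records written")
```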


16.2.1: History

EPD is based in part on the earlier FEN standard; it has added extensions for
use with opening library preparation and also for general data and command
interchange among advanced chess programs. EPD was developed by John Stanback
and Steven Edwards; its first implementation is in Stanback's master strength
chessplaying program Zarkov.


16.2.2: Uses for an extended position notation

Like FEN, EPD can also be used for general position description. However,
unlike FEN, EPD is designed to be expandable by the addition of new operations
that provide new functionality as needs arise.

Many interesting chess problem sets represented using EPD can be found at the
chess.uoknor.edu ftp site in the directory pub/chess/SAN_testsuites.


16.2.3: Data fields

EPD specifies the piece placement, the active color, the castling availability,
and the en passant target square of a position. These can all fit on a single
text line in an easily read format. The length of an EPD position description
varies somewhat according to the position and any associated operations. In
some cases, the description could be eighty or more characters in length and so
may not fit conveniently on some displays. However, most EPD descriptions pass
among programs only and these are not usually seen by program users.

(Note: due to the likelihood of future expansion of EPD, implementors are
encouraged to have their programs handle EPD text lines of up to 1024
characters long.)

Each EPD data field is composed only of non-blank printing ASCII characters.
Adjacent data fields are separated by a single ASCII space character.


16.2.3.1: Piece placement data

The first field represents the placement of the pieces on the board. The board
contents are specified starting with the eighth rank and ending with the first
rank. For each rank, the squares are specified from file a to file h. White
pieces are identified by uppercase SAN piece letters ("PNBRQK") and black
pieces are identified by lowercase SAN piece letters ("pnbrqk"). Empty squares
are represented by the digits one through eight; the digit used represents the
count of contiguous empty squares along a rank. A solidus character "/" is
used to separate data of adjacent ranks.
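
As an illustration (not part of the standard's text), the piece placement rules can be sketched in Python as a function that expands the first field into eight rows of eight squares. The name expand_placement and the use of "." for an empty square are assumptions made for this example.

```python
def expand_placement(field):
    """Expand an EPD piece placement field into eight 8-character rows.

    Rows run from rank 8 down to rank 1, files a to h; '.' is an empty square.
    """
    ranks = field.split("/")
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks, got %d" % len(ranks))
    rows = []
    for rank in ranks:
        row = ""
        for ch in rank:
            if ch in "12345678":
                row += "." * int(ch)      # a digit is a run of empty squares
            elif ch in "PNBRQKpnbrqk":
                row += ch                 # uppercase White, lowercase Black
            else:
                raise ValueError("unexpected character %r" % ch)
        if len(row) != 8:
            raise ValueError("rank %r does not describe 8 squares" % rank)
        rows.append(row)
    return rows
```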


16.2.3.2: Active color

The second field represents the active color. A lower case "w" is used if
White is to move; a lower case "b" is used if Black is the active player.


16.2.3.3: Castling availability

The third field represents castling availability. This indicates potential
future castling that may or may not be possible at the moment due to blocking
pieces or enemy attacks. If there is no castling availability for either side,
the single character symbol "-" is used. Otherwise, a combination of one to
four characters is present. If White has kingside castling availability,
the uppercase letter "K" appears. If White has queenside castling
availability, the uppercase letter "Q" appears. If Black has kingside castling
availability, the lowercase letter "k" appears. If Black has queenside
castling availability, then the lowercase letter "q" appears. Those letters
which appear will be ordered first uppercase before lowercase and second
kingside before queenside. There is no white space between the letters.
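
As an illustration (not part of the standard's text), the ordering rule can be sketched in Python. The function names castling_field and is_valid_castling are hypothetical.

```python
def castling_field(white_k, white_q, black_k, black_q):
    """Build the castling field from four booleans, in the order K, Q, k, q."""
    flags = [("K", white_k), ("Q", white_q), ("k", black_k), ("q", black_q)]
    text = "".join(letter for letter, available in flags if available)
    return text or "-"


def is_valid_castling(field):
    """Check that a castling field is '-' or a correctly ordered subset of KQkq."""
    if field == "-":
        return True
    positions = ["KQkq".find(ch) for ch in field]
    # Every letter must be known, and positions must be strictly increasing.
    return bool(field) and -1 not in positions and positions == sorted(set(positions))
```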


16.2.3.4: En passant target square

The fourth field is the en passant target square. If there is no en passant
target square then the single character symbol "-" appears. If there is an en
passant target square then it is represented by a lowercase file character
immediately followed by a rank digit. Obviously, the rank digit will be "3"
following a white pawn double advance (Black is the active color) or else be
the digit "6" after a black pawn double advance (White being the active color).

An en passant target square is given if and only if the last move was a pawn
advance of two squares. Therefore, an en passant target square field may have
a square name even if there is no pawn of the opposing side that may
immediately execute the en passant capture.
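
As an illustration (not part of the standard's text), a minimal Python check of the fourth field against the active color might look like this. The name is_valid_ep_target is hypothetical. The check is purely syntactic and does not look at the board.

```python
import re


def is_valid_ep_target(ep_field, active_color):
    """Check the en passant field against the active color ('w' or 'b')."""
    if ep_field == "-":
        return True
    if not re.fullmatch(r"[a-h][36]", ep_field):
        return False
    # Rank 3 follows a white double advance (Black to move);
    # rank 6 follows a black double advance (White to move).
    expected_rank = "6" if active_color == "w" else "3"
    return ep_field[1] == expected_rank
```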


16.2.4: Operations

An EPD operation is composed of an opcode followed by zero or more operands and
is concluded by a semicolon.

Multiple operations are separated by a single space character. If there is at
least one operation present in an EPD line, it is separated from the last
(fourth) data field by a single space character.


16.2.4.1: General format

An opcode is an identifier that starts with a letter character and may be
followed by up to fourteen more characters. Each additional character may be a
letter or a digit or the underscore character.

An operand is either a set of contiguous non-white space printing characters or
a string. A string is a set of contiguous printing characters delimited by a
quote character at each end. A string value must have less than 256 bytes of
data.

If at least one operand is present in an operation, there is a single space
between the opcode and the first operand. If more than one operand is present
in an operation, there is a single blank character between every two adjacent
operands. If there are no operands, a semicolon character is appended to the
opcode to mark the end of the operation. If any operands appear, the last
operand has an appended semicolon that marks the end of the operation.
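
As an illustration (not part of the standard's text), the record layout described so far can be sketched as a small Python parser. The name parse_epd and the returned structure (a list of four fields plus a dict from opcode to operand list) are assumptions made for this example. String operands are returned without their quotes, and no escape mechanism inside strings is assumed.

```python
def parse_epd(line):
    """Split one EPD record into its four data fields and its operations."""
    parts = line.strip().split(" ", 4)
    if len(parts) < 4:
        raise ValueError("an EPD record needs four data fields")
    fields = parts[:4]
    rest = parts[4] if len(parts) == 5 else ""

    operations = {}
    tokens, current, in_string = [], "", False
    for ch in rest:
        if ch == '"':
            in_string = not in_string     # quotes delimit a string operand
            current += ch
        elif ch == ";" and not in_string:
            if current:
                tokens.append(current)
            if tokens:                    # first token is the opcode
                operations[tokens[0]] = [t.strip('"') for t in tokens[1:]]
            tokens, current = [], ""
        elif ch == " " and not in_string:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    return fields, operations
```

For example, parsing a record that ends in `bm Bb5 Bc4; id "demo; 1"; noop;` yields the operations `{"bm": ["Bb5", "Bc4"], "id": ["demo; 1"], "noop": []}`.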

Any given opcode appears at most once per EPD record. Multiple operations in a
single EPD record should appear in ASCII order of their opcode names
(mnemonics). However, a program reading EPD records may allow for operations
not in ASCII order by opcode mnemonics; the semantics are the same in either
case.

Some opcodes that allow for more than one operand may have special ordering
requirements for the operands. For example, the "pv" (predicted variation)
opcode requires its operands (moves) to appear in the order in which they would
be played. All other opcodes that allow for more than one operand should have
operands appearing in ASCII order. An example of the latter set is the "bm"
(best move[s]) opcode; its operands are moves that are all immediately playable
from the current position.
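
As an illustration (not part of the standard's text), the two ordering rules can be sketched in Python. The name format_operations is hypothetical, and operands are assumed to be already formatted text (quotes included for strings). Python's default string ordering matches ASCII order for ASCII text. Treating "tcri" and "tcsi" as order dependent follows their descriptions in the opcode list.

```python
# Opcodes whose operands must keep the order chosen by the EPD writer.
ORDER_DEPENDENT = {"pv", "tcri", "tcsi"}


def format_operations(operations):
    """Serialize an opcode -> operand-list mapping in ASCII opcode order."""
    chunks = []
    for opcode in sorted(operations):
        operands = list(operations[opcode])
        if opcode not in ORDER_DEPENDENT:
            operands.sort()               # e.g. "bm" moves in ASCII order
        chunks.append(" ".join([opcode] + operands) + ";")
    return " ".join(chunks)
```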

Some opcodes require one or more operands that are chess moves. These moves
should be represented using SAN. If a different representation is used, there
is no guarantee that the EPD will be read correctly during subsequent
processing.

Some opcodes require one or more operands that are integers. Some opcodes may
require that an integer operand must be within a given range; the details are
described in the opcode list given below. A negative integer is formed with a
hyphen (minus sign) preceding the integer digit sequence. An optional plus
sign may be used for indicating a non-negative value, but such use is not
required and is indeed discouraged.

Some opcodes require one or more operands that are floating point numbers.
Some opcodes may require that a floating point operand must be within a given
range; the details are described in the opcode list given below. A floating
point operand is constructed from an optional sign character ("+" or "-"), a
digit sequence (with at least one digit), a radix point (always "."), and a
final digit sequence (with at least one digit).
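
As an illustration (not part of the standard's text), the integer and floating point operand syntax can be expressed as two regular expressions in Python. The name classify_operand is hypothetical.

```python
import re

# Optional sign, then at least one digit.
INTEGER = re.compile(r"[+-]?[0-9]+")
# Optional sign, digits, a "." radix point, digits (both digit runs required).
FLOAT = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def classify_operand(text):
    """Return 'integer', 'float' or 'other' for an unquoted operand."""
    if INTEGER.fullmatch(text):
        return "integer"
    if FLOAT.fullmatch(text):
        return "float"
    return "other"
```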


16.2.4.2: Opcode mnemonics

An opcode mnemonic used for archival storage and for interprogram communication
starts with a lower case letter and is composed of only lower case letters,
digits, and the underscore character (i.e., no upper case letters). These
mnemonics will also all be at least two characters in length.

Opcode mnemonics used only by a single program or an experimental suite of
programs should start with an upper case letter. This is so they may be easily
distinguished should they inadvertently be encountered by other programs.
When such a "private" opcode is demonstrated to be widely useful, it should
be brought into the official list (appearing below) in a lower case form.

If a given program does not recognize a particular opcode, that operation is
simply ignored; it is not signaled as an error.
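
As an illustration (not part of the standard's text), the naming rules can be sketched in Python. The name classify_opcode and the labels it returns are assumptions. The label "nonstandard" covers identifiers that are syntactically valid opcodes but fit neither the archival nor the private convention.

```python
import re

# A letter followed by up to fourteen letters, digits or underscores.
OPCODE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,14}")
# Archival form: lower case start, no upper case, at least two characters.
ARCHIVAL = re.compile(r"[a-z][a-z0-9_]{1,14}")


def classify_opcode(name):
    """Return 'archival', 'private', 'nonstandard' or 'invalid'."""
    if not OPCODE.fullmatch(name):
        return "invalid"
    if ARCHIVAL.fullmatch(name):
        return "archival"
    if name[0].isupper():
        return "private"
    return "nonstandard"
```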


16.2.5: Opcode list

The opcodes are listed here in ASCII order of their mnemonics. Suggestions for
new opcodes should be sent to the PGN standard coordinator listed near the
start of this document.


16.2.5.1: Opcode "acn": analysis count: nodes

The opcode "acn" takes a single non-negative integer operand. It is used to
represent the number of nodes examined in an analysis. Note that the value may
be quite large for some extended searches and so use of (at least) a long (four
byte) representation is suggested.


16.2.5.2: Opcode "acs": analysis count: seconds

The opcode "acs" takes a single non-negative integer operand. It is used to
represent the number of seconds used for an analysis. Note that the value may
be quite large for some extended searches and so use of (at least) a long (four
byte) representation is suggested.


16.2.5.3: Opcode "am": avoid move(s)

The opcode "am" indicates a set of zero or more moves, all immediately playable
from the current position, that are to be avoided in the opinion of the EPD
writer. Each operand is a SAN move; they appear in ASCII order.


16.2.5.4: Opcode "bm": best move(s)

The opcode "bm" indicates a set of zero or more moves, all immediately playable
from the current position, that are judged to be the best available by the EPD
writer. Each operand is a SAN move; they appear in ASCII order.


16.2.5.5: Opcode "c0": comment (primary, also "c1" through "c9")

The opcode "c0" (lower case letter "c", digit character zero) indicates a top
level comment that applies to the given position. It is the first of ten
ranked comments, each of which has a mnemonic formed from the lower case letter
"c" followed by a single decimal digit. Each of these opcodes takes either a
single string operand or no operand at all.

This ten member comment family of opcodes is intended for use as descriptive
commentary for a complete game or game fragment. The usual processing of these
opcodes is as follows:

1) At the beginning of a game (or game fragment), a move sequence scanning
program initializes each element of its set of ten comment string registers to
be null.

2) As the EPD record for each position in the game is processed, the comment
operations are interpreted from left to right. (Actually, all operations in an
EPD record are interpreted from left to right.) Because operations appear in
ASCII order according to their opcode mnemonics, opcode "c0" (if present) will
be handled prior to all other opcodes, then opcode "c1" (if present), and so
forth until opcode "c9" (if present).

3) The processing of opcode "cN" (0 <= N <= 9) involves two steps. First, all
comment string registers with an index equal to or greater than N are set to
null. (This is the set "cN" through "c9".) Second, and only if a string
operand is present, the value of the corresponding comment string register is
set equal to the string operand.
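
As an illustration (not part of the standard's text), the three steps above can be sketched in Python. The name apply_register_ops is hypothetical, operations are assumed to be a dict from opcode to operand list, and None stands in for a null register. The prefix parameter is an assumption added so that the same sketch can serve a second ten-member family with identical rules.

```python
def apply_register_ops(registers, operations, prefix="c"):
    """Update a list of ten string registers from one EPD record.

    registers  -- list of ten entries, None meaning null (modified in place)
    operations -- dict mapping opcode to its list of operands
    """
    for n in range(10):                   # c0 is handled first, then c1, ...
        opcode = "%s%d" % (prefix, n)
        if opcode not in operations:
            continue
        for i in range(n, 10):            # step one: clear registers N..9
            registers[i] = None
        if operations[opcode]:            # step two: store operand if present
            registers[n] = operations[opcode][0]
    return registers
```

Step 1 corresponds to starting each game with `registers = [None] * 10`.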


16.2.5.6: Opcode "ce": centipawn evaluation

The opcode "ce" indicates the evaluation of the indicated position in centipawn
units. It takes a single operand, an optionally signed integer that gives an
evaluation of the position from the viewpoint of the active player; i.e., the
player with the move. Positive values indicate a position favorable to the
moving player while negative values indicate a position favorable to the
passive player; i.e., the player without the move. A centipawn evaluation
value close to zero indicates a neutral positional evaluation.

Values are restricted to integers that are equal to or greater than -32767 and
are less than or equal to 32766.

A value greater than 32000 indicates the availability of a forced mate to the
active player. The number of plies until mate is given by subtracting the
evaluation from the value 32767. Thus, a winning mate in N fullmoves is a mate
in ((2 * N) - 1) halfmoves (or ply) and has a corresponding centipawn
evaluation of (32767 - ((2 * N) - 1)). For example, a mate on the move (mate
in one) has a centipawn evaluation of 32766 while a mate in five has a
centipawn evaluation of 32758.

A value less than -32000 indicates the availability of a forced mate to the
passive player. The number of plies until mate is given by subtracting the
evaluation from the value -32767 and then negating the result. Thus, a losing
mate in N fullmoves is a mate in (2 * N) halfmoves (or ply) and has a
corresponding centipawn evaluation of (-32767 + (2 * N)). For example, a mate
after the move (losing mate in one) has a centipawn evaluation of -32765 while
a losing mate in five has a centipawn evaluation of -32757.

A value of -32767 indicates an illegal position. A stalemate position has a
centipawn evaluation of zero as does a position drawn due to insufficient
mating material. Any other position known to be a certain forced draw also has
a centipawn evaluation of zero.
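
As an illustration (not part of the standard's text), the mate arithmetic above can be sketched in Python. The names mate_to_ce and ce_to_mate_plies are hypothetical, as is the convention of returning a negative ply count when the active player is the one being mated.

```python
def mate_to_ce(fullmoves, winning=True):
    """Centipawn value for a forced mate in the given number of fullmoves."""
    if fullmoves < 1:
        raise ValueError("fullmove count must be positive")
    if winning:
        return 32767 - (2 * fullmoves - 1)   # active player mates
    return -32767 + 2 * fullmoves            # active player gets mated


def ce_to_mate_plies(ce):
    """Plies until mate: positive if the active player mates, negative if
    the active player is mated, None if ce is not a mate score."""
    if ce > 32000:
        return 32767 - ce
    if ce < -32000 and ce != -32767:         # -32767 marks an illegal position
        return -(32767 + ce)
    return None
```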


16.2.5.7: Opcode "dm": direct mate fullmove count

The "dm" opcode is used to indicate the number of fullmoves until checkmate is
to be delivered by the active color for the indicated position. It always
takes a single operand which is a positive integer giving the fullmove count.
For example, a position known to be a "mate in three" would have an operation
of "dm 3;" to indicate this.

This opcode is intended for use with problem sets composed of positions
requiring direct mate answers as solutions.


16.2.5.8: Opcode "draw_accept": accept a draw offer

The opcode "draw_accept" is used to indicate that a draw offer made after the
move that led to the indicated position is accepted by the active player.
This opcode takes no operands.


16.2.5.9: Opcode "draw_claim": claim a draw

The opcode "draw_claim" is used to indicate a claim by the active player that a
draw exists. The draw is claimed because of a third time repetition or because
of the fifty move rule or because of insufficient mating material. A supplied
move (see the opcode "sm") is also required to appear as part of the same EPD
record. The draw_claim opcode takes no operands.


16.2.5.10: Opcode "draw_offer": offer a draw

The opcode "draw_offer" is used to indicate that a draw is offered by the
active player. A supplied move (see the opcode "sm") is also required to
appear as part of the same EPD record; this move is considered played from the
indicated position. The draw_offer opcode takes no operands.


16.2.5.11: Opcode "draw_reject": reject a draw offer

The opcode "draw_reject" is used to indicate that a draw offer made after the
move that led to the indicated position is rejected by the active player.
This opcode takes no operands.


16.2.5.12: Opcode "eco": _Encyclopedia of Chess Openings_ opening code

The opcode "eco" is used to associate an opening designation from the
_Encyclopedia of Chess Openings_ taxonomy with the indicated position. The
opcode takes either a single string operand (the ECO opening name) or no
operand at all. If an operand is present, its value is associated with an
"ECO" string register of the scanning program. If there is no operand, the ECO
string register of the scanning program is set to null.

The usage is similar to that of the "ECO" tag pair of the PGN standard.


16.2.5.13: Opcode "fmvn": fullmove number

The opcode "fmvn" represents the fullmove number associated with the position.
It always takes a single operand that is the positive integer value of the move
number.

This opcode is used to explicitly represent the fullmove number in EPD that is
present by default in FEN as the sixth field. Fullmove number information is
usually omitted from EPD because it does not affect move generation (commonly
needed for EPD-using tasks) but it does affect game notation (commonly needed
for FEN-using tasks). Because of the desire for space optimization for large
EPD files, fullmove numbers were dropped from EPD's parent FEN. The halfmove
clock information was similarly dropped.


16.2.5.14: Opcode "hmvc": halfmove clock

The opcode "hmvc" represents the halfmove clock associated with the position.
The halfmove clock of a position is equal to the number of plies since the last
pawn move or capture. This information is used to implement the fifty move
draw rule. It always takes a single operand that is the non-negative integer
value of the halfmove clock.

This opcode is used to explicitly represent the halfmove clock in EPD that is
present by default in FEN as the fifth field. Halfmove clock information is
usually omitted from EPD because it does not affect move generation (commonly
needed for EPD-using tasks) but it does affect game termination issues
(commonly needed for FEN-using tasks). Because of the desire for space
optimization for large EPD files, halfmove clock values were dropped from EPD's
parent FEN. The fullmove number information was similarly dropped.
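
As an illustration (not part of the standard's text), the relationship between EPD and FEN can be sketched in Python as a conversion that appends the fifth and sixth FEN fields from the "hmvc" and "fmvn" operations. The name epd_to_fen is hypothetical, and the fallback values 0 and 1 used when an opcode is absent are assumptions, not something the standard prescribes.

```python
def epd_to_fen(line, default_hmvc=0, default_fmvn=1):
    """Build a six-field FEN string from one EPD record."""
    parts = line.strip().split(" ", 4)
    if len(parts) < 4:
        raise ValueError("an EPD record needs four data fields")
    fields = parts[:4]
    hmvc, fmvn = default_hmvc, default_fmvn
    if len(parts) == 5:
        # Naive split: adequate here, but it would mis-split a string
        # operand that itself contains a semicolon.
        for operation in parts[4].split(";"):
            tokens = operation.split()
            if len(tokens) == 2 and tokens[0] == "hmvc":
                hmvc = int(tokens[1])
            elif len(tokens) == 2 and tokens[0] == "fmvn":
                fmvn = int(tokens[1])
    return " ".join(fields + [str(hmvc), str(fmvn)])
```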


16.2.5.15: Opcode "id": position identification

The opcode "id" is used to provide a simple identifying label for the indicated
position. It takes a single string operand.

This opcode is intended for use with test suites used for measuring
chessplaying program strength. An example "id" operand for the seven hundred
fifty seventh position of the one thousand one problems in Reinfeld's _1001
Winning Chess Sacrifices and Combinations_ would be "WCSAC.0757" while the
fifteenth position in the twenty four problem Bratko-Kopec test suite would
have an "id" operand of "BK.15".


16.2.5.16: Opcode "nic": _New In Chess_ opening code

The opcode "nic" is used to associate an opening designation from the _New In
Chess_ taxonomy with the indicated position. The opcode takes either a single
string operand (the NIC opening name) or no operand at all. If an operand is
present, its value is associated with an "NIC" string register of the scanning
program. If there is no operand, the NIC string register of the scanning
program is set to null.

The usage is similar to that of the "NIC" tag pair of the PGN standard.


16.2.5.17: Opcode "noop": no operation

The "noop" opcode is used to indicate no operation. It takes zero or more
operands, each of which may be of any type. The operation involves no
processing. It is intended for use by developers for program testing purposes.


16.2.5.18: Opcode "pm": predicted move

The "pm" opcode is used to provide a single predicted move for the indicated
position. It has exactly one operand, a move playable from the position. This
move is judged by the EPD writer to represent the best move available to the
active player.

If a non-empty "pv" (predicted variation) line of play is also present in the
same EPD record, the first move of the predicted variation is the same as the
predicted move.

The "pm" opcode is intended for use as a general "display hint" mechanism.


16.2.5.19: Opcode "pv": predicted variation

The "pv" opcode is used to provide a predicted variation for the indicated
position. It has zero or more operands which represent a sequence of moves
playable from the position. This sequence is judged by the EPD writer to
represent the best play available.

If a "pm" (predicted move) operation is also present in the same EPD record,
the predicted move is the same as the first move of the predicted variation.


16.2.5.20: Opcode "rc": repetition count

The "rc" opcode is used to indicate the number of occurrences of the indicated
position. It takes a single, positive integer operand. Any position,
including the initial starting position, is considered to have an "rc" value of
at least one. A value of three indicates a candidate for a draw claim by the
position repetition rule.


16.2.5.21: Opcode "resign": game resignation

The opcode "resign" is used to indicate that the active player has resigned the
game. This opcode takes no operands.


16.2.5.22: Opcode "sm": supplied move

The "sm" opcode is used to provide a single supplied move for the indicated
position. It has exactly one operand, a move playable from the position. This
move is the move to be played from the position.

The "sm" opcode is intended for use to communicate the most recent played move
in an active game. It is used to communicate moves between programs in
automatic play via a network. This includes correspondence play using e-mail
and also programs acting as network front ends to human players.


16.2.5.23: Opcode "tcgs": telecommunication: game selector

The "tcgs" opcode is one of the telecommunication family of opcodes used for
games conducted via e-mail and similar means. This opcode takes a single
operand that is a positive integer. It is used to select among various games
in progress between the same sender and receiver.


16.2.5.24: Opcode "tcri": telecommunication: receiver identification

The "tcri" opcode is one of the telecommunication family of opcodes used for
games conducted via e-mail and similar means. This opcode takes two order
dependent string operands. The first operand is the e-mail address of the
receiver of the EPD record. The second operand is the name of the player
(program or human) at the address who is the actual receiver of the EPD record.


16.2.5.25: Opcode "tcsi": telecommunication: sender identification

The "tcsi" opcode is one of the telecommunication family of opcodes used for
games conducted via e-mail and similar means. This opcode takes two order
dependent string operands. The first operand is the e-mail address of the
sender of the EPD record. The second operand is the name of the player
(program or human) at the address who is the actual sender of the EPD record.


16.2.5.26: Opcode "v0": variation name (primary, also "v1" through "v9")

The opcode "v0" (lower case letter "v", digit character zero) indicates a top
level variation name that applies to the given position. It is the first of
ten ranked variation names, each of which has a mnemonic formed from the lower
case letter "v" followed by a single decimal digit. Each of these opcodes
takes either a single string operand or no operand at all.

This ten member variation name family of opcodes is intended for use as
traditional variation names for a complete game or game fragment. The usual
processing of these opcodes is as follows:

1) At the beginning of a game (or game fragment), a move sequence scanning
program initializes each element of its set of ten variation name string
registers to be null.

2) As the EPD record for each position in the game is processed, the variation
name operations are interpreted from left to right. (Actually, all operations
in an EPD record are interpreted from left to right.) Because operations appear
in ASCII order according to their opcode mnemonics, opcode "v0" (if present)
will be handled prior to all other opcodes, then opcode "v1" (if present), and
so forth until opcode "v9" (if present).

3) The processing of opcode "vN" (0 <= N <= 9) involves two steps. First, all
variation name string registers with an index equal to or greater than N are
set to null. (This is the set "vN" through "v9".) Second, and only if a string
operand is present, the value of the corresponding variation name string
register is set equal to the string operand.
TomLynch wrote:I would then like to enter epd and work exclusively with game board(s) as opposed to what I believe are called FEN strings. I've seen epd referred to as an alternative to algebraic notation. Thanks.
There are lots of tools that will let you create EPD strings with a GUI. Epd2diag and Winboard can do this, for instance.
You can also turn a file of PGN games into the EPD records that represent those games with a tool like Pgn2fen, which you can download from here:
http://www.7sun.com/chess/
