ASCII/Unicode Data

 

Rules for ASCII / Unicode files

 

Several of the import formats used by the program are ASCII / Unicode files.  Many software packages can produce data in this format and it is very easy to write a program to output data as an 'ASCII/Unicode' file.

 

These notes give some general guidelines on ASCII / Unicode data.

 

Here is a typical example (from a user defined product requirements list):-

 

Work for week 7

END/12B,MFC15,750.0,398,0,100

TOP/11A,MFC15,1750.0,790,0,200

END/17A,MDF18,650.0,390.5,140,A,B,B,0,1,TEAK,PLN

"DOOR/RH",MDF21,960.0,450.0,25,,,,,1,OAK

UNIT/2X,MDF18,960.0,450,30

 

All the data is based on the ASCII/Unicode character set. Each line is a single record with one or more fields. Fields are separated by a comma (ASCII decimal code 44) and each line is terminated by a carriage return and line feed (ASCII decimal codes 13 and 10).
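
As an illustration only, a file in this layout could be written with a few lines of Python. This is a minimal sketch and not part of the program; the two rows are copied from the example above, and the variable names are assumptions made for the illustration.

```python
import csv
import io

# Two records copied from the example above, one list entry per field.
rows = [
    ["END/12B", "MFC15", "750.0", "398", "0", "100"],
    ["UNIT/2X", "MDF18", "960.0", "450", "30"],
]

# newline="" stops Python from altering the line endings we ask for.
buffer = io.StringIO(newline="")

# Comma (code 44) between fields, CR + LF (codes 13 and 10) after each record.
writer = csv.writer(buffer, delimiter=",", lineterminator="\r\n")
writer.writerows(rows)

output = buffer.getvalue()
print(repr(output))
```

To write to disk instead of to memory, the same writer can be given a file opened with open(path, "w", newline=""), where path is whatever file name the import expects.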

 

Comma separated values - The fields in these files are not of a fixed length or format, so each field is recognised by its relative position in the line. For example, the eleventh item (the item after 10 commas) must always refer to the same field. In the above example the eleventh field refers to the front laminate and is not needed for every record. For those records that do need it, however, it must always be in the eleventh position, even if some of the other fields are not needed. This is achieved by representing empty fields (fields that are not needed) by adjacent commas, as in the "DOOR/RH" line above, where fields 6 to 9 are empty.

 

Omit any fields at the end of a line that are not needed or empty.

 

This scheme is often referred to as 'CSV' or 'comma separated values'.
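
The positional rule can be illustrated with a short Python sketch. It is not part of the program: the name read_records and the field count of 12 are assumptions for the illustration (12 is simply the length of the longest line in the example above).

```python
import csv
import io

# Three records copied from the example above (header line left out).
SAMPLE = (
    'END/12B,MFC15,750.0,398,0,100\r\n'
    'END/17A,MDF18,650.0,390.5,140,A,B,B,0,1,TEAK,PLN\r\n'
    '"DOOR/RH",MDF21,960.0,450.0,25,,,,,1,OAK\r\n'
)

FIELD_COUNT = 12  # assumption: longest record in the sample has 12 fields


def read_records(text, field_count=FIELD_COUNT):
    """Return each line as a list of fields padded to field_count entries."""
    records = []
    for row in csv.reader(io.StringIO(text, newline="")):
        # Fields omitted at the end of a line are treated as empty.
        row = row + [""] * (field_count - len(row))
        records.append(row)
    return records


records = read_records(SAMPLE)
for rec in records:
    # The eleventh field (index 10) is the front laminate in this example.
    print(rec[0], "->", repr(rec[10]))
```

Because short lines are padded, the eleventh field can be read from every record in the same way, whether it was supplied, left empty between commas, or omitted at the end of the line.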

 

Double quotation marks - Some fields are enclosed in double quotation marks (ASCII decimal code 34), e.g. "DATA". This is a common practice in comma separated files for marking a field. The quotes are removed on import, leaving the field value only, e.g. DATA. Quoting is used, for example, to include commas within a field:-

 

"WOOD CO","UNIT 25,INDUSTRIAL ESTATE","BRISTOL"

 

In this example there are 3 fields (not 4):-

 

WOOD CO

UNIT 25,INDUSTRIAL ESTATE

BRISTOL
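
The effect of the quotes can be checked with Python's standard csv module, which follows the same quoting convention. This is a sketch for illustration only; the variable names are assumptions, and the program's own reader may differ in detail.

```python
import csv

line = '"WOOD CO","UNIT 25,INDUSTRIAL ESTATE","BRISTOL"'

# A quote-aware reader keeps the comma inside the quoted field.
fields = next(csv.reader([line]))
for field in fields:
    print(field)

# Splitting on every comma ignores the quotes and breaks the address in two.
naive = line.split(",")
print(len(fields), "fields with quote handling,", len(naive), "without")
```

The comparison with a plain split shows why a field containing a comma has to be quoted when the comma is also the separator.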

 

Using other separators - For many imports you can use the software to specify a separating character other than the comma. This is usually needed when numbers in the data are written with a comma as the decimal mark, for example:-

 

230,50 rather than 230.50

 

If the separating character is, for example, ':' (ASCII decimal code 58) rather than the comma, use it throughout the file and do not mix the two as separators. Here is a simple example:-

 

END/2:123:600,0:500,5:MDF18

 

In this example '600,0' stands for 600.0 mm and the colon ':' is used as the separator.
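
A reading program for this variant might look like the following Python sketch. It is an illustration under stated assumptions: the helper name to_number is hypothetical, and treating the third and fourth fields as the two sizes is inferred from the example line only.

```python
import csv

line = "END/2:123:600,0:500,5:MDF18"

# Use the colon as the only separator; commas are now ordinary characters.
fields = next(csv.reader([line], delimiter=":"))


def to_number(text):
    """Convert a number written with a decimal comma, e.g. '600,0'."""
    return float(text.replace(",", "."))


# Assumption: fields 3 and 4 hold the sizes in this example line.
first_size = to_number(fields[2])
second_size = to_number(fields[3])
print(fields)
print(first_size, second_size)
```

Converting the decimal comma only after the line has been split keeps the two roles of the comma apart, which is why the separators must not be mixed within one file.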

 

Upper and lower case - For all fields, except text fields (such as the product description) and fields inside double quotation marks, lower case characters are converted to upper case. Sizes can be given in millimetres, decimal inches or fractional inches.

 

Header lines - Some files (but not all) have one or more header lines, for example, 'Work for week 7' in the example above. A header line does not follow the field layout of the data lines and is treated as a different kind of record. Header lines are always at the top of the file so that they are easy to recognise.

 

The number of header lines (if any) must be known in order to create or read the file in the correct format.
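
As an illustration only, a reading program could separate the header lines from the data records as in this Python sketch. The function name split_header is hypothetical, and the header count of 1 is taken from the example at the top of this page.

```python
import csv

SAMPLE = (
    "Work for week 7\r\n"
    "END/12B,MFC15,750.0,398,0,100\r\n"
    "TOP/11A,MFC15,1750.0,790,0,200\r\n"
)


def split_header(text, header_lines):
    """Return (header lines, data records) for a file with a known header count."""
    lines = text.splitlines()
    headers = lines[:header_lines]
    # Only the remaining lines are parsed as comma separated records.
    records = list(csv.reader(lines[header_lines:]))
    return headers, records


headers, records = split_header(SAMPLE, header_lines=1)
print(headers)
print(records)
```

With the wrong header count the first data record would be lost, or a header line would be read as data, which is why the count has to be stated.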

 

Notes:

 

The file encoding can be set in the Import parameters of the relevant application.

 

The encoding formats supported are:-

 

ASCII/ANSI

Unicode UTF-8

Unicode UTF-16(LE)
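
For a program that produces these files, the three encodings might map onto Python codec names as in the sketch below. The mapping is an assumption for illustration: 'ANSI' depends on the Windows code page in use (cp1252 is assumed here), and whether the import expects a byte order mark at the start of a Unicode file is not stated on this page.

```python
# Assumed mapping from the encodings listed above to Python codec names.
CODECS = {
    "ASCII/ANSI": "cp1252",          # assumption: Western European code page
    "Unicode UTF-8": "utf-8",
    "Unicode UTF-16(LE)": "utf-16-le",
}

record = "END/12B,MFC15,750.0,398,0,100\r\n"

# Encode the same record in each format and compare the sizes in bytes.
encoded = {name: record.encode(codec) for name, codec in CODECS.items()}
for name, data in encoded.items():
    print(name, len(data), "bytes")

# Decoding with the matching codec gives back the original text.
decoded = {name: encoded[name].decode(CODECS[name]) for name in CODECS}
```

For plain ASCII data such as this record, the ANSI and UTF-8 forms are byte-for-byte identical, while UTF-16(LE) uses two bytes per character, so a file read with the wrong setting will not parse correctly.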

 