Metadata

From VALEP
Revision as of 11:03, 2 December 2020 by Christiandamboeck (talk | contribs)

VALEP uses a relational database as well as parsing tools that enable a restrictive use of certain special metadata. A person, for example, can be identified as the author of a document only if the person is already stored in VALEP's table of persons. A date is accepted by VALEP only if it is specified in accordance with the so-called EDTF format. On the other hand, (almost) all free-text metadata categories allow the whole range of Unicode symbols in VALEP, e.g. Hong Qian can alternatively be spelled as 洪谦.


Date

Dates and times are specified in VALEP using all levels of the highly flexible Extended Date Time Format (EDTF); see the detailed specification there.

  • To specify a simple date use the Year-Month-Day format, e.g. 2020-12-02
  • For a range of dates use / between the dates, e.g. 1900-12-24/1900-12-31
  • Entire months and years can be specified in an obvious way as 1932-10 (= October 1932) and 1968
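
The three forms listed above can be checked with a few lines of Python. This is only a minimal sketch covering these simple cases, not the full EDTF specification and not VALEP's own parser; the function names `is_simple_edtf_date` and `is_simple_edtf` are made up for illustration.

```python
import re
from datetime import date

# Matches YYYY, YYYY-MM or YYYY-MM-DD (a small subset of EDTF only).
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_simple_edtf_date(text: str) -> bool:
    """Check a single year, year-month or year-month-day value."""
    m = _DATE.fullmatch(text)
    if m is None:
        return False
    year, month, day = m.groups()
    if month is None:
        return True  # a bare year such as 1968
    if not 1 <= int(month) <= 12:
        return False
    if day is None:
        return True  # a whole month such as 1932-10
    try:
        date(int(year), int(month), int(day))  # rejects e.g. 2020-02-30
    except ValueError:
        return False
    return True


def is_simple_edtf(text: str) -> bool:
    """Check a single date or a range written as start/end."""
    parts = text.split("/")
    if len(parts) > 2:
        return False
    return all(is_simple_edtf_date(p) for p in parts)
```

With this sketch, `2020-12-02`, `1900-12-24/1900-12-31`, `1932-10` and `1968` are accepted, while a value such as `02.12.2020` is rejected.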

Location

For locations we use a powerful internal tool, which is based on the specification of a hierarchical structure of areas that can currently be edited only by the Admin. Each city must be located inside an area. A concrete address, in turn, is always based on a city. Cities and addresses can be specified by all users of VALEP.

Area

Areas are hierarchically structured. This means two different things.

  • Main areas are specified as nested boxes: e.g. Lower Austria is inside Austria and Austria is inside Europe, so Lower Austria also belongs to Europe; but because Austria does not belong to Asia, Lower Austria is not part of Asia either.
  • Special areas do not fit into the nested structure of main areas; examples are:
    • The English Speaking World, which includes regions from all over the world (USA, Canada, Great Britain, etc.)
    • The Habsburg Empire, which includes Austria and parts of other states (Hungary, Czech Republic, Italy, Ukraine, etc.)

The main purpose of this complex combined structure of areas and special areas is that one can filter all documents that were produced in a certain - arbitrarily complex - region.
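
How the nested main areas and the overlapping special areas could work together in a filter can be sketched as follows. The dictionaries `PARENT` and `SPECIAL_AREAS` and the function `belongs_to` are hypothetical stand-ins, not VALEP's actual tables or code.

```python
from typing import Dict, Iterator, Optional, Set

# Main areas as nested boxes: each area points to the area that contains it.
PARENT: Dict[str, Optional[str]] = {
    "Lower Austria": "Austria",
    "Austria": "Europe",
    "Great Britain": "Europe",
    "USA": "North America",
    "Canada": "North America",
    "Europe": None,
    "North America": None,
    "Asia": None,
}

# Special areas cut across the nesting: each is a set of main areas.
SPECIAL_AREAS: Dict[str, Set[str]] = {
    "English Speaking World": {"USA", "Canada", "Great Britain"},
}


def enclosing_areas(area: str) -> Iterator[str]:
    """Yield the area itself and every main area that contains it."""
    current: Optional[str] = area
    while current is not None:
        yield current
        current = PARENT[current]


def belongs_to(area: str, region: str) -> bool:
    """True if `area` lies inside `region` (a main or a special area)."""
    chain = set(enclosing_areas(area))
    if region in SPECIAL_AREAS:
        return bool(chain & SPECIAL_AREAS[region])
    return region in chain
```

A document filter would then keep every document whose area satisfies `belongs_to(area, region)` for the chosen region.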

City

A city is a geographical entity that belongs to an area. An address must always be connected to a city, rather than an area.

Address

An address must specify at least a city, but it may optionally add further information such as district, zip code, road, building number, etc. Thus, Vienna is one address, and Vienna 1080, Alserstraße 23/23 is another.
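
The relation between area, city and address can be illustrated with two small record types. This is a sketch under assumptions: the class and field names are invented for illustration and do not reflect VALEP's database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class City:
    name: str
    area: str  # every city must be located inside an area


@dataclass(frozen=True)
class Address:
    city: City  # the only mandatory part of an address
    district: Optional[str] = None
    zip_code: Optional[str] = None
    road: Optional[str] = None
    building_number: Optional[str] = None


vienna = City(name="Vienna", area="Austria")

# "Vienna" alone is already a valid address ...
just_vienna = Address(city=vienna)

# ... and "Vienna 1080, Alserstraße 23/23" is a different, more specific one.
alser = Address(city=vienna, zip_code="1080",
                road="Alserstraße", building_number="23/23")
```

Both addresses share the same city, so a filter on the city or on any area containing it would find documents attached to either of them.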

Person and Institution

In VALEP we store persons and institutions separately. Currently these can be edited only by the Admin, but we plan to allow any user to add persons and institutions in a future implementation in 2021.

Persons and institutions can optionally be used to specify all metadata categories that cover authors, creators, receivers, issuers, and those involved in the development of a document. In almost all of these cases it is also possible to specify several persons/institutions. Just type one or more characters that belong to the name of the person/institution and then select the name from the list, using the mouse (single click) or the keyboard (press return).
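
The selection list described above behaves roughly like the following sketch. It assumes a case-insensitive substring match, which is a guess about the matching rule; `suggest` is a hypothetical helper, not part of VALEP.

```python
from typing import List


def suggest(fragment: str, names: List[str]) -> List[str]:
    """Return the stored names that contain the typed characters."""
    needle = fragment.casefold()
    if not needle:
        return []  # nothing typed yet, so nothing to offer
    return [name for name in names if needle in name.casefold()]


stored = ["Rudolf Carnap", "Moritz Schlick", "Vienna Circle"]
matches = suggest("car", stored)
```

Only names already stored in the tables of persons and institutions can appear in the list, which is how the restriction to known persons is maintained.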

The difference between persons and institutions, at the level of VALEP's data structure, is only a matter of complexity:

  • Institutions are characterized only by a Name and an Abbreviated Name, together with an optional description
  • Persons, by contrast, add the following to the Abbreviated Name and optional description:
    • First Name and Surname
    • Date of Birth and Date of Death, both being specified in the EDTF format (see date)
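
The two record shapes can be written down as a short sketch. The class and field names are assumptions for illustration, as is the choice to treat the two dates as optional; the abbreviated names in the examples are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Institution:
    name: str
    abbreviated_name: str
    description: Optional[str] = None


@dataclass
class Person:
    first_name: str
    surname: str
    abbreviated_name: str
    description: Optional[str] = None
    # Both dates are EDTF strings, so "1891" or "1891-05" would also be valid.
    date_of_birth: Optional[str] = None
    date_of_death: Optional[str] = None


carnap = Person(first_name="Rudolf", surname="Carnap",
                abbreviated_name="RC",  # hypothetical abbreviation
                date_of_birth="1891-05-18", date_of_death="1970-09-14")

university = Institution(name="University of Vienna",
                         abbreviated_name="UV")  # hypothetical abbreviation
```

Storing the dates as EDTF strings rather than as fixed calendar dates keeps the option of recording only a year or a month when the exact day is unknown.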


List of metadata (archive tree)

List of metadata (documents)

  • Enum specifies a metadata category where the user must choose one value from an internal predefined list
  • Name means that the user needs to choose a record from the table Name, which can be edited in the admin section
  • Date and Location need to be specified according to the rules described above
  • Unicode (X) means that the user can specify text using the whole range of Unicode symbols; the text is limited to X characters
  • Simple (X) means that the user can specify text only by using a restricted set of characters that include [A … Z] [a … z] [1 … 0] .,;:-+=*/\~#@§$%!?&(){}[]<>|^°´`‘“
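
The difference between Unicode (X) and Simple (X) can be illustrated with two small checks. This sketch makes several assumptions that the list above does not settle: that the digit range means 0 to 9, that the space character is allowed in Simple fields, and that the limit X counts Unicode code points. The names `check_unicode` and `check_simple` are hypothetical.

```python
import string

# Letters, digits (assumed 0-9), space (assumed) and the listed punctuation.
SIMPLE_CHARS = frozenset(
    string.ascii_letters
    + string.digits
    + " "
    + ".,;:-+=*/\\~#@§$%!?&(){}[]<>|^°´`‘“"
)


def check_unicode(text: str, max_len: int) -> bool:
    """Unicode (X): any characters, at most X of them."""
    return len(text) <= max_len


def check_simple(text: str, max_len: int) -> bool:
    """Simple (X): at most X characters, all from the restricted set."""
    return len(text) <= max_len and all(ch in SIMPLE_CHARS for ch in text)
```

Under these assumptions, 洪谦 is acceptable in a Unicode (300) field such as Title but would be rejected in any Simple field, whereas Hong Qian passes both checks.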


  1. Document Category Enum
  2. Title Unicode (300)
  3. Title (alternative long) Unicode (1,000)