Metadata

From VALEP

Revision as of 15:43, 2 December 2020

VALEP uses a relational database as well as parsing tools that restrict how certain special metadata can be entered. A person, for example, can be identified as the author of a document only if the person is already stored in VALEP's table of persons. Likewise, a date is only accepted by VALEP if it is specified in accordance with the so-called EDTF format. On the other hand, (almost) all free-text metadata categories accept the whole range of Unicode symbols, e.g., Hong Qian can alternatively be spelled as 洪谦.


Date

Dates and times are specified in VALEP using all levels of the highly flexible Extended Date/Time Format (EDTF); see the EDTF specification for details.

  • To specify a simple date, use the Year-Month-Day format, e.g. 2020-12-02
  • For a range of dates, put / between the two dates, e.g. 1900-12-24/1900-12-31
  • Entire months and years can be specified directly as 1932-10 (= October 1932) and 1968 (= the year 1968)
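The three forms above correspond to a small part of EDTF level 0. As an illustration only, the following Python sketch checks exactly these forms (a year, a month, a day, or an interval of two such values). The function names are hypothetical, this is not VALEP's parser, and it deliberately ignores the higher EDTF levels (uncertain or approximate dates, unspecified digits, open intervals, etc.) that VALEP accepts.

```python
import calendar
import re

# Hypothetical helper covering only part of EDTF level 0:
# YYYY, YYYY-MM, YYYY-MM-DD, and intervals "start/end" built from these forms.
_SIMPLE_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_simple_edtf_date(value: str) -> bool:
    """Return True for YYYY, YYYY-MM, or YYYY-MM-DD with a valid month and day."""
    match = _SIMPLE_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    if month is None:
        return True  # a whole year, e.g. "1968"
    if not 1 <= int(month) <= 12:
        return False
    if day is None:
        return True  # a whole month, e.g. "1932-10"
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(int(year), int(month))[1]
    return 1 <= int(day) <= days_in_month


def is_simple_edtf(value: str) -> bool:
    """Accept a single simple date or an interval of two simple dates."""
    parts = value.split("/")
    if len(parts) > 2:
        return False
    return all(is_simple_edtf_date(part) for part in parts)


print(is_simple_edtf("1900-12-24/1900-12-31"))  # True
print(is_simple_edtf("1900-02-29"))  # False: 1900 was not a leap year
```

A real EDTF parser would also check that the start of an interval does not lie after its end; the sketch leaves that out to stay short.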

Location

For locations, VALEP uses a powerful internal tool based on a hierarchical structure of areas, which can currently be edited only by the admin. Each city must be located inside an area, and each concrete address is always based on a city. Cities and addresses can be specified by all users of VALEP.

Area

Areas are hierarchically structured, which means two different things:

  • Main areas are specified as nested boxes. For example, Lower Austria is inside Austria and Austria is inside Europe, so Lower Austria also belongs to Europe; but because Austria does not belong to Asia, Lower Austria is not part of Asia either.
  • Special areas do not fit into the nested structure of main areas; examples are:
    • The English Speaking World, which includes regions from all over the world (USA, Canada, Great Britain, etc.)
    • The Habsburg Empire, which includes Austria and parts of other states (Hungary, Czech Republic, Italy, Ukraine, etc.)

The main purpose of combining main areas and special areas in this way is to let users filter all documents that were produced in a given, arbitrarily complex region.
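The interplay of nested main areas and overlapping special areas can be sketched as follows. This is a hypothetical Python model, not VALEP's implementation: it assumes that each main area stores exactly one enclosing parent and that a special area is simply a set of member areas. The example data (e.g., placing Great Britain inside Europe) is illustrative, and the Habsburg Empire is left out because modeling "parts of other states" would need finer sub-areas.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple, Union


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so areas can go into sets
class Area:
    name: str
    parent: Optional["Area"] = None  # main areas are nested boxes: one enclosing area each


@dataclass(eq=False)
class SpecialArea:
    name: str
    members: set = field(default_factory=set)  # main areas from anywhere in the tree


def enclosing(area: Optional[Area]) -> Iterator[Area]:
    """Yield the area itself and every main area that contains it."""
    while area is not None:
        yield area
        area = area.parent


def lies_in(area: Area, region: Union[Area, SpecialArea]) -> bool:
    """True if `area` lies inside `region`, which is a main area or a special area."""
    if isinstance(region, SpecialArea):
        # inside a special area if the area or any enclosing area is a member
        return any(a in region.members for a in enclosing(area))
    return any(a is region for a in enclosing(area))


def documents_in(documents: List[Tuple[str, Area]], region) -> List[str]:
    """Filter (title, area of production) pairs by region."""
    return [title for title, area in documents if lies_in(area, region)]


europe, asia = Area("Europe"), Area("Asia")
austria = Area("Austria", parent=europe)
lower_austria = Area("Lower Austria", parent=austria)
great_britain = Area("Great Britain", parent=europe)
usa, canada = Area("USA"), Area("Canada")
english_speaking = SpecialArea("English Speaking World", {usa, canada, great_britain})

print(lies_in(lower_austria, europe))  # True
print(lies_in(lower_austria, asia))  # False
```

Walking up the parent chain is what makes membership transitive for main areas, while special areas only need to list their top-most members.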

City

A city is a geographical entity that belongs to an area. An address must always be connected to a city, rather than an area.

Address

An address must specify at least a city but may optionally add further information such as district, zip code, street, building number, etc. Thus, Vienna is one address and Vienna 1080, Alserstraße 23/23 is another one.
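The rule that an address always rests on a city, and a city on an area, amounts to two required references. In the following hypothetical sketch the area is reduced to its name, and the optional field names (district, zip code, street, building) are assumptions based on the examples in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class City:
    name: str
    area: str  # required: every city lies in an area (simplified here to the area's name)


@dataclass(frozen=True)
class Address:
    city: City  # required: an address is always based on a city, never directly on an area
    district: Optional[str] = None
    zip_code: Optional[str] = None
    street: Optional[str] = None
    building: Optional[str] = None

    def label(self) -> str:
        """Render the city first, then whichever optional parts are present."""
        head = " ".join(p for p in (self.city.name, self.zip_code) if p)
        street = " ".join(p for p in (self.street, self.building) if p)
        rest = ", ".join(p for p in (self.district, street) if p)
        return f"{head}, {rest}" if rest else head


vienna = City("Vienna", area="Austria")
print(Address(vienna).label())  # Vienna
print(Address(vienna, zip_code="1080", street="Alserstraße", building="23/23").label())
# prints: Vienna 1080, Alserstraße 23/23
```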

Person and Institution

VALEP stores persons and institutions separately. They can currently be edited only by the admin, but we plan to allow all users to add persons and institutions in a future implementation in 2021.

Persons and institutions can optionally be used to specify all metadata categories that cover authors, creators, receivers, issuers, and others involved in the development of a document. In almost all of these cases, several persons/institutions can be specified. Just type one or more characters of the person's or institution's name and then select the name from the list, using the mouse (single click) or the keyboard (press Return).

The difference between persons and institutions, at the level of VALEP's data structure, is only a matter of complexity:

  • Institutions are characterized only by a Name and an Abbreviated Name, together with an optional Description
  • Persons, by contrast, have an Abbreviated Name and an optional Description plus the following:
    • First Name and Surname
    • Date of Birth and Date of Death, both optional fields specified in the EDTF format (see Date)
    • Profession, an optional field that covers a brief description of a person's professional occupation and biography (as it might be used in a name index)
    • Two optional fields that cover a Short Biography and a Long Biography
    • An optional list of Institutions that can associate, for example, Rudolf Carnap with institutions such as the Vienna Circle, Logical Empiricism, the journal Erkenntnis, or the German Youth Movement. These institutions must belong to the table of Institutions described above.
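A minimal Python sketch of the two record types, assuming the field lists above; the class and field names are hypothetical rather than VALEP's actual schema. It also shows the Authority type used elsewhere on this page, which accepts either a person or an institution. The abbreviated names in the example are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Institution:
    name: str
    abbreviated_name: str
    description: Optional[str] = None


@dataclass
class Person:
    first_name: str
    surname: str
    abbreviated_name: str
    description: Optional[str] = None
    date_of_birth: Optional[str] = None  # EDTF string
    date_of_death: Optional[str] = None  # EDTF string
    profession: Optional[str] = None
    short_biography: Optional[str] = None
    long_biography: Optional[str] = None
    # entries must come from the Institution table, not from free text
    institutions: List[Institution] = field(default_factory=list)


# Fields typed "Authority" accept either kind of record
Authority = Union[Person, Institution]


def display_name(authority: Authority) -> str:
    """Full name for a person, plain name for an institution."""
    if isinstance(authority, Person):
        return f"{authority.first_name} {authority.surname}"
    return authority.name


vienna_circle = Institution("Vienna Circle", abbreviated_name="Vienna Circle")
erkenntnis = Institution("Erkenntnis", abbreviated_name="Erkenntnis")
carnap = Person("Rudolf", "Carnap", abbreviated_name="Carnap",
                institutions=[vienna_circle, erkenntnis])
print([display_name(a) for a in (carnap, vienna_circle)])  # ['Rudolf Carnap', 'Vienna Circle']
```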

Event

An Event is specified here by a Name and an optional Description, together with the following:

  • An optional location and optional date
  • An event type that needs to be chosen from an internally predefined list that includes items such as Conference or Discussion Circle Meeting

Events can be added by all users of VALEP.
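A hypothetical sketch of an event record. Only the two event types named above are included, since the complete predefined list is not given here; location and date are simplified to strings (in VALEP they follow the Location and Date rules described earlier).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    # Only the two examples named above; VALEP's internal list is longer
    CONFERENCE = "Conference"
    DISCUSSION_CIRCLE_MEETING = "Discussion Circle Meeting"


@dataclass
class Event:
    name: str
    event_type: EventType  # must be chosen from the predefined list
    description: Optional[str] = None
    location: Optional[str] = None  # simplified; VALEP uses its Location structure
    date: Optional[str] = None  # EDTF string


def parse_event_type(label: str) -> Optional[EventType]:
    """Map a label to an EventType, or None if it is not on the predefined list."""
    try:
        return EventType(label)
    except ValueError:
        return None


event = Event("Example conference", EventType.CONFERENCE)
print(parse_event_type("Lecture"))  # None: not on the predefined list
```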

Typeface

Specifies the way in which a text was produced. The options are:

  • Long Hand
  • Short Hand
  • Machine Written
  • Printed
  • Electronic
  • Mixed

These options are fixed and cannot be changed by users of VALEP.

Card File

Each instance of the document category File Card must be associated with a Card File. A Card File is identified in VALEP by a Name and by the Person or Institution that owns it. Optionally, the Typeface of a card file can be pre-specified and a Description can be added.
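Because the Typeface options are fixed, they map naturally onto an enumeration, which a Card File can reuse for its optional pre-specified typeface. In this hypothetical sketch the owner is simplified to a name string standing in for a Person or Institution record, and the fallback from a file card to its card file's typeface is an assumption about what "pre-specified" means in practice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Typeface(Enum):
    LONG_HAND = "Long Hand"
    SHORT_HAND = "Short Hand"
    MACHINE_WRITTEN = "Machine Written"
    PRINTED = "Printed"
    ELECTRONIC = "Electronic"
    MIXED = "Mixed"


@dataclass
class CardFile:
    name: str
    owner: str  # stands in for a Person or Institution (an Authority)
    typeface: Optional[Typeface] = None  # optional pre-specified typeface
    description: Optional[str] = None


@dataclass
class FileCard:
    card_file: CardFile  # required: every File Card belongs to a Card File
    typeface: Optional[Typeface] = None

    def effective_typeface(self) -> Optional[Typeface]:
        # Assumption: an unset typeface falls back to the card file's pre-specified one
        return self.typeface if self.typeface is not None else self.card_file.typeface


card_file = CardFile("Example card file", owner="Example owner", typeface=Typeface.SHORT_HAND)
print(FileCard(card_file).effective_typeface())  # Typeface.SHORT_HAND
print(FileCard(card_file, Typeface.PRINTED).effective_typeface())  # Typeface.PRINTED
```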

Document format

A Document Format is characterized by a Name and an optional Description and must be associated with one of the following categories:

  • Text/2D/3D Object (= all document categories except Photograph, Audio, Video)
  • Photograph
  • Audio
  • Video

Examples are A4 or Letter for text documents and vinyl disc for audio.

Document formats can be edited or added by the admin only.

Copying process

A Copying process is characterized by a Name and an optional Description and must be associated with one of the following categories:

  • Text/2D/3D Object (= all document categories except Photograph, Audio, Video)
  • Photograph
  • Audio
  • Video

Examples are blueprint or Xerox for text documents.

Copying processes can be edited or added by the admin only.
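Document formats and copying processes share the same shape: a name, an optional description, and exactly one of four categories. The hypothetical sketch below models that shared shape and, as an assumption about how the lists might be used, narrows the choices to the category of the document being described. The mapping follows the rule that every document category except Photograph, Audio, and Video counts as Text/2D/3D Object; "File Card" in the usage is just one such other category.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class MediaCategory(Enum):
    TEXT_2D_3D = "Text/2D/3D Object"
    PHOTOGRAPH = "Photograph"
    AUDIO = "Audio"
    VIDEO = "Video"


@dataclass
class LookupEntry:
    """Shared shape of a Document Format or a Copying Process (maintained by the admin)."""
    name: str
    category: MediaCategory
    description: Optional[str] = None


def media_category(document_category: str) -> MediaCategory:
    """Photograph, Audio, and Video map to themselves; every other category is Text/2D/3D."""
    for category in (MediaCategory.PHOTOGRAPH, MediaCategory.AUDIO, MediaCategory.VIDEO):
        if document_category == category.value:
            return category
    return MediaCategory.TEXT_2D_3D


def choices_for(document_category: str, entries: List[LookupEntry]) -> List[str]:
    """Names of the entries that fit the given document category."""
    wanted = media_category(document_category)
    return [entry.name for entry in entries if entry.category is wanted]


formats = [
    LookupEntry("A4", MediaCategory.TEXT_2D_3D),
    LookupEntry("Letter", MediaCategory.TEXT_2D_3D),
    LookupEntry("Vinyl disc", MediaCategory.AUDIO),
]
copying = [LookupEntry("Blueprint", MediaCategory.TEXT_2D_3D),
           LookupEntry("Xerox", MediaCategory.TEXT_2D_3D)]
print(choices_for("File Card", formats))  # ['A4', 'Letter']
```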

User

This is the internal database of VALEP users. Users can be created, edited, and deleted only by the admin. Each user receives a username, a password, and a role inside VALEP.

Data types in VALEP

These are the data types used in VALEP, written in the notation used in the following two sections:

  • Enum specifies a metadata category where the user must choose one value from a predefined internal list
  • Boolean can be either true or false
  • Name means that the user needs to choose a data set from the table Name, which can be edited in the admin section
  • Authority means that the user needs to choose a data set from either the table Person or the table Institution
  • Date and Location need to be specified according to the rules described above
  • Unicode (X) means that the user can specify text using the whole range of Unicode symbols; the text is limited to X characters
  • Simple (X) means that the user can specify text only by using a restricted set of characters that includes [A … Z] [a … z] [0 … 9] .,;:-+=*/\~#@§$%!?&(){}[]<>|^°´`‘“
  • NN means that the content of the data field must not be null (by default, data fields can always be empty in VALEP)
  • UQ means that the content of the data field must be unique among all instances of the respective data sets (by default, data fields need not be unique in VALEP)
  • (n) means that a data field may contain several instances of data of the specified type (by default, data fields contain either zero or one instance of data of the specified type)
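The notation above can be read as a small field specification. The following hypothetical validator checks a value against Unicode (X), Simple (X), NN, UQ, and (n); Enum, Name, Authority, Date, and Location are left out. Two details are assumptions: lengths are counted in Unicode code points, and since the list above does not mention whitespace, the Simple set here excludes spaces.

```python
import string
from dataclasses import dataclass
from typing import Iterable, List, Union

# Characters allowed in Simple (X), copied from the list above (no whitespace)
SIMPLE_CHARS = set(string.ascii_letters + string.digits + ".,;:-+=*/\\~#@§$%!?&(){}[]<>|^°´`‘“")


@dataclass
class FieldSpec:
    kind: str  # "Unicode" or "Simple"
    max_len: int  # the X in Unicode (X) or Simple (X)
    not_null: bool = False  # NN
    unique: bool = False  # UQ
    multiple: bool = False  # (n)


def validate(spec: FieldSpec, value: Union[None, str, List[str]],
             existing: Iterable[str] = ()) -> List[str]:
    """Return error messages for `value`; an empty list means the value is accepted.

    `existing` holds the values already stored in this field for other data sets (for UQ).
    """
    if isinstance(value, list):
        if not spec.multiple:
            return ["only one value allowed"]
        items = value
    else:
        items = [value] if value else []  # None and "" both count as empty
    errors = []
    if spec.not_null and not items:
        errors.append("value must not be empty (NN)")
    taken = set(existing)
    for item in items:
        if len(item) > spec.max_len:  # counted in Unicode code points
            errors.append(f"longer than {spec.max_len} characters")
        if spec.kind == "Simple" and not set(item) <= SIMPLE_CHARS:
            errors.append("contains characters outside the Simple set")
        if spec.unique and item in taken:
            errors.append(f"{item!r} already exists (UQ)")
    return errors


title = FieldSpec("Unicode", 300, not_null=True, unique=True)
print(validate(title, "洪谦"))  # []
print(validate(title, None))  # ['value must not be empty (NN)']
```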

All metadata (archive tree)

Nodes of the archive tree

  • may contain a Description
  • all except Collections must contain a Title and may optionally contain a Long Title
  • are child-unique with respect to their title, i.e., no two children of the same parent share the same title
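Child uniqueness means that two nodes with the same parent may not share a title, while nodes under different parents may. A hypothetical tree node that enforces this on insertion could look like this; Collections, which have no Title, are left out, and the titles in the example are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    title: str
    long_title: Optional[str] = None
    description: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)  # keyed by title

    def add_child(self, child: "Node") -> "Node":
        """Attach `child`, rejecting a title that a sibling already uses."""
        if child.title in self.children:
            raise ValueError(f"{self.title!r} already has a child titled {child.title!r}")
        self.children[child.title] = child
        return child


archive = Node("Example Archive")
box_1 = archive.add_child(Node("Box 1"))
box_2 = archive.add_child(Node("Box 2"))
box_1.add_child(Node("Folder 1"))
box_2.add_child(Node("Folder 1"))  # allowed: the two folders have different parents
try:
    archive.add_child(Node("Box 1"))
except ValueError as err:
    print(err)  # 'Example Archive' already has a child titled 'Box 1'
```

Keying the children by title makes the uniqueness check a dictionary lookup, and it only ever compares siblings.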

Archive

An Archive is a top-level node in the archive tree. It typically represents a physical archive, which in turn may be either a public institution (e.g., a university archive or a state archive) or a private collection held by a private institution or person. Of course, an archive may also house digital collections that consist of electronic files and folders. Physical archives may house any kind of object; VALEP, however, only allows storing digitizations in the following four data types:

  • Photograph (jpeg)
  • Text (pdf)
  • Audio (mp3)
  • Video (mp4)
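The restriction to four kinds of digitization can be expressed as a lookup from file type to data type. This sketch keys on the file extension, which is a simplification (an actual upload check might also inspect the MIME type or the file contents); the function name and the acceptance of the short .jpg extension are assumptions.

```python
from pathlib import PurePath
from typing import Optional

# Allowed digitization formats, as listed above
DIGITIZATION_TYPES = {
    ".jpeg": "Photograph",
    ".jpg": "Photograph",  # assumption: the common short extension is accepted too
    ".pdf": "Text",
    ".mp3": "Audio",
    ".mp4": "Video",
}


def digitization_type(filename: str) -> Optional[str]:
    """Return the data type for a file, or None if its format is not allowed."""
    return DIGITIZATION_TYPES.get(PurePath(filename).suffix.lower())


print(digitization_type("letter.PDF"))  # Text
print(digitization_type("scan.tiff"))  # None
```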

The Archive node itself only stores metadata that identify an archive inside VALEP:

  • Title Unicode (300)
  • Long Title Unicode (300)
  • Description Unicode (30,000)
  • URL Unicode (300)
  • Address Unicode (300) (currently; in a future implementation this will become Location)
  • Private Collection Boolean (indicates that an archive is not a public institution)
  • Owners User(n)

Collection

These second-level nodes must have an Archive as their parent. They are characterized not by a Title but by an Authority that specifies the respective collection, e.g., the Carnap collection or the Vienna Circle collection.

  • Collection Authority
  • Description Unicode (30,000)
  • Owners User(n)


All metadata (documents)

  1. Document Category Enum
  2. Title Unicode (300)
  3. Title (alternative long) Unicode (1,000)