Difference between revisions of "Metadata"

From VALEP
(Authority: Person and Institution)
 
(104 intermediate revisions by 2 users not shown)
Line 1: Line 1:
VALEP uses a relational database as well as parsing tools that enable a restrictive use of certain special metadata. A persons, for example, can be identified as the author of a document, only if the person is already stored in VALEPs table of persons. A date is only accepted by VALEP if it is specified in accordance to the socalled EDTF format. On the other hand, (almost) all free text metadata categories enable the usage of the whole range of [https://de.wikipedia.org/wiki/Unicode Unicode] symbols in VALEP, e.g. Hong Qian can be alternatively spelled as 洪谦.
 
  
 +
== Special Data Types in VALEP ==
  
== Date ==
+
=== Date ===
Date and time is specified in VALEP using all levels of the highly flexible Extended Date Time Format ([[https://www.loc.gov/standards/datetime/ EDTF]]) - see the detailed specification their.
 
  
*To specify a simple date use the Year-Month-Day format, e.g. 2020-12-02
+
The formats of date and time in VALEP are represented in the highly adaptable Extended Date Time Format (EDTF): see the [https://www.loc.gov/standards/datetime/ detailed specification].
 +
 
 +
*To specify a simple date, use the Year-Month-Day format, e.g. 2020-12-02 (and make sure that you separate the digits with a hyphen - rather than any other symbol)
 
*For a range of dates use / between the dates, e.g. 1900-12-24/1900-12-31  
 
*For a range of dates use / between the dates, e.g. 1900-12-24/1900-12-31  
*Entire months and years can be specified in an obvious way as 1932-10 (= October 1932) and 1968
+
*Entire months and years are stated as 1932-10 (= October 1932) and 1968 (= the entire year 1968)
  
== Location ==
+
=== Language ===
  
For locations we are using a powerful internal tool, which is based on the specification of hierarchical structure of [[#Area|areas]] that can be recently edited only by the [[Virtual Archive of Logical Empiricism (VALEP)#The internal part (Admin)|Admin]]. Each [[#City|city]] must be located inside of an area. A concrete [[#Address|address]], then, is always based on a city. Cities and addresses can be specified by all users of VALEP.
+
Languages are stored in VALEP in a CSV file that can be edited by the admin. The list was initially taken from the [https://id.loc.gov/vocabulary/languages.html Library of Congress] and currently comprises about 500 languages. To select a language, type some characters of its name and select the language from the list that appears (click once or press the return key). If the desired language does not appear in the list, please email [mailto:christian.damboeck@univie.ac.at Christian Damböck].
  
=== Area ===
+
=== Locations ===
  
Areas are hierarchically structure. This means two different things.  
+
For locations, we are using a powerful internal tool based on a hierarchical structure of [[#Area|areas]]. Each [[#City|city]] must be located within an area. A specific [[#Address|address]] is always based on a city.
  
*'''Main areas''' are specified as nested boxes, e.g. Lower Austria is inside of Austria, Austria is inside of Europe, therefore Lower Austria also belongs to Europe but because Austria does not belong to Asia, Lower Austria also is not a part of Asia.
+
NOTE: in a later implementation of VALEP (implementation presumably around 2024/25) we will use external internet resources to refer to cities and presumably also addresses. However, the hierarchical structuring of areas will remain as a special feature of VALEP that only embeds the external information on cities and addresses.
 +
 
 +
==== Areas ====
 +
 
 +
Areas are hierarchically structured, resulting in the following two categories:
 +
 
 +
*'''Main areas''' are specified as nested boxes, e.g. Lower Austria is part of Austria; Austria, in turn, is part of Europe, therefore Lower Austria also belongs to Europe. Accordingly, because Austria does not belong to Asia, Lower Austria is not part of Asia either.
 
*'''Special areas''' do not fit into the nested structure of main areas; examples are:
 
*'''Special areas''' do not fit into the nested structure of main areas; examples are:
**''The English Speeking World'' which includes regions from all over the world (USA, Canada, Great Britain, etc.)
+
**''The English-speaking World'', which includes regions from all over the world (USA, Canada, Great Britain, etc.)
**''The Habsburg Empire'' which includes Austria and parts of other states (Hungary, Czech Republic, Italy, Ukraine, etc.)
+
**''The Habsburg Empire'', which includes Austria and parts of other states (Hungary, Czech Republic, Italy, Ukraine, etc.)
  
The main purpose of this complex combined structure of areas and special areas is that one can filter all documents that were produced in a certain - arbitrarily complex - region.  
+
The main purpose of this complex combined structure of areas and special areas is that one can filter all documents that were produced in a certain - arbitrarily complex - region (implementation of this filter feature is still pending).
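
The combination of nested main areas and free-form special areas can be illustrated with a minimal Python sketch. It is not VALEP's implementation: the names <code>PARENT</code>, <code>SPECIAL</code>, <code>ancestors</code> and <code>belongs_to</code> are hypothetical, and the sample data only repeats the examples given above.

```python
# Illustrative sketch only; all names and the sample data are assumptions.

# Main areas: each area points to its parent (the "nested boxes").
PARENT = {
    "Lower Austria": "Austria",
    "Austria": "Europe",
    "Europe": None,
    "Asia": None,
}

# Special areas: explicit member sets that cut across the nesting.
SPECIAL = {
    "The English-speaking World": {"USA", "Canada", "Great Britain"},
}


def ancestors(area):
    """Return all main areas that enclose the given area, innermost first."""
    chain = []
    current = PARENT.get(area)
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def belongs_to(area, region):
    """True if the area lies inside the region (main or special area)."""
    lineage = [area] + ancestors(area)
    if region in SPECIAL:
        return any(a in SPECIAL[region] for a in lineage)
    return region in lineage
```

With this data, <code>belongs_to("Lower Austria", "Europe")</code> holds because the parent chain reaches Europe, while <code>belongs_to("Lower Austria", "Asia")</code> does not.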
  
=== City ===
+
==== City ====
  
A city is a geographical entity that belongs to an area. An address must always be connected to a city, rather than an area.  
+
A city is a geographical entity that belongs to an area. An address must always be combined with a city, rather than an area.
  
=== Address ===
+
==== Address ====
  
An address must specify a city at least but optionally might also add further information about district, zip code, road, number of building, etc. Therefore, ''Vienna'' is one address, ''Vienna 1080, Alserstraße 23/23'' is another one.  
+
At a minimum, an address must specify a city, but you can also add additional information, such as district, zip code, street, building number, etc. Therefore, ''Vienna'' is one address, ''Vienna 1080, Alserstraße 23/32'' is another one.
  
== Person and Institution ==
+
=== Authority: Person and Institution ===
  
In VALEP we store persons and institutions seperately. These can be recently edited only by the admin [[Virtual Archive of Logical Empiricism (VALEP)#The internal part (Admin)|Admin]] but we plan to allow any users to add persons and institutions in a future implementation in 2021.
+
Persons and Institutions (= authorities) can be optionally used in order to add information to all metadata categories that cover authors, creators, recipients, issuers, and those being involved in the creation of a document. In almost all these cases, it is also possible to specify several persons/institutions. Just type in one or more characters that belong to the name of the person/institution and then select the name from the drop-down menu, using the mouse (single click) or keyboard (return key).
  
Persons and Institutions can be optionally used for the specification of all metadata categories that cover authors, creators, receivers, issuers, and those being involved in the development of a document. In almost all these cases it is also possible to specify various persons/institutions. Just type in one or more characters that belong to the name of the person/institution and then select the name from the list, using the mouse (single klick) or keyboard (press return).
+
NOTE: in a later implementation of VALEP (implementation presumably in 2024/25) we will integrate these data with other online resources such as [https://www.wikidata.org/ Wikidata]. All information that is now stored in the '''Persons''' and '''Institutions''' tabs will then be made available directly in this external database.
  
The difference between persons and institutions, at the level of VALEP's datastructure, is only a matter of complexity:  
+
The difference between persons and institutions, at the level of VALEP's data structure, is only a matter of complexity:  
  
*'''Institutions''' are characterized only by a '''Name''' and an '''Abbreviated name''', together with an optional '''description'''
+
*'''Institutions''' are characterized only by a '''Name''' and an '''Abbreviated Name''', together with an optional '''Address''', '''URL''' and '''Description'''
*'''Persons''', by contrast, add to the '''Abbreviated Name''' and optional '''description''' the following
+
*'''Persons''', in contrast, have additional descriptors. In addition to '''Abbreviated Name''' and optional '''Address''', '''URL''' and '''Description''' the following fields are available:
 
**'''First Name''' and '''Surname'''
 
**'''First Name''' and '''Surname'''
**'''Date of Birth''' and '''Date of Death''', both optional fields being specified in the EDTF format (see [[#Date|date]])
+
**'''Date of Birth''' and '''Date of Death''', both optional fields; if completed, they must use the EDTF format (see [[#Date|date]])
**'''Profession''', an optional field that covers a brief description of a person's professional occupation and biography (as it might be used in a name index)
+
**'''Profession''', an optional field that can provide a brief description of a person's professional background and general biography (as it might be used in a name index)
**Two optional fields that cover a '''Short biography''' and a '''long biography'''
+
**Two optional fields that cover a '''Short Biography''' and a '''Long Biography'''
**An optional list of '''Institutions''' that might associate, for example, Rudolf Carnap, with institutions such as the ''Vienna Circle'', ''Logical Empiricism'', the journal ''Erkenntnis'' or the ''German Youth Movement''. These institutions must belong to the table of '''Institutions''' as described above.
+
**An optional list of '''Institutions''' that might associate, for example, Rudolf Carnap, with institutions such as the ''Vienna Circle'', ''Logical Empiricism'', the journal ''Erkenntnis'' or the ''German Youth Movement''. These institutions must belong to the list of '''Institutions''' as described above.
  
== Event ==
+
=== Event ===
  
An Event is specified here by a '''Name''' and an optional '''Description''', together with the following:  
+
An Event is defined by a '''Name''' and an optional '''Description''', together with the following fields:  
  
 
* An optional [[#Location|location]] and optional [[#Date|date]]
 
* An optional [[#Location|location]] and optional [[#Date|date]]
* An event type that needs to be chosen from an internally predefined list that includes items such as ''Conference'' or ''Discussion Circle Meeting''
+
* An event type to be selected from an internally compiled list including items such as ''Conference'' or ''Discussion Circle Meeting''
  
 
Events can be added by all users of VALEP.
 
Events can be added by all users of VALEP.
  
== Typeface ==
+
=== Card File ===
  
Specifies the way in which a text was produced. The options are  
+
Each instance of the document category ''File Card'' must be associated with a ''Card File''. The latter is identified in VALEP by a '''Name''' and by the '''Person''' or '''Institution''' that owns the Card File. Optionally, the '''Typeface''' of a card file can be pre-specified and a '''Description''' can be added.
 +
 
 +
=== Typeface (Text Format) ===
 +
 
 +
Specifies the writing format of a text. The options are  
  
 
*Long Hand  
 
*Long Hand  
 
*Short Hand
 
*Short Hand
*Machine Written
+
*Typed
 
*Printed
 
*Printed
 
*Electronic
 
*Electronic
 
*Mixed
 
*Mixed
  
These options are fixed and cannot be changed by users of VALEP.
+
These options are preset and cannot be changed by users of VALEP.
 
 
== Card File ==
 
 
 
Each instance of the document category ''File Card'' needs to become associated with a certain ''Card File''. The latter is identified in VALEP by a '''Name''' and by the '''Person''' or '''Institution''' that ows the Card File. Optionally, the '''Typeface''' of a card file might become pre-specified and a '''Description''' might be added.  
 
  
== Document format ==
+
=== Document Format ===
  
 
A ''Document Format'' is characterized by a '''Name''' and an optional '''Description''' and must be associated with one of the following categories:
 
A ''Document Format'' is characterized by a '''Name''' and an optional '''Description''' and must be associated with one of the following categories:
Line 83: Line 90:
 
*Video
 
*Video
  
Examples might be A4 or letter for text documents, vinyl disc for ''Audio''.
+
Examples might be A4 or letter-size for text documents, vinyl disc for ''Audio''. New categories can always be added to the list.
 
 
Document formats can be edited or added by the admin only.  
 
  
== Copying process==
+
=== Copying process ===
  
 
A ''Copying process'' is characterized by a '''Name''' and an optional '''Description''' and must be associated with one of the following categories:
 
A ''Copying process'' is characterized by a '''Name''' and an optional '''Description''' and must be associated with one of the following categories:
Line 96: Line 101:
 
*Video
 
*Video
  
Examples might be blue print or Xerox for text documents.
+
Examples might be blueprint or Xerox for text documents. New items can always be added to the list.
  
Copying processes can be edited or added by the admin only.
+
== General Data Types in VALEP ==
  
== Data types in VALEP ==
+
VALEP uses a relational database as well as parsing tools to ensure consistent metadata. An individual, for example, can only be identified as the author of a document if the name is already entered in VALEP's list of persons. VALEP only accepts date entries matching the EDTF format. On the other hand, (almost) all metadata categories allowing for free text entry accept a wide range of [https://home.unicode.org/ Unicode] symbols, e.g. Hong Qian can alternatively be spelled 洪谦.
  
These are the data types being used in VALEP, here specified in the way in which they are covered in the following two sections:
+
Here is a list of data types used in VALEP, in the order in which they are discussed in the following sections:
* ''Enum'' specifies a metadata category where the user must choose one value from an internal predefined list
+
* ''Enum'' defines a metadata category where the user must choose one value from an internal predefined list
 
* ''Boolean'' can be either true or false
 
* ''Boolean'' can be either true or false
* ''Name'' means that the user needs to choose a data set from the table ''Name'' that can be edited in the [[Virtual Archive of Logical Empiricism (VALEP)#The internal part (Admin)|admin]] section
+
* ''Name'' means that the user needs to choose a data set from the ''Name'' list, which can be edited in the [[Virtual Archive of Logical Empiricism (VALEP)#The internal part (Admin)|admin]] section
* ''Authority'' means that the user needs to choose a data set either from the table ''Person'' or ''Institution''
+
* ''Authority'' means that the user needs to choose a data set either from the ''Person'' or ''Institution'' lists
* ''Date'' and ''Location'' need to be specified, according to the rules being described above
+
* ''Date'' and ''Location'' need to be specified, according to the rules described above
 
* ''Unicode (X)'' means that the user can specify text using the whole range of [https://de.wikipedia.org/wiki/Unicode Unicode] symbols; the text is limited to X characters
 
* ''Unicode (X)'' means that the user can specify text using the whole range of [https://de.wikipedia.org/wiki/Unicode Unicode] symbols; the text is limited to X characters
 
* ''Simple (X)'' means that the user can specify text only by using a restricted set of characters that include [A … Z] [a … z] [1 … 0] .,;:-+=*/\~#@§$%!?&(){}[]<>|^°´`‘“
 
* ''Simple (X)'' means that the user can specify text only by using a restricted set of characters that include [A … Z] [a … z] [1 … 0] .,;:-+=*/\~#@§$%!?&(){}[]<>|^°´`‘“
 
* NN means that the content of the data field must not be null (per default, data fields can always be empty in VALEP)
 
* NN means that the content of the data field must not be null (per default, data fields can always be empty in VALEP)
* UQ means that the content of the data field mus be unique, among all instances of the respective data sets (per default, data fields need not be unique in VALEP)
+
* UQ means that the content of the data field must be unique, among all instances of the respective data sets (per default, data fields need not be unique in VALEP)
 +
* (n) means that a data field might contain several instances of data of the specified type (per default, data fields contain either zero or 1 instance of data of the specified type)
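
The difference between ''Unicode (X)'' and ''Simple (X)'' can be sketched in Python as follows. This is a hedged illustration, not VALEP's parser: the function names are hypothetical, the length is counted in code points, and treating the space character as part of the ''Simple'' set is an assumption (the examples "30 pp" and "210 min." given for ''Scope'' suggest it).

```python
import string

# Assumption: the Simple character set is the ASCII letters and digits,
# the punctuation listed above, and the space character.
SIMPLE_CHARS = set(
    string.ascii_letters
    + string.digits
    + " "
    + ".,;:-+=*/\\~#@§$%!?&(){}[]<>|^°´`‘“"
)


def fits_unicode(text, limit):
    """Unicode (X): any characters, at most `limit` of them."""
    return len(text) <= limit


def fits_simple(text, limit):
    """Simple (X): restricted character set, at most `limit` characters."""
    return len(text) <= limit and all(ch in SIMPLE_CHARS for ch in text)
```

For example, <code>fits_simple("210 min.", 30)</code> is accepted under these assumptions, whereas 洪谦 passes only the ''Unicode'' check.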
  
== All metadata (archive tree) ==
+
== All Metadata (Archive Tree) ==
  
Nodes of the archive tree  
+
The archive tree originates from a root called ''Archives''. The leaves of the tree are files.  
*might contain a '''Description'''
 
*all except ''Collections'' must contain a '''Title''' and optionally contain a '''Long Title'''
 
*all nodes of the archive tree are child unique, regarding their title, i.e., they are unique among all instances of childs of their parent
 
  
 
=== Archive ===
 
=== Archive ===
  
An ''Archive'' is a top level node in the archive tree, which represents, typically, a physical archive that, in turn, might represent either a public institution (e.g., university archive, state archive) or a private collection being held by a private institution or person. But an archive might also house digital collections, of course, that disintegrate into electronic files and folders. Physical archives might house any kind of objects, however, VALEP only allows to store digitizations that use the following four data-types:
+
These are the top level nodes of the archive tree. They represent, typically, a physical archive that might be a public institution (e.g., university archive, state archive) or a private collection held by a private institution or individual.
*Photograph (jpeg)
+
 
 +
*Archive ''Authority'' (e.g., ASP, Brenner Archive, IVC, [private files of] XYZ)
 +
*Long Title ''Unicode (300)''
 +
*Description ''Unicode (30,000)''
 +
*Owners ''User(n)'' (those who are allowed to edit this node)
 +
*Locked ''Boolean'' (if selected, then only the admin may delete this node)
 +
 
 +
=== Collection ===
 +
 
 +
These second level nodes must have an ''Archive'' as a parent. They are not characterized by a ''Title'' but rather by an ''Authority'' that specifies the respective collection, e.g., Carnap collection or Vienna Circle collection.
 +
* Collection ''Authority'' (e.g., Carnap collection, Jeffrey collection)
 +
*Description ''Unicode (30,000)''
 +
*URL ''Unicode (300)''
 +
*Owners ''User(n)'' (those who are allowed to edit this node)
 +
*Locked ''Boolean'' (if selected, then only the admin may delete this node)
 +
 
 +
=== Box (recursive) ===
 +
 
 +
Each box must have either a collection or a box as parent. Boxes may only contain boxes and folders, but no files.
 +
*Title ''Unicode (300)''
 +
*Subtitle ''Unicode (3,000)''
 +
*Description ''Unicode (30,000)''
 +
*Date ''Date'' (the date or date-range during which the material was created)
 +
*Signature type (item number) ''Enum'', options are ''no signature proposals'' and ''signature like folder name'' (in a future implementation, this will enable the transfer of the signature of a [[Virtual Archive of Logical Empiricism (VALEP)#Versions|version]] to a box)
 +
*Source type ''Enum'', options are ''Microfiche'', ''Original'', ''Paper Copy'' and ''Other'' (the purpose of this data field is to distinguish original sources from microfiche and paper copies)
 +
*Digitization Type ''Enum'', options are, among others, ''Compact camera handheld'' or ''Scan''
 +
*Producer ''Authority'' (who created the scans)
 +
*Owners ''User(n)'' (those who are permitted to edit this node)
 +
*Locked ''Boolean'' (if selected, then only the admin may delete this node)
 +
 
 +
=== Folder ===
 +
 
 +
Each folder must have either a box or a collection as a parent. Folders may only contain files and versions.
 +
*Title ''Unicode (300)''
 +
*Subtitle ''Unicode (3,000)''
 +
*Description ''Unicode (30,000)''
 +
*Date ''Date'' (the date or date-range during which the material was created)
 +
*Signature type (item number) ''Enum'', options are ''No signature proposals'' and ''Signature like Folder name'' (in a future implementation, this will enable the transfer of a signature of a [[Virtual Archive of Logical Empiricism (VALEP)#Versions|version]] to a folder)
 +
*Source type ''Enum'', options are ''Microfiche'', ''Original'', ''Paper Copy'' and ''Other'' (the purpose of this data field is to distinguish original sources from microfiche and paper copies)
 +
*Digitization Type ''Enum'', options are, among others, ''Compact camera handheld'' or ''Scan''
 +
*Producer ''Authority'' (who created the scans)
 +
*Owners ''User(n)'' (those who are permitted to edit this node)
 +
*Locked ''Boolean'' (if selected, then only the admin may delete this node)
 +
 
 +
=== File ===
 +
 
 +
Files can only be contained in folders. In other words, boxes cannot contain files, whether alone or alongside folders or boxes. Here, VALEP differs from the nested file structure found in computer systems. The goal is to make the nested structure more transparent and rigid. Physical archives may house any kind of object; however, VALEP only permits storage of digitized items in the following file types:
 +
*Photograph (jpg)
 
*Text (pdf)
 
*Text (pdf)
 
*Audio (mp3)
 
*Audio (mp3)
 
*Video (mp4)
 
*Video (mp4)
The ''Archive'' node in itself only stores metadata that identify an archive, inside of VALEP:  
+
 
 +
All files are described by the following metadata:
 
*Title ''Unicode (300)''
 
*Title ''Unicode (300)''
*Long Title ''Unicode (300)''
 
 
*Description ''Unicode (30,000)''
 
*Description ''Unicode (30,000)''
*URL ''Unicode (300)''
+
*Open ''Boolean'' (only open files can be viewed by non-registered users)
*Address  (is recently ''Unicode (300)'' but in a future implementation will become ''Location'')
+
*Owners ''User(n)'' (those who are allowed to edit this node)
*Private Collection ''Boolean'' (indicates that an archive is not a public institution)
+
*Locked ''Boolean'' (if selected, then only the admin may delete this node)
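
The parent rules stated for archives, collections, boxes, folders and files can be summarized in a small lookup table. The following Python sketch is only an illustration under assumptions: the names <code>ALLOWED_PARENTS</code> and <code>can_contain</code> are hypothetical, <code>"Root"</code> stands for the root node ''Archives'', and versions are left out.

```python
# Illustrative sketch; names are assumptions, not VALEP identifiers.
# Maps each node kind to the kinds of node that may be its parent.
ALLOWED_PARENTS = {
    "Archive": {"Root"},               # top level nodes below the root
    "Collection": {"Archive"},         # second level nodes
    "Box": {"Collection", "Box"},      # boxes are recursive
    "Folder": {"Collection", "Box"},
    "File": {"Folder"},                # files live in folders only
}


def can_contain(parent_kind, child_kind):
    """True if a node of parent_kind may directly contain child_kind."""
    return parent_kind in ALLOWED_PARENTS.get(child_kind, set())
```

Under this table, <code>can_contain("Box", "File")</code> is false, which reflects the rule that boxes may only contain boxes and folders.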
*Owners ''Authority''
 
  
 +
== All Metadata (General Documents) ==
  
== All metadata (documents) ==
+
VALEP stores information on archival items as so-called "general documents" (this section) and "versions" (next section). General documents only contain metadata about an archival item; in contrast, only versions can connect the general document to the files of the archive tree. 
  
 +
The nomenclature for general documents is specified in VALEP in several CSV documents and therefore can easily be edited by the admin. The general structure looks like this:
 +
* A fixed list of 49 '''metadata categories''' used in VALEP
 +
* An adjustable list of '''document categories''', currently containing 13 items
 +
* An adjustable '''document categories table''' associating each document category with those of the 49 metadata categories that it uses
 +
* An adjustable list of '''document types''', adding to each document category a list of possible document types
  
 +
=== Metadata Categories ===
  
 +
These are the metadata categories for general documents:
  
 
<ol start="0">
 
<ol start="0">
<li>Document Category ''Enum''</li>
+
<li>[[#Document categories|Document Category]] ''Enum, NN''</li>
 
<li>Title ''Unicode (300)''</li>
 
<li>Title ''Unicode (300)''</li>
<li>Title (alternative long) ''Unicode (1,000)''</li>
+
<li>Title (alternative, long) ''Unicode (1,000)''</li>
 +
<li>Description ''Unicode (30,000)''</li>
 +
<li>[[#Document types|Document Type]] ''Enum''</li>
 +
<li>[[#Card file|Card file]] ''Card File, NN''</li>
 +
<li>URL ''Unicode (300)''</li>
 +
<li>Author / Sender ''Authority(n)''</li>
 +
<li>Receiver ''Authority(n)''</li>
 +
<li>Involved ''Authority(n)''</li>
 +
<li>Event ''Event''</li>
 +
<li>Related Events ''Event(n)''</li>
 +
<li>Date ''Date(n)''</li>
 +
<li>[[#Location|Location]] / Place of Record ''Location(n)''</li>
 +
<li>Place of Posting ''Location''</li>
 +
<li>[[#Language|Language]] ''Language(n)''</li>
 +
<li>[[#Typeface|Typeface]], ''Enum''</li>
 +
<li>[[#Document format|Document format]] ''Document Format''</li>
 +
<li>Scope ''Simple (30)'' (e.g., 30 pp, 210 min.) (should become Unicode in a later version)</li>
 +
<li>[[#Document status|Document status]] ''Enum''</li>
 +
<li>Publisher ''Unicode (100)''</li>
 +
<li>Place of Publication ''Unicode (100)''</li>
 +
<li>Series Editor ''Authority(n)''</li>
 +
<li>Series Title ''Unicode (300)''</li>
 +
<li>Volume (Series) ''Simple (30)''</li>
 +
<li>Number of volumes ''Simple (30)''</li>
 +
<li>Edition ''Simple (30)''</li>
 +
<li>Date of first edition ''Date''</li>
 +
<li>Place of first edition ''Unicode (100)''</li>
 +
<li>Publisher of first edition ''Unicode (100)''</li>
 +
<li>ISBN ''Simple (50)''</li>
 +
<li>DOI ''Simple (50)''</li>
 +
<li>Autonomous publication ''Unicode (300)''</li>
 +
<li>Volume (Journal) ''Simple (30)''</li>
 +
<li>Issue ''Simple (30)''</li>
 +
<li>Original Publication ''Unicode (300)''</li>
 +
<li>ISSN ''Simple (50)''</li>
 +
</ol>
 +
 
 +
=== Document Categories and Document Types ===
 +
 
 +
In the present nomenclature, VALEP features 13 document categories and 78 document types. In the following list, we associate each document category with its abbreviation, e.g., (''M'') for ''Manuscript / Chronicle / Object''. We list the document types and note some features as outlined in the document categories table (namely which metadata categories belong to a document category).
 +
 
 +
* '''Manuscript / Chronicle / Object (''M'')''' <br />No receiver, no place of posting (see letter), no event (see minutes or memo), no publication data (see book or article) <br />This is the main document category covering manuscripts, chronicles, and notes, but also financial records, and all kinds of 2D and 3D objects.
 +
** General Manuscript
 +
** Book Manuscript
 +
** Article Manuscript
 +
** Lecture Manuscript
 +
** Sketch
 +
** Note
 +
** Diary
 +
** Chronicle
 +
** Calendar
 +
** Financial Record
 +
** Accounting
 +
** Map
 +
** Internet Object
 +
** Other 2D Object
 +
** 3D Object
 +
* '''Minutes (During Event) (''Minutes'')''' <br />Similar to (''M'') but includes event; date and location are either directly entered or covered by the event (the user is responsible for consistency)
 +
** Minutes
 +
** Lecture Notes
 +
** Discussion Protocol
 +
* '''Photo Series (During Event) (''PhotoS'')''' <br />Similar to (''Minutes'') but, as in (''Photo''), no language, typeface, document format, and scope
 +
** Photo Series  (the specific type of the series might be specified by the type of the event that it is documenting)
 +
* '''Memo / Speech (Before or after Event) (''M/S'')''' <br />Similar to (''Minutes'') but the ''Event'' needs to be distinguished here from the  date and location of the creation of the speech or memo
 +
** Memo (after event)
 +
** Speech / lecture (before event)
 +
* '''Letter / Issued document (''L'')''' <br />Similar to (''M'') but includes receiver and place of posting<br />Letters usually do not have a title (it is optional), but the category also includes a variety of issued documents (bills, tickets, certificates etc.), which often have a title.
 +
** Letter
 +
** Post card
 +
** Picture post card
 +
** Telegram
 +
** Email
 +
** Bill
 +
** Ticket
 +
** Prescription
 +
** Confirmation of Payment
 +
** General Certificate
 +
** Personal Document
 +
** School Certificate
 +
** Testament
 +
* '''Photograph (''Photo'')''' <br />Similar to (''PhotoS'') but instead of an event it covers date and location
 +
** Analog Photograph
 +
** Diapositive (Slide)
 +
** Digital Photograph
 +
* '''File Card (''FC'')''' <br />Similar to (''M'') but is bound to a card file
 +
** General File Card
 +
** Addresses / Biographical Notes
 +
** Bibliographical Notes
 +
** Private Matters
 +
** Business / Financial Matters
 +
* '''Book / other printed matter (''B'')''' <br />Similar to (''M'') but instead of a location it covers a range of bibliographical data (20-31)<br />This category covers all kinds of printed material that does not belong to a periodical.
 +
** Book
 +
** Edited book
 +
** Handbook
 +
** Web page
 +
** General printed matter
 +
** Newspaper clipping (as long as it cannot be identified as an ''Article'')
 +
** Bulk mail
 +
** Promotion brochure
 +
** Letter head
 +
** Envelope
 +
** Calling card
 +
** Diploma Thesis
 +
** MA Thesis
 +
** Dissertation
 +
** Habilitation Thesis
 +
** Seminar Thesis
 +
* '''Article (''A'')''' <br />Similar to (''B'') but covers the bibliographical data (32-36)
 +
** Journal article
 +
** Handbook article
 +
** Newspaper article
 +
** Web article
 +
* '''Proceedings (Book) (''PrB'')''' <br />Similar to (''B'') but also covers an event (e.g., the conference whose contributions are published in the proceedings)
 +
** Conference
 +
** Other Event
 +
* '''Proceedings (Article) (''PrA'')''' <br />Similar to (''A'') but also covers an event (e.g., the conference whose contributions are published in the proceedings)
 +
** Conference
 +
** Other event
 +
* '''Audio (''Audio'')''' <br />Similar to (''M'') but also covers an event, no typeface <br />Will typically but not necessarily be utilized with versions that contain audio files (mp3)
 +
** Interview
 +
** Lecture
 +
** Conference
 +
** Discussion circle meeting
 +
** Radio broadcast
 +
** Podcast
 +
** Music
 +
** Other
 +
* '''Video (''Video'')''' <br />Similar to (''M'') but also covers an event, no typeface (text format) <br />Will typically but not necessarily be utilized with versions that contain video files (mp4)
 +
** Interview
 +
** Lecture
 +
** Conference
 +
** Discussion circle meeting
 +
** TV broadcast
 +
** Podcast
 +
** Documentary
 +
** Other movie
 +
** Other
 +
 
 +
== All Metadata (Versions) ==
 +
 
 +
Versions connect files of the archive tree with documents. A version is a container that consists of a non-empty sequence of files that belong to a folder. Sequences of files must not have gaps. If A<sub>1</sub> ... A<sub>n</sub> is the alphabetically ordered sequence of all files of a folder, then a version must always be characterized by a sequence A<sub>i</sub> ... A<sub>j</sub> with 1 ≤ i ≤ j ≤ n. There are six possible types of versions in VALEP:
 +
* Original
 +
* Copy
 +
* Written Duplicate
 +
* Transcription
 +
* Translation
 +
* Commentary
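
The "no gaps" rule for the files of a version can be checked with a few lines of Python. This is a minimal sketch under assumptions: the function name <code>is_valid_version</code> is hypothetical, file names are assumed to be unique within a folder, and "alphabetically ordered" is approximated by Python's default string sorting.

```python
def is_valid_version(folder_files, version_files):
    """Check that version_files is a non-empty run A_i ... A_j of the
    sorted folder contents, i.e. a contiguous sequence without gaps."""
    if not version_files:
        return False  # a version must contain at least one file
    ordered = sorted(folder_files)   # A_1 ... A_n
    chosen = sorted(version_files)
    try:
        start = ordered.index(chosen[0])  # position of A_i
    except ValueError:
        return False  # first file does not belong to the folder
    # The slice starting at A_i must match the chosen files exactly.
    return ordered[start:start + len(chosen)] == chosen
```

For a folder containing <code>a.jpg</code>, <code>b.jpg</code>, <code>c.jpg</code> and <code>d.jpg</code>, the pair <code>b.jpg</code>, <code>c.jpg</code> is accepted, while <code>a.jpg</code>, <code>c.jpg</code> is rejected because it skips <code>b.jpg</code>.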
 +
 
 +
A document might contain several versions of any type. Most versions are characterized by the following metadata:
 +
<ol start="37">
 +
<li>Version Type ''Enum'' as specified above, NN</li>
 +
<li>[[#Copying process|Copying process]] ''Copying Process'' (available only for versions of type ''Copy'')</li>
 +
<li>Signature (item number) ''Unicode (300)'' (Note that the signature (item number) of a document represents its location in an archive and therefore cannot be associated with the general document but only with the version; different versions of the same document may, of course, have different signatures (item numbers).)</li>
 +
<li>Specific Comments on this version ''Unicode (30,000)''</li>
 +
<li>Version URL ''Unicode (300)''</li>
 +
</ol>
 +
 
 +
The following metadata is available only for versions of type ''Written duplicate'', ''Transcription'', ''Translation'', and ''Commentary'', and for all versions of a document of the category ''Photograph'' or ''Photo series'':
 +
<ol start="42">
 +
<li>[[#Document format|Document format]] (version) ''Document format''</li>
 +
<li>[[#Typeface|Typeface]] (text format) (version) ''Typeface''</li>
 +
<li>Author / Developer (version) ''Authority''</li>
 +
<li>[[#Date|Date]] (version) ''Date''</li>
 +
<li>[[#Location|Location]] (version) ''Location''</li>
 +
<li>[[#Language|Language]] (version) ''Language''</li>
 +
<li>Scope (version) ''Simple (30)''</li>
 
</ol>
 
</ol>

Latest revision as of 06:55, 21 September 2022

Special Data Types in VALEP

Date

Dates and times in VALEP are specified in the highly adaptable Extended Date Time Format (EDTF): see the detailed specification.

  • To specify a simple date, use the Year-Month-Day format, e.g. 2020-12-02 (and make sure that you separate the digits with a hyphen - rather than any other symbol)
  • For a range of dates use / between the dates, e.g. 1900-12-24/1900-12-31
  • Entire months and years are stated as 1932-10 (= October 1932) and 1968 (= the entire year 1968)
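
The three date forms and the range notation listed above can be checked with a few lines of Python. This is only a minimal sketch covering that small subset, not the full EDTF specification that VALEP accepts; the function names `is_valid_simple_date` and `is_valid_date_entry` are hypothetical and not part of VALEP.

```python
import re
from datetime import date

# Hypothetical helper: covers only YYYY, YYYY-MM, YYYY-MM-DD and "/" ranges,
# i.e. the forms listed above, not the whole EDTF specification.
_DATE_RE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_simple_date(text: str) -> bool:
    """Return True for YYYY, YYYY-MM or YYYY-MM-DD naming a real calendar date."""
    match = _DATE_RE.fullmatch(text)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 so that datetime can check the rest.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def is_valid_date_entry(text: str) -> bool:
    """Accept a single date or a range 'start/end' built from two simple dates.

    The sketch does not check that the start precedes the end.
    """
    parts = text.split("/")
    if len(parts) > 2:
        return False
    return all(is_valid_simple_date(part) for part in parts)
```

With this sketch, `is_valid_date_entry("1900-12-24/1900-12-31")` is accepted, while `is_valid_date_entry("2020.12.02")` is rejected because the digits are not separated by hyphens.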

Language

Languages are stored in VALEP in a CSV file that can be edited by the admin. The list was initially taken from the Library of Congress and currently comprises about 500 languages. To select a language, type some characters of its name and select the language from the list that appears (click once or hit the return key). If the desired language does not appear in the list, please email Christian Damböck.

Locations

For locations, we are using a powerful internal tool based on a hierarchical structure of areas. Each city must be located within an area. A specific address is always based on a city.

NOTE: in a later implementation of VALEP (implementation presumably around 2024/25) we will use external internet resources to refer to cities and presumably also addresses. However, the hierarchical structuring of areas will remain as a special feature of VALEP that only embeds the external information on cities and addresses.

Areas

Areas are hierarchically structured, resulting in the following two categories:

  • Main areas are specified as nested boxes, e.g. Lower Austria is part of Austria; Austria, in turn, is part of Europe, therefore Lower Austria also belongs to Europe. Accordingly, because Austria does not belong to Asia, Lower Austria also is not a part of Asia.
  • Special areas do not fit into the nested structure of main areas; examples are:
    • The English-speaking World, which includes regions from all over the world (USA, Canada, Great Britain, etc.)
    • The Habsburg Empire, which includes Austria and parts of other states (Hungary, Czech Republic, Italy, Ukraine, etc.)

The main purpose of this combined structure of main areas and special areas is to allow filtering for all documents that were produced in a certain, arbitrarily complex, region (implementation of this filter feature is still pending).
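
The nesting of main areas and the free composition of special areas can be illustrated as follows. This is a sketch under stated assumptions, not VALEP's internal tool: the dictionaries, the sample entries and the function names are hypothetical, and a special area is simplified to a set of main areas.

```python
# Hypothetical data: each main area points to its parent (None = top level).
MAIN_AREA_PARENT = {
    "Europe": None,
    "Asia": None,
    "Austria": "Europe",
    "Lower Austria": "Austria",
    "Great Britain": "Europe",
}

# Hypothetical data: a special area is modelled as a free set of main areas.
SPECIAL_AREA_MEMBERS = {
    "English-speaking World": {"USA", "Canada", "Great Britain"},
}


def is_within(area, container):
    """True if `area` equals `container` or is nested somewhere inside it."""
    current = area
    while current is not None:
        if current == container:
            return True
        current = MAIN_AREA_PARENT.get(current)
    return False


def in_region(area, region):
    """True if `area` lies in a main area or in any member of a special area."""
    if region in SPECIAL_AREA_MEMBERS:
        return any(is_within(area, member) for member in SPECIAL_AREA_MEMBERS[region])
    return is_within(area, region)
```

Because containment is resolved by walking up the parent chain, Lower Austria is found in Europe without being listed there explicitly, and it is not found in Asia.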

City

A city is a geographical entity that belongs to an area. An address must always be combined with a city, rather than an area.

Address

At a minimum, an address must specify a city, but you can also add further information, such as district, zip code, street, building number, etc. Thus, Vienna is one address, and Vienna 1080, Alserstraße 23/32 is another.

Authority: Person and Institution

Persons and Institutions (= authorities) can optionally be used to add information to all metadata categories that cover authors, creators, recipients, issuers, and others involved in the creation of a document. In almost all of these cases, several persons/institutions can be specified. Type one or more characters of the name of the person/institution and then select the name from the drop-down menu, using the mouse (single click) or keyboard (return key).

NOTE: in a later implementation of VALEP (implementation presumably in 2024/25) we will integrate these data with other online resources such as Wikidata. All information that is now stored in the Persons and Institutions tabs will then be made available directly in this external database.

The difference between persons and institutions, at the level of VALEP's data structure, is only a matter of complexity:

  • Institutions are characterized only by a Name and an Abbreviated Name, together with an optional Address, URL and Description
  • Persons, in contrast, have additional descriptors. In addition to Abbreviated Name and optional Address, URL and Description the following fields are available:
    • First Name and Surname
    • Date of Birth and Date of Death; both fields are optional, but if they are filled in they must follow the EDTF format (see date)
    • Profession, an optional field that can provide a brief description of a person's professional background and general biography (as it might be used in a name index)
    • Two optional fields that cover a Short Biography and a Long Biography
    • An optional list of Institutions that might associate, for example, Rudolf Carnap, with institutions such as the Vienna Circle, Logical Empiricism, the journal Erkenntnis or the German Youth Movement. These institutions must belong to the list of Institutions as described above.
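
The difference in complexity between the two kinds of authority can be pictured with two record types. This is only an illustrative sketch: the class and attribute names are hypothetical, the abbreviation "VC" is made up for the example, and VALEP's actual database tables may be organized differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Institution:
    name: str
    abbreviated_name: str
    address: Optional[str] = None
    url: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Person:
    first_name: str
    surname: str
    abbreviated_name: str
    address: Optional[str] = None
    url: Optional[str] = None
    description: Optional[str] = None
    date_of_birth: Optional[str] = None   # EDTF string if given
    date_of_death: Optional[str] = None   # EDTF string if given
    profession: Optional[str] = None
    short_biography: Optional[str] = None
    long_biography: Optional[str] = None
    # Every entry must already exist in the list of Institutions.
    institutions: List[Institution] = field(default_factory=list)


# Hypothetical sample records (the abbreviations are invented).
vienna_circle = Institution(name="Vienna Circle", abbreviated_name="VC")
carnap = Person(
    first_name="Rudolf",
    surname="Carnap",
    abbreviated_name="Carnap",
    institutions=[vienna_circle],
)
```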

Event

An Event is defined by a Name and an optional Description, together with the following fields:

  • An optional location and optional date
  • An event type to be selected from an internally compiled list including items such as Conference or Discussion Circle Meeting

Events can be added by all users of VALEP.

Card File

Each instance of the document category File Card must be associated with a certain Card File. The latter is identified in VALEP by a Name and by the Person or Institution that owns the Card File. Optionally, the Typeface of a card file can be pre-specified and a Description can be added.

Typeface (Text Format)

Specifies the writing format of a text. The options are

  • Long Hand
  • Short Hand
  • Typed
  • Printed
  • Electronic
  • Mixed

These options are preset and cannot be changed by users of VALEP.

Document Format

A Document Format is characterized by a Name and an optional Description and must be associated with one of the following categories:

  • Text/2D/3D Object (= all document categories except Photograph, Audio, Video)
  • Photograph
  • Audio
  • Video

Examples might be A4 or letter-size for text documents, vinyl disc for Audio. New categories can always be added to the list.

Copying process

A Copying process is characterized by a Name and an optional Description and must be associated with one of the following categories:

  • Text/2D/3D Object (= all document categories except Photograph, Audio, Video)
  • Photograph
  • Audio
  • Video

Examples might be blue print or Xerox for text documents. New items can always be added to the list.

General Data Types in VALEP

VALEP uses a relational database as well as parsing tools to ensure consistent metadata. An individual, for example, can only be identified as the author of a document if the name is already entered in VALEP's list of persons. VALEP only accepts date entries matching the EDTF format. On the other hand, (almost) all metadata categories allowing for free text entry accept a wide range of Unicode symbols, e.g. Hong Qian can alternatively be spelled 洪谦.

Here is a list of data types used in VALEP in the order in which they are discussed in the following sections:

  • Enum defines a metadata category where the user must choose one value from an internal predefined list
  • Boolean can be either true or false
  • Name means that the user needs to choose a data set from the Name list, which can be edited in the admin section
  • Authority means that the user needs to choose a data set either from the Person or Institution lists
  • Date and Location need to be specified, according to the rules described above
  • Unicode (X) means that the user can specify text using the whole range of Unicode symbols; the text is limited to X characters
  • Simple (X) means that the user can specify text only by using a restricted set of characters that include [A … Z] [a … z] [1 … 0] .,;:-+=*/\~#@§$%!?&(){}[]<>|^°´`‘“
  • NN means that the content of the data field must not be null (by default, data fields can always be empty in VALEP)
  • UQ means that the content of the data field must be unique among all instances of the respective data sets (by default, data fields need not be unique in VALEP)
  • (n) means that a data field might contain several instances of data of the specified type (by default, data fields contain either zero or one instance of data of the specified type)
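
The text-related rules above (Unicode (X), Simple (X), NN and UQ) can be sketched as simple checks. This is an illustration under assumptions, not VALEP's parsing code: the function names are hypothetical, the length limit is counted in Unicode code points, and the space character is assumed to belong to the Simple set because examples such as "30 pp" contain it.

```python
# Assumption: the Simple set is the character list given above plus the space.
SIMPLE_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " .,;:-+=*/\\~#@§$%!?&(){}[]<>|^°´`‘“"
)


def check_field(value, max_len, simple=False, not_null=False):
    """Return a list of rule violations for one text field (empty list = ok)."""
    problems = []
    if value is None or value == "":
        if not_null:
            problems.append("NN: value must not be empty")
        return problems
    if len(value) > max_len:
        problems.append(f"length {len(value)} exceeds limit {max_len}")
    if simple:
        bad = sorted(set(value) - SIMPLE_CHARS)
        if bad:
            problems.append("characters outside the Simple set: " + "".join(bad))
    return problems


def is_unique(value, existing_values):
    """UQ: the value must not occur among the values already stored."""
    return value not in existing_values
```

For example, `check_field("洪谦", 300)` passes as a Unicode (300) value, whereas the same string fails as a Simple (30) value because both characters lie outside the restricted set.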

All Metadata (Archive Tree)

The archive tree originates from a root called Archives. The leaves of the tree are files.

Archive

These are the top level nodes of the archive tree. They represent, typically, a physical archive that might be a public institution (e.g., university archive, state archive) or a private collection held by a private institution or individual.

  • Archive Authority (e.g., ASP, Brenner Archive, IVC, [private files of] XYZ)
  • Long Title Unicode (300)
  • Description Unicode (30,000)
  • Owners User(n) (those who are allowed to edit this node)
  • Locked Boolean (if selected, then only the admin may delete this node)

Collection

These second level nodes must have an Archive as a parent. They are not characterized by a Title but rather by an Authority that specifies the respective collection, e.g., Carnap collection or Vienna Circle collection.

  • Collection Authority (e.g., Carnap collection, Jeffrey collection)
  • Description Unicode (30,000)
  • URL Unicode (300)
  • Owners User(n) (those who are allowed to edit this node)
  • Locked Boolean (if selected, then only the admin may delete this node)

Box (recursive)

Each box must have either a collection or a box as parent. Boxes may only contain boxes and folders, but no files.

  • Title Unicode (300)
  • Subtitle Unicode (3,000)
  • Description Unicode (30,000)
  • Date Date (the date or date-range during which the material was created)
  • Signature type (item number) Enum, options are No signature proposals and Signature like Folder name (in a future implementation, this will enable the transfer of the signature of a version to a box)
  • Source type Enum, options are Microfiche, Original, Paper Copy and Other (the purpose of this data field is to distinguish original sources from microfiche and paper copies)
  • Digitization Type Enum, options are, among others, Compact camera handheld or Scan
  • Producer Authority (who created the scans)
  • Owners User(n) (those who are permitted to edit this node)
  • Locked Boolean (if selected, then only the admin may delete this node)

Folder

Each folder must have either a box or a collection as a parent. Folders may only contain files and versions.

  • Title Unicode (300)
  • Subtitle Unicode (3,000)
  • Description Unicode (30,000)
  • Date Date (the date or date-range during which the material was created)
  • Signature type (item number) Enum, options are No signature proposals and Signature like Folder name (in a future implementation, this will enable the transfer of a signature of a version to a folder)
  • Source type Enum, options are Microfiche, Original, Paper Copy and Other (the purpose of this data field is to distinguish original sources from microfiche and paper copies)
  • Digitization Type Enum, options are, among others, Compact camera handheld or Scan
  • Producer Authority (who created the scans)
  • Owners User(n) (those who are permitted to edit this node)
  • Locked Boolean (if selected, then only the admin may delete this node)

File

Files can only be contained in folders. In other words, a box can contain neither files alone nor a mix of files and folders/boxes. Here, VALEP differs from the nested file structure found in computer systems. The goal is to make the nested structure more transparent and rigid. Physical archives may house any kind of object; VALEP, however, only permits storage of digitized items in the following file types:

  • Photograph (jpg)
  • Text (pdf)
  • Audio (mp3)
  • Video (mp4)

All files are described by the following metadata:

  • Title Unicode (300)
  • Description Unicode (30,000)
  • Open Boolean (only open files can be viewed by non-registered users)
  • Owners User(n) (those who are allowed to edit this node)
  • Locked Boolean (if selected, then only the admin may delete this node)
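
The containment rules of the archive tree described in this section can be summarized in a small lookup table. The sketch below is hypothetical (the names `ALLOWED_PARENTS`, `may_contain` and `is_supported_file` are not part of VALEP), it leaves out versions, and it assumes that file extensions are compared case-insensitively.

```python
# For each node type: the node types that may serve as its parent.
ALLOWED_PARENTS = {
    "Archive": {"Root"},
    "Collection": {"Archive"},
    "Box": {"Collection", "Box"},      # boxes are recursive
    "Folder": {"Collection", "Box"},
    "File": {"Folder"},                # files live only in folders
}

# File types permitted for digitized items, by extension.
ALLOWED_EXTENSIONS = {"jpg", "pdf", "mp3", "mp4"}


def may_contain(parent_kind, child_kind):
    """True if a node of type `child_kind` may sit directly under `parent_kind`."""
    return parent_kind in ALLOWED_PARENTS.get(child_kind, set())


def is_supported_file(filename):
    """True if the file name ends in one of the permitted extensions."""
    _, dot, extension = filename.rpartition(".")
    return bool(dot) and extension.lower() in ALLOWED_EXTENSIONS
```

Here `may_contain("Box", "Box")` holds because boxes are recursive, while `may_contain("Box", "File")` does not, which reflects the point where VALEP departs from an ordinary file system.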

All Metadata (General Documents)

VALEP stores information on archival items as so-called "general documents" (this section) and "versions" (next section). General documents only contain metadata about an archival item; in contrast, only versions can connect the general document to the files of the archive tree.

The nomenclature for general documents is specified in VALEP in several CSV documents and therefore can easily be edited by the admin. The general structure looks like this:

  • A fixed list of 49 metadata categories used in VALEP
  • An adjustable list of document categories, currently containing 13 items
  • An adjustable document categories table associating document categories with the 49 metadata categories used in the respective document category
  • An adjustable list of document types, adding to each document category a list of possible document types

Metadata Categories

These are the metadata categories for general documents

  1. Document Category Enum, NN
  2. Title Unicode (300)
  3. Title (alternative, long) Unicode (1,000)
  4. Description Unicode (30,000)
  5. Document Type Enum
  6. Card file Card File, NN
  7. URL Unicode (300)
  8. Author / Sender Authority(n)
  9. Receiver Authority(n)
  10. Involved Authority(n)
  11. Event Event
  12. Related Events Event(n)
  13. Date Date(n)
  14. Location / Place of Record Location(n)
  15. Place of Posting Location
  16. Language Language(n)
  17. Typeface Enum
  18. Document format Document Format
  19. Scope Simple (30) (e.g., 30 pp, 210 min.) (should become Unicode in a later version)
  20. Document status Enum
  21. Publisher Unicode (100)
  22. Place of Publication Unicode (100)
  23. Series Editor Authority(n)
  24. Series Title Unicode (300)
  25. Volume (Series) Simple (30)
  26. Number of volumes Simple (30)
  27. Edition Simple (30)
  28. Date of first edition Date
  29. Place of first edition Unicode (100)
  30. Publisher of first edition Unicode (100)
  31. ISBN Simple (50)
  32. DOI Simple (50)
  33. Autonomous publication Unicode (300)
  34. Volume (Journal) Simple (30)
  35. Issue Simple (30)
  36. Original Publication Unicode (300)
  37. ISSN Simple (50)

Document Categories and Document Types

In the present nomenclature, VALEP features 13 document categories and 78 document types. In the following list, we associate each document category with its abbreviation, e.g., (M) for Manuscript / Chronicle / Object. We list the document types and note some features as outlined in the document categories table (namely which metadata categories belong to a document category).

  • Manuscript / Chronicle / Object (M)
    No receiver, no place of posting (see letter), no event (see minutes or memo), no publication data (see book or article)
    This is the main document category covering manuscripts, chronicles, and notes, but also financial records, and all kinds of 2D and 3D objects.
    • General Manuscript
    • Book Manuscript
    • Article Manuscript
    • Lecture Manuscript
    • Sketch
    • Note
    • Diary
    • Chronicle
    • Calendar
    • Financial Record
    • Accounting
    • Map
    • Internet Object
    • Other 2D Object
    • 3D Object
  • Minutes (During Event) (Minutes)
    Similar to (M) but includes event; date and location are either directly entered or covered by the event (the user is responsible for consistency)
    • Minutes
    • Lecture Notes
    • Discussion Protocol
  • Photo Series (During Event) (PhotoS)
    Similar to (Minutes) but, as in (Photo), no language, typeface, document format, and scope
    • Photo Series (the specific type of the series might be specified by the type of the event that it is documenting)
  • Memo / Speech (Before or after Event) (M/S)
    Similar to (Minutes) but the Event needs to be distinguished here from the date and location of the creation of the speech or memo
    • Memo (after event)
    • Speech / lecture (before event)
  • Letter / Issued document (L)
    Similar to (M) but includes receiver and place of posting
    Letters usually do not have a title (it is optional), but the category also includes a variety of issued documents (bills, tickets, certificates etc.), which often have a title.
    • Letter
    • Post card
    • Picture post card
    • Telegram
    • Email
    • Bill
    • Ticket
    • Prescription
    • Confirmation of Payment
    • General Certificate
    • Personal Document
    • School Certificate
    • Testament
  • Photograph (Photo)
    Similar to (PhotoS) but instead of an event it covers date and location
    • Analog Photograph
    • Diapositive (Slide)
    • Digital Photograph
  • File Card (FC)
    Similar to (M) but is bound to a card file
    • General File Card
    • Addresses / Biographical Notes
    • Bibliographical Notes
    • Private Matters
    • Business / Financial Matters
  • Book / other printed matter (B)
    Similar to (M) but instead of a location it covers a range of bibliographical data (20-31)
    This category covers all kinds of printed material that does not belong to a periodical.
    • Book
    • Edited book
    • Handbook
    • Web page
    • General printed matter
    • Newspaper clipping (as long as it cannot be identified as an Article)
    • Bulk mail
    • Promotion brochure
    • Letter head
    • Envelope
    • Calling card
    • Diploma Thesis
    • MA Thesis
    • Dissertation
    • Habilitation Thesis
    • Seminar Thesis
  • Article (A)
    Similar to (B) but covers the bibliographical data (32-36)
    • Journal article
    • Handbook article
    • Newspaper article
    • Web article
  • Proceedings (Book) (PrB)
    Similar to (B) but also covers an event (e.g., the conference whose contributions are published in the proceedings)
    • Conference
    • Other Event
  • Proceedings (Article) (PrA)
    Similar to (A) but also covers an event (e.g., the conference whose contributions are published in the proceedings)
    • Conference
    • Other event
  • Audio (Audio)
    Similar to (M) but also covers an event, no typeface
    Will typically but not necessarily be utilized with versions that contain audio files (mp3)
    • Interview
    • Lecture
    • Conference
    • Discussion circle meeting
    • Radio broadcast
    • Podcast
    • Music
    • Other
  • Video (Video)
    Similar to (M) but also covers an event, no typeface (text format)
    Will typically but not necessarily be utilized with versions that contain video files (mp4)
    • Interview
    • Lecture
    • Conference
    • Discussion circle meeting
    • TV broadcast
    • Podcast
    • Documentary
    • Other movie
    • Other

All Metadata (Versions)

Versions connect files of the archive tree with documents. A version is a container that consists of a non-empty sequence of files that belong to a folder. Sequences of files must not have gaps. If A1 ... An is the alphabetically ordered sequence of all files of a folder, then a version must always be characterized by a sequence Ai ... Aj with 1 ≤ i ≤ j ≤ n. There are six possible types of versions in VALEP:

  • Original
  • Copy
  • Written Duplicate
  • Transcription
  • Translation
  • Commentary
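
The rule that a version must be a non-empty, gap-free run Ai ... Aj of a folder's files can be checked as follows. This is a minimal sketch: the function name is hypothetical, file names within a folder are assumed to be unique, and Python's default string ordering stands in for the alphabetical ordering, which VALEP may define differently.

```python
def is_valid_version(folder_files, version_files):
    """True if version_files is a non-empty run A_i ... A_j of the sorted folder files."""
    ordered = sorted(folder_files)
    version_files = list(version_files)
    if not version_files:
        return False
    try:
        start = ordered.index(version_files[0])
    except ValueError:
        # The first file of the version does not belong to the folder.
        return False
    # The version must match the folder's ordering file by file, without gaps.
    return ordered[start:start + len(version_files)] == version_files
```

For a folder holding `a01.pdf` to `a04.pdf`, the sequence `a02.pdf`, `a03.pdf` is a valid version, while `a01.pdf`, `a03.pdf` is rejected because it skips `a02.pdf`.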

A document might contain several versions of any type. Most versions are characterized by the following metadata:

  1. Version Type Enum as specified above, NN
  2. Copying process Copying Process (available only for versions of type Copy)
  3. Signature (item number) Unicode (300) (Note that the signature (item number) represents a document's location in an archive and is therefore associated with the version rather than with the general document; different versions of the same document may have different signatures.)
  4. Specific Comments on this version Unicode (30,000)
  5. Version URL Unicode (300)

The following metadata is available only for versions of type Written duplicate, Transcription, Translation, and Commentary, and for all versions of a document of the category Photograph or Photo series:

  1. Document format (version) Document format
  2. Typeface (text format) (version) Typeface
  3. Author / Developer (version) Authority
  4. Date (version) Date
  5. Location (version) Location
  6. Language (version) Language
  7. Scope (version) Simple (30)
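
The availability rules for version metadata stated in this section can be expressed as two small predicates. The sketch is hypothetical: the function names and the exact spelling of the type and category strings are assumptions made for illustration.

```python
VERSION_TYPES = {
    "Original", "Copy", "Written Duplicate",
    "Transcription", "Translation", "Commentary",
}

# Version types that carry their own format, typeface, author, date, etc.
EXTENDED_TYPES = {"Written Duplicate", "Transcription", "Translation", "Commentary"}

# Document categories whose versions always carry the extended metadata.
PHOTO_CATEGORIES = {"Photograph", "Photo Series"}


def has_copying_process(version_type):
    """The copying process field exists only for versions of type Copy."""
    return version_type == "Copy"


def has_extended_metadata(version_type, document_category):
    """True if the version-level format, typeface, author, date, location,
    language and scope fields are available for this version."""
    if version_type not in VERSION_TYPES:
        raise ValueError(f"unknown version type: {version_type!r}")
    return version_type in EXTENDED_TYPES or document_category in PHOTO_CATEGORIES
```

An Original of a manuscript therefore has no version-level date or language, whereas an Original of a Photograph does, and a Translation has them regardless of the document category.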