Difference between revisions of "Virtual Archive of Logical Empiricism (VALEP)"

From VALEP

Revision as of 15:30, 3 December 2020

This is the electronic handbook of VALEP. The page was created on Dec 1, 2020 and will be continuously developed in the following weeks.

  • On the history, hosts, and cooperation partners of VALEP see About VALEP
  • See how VALEP is processing knowledge into metadata
  • Or jump directly to VALEP


The scope and mission of VALEP

VALEP is an archive management tool that is intended as a platform for the history of Logical Empiricism and related currents.

VALEP processes

  • (left/red part of the window) the hierarchical structures of archives, which include archives, collections, digitizations, shelves, boxes, folders, and files
  • (middle/green part of the window) documents, which turn files of an archive into objects that belong to a certain document category and document type and are specified by metadata that include title, description, author, and date
  • (upper right/yellow part of the window) the metadata that characterize all archive nodes and documents
  • (lower right/blue part of the window) an integrated document viewer for files and documents (already available); downloading and printing are to be implemented in 2021

VALEP stores titles, descriptions and the like as Unicode text. Some metadata categories, however, including date, location, language, persons, and institutions, are stored via references in a relational database and/or using special formats and parsing tools, e.g., EDTF for dates and an internal tool for the mereological treatment of locations. See the metadata page for the details.
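As an illustration of what a date format of this kind has to support, the following sketch handles a small subset of EDTF: a year, a year and month, a full day, and an interval written as start/end. It is not VALEP's parser; the function names `bounds`, `parse_interval`, and `overlaps` are assumptions made for this example, and uncertain or approximate dates are left out.

```python
import calendar
from datetime import date


def bounds(value):
    """Return (first_day, last_day) covered by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date: {value!r}")


def parse_interval(value):
    """Turn a single date or a 'start/end' interval into (earliest, latest) days."""
    if "/" in value:
        start, end = value.split("/")
        return bounds(start)[0], bounds(end)[1]
    return bounds(value)


def overlaps(a, b):
    """True if the periods described by the two date strings share at least one day."""
    a_start, a_end = parse_interval(a)
    b_start, b_end = parse_interval(b)
    return a_start <= b_end and b_start <= a_end


# A document written between December 24, 1924 and October 1930
print(parse_interval("1924-12-24/1930-10"))
print(overlaps("1924-12-24/1930-10", "1929"))
```

Storing every date as a pair of bounds means that a filter for a year, a month, or a range reduces to a single overlap test.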

Is it digital humanities?

If one expects a digital humanities project to adopt sophisticated statistical methods of experimental research, then the answer is clearly no. Though the data pool being built by VALEP might in the future be used with such methods, VALEP has no plans, now or in the near future, to integrate any tools for complex statistical evaluation.

On the other hand, VALEP certainly aims to collect large amounts of data. The history of Logical Empiricism, together with related currents such as Neo-Kantianism, French Positivism, British Empiricism, and American Pragmatism, comprises dozens of main figures and probably thousands of minor figures, including university and private scholars. The estates of many of these figures are to be found in public institutions and private collections. Further material was collected by relevant university and private institutions. There are thousands of manuscripts and publications, and probably millions of letters between representatives of the relevant currents, that might be taken into account in one way or another in our studies of Logical Empiricism. VALEP allows us to store any of these sources as soon as they become available in electronic form. We can then search and filter them in order to select the material that is relevant for us. This is, of course, also a variety of digital humanities.

Existing tools are document oriented and typically cover only rudimentary metadata

Existing tools for the management of archival sources include (1) the tools used by university archives such as the Archives of Scientific Philosophy; (2) open tools such as PhilArchive, where everybody may upload electronic documents; (3) tools tailored to the presentation of material of a specific origin, such as the papers of Ludwig Wittgenstein. All these tools have in common that they are more or less strictly document oriented. They do not mirror the physical structure of an archive but rather store documents, each of which forms a particular unit of metadata. This approach can be fruitful if the documents are processed carefully and the metadata are clear, transparent, and sufficiently complex.

The problem, however, is that most of the existing tools cover only rather rudimentary metadata. Moreover, the tools used by public archives often hardly process single documents as logical units of some kind (e.g., the letter from Otto Neurath to Rudolf Carnap of December 26, 1934) but rather focus on the units naturally provided by the archive, viz., folders that contain, e.g., several letters from Carnap to Neurath from the years 1923 to 1929 and sometimes also include further material that does not directly relate to the main theme. In such cases, complex metadata may not be possible at all, simply because the document units are too vague.

An archive oriented presentation might be helpful

In cases where a digital archive covers only rudimentary metadata and rather ambiguous documents, it might be most helpful to include a presentation of the digital material that represents the physical structure of the archive. Archives typically structure their material into collections and subcollections, shelves, boxes, folders, and the like, and the fine-grained structure of this organization very often already represents a certain order: it distinguishes, e.g., between manuscripts and correspondence, puts the material into some chronological order, and/or picks out certain topics or correspondence partners. Even if such an order is quite inconsistent and at times amounts to pure chaos, users of an archive are usually able to use it in a mnemotechnical way, often supported by the finding aids that exist for the archive. Therefore, the most obvious way to make electronic archival sources more transparent and usable would be to add a perspective on the material that mirrors the physical structure of the archive.
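The idea of mirroring the physical structure can be sketched as a tree built from the folder paths of a digitization. The code below is only an illustration under stated assumptions: the class `ArchiveNode`, the function `build_tree`, and the sample paths are hypothetical and do not describe VALEP's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveNode:
    """One node of the archive tree (collection, box, folder, file, ...)."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, parts):
        """Insert a path such as ["Box 1", "Folder 1", "page_001.jpg"] below this node."""
        node = self
        for part in parts:
            node = node.children.setdefault(part, ArchiveNode(part))
        return node

    def walk(self, depth=0):
        """Yield (depth, name) pairs in depth-first order."""
        yield depth, self.name
        for child in self.children.values():
            yield from child.walk(depth + 1)


def build_tree(root_name, relative_paths):
    """Build an archive tree from slash-separated paths of digitized files."""
    root = ArchiveNode(root_name)
    for path in relative_paths:
        root.add_path([p for p in path.split("/") if p])
    return root


# Hypothetical digitization whose folders mirror boxes and folders of a collection
paths = [
    "Box 1/Folder 1/page_001.jpg",
    "Box 1/Folder 1/page_002.jpg",
    "Box 1/Folder 2/page_001.jpg",
    "Box 2/Folder 1/page_001.jpg",
]
tree = build_tree("Example collection", paths)
for depth, name in tree.walk():
    print("  " * depth + name)
```

Because the tree is derived from the paths alone, no per-document work is needed to make the material browsable in its physical order.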

The logarithmic scale of archival work

Note also that representing an archive in the way in which it is physically organized yields a very strong pragmatic benefit. If a digitization of an archive is stored in folders that mirror the structure of the archive, then in VALEP one may upload the entire material, basically, with a single mouse click. The archival work involved here is close to zero. On the other hand, if the archive is really large (for example, the Schlick papers in Haarlem comprise some 50,000 pages, and the Carnap papers at the ASP might come close to 100,000 pages), then a careful processing of the archive might involve several tens of thousands of documents. If an archivist processes, say, 20 documents per hour, this might amount to several years of full-time work. So we compare here two extremely different points on the scale of gains divided by working time. Most of the available archival resources will never be processed into a fine-grained document structure, for mere lack of time and financial resources. However, these sources might still be made available at the level of representation of the physical structure of the source.
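The order of magnitude of this estimate can be checked with a short calculation. The rate of 20 documents per hour is taken from the text; the figure of 1,600 working hours per year and the document counts are assumptions chosen for illustration only.

```python
def years_of_work(documents, docs_per_hour=20, hours_per_year=1600):
    """Estimate full-time years needed to process a number of documents.

    docs_per_hour follows the example in the text; hours_per_year is an
    assumed figure for one full-time position.
    """
    return documents / docs_per_hour / hours_per_year


# Hypothetical document counts in the range of "several tens of thousands"
for count in (30_000, 60_000, 90_000):
    print(f"{count} documents: about {years_of_work(count):.1f} years")
```

Under these assumptions the document-level processing of a single large archive costs between roughly one and three years of full-time work, compared with almost none for the upload of the folder structure.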

Adequate metadata are important

Metadata can be needlessly complex and confusing, so a careful selection is important. This includes that a document should be associated only with those metadata categories that might become relevant for it. Only a letter, for example, has a receiver or a place of posting, whereas a manuscript, unlike a published book or article, may not offer any publication data. So one important aspect of making metadata adequate is to restrict documents of a certain category to the metadata categories that are relevant for that category.
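A minimal sketch of such a restriction is a table that lists, per document category, the metadata fields that may be filled in. The category and field names below are hypothetical and chosen only to match the examples in the text; they are not VALEP's actual schema.

```python
# Fields that every document may carry (assumed for this example)
COMMON_FIELDS = {"title", "description", "author", "date"}

# Additional fields per document category (hypothetical names)
CATEGORY_FIELDS = {
    "letter": {"receiver", "place_of_posting"},
    "manuscript": set(),
    "publication": {"publisher", "place_of_publication"},
}


def allowed_fields(category):
    """Return all metadata fields that make sense for the given category."""
    return COMMON_FIELDS | CATEGORY_FIELDS[category]


def unexpected_fields(category, metadata):
    """Return the sorted list of fields that do not apply to this category."""
    return sorted(set(metadata) - allowed_fields(category))


letter = {"title": "Example letter", "receiver": "Rudolf Carnap"}
manuscript = {"title": "Example manuscript", "publisher": "Example press"}
print(unexpected_fields("letter", letter))
print(unexpected_fields("manuscript", manuscript))
```

An input form built on such a table only shows the fields that are relevant for the chosen category, which keeps the metadata small and consistent.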

But metadata should also be carefully selected with regard to their format. This holds, in particular, for key metadata such as date and location. Dates should be able to cover not only single days (or several of them) but also entire months or years, as well as date ranges, e.g., from December 24, 1924 until October 1930. This makes it possible to cover cases where the date of a document cannot be determined precisely or where a document was produced over a longer period and/or on different days or in different years. Locations, on the other hand, should be embedded into the mereological structure of geography. That Vienna, for example, belongs to Austria and Europe but also to the Habsburg Empire and the German-speaking world is a fact that is not easy to reproduce, but it is needed in order to pick out all Viennese locations when one filters documents from Vienna, Austria, Europe, the Habsburg Empire, or the German-speaking world. Finally, in many other cases, e.g., regarding persons, institutions, and languages, a consistently searchable and filterable layout is easily obtained if the database uses relational features and stores these items in predefined lists or tables. References to these predefined resources, however, should typically be optional, in order to keep the structure of the database as flexible as possible.
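The mereological embedding of locations can be sketched as a part-of relation in which a place may belong to several larger regions at once. This is an illustration only, not VALEP's internal tool; the names `PART_OF`, `regions_of`, and `filter_by_region` as well as the sample documents are assumptions of this example.

```python
# Direct part-of relations; a place may have several parents (illustrative data)
PART_OF = {
    "Vienna": {"Austria", "Habsburg Empire", "German-speaking world"},
    "Austria": {"Europe"},
    "Chicago": {"United States"},
}


def regions_of(place, part_of=PART_OF):
    """Return the place itself plus every region that contains it, directly or indirectly."""
    seen = set()
    stack = [place]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(part_of.get(current, ()))
    return seen


def filter_by_region(documents, region):
    """Keep the documents whose location lies within the given region."""
    return [d for d in documents if region in regions_of(d["location"])]


# Hypothetical documents
documents = [
    {"title": "Letter A", "location": "Vienna"},
    {"title": "Letter B", "location": "Chicago"},
]
print([d["title"] for d in filter_by_region(documents, "Europe")])
print([d["title"] for d in filter_by_region(documents, "Habsburg Empire")])
```

Since the relation is a graph rather than a strict tree, a filter for Europe and a filter for the Habsburg Empire both find the Viennese document without the location being stored twice.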

Documents might have instances (versions, chapters) being spread over different archival sources

An important aspect of the integration of several archival sources is that documents tend not to be located in just one folder or box that belongs to collection X. Rather, the following holds quite frequently:

  • The original document is located in archive X, whereas copies are to be found in other archives, e.g., carbon copies of a letter kept by the sender (which might contain relevant information that the original letter does not provide)
  • There are written duplicates of a document, transcriptions, and translations as well as commentaries that lie at very different archival locations.
  • Finally, a document might disintegrate into several parts or chapters that, in turn, might be spread over different archives (some of them might be the original source, some might be copies, written duplicates, transcriptions, etc.)
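The three cases above can be modelled by letting one logical document point to several versions and chapters, each of which refers to its own place in some archive. The following data structure is a sketch under assumptions: the class names, the `kind` labels, and the archive paths are hypothetical and not taken from VALEP.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    kind: str          # e.g. "original", "copy", "transcription", "translation"
    archive_path: str  # position in an archive tree, e.g. "Archive X/Box 1/Folder 2"


@dataclass
class Chapter:
    title: str
    versions: list = field(default_factory=list)


@dataclass
class Document:
    title: str
    versions: list = field(default_factory=list)
    chapters: list = field(default_factory=list)

    def all_versions(self):
        """Collect the versions of the document itself and of all its chapters."""
        found = list(self.versions)
        for chapter in self.chapters:
            found.extend(chapter.versions)
        return found

    def archives(self):
        """Return the sorted names of all archives that hold some part of the document."""
        return sorted({v.archive_path.split("/")[0] for v in self.all_versions()})


# Hypothetical manuscript whose parts are spread over two archives
doc = Document(
    title="Example manuscript",
    versions=[Version("transcription", "Archive Y/Box 3/Folder 1")],
    chapters=[
        Chapter("Part 1", [Version("original", "Archive X/Box 1/Folder 2")]),
        Chapter("Part 2", [Version("copy", "Archive Y/Box 7/Folder 4")]),
    ],
)
print(doc.archives())
```

The document itself stays a single logical unit, while each version keeps the link to its physical location.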

Sources might be kept internal as long as copyright issues are not positively resolved

Desirable Features

Along the lines of these considerations, the following features would be desirable additions to the typical coverage of existing archival tools:

  • To cover the physical structure of an archive (in order to serve the mnemotechnical skills of researchers and make existing finding aids more useful)
  • To provide parts with high gains and low costs first and add the rest (very high gains and very high costs) only in those cases where the existing resources make this possible
  • To provide a flexible handling of metadata categories that tailor them to the required document categories
  • To ensure that critical metadata categories such as date and location use a flexible, consistent, and transparent format, together with suitable parsing tools (that avoid inconsistent entries)
  • To implement other critical metadata via predefined lists and tables in a relational database setting, while keeping data fields optional whenever possible
  • To provide suitable tools that enable the processing of decentralized documents that disintegrate into several versions and chapters
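The feature of predefined lists with optional references can be illustrated with a small relational schema. The sketch below uses an in-memory SQLite database; the table and column names, the free-text fallback column, and the sample rows are assumptions made for this example and do not reproduce VALEP's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE document (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    author_id   INTEGER REFERENCES person(id),  -- optional reference, may be NULL
    author_text TEXT                            -- free-text fallback (assumed column)
);
""")

# One entry in the predefined list of persons
conn.execute("INSERT INTO person (name) VALUES (?)", ("Rudolf Carnap",))
carnap_id = conn.execute(
    "SELECT id FROM person WHERE name = ?", ("Rudolf Carnap",)
).fetchone()[0]

# A document that references the list and one that does not (hypothetical rows)
conn.execute(
    "INSERT INTO document (title, author_id) VALUES (?, ?)",
    ("Example letter", carnap_id),
)
conn.execute(
    "INSERT INTO document (title, author_text) VALUES (?, ?)",
    ("Example note", "unidentified"),
)

# LEFT JOIN keeps documents without a reference in the result
rows = conn.execute("""
    SELECT d.title, COALESCE(p.name, d.author_text)
    FROM document d
    LEFT JOIN person p ON p.id = d.author_id
    ORDER BY d.id
""").fetchall()
print(rows)
```

Referenced persons can be searched and filtered consistently, while documents whose author is not yet in the list can still be stored.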

VALEP offers them

Unsurprisingly, the aforementioned features are all offered by VALEP. From the beginning, the design of this tool was centered around the idea of combining the representation of an archive via its physical structure with its representation via documents. Some of the remaining innovative features of VALEP followed directly from this key idea; this is true, for example, of the implementation of versions and chapters, which are somewhat intermediate between (general) documents and archives. Others entered the design on the basis of feedback from archivists and the designer's own experience at the archives.

Who can use VALEP?

VALEP is available to everybody and is free of charge in all its varieties. Typical users of VALEP might include:

  • Public and private institutions that house material on the history of Logical Empiricism and want to use VALEP as a tool that helps them to distribute their sources and integrate them with other relevant material
  • Private persons who hold collections relevant to the history of Logical Empiricism and want to use VALEP not just to distribute and integrate their sources but also to safeguard them for the future
  • Researchers from all over the world who have digitized material in the archives and are willing to share it with others and/or want to use VALEP as a tool that allows them to process and better organize their sources

Future prospects

The public part

Archive tree

Documents

Versions

Chapters

Metadata Field

The File Viewer

The internal part (Construction Site) - all users except admins

The internal part (Admin)