Data Management
The Research Data Policy of TALAR is outlined in this document.
Data Provision
To get your research data archived, you need to provide:
- primary research data
- description of the data (metadata) according to ISO 24622-1 (CMDI)
- information on the availability of the data to the public (e.g. access restrictions, moratoria)
- a data depositing agreement (only for data providers external to the University of Tübingen)
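Before submitting, it can help to run a quick sanity check on the CMDI metadata file. The sketch below is not a TALAR tool. It only checks the outer CMDI envelope using Python's standard library. It assumes the CMDI 1.2 namespace `http://www.clarin.eu/cmd/1` (or the older CMDI 1.1 namespace `http://www.clarin.eu/cmd/`) and the top-level `Header`, `Resources` and `Components` elements. It does not validate the record against its profile schema.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.2 envelope namespace, plus the older CMDI 1.1 namespace.
CMD_NAMESPACES = ("http://www.clarin.eu/cmd/1", "http://www.clarin.eu/cmd/")
# Assumption: top-level children expected in every CMDI record.
REQUIRED_CHILDREN = ("Header", "Resources", "Components")


def check_cmdi_envelope(xml_text: str) -> list[str]:
    """Return a list of problems in a CMDI record's envelope (empty if none found)."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        ns, _, local = root.tag[1:].partition("}")
    else:
        ns, local = "", root.tag

    problems = []
    if local != "CMD":
        problems.append(f"root element is {local!r}, expected 'CMD'")
    if ns not in CMD_NAMESPACES:
        problems.append(f"unexpected namespace {ns!r}")
    for name in REQUIRED_CHILDREN:
        # Look up direct children in the same namespace as the root.
        path = f"{{{ns}}}{name}" if ns else name
        if root.find(path) is None:
            problems.append(f"missing <{name}> element")
    return problems


record = """<CMD xmlns="http://www.clarin.eu/cmd/1" CMDVersion="1.2">
  <Header/><Resources/><Components/>
</CMD>"""
print(check_cmdi_envelope(record))  # → []
```

An empty list only means the envelope looks plausible. Whether the content meets the selected CMDI profile still has to be checked against that profile's schema.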
Tool Support
- To help package research data and to support its description, researchers may use Bagman, a tool to create packages in the BagIt file packaging format.
- CLARIN-D provides DMPTY, a data management planner.
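Bagman is the tool offered for this. To show what a BagIt package contains, the following sketch builds a minimal bag by hand following RFC 8493. It is not Bagman's implementation. The function name `make_bag` is hypothetical. The sketch creates a `data/` payload directory, the `bagit.txt` declaration and a SHA-256 payload manifest. It omits optional tag files such as `bag-info.txt`. It also does not percent-encode file names that contain line breaks, which the specification requires.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir: str, bag_dir: str) -> Path:
    """Copy source_dir into a new minimal BagIt bag at bag_dir (hypothetical helper)."""
    src, bag = Path(source_dir), Path(bag_dir)
    payload = bag / "data"
    # All payload files live under data/; copytree creates bag_dir as needed
    # and fails if bag_dir/data already exists.
    shutil.copytree(src, payload)

    # Bag declaration as defined in RFC 8493.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one "<sha256>  data/<relative path>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

The checksums let the archive verify that every payload file arrived unchanged. This is the main reason to use BagIt rather than a plain directory copy.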
Recommended Formats
For depositing research data with the Tübingen Archive of Language Resources (TALAR), the archive recommends and accepts the following data formats. If special requirements are not covered by these recommendations, researchers should contact the archivists at clarin-repository@sfs.uni-tuebingen.de.
Textual Resources
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Text | TXT | .txt | text/plain | recommended |
TEI Documents | XML | .xml | application/xml | recommended |
Document | PDF/A | .pdf | application/pdf | recommended |
Specialised Linguistic Resources
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Treebanks | CoNLL | .csv, .txt | text/plain | recommended |
Treebanks | negra | .csv, .txt | text/plain | recommended |
TCF | XML | .tcf | application/xml+tcf | recommended |
E-Run Experiment File | E-Run | .ebs2 | application/octet-stream | accepted |
E-Merge Experiment File | E-Merge | .emrg2 | application/x-ole-storage | accepted |
E-Studio Experiment File | E-Studio | .es2 | text/plain | accepted |
Feature Structures | HPSG | .skip | text/plain | legacy, now recommended: TEI |
Feature Structures | TDL | .tdl | text/plain | legacy, now recommended: TEI |
Diverse | tusnelda | .sgml | text/sgml | legacy, now recommended: TEI |
Archive Packages
The files inside these packages should themselves be in formats recommended by TALAR.
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Archive Package | GNU zip | .gz | application/gzip, application/x-gzip | recommended |
Archive Package | RAR | .rar | application/x-rar | recommended |
Archive Package | TAR-GZ | .tgz | application/gzip | recommended |
Archive Package | TAR | .tar | application/x-tar | recommended |
Archive Package | Zip | .zip | application/zip | recommended |
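Because the package contents should follow the recommendations, a depositor may want to list what is inside a package before uploading it. The sketch below uses only Python's `zipfile` and `tarfile` modules. It therefore covers .zip, .tar and .tgz packages but not RAR, which has no standard-library reader, or a single gzip-compressed file. The helper names and the `RECOMMENDED` set are hypothetical. The set is a small excerpt of the tables in this document, not the full list.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath

# Hypothetical excerpt of recommended extensions from the TALAR tables.
RECOMMENDED = {".txt", ".xml", ".pdf", ".csv", ".tcf", ".wav", ".png", ".jpg", ".tiff", ".svg"}


def member_names(package_path: str) -> list[str]:
    """List regular file names inside a .zip or tar-based (.tar, .tgz) package."""
    if zipfile.is_zipfile(package_path):
        with zipfile.ZipFile(package_path) as zf:
            return [info.filename for info in zf.infolist() if not info.is_dir()]
    if tarfile.is_tarfile(package_path):
        # Default mode "r" detects gzip/bzip2/xz compression transparently.
        with tarfile.open(package_path) as tf:
            return [m.name for m in tf.getmembers() if m.isfile()]
    raise ValueError(f"unsupported package format: {package_path}")


def non_recommended_members(package_path: str) -> list[str]:
    """Return members whose extension is not in RECOMMENDED (case-insensitive)."""
    return [
        name
        for name in member_names(package_path)
        if PurePosixPath(name).suffix.lower() not in RECOMMENDED
    ]
```

A non-empty result does not mean the package is rejected. Some formats are "accepted" rather than "recommended", so flagged files are candidates for conversion or for a question to the archivists.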
Statistics Files and Program Code
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
R-scripts | R | .r / .R | text/plain, text/x-matlab | recommended |
SPSS Statistics | SPSS | .sav | application/spss-sav | recommended |
SPSS Statistics | SPSS | .spss | application/spss | recommended |
SPSS Statistics | SPSS | .spv | application/x-spss-spv | recommended |
Lisp Program Code | Lisp | .lsp | text/plain | recommended |
Tables | CSV | .csv | text/plain | recommended |
Tab Separated Data File | TAB | .tab | text/plain | recommended |
Perl Script | Perl | .pl | application/x-perl | recommended |
Media Resources
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Image | BMP | .bmp | image/bmp | recommended |
Image | JPEG | .jpg | image/jpeg | recommended |
Image | PNG | .png | image/png | recommended |
Image | TIFF | .tiff | image/tiff | recommended |
Image | GIF | .gif | image/gif | recommended |
Vector Graphic | SVG | .svg | image/svg+xml | recommended |
Audio | WAVE | .wav | audio/wav | recommended |
Video | M4V | .m4v | application/octet-stream | recommended |
Document | PDF/A | .pdf | application/pdf | recommended |
Biosemi EEG file | Biosemi | .bdf | biosig/bdf | accepted |
Other Resources
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Stylesheet | CSS | .css | text/plain | recommended |
Document Type Definition | DTD | .dtd | application/xml-dtd | legacy, now recommended: XSD |
Extensible Markup Language | XML | .xml | application/xml, text/xml | recommended |
XML Schema | XSD | .xsd | application/xml, text/xml | recommended |
Text | HTML | .html | text/html | recommended |
Text | HTML | .xhtml | text/html | recommended |
Database File | SQLite | .sqlite | application/x-sqlite3 | recommended |
Stylesheet | XML | .xsl | application/xslt+xml | recommended |
Metadata in CMDI | XML | .xml | application/xml+cmdi | recommended |
Bibliography Document | BibTeX | .bib | text/plain | recommended |
Jupyter Notebook | Jupyter Notebook | .ipynb | text/plain | accepted |
Presentation | PowerPoint Presentation | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | accepted |
Legacy Data
Type | Data Format | Recommended File Extension | Recommended MIME Type | Comment |
---|---|---|---|---|
Text Document | DOC | .doc | application/msword | not accepted for new data, tolerated for legacy data for the moment |
Text Document | DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | not accepted for new data, tolerated for legacy data for the moment |
Text Document | RTF | .rtf | text/plain | not accepted for new data, tolerated for legacy data for the moment |
Text Document | OpenDocument Text Document | .odt | application/vnd.oasis.opendocument.text | accepted if containing formulas and other active components |
Database File | Microsoft Access | .mdb | application/vnd.ms-access | not accepted for new data, tolerated for legacy data for the moment |
Database File | Microsoft Access | .accdb | text/plain | not accepted for new data, tolerated for legacy data for the moment |
Table | OpenDocument Spreadsheet | .ods | application/vnd.oasis.opendocument.spreadsheet | accepted if containing formulas and other active components |
Table | Excel Spreadsheet | .xls | application/vnd.ms-excel | accepted if containing formulas and other active components |
Table | Excel Spreadsheet | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | accepted if containing formulas and other active components |
Presentation | PowerPoint Presentation | .ppt | application/vnd.ms-powerpoint | not accepted for new data, tolerated for legacy data for the moment |
Presentation | PowerPoint Presentation | .pptm | application/vnd.ms-powerpoint.presentation.macroEnabled.12 | not accepted for new data, tolerated for legacy data for the moment |
Presentation | Keynote (Mac OS X package format) | .key | application/vnd.apple.keynote | not accepted for new data, tolerated for legacy data for the moment |
Variable Property Mapping | VPM | .vpm | text/plain | not accepted for new data, tolerated for legacy data for the moment |
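The tables above can also be used as a lookup when preparing a deposit. The following sketch shows one way to do this. The `FORMAT_STATUS` mapping and the `format_status` function are hypothetical. The mapping copies only a few rows from the tables and abbreviates the legacy comment. A lookup by extension cannot tell a PDF/A file from an ordinary PDF, or a TEI document from other XML, so it can only serve as a first pass.

```python
from pathlib import PurePath

# Hypothetical excerpt of the tables above: extension -> status comment.
FORMAT_STATUS = {
    ".txt": "recommended",
    ".xml": "recommended",
    ".pdf": "recommended",  # only if the file is actually PDF/A
    ".tcf": "recommended",
    ".csv": "recommended",
    ".wav": "recommended",
    ".ebs2": "accepted",
    ".ipynb": "accepted",
    ".pptx": "accepted",
    ".skip": "legacy, now recommended: TEI",
    ".tdl": "legacy, now recommended: TEI",
    ".sgml": "legacy, now recommended: TEI",
    ".doc": "not accepted for new data",
    ".docx": "not accepted for new data",
    ".rtf": "not accepted for new data",
    ".ppt": "not accepted for new data",
}


def format_status(filename: str) -> str:
    """Look up a file's status by its lower-cased extension."""
    suffix = PurePath(filename).suffix.lower()
    # Unlisted formats fall back to the advice to contact the archivists.
    return FORMAT_STATUS.get(suffix, "unlisted: contact the archivists")


print(format_status("corpus.TXT"))   # → recommended
print(format_status("thesis.docx"))  # → not accepted for new data
```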
Status as of 2020-02-24