Large File Technology

When we receive large files through alternate transfer mechanisms, we first move them to the production server, with at least one backup on another machine.

Bitstream replace method
This method is preferred, but it can be tedious for packages with many bitstreams. It is the best way to update items while they are still in the curation process. As of 2014-05-01, a Python script automates most of the process, but you must have the bitstream ID and the path to the new file.

 * 1) Locate the large file on the filesystem, and have the full path ready.
 * 2) Locate the item in the curation system.
 * 3) Go to the Item Bitstreams page, hover your mouse over the link for downloading the bitstream, and copy the link.
 * 4) Run the replace_largefile_bitstream.py script:

$ /home/dryad/scripts/replace_largefile_bitstream.py
Enter the bitstream ID or URL: 
Bitstream ID: 123456
Enter the path on the filesystem to the large file: 
Using MIME type application/x-tar
Executing SQL: UPDATE bitstream set size_bytes=12345667, name='filename.tgz', source='filename.tgz' ,checksum='8e9b105a306649361b07c8fe55e1f496', bitstream_format_id=55 where bitstream_id = 123456
UPDATE 1
Copying '/home/transfer/filename.tgz' -> '/opt/dryad-data/assetstore/12/34/56/12345695953157516195038155553895222810'
$
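The UPDATE statement in the transcript sets the size, name, checksum, and format of the bitstream. As a rough illustration of where those values could come from, here is a minimal Python sketch. It is not the code of replace_largefile_bitstream.py; the helper names `md5_hex` and `describe_large_file` are hypothetical. It assumes the checksum is MD5 (the sample value is 32 hex characters) and that the MIME type is guessed from the file name. The lookup from MIME type to bitstream_format_id depends on the local database and is left out.

```python
import hashlib
import mimetypes
import os


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_large_file(path):
    """Collect the per-file values seen in the UPDATE statement (hypothetical helper)."""
    # guess_type works from the file name only; it does not inspect the content.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "size_bytes": os.path.getsize(path),
        "name": os.path.basename(path),
        "checksum": md5_hex(path),  # assumption: the checksum column holds MD5
        "mime_type": mime_type,
    }
```

Reading the file in chunks matters here because the files handled by this process are large and should not be loaded into memory at once.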

The script verifies the file's checksum before and after the copy. Verify on the 'Item Bitstreams' page that the name and format have been updated, and that the file is accessible from the item page. The script does NOT remove the uploaded file from the FTP directory, so remove it manually after confirming the replacement.
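For copies made by hand, a similar check can be done manually. The following is a minimal sketch, assuming MD5 checksums. The function name `copy_with_verification` is hypothetical and is not taken from the Dryad script.

```python
import hashlib
import shutil


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verification(source, destination):
    """Copy source to destination and raise if the two MD5 digests differ."""
    before = md5_hex(source)
    shutil.copyfile(source, destination)
    after = md5_hex(destination)
    if before != after:
        raise IOError("Checksum mismatch: %s != %s" % (before, after))
    return after
```

Only delete the uploaded file from the transfer directory once a check like this has passed and the item page serves the new file.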

Export/Import method
This method is preferred for dealing with many bitstreams, but only use it for items in the Dryad archive. Do not use it for items that are in the user's workspace or the curation workflow system.


 * Locate the target data package.
 * Export the item from DSpace. NOTE: we don't use the package exporter, because the METS/AIP output formats are highly redundant and not suitable for human editing.

sudo /opt/dryad/bin/dspace export -t ITEM -i target_item_handle -d . -m -n 1


 * Modify the item to contain the appropriate text/files.
 * Import the new item into DSpace.
 * The importer must be run from one directory above the target content.
 * When using the sample content from Dryad's code repository, .svn directories must be removed.

sudo /opt/dryad/bin/dspace import -a -s . -c 10255/2 -m map.out -e rscherle@nescent.org
 * This creates a new DSpace object, so remove the old DSpace item (using its handle).
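Removing the .svn directories before an import can be scripted. The following is a minimal sketch; the function name `remove_svn_dirs` is hypothetical, and the directory passed to it is whatever directory holds the content to be imported.

```python
import os
import shutil


def remove_svn_dirs(root):
    """Delete every .svn directory under root and return the removed paths."""
    removed = []
    for current, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            target = os.path.join(current, ".svn")
            shutil.rmtree(target)
            # Prune the entry so os.walk does not try to descend into it.
            dirnames.remove(".svn")
            removed.append(target)
    return removed
```

Printing the returned list gives a quick record of what was deleted before the importer is run.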

Current problems with the process:
 * HIVE must be disabled, because it doesn't work well with the command-line tools.
 * DOI registration doesn't work.