Re: [galaxy-dev] [galaxy-user] Experience with Loading NGS data on standalone instance of galaxy
Hello,

If I may suggest a workaround for getting big files into galaxy quickly:

For a local galaxy server and local files (on the same server as the running galaxy), there is indeed no need to upload the files.
What we did is create a simple tool which accepts a local path and copies the file from that path into the galaxy database.
A local file copy is much quicker than uploading the file through HTTP.

We need a full file copy because the source files are routinely deleted, but if the source files are kept 'forever', you can modify the tool to create a soft/hard link to the local file instead - that would be almost instantaneous.

Here's an example of such a tool:

###
( scifi.cshl.edu is mounted as SMB, so for all practical purposes it behaves like local files )
###

###
### The XML file: import_scifi_file.xml
###
<tool id="cshl_import_scifi_file" name="Import SciFi file">
  <command interpreter="bash">import_scifi_file.sh '$filepath' $output</command>
  <inputs>
    <param name="filepath" type="text" size="100" label="File path (on \\Scifi.cshl.edu\hannon )" />
  </inputs>
  <outputs>
    <data format="txt" name="output" label="Scifi Import: $filepath" />
  </outputs>
</tool>

###
### The shell script: import_scifi_file.sh
###
### The script basically copies the source file (param 1)
### to the destination file (param 2, which is galaxy's dataset_NNNNN.dat).
###
### The extra code tries to make it as safe as possible, allowing imports only
### from /media/scifi
###
### The script can be changed from copying to linking - would be even faster.
###
#!/bin/bash
# Note: bash (not plain sh) is required for ${#VAR}, ${VAR//a/b},
# ${VAR:0:N} and pushd/popd, all of which are used below.

SCIFI_BASE_DIR="/media/scifi"
BASE_DIR_LEN=${#SCIFI_BASE_DIR}

INPUT="$1"
OUTPUT="$2"

if [ -z "$OUTPUT" ]; then
    echo "Usage: $0 [INPUT] [OUTPUT]" >&2
    exit 1
fi

if [ ! -d "${SCIFI_BASE_DIR}" ]; then
    echo "Internal Error: Scifi is not mounted on '${SCIFI_BASE_DIR}'" >&2
    exit 1
fi

FULLPATH="$INPUT"

# Convert backslashes (possibly pasted from windows machines) into forward slashes
FULLPATH=${FULLPATH//\\/\/}

# Remove server prefix (possibly pasted by the user on a windows machine)
FULLPATH=${FULLPATH/\/\/scifi\.cshl\.edu\/hannon\//}
FULLPATH=${FULLPATH/\/\/scifi\/hannon\//}

# Construct full path with "/media/scifi" prefix
FULLPATH="${SCIFI_BASE_DIR}/${FULLPATH}"

# Safety check -
# change to the directory of the requested file.
# It should begin with "/media/scifi".
# If it doesn't, it means somebody tried to pull a trick by using
# a bad mixture of "../../../../.." in the file path.
DIRECTORY=$(dirname "$FULLPATH")
pushd "$DIRECTORY" > /dev/null
REALDIR=$(pwd)
popd > /dev/null

# Extract the prefix from the 'real' directory path
REAL_DIR_PREFIX=${REALDIR:0:$BASE_DIR_LEN}

#DEBUG
#echo "FULLPATH = $FULLPATH"
#echo "DIRECTORY= $DIRECTORY"
#echo "REALDIR = $REALDIR"
#echo "REAL_DIR_PREFIX = $REAL_DIR_PREFIX"
#echo "BASE_DIR = $SCIFI_BASE_DIR"

# Probably foul play:
# the real path of the requested input file does not start with the prefix
# of '/media/scifi' - maybe somebody's trying to get a file outside 'scifi' ??
if [ "$REAL_DIR_PREFIX" != "$SCIFI_BASE_DIR" ]; then
    echo "Error: invalid input file ($INPUT)" >&2
    exit 1
fi

# If we got here, the $FULLPATH is at least in a valid location under the '/media/scifi' directory.
if [ ! -r "$FULLPATH" ]; then
    echo "Error: input file ($INPUT) is not a valid file." >&2
    exit 1
fi

cp "$FULLPATH" "$OUTPUT"
if [ $? != 0 ]; then
    echo "Error: failed to copy \"$INPUT\"!" >&2
    exit 1
fi

echo "File \"$INPUT\" Imported."
exit 0
###
###
###

A further improvement is not to allow a free-text file path, but instead to use dynamic options to select from a list of files, like so (in the XML file):

<param name="localfile" type="select" label="Solexa Data File">
  <options from_file="cshl_import_files_hannon.txt">
    <column name="name" index="1"/>
    <column name="value" index="0"/>
  </options>
</param>

And then have a cron job create 'cshl_import_files_hannon.txt' with the files which can be uploaded.

Hope this helps,
-gordon.

Greg Von Kuster wrote, On 07/21/2009 10:44 AM:
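The cron job mentioned in the message above is not shown in the thread. A minimal sketch of what it could look like follows, under these assumptions (none of which come from the original message): the list file is tab-separated, with the full path in column 0 (the select's value) and a shorter label in column 1 (the select's name); the function name build_import_list is hypothetical; the base directory is passed without a trailing slash; and file names contain no tabs or newlines.

```shell
#!/bin/bash
# Hypothetical helper (not from the original message):
# write one "full_path<TAB>label" line per regular file under $1 into list file $2.
build_import_list() {
    local base_dir="$1" list_file="$2" tmp_file

    if [ ! -d "$base_dir" ]; then
        echo "Error: '$base_dir' is not a directory" >&2
        return 1
    fi

    # Build the list in a temporary file next to the target.
    tmp_file=$(mktemp "${list_file}.XXXXXX") || return 1

    # Column 0 = full path (value), column 1 = path relative to base_dir (name).
    find "$base_dir" -type f | sort | while IFS= read -r path; do
        printf '%s\t%s\n' "$path" "${path#"$base_dir"/}"
    done > "$tmp_file"

    # mktemp creates the file as owner-only; make it readable before publishing it.
    chmod 644 "$tmp_file"

    # Rename into place so a reader never sees a half-written list.
    mv "$tmp_file" "$list_file"
}

# When run as a script with two arguments, build the list directly.
if [ "$#" -eq 2 ]; then
    build_import_list "$1" "$2"
fi
```

Saved as, say, build_import_list.sh (a made-up name), it could be run from cron with the base directory and the list file as its two arguments, for example /media/scifi and wherever the galaxy instance expects 'cshl_import_files_hannon.txt'. The tool script would then take its input path from the selected value rather than from free text.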
Dear Assaf,

Thanks a lot for sharing your trick. I am currently travelling, but I will surely come back to you on this next week when I am back.

Best,
-Abhi

On Tue, Jul 21, 2009 at 12:30 PM, Assaf Gordon<gordon@cshl.edu> wrote:
participants (2)
- Abhishek Pratap
- Assaf Gordon