Re: [galaxy-user] [galaxy-bugs] loading data
by Peng, Tao
Hi Jen, I have RNA-seq data for 2 biological samples loaded into Galaxy.
The samples are from human skin biopsies from genital herpes
reactivation. Using Tophat I will align the reads to the human genome. But I
am NOT sure how to build the genital herpes genome (HSV-2) and align reads to
the HSV-2 genome. Any suggestion is greatly appreciated!
Thanks,
tao
-----Original Message-----
From: Jennifer Jackson [mailto:jen@bx.psu.edu]
Sent: Friday, August 12, 2011 11:54 AM
To: Peng, Tao; galaxy-bugs(a)bx.psu.edu
Subject: Re: [galaxy-bugs] loading data
Hello,
Glad to hear that you were able to load your data.
When logged in, FTP loaded data will initially be in the FTP upload area
under the "Get Data -> Upload" tool (in the center pane). From here,
load data into the history that you wish to work with. If you are not
sure where this is exactly, please note the graphics in the FTP wiki and
screencast.
As a reminder, please send all new and followup questions with a to or
cc to the mailing list, not to individual team members. This is
important for our team to be able to track and answer questions. We
would appreciate your helping out with this going forward.
Hopefully this helps.
Jen
Galaxy team
On 8/12/11 11:39 AM, Peng, Tao wrote:
> Hi jen, I just finished FTP of R4 data set with 8.4 GB. Why can't I see
> the data on the right panel of galaxy after logging in?
>
> Tao
>
> -----Original Message-----
> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
> Sent: Tuesday, August 09, 2011 3:26 PM
> To: Peng, Tao
> Cc: galaxy-bugs(a)bx.psu.edu
> Subject: Re: [galaxy-bugs] loading data
>
> Hello Tao,
>
> It sounds like the load works with an uncompressed file but is failing
> when compressed? Perhaps there is a problem with the compression itself.
>
> We can't really help with this part of the process except to note which
> compression types we accept. The help wiki link again is:
> http://wiki.g2.bx.psu.edu/Learn/Upload%20via%20FTP
>
> The final goal would be to have all of the data compressed and then load
> using FTP in a batch. If this takes too long to run, then the next
> option is to run either a local or cloud install. Help can be found at:
> http://getgalaxy.org
> http://galaxyproject.org/Admin/Cloud
>
> Please send all questions related to local or cloud installations to the
> galaxy-dev(a)bx.psu.edu mailing list and not to individual team members or
> to the galaxy-bugs(a)bx.psu.edu mailing list.
>
> Best wishes for your project,
>
> Jen
> Galaxy team
>
> On 8/9/11 2:25 PM, Peng, Tao wrote:
>> If I up-load one file (uncompressed) at one time, I will have 4x8=32
>> files to up-load. Each file takes about 1 hour to load. This is
>> untenable. Any suggestion?
>>
>> Thanks,
>>
>>
>> tao
>>
>> -----Original Message-----
>> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
>> Sent: Tuesday, August 09, 2011 2:04 PM
>> To: Peng, Tao; galaxy-bugs(a)bx.psu.edu
>> Subject: Re: [galaxy-bugs] loading data
>>
>>
>>
>> On 8/9/11 1:55 PM, Peng, Tao wrote:
>>> Hi Jen, if I have 8 fastq.gz files for each of my 4 samples, how should
>>> I load the data for analysis in galaxy?
>>>
>>> Thanks,
>>>
>>> tao
>>>
>>> -----Original Message-----
>>> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
>>> Sent: Tuesday, August 09, 2011 1:31 PM
>>> To: Peng, Tao
>>> Cc: galaxy-bugs(a)bx.psu.edu
>>> Subject: Re: [galaxy-bugs] loading data
>>>
>>> Hello Tao,
>>>
>>> For your other question sent to me directly, Galaxy will accept only one
>>> file per archive, so sending a multi-file .gz will result in only the
>>> first file in the archive loaded into the "Get Data -> Upload" FTP area.
>>>
>>> Given the current issues, please try a restart. Log out of Galaxy and
>>> FileZilla (and restart your computer if possible). Then begin again,
>>> testing with a single, uncompressed file. You do not have to be logged
>>> into your Galaxy account to load with FileZilla, but you will need to be
>>> logged in to access the files on the "Get Data -> Upload" form after
>>> upload is complete.
>>>
>>> If this fails, perhaps try reinstalling FileZilla or even a different
>>> FTP client. This tutorial covers an alternative that can also be used
>>> from the desktop:
>>> http://galaxyproject.org/Learn/Upload%20via%20FTP
>>>
>>> If you do have more questions, please leave galaxy-bugs(a)bx.psu.edu on
>>> the cc list so that our entire team can help contribute to replies. This
>>> may also be a question that you want to ask the larger user community at
>>> galaxy-user(a)bx.psu.edu, as this sounds like an external issue. Another
>>> user may have encountered this problem using a PC and have suggestions.
>>>
>>> Thanks!
>>>
>>> Jen
>>> Galaxy team
>>>
>>> On 8/9/11 1:16 PM, Peng, Tao wrote:
>>>> Hi jen, I attached a screen shot of FileZilla. Do you know why the top
>>>> panel says "disconnected from server" while the bottom panel indicates
>>>> it is transferring the data?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> tao
>>>>
>>>> -----Original Message-----
>>>> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
>>>> Sent: Monday, August 08, 2011 4:55 PM
>>>> To: Peng, Tao
>>>> Cc: galaxy-bugs(a)bx.psu.edu
>>>> Subject: Re: [galaxy-bugs] loading data
>>>>
>>>> Hello Tao,
>>>>
>>>> The loading times do seem to be long for the size of the files, perhaps
>>>> the internet connection is the problem. That said, FileZilla can restart
>>>> an FTP if it is interrupted and a message would be reported. I didn't
>>>> see any restarts reported in the log in the screenshot you sent.
>>>>
>>>> The compression is another place to look for a problem. You might try to
>>>> use an alternate compression or no compression and load that way. Try
>>>> one file, through FileZilla, as a test to see if that performs better.
>>>> Winzip can compress in a few different formats, .bz2 is also accepted by
>>>> Galaxy.
>>>>
>>>> Hopefully one of these will work. Please keep galaxy-bugs in the cc for
>>>> any follow-up so that our team can help contribute to replies,
>>>>
>>>> Thanks,
>>>>
>>>> Jen
>>>>
>>>> On 8/8/11 4:04 PM, Peng, Tao wrote:
>>>>> Hi jen, when I FTP 9 of the fastq.gz files (each about 350 MB) for
>>>>> a sample, the FTP program (FileZilla) keeps copying the files again and
>>>>> again, which is strange to me. Do you know what went wrong here?
>>>>>
>>>>> I attached a powerpoint slide that is the screen shot of FileZilla
>>>>> (R1_002.fasq.gz has been copied 3x so far??)
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> tao
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jennifer Jackson [mailto:jen@bx.psu.edu]
>>>>> Sent: Friday, July 29, 2011 1:20 PM
>>>>> To: Peng, Tao
>>>>> Cc: galaxy-bugs(a)bx.psu.edu
>>>>> Subject: Re: [galaxy-bugs] loading data
>>>>>
>>>>> Hello Tao,
>>>>>
>>>>> Our apologies, the public Galaxy instance at http://usegalaxy.org has
>>>>> been experiencing higher than usual usage the last few days. This should
>>>>> now be improved. Please try your FTP load again - it is the best way to
>>>>> load files and the sizes you mention are well within the accepted
>>>>> limits.
>>>>>
>>>>> Thank you for your patience,
>>>>>
>>>>> Best,
>>>>>
>>>>> Jen
>>>>> Galaxy team
>>>>>
>>>>>
>>>>> On 7/29/11 9:47 AM, Peng, Tao wrote:
>>>>>> Hi I work at FHCRC in Seattle. I am trying to load the FASTQ data from
>>>>>> Illumina HiSeq. I have 4 samples with 2.4 GB of seq data per sample. For
>>>>>> the overnight FTP, I can't even finish loading half of one sample. Is
>>>>>> there any faster way to load the data? Can GALAXY do the analysis for
>>>>>> this type of large data?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> tao
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
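The thread above notes that Galaxy accepts only one file per archive. As a rough illustration (not part of the original exchange), the Python sketch below compresses each FASTQ file into its own .gz archive before an FTP batch upload. The function name gzip_each and the output directory layout are assumptions made for this example.

```python
import gzip
import shutil
from pathlib import Path


def gzip_each(fastq_paths, out_dir):
    """Compress every input file into its own single-file .gz archive."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for src in map(Path, fastq_paths):
        dest = out_dir / (src.name + ".gz")
        # One source file per archive: open a fresh gzip stream for each input.
        with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        archives.append(dest)
    return archives
```

Each resulting archive can then be queued in the FTP client as a separate upload.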
9 years, 3 months
Chip-Seq, Encode Peaks and Galaxy
by Radhouane Aniba
Hi everyone,
I have a list of genomic regions with some variants and would like to study
the correlation between these variants and epigenomic marks such as
histone modifications.
From the ENCODE download page, I got some files corresponding to peaks of these
histone modifications and would like to know if there is a way to create a
pipeline using Galaxy to map my variants, depending on genomic regions, to
the information I have from the histone modification peaks.
Is there someone who can point me to a step-by-step guide to start
using Galaxy?
Thank you
Rad
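The operation described above, matching variants to the peaks that cover them, can be sketched in plain Python. This is only an illustration of the interval logic, not a Galaxy pipeline. It assumes the peak files are BED-like (chromosome, 0-based start, exclusive end, optional name) and that variant positions are 0-based; the function names load_peaks and peaks_at are made up for the example.

```python
from collections import defaultdict


def load_peaks(bed_lines):
    """Group BED-style peak intervals (0-based, half-open) by chromosome."""
    peaks = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Fall back to a coordinate label when the peak has no name column.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        peaks[chrom].append((start, end, name))
    return peaks


def peaks_at(peaks, chrom, pos):
    """Return the names of all peaks covering a 0-based variant position."""
    return [name for start, end, name in peaks.get(chrom, []) if start <= pos < end]
```

The linear scan per lookup is fine for small lists; for genome-wide data a sorted or indexed interval structure would be a better fit.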
9 years, 4 months
user specific access/options
by Petr Novak
Hi everybody,
I am developing an application on a Galaxy server. One of the requirements is
to create a user-specific list of options. Is it possible to somehow access
$__user_email__ in the <options> tag or in <conditional>? I did not find
documentation on how to use Cheetah in Galaxy tool XML files, but judging from
the files provided with Galaxy, Cheetah is used only in the <command> and
<config> tags. Is that right? If it could be used in any part of the XML
definition file, it would be much easier to generate XML dynamically based on
$__user_email__.
Does anybody have an idea how to handle this problem?
Petr Novak
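One direction that may be worth checking is building the list in Python rather than in Cheetah. Galaxy select parameters have supported a dynamic_options attribute that calls a function from a tool code file; the sketch below assumes such a function should return (label, value, selected) tuples. How the user's email reaches the function is not shown and would need to be verified against the Galaxy version in use. USER_OPTIONS, DEFAULT_OPTIONS and options_for_user are hypothetical names with made-up data.

```python
# Hypothetical mapping from user email to the choices that user may see.
USER_OPTIONS = {
    "alice@example.org": ["genome_A", "genome_B"],
    "bob@example.org": ["genome_C"],
}
# Shown to any user without an entry above (also an assumption).
DEFAULT_OPTIONS = ["public_genome"]


def options_for_user(user_email):
    """Return (label, value, selected) tuples for one user's select list."""
    values = USER_OPTIONS.get(user_email, DEFAULT_OPTIONS)
    # Mark the first entry as selected so the list always has a default.
    return [(value, value, index == 0) for index, value in enumerate(values)]
```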
9 years, 4 months
RNAseq for Solid data
by Di Kim Nguyen
Hi,
I need help crunching RNA-seq data produced on the SOLiD platform. I have FASTQ
files. Could you please give me the steps necessary to get FPKM values
from Cufflinks? This is mouse data.
Thank you very much, please accept my appreciation.
Di Nguyen
U of Washington
9 years, 4 months
wiggle file
by Richard Mark White
Hi,
this should be simple but it is not..forgive the newbie question.
i am doing chip-seq. bowtie>sam filter for mapped reads>MACS.
i want to create a wiggle file that displays in ucsc, but when i choose the
"WIG" option on macs, and then try to show it in UCSC, it treats each line of
the created WIG file as a separate track, and obviously does not show it as a
graph.
is there a wiki page somewhere that can give me the basics? or can someone
point me in the right direction?
thanks.
rich
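A quick way to narrow down a problem like the one above is to check how the WIG file is structured. A wiggle file normally has a "track type=wiggle_0" line per track, followed by variableStep or fixedStep declarations and their data lines. The Python sketch below only counts these line types; it does not identify the actual cause, and the function name summarize_wig is made up for the example.

```python
from collections import Counter


def summarize_wig(lines):
    """Count the kinds of lines found in a wiggle (WIG) file."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        first = text.split()[0]
        if first in ("track", "browser", "variableStep", "fixedStep"):
            # Header and declaration lines are counted by keyword.
            counts[first] += 1
        else:
            # Everything else is treated as a data line.
            counts["data"] += 1
    return counts
```

A file with no track line, or with a track line repeated throughout, would be a first thing to look at before loading it in UCSC.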
9 years, 4 months
why the result files are empty ? and is java applet supported by galaxy system?
by liyanhui607
Hi All,
We want to add a program, “PhyML 3.0”, to the Galaxy system. This phylogenetics software is distributed at http://www.atgc-montpellier.fr/phyml/binaries.php. It runs normally in a Linux environment, taking a file named, for example, “sample” as input and producing two automatically named output files, “sample.phy_phyml_stats.txt” and “sample.phy_phyml_tree.txt”.
However, after adding it to Galaxy, we found that it produces the correct result files in the directory “/var/we/galaxy-dist/database/files/000”, but they are not returned to the web page, where the datasets are empty. Can anyone tell me what is wrong?
Another question: are Java applets supported by Galaxy?
Thank you very much!
Best Wishes!
Yan-Hui Li
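One possible cause, not confirmed for this setup, is that the program writes to its own automatically chosen file names while Galaxy reads the output paths it handed to the tool, which then stay empty. A wrapper script can move the results into place after the run. The Python sketch below shows only that move step. It assumes the input file is named like "sample.phy", so that the outputs are the input name plus the two suffixes quoted in the message; collect_outputs and its arguments are hypothetical names.

```python
import shutil
from pathlib import Path

# Suffixes assumed to be appended to the input file name (e.g. sample.phy).
STATS_SUFFIX = "_phyml_stats.txt"
TREE_SUFFIX = "_phyml_tree.txt"


def collect_outputs(input_path, stats_dest, tree_dest):
    """Move the auto-named result files to the paths the caller expects."""
    input_path = Path(input_path)
    moved = {}
    for suffix, dest in ((STATS_SUFFIX, stats_dest), (TREE_SUFFIX, tree_dest)):
        produced = input_path.with_name(input_path.name + suffix)
        if not produced.exists():
            # Fail loudly so a missing result is not mistaken for an empty one.
            raise FileNotFoundError(f"expected output not found: {produced}")
        shutil.move(str(produced), str(dest))
        moved[suffix] = Path(dest)
    return moved
```

In a wrapper, stats_dest and tree_dest would be the output dataset paths that the tool definition passes on the command line.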
9 years, 4 months
Flagstat crashes in non-linear workflows with TORQUE
by Andrew Warren
>Sorry this is old but I tried recompiling torque after setting the
>NCONNECTS to 20 and the issue's still there.
>
>But there's more: It doesn't affect only flagstat but other non-linear
>workflows. One of the two jobs that are submitted when their "father"
>stopped running triggers the same error:
>PBS error 15033: No free connections
>
>And the best part is that it works fine sometimes. But when it crashes,
>it's always the same job that crashes.
>
>Does anyone have a clue?
>
>Cheers,
>L-A
I was getting the same behavior as you on asynchronous workflows on a
multicore computer that is acting as both head and compute node for the
torque system. Even after recompiling with a higher NCONNECTS I was getting
the same error. I suspect that this is due to galaxy opening up multiple
connections to check the status of currently running jobs. Because there can
be many status checks in an asynchronous workflow the pbs system is randomly
busy depending on when the job submission comes in. To deal with this I
modified the lib/galaxy/jobs/runners/pbs.py script to make multiple attempts
at submitting in the following way:
@@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+        ##Modified to give ten tries for qsubbing a job
+        num_try=0
+        while(not job_id and num_try<10):
+            job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+            num_try+=1
+
         pbs.pbs_disconnect(c)
 
         # check to see if it submitted
I haven't had any problems since.
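The retry idea in the patch above can be written as a small standalone helper. The Python sketch below is a variation, not the patch itself: it counts total attempts rather than retries after a first call, and it pauses between attempts, which the patch does not do. The submit argument is a hypothetical zero-argument callable standing in for the pbs.pbs_submit(...) call, and the default delay is an arbitrary choice.

```python
import time


def submit_with_retry(submit, max_tries=10, delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a truthy job id or the tries run out."""
    job_id = None
    for attempt in range(1, max_tries + 1):
        job_id = submit()
        if job_id:
            break
        if attempt < max_tries:
            # Give the scheduler time to free a connection before retrying.
            sleep(delay)
    return job_id
```

The caller still has to check the return value, since it stays empty when every attempt fails.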
Cheers,
Andrew
> Louise-Amélie Schmitt wrote:
>> Hello everyone
>>
>> I observed an issue when flagstat is incorporated in a workflow in which
>> the BAM file it works on is also used by another program (generate
>> pileup for instance) and is NOT the input dataset (generated by sam to
>> bam within the workflow).
>>
>> I tested it with the local job runner and with TORQUE (with the pbs
>> scheduler and Maui).
>>
>> - With the local job runner, it works just fine
>>
>> - With TORQUE I get the following error message:
>> pbs_submit failed, PBS error 15033: No free connections
> Hi,
>
> This can most likely be fixed by increasing the value of NCONNECTS in
> the TORQUE source, in src/include/libpbs.h, and recompiling on your
> TORQUE server. I haven't seen a problem after increasing the value to
> 20.
>
> --nate
>
>> Surprisingly, other non-linear workflows work fine. I only observed this
>> error with flagstat. Moreover, when flagstat is in a linear workflow, it
>> works fine too. And if it is non-linear but the input dataset is the bam
>> file flagstat works on, it works fine too.
>>
>> Please find attached one of the test workflows where I found the error.
>> The input dataset is a sam file.
>>
>> Any clue?
>>
>> Cheers,
>> LA
>> {
>>     "a_galaxy_workflow": "true",
>>     "annotation": "to see if it fails if not forked",
>>     "format-version": "0.1",
>>     "name": "test flagstat",
>>     "steps": {
>>         "0": {
>>             "annotation": "",
>>             "id": 0,
>>             "input_connections": {},
>>             "inputs": [
>>                 {
>>                     "description": "",
>>                     "name": "Input Dataset"
>>                 }
>>             ],
>>             "name": "Input dataset",
>>             "outputs": [],
>>             "position": {
>>                 "left": 200,
>>                 "top": 200
>>             },
>>             "tool_errors": null,
>>             "tool_id": null,
>>             "tool_state": "{\"name\": \"Input Dataset\"}",
>>             "tool_version": null,
>>             "type": "data_input",
>>             "user_outputs": []
>>         },
>>         "1": {
>>             "annotation": "",
>>             "id": 1,
>>             "input_connections": {
>>                 "source|input1": {
>>                     "id": 0,
>>                     "output_name": "output"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "SAM-to-BAM",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "bam"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 274.5,
>>                 "top": 307
>>             },
>>             "tool_errors": null,
>>             "tool_id": "sam_to_bam",
>>             "tool_state": "{\"source\": \"{\\\"index_source\\\": \\\"cached\\\", \\\"input1\\\": null, \\\"__current_case__\\\": 0}\", \"__page__\": 0}",
>>             "tool_version": "1.1.1",
>>             "type": "tool",
>>             "user_outputs": []
>>         },
>>         "2": {
>>             "annotation": "",
>>             "id": 2,
>>             "input_connections": {
>>                 "input1": {
>>                     "id": 1,
>>                     "output_name": "output1"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "flagstat",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "txt"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 396.5,
>>                 "top": 445
>>             },
>>             "tool_errors": null,
>>             "tool_id": "samtools_flagstat",
>>             "tool_state": "{\"__page__\": 0, \"input1\": \"null\"}",
>>             "tool_version": "1.0.0",
>>             "type": "tool",
>>             "user_outputs": []
>>         },
>>         "3": {
>>             "annotation": "",
>>             "id": 3,
>>             "input_connections": {
>>                 "refOrHistory|input1": {
>>                     "id": 1,
>>                     "output_name": "output1"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "Generate pileup",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "tabular"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 519,
>>                 "top": 340
>>             },
>>             "tool_errors": null,
>>             "tool_id": "sam_pileup",
>>             "tool_state": "{\"__page__\": 0, \"c\": \"{\\\"consensus\\\": \\\"no\\\", \\\"__current_case__\\\": 0}\", \"indels\": \"\\\"no\\\"\", \"refOrHistory\": \"{\\\"input1\\\": null, \\\"reference\\\": \\\"indexed\\\", \\\"__current_case__\\\": 0}\", \"lastCol\": \"\\\"no\\\"\", \"mapCap\": \"\\\"60\\\"\"}",
>>             "tool_version": "1.1.1",
>>             "type": "tool",
>>             "user_outputs": []
>>         }
>>     }
>> }
9 years, 4 months
install tools on cluster
by Wei,Xintao
Hi,
I am learning how to install and configure Galaxy on our computer cluster. Many third party tools, for example, bowtie, samtools, R and so on, have to be installed and configured in order to install galaxy. Do I have to convert the third party tools to MPI version in order to run the tools on a computer cluster? Or do I just install the "regular" version of the tools on the "shared" folder of our cluster?
Thank you!
Xintao
9 years, 4 months