Assembly
| XML | Chado Base Table | Chado Column |
|---|---|---|
| AssemblyName | analysis | name |
| AssemblyDescription | analysis | description |
|  | analysis | program |
|  | analysis | programversion |
| N/A | analysis | algorithm |
| BioSampleAccn | analysis | sourcename |
| N/A | analysis | sourceversion |
| N/A | analysis | sourceuri |
| SubmissionDate | analysis | timeexecuted |
| Stats | analysisprop | type/value |
| FtpSites | analysisprop | type/value |
| SpeciesTaxid | organism | chado.analysis_organism |
| RsUid | dbxref | chado.analysis_dbxref |
| GbUid | dbxref | chado.analysis_dbxref |
| WGS RS_BioProjects/GB_BioProjects | project | (Chado table doesn't exist yet) |
| BioSampleID | biomaterial | (Chado table doesn't exist yet) |
| RefSeq_category | analysisprop | rdfs:type (sets the analysis type) |
Note that the program and program version are not found directly in the XML, which is why their XML cells are blank. Instead, they are extracted from the FTP attribute.
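The direct tag-to-column rows of the mapping can be sketched in a few lines of Python. This is only an illustration, not the module's actual code: the function name `build_analysis_record`, the flat `DocumentSummary` wrapper, and every value in the sample XML are assumptions made for the example. Only the tag and column names come from the table.

```python
import xml.etree.ElementTree as ET

# XML tag -> chado.analysis column, copied from the mapping table.
# Rows that need extra logic (props, dbxrefs, organism) are left out.
ANALYSIS_COLUMNS = {
    "AssemblyName": "name",
    "AssemblyDescription": "description",
    "BioSampleAccn": "sourcename",
    "SubmissionDate": "timeexecuted",
}


def build_analysis_record(xml_text: str) -> dict:
    """Collect analysis column values from one assembly summary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    record = {}
    for tag, column in ANALYSIS_COLUMNS.items():
        element = root.find(tag)  # direct children of the root only
        if element is not None and element.text:
            record[column] = element.text.strip()
    return record


# Invented sample record; the wrapper element and all values are placeholders.
SAMPLE_XML = """
<DocumentSummary>
  <AssemblyName>ExampleAsm_v1</AssemblyName>
  <AssemblyDescription>Example assembly description</AssemblyDescription>
  <BioSampleAccn>SAMN00000000</BioSampleAccn>
  <SubmissionDate>2020/01/01 00:00</SubmissionDate>
</DocumentSummary>
"""

if __name__ == "__main__":
    print(build_analysis_record(SAMPLE_XML))
```

Tags that are missing or empty in the XML are simply skipped, so the resulting record only contains columns for which a value was found.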
Analysis type
The analysis table has no type_id column. The type is therefore set with the rdfs:type property.
The RefSeq_category tag is used to determine the analysis type. Currently, only the value representative genome is supported and mapped to the bundle Genome Assembly (operation:0525), via the type value ‘genome_assembly’. We have thus far come across no other values for this key in the database.
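As a rough sketch of the rule described above, the type lookup could look like the following. The function name `analysis_type_property` and the dictionary layout are hypothetical, and returning `None` for unsupported values is an assumption about how such values might be handled, not a description of the module's behavior.

```python
from typing import Optional

# RefSeq_category value -> analysis type value, as described in the text.
# Only one value is supported at present.
SUPPORTED_CATEGORIES = {
    "representative genome": "genome_assembly",
}


def analysis_type_property(refseq_category: str) -> Optional[dict]:
    """Return an analysisprop-style row for the analysis type, or None.

    Hypothetical helper: unsupported categories yield None (an assumption).
    """
    type_value = SUPPORTED_CATEGORIES.get(refseq_category.strip())
    if type_value is None:
        return None
    # The type is stored as a property because analysis has no type_id column.
    return {"type": "rdfs:type", "value": type_value}
```

Keeping the supported values in a single dictionary means a newly encountered category can be handled by adding one entry.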
Is an assembly a Chado analysis or project?
This is still an open question. This module maps NCBI Assemblies into chado.analysis, but it may split the NCBI assembly record into an analysis and a project in the future. This is because a Chado analysis is currently defined as a single program run, whereas an assembly is typically produced by many programs run in a pipeline.
Undecided mappings
We have not yet decided how to map analyses to biomaterials in Chado, so BioSamples listed in Assembly records are currently ignored.