Accessing XML Files As Variables In A PSQL Script (Sourced From A BASH Script)
Importing Metabolome Data From XML To A PostgreSQL Database and Tables
-
Update (2019-05-14): see my comments, below, on the use of temp tables for parsing XML input!
-
Additional detail in my 2019-05-14 blog post, Exporting Plain Text to PostgreSQL.
I asked this question Accessing external XML files as variables in a PSQL script (sourced from a bash script) on StackOverflow, and also posted my own answer.
Here is an extended, complete version of that issue. I additionally detail the scraping of the XML data from HMDB.ca metabolome files into PostgreSQL tables.
PRELIMINARIES
My data and file locations are:
[victoria@victoria hmdb]$ pwd
/mnt/Vancouver/Programming/data/hmdb
[victoria@victoria hmdb]$ tree -L 2 -F
.
├── bash/
│ ├── hmdb.sh*
├── data/
│ ├── hmdb_metabolites_5000-01.xml
│ ├── hmdb_metabolites_5000-02.xml
│ └── hmdb_metabolites_5000-03.xml
└── sql/
└── hmdb.sql
This next bit may seem complicated but really it’s quite simple.
My PostgreSQL data_directory (PSQL: SHOW data_directory;
] is
/mnt/Vancouver/Programming/RDB/postgres/postgres/data
I added a symbolic link, such that
/mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb
points to
/mnt/Vancouver/Programming/data/hmdb/
so that I can work with data outside the Postgres data directory without having to move my data there and chown
it to postgres:postgres
.
[Within that symlinked directory I can work with my files with the normal file ownership (victoria:victoria
), which makes it much easier (IMHO) to work with those data.]
Thus, relative to the Postgres data directory,
/mnt/Vancouver/Programming/RDB/postgres/postgres/data/
the path to
hmdb_metabolites_5000-01.xml
is
hmdb/data/hmdb_metabolites_5000-01.xml
and I need to prefix those XML file names in the BASH script with hmdb/data/
, which resolves the path from the Postgres data directory through my local (symlinked) directory to the XML data file(s).
[Otherwise, you can just move your data files to your PostgreSQL data directory, chown
it to postgres:postgres
, and adjust the paths above to point directly to the files.]
hmdb.sh
#!/bin/bash # vim: syntax=sh # /mnt/Vancouver/Programming/data/hmdb/bash/hmdb.sh # "/mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb" # symlinks to "/mnt/Vancouver/Programming/data/hmdb/" # so here you have to refer to # "/mnt/Vancouver/Programming/data/hmdb/data/hmdb_metabolites_5000-01.xml" # as "hmdb/data/hmdb_metabolites_5000-01.xml" # (Relative to the Postgres data directory # "/mnt/Vancouver/Programming/RDB/postgres/postgres/data/", the path to # "hmdb_metabolites_5000-01.xml" is "hmdb/data/hmdb_metabolites_5000-01.xml".) DATA_DIR=../data/ PGOPTIONS="-c client_min_messages=error" i=0 for file in $DATA_DIR/*.xml do if [ $i == 0 ]; then # printf "Hello!\n" psql postgres -c "select pg_terminate_backend(pid) from pg_stat_activity where datname='hmdb';" psql postgres -c "DROP DATABASE hmdb;" psql postgres -c "CREATE DATABASE hmdb;" fi ((++i)) # bash_var=hmdb/$(echo ${file##*/}) bash_var=hmdb/data/$(echo ${file##*/}) echo $bash_var psql -d hmdb -v bash_var=$bash_var -f ../sql/hmdb.sql done psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers ORDER BY accession DESC LIMIT 5;" psql hmdb -c "SELECT count(id) from hmdb_identifiers;" psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms ORDER BY accession DESC LIMIT 5;" psql hmdb -c "SELECT count(id) from hmdb_synonyms;" # psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers WHERE accession='hmdb0000208';" # psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms WHERE accession='hmdb0000208';" # psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers WHERE accession ILIKE 'HMDB0000208';" # psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms WHERE accession ILIKE 'HMDB0000208';" psql hmdb -c "SELECT tableoid::regclass::text AS tablename, synonym from hmdb_synonyms ORDER BY length(synonym) DESC LIMIT 5;" # Enable this as a audible alert for completion of long-running scripts: for i in 1 2 3; do { aplay /mnt/Vancouver/Programming/scripts/PHASER.WAV ; sleep 0.25 ; } &> /dev/null; done
BASH script notes:
-
select pg_terminate_backend(pid) from pg_stat_activity where datname='hmdb';
disconnects the PSQL user (if connected) from the databasehmdb
, allowing the script to drop and (re)create that database, de novo. -
$(echo ${file##*/})
returns the filename including extension (basename). Note the addedhmdb/data/
prefix bit. -
I also tried
$i -eq 0
, but this did not work (here) -
((++i))
incrementsi
by one, in bash (more here) -
psql -v <var>
: variable assignment (see manpage:man psql | grep \\-d
)
hmdb.sql
\c hmdb CREATE TABLE hmdb_identifiers ( id SERIAL, accession VARCHAR(15) NOT NULL, name VARCHAR(300) NOT NULL, cas_number VARCHAR(12), pubchem_cid INT, PRIMARY KEY (id), UNIQUE (accession) ); CREATE TABLE hmdb_synonyms ( id SERIAL, accession VARCHAR(15) NOT NULL, synonym VARCHAR(300) NOT NULL, PRIMARY KEY (id), FOREIGN KEY (accession) REFERENCES hmdb_identifiers (accession) ON UPDATE CASCADE ON DELETE CASCADE, UNIQUE (accession, synonym) ); -- \echo '\n[hmdb.sql] bash_var:' :bash_var '\n' -- I tried doing this with one TEMP TABLE but I got duplicate values for otherwise -- non-duplicate data (accession; cas_number; ...), due to the nature of unnesting -- the XML data viz-a-viz the `synonyms` field. In doing so, with those data xpath -- generated duplicate entries for accession, ... and consequently I was getting -- `duplicate key value violates unique constraint 'hmdb_identifiers_accession_key'` -- errors with the SQL INSERT statements. -- The cost is those data (XML files) need to be read/parsed for each TEMP TABLE. -- ---------------------------------------------------------------------------- UPDATE (2019-05-14): It should be possible to do the PostgreSQL inserts directly, WITHOUT the use of a TEMP TABLE -- see my 2019-05-14 blog post for an example. In that experiment (Claws Mail processing), the use of temp tables resulted in ENORMOUS files and extraordinarily long run times. Reading over the (1000's) of XML files and parsing/inserting the data DIRECTLY into PostgreSQL ran on the order of 600 faster with ~14,000x less space!! -- ---------------------------------------------------------------------------- CREATE TEMP TABLE tmp_table AS SELECT (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession ,(xpath('//name/text()', x))[1]::text::varchar(300) AS name ,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number ,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid -- -- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x ; CREATE TEMP TABLE tmp_table2 AS SELECT (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession ,unnest(xpath('//synonyms/synonym/text()', x))::text::varchar(300) AS synonym FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x ; INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid) SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table; INSERT INTO hmdb_synonyms (accession, synonym) SELECT lower(accession), lower(synonym) FROM tmp_table2 ON CONFLICT DO NOTHING; -- https://stackoverflow.com/questions/38619072/how-to-replace-multiple-special-characters-in-postgres-9-5 UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'&gt;','>'); UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'&lt;','<'); -- TEST ~ MANUAL UPDATES: INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid) values (lower('HMDB0240268'), lower('Farnesyl acetate'), '4128-17-0', NULL); INSERT INTO hmdb_synonyms (accession, synonym) values (lower('HMDB0240268'), lower('(2E,6E)-3,7,11-Trimethyldodeca-2,6,10-trien-1-yl acetate')); DROP TABLE tmp_table; DROP TABLE tmp_table2;
SQL script notes:
-
In the xpath statements I recast the
::text
to (for example)::text::varchar(15)
. I initially did this to recast the XML text for the PubChem numeric id to the Postgres integer datatype (::text::int
), and realized it was probably good practice to explicitly recast all the data, per the Postgres table schema. -
More significantly regarding the latter point, if I did NOT recast the datatype in the xpath statement, and the field entry (e.g.
synonym
name length) exceeded the SQLvarchar(300)
length limit, those data threw a PSQL error and the table did not update (i.e. wasblank:
no entries, anywhere!). -
The
UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'>','>');
line updates entries like
[hmdb]# SELECT synonym from hmdb_synonyms where accession='hmdb0006669';
o-6-deoxy-alpha-l-galactopyranosyl-(1->3)-o-[o-6-deoxy-alpha-l-galactopyranosyl-(1->4)-o-
to
o-6-deoxy-alpha-l-galactopyranosyl-(1->3)-o-[o-6-deoxy-alpha-l-galactopyranosyl-(1->4)-o-
-
It appears that on rare occasion duplicate data conflict with one another, raising errors and stopping the script (no table update):
psql:../sql/hmdb.sql:85: ERROR: 23505: duplicate key value violates unique constraint `hmdb_identifiers_accession_key` DETAIL: Key (accession)=(hmdb0240268) already exists. SCHEMA NAME: public TABLE NAME: hmdb_identifiers CONSTRAINT NAME: hmdb_identifiers_accession_key LOCATION: _bt_check_unique, nbtinsert.c:434 psql:../sql/hmdb.sql:87: ERROR: 23505: duplicate key value violates unique constraint `hmdb_synonyms_synonym_key` DETAIL: Key (synonym)=((2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists. SCHEMA NAME: public TABLE NAME: hmdb_synonyms CONSTRAINT NAME: hmdb_synonyms_synonym_key LOCATION: _bt_check_unique, nbtinsert.c:434
The solution was to add the ON CONFLICT DO NOTHING
statement to the INSERT INTO hmdb_synonyms ...
statement.
BASH script execution:
[victoria@victoria bash]$ ./hmdb.sh pg_terminate_backend ---------------------- DROP DATABASE CREATE DATABASE hmdb/data/hmdb_metabolites_5000-01.xml You are now connected to database "hmdb" as user "victoria". CREATE TABLE CREATE TABLE SELECT 1 SELECT 13 INSERT 0 1 INSERT 0 13 UPDATE 13 UPDATE 13 INSERT 0 1 INSERT 0 1 DROP TABLE DROP TABLE hmdb/data/hmdb_metabolites_5000-02.xml You are now connected to database "hmdb" as user "victoria". psql:../sql/hmdb.sql:13: ERROR: 42P07: relation "hmdb_identifiers" already exists LOCATION: heap_create_with_catalog, heap.c:1067 psql:../sql/hmdb.sql:23: ERROR: 42P07: relation "hmdb_synonyms" already exists LOCATION: heap_create_with_catalog, heap.c:1067 SELECT 1 SELECT 12 INSERT 0 1 INSERT 0 12 UPDATE 26 UPDATE 26 psql:../sql/hmdb.sql:85: ERROR: 23505: duplicate key value violates unique constraint "hmdb_identifiers_accession_key" DETAIL: Key (accession)=(hmdb0240268) already exists. SCHEMA NAME: public TABLE NAME: hmdb_identifiers CONSTRAINT NAME: hmdb_identifiers_accession_key LOCATION: _bt_check_unique, nbtinsert.c:434 psql:../sql/hmdb.sql:87: ERROR: 23505: duplicate key value violates unique constraint "hmdb_synonyms_accession_synonym_key" DETAIL: Key (accession, synonym)=(hmdb0240268, (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists. SCHEMA NAME: public TABLE NAME: hmdb_synonyms CONSTRAINT NAME: hmdb_synonyms_accession_synonym_key LOCATION: _bt_check_unique, nbtinsert.c:434 DROP TABLE DROP TABLE hmdb/data/hmdb_metabolites_5000-03.xml You are now connected to database "hmdb" as user "victoria". psql:../sql/hmdb.sql:13: ERROR: 42P07: relation "hmdb_identifiers" already exists LOCATION: heap_create_with_catalog, heap.c:1067 psql:../sql/hmdb.sql:23: ERROR: 42P07: relation "hmdb_synonyms" already exists LOCATION: heap_create_with_catalog, heap.c:1067 SELECT 1 SELECT 43 INSERT 0 1 INSERT 0 43 UPDATE 69 UPDATE 69 psql:../sql/hmdb.sql:85: ERROR: 23505: duplicate key value violates unique constraint "hmdb_identifiers_accession_key" DETAIL: Key (accession)=(hmdb0240268) already exists. SCHEMA NAME: public TABLE NAME: hmdb_identifiers CONSTRAINT NAME: hmdb_identifiers_accession_key LOCATION: _bt_check_unique, nbtinsert.c:434 psql:../sql/hmdb.sql:87: ERROR: 23505: duplicate key value violates unique constraint "hmdb_synonyms_accession_synonym_key" DETAIL: Key (accession, synonym)=(hmdb0240268, (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists. SCHEMA NAME: public TABLE NAME: hmdb_synonyms CONSTRAINT NAME: hmdb_synonyms_accession_synonym_key LOCATION: _bt_check_unique, nbtinsert.c:434 DROP TABLE DROP TABLE id | accession | name | cas_number | pubchem_cid | tablename ----+-------------+--------------------+------------+-------------+------------------ 2 | hmdb0240268 | farnesyl acetate | 4128-17-0 | [NULL] | hmdb_identifiers 5 | hmdb0000005 | 2-ketobutyric acid | 600-18-0 | 58 | hmdb_identifiers 3 | hmdb0000002 | 1,3-diaminopropane | 109-76-2 | 428 | hmdb_identifiers 1 | hmdb0000001 | 1-methylhistidine | 332-80-9 | 92105 | hmdb_identifiers count ------- 4 id | accession | synonym | tablename ----+-------------+----------------------------------------------------------+--------------- 14 | hmdb0240268 | (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate | hmdb_synonyms 46 | hmdb0000005 | α-oxo-n-butyric acid | hmdb_synonyms 45 | hmdb0000005 | α-oxo-n-butyrate | hmdb_synonyms 41 | hmdb0000005 | α-ketobutyric acid | hmdb_synonyms 40 | hmdb0000005 | α-ketobutyrate | hmdb_synonyms count ------- 69 tablename | synonym ---------------+---------------------------------------------------------- hmdb_synonyms | (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid hmdb_synonyms | (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate hmdb_synonyms | (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate hmdb_synonyms | trimethylenediamine dihydrochloride hmdb_synonyms | alpha-ketobutyric acid, sodium salt [victoria@victoria bash]$
PSQL
After running the BASH script, if you were previously connected to the hmdb database and you run a query, you’ll get a warning about being disconnected; e.g.,
[hmdb]# SELECT * from hmdb_identifiers;
FATAL: 57P01: terminating connection due to administrator command
LOCATION: ProcessInterrupts, postgres.c:2884
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
[hmdb]#
You can either:
-
rerun that PSQL query (as you are now reconnected);
-
manually reconnect,
\c hmdb;
then run your query; or -
execute this command (note the preceding semicolon; you will get the error message, but this is a quick one-liner):
;SELECT * from hmdb_identifiers;
PSQL SCRIPT EXECUTION:
In any event, here is the PSQL output, after running the BASH script (note that due to the UTF8 specification in the PSQL script, we get Greek letters properly displayed, e.g. α-ketoglutarate | α-keto-δ-guanidinovalerate | α-oxo-N-butyric acid | ...
):
[hmdb]# \c hmdb; You are now connected to database "hmdb" as user "victoria". [hmdb]# SELECT * from hmdb_identifiers LIMIT 5; id accession name cas_number pubchem_cid -- ----------- ------------------ ---------- ----------- 1 hmdb0000001 1-methylhistidine 332-80-9 92105 2 hmdb0240268 farnesyl acetate 4128-17-0 [NULL] 3 hmdb0000002 1,3-diaminopropane 109-76-2 428 5 hmdb0000005 2-ketobutyric acid 600-18-0 58 [hmdb]# SELECT * from hmdb_synonyms LIMIT 5; id accession synonym -- ----------- -------------------------------------------------------- 1 hmdb0000001 (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid 2 hmdb0000001 1-methylhistidine 3 hmdb0000001 pi-methylhistidine 4 hmdb0000001 (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate 5 hmdb0000001 1 methylhistidine [hmdb]# SELECT synonym from hmdb_synonyms ORDER BY length(synonym) DESC LIMIT 5; synonym -------------------------------------------------------- (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate trimethylenediamine dihydrochloride alpha-ketobutyric acid, sodium salt [hmdb]# SELECT * from hmdb_synonyms ORDER BY accession DESC LIMIT 4; id accession synonym -- ----------- -------------------------------------------------------- 14 hmdb0240268 (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate 29 hmdb0000005 2-oxobutyric acid 28 hmdb0000005 2-ketobutanoic acid 30 hmdb0000005 3-methyl pyruvic acid [hmdb]# DELETE FROM hmdb_identifiers WHERE accession='hmdb0240268'; DELETE 1 [hmdb]# SELECT count(id) from hmdb_synonyms; count ----- 68 [hmdb]# SELECT * from hmdb_synonyms ORDER BY accession DESC LIMIT 3; id accession synonym -- ----------- --------------------- 28 hmdb0000005 2-ketobutanoic acid 29 hmdb0000005 2-oxobutyric acid 30 hmdb0000005 3-methyl pyruvic acid [hmdb]# SELECT * from hmdb_synonyms WHERE synonym ILIKE '%lpha-keto%'; id accession synonym -- ----------- ----------------------------------- 31 hmdb0000005 alpha-ketobutyrate 32 hmdb0000005 alpha-ketobutyric acid 61 hmdb0000005 alpha-keto-n-butyrate 62 hmdb0000005 alpha-keto-n-butyric acid 63 hmdb0000005 alpha-ketobutric acid 70 hmdb0000005 alpha-ketobutyric acid, sodium salt ## DATA INGRESS CHECK: ## =================== [hmdb]# SELECT * from hmdb_synonyms as synonym where accession ILIKE 'hmdb0106899'; id accession synonym ------ ----------- ------------------------------------------------------------------------------------------------------- 923304 hmdb0106899 (2s)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoic acid 923305 hmdb0106899 1-isoheptadecanoyl-2-isononadecanoyl-3-isotetradecanoyl-glycerol 923306 hmdb0106899 tg(i-17:0/i-19:0/i-14:0) 923307 hmdb0106899 triacylglycerol 923308 hmdb0106899 triglyceride 923309 hmdb0106899 tg(50:0) 923310 hmdb0106899 tracylglycerol(50:0) 923311 hmdb0106899 tag(50:0) 923312 hmdb0106899 tracylglycerol(i-17:0/i-19:0/i-14:0) 923313 hmdb0106899 tag(i-17:0/i-19:0/i-14:0) [hmdb]# ## COMPARE THOSE DATA (ABOVE) WITH THE SOURCE DATA FILE DATA (not uploaded): ## hmdb_metabolites-07.xml : [victoria@victoria data]$ dpl Wed May 2 19:46:53 PDT 2018 /mnt/Vancouver/Programming/data/hmdb/data total 3606796 -rw-r--r-- 1 victoria victoria 500008714 Apr 25 15:32 hmdb_metabolites-01.xml -rw-r--r-- 1 victoria victoria 500033500 Apr 25 15:33 hmdb_metabolites-02.xml -rw-r--r-- 1 victoria victoria 500012180 Apr 25 15:34 hmdb_metabolites-03.xml -rw-r--r-- 1 victoria victoria 500013231 Apr 25 15:35 hmdb_metabolites-04.xml -rw-r--r-- 1 victoria victoria 500013021 Apr 25 15:37 hmdb_metabolites-05.xml -rw-r--r-- 1 victoria victoria 500005018 Apr 25 15:38 hmdb_metabolites-06.xml -rw-r--r-- 1 victoria victoria 500020416 Apr 25 15:39 hmdb_metabolites-07.xml -rw-r--r-- 1 victoria victoria 193200978 Apr 25 15:40 hmdb_metabolites-08.xml [victoria@victoria data]$ rg . -i -A25 -e "hmdb0106899" hmdb_metabolites-07.xml 948166: <accession>HMDB0106899</accession> 948167- <status>expected</status> 948168- <secondary_accessions> 948169- </secondary_accessions> 948170- <name>TG(i-17:0/i-19:0/i-14:0)</name> 948171- <description>TG(i-17:0/i-19:0/i-14:0) belongs to the family of triradyglycerols, which are glycerolipids lipids containing a common glycerol backbone to which at least one fatty acyl group is esterified. Their general formula is [R1]OCC(CO[R2])O[R3]. TG(i-17:0/i-19:0/i-14:0) is made up of one 15-methylhexadecanoyl(R1), one 17-methyloctadecanoyl(R2), and one 12-methyltridecanoyl(R3).</description> 948172- <synonyms> 948173- <synonym>(2S)-1-[(15-Methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoic acid</synonym> 948174- <synonym>1-isoheptadecanoyl-2-isononadecanoyl-3-isotetradecanoyl-glycerol</synonym> 948175- <synonym>TG(i-17:0/i-19:0/i-14:0)</synonym> 948176- <synonym>Triacylglycerol</synonym> 948177- <synonym>Triglyceride</synonym> 948178- <synonym>TG(50:0)</synonym> 948179- <synonym>Tracylglycerol(50:0)</synonym> 948180- <synonym>TAG(50:0)</synonym> 948181- <synonym>Tracylglycerol(i-17:0/i-19:0/i-14:0)</synonym> 948182- <synonym>TAG(i-17:0/i-19:0/i-14:0)</synonym> 948183- </synonyms> 948184- <chemical_formula>C53H102O6</chemical_formula> 948185- <average_molecular_weight>835.393</average_molecular_weight> 948186- <monisotopic_molecular_weight>834.767641004</monisotopic_molecular_weight> 948187- <iupac_name>(2S)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoate</iupac_name> 948188- <traditional_iupac>(2S)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoate</traditional_iupac> 948189- <cas_registry_number/> 948190- <smiles>[H][C@](COC(=O)CCCCCCCCCCCCCC(C)C)(COC(=O)CCCCCCCCCCC(C)C)OC(=O)CCCCCCCCCCCCCCCC(C)C</smiles> 948191- <inchi>InChI=1S/C53H102O6/c1-47(2)39-33-27-21-15-11-8-7-9-13-18-26-32-38-44-53(56)59-50(46-58-52(55)43-37-31-25-20-19-23-29-35-41-49(5)6)45-57-51(54)42-36-30-24-17-14-10-12-16-22-28-34-40-48(3)4/h47-50H,7-46H2,1-6H3/t50-/m0/s1</inchi> [victoria@victoria data]$ [hmdb]# \l List of databases Name Owner Encoding Collate Ctype Access privileges ---------- -------- -------- ----------- ----------- --------------------- hmdb victoria UTF8 en_US.UTF-8 en_US.UTF-8 metabolism victoria UTF8 en_US.UTF-8 en_US.UTF-8 ncbi victoria UTF8 en_US.UTF-8 en_US.UTF-8 postgres postgres UTF8 en_US.UTF-8 en_US.UTF-8 template0 postgres UTF8 en_US.UTF-8 en_US.UTF-8 =c/postgres + postgres=CTc/postgres template1 postgres UTF8 en_US.UTF-8 en_US.UTF-8 =c/postgres + postgres=CTc/postgres zzz victoria UTF8 en_US.UTF-8 en_US.UTF-8 [hmdb]# \dt List of relations Schema Name Type Owner ------ ----------------- ----- -------- public hmdb_identifiers table victoria public hmdb_synonyms table victoria [hmdb]# \d hmdb_identifiers Table "public.hmdb_identifiers" Column Type Collation Nullable Default ----------- ---------------------- --------- -------- -------------------------------------------- id integer not null nextval('hmdb_identifiers_id_seq'::regclass) accession character varying(15) not null name character varying(300) not null cas_number character varying(12) pubchem_cid integer Indexes: "hmdb_identifiers_pkey" PRIMARY KEY, btree (id) "hmdb_identifiers_accession_key" UNIQUE CONSTRAINT, btree (accession) Referenced by: TABLE "hmdb_synonyms" CONSTRAINT "hmdb_synonyms_accession_fkey" FOREIGN KEY (accession) REFERENCES hmdb_identifiers(accession) ON UPDATE CASCADE ON DELETE CASCADE [hmdb]# \d hmdb_synonyms Table "public.hmdb_synonyms" Column Type Collation Nullable Default --------- ---------------------- --------- -------- ----------------------------------------- id integer not null nextval('hmdb_synonyms_id_seq'::regclass) accession character varying(15) not null synonym character varying(300) not null Indexes: "hmdb_synonyms_pkey" PRIMARY KEY, btree (id) "hmdb_synonyms_accession_synonym_key" UNIQUE CONSTRAINT, btree (accession, synonym) Foreign-key constraints: "hmdb_synonyms_accession_fkey" FOREIGN KEY (accession) REFERENCES hmdb_identifiers(accession) ON UPDATE CASCADE ON DELETE CASCADE [hmdb]# SELECT * from hmdb_identifiers LIMIT 5; id accession name cas_number pubchem_cid -- ----------- --------------------- ---------- ----------- 1 hmdb0000001 1-methylhistidine 332-80-9 92105 2 hmdb0000002 1,3-diaminopropane 109-76-2 428 3 hmdb0000005 2-ketobutyric acid 600-18-0 58 4 hmdb0000008 2-hydroxybutyric acid 600-15-7 11266 5 hmdb0000010 2-methoxyestrone 362-08-3 440624 [hmdb]# SELECT * from hmdb_synonyms LIMIT 5; id accession synonym ---- ----------- ------------------------ 4117 hmdb0000182 2,6-diaminohexanoic acid 4118 hmdb0000182 6-amino-aminutrin 4119 hmdb0000182 6-amino-l-norleucine 4120 hmdb0000182 a-lysine 4121 hmdb0000182 alpha-lysine [hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i RIGHT JOIN hmdb_synonyms AS s ON i.accession = s.accession LIMIT 50; accession name synonym ----------- ------------------ -------------------------------------------------------- hmdb0000001 1-methylhistidine 1 methylhistidine hmdb0000001 1-methylhistidine 1-methylhistidine hmdb0000001 1-methylhistidine 1-methyl histidine hmdb0000001 1-methylhistidine 1-methyl-histidine hmdb0000001 1-methylhistidine 1-methylhistidine dihydrochloride hmdb0000001 1-methylhistidine 1-methyl-l-histidine hmdb0000001 1-methylhistidine 1-mhis hmdb0000001 1-methylhistidine 1-n-methyl-l-histidine hmdb0000001 1-methylhistidine (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate hmdb0000001 1-methylhistidine (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid hmdb0000001 1-methylhistidine l-1-methylhistidine hmdb0000001 1-methylhistidine n1-methyl-l-histidine hmdb0000001 1-methylhistidine pi-methylhistidine hmdb0000002 1,3-diaminopropane 1,3-diaminepropane hmdb0000002 1,3-diaminopropane 1,3-diamino-n-propane hmdb0000002 1,3-diaminopropane 1,3-propanediamine hmdb0000002 1,3-diaminopropane 1,3-propylenediamine hmdb0000002 1,3-diaminopropane 1,3-trimethylenediamine hmdb0000002 1,3-diaminopropane 3-aminopropylamine hmdb0000002 1,3-diaminopropane a,w-propanediamine hmdb0000002 1,3-diaminopropane propane-1,3-diamine hmdb0000002 1,3-diaminopropane tn hmdb0000002 1,3-diaminopropane trimethylenediamine hmdb0000002 1,3-diaminopropane trimethylenediamine dihydrochloride hmdb0000002 1,3-diaminopropane trimethylenediamine hydrochloride hmdb0000005 2-ketobutyric acid 2-ketobutanoate hmdb0000005 2-ketobutyric acid 2-ketobutanoic acid hmdb0000005 2-ketobutyric acid 2-ketobutyrate hmdb0000005 2-ketobutyric acid 2-oxobutanoate hmdb0000005 2-ketobutyric acid 2-oxo-butanoate hmdb0000005 2-ketobutyric acid 2-oxo-butanoic acid hmdb0000005 2-ketobutyric acid 2-oxobutanoic acid hmdb0000005 2-ketobutyric acid 2-oxobutyrate hmdb0000005 2-ketobutyric acid 2-oxo-butyrate hmdb0000005 2-ketobutyric acid 2-oxo-butyric acid hmdb0000005 2-ketobutyric acid 2-oxobutyric acid hmdb0000005 2-ketobutyric acid 2-oxo-n-butyrate hmdb0000005 2-ketobutyric acid 2-oxo-n-butyric acid hmdb0000005 2-ketobutyric acid 3-methylpyruvate hmdb0000005 2-ketobutyric acid 3-methyl pyruvate hmdb0000005 2-ketobutyric acid 3-methyl pyruvic acid hmdb0000005 2-ketobutyric acid 3-methylpyruvic acid hmdb0000005 2-ketobutyric acid a-ketobutyrate hmdb0000005 2-ketobutyric acid a-ketobutyric acid hmdb0000005 2-ketobutyric acid a-keto-n-butyrate hmdb0000005 2-ketobutyric acid a-keto-n-butyric acid hmdb0000005 2-ketobutyric acid alpha-ketobutric acid hmdb0000005 2-ketobutyric acid alpha-ketobutyrate hmdb0000005 2-ketobutyric acid alpha-ketobutyric acid hmdb0000005 2-ketobutyric acid alpha-ketobutyric acid, sodium salt [hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i RIGHT JOIN hmdb_synonyms AS s ON i.accession = s.accession WHERE s.synonym ILIKE '%ketobut%' LIMIT 25; accession name synonym ----------- ------------------------------- ------------------------------------------------------ hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutanoate hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutanoic acid hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutyrate hmdb0001553 2-oxo-4-methylthiobutanoic acid s-methyl-alpha-ketobutyric acid hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutyric acid hmdb0003407 diacetyl 2,3-diketobutane hmdb0003407 diacetyl diketobutane hmdb0033851 4,4-dimethoxy-2-butanone 3-ketobutyraldehyde dimethyl acetal hmdb0003771 2-methylacetoacetic acid 2-methyl-3-ketobutyric acid hmdb0031204 4-hydroxy-2-oxobutanoic acid g-hydroxy-a-ketobutyric acid hmdb0006454 l-2-amino-3-oxobutanoic acid 2-amino-3-ketobutyric acid hmdb0006454 l-2-amino-3-oxobutanoic acid 2-amino-3-ketobutyrate hmdb0000005 2-ketobutyric acid a-ketobutyrate hmdb0033698 acetoin acetate 2-ketobutan-3-yl acetate hmdb0033851 4,4-dimethoxy-2-butanone 3-ketobutyraldehyde dimethylacetal hmdb0036396 3-methylbutyl 3-oxobutanoate 3-methylbutyl beta-ketobutyrate hmdb0036230 1-methyl-2-oxopropyl butyrate 2-ketobutan-3-yl butanoate hmdb0036396 3-methylbutyl 3-oxobutanoate isoamyl beta-ketobutyrate hmdb0036396 3-methylbutyl 3-oxobutanoate isopentyl beta-ketobutyrate hmdb0038256 geranyl acetoacetate geranyl beta-ketobutyrate hmdb0039324 xi-3-hydroxy-2-oxobutanoic acid 3-hydroxy-alpha-ketobutyrate hmdb0039324 xi-3-hydroxy-2-oxobutanoic acid beta-hydroxy-alpha-ketobutyric acid hmdb0041914 ketophenylbutazone 1,2-diphenyl-4-(gamma-ketobutyl)-3,5-pyrazolidinedione hmdb0041914 ketophenylbutazone 1,2-diphenyl-4-(g-ketobutyl)-3,5-pyrazolidinedione hmdb0041914 ketophenylbutazone 1,2-diphenyl-4-(γ-ketobutyl)-3,5-pyrazolidinedione [hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i JOIN hmdb_synonyms AS s ON i.accession = s.accession WHERE s.synonym ILIKE '%α-keto%' LIMIT 25; accession name synonym ----------- -------------------------- ------------------------------ hmdb0000005 2-ketobutyric acid α-ketobutyrate hmdb0000005 2-ketobutyric acid α-ketobutyric acid hmdb0000019 alpha-ketoisovaleric acid α-ketoisovalerate hmdb0000019 alpha-ketoisovaleric acid α-ketoisovaleric acid hmdb0000019 alpha-ketoisovaleric acid α-keto-isovalerate hmdb0000019 alpha-ketoisovaleric acid α-keto-isovaleric acid hmdb0000019 alpha-ketoisovaleric acid α-ketovaline hmdb0000119 glyoxylic acid α-ketoacetate hmdb0000119 glyoxylic acid α-ketoacetic acid hmdb0000695 ketoleucine α-ketoisocaproate hmdb0000205 phenylpyruvic acid α-ketohydrocinnamate hmdb0000205 phenylpyruvic acid α-ketohydrocinnamic acid hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methyl-n-valerate hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methyl-n-valeric acid hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methylvalerate hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methylvaleric acid hmdb0000208 oxoglutaric acid α-ketoglutarate hmdb0000208 oxoglutaric acid α-ketoglutaric acid hmdb0000243 pyruvic acid α-ketopropionate hmdb0000243 pyruvic acid α-ketopropionic acid hmdb0000695 ketoleucine α-ketoisocaproic acid hmdb0031522 mesoxalic acid α-ketomalonate hmdb0001587 phenylglyoxylic acid α-ketophenylacetate hmdb0001587 phenylglyoxylic acid α-ketophenylacetic acid hmdb0004225 2-oxoarginine α-keto-δ-guanidinovalerate [hmdb]#
DATA:
I uploaded the XML data files used in this answer at this Gist
Direct links: