1. Update  (2019-05-14): see my comments, below, on the use of temp tables for parsing XML input!

  2. Additional detail in my 2019-05-14 blog post, Exporting Plain Text to PostgreSQL.


I asked this question Accessing external XML files as variables in a PSQL script (sourced from a bash script) on StackOverflow, and also posted my own answer.

Here is an extended, complete version of that issue. I additionally detail the scraping of the XML data from HMDB.ca metabolome files into PostgreSQL tables.

PRELIMINARIES

My data and file locations are:

[victoria@victoria hmdb]$ pwd
  /mnt/Vancouver/Programming/data/hmdb

[victoria@victoria hmdb]$ tree -L 2 -F
  .
  ├── bash/
  │   ├── hmdb.sh*
  ├── data/
  │   ├── hmdb_metabolites_5000-01.xml
  │   ├── hmdb_metabolites_5000-02.xml
  │   └── hmdb_metabolites_5000-03.xml
  └── sql/
    └── hmdb.sql

This next bit may seem complicated but really it’s quite simple.

My PostgreSQL data_directory (PSQL: SHOW data_directory;] is

/mnt/Vancouver/Programming/RDB/postgres/postgres/data

I added a symbolic link, such that

/mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb

points to

/mnt/Vancouver/Programming/data/hmdb/

so that I can work with data outside the Postgres data directory without having to move my data there and chown it to postgres:postgres.

[Within that symlinked directory I can work with my files with the normal file ownership (victoria:victoria), which makes it much easier (IMHO) to work with those data.]

Thus, relative to the Postgres data directory,

/mnt/Vancouver/Programming/RDB/postgres/postgres/data/

the path to

hmdb_metabolites_5000-01.xml

is

hmdb/data/hmdb_metabolites_5000-01.xml

and I need to prefix those XML file names in the BASH script with hmdb/data/, which resolves the path from the Postgres data directory through my local (symlinked) directory to the XML data file(s).

[Otherwise, you can just move your data files to your PostgreSQL data directory, chown it to postgres:postgres, and adjust the paths above to point directly to the files.]


hmdb.sh

#!/bin/bash
# vim: syntax=sh

# /mnt/Vancouver/Programming/data/hmdb/bash/hmdb.sh

# "/mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb"
# symlinks to "/mnt/Vancouver/Programming/data/hmdb/"
# so here you have to refer to
# "/mnt/Vancouver/Programming/data/hmdb/data/hmdb_metabolites_5000-01.xml"
# as "hmdb/data/hmdb_metabolites_5000-01.xml"
# (Relative to the Postgres data directory
# "/mnt/Vancouver/Programming/RDB/postgres/postgres/data/", the path to
#  "hmdb_metabolites_5000-01.xml" is "hmdb/data/hmdb_metabolites_5000-01.xml".) 

DATA_DIR=../data/
PGOPTIONS="-c client_min_messages=error" 
i=0

for file in $DATA_DIR/*.xml
  do
    if [ $i == 0 ]; then
      # printf "Hello!\n"
      psql postgres -c "select pg_terminate_backend(pid) from pg_stat_activity where datname='hmdb';"
      psql postgres -c "DROP DATABASE hmdb;"
      psql postgres -c "CREATE DATABASE hmdb;"
    fi
    ((++i))
    # bash_var=hmdb/$(echo ${file##*/})
    bash_var=hmdb/data/$(echo ${file##*/})
    echo $bash_var
    psql -d hmdb -v bash_var=$bash_var -f ../sql/hmdb.sql
done

psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers ORDER BY accession DESC LIMIT 5;"
psql hmdb -c "SELECT count(id) from hmdb_identifiers;"

psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms ORDER BY accession DESC LIMIT 5;"
psql hmdb -c "SELECT count(id) from hmdb_synonyms;"

# psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers WHERE accession='hmdb0000208';"
# psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms WHERE accession='hmdb0000208';"

# psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_identifiers WHERE accession ILIKE 'HMDB0000208';"
# psql hmdb -c "SELECT *, tableoid::regclass::text AS tablename FROM hmdb_synonyms WHERE accession ILIKE 'HMDB0000208';"

psql hmdb -c "SELECT tableoid::regclass::text AS tablename, synonym from hmdb_synonyms ORDER BY length(synonym) DESC LIMIT 5;"

# Enable this as a audible alert for completion of long-running scripts:
for i in 1 2 3; do { aplay /mnt/Vancouver/Programming/scripts/PHASER.WAV ; sleep 0.25 ; } &> /dev/null; done

BASH script notes:

  • select pg_terminate_backend(pid) from pg_stat_activity where datname='hmdb'; disconnects the PSQL user (if connected) from the database hmdb, allowing the script to drop and (re)create that database, de novo.

  • $(echo ${file##*/}) returns the filename including extension (basename). Note the added hmdb/data/ prefix bit.

  • I also tried $i -eq 0, but this did not work (here)

  • ((++i)) increments i by one, in bash (more here)

  • psql -v <var>: variable assignment (see manpage: man psql | grep \\-d)

  • phaser.wav

hmdb.sql

\c hmdb

CREATE TABLE hmdb_identifiers (
  id SERIAL,
  accession VARCHAR(15) NOT NULL,
  name VARCHAR(300) NOT NULL,
  cas_number VARCHAR(12),
  pubchem_cid INT,
  PRIMARY KEY (id),
  UNIQUE (accession)
);

CREATE TABLE hmdb_synonyms (
  id SERIAL,
  accession VARCHAR(15) NOT NULL,
  synonym VARCHAR(300) NOT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (accession) REFERENCES hmdb_identifiers (accession)
    ON UPDATE CASCADE ON DELETE CASCADE,
  UNIQUE (accession, synonym)
);

-- \echo '\n[hmdb.sql] bash_var:' :bash_var '\n'

-- I tried doing this with one TEMP TABLE but I got duplicate values for otherwise
-- non-duplicate data (accession; cas_number; ...), due to the nature of unnesting
-- the XML data viz-a-viz the `synonyms` field. In doing so, with those data xpath
-- generated duplicate entries for accession, ... and consequently I was getting
-- `duplicate key value violates unique constraint 'hmdb_identifiers_accession_key'`
-- errors with the SQL INSERT statements.
-- The cost is those data (XML files) need to be read/parsed for each TEMP TABLE.


-- ----------------------------------------------------------------------------
UPDATE (2019-05-14): It should be possible to do the PostgreSQL inserts directly,
WITHOUT the use of a TEMP TABLE -- see my 2019-05-14 blog post for an example.  In
that experiment (Claws Mail processing), the use of temp tables resulted in ENORMOUS
files and extraordinarily long run times.  Reading over the (1000's) of XML files
and parsing/inserting the data DIRECTLY into PostgreSQL ran on the order of 600
faster with ~14,000x less space!!
-- ----------------------------------------------------------------------------

CREATE TEMP TABLE tmp_table AS 
SELECT 
  (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
  ,(xpath('//name/text()', x))[1]::text::varchar(300) AS name 
  ,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number 
  ,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid 
-- -- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;

CREATE TEMP TABLE tmp_table2 AS 
SELECT 
  (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
  ,unnest(xpath('//synonyms/synonym/text()', x))::text::varchar(300) AS synonym
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;

INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
  SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table;

INSERT INTO hmdb_synonyms (accession, synonym)
  SELECT lower(accession), lower(synonym) FROM tmp_table2
  ON CONFLICT DO NOTHING;

-- https://stackoverflow.com/questions/38619072/how-to-replace-multiple-special-characters-in-postgres-9-5
UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'&gt;','>');
UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'&lt;','<');

-- TEST ~ MANUAL UPDATES:
INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
  values (lower('HMDB0240268'), lower('Farnesyl acetate'), '4128-17-0', NULL);
INSERT INTO hmdb_synonyms (accession, synonym)
  values (lower('HMDB0240268'), lower('(2E,6E)-3,7,11-Trimethyldodeca-2,6,10-trien-1-yl acetate'));

DROP TABLE tmp_table;
DROP TABLE tmp_table2;

SQL script notes:

  • In the xpath statements I recast the ::text to (for example) ::text::varchar(15). I initially did this to recast the XML text for the PubChem numeric id to the Postgres integer datatype (::text::int), and realized it was probably good practice to explicitly recast all the data, per the Postgres table schema.

  • More significantly regarding the latter point, if I did NOT recast the datatype in the xpath statement, and the field entry (e.g. synonym name length) exceeded the SQL varchar(300) length limit, those data threw a PSQL error and the table did not update (i.e. was blank: no entries, anywhere!).

  • The

    UPDATE hmdb_synonyms SET synonym = REPLACE(synonym,'&gt;','>');

line updates entries like

[hmdb]# SELECT synonym from hmdb_synonyms where accession='hmdb0006669';

o-6-deoxy-alpha-l-galactopyranosyl-(1-&gt;3)-o-[o-6-deoxy-alpha-l-galactopyranosyl-(1-&gt;4)-o-

to

o-6-deoxy-alpha-l-galactopyranosyl-(1->3)-o-[o-6-deoxy-alpha-l-galactopyranosyl-(1->4)-o-
  • It appears that on rare occasion duplicate data conflict with one another, raising errors and stopping the script (no table update):

    psql:../sql/hmdb.sql:85: ERROR:  23505: duplicate key value violates unique constraint `hmdb_identifiers_accession_key`
    DETAIL:  Key (accession)=(hmdb0240268) already exists.
    SCHEMA NAME:  public
    TABLE NAME:  hmdb_identifiers
    CONSTRAINT NAME:  hmdb_identifiers_accession_key
    LOCATION:  _bt_check_unique, nbtinsert.c:434
    psql:../sql/hmdb.sql:87: ERROR:  23505: duplicate key value violates unique constraint `hmdb_synonyms_synonym_key`
    DETAIL:  Key (synonym)=((2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists.
    SCHEMA NAME:  public
    TABLE NAME:  hmdb_synonyms
    CONSTRAINT NAME:  hmdb_synonyms_synonym_key
    LOCATION:  _bt_check_unique, nbtinsert.c:434

The solution was to add the ON CONFLICT DO NOTHING statement to the INSERT INTO hmdb_synonyms ... statement.

BASH script execution:

[victoria@victoria bash]$ ./hmdb.sh 
pg_terminate_backend 
----------------------

DROP DATABASE
CREATE DATABASE

hmdb/data/hmdb_metabolites_5000-01.xml
You are now connected to database "hmdb" as user "victoria".
CREATE TABLE
CREATE TABLE
SELECT 1
SELECT 13
INSERT 0 1
INSERT 0 13
UPDATE 13
UPDATE 13
INSERT 0 1
INSERT 0 1
DROP TABLE
DROP TABLE

hmdb/data/hmdb_metabolites_5000-02.xml
You are now connected to database "hmdb" as user "victoria".
psql:../sql/hmdb.sql:13: ERROR:  42P07: relation "hmdb_identifiers" already exists
LOCATION:  heap_create_with_catalog, heap.c:1067
psql:../sql/hmdb.sql:23: ERROR:  42P07: relation "hmdb_synonyms" already exists
LOCATION:  heap_create_with_catalog, heap.c:1067
SELECT 1
SELECT 12
INSERT 0 1
INSERT 0 12
UPDATE 26
UPDATE 26
psql:../sql/hmdb.sql:85: ERROR:  23505: duplicate key value violates unique constraint "hmdb_identifiers_accession_key"
DETAIL:  Key (accession)=(hmdb0240268) already exists.
SCHEMA NAME:  public
TABLE NAME:  hmdb_identifiers
CONSTRAINT NAME:  hmdb_identifiers_accession_key
LOCATION:  _bt_check_unique, nbtinsert.c:434
psql:../sql/hmdb.sql:87: ERROR:  23505: duplicate key value violates unique constraint "hmdb_synonyms_accession_synonym_key"
DETAIL:  Key (accession, synonym)=(hmdb0240268, (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists.
SCHEMA NAME:  public
TABLE NAME:  hmdb_synonyms
CONSTRAINT NAME:  hmdb_synonyms_accession_synonym_key
LOCATION:  _bt_check_unique, nbtinsert.c:434
DROP TABLE
DROP TABLE

hmdb/data/hmdb_metabolites_5000-03.xml
You are now connected to database "hmdb" as user "victoria".
psql:../sql/hmdb.sql:13: ERROR:  42P07: relation "hmdb_identifiers" already exists
LOCATION:  heap_create_with_catalog, heap.c:1067
psql:../sql/hmdb.sql:23: ERROR:  42P07: relation "hmdb_synonyms" already exists
LOCATION:  heap_create_with_catalog, heap.c:1067
SELECT 1
SELECT 43
INSERT 0 1
INSERT 0 43
UPDATE 69
UPDATE 69
psql:../sql/hmdb.sql:85: ERROR:  23505: duplicate key value violates unique constraint "hmdb_identifiers_accession_key"
DETAIL:  Key (accession)=(hmdb0240268) already exists.
SCHEMA NAME:  public
TABLE NAME:  hmdb_identifiers
CONSTRAINT NAME:  hmdb_identifiers_accession_key
LOCATION:  _bt_check_unique, nbtinsert.c:434
psql:../sql/hmdb.sql:87: ERROR:  23505: duplicate key value violates unique constraint "hmdb_synonyms_accession_synonym_key"
DETAIL:  Key (accession, synonym)=(hmdb0240268, (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate) already exists.
SCHEMA NAME:  public
TABLE NAME:  hmdb_synonyms
CONSTRAINT NAME:  hmdb_synonyms_accession_synonym_key
LOCATION:  _bt_check_unique, nbtinsert.c:434

DROP TABLE
DROP TABLE

id |  accession  |        name        | cas_number | pubchem_cid |    tablename     
----+-------------+--------------------+------------+-------------+------------------
  2 | hmdb0240268 | farnesyl acetate   | 4128-17-0  |      [NULL] | hmdb_identifiers
  5 | hmdb0000005 | 2-ketobutyric acid | 600-18-0   |          58 | hmdb_identifiers
  3 | hmdb0000002 | 1,3-diaminopropane | 109-76-2   |         428 | hmdb_identifiers
  1 | hmdb0000001 | 1-methylhistidine  | 332-80-9   |       92105 | hmdb_identifiers

count 
-------
    4

id |  accession  |                         synonym                          |   tablename   
----+-------------+----------------------------------------------------------+---------------
14 | hmdb0240268 | (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate | hmdb_synonyms
46 | hmdb0000005 | α-oxo-n-butyric acid                                     | hmdb_synonyms
45 | hmdb0000005 | α-oxo-n-butyrate                                         | hmdb_synonyms
41 | hmdb0000005 | α-ketobutyric acid                                       | hmdb_synonyms
40 | hmdb0000005 | α-ketobutyrate                                           | hmdb_synonyms

count 
-------
    69

  tablename   |                         synonym                          
---------------+----------------------------------------------------------
hmdb_synonyms | (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid
hmdb_synonyms | (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate
hmdb_synonyms | (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate
hmdb_synonyms | trimethylenediamine dihydrochloride
hmdb_synonyms | alpha-ketobutyric acid, sodium salt

[victoria@victoria bash]$

PSQL

After running the BASH script, if you were previously connected to the hmdb database and you run a query, you’ll get a warning about being disconnected; e.g.,

[hmdb]# SELECT * from hmdb_identifiers;

FATAL:  57P01: terminating connection due to administrator command
LOCATION:  ProcessInterrupts, postgres.c:2884
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

[hmdb]#

You can either:

  • rerun that PSQL query (as you are now reconnected);

  • manually reconnect, \c hmdb; then run your query; or

  • execute this command (note the preceding semicolon; you will get the error message, but this is a quick one-liner):

    ;SELECT * from hmdb_identifiers;

PSQL SCRIPT EXECUTION:

In any event, here is the PSQL output, after running the BASH script (note that due to the UTF8 specification in the PSQL script, we get Greek letters properly displayed, e.g. α-ketoglutarate | α-keto-δ-guanidinovalerate | α-oxo-N-butyric acid | ...):


[hmdb]# \c hmdb;
You are now connected to database "hmdb" as user "victoria".

[hmdb]# SELECT * from hmdb_identifiers LIMIT 5;

id  accession         name        cas_number pubchem_cid 
-- ----------- ------------------ ---------- -----------
1 hmdb0000001 1-methylhistidine  332-80-9         92105
2 hmdb0240268 farnesyl acetate   4128-17-0       [NULL]
3 hmdb0000002 1,3-diaminopropane 109-76-2           428
5 hmdb0000005 2-ketobutyric acid 600-18-0            58

[hmdb]# SELECT * from hmdb_synonyms LIMIT 5;

id  accession                          synonym                          
-- ----------- --------------------------------------------------------
1 hmdb0000001 (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid
2 hmdb0000001 1-methylhistidine
3 hmdb0000001 pi-methylhistidine
4 hmdb0000001 (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate
5 hmdb0000001 1 methylhistidine

[hmdb]# SELECT synonym from hmdb_synonyms ORDER BY length(synonym) DESC LIMIT 5;

                        synonym                          
--------------------------------------------------------
(2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid
(2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate
(2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate
trimethylenediamine dihydrochloride
alpha-ketobutyric acid, sodium salt

[hmdb]# SELECT * from hmdb_synonyms ORDER BY accession DESC LIMIT 4;

id  accession                          synonym                          
-- ----------- --------------------------------------------------------
14 hmdb0240268 (2e,6e)-3,7,11-trimethyldodeca-2,6,10-trien-1-yl acetate
29 hmdb0000005 2-oxobutyric acid
28 hmdb0000005 2-ketobutanoic acid
30 hmdb0000005 3-methyl pyruvic acid

[hmdb]# DELETE FROM hmdb_identifiers WHERE accession='hmdb0240268';
DELETE 1

[hmdb]# SELECT count(id) from hmdb_synonyms;

count 
-----
  68

[hmdb]# SELECT * from hmdb_synonyms ORDER BY accession DESC LIMIT 3;

id  accession         synonym        
-- ----------- ---------------------
28 hmdb0000005 2-ketobutanoic acid
29 hmdb0000005 2-oxobutyric acid
30 hmdb0000005 3-methyl pyruvic acid

[hmdb]# SELECT * from hmdb_synonyms WHERE synonym ILIKE '%lpha-keto%';

id  accession                synonym               
-- ----------- -----------------------------------
31 hmdb0000005 alpha-ketobutyrate
32 hmdb0000005 alpha-ketobutyric acid
61 hmdb0000005 alpha-keto-n-butyrate
62 hmdb0000005 alpha-keto-n-butyric acid
63 hmdb0000005 alpha-ketobutric acid
70 hmdb0000005 alpha-ketobutyric acid, sodium salt


## DATA INGRESS CHECK:
## ===================

[hmdb]# SELECT * from hmdb_synonyms as synonym where accession ILIKE 'hmdb0106899';

  id    accession                                                  synonym                                                 
------ ----------- -------------------------------------------------------------------------------------------------------
923304 hmdb0106899 (2s)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoic acid
923305 hmdb0106899 1-isoheptadecanoyl-2-isononadecanoyl-3-isotetradecanoyl-glycerol
923306 hmdb0106899 tg(i-17:0/i-19:0/i-14:0)
923307 hmdb0106899 triacylglycerol
923308 hmdb0106899 triglyceride
923309 hmdb0106899 tg(50:0)
923310 hmdb0106899 tracylglycerol(50:0)
923311 hmdb0106899 tag(50:0)
923312 hmdb0106899 tracylglycerol(i-17:0/i-19:0/i-14:0)
923313 hmdb0106899 tag(i-17:0/i-19:0/i-14:0)

[hmdb]#

## COMPARE THOSE DATA (ABOVE) WITH THE SOURCE DATA FILE DATA (not uploaded):
## hmdb_metabolites-07.xml :

[victoria@victoria data]$ dpl

  Wed May  2 19:46:53 PDT 2018
  /mnt/Vancouver/Programming/data/hmdb/data

  total 3606796
  -rw-r--r-- 1 victoria victoria 500008714 Apr 25 15:32 hmdb_metabolites-01.xml
  -rw-r--r-- 1 victoria victoria 500033500 Apr 25 15:33 hmdb_metabolites-02.xml
  -rw-r--r-- 1 victoria victoria 500012180 Apr 25 15:34 hmdb_metabolites-03.xml
  -rw-r--r-- 1 victoria victoria 500013231 Apr 25 15:35 hmdb_metabolites-04.xml
  -rw-r--r-- 1 victoria victoria 500013021 Apr 25 15:37 hmdb_metabolites-05.xml
  -rw-r--r-- 1 victoria victoria 500005018 Apr 25 15:38 hmdb_metabolites-06.xml
  -rw-r--r-- 1 victoria victoria 500020416 Apr 25 15:39 hmdb_metabolites-07.xml
  -rw-r--r-- 1 victoria victoria 193200978 Apr 25 15:40 hmdb_metabolites-08.xml

[victoria@victoria data]$ rg . -i -A25 -e "hmdb0106899"

  hmdb_metabolites-07.xml
  948166:  <accession>HMDB0106899</accession>
  948167-  <status>expected</status>
  948168-  <secondary_accessions>
  948169-  </secondary_accessions>
  948170-  <name>TG(i-17:0/i-19:0/i-14:0)</name>
  948171-  <description>TG(i-17:0/i-19:0/i-14:0) belongs to the family of triradyglycerols, which are glycerolipids lipids containing a common glycerol backbone to which at least one fatty acyl group is esterified. Their general formula is [R1]OCC(CO[R2])O[R3]. TG(i-17:0/i-19:0/i-14:0) is made up of one 15-methylhexadecanoyl(R1), one 17-methyloctadecanoyl(R2), and one 12-methyltridecanoyl(R3).</description>

  948172-  <synonyms>
  948173-    <synonym>(2S)-1-[(15-Methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoic acid</synonym>
  948174-    <synonym>1-isoheptadecanoyl-2-isononadecanoyl-3-isotetradecanoyl-glycerol</synonym>
  948175-    <synonym>TG(i-17:0/i-19:0/i-14:0)</synonym>
  948176-    <synonym>Triacylglycerol</synonym>
  948177-    <synonym>Triglyceride</synonym>
  948178-    <synonym>TG(50:0)</synonym>
  948179-    <synonym>Tracylglycerol(50:0)</synonym>
  948180-    <synonym>TAG(50:0)</synonym>
  948181-    <synonym>Tracylglycerol(i-17:0/i-19:0/i-14:0)</synonym>
  948182-    <synonym>TAG(i-17:0/i-19:0/i-14:0)</synonym>
  948183-  </synonyms>

  948184-  <chemical_formula>C53H102O6</chemical_formula>
  948185-  <average_molecular_weight>835.393</average_molecular_weight>
  948186-  <monisotopic_molecular_weight>834.767641004</monisotopic_molecular_weight>
  948187-  <iupac_name>(2S)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoate</iupac_name>
  948188-  <traditional_iupac>(2S)-1-[(15-methylhexadecanoyl)oxy]-3-[(12-methyltridecanoyl)oxy]propan-2-yl 17-methyloctadecanoate</traditional_iupac>
  948189-  <cas_registry_number/>
  948190-  <smiles>[H][C@](COC(=O)CCCCCCCCCCCCCC(C)C)(COC(=O)CCCCCCCCCCC(C)C)OC(=O)CCCCCCCCCCCCCCCC(C)C</smiles>
  948191-  <inchi>InChI=1S/C53H102O6/c1-47(2)39-33-27-21-15-11-8-7-9-13-18-26-32-38-44-53(56)59-50(46-58-52(55)43-37-31-25-20-19-23-29-35-41-49(5)6)45-57-51(54)42-36-30-24-17-14-10-12-16-22-28-34-40-48(3)4/h47-50H,7-46H2,1-6H3/t50-/m0/s1</inchi>

[victoria@victoria data]$ 

[hmdb]# \l

                             List of databases
   Name     Owner   Encoding   Collate      Ctype      Access privileges   
---------- -------- -------- ----------- ----------- ---------------------
hmdb       victoria UTF8     en_US.UTF-8 en_US.UTF-8 
metabolism victoria UTF8     en_US.UTF-8 en_US.UTF-8 
ncbi       victoria UTF8     en_US.UTF-8 en_US.UTF-8 
postgres   postgres UTF8     en_US.UTF-8 en_US.UTF-8 
template0  postgres UTF8     en_US.UTF-8 en_US.UTF-8 =c/postgres          +
                                                     postgres=CTc/postgres
template1  postgres UTF8     en_US.UTF-8 en_US.UTF-8 =c/postgres          +
                                                     postgres=CTc/postgres
zzz        victoria UTF8     en_US.UTF-8 en_US.UTF-8 

[hmdb]# \dt

           List of relations
Schema       Name        Type   Owner   
------ ----------------- ----- --------
public hmdb_identifiers  table victoria
public hmdb_synonyms     table victoria

[hmdb]# \d hmdb_identifiers

                                  Table "public.hmdb_identifiers"
  Column             Type          Collation Nullable                   Default                    
----------- ---------------------- --------- -------- --------------------------------------------
id          integer                          not null nextval('hmdb_identifiers_id_seq'::regclass)
accession   character varying(15)            not null 
name        character varying(300)           not null 
cas_number  character varying(12)                     
pubchem_cid integer                                   
Indexes:
    "hmdb_identifiers_pkey" PRIMARY KEY, btree (id)
    "hmdb_identifiers_accession_key" UNIQUE CONSTRAINT, btree (accession)
Referenced by:
    TABLE "hmdb_synonyms" CONSTRAINT "hmdb_synonyms_accession_fkey" FOREIGN KEY (accession) REFERENCES hmdb_identifiers(accession) ON UPDATE CASCADE ON DELETE CASCADE

[hmdb]# \d hmdb_synonyms

                                 Table "public.hmdb_synonyms"
 Column            Type          Collation Nullable                  Default                  
--------- ---------------------- --------- -------- -----------------------------------------
id        integer                          not null nextval('hmdb_synonyms_id_seq'::regclass)
accession character varying(15)            not null 
synonym   character varying(300)           not null 
Indexes:
    "hmdb_synonyms_pkey" PRIMARY KEY, btree (id)
    "hmdb_synonyms_accession_synonym_key" UNIQUE CONSTRAINT, btree (accession, synonym)
Foreign-key constraints:
    "hmdb_synonyms_accession_fkey" FOREIGN KEY (accession) REFERENCES hmdb_identifiers(accession) ON UPDATE CASCADE ON DELETE CASCADE

[hmdb]# SELECT * from hmdb_identifiers LIMIT 5;

id  accession          name          cas_number pubchem_cid 
-- ----------- --------------------- ---------- -----------
 1 hmdb0000001 1-methylhistidine     332-80-9         92105
 2 hmdb0000002 1,3-diaminopropane    109-76-2           428
 3 hmdb0000005 2-ketobutyric acid    600-18-0            58
 4 hmdb0000008 2-hydroxybutyric acid 600-15-7         11266
 5 hmdb0000010 2-methoxyestrone      362-08-3        440624

[hmdb]# SELECT * from hmdb_synonyms LIMIT 5;

 id   accession          synonym          
---- ----------- ------------------------
4117 hmdb0000182 2,6-diaminohexanoic acid
4118 hmdb0000182 6-amino-aminutrin
4119 hmdb0000182 6-amino-l-norleucine
4120 hmdb0000182 a-lysine
4121 hmdb0000182 alpha-lysine

[hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i RIGHT JOIN hmdb_synonyms AS s ON i.accession = s.accession LIMIT 50;

 accession         name                                synonym                          
----------- ------------------ --------------------------------------------------------
hmdb0000001 1-methylhistidine  1 methylhistidine
hmdb0000001 1-methylhistidine  1-methylhistidine
hmdb0000001 1-methylhistidine  1-methyl histidine
hmdb0000001 1-methylhistidine  1-methyl-histidine
hmdb0000001 1-methylhistidine  1-methylhistidine dihydrochloride
hmdb0000001 1-methylhistidine  1-methyl-l-histidine
hmdb0000001 1-methylhistidine  1-mhis
hmdb0000001 1-methylhistidine  1-n-methyl-l-histidine
hmdb0000001 1-methylhistidine  (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoate
hmdb0000001 1-methylhistidine  (2s)-2-amino-3-(1-methyl-1h-imidazol-4-yl)propanoic acid
hmdb0000001 1-methylhistidine  l-1-methylhistidine
hmdb0000001 1-methylhistidine  n1-methyl-l-histidine
hmdb0000001 1-methylhistidine  pi-methylhistidine
hmdb0000002 1,3-diaminopropane 1,3-diaminepropane
hmdb0000002 1,3-diaminopropane 1,3-diamino-n-propane
hmdb0000002 1,3-diaminopropane 1,3-propanediamine
hmdb0000002 1,3-diaminopropane 1,3-propylenediamine
hmdb0000002 1,3-diaminopropane 1,3-trimethylenediamine
hmdb0000002 1,3-diaminopropane 3-aminopropylamine
hmdb0000002 1,3-diaminopropane a,w-propanediamine
hmdb0000002 1,3-diaminopropane propane-1,3-diamine
hmdb0000002 1,3-diaminopropane tn
hmdb0000002 1,3-diaminopropane trimethylenediamine
hmdb0000002 1,3-diaminopropane trimethylenediamine dihydrochloride
hmdb0000002 1,3-diaminopropane trimethylenediamine hydrochloride
hmdb0000005 2-ketobutyric acid 2-ketobutanoate
hmdb0000005 2-ketobutyric acid 2-ketobutanoic acid
hmdb0000005 2-ketobutyric acid 2-ketobutyrate
hmdb0000005 2-ketobutyric acid 2-oxobutanoate
hmdb0000005 2-ketobutyric acid 2-oxo-butanoate
hmdb0000005 2-ketobutyric acid 2-oxo-butanoic acid
hmdb0000005 2-ketobutyric acid 2-oxobutanoic acid
hmdb0000005 2-ketobutyric acid 2-oxobutyrate
hmdb0000005 2-ketobutyric acid 2-oxo-butyrate
hmdb0000005 2-ketobutyric acid 2-oxo-butyric acid
hmdb0000005 2-ketobutyric acid 2-oxobutyric acid
hmdb0000005 2-ketobutyric acid 2-oxo-n-butyrate
hmdb0000005 2-ketobutyric acid 2-oxo-n-butyric acid
hmdb0000005 2-ketobutyric acid 3-methylpyruvate
hmdb0000005 2-ketobutyric acid 3-methyl pyruvate
hmdb0000005 2-ketobutyric acid 3-methyl pyruvic acid
hmdb0000005 2-ketobutyric acid 3-methylpyruvic acid
hmdb0000005 2-ketobutyric acid a-ketobutyrate
hmdb0000005 2-ketobutyric acid a-ketobutyric acid
hmdb0000005 2-ketobutyric acid a-keto-n-butyrate
hmdb0000005 2-ketobutyric acid a-keto-n-butyric acid
hmdb0000005 2-ketobutyric acid alpha-ketobutric acid
hmdb0000005 2-ketobutyric acid alpha-ketobutyrate
hmdb0000005 2-ketobutyric acid alpha-ketobutyric acid
hmdb0000005 2-ketobutyric acid alpha-ketobutyric acid, sodium salt

[hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i RIGHT JOIN hmdb_synonyms AS s ON i.accession = s.accession WHERE s.synonym ILIKE '%ketobut%' LIMIT 25;

 accession               name                                      synonym                         
----------- ------------------------------- ------------------------------------------------------
hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutanoate
hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutanoic acid
hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutyrate
hmdb0001553 2-oxo-4-methylthiobutanoic acid s-methyl-alpha-ketobutyric acid
hmdb0001553 2-oxo-4-methylthiobutanoic acid 4-methylthio-2-ketobutyric acid
hmdb0003407 diacetyl                        2,3-diketobutane
hmdb0003407 diacetyl                        diketobutane
hmdb0033851 4,4-dimethoxy-2-butanone        3-ketobutyraldehyde dimethyl acetal
hmdb0003771 2-methylacetoacetic acid        2-methyl-3-ketobutyric acid
hmdb0031204 4-hydroxy-2-oxobutanoic acid    g-hydroxy-a-ketobutyric acid
hmdb0006454 l-2-amino-3-oxobutanoic acid    2-amino-3-ketobutyric acid
hmdb0006454 l-2-amino-3-oxobutanoic acid    2-amino-3-ketobutyrate
hmdb0000005 2-ketobutyric acid              a-ketobutyrate
hmdb0033698 acetoin acetate                 2-ketobutan-3-yl acetate
hmdb0033851 4,4-dimethoxy-2-butanone        3-ketobutyraldehyde dimethylacetal
hmdb0036396 3-methylbutyl 3-oxobutanoate    3-methylbutyl beta-ketobutyrate
hmdb0036230 1-methyl-2-oxopropyl butyrate   2-ketobutan-3-yl butanoate
hmdb0036396 3-methylbutyl 3-oxobutanoate    isoamyl beta-ketobutyrate
hmdb0036396 3-methylbutyl 3-oxobutanoate    isopentyl beta-ketobutyrate
hmdb0038256 geranyl acetoacetate            geranyl beta-ketobutyrate
hmdb0039324 xi-3-hydroxy-2-oxobutanoic acid 3-hydroxy-alpha-ketobutyrate
hmdb0039324 xi-3-hydroxy-2-oxobutanoic acid beta-hydroxy-alpha-ketobutyric acid
hmdb0041914 ketophenylbutazone              1,2-diphenyl-4-(gamma-ketobutyl)-3,5-pyrazolidinedione
hmdb0041914 ketophenylbutazone              1,2-diphenyl-4-(g-ketobutyl)-3,5-pyrazolidinedione
hmdb0041914 ketophenylbutazone              1,2-diphenyl-4-(γ-ketobutyl)-3,5-pyrazolidinedione

[hmdb]# SELECT i.accession, i.name, s.synonym FROM hmdb_identifiers as i JOIN hmdb_synonyms AS s ON i.accession = s.accession WHERE s.synonym ILIKE '%α-keto%' LIMIT 25;

 accession             name                       synonym             
----------- -------------------------- ------------------------------
hmdb0000005 2-ketobutyric acid         α-ketobutyrate
hmdb0000005 2-ketobutyric acid         α-ketobutyric acid
hmdb0000019 alpha-ketoisovaleric acid  α-ketoisovalerate
hmdb0000019 alpha-ketoisovaleric acid  α-ketoisovaleric acid
hmdb0000019 alpha-ketoisovaleric acid  α-keto-isovalerate
hmdb0000019 alpha-ketoisovaleric acid  α-keto-isovaleric acid
hmdb0000019 alpha-ketoisovaleric acid  α-ketovaline
hmdb0000119 glyoxylic acid             α-ketoacetate
hmdb0000119 glyoxylic acid             α-ketoacetic acid
hmdb0000695 ketoleucine                α-ketoisocaproate
hmdb0000205 phenylpyruvic acid         α-ketohydrocinnamate
hmdb0000205 phenylpyruvic acid         α-ketohydrocinnamic acid
hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methyl-n-valerate
hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methyl-n-valeric acid
hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methylvalerate
hmdb0000491 3-methyl-2-oxovaleric acid α-keto-β-methylvaleric acid
hmdb0000208 oxoglutaric acid           α-ketoglutarate
hmdb0000208 oxoglutaric acid           α-ketoglutaric acid
hmdb0000243 pyruvic acid               α-ketopropionate
hmdb0000243 pyruvic acid               α-ketopropionic acid
hmdb0000695 ketoleucine                α-ketoisocaproic acid
hmdb0031522 mesoxalic acid             α-ketomalonate
hmdb0001587 phenylglyoxylic acid       α-ketophenylacetate
hmdb0001587 phenylglyoxylic acid       α-ketophenylacetic acid
hmdb0004225 2-oxoarginine              α-keto-δ-guanidinovalerate

[hmdb]# 

DATA:

I uploaded the XML data files used in this answer at this Gist

Direct links: