To: hovy
CC: philpot
Subject: Omega3 stats
--text follows this line--
I'm reporting this by "subontology" where a subontology is (typically)
a concept space along with related sense and lexical item spaces. I'm
further grouping by sets of related subontologies; this is the way the
files and downloads are/will be organized. NB: Because Omega is the
organizing center, the stats below sometimes appear to under-represent
the complexity of the satellite components. Sizes are as stored in
indexed tabular database format on disk, so they are good for relative
comparisons if nothing else; the size data I gave you before was
as-materialized in the db, and so includes indexes and overhead: a
better total size is 13 Gb. Figures given in Mb or Gb below are sizes
in bytes; figures given in K or M are counts.
1. Omega base ontology
A. O3 subontology: Omega [76 Mb]
791 distinct entity/entity relation types (where an entity is a
concept, lexical item, or sense; most of these are Mikro-derived
and used 50 or fewer times)
36 distinct attribute types (link between entity and a literal)
2.7 M assertions (= relation links + attribute links)
120 K concepts
156 K EN lexical items
28 K ES lexical items
270 K senses
B. D subontology: WordNet subject domains (Magnini & Cavaglia) [1.2 Mb]
1 relation type
3 attribute types
86 K assertions
166 concepts
2. Verb frames
C. PFRM subontology: PropBank [5.3 Mb]
1 relation type
4 attribute types
40 K assertions
4600 (frame) concepts
D. TFRM subontology: Theta grids [6.3 Mb]
16 attribute types
73 K assertions
13 K (frame) concepts
E. FFRM subontology: FrameNet linkage via Namhee [2.8 Mb]
4 attribute types
6 relation types
73 K assertions
5 K (frame) concepts
F. WNVFRM subontology: WordNet 2.0 "verb frames" [1.8 Mb]
1 attribute type
63133 assertions
35 (frame) concepts: skeletal like 'Something someone'
3. Harvested Instances
G. MFI subontology: from Michael Fleischman [450 Mb]
5 attribute types
1 relation type
15 M assertions
467 K instances (concepts)
H. PPI subontology: from Patrick Pantel, via Eric [13 Mb]
3 attribute types
3 relation types
315 K assertions
26 K senses
26 K EN lexical items
26 K concepts
I. DRI subontology: from Deepak Ravichandran, via Eric [380 Mb]
4 attribute types
3 relation types
8.6 M assertions
777 K senses
777 K EN lexical items
738 K concepts
4. Geographical information: GEO subontology. Composed of two
complementary sections:
J. GNIS section (from USGS: USA information only) [1.4 Gb]
21 attribute types
22 relation types
30 M assertions
1.9 M senses
1.1 M named entity lexical items
1.9 M concepts (distinct geo features)
K. GNS section (from NGA: international only) [7.9 Gb]
5 attribute types
2 relation types
203 M assertions (updated)
7.5 M senses (updated)
5.0 M named entity lexical items (updated)
5.4 M concepts (distinct geo features) (updated)
5. SEMCOR subontology. Eric compiled this. It contains a lot of
statistical annotations on O3 concepts. Beyond that I'm not
clear on what it holds. [7.5 Mb]
1 attribute type
166 K assertions
6. WordNet Topic Signatures, from Eneko Agirre, via Eric. Eric mapped
these from WN 1.6 to WN 2.0 as I recall. [2.7 Gb]
3 attribute types
197 M assertions
7. Ontobank annotations (Nick White) subontology: A way to link Omega
and Ontobank. Maybe we won't deliver this. [540 Kb]
10 attribute types
3 relation types
11 K assertions
923 senses
533 annotation concepts: these are like "multi-senses"
133 EN lexical items
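
As a sanity check on the 13 Gb total quoted at the top, the bracketed
per-subontology sizes can simply be summed. This is a minimal sketch
only. The sizes are copied from this message, but the dictionary keys
for sections 5-7 (SEMCOR, WNTS, ONTOBANK) are labels made up here for
illustration, and 1 Gb = 1000 Mb is an assumption (the message does not
say whether decimal or binary units are meant).

```python
# On-disk sizes in Mb, copied from the bracketed figures in this message.
# Assumption: 1 Gb = 1000 Mb and 1 Mb = 1000 Kb (decimal units).
# Keys for sections 5-7 are illustrative labels, not official names.
SIZES_MB = {
    "O3": 76, "D": 1.2,                                   # 1. Omega base
    "PFRM": 5.3, "TFRM": 6.3, "FFRM": 2.8, "WNVFRM": 1.8, # 2. Verb frames
    "MFI": 450, "PPI": 13, "DRI": 380,                    # 3. Instances
    "GNIS": 1400, "GNS": 7900,                            # 4. GEO
    "SEMCOR": 7.5,                                        # 5.
    "WNTS": 2700,                                         # 6.
    "ONTOBANK": 0.54,                                     # 7.
}


def total_gb(sizes_mb):
    """Sum a mapping of sizes in Mb and return the total in Gb."""
    return sum(sizes_mb.values()) / 1000.0


if __name__ == "__main__":
    print(f"total: {total_gb(SIZES_MB):.1f} Gb")
```

Under those assumptions the sum rounds to the 13 Gb figure, with the two
GEO sections (GNIS and GNS) accounting for most of it.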