To: hovy
CC: philpot
Subject: Omega3 stats
--text follows this line--

I'm reporting this by "subontology", where a subontology is (typically)
a concept space along with related sense and lexical item spaces. I'm
further grouping by sets of related subontologies; this is the way the
files and downloads are/will be organized.

NB: Because Omega is the organizing center, the stats below sometimes
appear to under-represent the complexity of the satellite components.

Sizes are as stored in indexed tabular database format on disk, so they
are good for relative comparisons if nothing else; the size data I gave
you before was as-materialized in the db, and so includes indexes and
overhead: a better total size is 13 Gb.

Figures given in Mb or Gb below are bytes; figures given in M or K are
counts.

1. Omega base ontology

   A. O3 subontology: Omega [76 Mb]
      791 distinct entity/entity relation types (where an entity is a
          concept, lexical item, or sense; most of these are
          Mikro-derived and used 50 or fewer times)
      36 distinct attribute types (link between entity and a literal)
      2.7 M assertions (= relation links + attribute links)
      120 K concepts
      156 K EN lexical items
      28 K ES lexical items
      270 K senses

   B. D subontology: WordNet subject domains (Magnini & Cavaglia) [1.2 Mb]
      1 relation type
      3 attribute types
      86 K assertions
      166 concepts

2. Verb frames

   C. PFRM subontology: Propbank [5.3 Mb]
      1 relation type
      4 attribute types
      40 K assertions
      4600 (frame) concepts

   D. TFRM subontology: Theta grids [6.3 Mb]
      16 attribute types
      73 K assertions
      13 K (frame) concepts

   E. FFRM subontology: Framenet linkage via Namhee [2.8 Mb]
      4 attribute types
      6 relation types
      73 K assertions
      5 K (frame) concepts

   F. WNVFRM: Wordnet-2.0 "verb frames" [1.8 Mb]
      1 attribute type
      63133 assertions
      35 (frame) concepts: skeletal like 'Something someone'

3. Harvested Instances

   G. MFI subontology: from Michael Fleischman [450 Mb]
      5 attribute types
      1 relation type
      15 M assertions
      467 K instances (concepts)

   H. PPI subontology: from Patrick Pantel, via Eric [13 Mb]
      3 attribute types
      3 relation types
      315 K assertions
      26 K senses
      26 K EN lexical items
      26 K concepts

   I. DRI subontology: from Deepak Ravichandran, via Eric [380 Mb]
      4 attribute types
      3 relation types
      8.6 M assertions
      777 K senses
      777 K EN lexical items
      738 K concepts

4. Geographical information: GEO subontology.
   Composed of two complementary sections.

   J. GNIS section (from USGS: USA information only) [1.4 Gb]
      21 attribute types
      22 relation types
      30 M assertions
      1.9 M senses
      1.1 M named entity lexical items
      1.9 M concepts (distinct geo features)

   K. GNS section (from NGA: international only) [7.9 Gb]
      5 attribute types
      2 relation types
      203 M assertions (updated)
      7.5 M senses (updated)
      5.0 M named entity lexical items (updated)
      5.4 M concepts (distinct geo features) (updated)

5. SEMCOR subontology. Eric compiled this. It contains a lot of
   statistical annotations to O3 concepts. Beyond that I'm not clear.
   [7.5 Mb]
      1 attribute type
      166 K assertions

6. Wordnet Topic Signatures, from Eneko Agirre, via Eric. Eric mapped
   these from WN 1.6 to WN 2.0 as I recall. [2.7 Gb]
      3 attribute types
      197 M assertions

7. Ontobank annotations (Nick White) subontology: A way to link Omega
   and Ontobank. Maybe we won't deliver this. [540 Kb]
      10 attribute types
      3 relation types
      11 K assertions
      923 senses
      533 annotation concepts: these are like "multi-senses"
      133 EN lexical items
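
As a sanity check on the 13 Gb total, the bracketed on-disk sizes above can be summed. This is only a rough sketch: it assumes decimal units (1 Gb = 1000 Mb, 1 Mb = 1000 Kb), which the message does not specify, and the dictionary keys are just labels chosen here for illustration.

```python
# On-disk sizes in Mb, copied from the bracketed figures in the listing above.
# Assumption: 1 Gb = 1000 Mb and 1 Mb = 1000 Kb (not stated in the message).
SIZES_MB = {
    "O3": 76.0,
    "D": 1.2,
    "PFRM": 5.3,
    "TFRM": 6.3,
    "FFRM": 2.8,
    "WNVFRM": 1.8,
    "MFI": 450.0,
    "PPI": 13.0,
    "DRI": 380.0,
    "GEO/GNIS": 1.4 * 1000,
    "GEO/GNS": 7.9 * 1000,
    "SEMCOR": 7.5,
    "WN topic signatures": 2.7 * 1000,
    "Ontobank": 540 / 1000,
}

total_mb = sum(SIZES_MB.values())
total_gb = total_mb / 1000

print(f"total: {total_mb:.1f} Mb = {total_gb:.1f} Gb")

# Show the three largest components and their share of the total.
largest = sorted(SIZES_MB.items(), key=lambda kv: kv[1], reverse=True)[:3]
for name, mb in largest:
    print(f"{name}: {100 * mb / total_mb:.0f}% of total")
```

Under these assumptions the sum comes to roughly 12.9 Gb, in line with the 13 Gb figure, and the GNS section, the topic signatures, and the GNIS section together account for over 90% of it.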