This page provides access to and statistics about class-specific subsets of the Schema.org data contained in the October 2024 version of the Web Data Commons Microdata and JSON-LD corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (like product data, event data, job postings,
or data describing local businesses), we have created class-specific subsets from the complete, merged Microdata and JSON-LD corpora for a
selection of Schema.org classes.
Each subset contains all instances of a specific class in either format, as well as all other data found on
the webpages containing these instances. For example, a page containing data about a product might also contain
reviews and offers for this product; a page containing data about an event might also contain data about the
location of the event and the persons involved in the event.
The data is represented in N-Quads format, meaning that the fourth
element of each quad contains the URL of the webpage from which the data was extracted.
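As a rough illustration of how the fourth element can be read, the following sketch splits one N-Quads line into its four parts and returns the source page URL. It assumes well-formed, single-line statements; the pattern is deliberately simplified, the function names `parse_quad` and `page_url` and the sample line are made up for this example, and the download files may deviate from standard N-Quads in ways this does not cover.

```python
import re

# Simplified pattern for one N-Quads statement (an assumption, not the official grammar):
# IRIs in <...>, blank nodes as _:label, literals in "..." with optional
# language tag or datatype, and a terminating dot.
NQUAD = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'                       # subject
    r'(<[^>]*>)\s+'                                 # predicate
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"'             # object: IRI, blank node, or literal
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s+'          # optional language tag or datatype
    r'(<[^>]*>)\s*\.\s*$'                           # graph label = URL of the source page
)

def parse_quad(line):
    """Return (subject, predicate, object, graph) or None if the line does not match."""
    match = NQUAD.match(line)
    return match.groups() if match else None

def page_url(line):
    """Return the fourth element of the quad without the angle brackets."""
    quad = parse_quad(line)
    return quad[3][1:-1] if quad else None

# Hypothetical example line, not taken from the corpus.
sample = r'_:n1 <http://schema.org/name> "Apple \"Pie\""@en <https://www.example.com/recipes/1> .'
print(page_url(sample))
```

For production use, an RDF parsing library is likely a safer choice than a hand-written pattern.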
To make the class-specific data easier to download and access, we provide the Schema.org subsets in chunks. Each
chunk contains the quads of specific pay-level domains (PLDs): all quads of one PLD, e.g. yummly.com,
are stored in the same chunk file. Additionally, we provide lookup files that map PLDs
to their corresponding chunks, as well as CSV files with PLD-specific statistics.
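A lookup file can be used to find the chunk that holds a given PLD before downloading anything else. The sketch below assumes the lookup file is a CSV with a header row; the column names `pld` and `file`, the chunk file names, and the helper `load_lookup` are hypothetical and should be checked against the actual files.

```python
import csv
import io

def load_lookup(handle, pld_column="pld", file_column="file"):
    """Read a PLD-to-chunk mapping from an open CSV file (column names are assumptions)."""
    reader = csv.DictReader(handle)
    return {row[pld_column]: row[file_column] for row in reader}

# Hypothetical lookup content; real column names and chunk file names may differ.
sample = io.StringIO("pld,file\nyummly.com,part_3.gz\nexample.org,part_0.gz\n")
lookup = load_lookup(sample)

# Only the chunk listed for the PLD of interest needs to be downloaded.
print(lookup.get("yummly.com"))
```

With a real lookup file, `open(path, newline="")` would take the place of the in-memory sample.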
Please note that:
Schema.org Subset | General Stats | Related Classes | Size (# Files) | Download (Sample) | PLD-to-File Lookup / PLD-Specific Stats |
---|---|---|---|---|---|
AdministrativeArea | Quads: 96,086,119 URLs: 521,585 Hosts: 4,933 | http://schema.org/ListItem (1,499,751) http://schema.org/ImageObject (1,454,619) http://schema.org/AdministrativeArea (1,301,571) http://schema.org/Person (976,876) http://schema.org/PostalAddress (966,279) | 1.25 GB (8) | AdministrativeArea (sample) | lookup_file pld_stats_file |
Airport | Quads: 53,684,719 URLs: 173,702 Hosts: 1,003 | http://schema.org/Airport (3,562,733) http://schema.org/GeoCoordinates (2,546,608) http://schema.org/Flight (1,331,733) http://schema.org/Airline (1,258,369) http://schema.org/Offer (1,139,954) | 490.76 MB (5) | Airport (sample) | lookup_file pld_stats_file |
Answer | Quads: 1,617,417,253 URLs: 14,298,778 Hosts: 414,222 | http://schema.org/Answer (60,188,640) http://schema.org/Question (51,845,095) http://schema.org/ListItem (32,575,842) https://schema.org/Answer (22,038,464) http://schema.org/ImageObject (20,709,476) | 29.69 GB (126) | Answer (sample) | lookup_file pld_stats_file |
Book | Quads: 249,603,999 URLs: 4,208,106 Hosts: 18,993 | http://schema.org/Book (10,291,224) http://schema.org/Country (6,776,472) http://schema.org/Person (5,755,257) http://schema.org/Offer (3,590,467) http://schema.org/ListItem (3,350,195) | 4.35 GB (20) | Book (sample) | lookup_file pld_stats_file |
City | Quads: 235,105,383 URLs: 1,156,025 Hosts: 16,149 | http://schema.org/City (5,772,799) http://schema.org/ImageObject (4,144,523) http://schema.org/Person (4,069,973) http://schema.org/PostalAddress (3,790,692) http://schema.org/OpeningHoursSpecification (2,991,182) | 2.62 GB (19) | City (sample) | lookup_file pld_stats_file |
ClaimReview | Quads: 3,919,715 URLs: 49,708 Hosts: 343 | http://schema.org/Organization (123,301) http://schema.org/ImageObject (95,783) http://schema.org/ListItem (93,535) http://schema.org/Person (66,621) http://schema.org/ClaimReview (59,710) | 69.39 MB (1) | ClaimReview (sample) | lookup_file pld_stats_file |
CollegeOrUniversity | Quads: 112,777,803 URLs: 1,001,649 Hosts: 5,121 | http://schema.org/ImageObject (4,911,790) http://schema.org/CollegeOrUniversity (3,892,042) http://schema.org/Person (3,167,877) http://schema.org/PostalAddress (2,714,298) http://schema.org/GeoCoordinates (1,995,606) | 1.25 GB (9) | CollegeOrUniversity (sample) | lookup_file pld_stats_file |
Continent | Quads: 759,731 URLs: 6,752 Hosts: 66 | http://schema.org/City (57,883) http://schema.org/AdministrativeArea (42,597) http://schema.org/Country (10,423) http://schema.org/Continent (7,337) http://schema.org/GeoCoordinates (5,692) | 9.5 MB (1) | Continent (sample) | lookup_file pld_stats_file |
Country | Quads: 950,481,115 URLs: 7,110,847 Hosts: 35,296 | http://schema.org/Country (31,979,996) http://schema.org/ListItem (23,422,340) http://schema.org/Organization (15,663,556) http://schema.org/PostalAddress (11,083,956) http://schema.org/Offer (11,007,053) | 12.7 GB (73) | Country (sample) | lookup_file pld_stats_file |
CreativeWork | Quads: 2,064,113,912 URLs: 45,276,024 Hosts: 1,325,636 | https://schema.org/CreativeWork (80,071,257) https://schema.org/SiteNavigationElement (55,892,203) https://schema.org/Person (40,215,674) https://schema.org/WPHeader (32,248,539) https://schema.org/WPFooter (30,340,361) | 84.08 GB (160) | CreativeWork (sample) | lookup_file pld_stats_file |
Dataset | Quads: 58,627,800 URLs: 694,158 Hosts: 2,024 | http://schema.org/DataDownload (2,584,058) http://schema.org/Dataset (1,559,803) http://schema.org/Organization (1,056,210) http://schema.org/PropertyValue (744,074) http://schema.org/Person (737,459) | 844.09 MB (5) | Dataset (sample) | lookup_file pld_stats_file |
EducationalOrganization | Quads: 67,328,226 URLs: 830,258 Hosts: 11,630 | http://schema.org/EducationalOrganization (1,393,334) http://schema.org/ListItem (1,202,342) http://schema.org/ImageObject (983,689) http://schema.org/PostalAddress (955,404) http://schema.org/Person (627,438) | 1.04 GB (6) | EducationalOrganization (sample) | lookup_file pld_stats_file |
Event | Quads: 1,959,220,573 URLs: 14,077,815 Hosts: 399,470 | http://schema.org/Event (62,979,080) http://schema.org/Place (47,079,855) http://schema.org/PostalAddress (36,842,757) http://schema.org/Person (23,766,448) http://schema.org/ListItem (19,234,094) | 24.16 GB (152) | Event (sample) | lookup_file pld_stats_file |
FAQPage | Quads: 1,416,338,018 URLs: 11,600,305 Hosts: 385,257 | http://schema.org/Answer (48,925,667) http://schema.org/Question (48,641,655) http://schema.org/ListItem (30,144,276) http://schema.org/ImageObject (20,603,938) https://schema.org/Answer (17,421,753) | 25.04 GB (110) | FAQPage (sample) | lookup_file pld_stats_file |
GeoCoordinates | Quads: 3,183,262,704 URLs: 25,257,658 Hosts: 567,267 | http://schema.org/ListItem (73,477,803) http://schema.org/PostalAddress (53,036,985) http://schema.org/GeoCoordinates (50,514,808) http://schema.org/OpeningHoursSpecification (32,389,059) http://schema.org/Offer (31,583,240) | 40.99 GB (247) | GeoCoordinates (sample) | lookup_file pld_stats_file |
GovernmentOrganization | Quads: 25,786,196 URLs: 389,511 Hosts: 1,940 | http://schema.org/ListItem (1,425,244) http://schema.org/GovernmentOrganization (547,322) http://schema.org/ImageObject (478,534) http://schema.org/PropertyValue (289,526) http://schema.org/PostalAddress (228,173) | 393.95 MB (2) | GovernmentOrganization (sample) | lookup_file pld_stats_file |
Hospital | Quads: 17,744,405 URLs: 178,173 Hosts: 2,489 | http://schema.org/PostalAddress (408,831) http://schema.org/Hospital (341,547) https://schema.org/MedicalProcedure (265,300) http://schema.org/GeoCoordinates (230,028) http://schema.org/ListItem (193,701) | 238.78 MB (2) | Hospital (sample) | lookup_file pld_stats_file |
Hotel | Quads: 244,120,606 URLs: 1,961,745 Hosts: 24,641 | http://schema.org/ImageObject (12,124,449) http://schema.org/Hotel (4,413,153) http://schema.org/PostalAddress (4,118,968) http://schema.org/ListItem (4,004,110) http://schema.org/AggregateRating (2,332,147) | 3.6 GB (19) | Hotel (sample) | lookup_file pld_stats_file |
JobPosting | Quads: 175,208,843 URLs: 3,606,167 Hosts: 63,320 | http://schema.org/PostalAddress (6,754,011) http://schema.org/Place (6,689,046) http://schema.org/Organization (4,452,042) http://schema.org/JobPosting (4,068,469) http://schema.org/ListItem (2,519,111) | 7.09 GB (14) | JobPosting (sample) | lookup_file pld_stats_file |
LakeBodyOfWater | Quads: 35,276 URLs: 689 Hosts: 100 | http://schema.org/ImageObject (1,060) http://schema.org/Organization (765) http://schema.org/WebPage (687) http://schema.org/LakeBodyOfWater (681) http://schema.org/Person (562) | 1.72 MB (1) | LakeBodyOfWater (sample) | lookup_file pld_stats_file |
LandmarksOrHistoricalBuildings | Quads: 3,005,491 URLs: 33,102 Hosts: 460 | http://schema.org/ImageObject (112,997) http://schema.org/LandmarksOrHistoricalBuildings (95,368) http://schema.org/PostalAddress (64,910) http://schema.org/CreativeWork (50,724) http://schema.org/OpeningHoursSpecification (49,374) | 58.71 MB (1) | LandmarksOrHistoricalBuildings (sample) | lookup_file pld_stats_file |
Language | Quads: 586,554,652 URLs: 4,742,134 Hosts: 11,556 | http://schema.org/Person (25,797,772) http://schema.org/Comment (19,971,596) http://schema.org/ListItem (10,307,155) http://schema.org/Language (9,360,775) http://schema.org/InteractionCounter (7,608,122) | 10.42 GB (45) | Language (sample) | lookup_file pld_stats_file |
Library | Quads: 7,343,688 URLs: 206,299 Hosts: 938 | http://schema.org/Library (220,963) http://schema.org/Place (115,805) http://schema.org/CreativeWork (108,818) http://schema.org/ListItem (95,132) http://schema.org/PostalAddress (90,187) | 117.54 MB (1) | Library (sample) | lookup_file pld_stats_file |
LocalBusiness | Quads: 2,246,054,619 URLs: 27,185,774 Hosts: 1,456,656 | http://schema.org/ListItem (68,904,089) http://schema.org/LocalBusiness (42,251,508) http://schema.org/PostalAddress (39,581,828) http://schema.org/ImageObject (16,997,007) http://schema.org/Offer (16,958,951) | 29.72 GB (175) | LocalBusiness (sample) | lookup_file pld_stats_file |
Mountain | Quads: 232,960 URLs: 11,296 Hosts: 63 | http://schema.org/Mountain (20,970) http://schema.org/GeoCoordinates (13,074) http://schema.org/propertyValue (5,749) http://schema.org/ListItem (1,101) http://schema.org/Place (712) | 5.23 MB (1) | Mountain (sample) | lookup_file pld_stats_file |
Movie | Quads: 150,239,669 URLs: 1,849,268 Hosts: 8,969 | http://schema.org/Person (9,033,938) http://schema.org/Movie (3,785,906) http://schema.org/ListItem (2,092,367) http://schema.org/AggregateRating (1,498,557) http://schema.org/Place (1,232,216) | 2.14 GB (12) | Movie (sample) | lookup_file pld_stats_file |
Museum | Quads: 5,066,224 URLs: 81,583 Hosts: 653 | http://schema.org/PostalAddress (108,572) http://schema.org/ListItem (81,923) http://schema.org/Museum (81,129) http://schema.org/ImageObject (72,825) http://schema.org/OpeningHoursSpecification (63,146) | 68.46 MB (1) | Museum (sample) | lookup_file pld_stats_file |
MusicAlbum | Quads: 81,155,565 URLs: 582,473 Hosts: 2,813 | http://schema.org/Country (6,016,664) http://schema.org/Offer (2,290,151) http://schema.org/MusicRecording (2,229,386) http://schema.org/MusicAlbum (1,964,947) http://schema.org/MusicGroup (1,252,222) | 832.36 MB (7) | MusicAlbum (sample) | lookup_file pld_stats_file |
MusicRecording | Quads: 115,966,463 URLs: 879,827 Hosts: 5,315 | http://schema.org/MusicRecording (6,362,571) http://schema.org/Country (4,576,815) http://schema.org/Offer (2,499,676) http://schema.org/MusicAlbum (1,372,455) https://schema.org/MusicRecording (1,358,158) | 1.19 GB (9) | MusicRecording (sample) | lookup_file pld_stats_file |
Organization | Quads: 40,064,384,727 URLs: 612,884,806 Hosts: 8,025,176 | http://schema.org/ListItem (1,116,154,472) http://schema.org/ImageObject (837,492,697) http://schema.org/Organization (825,432,236) http://schema.org/Offer (451,026,750) http://schema.org/BreadcrumbList (390,105,797) | 639.43 GB (3103) | Organization (sample) | lookup_file pld_stats_file |
Painting | Quads: 10,557,884 URLs: 62,182 Hosts: 530 | http://schema.org/Person (2,199,905) http://schema.org/Offer (478,440) http://schema.org/Painting (264,239) http://schema.org/Product (154,817) http://schema.org/ListItem (90,303) | 88.0 MB (1) | Painting (sample) | lookup_file pld_stats_file |
Park | Quads: 645,311 URLs: 8,017 Hosts: 337 | http://schema.org/PostalAddress (25,330) http://schema.org/Organization (15,538) http://schema.org/Park (8,573) http://schema.org/ListItem (7,464) http://schema.org/GeoCoordinates (7,252) | 9.85 MB (1) | Park (sample) | lookup_file pld_stats_file |
Person | Quads: 25,756,876,296 URLs: 332,386,290 Hosts: 5,567,720 | http://schema.org/ImageObject (603,867,548) http://schema.org/Person (553,148,240) http://schema.org/ListItem (552,470,158) http://schema.org/Organization (273,894,286) http://schema.org/WebPage (271,623,678) | 486.43 GB (1995) | Person (sample) | lookup_file pld_stats_file |
Place | Quads: 3,314,731,430 URLs: 26,959,880 Hosts: 536,281 | http://schema.org/Place (84,443,010) http://schema.org/ListItem (69,601,113) http://schema.org/PostalAddress (68,406,696) http://schema.org/Event (51,435,095) http://schema.org/Person (34,851,209) | 46.97 GB (257) | Place (sample) | lookup_file pld_stats_file |
Product | Quads: 21,541,073,999 URLs: 279,730,608 Hosts: 3,309,246 | http://schema.org/Offer (749,412,376) http://schema.org/ListItem (500,275,491) http://schema.org/Product (492,109,637) http://schema.org/Organization (279,070,322) http://schema.org/ImageObject (153,495,226) | 315.19 GB (1668) | Product (sample) | lookup_file pld_stats_file |
QAPage | Quads: 150,398,856 URLs: 2,328,621 Hosts: 11,113 | http://schema.org/Person (8,306,375) http://schema.org/Answer (6,535,032) http://schema.org/ListItem (2,161,088) http://schema.org/Question (2,116,945) http://schema.org/QAPage (2,000,032) | 3.16 GB (12) | QAPage (sample) | lookup_file pld_stats_file |
Question | Quads: 1,632,265,643 URLs: 15,017,687 Hosts: 418,463 | http://schema.org/Answer (59,458,768) http://schema.org/Question (52,840,194) http://schema.org/ListItem (32,375,177) https://schema.org/Answer (21,594,163) http://schema.org/ImageObject (21,100,418) | 29.97 GB (127) | Question (sample) | lookup_file pld_stats_file |
RadioStation | Quads: 11,700,578 URLs: 236,879 Hosts: 862 | http://schema.org/ListItem (318,064) http://schema.org/RadioStation (285,623) http://schema.org/NewsArticle (201,603) http://schema.org/ImageObject (161,884) http://schema.org/WPSideBar (123,784) | 197.45 MB (1) | RadioStation (sample) | lookup_file pld_stats_file |
Recipe | Quads: 258,355,715 URLs: 2,746,673 Hosts: 37,305 | http://schema.org/HowToStep (8,610,681) http://schema.org/ListItem (5,355,061) http://schema.org/ImageObject (3,430,769) http://schema.org/Person (3,051,928) http://schema.org/Recipe (2,922,483) | 4.43 GB (20) | Recipe (sample) | lookup_file pld_stats_file |
Restaurant | Quads: 158,668,167 URLs: 1,186,921 Hosts: 84,257 | http://schema.org/Offer (6,208,726) http://schema.org/MenuItem (3,963,413) http://schema.org/Restaurant (2,969,814) http://schema.org/Product (2,780,583) http://schema.org/ListItem (2,372,384) | 1.79 GB (13) | Restaurant (sample) | lookup_file pld_stats_file |
RiverBodyOfWater | Quads: 170,020 URLs: 1,418 Hosts: 25 | https://schema.org/Canal (16,992) https://schema.org/Service (5,580) http://schema.org/ImageObject (2,198) http://schema.org/ListItem (2,022) http://schema.org/TouristDestination (1,746) | 2.85 MB (1) | RiverBodyOfWater (sample) | lookup_file pld_stats_file |
School | Quads: 10,072,237 URLs: 187,096 Hosts: 2,099 | http://schema.org/School (291,503) http://schema.org/ListItem (194,016) http://schema.org/PostalAddress (180,528) http://schema.org/Organization (106,718) http://schema.org/ImageObject (95,256) | 163.6 MB (1) | School (sample) | lookup_file pld_stats_file |
SearchAction | Quads: 27,878,243,924 URLs: 417,722,788 Hosts: 6,756,347 | http://schema.org/ListItem (1,052,354,014) http://schema.org/ImageObject (653,530,325) http://schema.org/WebSite (433,191,623) http://schema.org/SearchAction (422,554,600) http://schema.org/BreadcrumbList (408,756,058) | 349.64 GB (2160) | SearchAction (sample) | lookup_file pld_stats_file |
ShoppingCenter | Quads: 15,255,183 URLs: 135,249 Hosts: 1,345 | http://schema.org/Offer (363,660) http://schema.org/ListItem (251,172) http://schema.org/PostalAddress (249,166) http://schema.org/Organization (238,757) http://schema.org/ShoppingCenter (180,908) | 209.82 MB (2) | ShoppingCenter (sample) | lookup_file pld_stats_file |
SkiResort | Quads: 1,173,165 URLs: 28,128 Hosts: 245 | http://schema.org/ListItem (42,596) http://schema.org/SkiResort (38,305) http://schema.org/PostalAddress (24,781) http://schema.org/Person (21,854) http://schema.org/Review (21,440) | 24.25 MB (1) | SkiResort (sample) | lookup_file pld_stats_file |
SportsEvent | Quads: 118,761,252 URLs: 801,134 Hosts: 7,213 | http://schema.org/SportsTeam (6,022,913) http://schema.org/SportsEvent (5,824,189) http://schema.org/Place (5,054,962) http://schema.org/PostalAddress (4,570,869) http://schema.org/Organization (1,017,320) | 1.04 GB (10) | SportsEvent (sample) | lookup_file pld_stats_file |
SportsTeam | Quads: 99,708,850 URLs: 754,133 Hosts: 4,063 | http://schema.org/SportsTeam (7,166,902) http://schema.org/SportsEvent (2,995,861) http://schema.org/Place (2,388,090) http://schema.org/PostalAddress (2,094,768) http://schema.org/Person (1,310,046) | 953.32 MB (8) | SportsTeam (sample) | lookup_file pld_stats_file |
StadiumOrArena | Quads: 14,432,465 URLs: 57,179 Hosts: 256 | http://schema.org/SportsTeam (937,973) http://schema.org/StadiumOrArena (322,770) http://schema.org/SportsEvent (247,964) http://schema.org/SportsMatchCompetitor (247,784) http://schema.org/Organization (231,215) | 123.21 MB (2) | StadiumOrArena (sample) | lookup_file pld_stats_file |
TVEpisode | Quads: 29,570,994 URLs: 220,868 Hosts: 1,065 | http://schema.org/Country (3,253,439) http://schema.org/TVEpisode (974,891) http://schema.org/Person (505,805) https://schema.org/TVEpisode (300,012) http://schema.org/TVSeries (213,857) | 306.01 MB (3) | TVEpisode (sample) | lookup_file pld_stats_file |
TelevisionStation | Quads: 1,927,396 URLs: 22,721 Hosts: 89 | http://schema.org/ListItem (44,898) http://schema.org/ImageObject (41,683) http://schema.org/TelevisionStation (39,377) http://schema.org/Person (26,370) http://schema.org/WebPage (24,917) | 29.75 MB (1) | TelevisionStation (sample) | lookup_file pld_stats_file |
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
We provide the extracted data for download using a variation of the N-Quads format. For users who prefer other formats, we provide code for converting the download files into CSV and JSON formats, which are supported by a wide range of spreadsheet applications, relational databases, and data mining frameworks such as the Python data analysis library pandas. Further details on how to convert the download files to other formats can be found on the main page.
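To give a feel for what such a conversion involves, here is a minimal sketch that groups already-parsed quads into one JSON record per entity. It is not the conversion code referred to above; the function name `quads_to_records`, the record layout, and the sample quads are assumptions made for illustration. Records are keyed by page and node together, on the assumption that blank node labels may not be unique across pages.

```python
import json
from collections import defaultdict

def quads_to_records(quads):
    """Group (subject, predicate, object, page_url) tuples into one dict per entity."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj, page in quads:
        # Collect every value of a property, since properties can repeat.
        grouped[(page, subject)][predicate].append(obj)
    records = []
    for (page, subject), properties in grouped.items():
        record = {"page": page, "node": subject}
        record.update(properties)
        records.append(record)
    return records

# Hypothetical quads, already split into their four elements.
quads = [
    ("_:n1", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/Product", "https://www.example.com/p/1"),
    ("_:n1", "http://schema.org/name", "Example product", "https://www.example.com/p/1"),
    ("_:n2", "http://schema.org/price", "9.99", "https://www.example.com/p/1"),
]
records = quads_to_records(quads)
print(json.dumps(records, indent=2))
```

A list of flat dictionaries like this can be written out as JSON lines or passed to a tabular tool, though nested entities would still need to be joined via their node identifiers.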
The Jupyter notebooks used to create the Schema.org subsets from the Microdata and JSON-LD corpora can be checked out from our Git repository.
The extraction of December 2024 was done with version 1.5 of the extractor. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.