This site lets users run full-text queries against Google's C4 dataset. We hope this will help ML practitioners better understand its contents, so that they are aware of the potential biases and issues that may be inherited through its use.

The dataset is released under the terms of ODC-BY. By using it, you are also bound by the Common Crawl Terms of Use in respect of the content contained in the dataset.

You can read more about the supported query syntax here. Each record has two fields, url and text, both of which are searchable. Both fields are indexed using the Standard analyzer, which discards punctuation when it splits text into terms, so you can't search for punctuation.
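
To illustrate why punctuation is not searchable, here is a minimal Python sketch that roughly imitates what a standard analyzer does: it splits text into word-like terms and lowercases them. The function name approximate_standard_analyzer is hypothetical, and the regular expression is only an approximation. The real analyzer follows Unicode text segmentation rules, so it may treat cases such as apostrophes or periods between letters differently.

```python
import re


def approximate_standard_analyzer(text: str) -> list[str]:
    """Rough stand-in for a standard analyzer (an assumption, not the real one).

    Keeps runs of letters, digits, and underscores, and lowercases each run.
    Everything else, including punctuation, is dropped and never indexed.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    # The comma, exclamation mark, and parentheses do not become terms.
    print(approximate_standard_analyzer("Hello, world! (C4)"))
```

Because only the surviving terms are indexed, a query made up purely of punctuation has nothing to match against.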