This site lets users execute full-text queries to search Google's C4 Dataset. Our hope is this will help ML practitioners better understand its contents, so that they're aware of the potential biases and issues that may be inherited through its use.
The dataset is released under the terms of ODC-BY. By using this, you are also bound by the Common Crawl Terms of Use in respect of the content contained in the dataset.
You can read more about the supported query syntax here. Each record has two fields, url and text, both of which are searchable. The fields are indexed using the Standard analyzer, which means you can't search for punctuation.
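To illustrate why punctuation cannot be matched, here is a minimal C++ sketch of roughly what a standard-style analyzer does to a `text` field before indexing. It assumes the Standard analyzer mentioned above is the Lucene/Elasticsearch one (split text into word tokens, then lowercase them) and deliberately simplifies the tokenizer. The function name `approximateStandardTokens` is hypothetical, and it splits on every non-alphanumeric ASCII character, whereas the real tokenizer follows Unicode word-segmentation rules and treats cases such as apostrophes inside words or non-ASCII text differently.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, simplified approximation of a standard-style analyzer:
// split on anything that is not an ASCII letter or digit, then lowercase.
// Assumption: ASCII only. Non-ASCII (e.g. UTF-8) bytes are treated as
// separators here, unlike the real Unicode-aware tokenizer.
std::vector<std::string> approximateStandardTokens(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            // Letters and digits are kept, lowercased.
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            // Punctuation and whitespace end the current token and are dropped.
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

For the input `"Hello, World! C4 rocks."` this sketch produces the terms `hello`, `world`, `c4` and `rocks`. The comma, exclamation mark and period never become terms, so a query made up only of punctuation has nothing in the index to match.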
-
https://brainbaking.com/
Brain Baking: transforming personal thoughts about thoughts into well-digestible material. The reflective aroma of burnt nervous tissue. Includes a crispy crust of relations between technology, philosophy and the world.
-
https://brainbaking.com/about/
Professionally, I am a PhD researcher at the Faculty of Engineering Technology, KU Leuven. Before that, I worked as a software engineer for 11 years, taking on various roles from agile coaching to technical lead. I am not only interested in technical software development, but also in what happens at a non-cognitive, human level when developing software together. I used to be good only at programming because I thought that, as a Computer Scientist, you specialize instead of generalize. But the more I worked with computers, the more my hands itched to do something else. So nowadays I love to go wide and pass on that enthusiasm for knowledge on any level. Test your stuff before writing your code! I’m a heavy Test Driven Development (TDD) fan and I bark at those who don’t. I teach agile software engineering techniques, in both industry and academia. Using pair programming as a tool to learn from each other and improving code quality are two values I firmly believe in. Take a look at my GitHub account or my Curriculum Vitae in Dutch. I hold a professional bread baker’s degree, so naturally I love to think of myself as a real baker. My passion within baking is sourdough bread, and I spread the word by organizing workshops to repopularize its use. I’m a fountain pen addict and avid journaler. This website is the ideal base for writing down my thoughts about virtually anything, primarily intended to amuse myself and not others. I like to integrate Philosophical and Psychological approaches into my research. I’m just starting to learn how to use ink for purposes other than writing. I used to be almost exclusively a fantasy reader. Now I mostly read non-fiction on the most diverse topics, but I do have a soft spot for things like philosophy, art, mindful food and software engineering.
-
https://brainbaking.com/post/healing-creative-scars/
Want to skip to the practical part? Are you aware of your problems? Once upon a time in a land not very far away (in fact, it’s the very same we live in), a small boy and his Gameboy grew up. He didn’t have many hobbies: just gaming. He didn’t have many friends. Playing outside is okay as long as the 4 AA batteries are fully charged, the sun isn’t shining too brightly and the playing is confined to one square meter. As the boy grew up, his taste for music didn’t broaden: it narrowed. Teachers said he couldn’t draw or run or swim properly, so he naturally assumed he couldn’t draw or run or swim properly. Twenty years later, that boy who became a man still has a hard time unthinking that he cannot draw or run or swim. That is called a creative scar and it can be very deadly. If someone ever says to you “haha, you can’t X”, chances are that you’ll tend to believe that person. Effects? Next time you’ll think twice before doing X again. No way someone will laugh at you for doing something silly, right? So forget X, let’s try Y instead. But why should you? Because one or two people who have no idea what X is all about claim to know whether you are good at it or not? Unconsciously incompetent: The individual does not understand or know how to do something and does not necessarily recognize the deficit. There are a lot of similar diagrams, flow charts and summaries out there that try to tell you the same. The Shu, Ha, Ri learning principles of martial arts explained here assume that you are willing to improve, meaning you’d already be in the consciously incompetent stage. Okay, so how do you move from the bottom part of the pyramid all the way to the top? That’s where self-improvement comes into play. Lists. I love lists. Your favorite (Gameboy) game? Top 5 good food! What about movies? Birthday wish lists, etc. Regrets. Too bad I don’t have many friends to play board games with. Wishes. I’d love to run 10km once. Wow, that seems highly unlikely, as I remember someone said I can’t run. 
Still… Would be cool. I’d like to visit Japan someday. Or walk on the Great Wall of China, why not. Brain dumps. A Momentary Lapse of Reason. How I feel at a given moment. Somehow, thanks to lots and lots and lots of writing, I started to become more aware of my own thoughts and feelings. In that time I read a lot of self-help books - 9 out of 10 were junk, but I kept small summaries in my notebook. Somehow, some quotes or key takeaways still lingered in my extended analog memory, ready to be reread and executed. I always liked to write - otherwise I wouldn’t be writing this article right here - but using a pen and ink to write down thoughts felt great. I finally had a way to track what I was doing or should be doing. Of course concepts like journaling aren’t new at all: centuries ago, Roman emperor and philosopher Marcus Aurelius wrote down how he should behave. Great minds like Wittgenstein, John Locke and Seneca also loved writing. The medium was and is a great fit for rapid note taking - whether it’s ideas or feelings doesn’t really matter. As long as you do something with it afterwards. Or not - that’s okay too. So I did not try to reinvent the wheel: I merely looked around and copied what worked for me. That very same thing might not work for you at all - you might be one of those digital whizz kids who like to use Evernote or Google Keep. I’m more of an analog guy: it enables me to sketch, write, make mind maps and paste pictures. But what should you do with your piece of work? My favorite American, Timothy Ferriss, has the answer to that. It’s called “quantify yourself”: don’t just gather data but also normalize it, tune it, change yourself, push yourself, re-gather data and analyze it further. Since the sudden increase in popularity of smart watches, it’s rather easy for anyone to keep track of their health. Steps per day, sleep cycle, calorie intake, you name it and your trusty watch will track it for you. 
There’s even a conference on Quantified Self full of (technology) enthusiasts eager to share their experience with recording and adjusting stuff to make the world a better place. But tracking itself isn’t enough. Tim Ferriss monitors his own body in extreme detail: after taking vitamins, before and after a workout, … Then he alters his daily routine. It’s called modern self-optimization. Tim is all about improving yourself and he even hosts an extremely successful podcast called The Tim Ferriss Show. Subjects such as “Maximizing Strength, Improving Mindset, and Becoming the World’s Fittest Man” are not uncommon on the show; it’s certainly worth checking out. Tim is also a fan of “morning pages”: 5 minutes of writing in your journal before you do anything else. Except for getting out of bed (and maybe meditating) of course. It is possible to become world-class in just about anything in six months or less. Armed with the right framework, you can seemingly perform miracles, whether with Spanish, swimming, or anything in between. That is a powerful way of saying you should have a system that holds those ideas. In GTD, you re-read your captured ideas once or twice a week and decide what to do with them. Did you change your mind? Scratch those notes or throw away the loose page. Did you connect one idea with another, but is it still ripening like a nicely aged bottle of wine? Then connect them and add additional thoughts. But again, the key part is re-reading and re-prioritizing everything. If something you think about involves less than 10 minutes of work, you have the option to execute it right there instead of taking note of it. Those notes in my journal set me on track to identify what I wanted to do the most and helped me actually do it. I’ll list some examples here because I like to list stuff, not because I like to brag about what I did. 
Maybe I also do, but it’s to show you what a person is capable of - if you have the willpower to sit through some emotional and physical pain. My P.E. teacher used to laugh at my pathetic attempts to run. Now I run 10km. The only books I loved to read were fantasy, to escape from this world. Now I read a lot of non-fiction books on any subject. I decided that I should finally learn to draw. After falling in love with sourdough bread, I decided to take a three-year night class to become a professional baker, and did an intensive internship, combined with my full-time job. These examples might sound trivial to you, but they mean the world to me. None of it started as a simple “okay let’s do that” thought: it all came up organically by tracking what I was thinking at a given moment. When I notice I complain a lot about my bad drawing skills, I might - finally - do something about that. Before journaling, I simply had no idea! One of the most influential (on a personal level) books I’ve read is Where Good Ideas Come From by Steven Johnson. One particular chapter struck me as very helpful. Steven talks about the collision of ideas by pairing them up “by accident”. It’s called serendipity. Imagine going to the library with one or two books in mind to search for. When you enter the library, a few books are displayed as ‘new’, and you pick up one of them and start reading the cover. Browsing like that makes you connect the dots and eventually stumble upon new, interesting and exciting stuff. I usually go back home with different books than I intended to and never regret the decision after reading them. The same is true for the bag of ideas that you call your journal. The more ideas that are compressed together on the A5 pages (in my case), the higher the chances that you’ll come up with something new based on those ideas. Again, one clear requirement is re-reading what you’ve written down. 
Some people like order and try to keep separate journals for ideas for the house, things from work and emotional thoughts. I am not one of those; I like my journals to be as organic as possible: let it grow, I say. I write whatever comes to me, and the next day the next thought is written down right after the previous one, one entry below the other. That might not work for you - just try some things and see what you like. Digital tools make tagging and searching a lot easier but require you to keep your cellphone close by. I just happen to like fountain pens a lot more than cellphones. I like cooking. After a few journals, I like cooking and bread baking and fermenting and I’m now a vegetarian. I like drawing. After a few journals, I would like to know more about book binding, calligraphy and turning my own wooden fountain pen. I like learning. Now I like teaching, philosophy and Buddhism. Thanks to a chain of new ideas and things to try out, you get to discover new ideas and things to try out. It’s an adventure, and it never stops. More on learning how to learn in the earlier samurai learning mindset blog post. Chances are you’ve already heard of Christopher Avery’s Responsibility Process ™. I won’t go into detail here, but the model consists of different components that actually match the competency model I mentioned above: there’s denial (you’re not aware of any problems), there’s justify/blame (you’re aware but not willing to do something about it), there’s obligation/shame (you’re doing something about it because you have to) and finally there’s responsibility. Writing down thoughts feels like stepping through that model - at least in the beginning, it does. You’re not sure you’re ready to admit that you might benefit from writing at all. But after your first or second small success you feel obliged to continue. Being in contact with the God inside you, and serving him honestly. 
There are a lot of techniques involved in efficient journaling that I’d like to save for a future blog post to be linked here. (Update: it’s here!) But in practice, it all boils down to just starting to write. Do what works for you. If you decide to buy a journal, consider scanning in the pages after it’s full. I archive everything in Evernote, but that requires manual tagging and soaks up a lot of hours. GTD is a nice way to keep track of things to do - if you’re the kind of person who likes TODOs. I also like pasting pictures, doodling, sketching and even scrap-booking. Whatever works. Travel journals, details of book reviews, summaries of courses you’ve followed, loose notes, gardening schemes, … It’s all there waiting for you to discover. Have fun connecting the dots!
-
https://brainbaking.com/post/development-principles-in-cooking/
A lot of people seem to think I’m the kind of chef who uses loads and loads of ingredients, combining and layering without thinking twice. We were having a discussion about what to cook for dinner this evening. It’s ‘donderdag veggiedag’ (Thursday Veggie Day), an initiative from the Belgian EVA VZW to eat a vegetarian meal each Thursday, and since I’m a vegetarian, it’s generally accepted that I should know a lot of good recipes. That got me thinking. Do I think of myself as a ‘complicated’ chef (sounds cooler than cook)? The answer is: I don’t. I think I used to. When I started cooking for myself 10 years ago, I did my best to live up to my father’s expectations by preparing intricate meals with lots of different steps. But as I had no idea what I was doing, I continually failed at creating something decent. I did not hesitate to create my own Thai curry paste, without even knowing what the required ingredients were. I bought a 2 kilo bag of dried hot peppers from the local Asian market without even knowing whether I would ever use 4 of those peppers. I read and loved cookbooks like Yotam Ottolenghi’s Plenty but couldn’t keep up with the staggering amount of fresh herbs required for each recipe. Years passed. My cooking behaviour changed. I started to get the hang of it. I learned to learn about cooking, the meta-cooking, the ‘deep stuff’. I got into fermenting, creating simple things myself, creating everything myself and knowing what I’m doing and why I’m doing it. I got addicted to food-related books (the why) and stopped reading cookbooks (the how). So, my style evolved, as it usually does when you’re doing something for quite some time with deliberate practice and attention to detail. Today, after that discussion took a weird turn, it actually hit me: you can apply several software development principles to (my) cooking style(s). Instead of throwing every herb and spice you have at something, be more picky. Think about what will go well with that one ingredient. 
Is pepper really needed here? Should you even add cheese? Doesn’t that hide the taste of your main ingredient? Keep it simple. I made something really simple for dinner: tortillas. That would require something nice and spreadable and some topping. I bought some local (another thing to focus on) buffalo mozzarella from the Ardennes and still had a cooked beetroot. BAM. Well, not really. I added a fermented shallot to create a sour tang and - of course - some salt. Because it had to be spreadable and because I happen to love olive oil, that was also added. Three ingredients. (Salt and oil don’t count as ingredients in my kitchen.) That’s it. Okay, I’m lying, fermenting a shallot requires a week of undivided attention. (It actually does not…) I like layering simple things to create something complex, but still, in essence, simple. I made something really simple this morning: a REST API. I made something really simple this evening: a beetroot spread. You Ain’t Gonna Need It (YAGNI) is a variant of KISS. You won’t be needing those nuts in a pesto if you’re using a lot of it in pasta: combined with the hard cheese, you won’t taste them anyway. I’ve found it a lot better to toast the nuts whole and add them to the meal as a finisher, instead of grinding them into the pesto. Leave out the nuts when making a pesto as a pizza base: that’s going to be expensive and you won’t be tasting them. Using a lot of similar-tasting ingredients can ruin the whole meal. Simple examples are something too sour (lemon and too much acidity) or too sweet (strawberries with honey and sugar). Of course, fervent sweet lovers will disagree on this one. But it’s a handy rule when combining ingredients. Don’t add too much vinegar and a lot of lemon. Don’t add loads of pepper combined with chilli. Peter Reinhart’s sacred Quest for the Best pizza led him to believe you should use a very limited number of core ingredients on your pizza, let’s say 4. 
You can include the sauce as one of those ingredients, or not - that’s up to you. But don’t exceed that number or you risk tasting “pizza” in general and not “smoked bell pepper” and “mozzarella”. I used to throw as much stuff on the pizza base as I could - one wouldn’t want to be hungry afterwards, right? The crust couldn’t even bake properly because of all the moisture. I think you can extrapolate any general programming principle to a cooking principle if you’re a bit creative.
-
https://brainbaking.com/post/teaching-oo-with-gba/
C++ and a GBA engine. Let's learn to create a game! Electrical Engineering students have to work through a programming course in their third year at KU Leuven, a course called ‘Software Design in C/C++’. This course is one of the things I inherited from my retired colleague when I started working for the University. As is the case with most programming courses, its contents were boring as hell. So, instead of simply making minor adjustments to the syllabus and calling it a day, in the summer of 2018 I decided to throw everything in the bin and start over - hooray, a greenfield project! This was one of the rare opportunities for me to do so, as most other courses are taught together with others. This one wasn’t: I also had to lecture the theory and administer exams. Coincidentally, in that very same summer, I rediscovered my love for the retro Game Boy, and started wondering how to program games for it. My knowledge of C and C++ was limited to a few years of practical use in the industry, working on administrative Windows MFC applications. I got to work by scanning documentation, cursing and swearing a lot, and ultimately learning how to create games for the Game Boy Advance. Most if not all GBA games are written in C. As much as I like simple elegance, I did opt for C++ instead, because I wanted to bring in object-orientation and unit testing the way I was used to in C# and Java. That also means getting used to the ugliness of C++. Oh well. There is no OS. Nothing is managed. Do it yourself. Hardware interaction is done through memory-mapped I/O. That means reading and writing byte streams to hard-wired pointers. Graphics is another tough nut to crack, using optimized sprites, a shared palette, and other techniques needed because of the limited (hardware) space. All those things are extras: the main point of the course is to learn object-oriented development, close to the hardware. Most students had enough difficulty with the C++ syntax alone. 
They did complete another programming course, software design in Java, but that seemed to be long lost and forgotten, as is usually the case with hard-working students. The syllabus is accessible through the links, although in Dutch. God I love that game. I’ll gladly take every opportunity I have to look at and/or play it. The sprite engine does the heavy lifting in terms of image memory allocation and storage, and provides some abstract concepts for sprites/backgrounds/music/scenes. But not before we’ve seen these in the labs ourselves (lab 1-4). Take a look at the GitHub documentation of the engine to get a better picture of the included features. Unit tested (for the most part) and all. It is cross-platform compatible, but as the GBA is actually an ARM machine, you’ll need a cross-compiler from the DevKitPro toolchain. After the oral defense of their game, students completed a short survey that helped me assess what to do with the course during the next academic year. Most students were very enthusiastic about the inclusion of the GBA, compared to another dull set of assignments. They also responded positively to the question of whether the Game Boy could be used in other courses as well, such as hardware architecture design, or (advanced) chip design (using an FPGA). I’m happy they liked it, and although the theoretical exam results were not that promising, the practical projects were. We will be using more or less the same course contents next year. After academic year 2019 - 2020, the course will sadly be split across two other courses: the C hardware-related part will merge into a more OS-oriented course, and the C++ object-oriented design part will merge into a software engineering-oriented course. Both new courses still need to be planned and implemented. That means there’s still a chance for me to sneak in Castlevania, right?
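The memory-mapped hardware access described in this post ("reading and writing byte streams to hard-wired pointers") can be sketched in a few lines of C++. This is a minimal illustration, not code from the course or its engine: the constant and function names (`kRegDispcntAddr`, `rgb15`, `plotPixel`, `initMode3AndDrawRedPixel`) are hypothetical, and it assumes the commonly documented GBA layout, with the display control register at `0x04000000`, video RAM at `0x06000000`, and bitmap mode 3 (a 240x160 framebuffer of 15-bit BGR colors, shown through background 2).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the addresses follow the commonly documented GBA memory map.
constexpr std::uintptr_t kRegDispcntAddr = 0x04000000;  // display control register (16 bit)
constexpr std::uintptr_t kVramAddr = 0x06000000;        // start of video RAM
constexpr int kScreenWidth = 240;
constexpr int kScreenHeight = 160;

constexpr std::uint16_t kMode3 = 0x0003;      // bitmap mode 3: one 16-bit value per pixel
constexpr std::uint16_t kBg2Enable = 0x0400;  // mode 3 is displayed through background 2

// Pack 5-bit red, green and blue channels into a 15-bit BGR555 color.
constexpr std::uint16_t rgb15(unsigned r, unsigned g, unsigned b) {
    return static_cast<std::uint16_t>((r & 0x1Fu) | ((g & 0x1Fu) << 5) | ((b & 0x1Fu) << 10));
}

// Write one pixel into a mode-3 framebuffer. The pointer is volatile so the
// compiler cannot optimize away stores that the display hardware reads.
inline void plotPixel(volatile std::uint16_t* framebuffer, int x, int y, std::uint16_t color) {
    assert(x >= 0 && x < kScreenWidth && y >= 0 && y < kScreenHeight);
    framebuffer[y * kScreenWidth + x] = color;
}

// Real-hardware entry point: only call this on a GBA or an emulator, since
// these fixed addresses are not mapped in a normal PC process.
inline void initMode3AndDrawRedPixel() {
    auto* dispcnt = reinterpret_cast<volatile std::uint16_t*>(kRegDispcntAddr);
    auto* vram = reinterpret_cast<volatile std::uint16_t*>(kVramAddr);
    *dispcnt = static_cast<std::uint16_t>(kMode3 | kBg2Enable);
    plotPixel(vram, kScreenWidth / 2, kScreenHeight / 2, rgb15(31, 0, 0));
}
```

Only `initMode3AndDrawRedPixel` touches the hard-wired addresses, so it has to run on the GBA or an emulator, built with a cross-compiler such as the one in the DevKitPro toolchain. `rgb15` and `plotPixel` work on ordinary values and pointers, which also keeps them unit-testable on a PC: pass a plain array instead of VRAM.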
-
https://brainbaking.com/post/handing-over-enough-when-inspiring/
Are you handing over enough when inspiring someone? What info to convey, when to stop? The other day, I was having a discussion with a friend and colleague about reaching out to others. He had an idea on combining patterns learned from the enterprise software development world (clean code, TDD, domain driven design, you name it) with patterns learned from the game development world (rapid prototyping, getting stuff done, intensive use of frameworks like Unity). An excellent idea if you ask me. But he was hesitant - others might not be that interested in taking time to write unit tests in their game. That’s called an assumption. So I proposed to write small blog posts in Hugo using Markdown and GitHub Pages. Very lightweight, only one command away from generating concise entries. This might take 10 minutes of your time per week. So he nodded. “Hm-hmm”. And we started talking about something else. Option one: interrupt the conversation and push further. Option two: ignore it for now and take a mental note to do something with it later. Often enough I’ve hit a brick wall with option one. It’s annoying for the other person to be constantly interrupted, and it wouldn’t make me very popular. I do have that nasty habit and I’m working on it. But let’s return to option two for now. I forgot about the discussion and moved on. But taking mental notes is like async AJAX calls: they do return, but you don’t know when. That’s when it hit me: why don’t I email him some details on how to quickly start a blog? Would that be appropriate or not? In any case he’s free to delete the email - no harm done. But if he decides to do something with it, all the better. The step from having no blog and no experience with Hugo to a working blog is smaller when a friendly push in the right direction has been given. One could say “what have you to gain? Why do you care?” And I don’t - I’m not a game developer, but I do care - a bit - about the subject. I do care - a lot - about my friend and his personal growth. Show your work, remember? 
I love to inspire others about things that I’m passionate about myself. And sharing is caring. I would love to receive an email with loose ends like this from someone I talked to. So my decision to throw some info over the wall has been made. I’m eager to find out if the seed has been planted and nurtured. We’ll have to see. I will not be disappointed if it has not - giving while expecting something in return is not giving. I love giving. Altruism must have originated from the egocentric self, some believe. I would not count myself as part of that group. The Japanese can experience a lot of stress when they are given a gift: they have to give something back of matching value, not less (shame) or more. It’s like trying to buy presents for your family at Christmas: a sometimes horrifying experience. You do not know the value of the gift you’ll be given and you’ll have to guess how much to spend. Of course, handing out information - which should be free in the first place - is not the same as handing out something physical. True giving is expecting nothing in return. True giving is wanting nothing in return!