I have lately refocused my activity on research design and user research, given that it corresponds better to my field of expertise, with an academic background in social sciences, computing and social uses of the internet. Even though I have enjoyed what I have been doing in the past, I have decided to give less priority to projects mostly focused on the technical side, such as tools for interface and service development. I have improved my coding and design abilities, but it’s definitely not what I want to do in the future. I want to create nice, beautiful and pragmatic connected objects that will be useful for our future, respectful of people’s needs and well integrated into our environment.
I will also take some time away from dead-end productivism to keep traveling, researching new topics of interest and writing creatively.
Consequently, I will have no spare time for writing regular posts, given the huge amount of time it takes to research, write, then translate and publish. But I will probably have information to share about my entrepreneurial venture. So if you’re interested in design research methods and social uses of the internet, you can subscribe to my newsletter (in French for now; I will then see if it deserves a translation for international audiences!).
In this last post, I will therefore talk about one of my favorite subjects: ethics and the danger of web scraping, which I consider responsible for the loss of good practices in business and for the impoverishment of quality content.
Web scraping or web crapping?
Web scraping is bullish. A circular economy will never be sustainable with mere copy-pasting. You always have to bring something in exchange for what you take. Otherwise you’ll disappear in the long run, because you’re not able to scale autonomously and you’re not being resilient. You will remain a lost child and never become a fulfilled and accomplished grown-up.
Good things take a long time to grow, and being creative is not about the product you manage to create but about the enrichment you gain along the journey.
Web scraping kills both the message and the messenger. It leads down the path to a necropolis.
While it gives markets quicker access to data, it also brings to the table serious ethical considerations that are rarely envisioned. The capacity to appropriate, in a short amount of time, massive quantities of information about users and UGC (User Generated Content), with no collective purpose or objective, does not make society more knowledgeable, nor does it make real online stories profitable to everyone.
In my view, speeding up the processes for accessing information is not the best way to foster commitment, trust and social creativity with the public. There are many reasons why it’s important to take the time to reflect outside of direct operationalization (even if operationalization might be useful in times of emergency, we’re not in that state of permanent urgency yet either).
First, web scraping is the technical act of extracting and downloading data from any website you can connect to, technically accessible through a single URL. Free extraction tools available online, such as import.io, let you access any public data and store it on your computer in a table or a spreadsheet within a couple of hours. Yet the act is not socially trivial.
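To make the technical act concrete, here is a minimal sketch in Python of what such an extraction boils down to: reading the cells of an HTML table and writing them out as spreadsheet rows. It is only an illustration under stated assumptions. The sample page, the `TableExtractor` class and the helper names are hypothetical, the HTML is an inline string rather than a page downloaded from a URL, and this is not how import.io or any other tool actually works.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a URL.
SAMPLE_HTML = """
<table>
  <tr><th>user</th><th>comment</th></tr>
  <tr><td>alice</td><td>Great post</td></tr>
  <tr><td>bob</td><td>Thanks for sharing</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collects the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

A few dozen lines are enough to turn what people wrote into rows and columns. Nothing in them says why the data is being collected or what will be done with it, and that is precisely the part that is not trivial.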
This kind of technique, which extracts incomplete data with missing links, is approximate and unprofessional. Better to use a full design protocol, which won’t cost you more than investing huge amounts of money in an app that won’t solve your problem. Invest in the brain behind the machines, not in the machine itself.
Second, this technique might cause a lack of fit between interpretations and data, because the question “What is to be done next with that?” is rarely answered. Massive extraction will never give access to knowledge. Knowledge processes result from the investment of the researcher or the team. It is no more complicated than that. You can be a casual soccer player and play with your friends in your backyard, or you can play for the Olympics and train every day.
Many corporations nevertheless see the extraction of data as a way to reduce their costs. By doing so, they deny the limits of an approach in which the stages of the process are reduced, which prevents the personal and collective enlightenment gained through the act of digging meaning out of heterogeneous digital fieldwork.
Even if you are unaware of the problem and extracting data innocently (after all, this is public data), it’s important to have a clear idea about what you want:
Answering the needs of society at large by giving oneself time to contribute meaningfully is the best and most honest way to answer the problem.
Third, innovative design comes from a collaborative culture, so you need to remain genuine, benevolent and disinterested rather than biased by personal interests that will only benefit yourself. By blindly collecting any type of sensitive data, such as private user data, you manipulate your object and produce false models to satisfy some obscure financial plan. You’re not acting for the benefit of society as a whole. Subjects might be hurt when commitment is low, as no one feels accountable for answering public claims and calls for data transparency.
Industry players are tracking user data without any ethical procedures, and they do cause harm. Don’t expect people to engage in cooperation this way; they will rather run away.
In fact, data extraction “in the wild” that is not part of a self-conscious practice is undesirable, as it might damage internet culture forever. Ask for consent when you interact directly for scientific purposes. Don’t presume people’s agreement. Keep yourself accountable by sharing as much as you take. And don’t be one of those spin doctors that everybody fears. By denying ethical methods and procedures, you put your reputation, your legitimacy and your authority as a leader at stake.
Datafication is not everything. Refinement is also a key process. Numbers have their limits. It’s not about the number of people liking or seeing you, it’s about who is watching and listening to you thoroughly and what impact you have with your science.
The main issue we face with web scraping is the risk of analyzing everything, everywhere, in a race against the clock that can later destroy the internet culture of trust and internet science itself, as well as the structure of the web economy.
In a society that is going to experience scarce resources, this is no time to waste precious ones.
This work is made available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.