I am an Associate Professor (UHD) in the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a co-director of the Cultural AI Lab. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains, including Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:

  • HEDGE-IoT: IoT data conversion and enrichment; user-centric and explainable machine learning
  • HAICu: perspective-aware AI to make digital heritage collections more accessible
  • InTaVia: making linked cultural heritage and biographical data usable for end users
  • Pressing Matter: developing data models to support societal reconciliation with the colonial past and its afterlives
  • InterConnect: machine learning on IoT and smart energy knowledge graphs
  • Hybrid Intelligence: Augmenting Human Intellect
  • CARPA: responsible production using crowdsourcing in Africa

For other and older research projects, see the “research” tab.

HEDGE-IoT project kickoff

The Horizon Europe project HEDGE-IoT started in January 2024. The 3.5-year project will build on existing technology to develop a Holistic Approach towards Empowerment of the DiGitalization of the Energy Ecosystem through the adoption of IoT solutions. For VU, this project allows us to continue the research and development initiated in the InterConnect project on data interoperability and explainable machine learning for smart buildings.

Researchers from the User-Centric Data Science group will participate in the project mostly in the context of the Dutch pilot, which will run in Arnhems Buiten, the former testing location of KEMA in the east of the Netherlands. In the pilot, we will collaborate closely with the other Dutch partners: TNO and Arnhems Buiten. At this site, an innovative business park is being realized that has its own power grid architecture, allowing for exchange of data and energy, opening the possibility for various AI-driven services for end-users.

VU will research a) how such data can be made interoperable and enriched with external information and knowledge and b) how such data can be made accessible to services and end-users through data dashboards that include explainable AI.
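As a minimal sketch of what (a) could look like, the snippet below lifts a single raw smart-meter reading into RDF annotated with the SAREF ontology, using rdflib. The device identifier, pilot namespace and reading format are my assumptions, not the actual pilot data model.

```python
# A minimal sketch (not the actual HEDGE-IoT pipeline) of lifting one raw
# IoT reading into SAREF-annotated RDF with rdflib. The EX namespace and
# device/measurement identifiers are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

SAREF = Namespace("https://saref.etsi.org/core/")
EX = Namespace("http://example.org/arnhemsbuiten/")  # hypothetical pilot namespace

def reading_to_rdf(device_id: str, kwh: float, timestamp: str) -> Graph:
    """Lift one smart-meter reading into a SAREF-annotated graph."""
    g = Graph()
    g.bind("saref", SAREF)
    meter = EX[device_id]
    measurement = EX[f"{device_id}-{timestamp.replace(':', '')}"]
    g.add((meter, RDF.type, SAREF.Device))
    g.add((measurement, RDF.type, SAREF.Measurement))
    g.add((measurement, SAREF.measurementMadeBy, meter))
    g.add((measurement, SAREF.hasValue, Literal(kwh, datatype=XSD.float)))
    g.add((measurement, SAREF.hasTimestamp, Literal(timestamp, datatype=XSD.dateTime)))
    return g

print(reading_to_rdf("meter-42", 1.8, "2024-01-15T10:00:00").serialize(format="turtle"))
```

Once readings are in this shape, they can be enriched with external knowledge (weather, building metadata) by simply merging graphs.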

The image above shows the Arnhems Buiten buildings and the energy grid (source: Arnhems Buiten)


SUMAC keynote on Knowledge Graphs for Cultural Heritage and Digital Humanities

I was honored to be invited as a keynote speaker for the 5th edition of the SUMAC 2023 workshop (analySis, Understanding and proMotion of heritAge Contents), held in conjunction with ACM Multimedia in Ottawa, Canada. In the keynote, I sketched how Knowledge Graphs as a technology can be applied to the cultural heritage domain, with examples of opportunities for new types of research in the field of digital humanities, specifically with respect to the analysis and visualisation of such (multi-modal) data.

In the talk, I discussed the promises and challenges of designing, constructing and enriching knowledge graphs for cultural heritage and digital humanities, and how such integrated and multimodal data can be browsed, queried or analysed using state-of-the-art machine learning.

I also addressed the issue of polyvocality, where multiple perspectives on (historical) information are to be represented. Especially in contexts such as that of (post-)colonial heritage, representing multiple voices is crucial.
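As a toy illustration of one way to record such multiple voices (my own sketch, not an example from the keynote), the snippet below uses rdflib named graphs so that each description of an object stays attributable to the perspective it comes from. All names and URIs are hypothetical.

```python
# A toy sketch of polyvocal description: one named graph per voice, so each
# statement remains attributable. All names and URIs are invented.
from rdflib import Dataset, Literal, Namespace

EX = Namespace("http://example.org/heritage/")
ds = Dataset()

museum = ds.graph(EX["museum-view"])        # institutional perspective
community = ds.graph(EX["community-view"])  # source-community perspective

obj = EX["object-123"]
museum.add((obj, EX.description, Literal("Ceremonial staff, acquired in 1910")))
community.add((obj, EX.description, Literal("Ancestral regalia, taken during colonial rule")))

# Each statement can be retrieved together with the voice it belongs to.
for s, p, o, g in ds.quads((obj, None, None, None)):
    print(g, "->", o)
```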

You can find the complete abstract of my talk here and the (compressed) presentation slides below.


Best NIAA project award for VR project

The award for the Best Network Institute Academy Assistant project of this year goes to the project titled “Between Art, Data, and Meaning – How can Virtual Reality expand visitors’ perspectives on cultural objects with colonial background?”. This project was carried out by VU students Isabel Franke and Stefania Conte, supervised by Thilo Hartmann and UCDS researchers Claudia Libbi and myself. A project report and research paper are forthcoming, but you can see the poster below.


HAICu project funded

NWO has awarded funding to the HAICu consortium through the National Research Agenda programme. In the HAICu project, AI researchers, Digital Humanities researchers, heritage professionals and engaged citizens work together on scientific breakthroughs to open, link and analyze large-scale multimodal digital heritage collections in context.

At VU, researchers from the User-Centric Data Science group will investigate how to create compelling narratives as a way to present multiple perspectives in multimodal data, and how to provide transparency regarding the origin of data and the ways in which it was created. These questions will be addressed in collaboration with the Museum of World Cultures, looking at how citizen-contributed descriptions can be combined with AI-generated labels into polyvocal narratives around objects related to the Dutch colonial past in Indonesia.
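To give a first impression of how such origin transparency could be modelled (an assumption on my part, not the project’s agreed data model), the sketch below attaches PROV-O attribution to a citizen-contributed label and an AI-generated label for the same object, using rdflib. All URIs and property names outside PROV-O are hypothetical.

```python
# A minimal sketch, assuming rdflib and PROV-O: two labels for one object,
# each with its origin made explicit. EX terms are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import PROV

EX = Namespace("http://example.org/haicu/")
g = Graph()
g.bind("prov", PROV)
g.bind("ex", EX)

obj = EX["object-1"]
labels = [
    ("label-a", "keris (heirloom dagger)", EX["citizen-42"]),  # citizen-contributed
    ("label-b", "sword", EX["vision-model-v1"]),               # AI-generated
]
for label_id, text, agent in labels:
    label = EX[label_id]
    g.add((label, RDF.type, PROV.Entity))
    g.add((label, EX.describes, obj))
    g.add((label, EX.text, Literal(text)))
    g.add((label, PROV.wasAttributedTo, agent))  # the origin of this description

print(g.serialize(format="turtle"))
```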


A look back at the HHAI2023 Doctoral Consortium

The next generation of Hybrid Human-AI researchers is here! As part of the second International Conference on Hybrid Human-Artificial Intelligence, held in June in Munich, Germany, Amy Loutfi of Örebro University and I organized a doctoral consortium. We put out a Call for Papers asking early- to late-stage PhD candidates working on Hybrid Human-AI research to submit their research proposals. We received 10 submissions, and after a smooth peer-reviewing process we were able to invite 8 participants to the workshop in Munich.

A really nice room for a really nice symposium

The workshop started with a keynote by Wendy Mackay of Inria Paris-Saclay and the Université Paris-Saclay. Wendy is a leading authority on Human-Computer Interaction and the relation of that field to Artificial Intelligence, and she gave a great talk about the importance of being sensitive to both ends of the AI-HCI scale.

Wendy Mackay

Next, the participants presented their research (plans) in 20-minute presentations, with plenty of time for questions and discussion. We were joined by multiple members of the community, who provided interesting comments and discussion items after the talks. Each presenter was paired with another participant, who would lead the discussion following the presentation. All in all, my impression was that this set-up led to a fruitful and pleasant atmosphere for in-depth discussions about the research.

The participants of the Doctoral Consortium (from left to right: Anastasiya Zakreuskaya, Johanna Wolff, Dhivyabharathi Ramasamy, Cosimo Palma, Regina Duarte, Victor de Boer, Wendy Mackay, Azade Farshad, Amir Homayounirad, and Nicole Orzan).

Below you can find some pictures of the day. The entire programme, including (most of) the papers, can be found on the HHAI conference web page. The papers are published by IOS Press in the proceedings of the conference: Augmenting Human Intellect.

On behalf of Amy as well: thank you Azade Farshad, Johanna Wolff, Regina Duarte, Amir Homayounirad, Anastasiya Zakreuskaya, Nicole Orzan, Dhivyabharathi Ramasamy, Cosimo Palma and Wendy Mackay for making the DC work. Thanks as well to the wonderful organization team of HHAI2023 for making everything run so smoothly!


DHBenelux2023 trip report

Two weeks ago, I visited the 2023 edition of the Digital Humanities Benelux conference in Brussels. It turned out this was the 10th anniversary edition, which goes to show that the Luxembourgish, Belgian and Dutch DH community is alive and kicking! This year’s gathering at the Royal Library of Belgium brought together humanities and computer science researchers and practitioners from the Benelux and beyond. Participants got to see interesting tools, datasets and use cases, all the while critically assessing issues around perspective, representation and bias in each.

On the workshop day, I attended part of a tutorial organized by people from Göttingen University on the use of Linked Data for historical data. They presented an OpenRefine- and Wikidata-centric pipeline, also including the batch Wikidata editing tool QuickStatements (https://quickstatements.toolforge.org/).

The second half of that day I attended a workshop on the Kiara tool, presented by the people behind the Dharpa project. The basic premise of the tool makes a lot of sense: while many DH people use Python notebooks, it is not always clear which operations specific blocks of code map to. Reusing other people’s code becomes difficult, and reusing existing data transformation code is not trivial. Kiara’s solution is an environment in which pre-defined, well-documented modules are made available, so that users can easily find, select and combine modules for data transformation. For any DH infrastructure, one has to make decisions about how much flexibility to offer users. My hunch is that such a limited set of operations will not be enough for arbitrary DH data science pipelines and that the full flexibility provided by Python notebooks will be needed. Nevertheless, we have to keep thinking about how infrastructures can support pipeline transparency and reusability while also catering to less digitally literate users.
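To make that premise concrete, here is a generic sketch of the module-registry idea in plain Python. It is emphatically not Kiara’s actual API, just an illustration of pre-defined, documented transformation steps that can be discovered and chained by name instead of living in anonymous notebook cells.

```python
# A generic sketch of documented, reusable transformation modules (not Kiara's
# actual API): each module is registered under a name with a docstring, and
# pipelines are just ordered lists of module names.
from typing import Callable

REGISTRY: dict[str, Callable] = {}

def module(name: str, doc: str):
    """Register a documented, reusable data-transformation step."""
    def wrap(fn):
        fn.__doc__ = doc
        REGISTRY[name] = fn
        return fn
    return wrap

@module("drop_empty", "Remove empty strings from a list.")
def drop_empty(rows):
    return [r for r in rows if r.strip()]

@module("lowercase", "Normalize a list of strings to lower case.")
def lowercase(rows):
    return [r.lower() for r in rows]

def run_pipeline(steps, data):
    """Apply named modules in order; the step names document the pipeline."""
    for step in steps:
        data = REGISTRY[step](data)
    return data

print(run_pipeline(["drop_empty", "lowercase"], ["Amsterdam", "", "BRUSSELS"]))
```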

On the first day of the main conference, Roeland Ordelman presented our own work on the CLARIAH Media Suite: Towards ’Stakeholder Readiness’ in the CLARIAH Media Suite: Future-Proofing an Audio-Visual Research Infrastructure. This talk was preceded by a very interesting talk by Loren Verreyen, who worked with a digital dataset of program guides (I know of similar datasets archived at Beeld en Geluid). Unfortunately, the much-awaited third talk on the Distracted Boyfriend meme was cancelled.

Interesting talks on the first day included a presentation by Paavo Van der Eecken on capturing uncertainty in the manual annotation of images. This work, “Thinking Outside of the Bounding Box: A Reconsideration of the Application of Computational Tools on Uncertain Humanities Data”, and its main premise that disagreement is a valuable signal, are reminiscent of the CrowdTruth approach.

A very nice duo-presentation was given by Daria Kondakova and Jakob Kohler on Messy Myths: Applying Linked Open Data to Study Mythological Narratives. This paper uses the theoretical framework of Zgol to back up the concept of hylemes for analyzing mythological texts. Hylemes are triple-like statements (subject-verb-object) that describe events in a text. In the context of the project, these hylemes were then converted to full-blown Linked Open Data to allow for linking and comparing versions of myths. A research prototype can be found at https://dareiadareia-messy-myths.streamlit.app/.
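As a minimal sketch of this hyleme-to-LOD step (my reconstruction of the idea, not the authors’ code), the snippet below converts subject-verb-object hylemes into RDF triples with rdflib and queries them with SPARQL. The hylemes and namespace are invented for illustration.

```python
# A minimal sketch: hyleme-style (subject, verb, object) statements lifted to
# RDF triples so that versions of a myth can be linked and compared.
# The example hylemes and the EX namespace are invented.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/myths/")
g = Graph()

hylemes = [
    ("Gilgamesh", "travels_to", "Uruk"),
    ("Gilgamesh", "befriends", "Enkidu"),
]
for subj, verb, obj in hylemes:
    g.add((EX[subj], EX[verb], EX[obj]))

# Once in RDF, simple SPARQL can compare event statements across versions.
q = "SELECT ?o WHERE { ?s <http://example.org/myths/befriends> ?o }"
for row in g.query(q):
    print(row.o)
```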

The GLOBALISE project was also present at the conference, with a presentation about the East-Asian shipping vocabulary and a poster.


At the poster session, I had the pleasure of presenting a poster from students of the VU DH minor and their supervisors on a tool to identify and link occupations in biographical descriptions.

VU DH Minor students’ poster

The keynote by Patricia Murrieta-Flores of Lancaster University introduced the concept of Cosmovision with respect to the archiving and enrichment of (colonial) heritage objects from Mesoamerica. This concept of Cosmovision is closely related to our polyvocality aims, and the connection to computer vision is inspiring, if very challenging.

It is great to see that DHBenelux continues to be a very open and engaging community of humanities and computer science people, bringing together datasets, tools, challenges and methods.


Digital Humanities in Practice 22-23

As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their efforts. The students’ work resulted in really impressive projects, showcased in the compilation video below.

In total, nine projects were executed and we list the titles and hosting organisations below.

  • Reception of Dutch films by critics and film fans (UvA-CREATE)
  • Rethinking provenance through networks (Leiden University-Humanities)
  • Gender and Facial Recognition (VU Amsterdam-Computer Science)
  • Impact measurement in VR (UTwente)
  • Locating Press Photos (NIOD)
  • Exploring Music Collections through data stories, exploratory interfaces and innovative applications (Netherlands Institute for Sound and Vision)
  • Predicting news headlines tests: What makes users click (VU-Social Science)
  • Semi-Automatic Refinement of Historic Occupations (VU and IISG)
  • 1000 bombs and grenades (Netwerk Oorlogsbronnen)


Explainable AI using visual Machine Learning

The InterConnect project gathers 50 European entities to develop and demonstrate advanced solutions for connecting and converging digital homes and buildings with the electricity sector. Machine Learning (ML) algorithms play a significant role in the InterConnect project. Most prominent are the services that do some kind of forecasting, such as predicting energy consumption for (smart) devices and households in general. The SAREF ontology allows us to standardize input formats for common ML approaches. Explainability can be increased by selecting algorithms that are inherently interpretable (e.g. decision trees) and by using interactive web environments like Jupyter Notebooks: this gives users a convenient way to follow and visualize the algorithmic procedures step by step, and forms an implementation example for explainable AI.
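The sketch below gives a flavour of such an inherently interpretable forecaster, assuming scikit-learn: a small decision tree predicting next-hour household consumption from the hour of day and the previous hour’s consumption. The toy data and feature choices are my assumptions, not project data.

```python
# A minimal sketch, assuming scikit-learn: an inherently interpretable
# decision tree for energy forecasting. The toy data is invented.
from sklearn.tree import DecisionTreeRegressor, export_text

# features: [hour of day, consumption in the previous hour (kWh)]
X = [[7, 0.4], [8, 0.9], [12, 1.2], [18, 1.5], [19, 2.0], [23, 0.6]]
y = [0.9, 1.1, 1.3, 2.0, 1.8, 0.5]  # consumption in the next hour (kWh)

model = DecisionTreeRegressor(max_depth=2).fit(X, y)

# The learned rules can be printed and inspected, which is what makes
# tree-based models a convenient basis for explainable AI.
print(export_text(model, feature_names=["hour", "prev_kwh"]))
print(model.predict([[18, 1.6]]))
```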

Read more, and watch our live demonstration video on the InterConnect project page.


Simulating creativity in GANs with IoT

[This blog post is based on the Artificial Intelligence MSc thesis project of Fay Beening, supervised by myself and Joost de Boo; more information can be found on Fay’s website]

Recently, generative art has been one of the fields where AI, especially deep learning, has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of its ability to produce realistic artefacts.

But is this “real” art? Is this “real” creativity?

To address this, Fay investigated current theories on art and art education and found that these imply that human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity, but that true creativity also requires real-world experiences and interactions with people and the environment. Therefore, in her thesis Fay proposes to combine the GAN with an Internet of Things (IoT) setup to make it behave more creatively.

Arduino-based prototype (image from Fay’s thesis)

She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype of the design showed a successful implementation of creative behaviour that can react to the environment and gradually change the direction of the generated images.
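Schematically, the idea could look like the sketch below (my simplification, not Fay’s actual implementation): an environmental sensor reading nudges the latent vector fed to a pre-trained GAN generator, so the environment gradually steers the generated images. The sensor, drift rule and generator are placeholders.

```python
# A schematic sketch (not the thesis implementation): a sensor reading
# drifts the GAN latent code over time. `generator` is a placeholder for
# any pre-trained model mapping latent vectors to images.
import numpy as np

LATENT_DIM = 128
rng = np.random.default_rng(0)

def read_sensor() -> float:
    """Placeholder for an Arduino sensor reading, normalized to [0, 1]."""
    return rng.random()

def next_latent(z: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """Drift the latent code in a direction set by the environment."""
    direction = np.full(LATENT_DIM, read_sensor() - 0.5)
    return z + strength * direction

z = rng.standard_normal(LATENT_DIM)
for _ in range(10):          # each step, the environment nudges the image
    z = next_latent(z)
    # image = generator(z)   # hypothetical pre-trained GAN generator
print(z[:5])
```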

Images shown to participants during the level-of-creativity task. Images 2 and 6 are creative GAN-generated images; images 1 and 5 are human-made art; images 3 and 4 are online GAN-generated art.

The generated art was evaluated on its creativity through task-based interviews with domain experts. The results show that the level to which the generated images are considered creative depends heavily on the participant’s view of creativity.


Representing temporal vagueness on the Semantic Web for historical datasets

[This post is based on the Master Information Sciences project of Fabian Witeczek and reuses text from his thesis. The research is part of VU’s effort in the InTaVia project and was co-supervised by Go Sugimoto]

To properly represent temporal data on the Semantic Web, there is a need for an ontology to represent vague or imprecise dates. In the context of his research, Fabian Witeczek developed an ontology that can be used to represent various forms of such vague dates. The engineering process of the ontology started with a requirements analysis, which involved collecting data records containing temporally vague dates from existing Digital Humanities Linked Data sets: BiographyNet and Europeana. The occurrences of vagueness were evaluated, and categories of vagueness were defined.

The categories were evaluated through a survey conducted with domain experts in the digital humanities. The experts were also asked about the problems they encounter when working with temporally vague dates. The survey results confirmed the meaningfulness of the ontology requirements and of the categories of vagueness, which were: 1) unknown deviation, 2) within a time span, 3) before or after a specific date, 4) date options, and 5) complete vagueness.

Visualization of the vague date ontology

Based on the findings, the ontology was designed and implemented, scoped to year granularity only. Lastly, the ontology was tested and evaluated by linking its instances to instances of a historical dataset. This research concludes that the presented vague date ontology offers a clear way to specify how vague dates are and in which regard they are vague. However, the ontology requires considerable effort to put to work in practice for researchers in digital humanities, because precision and deviation values need to be set for every record within the datasets.
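To give an impression of what an instance might look like, the sketch below encodes a “within a time span” vague date in RDF with rdflib. The namespace and property names are illustrative guesses on my part, not the actual terms of Fabian’s ontology.

```python
# A minimal sketch of a vague-date instance, based on the categories described
# above. The VAGUE namespace and its properties are invented for illustration
# and do not reproduce the actual ontology terms.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

VAGUE = Namespace("http://example.org/vague-dates/")  # hypothetical
EX = Namespace("http://example.org/persons/")
g = Graph()
g.bind("vague", VAGUE)

birth = EX["person-7-birth"]
g.add((birth, RDF.type, VAGUE.VagueDate))
g.add((birth, VAGUE.category, Literal("within a time span")))
g.add((birth, VAGUE.earliestYear, Literal(1801, datatype=XSD.gYear)))
g.add((birth, VAGUE.latestYear, Literal(1810, datatype=XSD.gYear)))
# the per-record deviation value that the thesis notes is costly to set
g.add((birth, VAGUE.deviationYears, Literal(2, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```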

Example SPARQL query using concepts from the vague dates ontology

More information can be found in the Master Thesis, linked below.

The ontology itself can be found on Fabian’s GitHub account.
