Who am I?

I am Silvio Peroni, a Ph.D. student at the University of Bologna. I graduated with honors in Computer Science in 2008 with a master's thesis titled "Automatic conversion of documents: a model and an implementation", and I am currently working with Prof. Fabio Vitali on topics related to markup languages and the Semantic Web.

  1. Current research
  2. Past studies
  3. Work experience

1. Current research

I am working on complex documents, i.e., documents that describe complex overlapping scenarios and multiple hierarchies. In my current work I am developing EARMARK, a meta-syntax for non-embedded markup that can be used for stand-off annotation of textual content with fully W3C-compliant technologies.

EARMARK is based on an ontologically precise definition of markup. It instantiates the markup of a text document as an independent OWL document, separate from the text strings it annotates. Through appropriate OWL and SWRL characterizations, it can define structures such as trees or graphs (i.e., r-GODDAGs and full GODDAGs, the two main structures used to represent markup hierarchies). It can also be used to generate validity constraints, including co-constraints that most validation languages currently cannot express, and to verify adherence to content-model patterns, such as the structural patterns introduced in my master's thesis.
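
To make the stand-off idea concrete, here is a minimal Python sketch loosely inspired by EARMARK's separation of text and markup. It is not EARMARK itself, which is an OWL ontology rather than a set of Python classes. The `Range` and `Element` classes and the `overlapping_pairs` helper are hypothetical. Each markup item points at a character range of an untouched text string, so items may overlap freely. The pairs that partially overlap are exactly those a single embedded tree could not hold.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Range:
    """A half-open character span [start, end) over an untouched text string."""
    start: int
    end: int


@dataclass(frozen=True)
class Element:
    """A markup item that refers to a range instead of being embedded in the text."""
    name: str
    span: Range


def partially_overlap(a: Range, b: Range) -> bool:
    """True if a and b share characters but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def overlapping_pairs(elements):
    """Names of element pairs that could not be nested in a single tree."""
    return [(x.name, y.name) for x, y in combinations(elements, 2)
            if partially_overlap(x.span, y.span)]


text = "some text here"          # the text itself is never modified by the markup
markup = [
    Element("p", Range(0, 14)),  # "some text here"
    Element("b", Range(0, 9)),   # "some text"
    Element("i", Range(5, 14)),  # "text here": overlaps "b"
]
print([text[e.span.start:e.span.end] for e in markup])
print(overlapping_pairs(markup))  # [('b', 'i')]
```

Here "p" contains both "b" and "i", but "b" and "i" cross each other. This is the kind of structure that needs a graph such as a GODDAG rather than a tree.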

EARMARK documents are OWL documents that can be expressed as RDF assertions. Because they rely only on plain, standard W3C technologies, a number of Semantic Web tools can be used to generate, convert, query and display them. Particularly relevant here is the conversion of EARMARK annotations into traditional embedded languages such as XML or TexMECS. Of course, not all EARMARK assertions can be transformed directly into XML markup: the assertions form a directed graph, whereas XML can only express trees. Any subset of an EARMARK document that the destination syntax can express (e.g., any of the possible tree substructures) can be generated immediately. The remaining assertions must either be left out or forced into the embedded syntax using one of several well-known or newly introduced syntactic tricks.
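
The tree-extraction step can be sketched as follows. The sketch assumes that elements are non-empty character ranges over the text, listed in priority order. The greedy `tree_subset` selection and the `to_xml` serializer are hypothetical illustrations, not the EARMARK conversion tools. Elements that conflict with already-kept ones are simply left out rather than embedded with a syntactic trick.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Element:
    """A stand-off element over text[start:end]; assumes start < end."""
    name: str
    start: int
    end: int


def nests(a: Element, b: Element) -> bool:
    """True if a and b can coexist in one tree: disjoint, or one contains the other."""
    disjoint = a.end <= b.start or b.end <= a.start
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return disjoint or a_in_b or b_in_a


def tree_subset(elements):
    """Greedily keep elements (in priority order) compatible with all those kept so far."""
    kept, dropped = [], []
    for e in elements:
        (kept if all(nests(e, k) for k in kept) else dropped).append(e)
    return kept, dropped


def to_xml(text: str, elements) -> str:
    """Embed tree-compatible stand-off elements into the text as XML tags."""
    ordered = sorted(elements, key=lambda e: (e.start, -e.end))  # outer elements first
    events = []
    for i, e in enumerate(ordered):
        # At equal offsets: closing tags before opening ones, outer elements open
        # first, inner ones close first (the index breaks ties for identical ranges).
        events.append(((e.start, 1, -e.end, i), f"<{e.name}>"))
        events.append(((e.end, 0, -e.start, -i), f"</{e.name}>"))
    events.sort(key=lambda ev: ev[0])
    out, pos = [], 0
    for (offset, *_), tag in events:
        out.append(escape(text[pos:offset]))
        out.append(tag)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


text = "some text here"
elements = [Element("p", 0, 14), Element("b", 0, 9), Element("i", 5, 14)]
kept, dropped = tree_subset(elements)
print(to_xml(text, kept))         # <p><b>some text</b> here</p>
print([e.name for e in dropped])  # ['i']
```

A different priority order would yield a different tree, for instance one that keeps "i" and drops "b". Which substructure to favor is a design choice of the converter.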

Besides describing complex documents, EARMARK offers another important advantage: the possibility of validating higher-level properties of a document. For example, we built an ontology O that describes structural patterns and their relationships (see Di Iorio's Ph.D. thesis for more details). The goal is to determine whether an EARMARK document D conforms to the properties of these structural patterns. This can be verified simply by running a reasoner that checks the consistency of the ontology instances of D with respect to O.
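
As a rough analogy only, the check below replaces the OWL reasoner with a plain Python consistency test. A hypothetical table, `ALLOWED_CHILDREN`, stands in for ontology O, and each element of D is tagged with a hypothetical pattern name. Containment is given as (parent, child) pairs, so an element may have several parents, as in EARMARK. A real setup would instead express these constraints in OWL and let the reasoner detect the inconsistency.

```python
# Hypothetical, simplified stand-in for ontology O:
# for each pattern, the patterns its direct children are allowed to have.
ALLOWED_CHILDREN = {
    "Container": {"Container", "Block"},
    "Block": {"Inline", "Atom"},
    "Inline": {"Inline", "Atom"},
    "Atom": set(),  # atoms contain text only
}


def violations(patterns, contains):
    """Return the (parent, child) containment pairs that contradict ALLOWED_CHILDREN.

    patterns: element id -> pattern name assigned in document D
    contains: list of (parent id, child id); an element may have several parents
    """
    return [(parent, child) for parent, child in contains
            if patterns[child] not in ALLOWED_CHILDREN[patterns[parent]]]


patterns = {"body": "Container", "p1": "Block", "em1": "Inline", "p2": "Block"}
contains = [("body", "p1"), ("p1", "em1"), ("em1", "p2")]
print(violations(patterns, contains))  # [('em1', 'p2')]: a Block inside an Inline
```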

2. Past studies

My master's thesis combined theoretical studies on recognizing the roles of document parts with the technical development of tools to deal with them. On the theoretical side, my contribution aims to improve a number of well-studied general models in two areas: the underlying organization of documents, and structural patterns for elements in a markup language.

First, I worked on the underlying organization of documents by building on the Pentaformat model proposed by Di Iorio in his Ph.D. thesis. This model concerns the recognition of the roles that the elements of any document in a markup representation can play. The goal is to partition a document into five constituents (or dimensions): content, structure, presentation, metadata and behavior.

I therefore proposed a rule-based mechanism that partitions XML documents according to this five-dimensional model. The partitioned documents can then be converted automatically into new documents by reusing or modifying one or more of the five constituents separately.
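
Here is a minimal sketch of such a rule-based partition, using Python's standard `xml.etree.ElementTree`. The rules in `RULES` decide which element names or attributes count as metadata, behavior or presentation. They are invented for illustration and are not the rule set from my thesis. Unmatched elements default to structure, and text counts as content unless it sits inside a metadata or behavior element.

```python
import xml.etree.ElementTree as ET

DIMENSIONS = ("content", "structure", "presentation", "metadata", "behavior")

# Hypothetical rules, tried in order: (test on an element, dimension it belongs to).
RULES = [
    (lambda el: el.tag in {"meta", "author", "date"}, "metadata"),
    (lambda el: el.tag in {"script", "button"}
        or any(name.startswith("on") for name in el.attrib), "behavior"),
    (lambda el: el.tag in {"b", "i", "font", "center"} or "style" in el.attrib,
        "presentation"),
]


def classify(el):
    """Dimension of an element: the first matching rule, otherwise structure."""
    for test, dimension in RULES:
        if test(el):
            return dimension
    return "structure"


def partition(el, dims=None, in_content=True):
    """Assign each element tag to a dimension and collect text as content."""
    if dims is None:
        dims = {d: [] for d in DIMENSIONS}
    dim = classify(el)
    dims[dim].append(el.tag)
    # Text inside metadata or behavior elements is not treated as document content.
    inner = in_content and dim not in ("metadata", "behavior")
    if inner and el.text and el.text.strip():
        dims["content"].append(el.text.strip())
    for child in el:
        partition(child, dims, inner)
        # A child's tail text belongs to this element, so it follows this element's flag.
        if inner and child.tail and child.tail.strip():
            dims["content"].append(child.tail.strip())
    return dims


doc = ET.fromstring(
    "<doc><meta><author>S. P.</author></meta>"
    "<section><title>Intro</title><p>Some <b>bold</b> text.</p>"
    "<button onclick='go()'>Go</button></section></doc>"
)
for dimension, items in partition(doc).items():
    print(dimension, items)
```

A conversion could then, for example, drop or replace everything listed under presentation while leaving the other four dimensions untouched.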

I also studied patterns in document structures, as an exercise in Pentaformat conversions that act on the structure dimension. My contribution here was to study real and frequent non-patterned scenarios and to develop (in Java and XSLT) an engine that converts a non-patterned XML document into the corresponding correctly patterned document. The conversion fully preserves the document's content, presentation, metadata and behavior.
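
The following sketch covers just one hypothetical non-patterned scenario: a container that mixes loose text with block elements. It wraps each run of loose text and inline elements in a new wrapper element, leaving the text and inline markup unchanged. The set `BLOCKS` and the wrapper name `p` are assumptions. The code is a Python illustration, not the Java/XSLT engine itself.

```python
import xml.etree.ElementTree as ET

BLOCKS = {"p", "section", "list"}  # hypothetical block-level element names


def _append_text(parent, text):
    """Append text after the last child of parent (or as its leading text)."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def wrap_loose_text(el, wrapper="p"):
    """Wrap runs of loose text and inline elements that sit next to blocks."""
    for child in list(el):
        wrap_loose_text(child, wrapper)
    children = list(el)
    if not any(c.tag in BLOCKS for c in children):
        return el  # no block children: mixed text and inline markup is fine here

    result, run = [], None  # run: the wrapper currently collecting loose material

    def current_run():
        nonlocal run
        if run is None:
            run = ET.Element(wrapper)
            result.append(run)
        return run

    if el.text and el.text.strip():
        _append_text(current_run(), el.text)
    el.text = None
    for c in children:
        tail, c.tail = c.tail, None
        if c.tag in BLOCKS:
            run = None               # a block closes the current run
            result.append(c)
        else:
            current_run().append(c)  # an inline element joins the run
        # Blank tails only matter inside a run (e.g. spaces between inline elements).
        if tail and (tail.strip() or run is not None):
            _append_text(current_run(), tail)
    el[:] = result
    return el


doc = ET.fromstring(
    "<section>Intro text <b>here</b>.<p>A paragraph.</p>Loose tail</section>")
print(ET.tostring(wrap_loose_text(doc), encoding="unicode"))
# <section><p>Intro text <b>here</b>.</p><p>A paragraph.</p><p>Loose tail</p></section>
```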

3. Work experience

I spent eight months (six as an intern, two as a consultant) at the Knowledge Media Institute (KMi) of the Open University in Milton Keynes, where I had the pleasure of working with an exceptional team of Semantic Web experts headed by Enrico Motta.

In the first topic I worked on there, Enrico Motta, Mathieu d'Aquin and I looked for a way to extract a restricted but meaningful snapshot of each ontology, in order to understand which ontologies are most likely to contain the statement being searched for. Starting from four criteria (natural categories, concept density, concept popularity and ontology coverage), we developed an algorithm that identifies the key concepts of an ontology as similarly as possible to the way a human being would. The approach was supported by interesting experimental results.
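
The following is a hedged sketch of how criteria like these could be combined; it is not the algorithm we actually developed. It assumes that each concept already has raw measurements for natural category, density and popularity. These are min-max normalized and combined with invented weights (`WEIGHTS`). Ontology coverage is approximated by a greedy bonus for classes not yet covered (`COVERAGE_WEIGHT`, `covers`). All names and numbers are hypothetical.

```python
# Hypothetical weights for the per-concept criteria (illustration only).
WEIGHTS = {"natural_category": 0.4, "density": 0.3, "popularity": 0.3}
COVERAGE_WEIGHT = 0.5  # hypothetical bonus for covering classes not yet covered


def normalized(raw):
    """Min-max normalize a {concept: value} dict to the range [0, 1]."""
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0
    return {c: (v - lo) / span for c, v in raw.items()}


def combined_scores(measures):
    """measures: criterion -> {concept: raw value}; returns concept -> weighted score."""
    norm = {crit: normalized(measures[crit]) for crit in WEIGHTS}
    concepts = norm["natural_category"]
    return {c: sum(w * norm[crit][c] for crit, w in WEIGHTS.items()) for c in concepts}


def key_concepts(measures, covers, k):
    """Greedily pick k concepts, trading individual score against new coverage.

    covers: concept -> set of classes it stands for (e.g. itself and its subclasses).
    """
    score = combined_scores(measures)
    total = len(set().union(*covers.values()))
    covered, chosen = set(), []
    candidates = set(score)
    while candidates and len(chosen) < k:
        def gain(c):
            return score[c] + COVERAGE_WEIGHT * len(covers[c] - covered) / total
        best = max(sorted(candidates), key=gain)  # sorting makes ties deterministic
        chosen.append(best)
        covered |= covers[best]
        candidates.remove(best)
    return chosen


measures = {
    "natural_category": {"Person": 3, "Student": 1, "Professor": 1, "Course": 2},
    "density": {"Person": 8, "Student": 3, "Professor": 4, "Course": 5},
    "popularity": {"Person": 120, "Student": 40, "Professor": 30, "Course": 60},
}
covers = {"Person": {"Person", "Student", "Professor"}, "Student": {"Student"},
          "Professor": {"Professor"}, "Course": {"Course"}}
print(key_concepts(measures, covers, 2))  # ['Person', 'Course']
```

After "Person" is chosen, its subclasses add no new coverage. "Course" therefore wins the second slot even though all candidates are compared on the same weighted scores.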

The second research topic I worked on at KMi was a computational model for assigning trust values to ontological entities. Other models of trust assume that a user explicitly reviews the entities in an ontology. In this model, by contrast, the user is only asked to express a quality judgment on a specific triple. Trust values are then assigned automatically to the relevant entities in the ontology, i.e., those explicitly or implicitly related to the triple.
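
Here is a minimal sketch of the propagation idea. Its assumptions are mine rather than the model's. A judgment on a triple counts fully for the triple's subject, predicate and object. Entities reached through a hypothetical `related` map (e.g., superclass links) receive the judgment with a weight that halves at every hop. Each entity's trust is the weighted average of all judgments that reach it.

```python
from collections import defaultdict

DECAY = 0.5  # hypothetical: a judgment's weight halves at every hop away from the triple


def assign_trust(judgments, related, decay=DECAY):
    """Turn quality judgments on triples into trust values for entities.

    judgments: list of ((subject, predicate, object), value in [0, 1])
    related:   entity -> entities implicitly related to it (e.g. its superclasses)
    Returns entity -> weighted average of the judgments that reach it.
    """
    totals = defaultdict(float)   # entity -> sum of weight * value
    weights = defaultdict(float)  # entity -> sum of weights
    for triple, value in judgments:
        # Breadth-first search from all entities of the triple at once,
        # recording each reachable entity's hop distance from the triple.
        dist = {e: 0 for e in triple}
        frontier = list(dist)
        while frontier:
            nxt = []
            for e in frontier:
                for r in related.get(e, ()):
                    if r not in dist:
                        dist[r] = dist[e] + 1
                        nxt.append(r)
            frontier = nxt
        for e, d in dist.items():
            w = decay ** d  # entities explicitly in the triple get weight 1
            totals[e] += w * value
            weights[e] += w
    return {e: totals[e] / weights[e] for e in totals}


related = {"Student": ["Person"], "Person": ["Agent"]}  # superclass links
judgments = [
    (("Student", "enrolledIn", "Course"), 0.9),
    (("Person", "teaches", "Course"), 0.3),
]
trust = assign_trust(judgments, related)
print({e: round(t, 2) for e, t in sorted(trust.items())})
```

With a weighted average, an implicitly related entity such as "Agent" inherits judgments without letting distant evidence outweigh direct judgments on the entity itself.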
