Processing language for multilingual resources #341

Open
gsergiu opened this Issue Aug 9, 2016 · 4 comments


@gsergiu
gsergiu commented Aug 9, 2016 edited

The conclusion of ticket #337 (comment) is that there is an M-to-N relationship between the dc:language of multilingual resources and the text processors that might process the annotation body and/or target.

I therefore propose the following definition of the processing language property:

"This property represents the relationship between the language of the resources (Body or Target) and the text processors or classes of text processors that may process the resources for rendering, indexing or any NLP processing."

  • Consequently, I propose that the verbose representation of this property include <language, processor_class, processor_id> tuples.
    A vocabulary is recommended for processor classes, for example: textual_representation, audio_representation, visual_representation (i.e. image), text_indexing, nlp_processing.
    Example:
processingLanguage: [
  {language: ["en", "fr", "ro"], processor_class: "textual_representation"},
  {language: "en", processor_class: "text_indexing", processor_id: "<snowball_indexer_uri>"},
  {language: "ro", processor_class: "audio_representation", processor_id: "<TTS_RO_uri>"}
]
  • The minified representation could remain compliant with the current specification, meaning that all text processors (of all types) should use the same processing language.
  • There are 2 open questions:

a. Should this property be named “processing”?
b. Should this information be embedded within the annotations (model) or in the protocol (a separate HTTP request)?
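
To make the two representations concrete, here is a minimal Python sketch of how a consumer might read them. It is only an illustration of the proposal, not part of any specification: the names ProcessingEntry, normalize and languages_for are hypothetical, and treating a missing processor_class as "applies to every class" is an assumption based on the description of the minified form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessingEntry:
    # One <language, processor_class, processor_id> tuple from the proposal.
    languages: tuple
    # Assumption: None means the entry applies to every processor class.
    processor_class: Optional[str] = None
    processor_id: Optional[str] = None


def normalize(value):
    """Expand the minified (plain string) or verbose (list of dicts) form."""
    if isinstance(value, str):
        # Minified form: one language shared by all processors.
        return [ProcessingEntry(languages=(value,))]
    entries = []
    for item in value:
        lang = item["language"]
        langs = (lang,) if isinstance(lang, str) else tuple(lang)
        entries.append(
            ProcessingEntry(langs, item.get("processor_class"), item.get("processor_id"))
        )
    return entries


def languages_for(entries, processor_class):
    """Return the languages a processor of the given class should handle."""
    found = []
    for entry in entries:
        if entry.processor_class in (None, processor_class):
            for lang in entry.languages:
                if lang not in found:
                    found.append(lang)
    return found


# The verbose example from this comment, written as plain Python data.
verbose = [
    {"language": ["en", "fr", "ro"], "processor_class": "textual_representation"},
    {"language": "en", "processor_class": "text_indexing",
     "processor_id": "<snowball_indexer_uri>"},
    {"language": "ro", "processor_class": "audio_representation",
     "processor_id": "<TTS_RO_uri>"},
]
```

With this reading, a text indexer would look up languages_for(normalize(verbose), "text_indexing"), while a client that only receives the minified string "en" would get the same language for every processor class.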

@azaroth42
Collaborator

Then we would have to define all of the processing classes and identities for well known processors. That's far far outside of the scope of this working group.

Sorry, but we just can't do that. And especially now during CR. Tagging as V2.

@azaroth42 azaroth42 added this to the v2 milestone Aug 11, 2016
@azaroth42 azaroth42 added the postpone label Aug 11, 2016
@gsergiu
gsergiu commented Aug 12, 2016

Yes, this implies defining processing classes; however, it was not requested that the standard define them. #309 proposed using a primer document with practical guidelines for implementation.

I was simply proposing the correct structure, which is perfectly aligned with the motivation and explanations provided by @r12a and @fsasaki that ended with the introduction of this processingLanguage. As is obvious, and as recognized in #335, the current definition of processingLanguage and the information carried by this field are incomplete.

I really don't understand ... why was the decision made, again, to close this ticket without discussing it with the stakeholders?

@akuckartz

@gsergiu This issue was not closed.

@gsergiu
gsergiu commented Aug 24, 2016

@akuckartz It is OK for me to postpone this to V2; however, I would suggest taking this option into account for the related tickets, even if this will not be solved as proposed in V1.
I probably misinterpreted the comment of @azaroth42.
