Pathology Portal

Automated annotation in UniProt

UniProt is a high-quality, comprehensive protein resource whose core activity is the expert review and annotation of proteins whose function has been experimentally investigated. The UniProt database also contains large numbers of proteins that are predicted to exist from gene models but have no associated experimental evidence of their function. UniProt commits significant resources to developing computational methods that functionally annotate these predicted proteins, based on the data in entries that have gone through the expert review process.

We will describe the two main automated annotation systems currently in use. The first, UniRule, is an established UniProt system in which curators manually develop annotation rules. The second, ARBA (Association-Rule-Based Annotator), is a multi-class learning system that uses rule mining techniques to generate concise annotation models. ARBA employs a data exclusion algorithm that censors data not suitable for computational annotation, and it generates human-readable rules for each UniProt release. As part of our interest in engaging with the machine learning community, we will also introduce the contribution of ProtNLM (Protein Natural Language Model), from Google Research, which annotates proteins that have "uncharacterised" names.
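To make the idea of rule mining more concrete, the following is a minimal sketch of how association rules of the form "conditions imply annotation" could be mined from reviewed entries and then applied to an unreviewed protein. It is an illustration only and not ARBA's actual algorithm: the attribute names (SIG_A, SIG_B), the annotations, the thresholds and the functions mine_rules and annotate are all hypothetical, and the data exclusion step is not modelled.

```python
from collections import Counter
from itertools import combinations

# Toy stand-ins for reviewed entries: (attributes, annotation).
# Every value here is made up for illustration.
REVIEWED = [
    ({"SIG_A", "taxon:Bacteria"}, "kinase"),
    ({"SIG_A", "taxon:Bacteria"}, "kinase"),
    ({"SIG_A", "taxon:Eukaryota"}, "phosphatase"),
    ({"SIG_B", "taxon:Eukaryota"}, "transporter"),
    ({"SIG_B", "taxon:Bacteria"}, "transporter"),
]


def mine_rules(entries, min_support=2, min_confidence=0.9, max_conditions=2):
    """Return rules (conditions, annotation, support, confidence).

    support    = number of entries with both the conditions and the annotation
    confidence = support / number of entries with the conditions
    """
    condition_counts = Counter()
    joint_counts = Counter()
    for attributes, annotation in entries:
        for size in range(1, max_conditions + 1):
            for combo in combinations(sorted(attributes), size):
                condition_counts[combo] += 1
                joint_counts[(combo, annotation)] += 1

    rules = []
    for (combo, annotation), support in joint_counts.items():
        confidence = support / condition_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(combo), annotation, support, confidence))
    return rules


def annotate(attributes, rules):
    """Apply every rule whose conditions are all present in the protein."""
    return sorted(
        {annotation for conditions, annotation, _, _ in rules
         if conditions <= attributes}
    )


if __name__ == "__main__":
    rules = mine_rules(REVIEWED)
    for conditions, annotation, support, confidence in rules:
        print(sorted(conditions), "->", annotation, support, confidence)
    # An unreviewed protein carrying SIG_B matches the SIG_B rule.
    print(annotate({"SIG_B", "taxon:Archaea"}, rules))
```

In this toy data, SIG_A alone is not a reliable predictor because it occurs with two different annotations, so only the more specific combination of SIG_A and a bacterial taxon passes the confidence threshold. Rules kept in this explicit condition-and-annotation form stay human-readable, which is the property the description above attributes to ARBA's output.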

We will also introduce UniFIRE, open-source software that enables researchers to annotate their own protein datasets using the above-mentioned annotation systems.

Learning outcomes

  • Recall the role of UniProt's two main automated annotation systems
  • Describe how UniRule and ARBA work
  • Get started using these automated annotation systems

Authors

  • Pedro Raposo
    EMBL-EBI
  • Elena Speretta
    EMBL-EBI

Resource details

Contributed by: Pathology Portal
Authored by: Pedro Raposo, European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI);
Elena Speretta, European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI).
Licence: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 International
First contributed: 01 January 2023
Audience access level: Full user
