Search the UniProt Reference Clusters (UniRef) for protein clusters. UniRef groups similar protein sequences into clusters to speed up searches and reduce redundancy. There are three cluster types: UniRef100 (identical sequences), UniRef90 (sequences with at least 90% identity), and UniRef50 (sequences with at least 50% identity).
Use this endpoint when you want to work with protein clusters rather than individual entries. Each cluster has a representative member sequence, and each cluster entry reports its member count and organism count. This helps reduce redundancy in large-scale analyses and find protein families across species.
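As a rough illustration of calling such a search, the sketch below only builds a request URL and sends nothing. The base URL and the `query`, `format`, and `size` parameter names are assumptions based on the public UniProt REST service, not something this description specifies, and `build_uniref_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL of the public UniProt REST service; adjust it if this
# endpoint is exposed through a different host or wrapper.
UNIREF_SEARCH_URL = "https://rest.uniprot.org/uniref/search"


def build_uniref_search_url(query: str, size: int = 10) -> str:
    """Return a search URL for UniRef clusters matching `query`."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Parameter names are assumptions; urlencode handles escaping.
    params = {"query": query, "format": "json", "size": size}
    return f"{UNIREF_SEARCH_URL}?{urlencode(params)}"


print(build_uniref_search_url("insulin", size=5))
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect before any request is sent.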
Original spec description
Search UniProt Reference Clusters (UniRef) which cluster sequences at 100%, 90%, and 50% identity levels.
Usage Tips
- UniRef cluster IDs look like UniRef100_P01308 or UniRef90_A0A5E4NKF1
- Three cluster types: UniRef100 (100% identical), UniRef90 (at least 90% identity), UniRef50 (at least 50% identity)
- UniRef50 clusters are the broadest and group more divergent sequences; UniRef100 clusters are the narrowest and most specific
- `memberCount` shows how many sequences are in the cluster
- `organismCount` shows how many different species are represented
- The representative member provides the cluster name and representative sequence
- Use UniRef for protein family analysis and reducing redundancy in datasets
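The tips above can be sketched in code. This is a minimal example, assuming cluster records are dictionaries carrying `id`, `memberCount`, and `organismCount` keys. The ID pattern is inferred from the two example IDs, `parse_cluster_id` and `multi_species_clusters` are hypothetical helper names, and the counts in the sample records are invented for illustration.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example IDs above: "UniRef", the identity level,
# an underscore, then the identifier of a member sequence.
CLUSTER_ID_PATTERN = re.compile(r"^UniRef(100|90|50)_(\S+)$")


class ClusterId(NamedTuple):
    identity_percent: int  # 100, 90, or 50
    member_id: str         # the part after the underscore


def parse_cluster_id(cluster_id: str) -> ClusterId:
    """Split a UniRef cluster ID into its identity level and member identifier."""
    match = CLUSTER_ID_PATTERN.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    return ClusterId(int(match.group(1)), match.group(2))


def multi_species_clusters(records: list[dict], min_organisms: int = 2) -> list[dict]:
    """Keep clusters whose organismCount is at least `min_organisms`."""
    return [r for r in records if r.get("organismCount", 0) >= min_organisms]


# Hypothetical records: the IDs come from the tips above, the counts are invented.
sample_records = [
    {"id": "UniRef100_P01308", "memberCount": 3, "organismCount": 1},
    {"id": "UniRef90_A0A5E4NKF1", "memberCount": 12, "organismCount": 5},
]

for record in multi_species_clusters(sample_records):
    level = parse_cluster_id(record["id"]).identity_percent
    print(record["id"], f"UniRef{level}", record["memberCount"], record["organismCount"])
```

Filtering on `organismCount` is one simple way to focus on clusters that span several species, for example when looking for protein families shared across organisms.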