Class Schema

  • All Implemented Interfaces:
    java.io.Serializable

    public class Schema
    extends java.lang.Object
    implements java.io.Serializable
    The schema of a data table.

    The schema ID is populated only when reading from or writing to table metadata; otherwise it defaults to 0.

    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      Accessor<StructLike> accessorForField​(int id)
      Returns an accessor for retrieving the data from StructLike.
      java.lang.Integer aliasToId​(java.lang.String alias)
      Returns the column id for the given column alias.
      Types.StructType asStruct()
      Returns the underlying struct type for this schema.
      Types.NestedField caseInsensitiveFindField​(java.lang.String name)
      Returns a sub-field by case-insensitive name as a Types.NestedField.
      Schema caseInsensitiveSelect​(java.lang.String... names)
      Creates a projection schema for a subset of columns, selected by case-insensitive names.
      Schema caseInsensitiveSelect​(java.util.Collection<java.lang.String> names)
      Creates a projection schema for a subset of columns, selected by case-insensitive names.
      static void checkCompatibility​(Schema schema, int formatVersion)
      Checks the compatibility of the schema with a format version.
      java.util.List<Types.NestedField> columns()
      Returns a List of the columns in this Schema.
      java.lang.String findColumnName​(int id)
      Returns the full column name for the given id.
      Types.NestedField findField​(int id)
      Returns the sub-field identified by the field id as a Types.NestedField.
      Types.NestedField findField​(java.lang.String name)
      Returns a sub-field by name as a Types.NestedField.
      Type findType​(int id)
      Returns the Type of a sub-field identified by the field id.
      Type findType​(java.lang.String name)
      Returns the Type of a sub-field identified by the field name.
      java.util.Map<java.lang.String,​java.lang.Integer> getAliases()
      Returns an alias map for this schema, if set.
      int highestFieldId()
      Returns the highest field ID in this schema, including nested fields.
      java.util.Set<java.lang.Integer> identifierFieldIds()
      Returns the set of identifier field IDs.
      java.util.Set<java.lang.String> identifierFieldNames()
      Returns the set of identifier field names.
      java.util.Map<java.lang.Integer,​java.lang.Integer> idsToOriginal()
      Returns a map of reassigned field IDs to their original IDs. Some field IDs are reassigned if a GetID is specified for the Schema.
      java.util.Map<java.lang.Integer,​java.lang.Integer> idsToReassigned()
      Returns a map of original field IDs to their reassigned IDs. Some field IDs are reassigned if a GetID is specified for the Schema.
      java.lang.String idToAlias​(java.lang.Integer fieldId)
      Returns the full column name in the unconverted data schema for the given column id.
      java.util.Map<java.lang.Integer,​java.lang.String> idToName()
      Returns a map for this schema between field id and qualified field names.
      boolean sameSchema​(Schema anotherSchema)
      Checks whether this schema is equivalent to another schema while ignoring the schema ID.
      int schemaId()
      Returns the schema ID for this schema.
      Schema select​(java.lang.String... names)
      Creates a projection schema for a subset of columns, selected by name.
      Schema select​(java.util.Collection<java.lang.String> names)
      Creates a projection schema for a subset of columns, selected by name.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Schema

        public Schema​(java.util.List<Types.NestedField> columns,
                      java.util.Map<java.lang.String,​java.lang.Integer> aliases)
      • Schema

        public Schema​(java.util.List<Types.NestedField> columns,
                      java.util.Map<java.lang.String,​java.lang.Integer> aliases,
                      java.util.Set<java.lang.Integer> identifierFieldIds)
      • Schema

        public Schema​(java.util.List<Types.NestedField> columns,
                      java.util.Set<java.lang.Integer> identifierFieldIds)
      • Schema

        public Schema​(int schemaId,
                      java.util.List<Types.NestedField> columns)
      • Schema

        public Schema​(int schemaId,
                      java.util.List<Types.NestedField> columns,
                      java.util.Set<java.lang.Integer> identifierFieldIds)
      • Schema

        public Schema​(int schemaId,
                      java.util.List<Types.NestedField> columns,
                      java.util.Set<java.lang.Integer> identifierFieldIds,
                      TypeUtil.GetID getId)
      • Schema

        public Schema​(int schemaId,
                      java.util.List<Types.NestedField> columns,
                      java.util.Map<java.lang.String,​java.lang.Integer> aliases,
                      java.util.Set<java.lang.Integer> identifierFieldIds)
      • Schema

        public Schema​(int schemaId,
                      java.util.List<Types.NestedField> columns,
                      java.util.Map<java.lang.String,​java.lang.Integer> aliases,
                      java.util.Set<java.lang.Integer> identifierFieldIds,
                      TypeUtil.GetID getID)
    • Method Detail

      • schemaId

        public int schemaId()
        Returns the schema ID for this schema.

        Note that the schema ID is populated only when reading from or writing to table metadata; otherwise it defaults to 0.

      • highestFieldId

        public int highestFieldId()
        Returns the highest field ID in this schema, including nested fields.
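
        The phrase "including nested fields" can be illustrated with a small recursive walk. The sketch below uses a hypothetical Field class as a stand-in for Types.NestedField; it is not the Iceberg implementation, and returning 0 for an empty list is a choice of this sketch only.

```java
import java.util.List;

/** Hypothetical stand-in for a schema tree; not the Iceberg implementation. */
final class HighestFieldIdSketch {

    /** Minimal field model: an ID, a name, and optional nested child fields. */
    static final class Field {
        final int id;
        final String name;
        final List<Field> children;

        Field(int id, String name, Field... children) {
            this.id = id;
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Returns the largest ID among the fields and all their descendants (0 if there are none). */
    static int highestFieldId(List<Field> fields) {
        int highest = 0;
        for (Field field : fields) {
            highest = Math.max(highest, field.id);
            highest = Math.max(highest, highestFieldId(field.children));
        }
        return highest;
    }

    public static void main(String[] args) {
        List<Field> columns = List.of(
            new Field(1, "id"),
            new Field(2, "user",
                new Field(7, "last_name"),
                new Field(3, "first_name")));
        // The nested field with ID 7 exceeds every top-level ID.
        System.out.println(highestFieldId(columns)); // prints 7
    }
}
```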
      • getAliases

        public java.util.Map<java.lang.String,​java.lang.Integer> getAliases()
        Returns an alias map for this schema, if set.

        Alias maps are created when translating an external schema, like an Avro Schema, to this format. The original column names can be provided in a Map when constructing this Schema.

        Returns:
        a Map of column aliases to field ids
      • idToName

        public java.util.Map<java.lang.Integer,​java.lang.String> idToName()
        Returns a map for this schema between field id and qualified field names.
        Returns:
        a map of field id to qualified field names
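
        As a rough illustration of what "qualified field names" means, the sketch below builds such a map for a nested structure by joining parent and child names with a dot. The Field class and the dotted-name convention shown here are assumptions of the sketch, not code taken from Iceberg.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an id-to-qualified-name index; not the Iceberg implementation. */
final class IdToNameSketch {

    /** Minimal field model: an ID, a name, and optional nested child fields. */
    static final class Field {
        final int id;
        final String name;
        final List<Field> children;

        Field(int id, String name, Field... children) {
            this.id = id;
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Maps every field ID, including nested ones, to its dot-qualified name. */
    static Map<Integer, String> idToName(List<Field> fields) {
        Map<Integer, String> result = new LinkedHashMap<>();
        collect("", fields, result);
        return result;
    }

    private static void collect(String prefix, List<Field> fields, Map<Integer, String> out) {
        for (Field field : fields) {
            String qualified = prefix.isEmpty() ? field.name : prefix + "." + field.name;
            out.put(field.id, qualified);
            collect(qualified, field.children, out);
        }
    }

    public static void main(String[] args) {
        List<Field> columns = List.of(
            new Field(1, "id"),
            new Field(2, "user", new Field(3, "last_name")));
        System.out.println(idToName(columns)); // prints {1=id, 2=user, 3=user.last_name}
    }
}
```

        A lookup in a map of this shape is what a call such as findColumnName(3) would resolve to "user.last_name" in this model.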
      • asStruct

        public Types.StructType asStruct()
        Returns the underlying struct type for this schema.
        Returns:
        the StructType version of this schema.
      • identifierFieldIds

        public java.util.Set<java.lang.Integer> identifierFieldIds()
        Returns the set of identifier field IDs.

        An identifier is a concept similar to a primary key in a relational database system. It consists of a unique set of primitive fields in the schema. An identifier field must be at the root or nested in a chain of structs (no maps or lists). A row should be unique in a table based on the values of the identifier fields. Optional, float, and double columns cannot be used as identifier fields. However, an Iceberg identifier differs from a primary key in the following ways:

        • Iceberg does not enforce the uniqueness of a row based on this identifier information. It is used for operations like upsert to define the default upsert key.
        • A nested field in a struct can be used as an identifier. For example, if there is a "last_name" field inside a "user" struct in a schema, the field "user.last_name" can be set as part of the identifier.

        Returns:
        the set of identifier field IDs in this schema.
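
        The eligibility rules described above (primitive, required, not float or double, reachable only through structs) can be sketched as a small check. The Kind enum and Field class are hypothetical stand-ins, and the sketch covers only the rules stated in this description; the library may apply further checks.

```java
import java.util.List;

/** Hypothetical sketch of identifier-field eligibility; not the Iceberg implementation. */
final class IdentifierFieldSketch {

    enum Kind { STRUCT, LIST, MAP, LONG, STRING, FLOAT, DOUBLE }

    /** Minimal field model with a type kind and a required flag. */
    static final class Field {
        final int id;
        final String name;
        final Kind kind;
        final boolean required;
        final List<Field> children;

        Field(int id, String name, Kind kind, boolean required, Field... children) {
            this.id = id;
            this.name = name;
            this.kind = kind;
            this.required = required;
            this.children = List.of(children);
        }
    }

    /** Returns true if the field with the given ID satisfies the rules stated in the description. */
    static boolean isValidIdentifier(List<Field> columns, int fieldId) {
        return check(columns, fieldId, true);
    }

    private static boolean check(List<Field> fields, int fieldId, boolean onlyStructsAbove) {
        for (Field field : fields) {
            if (field.id == fieldId) {
                boolean primitive = field.kind != Kind.STRUCT
                    && field.kind != Kind.LIST
                    && field.kind != Kind.MAP;
                boolean floating = field.kind == Kind.FLOAT || field.kind == Kind.DOUBLE;
                return onlyStructsAbove && primitive && field.required && !floating;
            }
            // Descending through a list or map disqualifies everything below it.
            boolean structPath = onlyStructsAbove && field.kind == Kind.STRUCT;
            if (check(field.children, fieldId, structPath)) {
                return true;
            }
        }
        return false; // unknown ID or an ineligible field
    }

    public static void main(String[] args) {
        List<Field> columns = List.of(
            new Field(1, "id", Kind.LONG, true),
            new Field(2, "user", Kind.STRUCT, true,
                new Field(3, "last_name", Kind.STRING, true)),
            new Field(4, "score", Kind.DOUBLE, true),
            new Field(5, "tags", Kind.LIST, true,
                new Field(6, "element", Kind.STRING, true)));
        System.out.println(isValidIdentifier(columns, 3)); // prints true (user.last_name)
        System.out.println(isValidIdentifier(columns, 4)); // prints false (double column)
        System.out.println(isValidIdentifier(columns, 6)); // prints false (inside a list)
    }
}
```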
      • identifierFieldNames

        public java.util.Set<java.lang.String> identifierFieldNames()
        Returns the set of identifier field names.
      • findType

        public Type findType​(java.lang.String name)
        Returns the Type of a sub-field identified by the field name.
        Parameters:
        name - a field name
        Returns:
        a Type for the sub-field or null if it is not found
      • findType

        public Type findType​(int id)
        Returns the Type of a sub-field identified by the field id.
        Parameters:
        id - a field id
        Returns:
        a Type for the sub-field or null if it is not found
      • findField

        public Types.NestedField findField​(int id)
        Returns the sub-field identified by the field id as a Types.NestedField.
        Parameters:
        id - a field id
        Returns:
        the sub-field or null if it is not found
      • findField

        public Types.NestedField findField​(java.lang.String name)
        Returns a sub-field by name as a Types.NestedField.

        The result may be a top-level or a nested field.

        Parameters:
        name - a String name
        Returns:
        the sub-field or null if it is not found
      • caseInsensitiveFindField

        public Types.NestedField caseInsensitiveFindField​(java.lang.String name)
        Returns a sub-field by case-insensitive name as a Types.NestedField.

        The result may be a top-level or a nested field.

        Parameters:
        name - a String name
        Returns:
        the sub-field or null if it is not found
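
        One common way to support a lookup like this is to keep a second index keyed by lower-cased names. The sketch below shows that idea with plain maps; the class and method names are hypothetical, and it assumes no two field names differ only by case.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a case-insensitive name index; not the Iceberg implementation. */
final class CaseInsensitiveLookupSketch {

    /** Builds an index whose keys are the lower-cased qualified field names. */
    static Map<String, Integer> lowerCaseIndex(Map<String, Integer> nameToId) {
        Map<String, Integer> index = new HashMap<>();
        for (Map.Entry<String, Integer> entry : nameToId.entrySet()) {
            index.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
        return index;
    }

    /** Returns the field ID for the name regardless of case, or null if it is not found. */
    static Integer find(Map<String, Integer> lowerCaseIndex, String name) {
        return lowerCaseIndex.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Integer> index = lowerCaseIndex(Map.of("id", 1, "user.Last_Name", 3));
        System.out.println(find(index, "USER.last_name")); // prints 3
        System.out.println(find(index, "missing"));        // prints null
    }
}
```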
      • findColumnName

        public java.lang.String findColumnName​(int id)
        Returns the full column name for the given id.
        Parameters:
        id - a field id
        Returns:
        the full column name in this schema that resolves to the id
      • aliasToId

        public java.lang.Integer aliasToId​(java.lang.String alias)
        Returns the column id for the given column alias. Column aliases are set by conversions from Parquet or Avro to this Schema type.
        Parameters:
        alias - a full column name in the unconverted data schema
        Returns:
        the column id in this schema, or null if the column wasn't found
      • idToAlias

        public java.lang.String idToAlias​(java.lang.Integer fieldId)
        Returns the full column name in the unconverted data schema for the given column id. Column aliases are set by conversions from Parquet or Avro to this Schema type.
        Parameters:
        fieldId - a column id in this schema
        Returns:
        the full column name in the unconverted data schema, or null if one wasn't found
      • accessorForField

        public Accessor<StructLike> accessorForField​(int id)
        Returns an accessor for retrieving the data from StructLike.

        Accessors do not retrieve data contained in lists or maps.

        Parameters:
        id - a column id in this schema
        Returns:
        an Accessor to retrieve values from a StructLike row
      • select

        public Schema select​(java.lang.String... names)
        Creates a projection schema for a subset of columns, selected by name.

        Names that identify nested fields will select part or all of the field's top-level column.

        Parameters:
        names - String names for selected columns
        Returns:
        a projection schema from this schema, by name
      • select

        public Schema select​(java.util.Collection<java.lang.String> names)
        Creates a projection schema for a subset of columns, selected by name.

        Names that identify nested fields will select part or all of the field's top-level column.

        Parameters:
        names - a Collection of String names for selected columns
        Returns:
        a projection schema from this schema, by name
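
        The statement that nested names "select part or all of the field's top-level column" can be pictured with a small pruning routine. In the sketch below, which uses a hypothetical Field class and is not the library's algorithm, selecting a struct by name keeps its whole subtree, while selecting one of its children keeps the struct with only that child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of name-based projection; not the Iceberg implementation. */
final class SelectSketch {

    /** Minimal field model: an ID, a name, and optional nested child fields. */
    static final class Field {
        final int id;
        final String name;
        final List<Field> children;

        Field(int id, String name, List<Field> children) {
            this.id = id;
            this.name = name;
            this.children = children;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    /** Keeps the fields whose qualified names are selected, pruning unselected siblings. */
    static List<Field> select(List<Field> fields, Set<String> names) {
        return project("", fields, names);
    }

    private static List<Field> project(String prefix, List<Field> fields, Set<String> names) {
        List<Field> kept = new ArrayList<>();
        for (Field field : fields) {
            String qualified = prefix.isEmpty() ? field.name : prefix + "." + field.name;
            if (names.contains(qualified)) {
                kept.add(field); // selected directly: keep the whole subtree
            } else {
                List<Field> children = project(qualified, field.children, names);
                if (!children.isEmpty()) {
                    // Selected through a descendant: keep only part of this column.
                    kept.add(new Field(field.id, field.name, children));
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Field> columns = List.of(
            new Field(1, "id", List.of()),
            new Field(2, "user", List.of(
                new Field(3, "first_name", List.of()),
                new Field(4, "last_name", List.of()))),
            new Field(5, "ts", List.of()));
        System.out.println(select(columns, Set.of("id", "user.last_name"))); // prints [id, user[last_name]]
        System.out.println(select(columns, Set.of("user")));                 // prints [user[first_name, last_name]]
    }
}
```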
      • caseInsensitiveSelect

        public Schema caseInsensitiveSelect​(java.lang.String... names)
        Creates a projection schema for a subset of columns, selected by case-insensitive names.

        Names that identify nested fields will select part or all of the field's top-level column.

        Parameters:
        names - String names for selected columns
        Returns:
        a projection schema from this schema, by case-insensitive names
      • caseInsensitiveSelect

        public Schema caseInsensitiveSelect​(java.util.Collection<java.lang.String> names)
        Creates a projection schema for a subset of columns, selected by case-insensitive names.

        Names that identify nested fields will select part or all of the field's top-level column.

        Parameters:
        names - a Collection of String names for selected columns
        Returns:
        a projection schema from this schema, by case-insensitive names
      • sameSchema

        public boolean sameSchema​(Schema anotherSchema)
        Checks whether this schema is equivalent to another schema while ignoring the schema ID.
        Parameters:
        anotherSchema - another schema
        Returns:
        true if this schema is equivalent to the given schema
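
        A minimal sketch of "equivalent while ignoring the schema ID" follows. It assumes that equivalence means the same columns and the same identifier field IDs; that definition, and the use of strings as a stand-in for the struct type, are assumptions of the sketch rather than statements about the library.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of schema equivalence that ignores the schema ID. */
final class SameSchemaSketch {
    final int schemaId;
    final List<String> columns; // stand-in for the struct type, one entry per column
    final Set<Integer> identifierFieldIds;

    SameSchemaSketch(int schemaId, List<String> columns, Set<Integer> identifierFieldIds) {
        this.schemaId = schemaId;
        this.columns = columns;
        this.identifierFieldIds = identifierFieldIds;
    }

    /** Compares columns and identifier field IDs; schemaId is deliberately not consulted. */
    boolean sameSchema(SameSchemaSketch other) {
        return columns.equals(other.columns)
            && identifierFieldIds.equals(other.identifierFieldIds);
    }

    public static void main(String[] args) {
        SameSchemaSketch fresh = new SameSchemaSketch(0, List.of("1: id: required long"), Set.of(1));
        SameSchemaSketch stored = new SameSchemaSketch(5, List.of("1: id: required long"), Set.of(1));
        System.out.println(fresh.sameSchema(stored)); // prints true despite different schema IDs
    }
}
```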
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • idsToReassigned

        public java.util.Map<java.lang.Integer,​java.lang.Integer> idsToReassigned()
        Returns the mapping from original to reassigned field IDs. Some field IDs are reassigned if a GetID is specified for the Schema.
        Returns:
        map of original to reassigned field ids
      • idsToOriginal

        public java.util.Map<java.lang.Integer,​java.lang.Integer> idsToOriginal()
        Returns the mapping from reassigned to original field IDs. Some field IDs are reassigned if a GetID is specified for the Schema.
        Returns:
        map of reassigned to original field ids
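
        The relationship between idsToReassigned() and idsToOriginal() can be pictured as two maps that are inverses of each other. The sketch below uses java.util.function.IntUnaryOperator as a hypothetical stand-in for TypeUtil.GetID and records every field it visits; which entries the library actually records is not specified here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch of ID reassignment bookkeeping; not the Iceberg implementation. */
final class IdReassignmentSketch {
    final Map<Integer, Integer> idsToReassigned = new LinkedHashMap<>(); // original -> reassigned
    final Map<Integer, Integer> idsToOriginal = new LinkedHashMap<>();   // reassigned -> original

    /** Applies the ID function to each original ID and records both directions. */
    IdReassignmentSketch(List<Integer> originalIds, IntUnaryOperator getId) {
        for (int original : originalIds) {
            int reassigned = getId.applyAsInt(original);
            idsToReassigned.put(original, reassigned);
            idsToOriginal.put(reassigned, original);
        }
    }

    public static void main(String[] args) {
        IdReassignmentSketch sketch =
            new IdReassignmentSketch(List.of(10, 20, 30), id -> id / 10);
        System.out.println(sketch.idsToReassigned); // prints {10=1, 20=2, 30=3}
        System.out.println(sketch.idsToOriginal);   // prints {1=10, 2=20, 3=30}
    }
}
```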
      • checkCompatibility

        public static void checkCompatibility​(Schema schema,
                                              int formatVersion)
        Checks the compatibility of the schema with a format version.

        This validates that the schema does not contain types that were released in later format versions.

        Parameters:
        schema - a Schema
        formatVersion - table format version
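
        The validation described above can be sketched as a lookup of each type's minimum format version. The version table, the type names, and the exception type below are illustrative assumptions of the sketch, not values or behavior taken from the library.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a format-version compatibility check; not the Iceberg implementation. */
final class FormatCompatibilitySketch {

    // Illustrative table: type name -> first format version assumed to support it.
    static final Map<String, Integer> MIN_FORMAT_VERSION =
        Map.of("timestamp_ns", 3, "variant", 3);

    /** Throws if any type requires a later format version than the one given. */
    static void checkCompatibility(List<String> typeNames, int formatVersion) {
        for (String typeName : typeNames) {
            // Types missing from the table are treated as available since version 1.
            int minVersion = MIN_FORMAT_VERSION.getOrDefault(typeName, 1);
            if (formatVersion < minVersion) {
                throw new IllegalStateException(
                    typeName + " requires format version " + minVersion
                        + " but the table uses version " + formatVersion);
            }
        }
    }

    public static void main(String[] args) {
        checkCompatibility(List.of("long", "string"), 1);       // passes silently
        checkCompatibility(List.of("long", "timestamp_ns"), 3); // passes silently
        try {
            checkCompatibility(List.of("timestamp_ns"), 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```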