Manage big data file shares in a portal—ArcGIS GeoAnalytics Server

Big data file shares are registered through your portal's content page. When you add a big data file share in your portal, a related data store item is also created. When you add a cloud store big data file share, three items are created: a big data file share item, a data store item of type big data file share, and a data store item of type cloud store. A big data file share portal item includes the following tabs:

Overview—Provides general information on your big data file share and the related data store items. The related data store items can be shared and deleted with your big data file share.
Datasets—Lists the datasets and outlines the schema of the input data. Dataset information includes the fields and formats that represent geometry and time.
Outputs—Outlines output templates, which allow you to write results to a big data file share. Output templates are optional and are created after you register a big data file share. See Create, edit, and view output templates to learn how to create or edit an output template.
Settings—Describes content status, extent, and delete protection.

You can view and edit the datasets and schema and the output templates through the big data file share item.

Note:

To share a Big Data File Share item, you must share the root data store item. The root data store for a big data file share of type Cloud is the Data Store (Cloud) item of the same name. For all other types of big data file shares (File Share, HDFS, and HIVE) the root data store is the Data Store (Big Data File Share) item of the same name.
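The sharing rule in this note amounts to a lookup from the big data file share type to its root data store item. The Python sketch below encodes that rule; the dictionary and function names are hypothetical and are not part of any ArcGIS API. In every case, the root item has the same name as the big data file share.

```python
# Hypothetical lookup: big data file share type -> root data store item type
# that must be shared in order to share the big data file share item.
ROOT_DATA_STORE_ITEM = {
    "Cloud": "Data Store (Cloud)",
    "File Share": "Data Store (Big Data File Share)",
    "HDFS": "Data Store (Big Data File Share)",
    "HIVE": "Data Store (Big Data File Share)",
}


def root_data_store_item(share_type: str) -> str:
    """Return the root data store item type for a big data file share type."""
    try:
        return ROOT_DATA_STORE_ITEM[share_type]
    except KeyError:
        raise ValueError(f"unknown big data file share type: {share_type!r}") from None


print(root_data_store_item("HDFS"))  # prints Data Store (Big Data File Share)
```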

Edit big data file shares

Once you have created a big data file share through your portal, you can use the big data file share item to view the datasets, edit dataset formatting, or sync your big data file share to add additional datasets.

A big data file share is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. You can view the datasets that have been successfully registered in your big data file share.
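As a rough local check before registering, you can list the immediate subfolders of the location you plan to register. The sketch below assumes that each immediate subfolder of the top-level folder holds one dataset; the function name is hypothetical, and the server's own dataset discovery is not shown.

```python
from pathlib import Path


def candidate_datasets(share_root: str) -> list[str]:
    """List immediate subfolders of a big data file share location.

    Assumption: each immediate subfolder of the registered top-level folder
    holds one dataset, so the folder names approximate the expected datasets.
    Plain files directly under the root are ignored.
    """
    root = Path(share_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```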

If datasets you expected to find are missing from your big data file share, do the following:

Verify that you correctly registered the top-level folder. For more information, see Prepare your data.
Check that your input data is in an allowable format, such as a collection of delimited files, shapefiles, parquet, or ORC.
Ensure that the schema of your input dataset of interest is consistent for a collection of files (all files in a single dataset must have the same fields).
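The consistency requirement for delimited files can be checked locally before registration. This Python sketch compares the header row of each file in one dataset folder with the first file's header. It assumes comma-delimited, UTF-8 files that each start with a header row; the function name is hypothetical.

```python
import csv
from pathlib import Path


def inconsistent_headers(dataset_folder: str, pattern: str = "*.csv") -> dict[str, list[str]]:
    """Return files whose header row differs from the first file's header.

    An empty dict means every matching file has the same fields.
    """
    files = sorted(Path(dataset_folder).glob(pattern))
    reference = None
    mismatches = {}
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])  # first row, or [] for an empty file
        if reference is None:
            reference = header  # the first file defines the expected fields
        elif header != reference:
            mismatches[path.name] = header
    return mismatches
```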

You can use the Datasets tab to verify the number of datasets within a big data file share or to review the schema of a registered dataset. You can modify a selected dataset's schema by updating its geometry, time definition, and field names using the steps below.

Edit big data file share input datasets

Editing the big data file share item allows you to modify how your data is registered and used for analysis. You can also use the edit option to view how your data is currently registered. For details about each option in this dialog box, see Big data file share editing parameters. To edit dataset parameters, do the following:

Open the Big Data File Share item in your portal contents.
Click the Datasets tab.
Click the Edit button beside the dataset you want to edit.
Modify the dataset using the Fields, Geometry, Time, and File options.
When you have finished editing dataset properties, click Save.

Edit a big data file share manifest or hints file

On the Show advanced option of the Datasets tab of the big data file share, you can view, download, and upload the manifest or hints file. If you upload a manifest, it overwrites any changes you have made to your big data file share datasets and replaces the existing datasets and schema. To learn more about the big data file share manifest, see Big data file share manifest. To learn more about using a hints file, see Hints file. To edit a big data file share manifest or hints file, do the following:

Open the Big Data File Share item in your portal contents.
Click the Datasets tab.
Click the Show advanced toggle button to turn it on.
1. To download the manifest file, click Download in the manifest section.
2. To download the hints file, click Download in the hints section.
Use a text editor to modify the downloaded .json manifest file or .dat hints file, and save your changes locally.
Tip:
The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
To upload an edited file, go to the Datasets tab of the big data file share and turn on Show advanced.
1. To upload the manifest, click Upload under manifest, and browse to the updated .json file.
2. To upload the hints file, click Upload under hints, and browse to the updated .txt file.
Click Upload.
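If you prefer to script a manifest edit rather than make it by hand, a JSON round trip is enough. The sketch below renames a field in one dataset of a downloaded manifest. The manifest layout shown in the docstring (datasets, name, schema, fields) is an assumption; compare it with your own downloaded file before relying on it. The function names are hypothetical.

```python
import json


def rename_field(manifest: dict, dataset_name: str, old: str, new: str) -> None:
    """Rename one field of one dataset in a parsed manifest, in place.

    Assumed layout (check it against your downloaded file):
    {"datasets": [{"name": ..., "schema": {"fields": [{"name": ..., ...}]}}]}
    """
    for dataset in manifest["datasets"]:
        if dataset["name"] != dataset_name:
            continue
        for field in dataset["schema"]["fields"]:
            if field["name"] == old:
                field["name"] = new
                return
        raise KeyError(f"field {old!r} not found in dataset {dataset_name!r}")
    raise KeyError(f"dataset {dataset_name!r} not found")


def rename_field_in_file(path: str, dataset_name: str, old: str, new: str) -> None:
    """Load a downloaded manifest, rename the field, and save it for upload."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    rename_field(manifest, dataset_name, old, new)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

Remember that uploading a manifest replaces the existing datasets and schema.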

If you upload a hints file, sync the big data file share. When you sync, only datasets listed in the hints file and new datasets are updated; changes you made to datasets that are not in the hints file are preserved.
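The update rule for a sync can be stated as set operations. This Python sketch illustrates the rule described here; it is not server code, and the function and parameter names are hypothetical.

```python
def datasets_updated_by_sync(existing: set[str], discovered: set[str], hinted: set[str]) -> set[str]:
    """Return the datasets a sync regenerates.

    existing:   datasets already registered in the big data file share
    discovered: datasets found in the data source at sync time
    hinted:     datasets that have entries in the uploaded hints file
    """
    new_datasets = discovered - existing  # added to the data source since registration
    rehinted = existing & hinted          # regenerated from the hints file rules
    return new_datasets | rehinted        # everything else keeps your edits
```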

Sync your big data file share

You can sync your big data file share if you add new datasets to your data source or if you have uploaded a hints file. The hints file provides specifications that are used when regenerating the big data file share.

Note:

When a big data file share is synced, it only updates the big data file share for existing datasets that have a hints file or new datasets. Any edits you have made to the datasets that are in the hints file are overwritten with the rules defined in the hints file.

Open the Big Data File Share item in your portal contents.
Click the Datasets tab.
Click Sync.

Create, edit, and view output templates

You can create, view, or edit output templates. You can also edit attributes and settings for the output templates, which outline how output results are written to the big data file share.

To create an output template, complete the following steps:

Open the Big Data File Share item in your portal contents.
Click the Outputs tab.
Click the Add output template button.
Create a name for the output template and select the file type the output template will write to.
1. Set the geometry formats for this template by clicking the Geometry tab. You can set them for one, two, or all geometry types. The formatting options are the same as for input big data file shares.
2. Set the time formats for this template by clicking the Time tab. You can leave time blank, set it for either instant or interval, or set both. The time formatting options are the same as the input big data file share time formats.
Click Save when you're done.

Use the same steps to view or edit a template.

Big data file share editing parameters

The big data file share editor comprises the following four sections:

Fields
Geometry
Time
File

It is recommended that you use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters of a delimited file.

Fields

The fields section lists all of the fields in a dataset. When you select a dataset, you can see the following for each field:

The name of the field
The field type

You can only modify the field name and type for delimited files. If you are modifying many field names, it is recommended that you use a hints file.

Learn more about supported field types

Geometry

The geometry section lists the type of geometry, how it is represented, and the spatial reference. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type:

Geometry parameters


Parameter	Description	Delimited files	Shapefiles	ORC files	Parquet files
Geometry	The geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry (None), the dataset is a table.	Editable	Cannot be modified	Editable	Editable
Spatial reference (WKID/WKT)	The spatial reference of the dataset. This option is only shown if the geometry is not None.	Editable. By default, it is set to 4326, WGS 1984.	Cannot be modified	Editable	Editable
Geometry format type	How the geometry is formatted for each feature. Options are XYZ (fields that represent X, Y, and optionally Z values; XYZ is only applicable to points), WKT (well-known text), WKB (well-known binary), GeoJSON, EsriJSON, and EsriShape. This option is only shown if the geometry is not None.	Editable	Not available; the option is not shown.	Editable	Editable
Geometry fields	Specifies which fields represent geometries. In some cases, the field must be a specific field type: WKB and EsriShape formats require a binary field, GeoJSON and EsriJSON require a string field, and XYZ fields must be numeric. This option is only shown if the geometry is not None.	Editable	Not available; the option is not shown.	Editable	Editable
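The field type requirements in the Geometry fields row can be expressed as a small validation step. In this Python sketch, field kinds are simplified to the hypothetical labels numeric, binary, and string, and WKT is left out because the table does not state which field type it requires. The function name is hypothetical.

```python
# Required field kind per geometry format, as stated in the Geometry fields row.
# WKT is intentionally absent: its required field type is not stated.
REQUIRED_FIELD_KIND = {
    "XYZ": "numeric",
    "WKB": "binary",
    "EsriShape": "binary",
    "GeoJSON": "string",
    "EsriJSON": "string",
}


def check_geometry_fields(geometry_type: str, geometry_format: str,
                          fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid.

    fields maps each geometry field name to its (simplified) kind.
    """
    problems = []
    if geometry_format == "XYZ" and geometry_type != "Point":
        problems.append("XYZ is only applicable to points")
    required = REQUIRED_FIELD_KIND.get(geometry_format)
    if required is not None:
        for name, kind in fields.items():
            if kind != required:
                problems.append(f"{name} is {kind}; {geometry_format} requires a {required} field")
    return problems
```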

Time

The time section outlines how time is represented. The following table outlines the available options. Time options are the same for all data types, except where noted.

Time parameters


Parameter	Description	Example
Time type	The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None.	Instant
Time fields, Start time fields and End time fields	If you select Instant, you will see Time fields. If you select Interval, you will see Start time fields and End time fields. These options specify the fields and formatting used to define time in your input data. Time can be defined by one or more fields, and a single field can use one or more formats. By default, the first field named time is used as the time field, with an estimated time format. For shapefiles, the first field of type date is used. At least one row must be populated for these tables. See Time formats to learn more about formatting. The time formatting table is only available if Time type is not None.	Example with a single field that represents time in two different formats: Field: TimeField, Format: yy/MM/dd hh:mm:ss; Field: TimeField, Format: yyyy-MMM-dd hh:mm:ss. Example with two fields used to represent time: Field: DateField, Format: yy/MM/dd; Field: TimeField, Format: hh:mm:ss.
Time zone	The time zone of the input time. This option is only available if Time Type is not None. The default is UTC.	UTC
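The two examples in the table correspond to two behaviors: trying several formats on one field, and joining a date field and a time field into one timestamp. The Python sketch below illustrates both with datetime.strptime. Python directives stand in for the big data file share patterns (for example, %y/%m/%d %I:%M:%S for yy/MM/dd hh:mm:ss), and the sample hours are kept below 12 so that the 12-hour hh pattern is unambiguous without an AM/PM marker. The function names are hypothetical.

```python
from datetime import datetime


def parse_with_formats(value: str, formats: list[str]) -> datetime:
    """Try each format in order, as when one time field lists several formats."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # fall through to the next format
    raise ValueError(f"{value!r} matches none of {formats}")


def combine_date_and_time(date_value: str, time_value: str,
                          date_format: str, time_format: str) -> datetime:
    """Build one timestamp from a date field and a separate time field."""
    return datetime.strptime(f"{date_value} {time_value}", f"{date_format} {time_format}")


# Python equivalents of yy/MM/dd hh:mm:ss and yyyy-MMM-dd hh:mm:ss.
TIME_FIELD_FORMATS = ["%y/%m/%d %I:%M:%S", "%Y-%b-%d %I:%M:%S"]

# Each line prints 2016-01-02 09:45:02
print(parse_with_formats("16/01/02 09:45:02", TIME_FIELD_FORMATS))
print(parse_with_formats("2016-Jan-02 09:45:02", TIME_FIELD_FORMATS))
print(combine_date_and_time("16/01/02", "09:45:02", "%y/%m/%d", "%I:%M:%S"))
```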

Time formats

The following table outlines how to represent time formatting. All examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.

Time formats in big data file shares


Format	Meaning	Example
yy	The year, represented by two digits.	16
yyyy	The year, represented by four digits.	2016
MM	The month, represented numerically.	01 or 1
MMM	The month, represented using three letters.	Jan
MMMM	The month, represented using the complete spelling.	January
dd	The day.	02 or 2
HH	The hour when using a 24-hour day; values range from 0-23.	21
hh	The hour when using a 12-hour day; values range from 1-12.	9
mm	The minute; values range from 0-59.	45
ss	The second; values range from 0-59.	02
SSS	The millisecond; values range from 0-999.	50
a	The AM/PM marker.	PM
epoch_millis	The time in milliseconds from epoch.	1451771102050 (UTC)
epoch_seconds	The time in seconds from epoch.	1451771102 (UTC)
Z	The time zone offset expressed in hours.	-0100 or -01:00
ZZZ	The time zone offset expressed using IDs.	America/Los_Angeles
''	Use single quotes to add text that doesn't represent a value outlined in this table.	'T'
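The epoch_millis and epoch_seconds values for the reference moment can be computed with Python's standard datetime module. This sketch assumes the moment is in UTC; in another time zone, the values shift by that zone's offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Reference moment from the table: January 2, 2016, 9:45:02.05 PM, assumed UTC.
moment = datetime(2016, 1, 2, 21, 45, 2, 50_000, tzinfo=timezone.utc)

# Dividing one timedelta by another yields an exact int, avoiding float rounding.
epoch_millis = (moment - EPOCH) // timedelta(milliseconds=1)
epoch_seconds = (moment - EPOCH) // timedelta(seconds=1)

print(epoch_millis, epoch_seconds)  # prints 1451771102050 1451771102
```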

The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:

Time format examples


Input date	Format
01/02/2016 9:45:02PM	MM/dd/yyyy hh:mm:ssa
Jan02-16 21:45:02	MMMdd-yy HH:mm:ss
January 02 2016 9:45:02.050PM	MMMM dd yyyy hh:mm:ss.SSSa
01/02/2016T21:45:02-0000	MM/dd/yyyy'T'HH:mm:ssZ
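The patterns above closely resemble Python's strptime directives, so a token-by-token translation is enough to test sample values locally. The sketch below is a hypothetical helper covering the tokens from the time formats table; it rejects ZZZ and the epoch formats, which have no strptime equivalent. It assumes the default C locale, so month names and the AM/PM marker are English. The last sample, an ISO-style value with a quoted 'T', is an extra example added for illustration.

```python
from datetime import datetime

# Longest tokens first, so "yyyy" is not read as two "yy" tokens.
_TOKENS = [
    ("yyyy", "%Y"), ("MMMM", "%B"), ("MMM", "%b"), ("SSS", "%f"),
    ("yy", "%y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H"),
    ("hh", "%I"), ("mm", "%M"), ("ss", "%S"), ("a", "%p"), ("Z", "%z"),
]


def to_strptime(pattern: str) -> str:
    """Translate a big data file share time pattern into a strptime format."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            # Quoted literal text such as 'T'; raises ValueError if unterminated.
            end = pattern.index("'", i + 1)
            out.append(pattern[i + 1:end].replace("%", "%%"))
            i = end + 1
        elif pattern.startswith("ZZZ", i) or pattern.startswith("epoch_", i):
            raise ValueError("zone IDs and epoch values have no strptime equivalent")
        else:
            for token, directive in _TOKENS:
                if pattern.startswith(token, i):
                    out.append(directive)
                    i += len(token)
                    break
            else:
                # Separators such as /, :, -, ., and spaces are copied as-is.
                out.append("%%" if pattern[i] == "%" else pattern[i])
                i += 1
    return "".join(out)


SAMPLES = [
    ("01/02/2016 9:45:02PM", "MM/dd/yyyy hh:mm:ssa"),
    ("Jan02-16 21:45:02", "MMMdd-yy HH:mm:ss"),
    ("January 02 2016 9:45:02.050PM", "MMMM dd yyyy hh:mm:ss.SSSa"),
    ("2016-01-02T21:45:02-0000", "yyyy-MM-dd'T'HH:mm:ssZ"),
]

for value, pattern in SAMPLES:
    print(pattern, "->", datetime.strptime(value, to_strptime(pattern)))
```

All four samples parse to January 2, 2016, at 21:45:02; the value with a Z offset produces a time-zone-aware result.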

File

The file section outlines the format the data is in. Data may be in one of the following formats:

Shapefile (.shp)
Delimited file (for example .csv)
Parquet file
ORC file

The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, there are multiple parameters that can be modified; these are outlined in the following table. To modify them, use a hints file and regenerate the manifest.

Dataset formats


Parameter	Description
File extension	Lists the file type extension on the input dataset. Common formats are .csv and .txt.
Field delimiter	Determines the delimiter for each field. Common formats are , and ;.
Record terminator	Determines the terminator for each row of data. Common formats are \n and \r\n.
Quote character	Determines the character used for quotes.
Has header row	A Boolean value that determines if the input table includes a header row. If a header row is included, the headers are used as the field names. Field names are also used to predict geometry and time fields.
Encoding	The type of encoding used on the file. By default, this will be UTF-8.
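To see how these parameters interact, the sketch below parses delimited text with Python's csv module, passing the field delimiter, quote character, header flag, and encoding explicitly. It is an analogy for the parameters, not GeoAnalytics Server's own parser, and the generated field_0, field_1 names for files without a header row are hypothetical. The csv module recognizes both \n and \r\n as record terminators on its own.

```python
import csv
import io


def read_delimited(raw: bytes, field_delimiter: str = ",", quote_character: str = '"',
                   has_header_row: bool = True, encoding: str = "utf-8"):
    """Parse delimited bytes into (field_names, rows) using explicit parameters."""
    text = raw.decode(encoding)
    # newline="" leaves \r\n intact so the csv module can treat it as a terminator.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=field_delimiter, quotechar=quote_character)
    rows = list(reader)
    if has_header_row and rows:
        return rows[0], rows[1:]
    # Hypothetical generated names when the file has no header row.
    width = len(rows[0]) if rows else 0
    return [f"field_{i}" for i in range(width)], rows
```

For example, read_delimited(b'id;name\r\n1;"a;b"\r\n', field_delimiter=";") returns the field names id and name and a single row whose second value is a;b, because the quote character protects the embedded delimiter.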

Big data file share output template editing parameters

The big data file share output template editor comprises the following three sections:

Name and file type
Geometry formatting
Time formatting

Note:

The input big data file shares have a fields section. The output templates do not have a fields section, since the resulting fields are determined by the GeoAnalytics Tools creating the result. ORC only supports field names that include the Basic Latin alphabet and numeric characters. All other characters in a field name are replaced with an underscore.
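To preview how a field name will be written to ORC, you can apply the same replacement locally. This Python sketch assumes each disallowed character is replaced by a single underscore; the function name is hypothetical.

```python
import re

# Anything other than Basic Latin letters (A-Z, a-z) and digits (0-9).
_NOT_ORC_SAFE = re.compile(r"[^A-Za-z0-9]")


def orc_field_name(name: str) -> str:
    """Preview an ORC output field name, one underscore per replaced character."""
    return _NOT_ORC_SAFE.sub("_", name)


print(orc_field_name("Wind Speed (km/h)"))  # prints Wind_Speed__km_h_
```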

Output geometry formats

The geometry section lists how you want the output geometry to be formatted for each geometry type (point, line, polygon). There are two parts to determining the output geometry:

The spatial reference—Leave it empty to use the spatial reference of the tool results (default). Optionally, provide a WKID or WKT string, and all results are projected to that spatial reference. This value is shared across all output geometries.
The geometry formatting type and fields—This is described in more detail below.

For each template, you can define how you want to format the geometry of the dataset, as well as the field names that represent geometry. Depending on the dataset type (delimited files, shapefiles, ORC, or parquet), you can output results in different formats. Shapefiles will not have a specified format and will always write a shapefile dataset. The following table outlines those formats:

Output geometry formats


Geometry type	Output fields
XYZ—An X, Y, and optionally Z field. This option is only available for points.	By default, three new fields will be created named X, Y, and Z. You can optionally change these field names.
WKT	By default, one new field named Geometry will be created. You can optionally change the output field names.
GeoJSON	By default, one new field named Geometry will be created. You can optionally change the output field names.
EsriJSON	By default, one new field named Geometry will be created. You can optionally change the output field names.
WKB	By default, one new field named Geometry will be created. You can optionally change the output field names.
EsriShape	By default, one new field named Geometry will be created. You can optionally change the output field names.

Output time formats

The time section outlines how output time is represented. Formatting time requires the following information:

Formatting for both instants and intervals.
The field names to which time will be written.
The format (String or Date) in which time will be written. Delimited files can only use String.
For intervals, which fields represent the start and end time.

Time formatting is the same as for input big data file shares. See Time formats in big data file shares.

Output dataset format

The dataset format section outlines the output format to which the data will be written. Data may be in one of the following formats:

Shapefile (.shp)
Delimited file (for example .csv)
Parquet file
ORC file

The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the dataset is a delimited file, there will be multiple parameters that can be modified in ArcGIS Server Manager. These are outlined in the following table:

Dataset formats


Parameter	Description
File extension	Extensions are never applied to output datasets.
Field delimiter	Determines the delimiter for each field. Common formats are , and ;.
Record terminator	The terminator for each row of data cannot be set. For Windows, the terminator is \r\n. For Linux, it's \n.
Quote character	Determines the character used for quotes.
Has header row	A Boolean value that determines if the output table will include a header row representing the field names. By default, this is true.
Encoding	This will always be UTF-8.
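The output delimited-file parameters map onto Python's csv writer, as the sketch below shows. It always encodes UTF-8, writes a header row by default, and uses os.linesep as the record terminator, which is \r\n on Windows and \n on Linux. Note that os.linesep reflects the machine running the script, whereas the server uses its own operating system. The function name is hypothetical.

```python
import csv
import io
import os


def write_delimited(field_names, rows, field_delimiter=",", quote_character='"',
                    has_header_row=True) -> bytes:
    """Serialize rows using the output delimited-file parameters.

    The record terminator follows the host OS via os.linesep, the encoding
    is always UTF-8, and no file extension is involved.
    """
    # newline="" stops StringIO from translating the terminator csv writes.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=field_delimiter,
                        quotechar=quote_character, lineterminator=os.linesep)
    if has_header_row:
        writer.writerow(field_names)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")
```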
