Big data file shares are registered through your portal's content page. When you add a big data file share in portal, it also creates a related data store item. When you add a cloud store big data file share, it will create a big data file share item, a data store item of type big data file share, and a data store item of type cloud store. A big data file share portal item includes the following tabs:
- Overview—Provides general information on your big data file share and the related data store items. The related data store items can be shared and deleted with your big data file share.
- Datasets—Lists the datasets and outlines the schema of the input data. Dataset information includes the fields and formats that represent geometry and time.
- Outputs—Outlines optional output templates, which allow you to write results to a big data file share. Output templates are created after you register a big data file share. See Create, edit, and view output templates to learn how to create or edit an output template.
- Settings—Describes content status, extent, and delete protection.
You can view and edit the datasets and schema and the output templates through the big data file share item.
Note:
To share a Big Data File Share item, you must share the root data store item. The root data store for a big data file share of type Cloud is the Data Store (Cloud) item of the same name. For all other types of big data file shares (File Share, HDFS, and HIVE), the root data store is the Data Store (Big Data File Share) item of the same name.
Edit big data file shares
Once you have created a big data file share through your portal, you can use the big data file share item to view the datasets, edit the dataset formatting, or sync your big data file share to add datasets.
A big data file share is composed of one or more datasets. The number of datasets is dependent on the number of folders in your big data file share location. You can view the datasets that have been successfully registered in your big data file share.
If you expected to find more datasets in your big data file share or are missing any, do the following:
- Verify that you correctly registered the top-level folder. For more information, see Prepare your data.
- Check that your input data is in an allowable format, such as a collection of delimited files, shapefiles, parquet, or ORC.
- Ensure that the schema of your input dataset of interest is consistent for a collection of files (all files in a single dataset must have the same fields).
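The schema consistency check in the last item can be done before registration. The following is a minimal sketch, assuming each dataset folder holds comma-delimited .csv files with a header row; the function names and the demo folder are hypothetical and are not part of the product.

```python
import csv
import tempfile
from pathlib import Path


def header_of(path, delimiter=","):
    """Return the first row of a delimited file as a tuple of field names."""
    with open(path, newline="", encoding="utf-8") as f:
        return tuple(next(csv.reader(f, delimiter=delimiter), ()))


def inconsistent_files(folder, pattern="*.csv", delimiter=","):
    """Return the names of files whose header differs from the first file's header."""
    files = sorted(Path(folder).glob(pattern))
    if not files:
        return []
    expected = header_of(files[0], delimiter)
    return [p.name for p in files[1:] if header_of(p, delimiter) != expected]


# Demo with a throwaway folder standing in for one dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("id,x,y,time\n1,10,20,2016\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text("id,x,y,time\n2,11,21,2016\n", encoding="utf-8")
    Path(tmp, "c.csv").write_text("id,lon,lat\n3,12,22\n", encoding="utf-8")
    mismatched = inconsistent_files(tmp)

print(mismatched)  # ['c.csv']
```

This only compares header rows; it does not check field types or row contents.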
You can use the Datasets tab to verify the number of datasets within a big data file share or review the schema of a registered dataset. You can modify a selected dataset's schema by updating its geometry, time definition, and field names using the steps below.
Edit big data file share input datasets
Editing the big data file share item allows you to modify how your data is registered and is used for analysis. You can also use the edit option to view how your data is currently registered. For details about each option on this dialog box, see editing parameters in big data file shares. To edit dataset parameters, do the following:
- Open the Big Data File Share item in your portal contents.
- Click the Datasets tab.
- Click the Edit button beside the dataset you want to edit.
- Modify the dataset using the Fields, Geometry, Time, and File options.
- When you have finished editing dataset properties, click Save.
Edit a big data file share manifest or hints file
Using the Show advanced option on the Datasets tab of the big data file share, you can view, download, and upload the manifest or hints file. If you upload a manifest, it overwrites any changes you have made to your big data file share datasets and replaces the existing datasets and schema. To learn more about the big data file share manifest, see Big data file share manifest. To learn more about using a hints file, see Hints file. To edit a big data file share manifest or hints file, do the following:
- Open the Big Data File Share item in your portal contents.
- Click the Datasets tab.
- Click the Show advanced toggle button to turn it on.
- To download the manifest file, click Download in the manifest section.
- To download the hints file, click Download in the hints section.
- Use a text editor to modify the downloaded .json manifest file or .dat hints file, and save the changes locally.
Tip:
The default file format for the hints file is .dat. Once you've downloaded the file, you can change its extension to .txt and edit the file.
- To upload an edited file, in the big data file share, go to the Datasets tab, and turn on Show advanced.
- To upload the manifest, click Upload under manifest, and browse to the updated .json file.
- To upload the hints file, click Upload under hints, and browse to the updated .txt file.
- Click Upload.
If you upload a hints file, sync the big data file share. When you sync, only datasets with hints or new datasets are updated, and changes made to any other datasets not in the hints file remain the same.
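Because an uploaded manifest replaces the existing datasets and schema, it can help to confirm that a hand-edited manifest is still valid JSON before uploading it. The sketch below checks JSON syntax only. It makes no assumption about the manifest's keys, and the sample strings are placeholders rather than real manifest content.

```python
import json


def check_manifest_text(text):
    """Return (True, "") if text parses as JSON, else (False, reason)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, ""


# Placeholder content; a real manifest's structure is not assumed here.
good = '{"example": [1, 2, 3]}'
bad = '{"example": [1, 2, 3],}'  # a trailing comma is not valid JSON

ok_good, _ = check_manifest_text(good)
ok_bad, reason = check_manifest_text(bad)
print(ok_good, ok_bad)
print(reason)
```

To check a downloaded file, read it with `open(path, encoding="utf-8").read()` and pass the text to the function.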
Sync your big data file share
You can sync your big data file share if you add new datasets to your data source or if you have uploaded a hints file. The hints file provides specifications that are used when regenerating the big data file share.
Note:
When a big data file share is synced, only existing datasets that have hints and new datasets are updated. Any edits you have made to the datasets that are in the hints file are overwritten with the rules defined in the hints file.
To sync a big data file share, do the following:
- Open the Big Data File Share item in your portal contents.
- Click the Datasets tab.
- Click the Sync button.
Create, edit, and view output templates
You can create, view, or edit output templates. You can also edit attributes and settings for the output templates, which outline how output results are written to the big data file share.
To create an output template, complete the following steps:
- Open the Big Data File Share item in your portal contents.
- Click the Outputs tab.
- Click the Add output template button.
- Create a name for the output template and select the file type the output template will write to.
- Set the geometry formats for this template by clicking the Geometry tab. You can set them for one, two, or all geometry types. The formatting options are the same as input big data file shares.
- Set the time formats for this template by clicking the Time tab. You can leave the time blank, or set it for instant, interval, or both. The time formatting options are the same as input big data file share time formats.
- Click Save when you're done.
Use the same steps to view or edit a template.
Big data file share editing parameters
The big data file share editor comprises the following four sections:
- Fields
- Geometry
- Time
- File
It is recommended that you use a hints file before editing your data if manifest generation did not correctly determine field names, encoding, field delimiters, or quote characters of a delimited file.
Fields
The fields section lists all of the fields in a dataset. When you select a dataset, you can see the following for each field:
- The name of the field
- The field type
You can only modify the field name and type for delimited files. If you are modifying many field names, it is recommended that you use a hints file.
Geometry
The geometry section lists the type of geometry, how it is represented, and the spatial reference. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type:
Geometry parameters
Parameter | Description | Delimited files | Shapefiles | ORC files | Parquet files |
---|---|---|---|---|---|
Geometry | The geometry type. Options are Point, Polyline, Polygon, or None. If there is no geometry (None), the dataset is a table. | Editable | Cannot be modified | Editable | Editable |
Spatial reference (WKID/WKT) | The spatial reference of the dataset. This option is only shown if the geometry is not None. | Editable. By default, it will be set to 4326, WGS 1984. | Cannot be modified | Editable | Editable |
Geometry format type | How the geometry is formatted for each feature. Options are XYZ (fields that represent X, Y, and optionally Z values; XYZ is only applicable to points), WKT (well-known text), WKB (well-known binary), GeoJSON, EsriJSON, and EsriShape. This option is only shown if the geometry is not None. | Editable | Not available, option will not show. | Editable | Editable |
Geometry fields | This is used to specify which fields represent geometries. In some cases, the field must be a specific field type. WKB and EsriShape formats require a binary field, and GeoJSON and EsriJSON require a string field. XYZ fields must be numeric. This option is only shown if the geometry is not None. | Editable | Not available, option will not show. | Editable | Editable |
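To make the geometry format types and their field type requirements concrete, the following sketch encodes one point in several of the formats listed above using only the Python standard library. The coordinates and variable names are illustrative assumptions. EsriShape is omitted because its binary layout is not described here.

```python
import json
import struct

x, y = -117.19, 34.05  # illustrative longitude and latitude, WGS 1984 (WKID 4326)

# XYZ: separate numeric fields, one per coordinate.
xyz_row = {"X": x, "Y": y}

# WKT: a single string field.
wkt = f"POINT ({x} {y})"

# GeoJSON and EsriJSON: a single string field holding JSON text.
geojson = json.dumps({"type": "Point", "coordinates": [x, y]})
esrijson = json.dumps({"x": x, "y": y, "spatialReference": {"wkid": 4326}})

# WKB: a single binary field. Layout for a 2D point:
# byte order flag (1 = little endian), geometry type (1 = Point), x, y as doubles.
wkb = struct.pack("<BIdd", 1, 1, x, y)

print(wkt)
print(geojson)
print(esrijson)
print(len(wkb), "bytes of WKB")
```

The string formats can be stored in a delimited file as text, while WKB needs a file type that supports binary fields.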
Time
The time section outlines how time is represented. The following table outlines the available options, with notes for changes you can make, depending on the input dataset type. Time options are the same for all data types, except where noted.
Time parameters
Parameter | Description | Example |
---|---|---|
Time type | The type of the input time. Options are Instant (a single moment in time), Interval (a span of time with a start and end time), and None. | Instant |
Time fields, Start time fields and End time fields | If you select Instant, you will see Time fields. If you select Interval, you will see Start time fields and End time fields. These options specify the fields and formatting used to define time in your input data. Time can use one or more fields to define time, as well as use one or more formats for a single field. By default, the first field with the name time will be used as the time field, with an estimate of the time format. If there is a shapefile, the first field of type date will be used. At least one row must be populated for these tables. See time formats to learn more about formatting. The time formatting table is only available if Time Type is not None. | Example with a single field used to represent time with two different formats. Example with two fields used to represent time. |
Time zone | The time zone of the input time. This option is only available if Time Type is not None. The default is UTC. | UTC |
Time formats
The following table outlines how to represent time formatting. All examples show how to represent the time January 2, 2016, at 9:45:02.05 PM.
Time formats in big data file shares
Format | Meaning | Example |
---|---|---|
yy | The year, represented by two digits. | 16 |
yyyy | The year, represented by four digits. | 2016 |
MM | The month, represented numerically. | 01 or 1 |
MMM | The month, represented using three letters. | Jan |
MMMM | The month, represented using the complete spelling. | January |
dd | The day. | 02 or 2 |
HH | The hour when using a 24-hour day; values range from 0-23. | 21 |
hh | The hour when using a 12-hour day; values range from 1-12. | 9 |
mm | The minute; values range from 0-59. | 45 |
ss | The second; values range from 0-59. | 02 |
SSS | The millisecond; values range from 0-999. | 050 |
a | The AM/PM marker. | PM |
epoch_millis | The time in milliseconds from epoch. | 1451771102050 |
epoch_seconds | The time in seconds from epoch. | 1451771102 |
Z | The time zone offset expressed in hours. | -0100 or -01:00 |
ZZZ | The time zone offset expressed using IDs. | America/Los_Angeles |
'' | Use single quotes to add text that doesn't represent a value outlined in this table. | 'T' |
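The epoch_seconds and epoch_millis formats count from January 1, 1970, 00:00:00 UTC. As a hedged illustration, the sketch below computes both values for the reference time January 2, 2016, at 9:45:02.05 PM, assuming that time is in UTC; a different time zone shifts the result by the zone offset.

```python
from datetime import datetime, timezone

# Reference time: January 2, 2016, 9:45:02.05 PM, assumed to be UTC.
moment = datetime(2016, 1, 2, 21, 45, 2, 50_000, tzinfo=timezone.utc)

# Use integer arithmetic to avoid floating-point rounding.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = moment - epoch
epoch_millis = (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000
epoch_seconds = epoch_millis // 1000

print(epoch_seconds)  # 1451771102
print(epoch_millis)   # 1451771102050
```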
The following table shows examples for different formats of the same date, January 2, 2016, at 9:45:02.05 PM:
Time format examples
Input date | Format |
---|---|
01/02/2016 9:45:02PM | MM/dd/yyyy hh:mm:ssa |
Jan02-16 21:45:02 | MMMdd-yy HH:mm:ss |
January 02 2016 9:45:02.050PM | MMMM dd yyyy hh:mm:ss.SSSa |
01/02/2016T21:45:02-0000 | MM/dd/yyyy'T'HH:mm:ssZ |
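One way to sanity-check a format string against sample values is to parse them outside the portal. The sketch below parses the first three input dates from the table above with Python's `datetime.strptime`. Python uses its own directives (`%m`, `%d`, and so on) rather than the tokens in this topic, so the mapping shown in the comments is an assumption made for illustration and not the parser the product uses. Month names and the AM/PM marker are parsed using the current locale, which is assumed to be English.

```python
from datetime import datetime

# (input date, Python directive string)  # corresponding format from the table
samples = [
    ("01/02/2016 9:45:02PM", "%m/%d/%Y %I:%M:%S%p"),              # MM/dd/yyyy hh:mm:ssa
    ("Jan02-16 21:45:02", "%b%d-%y %H:%M:%S"),                    # MMMdd-yy HH:mm:ss
    ("January 02 2016 9:45:02.050PM", "%B %d %Y %I:%M:%S.%f%p"),  # MMMM dd yyyy hh:mm:ss.SSSa
]

parsed = [datetime.strptime(text, fmt) for text, fmt in samples]
for value in parsed:
    print(value.isoformat())
```

All three inputs resolve to the same date and time of day; only the third carries the fractional second.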
File
The file section outlines the format the data is in. Data may be in one of the following formats:
- Shapefile (.shp)
- Delimited file (for example .csv)
- Parquet file
- ORC file
The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the input dataset is a delimited file, there are multiple parameters, which are outlined in the following table. To modify these values for a delimited file, use a hints file and regenerate the manifest.
Dataset formats
Parameter | Description |
---|---|
File extension | Lists the file type extension on the input dataset. Common formats are .csv and .txt. |
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. |
Record terminator | Determines the terminator for each row of data. Common formats are \n and \t. |
Quote character | Determines the character used for quotes. |
Has header row | A Boolean value that determines if the input table includes a header row. If a header row is included, the headers will be used for the field names. Field name information is used to predict geometry and time fields. |
Encoding | The type of encoding used on the file. By default, this will be UTF-8. |
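To show how these parameters interact when a delimited file is read, the following sketch applies a field delimiter, quote character, header setting, and encoding with Python's `csv` module. The `settings` dictionary and its key names are hypothetical and only mirror the table above; they are not a hints file or manifest schema.

```python
import csv
import io

# Hypothetical settings mirroring the parameters in the table above.
settings = {
    "field_delimiter": ";",
    "quote_character": '"',
    "has_header_row": True,
    "encoding": "utf-8",
}

# Sample bytes as they might appear on disk. The quoted value contains the delimiter.
raw = 'id;name;time\n1;"Smith; J.";2016-01-02\n'.encode(settings["encoding"])

text = io.StringIO(raw.decode(settings["encoding"]), newline="")
reader = csv.reader(
    text,
    delimiter=settings["field_delimiter"],
    quotechar=settings["quote_character"],
)
rows = list(reader)

if settings["has_header_row"]:
    field_names, records = rows[0], rows[1:]
else:
    # Placeholder names when no header row is present.
    field_names = [f"col_{i}" for i in range(len(rows[0]))]
    records = rows

print(field_names)  # ['id', 'name', 'time']
print(records)      # [['1', 'Smith; J.', '2016-01-02']]
```

If the quote character were wrong, the second field would split at the embedded semicolon, which is the kind of mismatch a hints file is meant to correct.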
Big data file share output template editing parameters
The big data file share output template editor comprises the following three sections:
- Name and file type
- Geometry formatting
- Time formatting
Note:
The input big data file shares have a fields section. The output templates do not have a fields section, since the resulting fields are determined by the GeoAnalytics Tools creating the result. ORC only supports field names that include the Basic Latin alphabet and numeric characters. All other characters in a field name are replaced with an underscore.
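The ORC field name rule in the note above can be approximated as follows. This is a sketch of the stated rule only, assuming "Basic Latin alphabet and numeric characters" means A-Z, a-z, and 0-9 and that each other character is replaced individually; the function name is hypothetical.

```python
import re


def orc_safe_name(name):
    """Replace every character outside A-Z, a-z, and 0-9 with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", name)


print(orc_safe_name("mean speed (km/h)"))  # mean_speed__km_h_
print(orc_safe_name("población_total"))    # poblaci_n_total
```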
Output geometry formats
The geometry section lists how you want the output geometry to be formatted for each geometry type (point, line, polygon). There are two parts to determining the output geometry:
- The spatial reference—You can leave it empty, and it uses the tool results (default). Optionally, provide a WKID or WKT string, and all results are projected to that spatial reference. This value is shared across all output geometries.
- The geometry formatting type and fields—This is described in more detail below.
Output geometry formats
Geometry type | Output Fields | Delimited files | Shapefiles | ORC files | Parquet files |
---|---|---|---|---|---|
XYZ—An X, Y, and optionally Z field. This option is only available for points. | By default, three new fields will be created named X, Y, and Z. You can optionally change these field names. | ||||
WKT | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
GeoJSON | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
EsriJSON | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
WKB | By default, one new field named Geometry will be created. You can optionally change the output field names. | ||||
EsriShape | By default, one new field named Geometry will be created. You can optionally change the output field names. |
Output time formats
The time section outlines how output time is represented. Formatting time requires the following information:
- Formatting for both instants and intervals.
- The field names to which time will be written.
- The type (String or Date) that time will be written as. Note that delimited files can only write time as String.
- For intervals, which fields represent the start and end time.
Time formatting is the same as for input big data file shares. See Time formats in big data file shares.
Output dataset format
The dataset format section outlines the output format to which the data will be written. Data may be in one of the following formats:
- Shapefile (.shp)
- Delimited file (for example .csv)
- Parquet file
- ORC file
The available parameters differ, depending on the dataset. For shapefiles, ORC, and parquet files, the only parameter is the file type, which cannot be modified. If the dataset is a delimited file, there will be multiple parameters that can be modified in ArcGIS Server Manager. These are outlined in the following table:
Dataset formats
Parameter | Description |
---|---|
File extension | Extensions are never applied to output datasets. |
Field delimiter | Determines the delimiter for each field. Common formats are , and ;. |
Record terminator | The terminator for each row of data cannot be set. For Windows, the terminator is \r\n. For Linux, it's \n. |
Quote character | Determines the character used for quotes. |
Has header row | A Boolean value that determines if the output table will include a header row representing the field names. By default, this is true. |
Encoding | This will always be UTF-8. |
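As an illustration of what these output settings produce, the sketch below writes a small delimited result with a semicolon delimiter, a header row, the platform's record terminator, and UTF-8 encoding. The field names and values are made up for the example, and the code only imitates the behavior described in the table; it is not how the tools write output.

```python
import csv
import io
import os

fields = ["id", "X", "Y"]
records = [[1, -117.19, 34.05], [2, -118.24, 34.05]]

# newline="" stops Python from translating the terminator written by csv.
buffer = io.StringIO(newline="")
writer = csv.writer(
    buffer,
    delimiter=";",               # field delimiter
    quotechar='"',               # quote character
    lineterminator=os.linesep,   # \r\n on Windows, \n on Linux
)
writer.writerow(fields)          # header row (true by default in the template)
writer.writerows(records)

output_bytes = buffer.getvalue().encode("utf-8")
lines = output_bytes.decode("utf-8").split(os.linesep)
print(lines[0])  # id;X;Y
print(lines[1])  # 1;-117.19;34.05
```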