Add md as an alias for Markdown #6338

Merged
merged 2 commits into from
May 30, 2023
Conversation

lildude
Member

@lildude lildude commented Mar 22, 2023

Description

Many people use ```md for Markdown codeblocks within Markdown files. This is currently only declared as an extension which is shared with "GCC Machine Description". When markup sees md it searches for the language name, aliases and then extensions in order, and as "GCC Machine Description" comes first alphabetically, Lisp syntax highlighting is applied and not Markdown.

This resolves that by adding md as an alias to Markdown.
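
To make the lookup order concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model and not the actual markup or Linguist code: the `LANGUAGES` table is a hand-trimmed, illustrative subset of `languages.yml`, and `resolve` is a hypothetical helper name.

```python
# Illustrative, hand-trimmed subset of languages.yml (assumption, not the real data).
LANGUAGES = {
    "GCC Machine Description": {"aliases": [], "extensions": [".md"]},
    "Markdown": {"aliases": [], "extensions": [".md", ".markdown"]},
}


def resolve(info, languages):
    """Resolve an info string by language name, then alias, then extension."""
    token = info.strip().lower()
    names = sorted(languages)  # alphabetical, so ties go to the earlier name

    for name in names:
        if name.lower() == token:
            return name
    for name in names:
        if token in (alias.lower() for alias in languages[name]["aliases"]):
            return name
    ext = token if token.startswith(".") else "." + token
    for name in names:
        if ext in languages[name]["extensions"]:
            return name
    return None


before = resolve("md", LANGUAGES)

# The change proposed in this PR: declare "md" as an alias of Markdown.
patched = {
    name: {**data, "aliases": list(data["aliases"])}
    for name, data in LANGUAGES.items()
}
patched["Markdown"]["aliases"].append("md")
after = resolve("md", patched)

print(before)  # GCC Machine Description
print(after)   # Markdown
```

In this model the alias is checked before any extension, so the alphabetical tie on `.md` is never reached for a bare `md`.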

🎩 to @wooorm in #6335 for bringing this to my attention and making me realise this could be resolved from Linguist 🙇

Checklist:

  • I am fixing a misclassified language
    • I have included a new sample for the misclassified language:
      • Sample source(s): N/A
      • Sample license(s): N/A
    • I have included a change to the heuristics to distinguish my language from others using the same extension. N/A

The final checkboxes aren't applicable as this is "fixing" a misclassification in markup, not Linguist.

@lildude lildude requested a review from a team as a code owner March 22, 2023 11:41
@wooorm
Contributor

wooorm commented Mar 22, 2023

Oh my gosh this is awesome! Great, thanks! :)

I was also thinking, after my last comment, alternatively, that there might be a more involved, but perhaps ultimately better solution, also for other coalescing extensions.
Also maintain a list of unique extensions mapping to language names.
It would be useful for github/markup, but also for other tools that only have a file name or extension name, and for whatever reason can’t apply heuristics/infer modelines based on file contents.

@lildude
Member Author

lildude commented Mar 22, 2023

Also maintain a list of unique extensions mapping to language names.
It would be useful for github/markup, but also for other tools that only have a file name or extension name, and for whatever reason can’t apply heuristics/infer modelines based on file contents.

We've got this already... it's the languages.yml file 😁 This can be parsed by any app that isn't written in Ruby or doesn't want to use Linguist to do as you've suggested. I'm not a fan of adding another file to do this as it would be something else to maintain and would be unnecessary overhead as Linguist doesn't need it.
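
As a rough illustration of that suggestion, the sketch below builds an extension-to-languages index from data shaped like `languages.yml` and lists the extensions claimed by more than one language. The `SAMPLE` dict stands in for the parsed YAML file (loading the real file needs a YAML parser, which is left out here), and `extension_index` and `shared_extensions` are hypothetical helper names.

```python
from collections import defaultdict

# Stand-in for the parsed languages.yml: language name -> attributes.
# The extension lists are trimmed for illustration (assumption).
SAMPLE = {
    "C#": {"extensions": [".cs", ".cake"]},
    "CoffeeScript": {"extensions": [".coffee", ".cake"]},
    "GCC Machine Description": {"extensions": [".md"]},
    "Markdown": {"extensions": [".md", ".markdown"]},
    "Smalltalk": {"extensions": [".st", ".cs"]},
}


def extension_index(languages):
    """Map each extension to the alphabetical list of languages declaring it."""
    index = defaultdict(list)
    for name in sorted(languages):
        for ext in languages[name].get("extensions", []):
            index[ext].append(name)
    return dict(index)


def shared_extensions(languages):
    """Keep only the extensions declared by more than one language."""
    return {
        ext: names
        for ext, names in extension_index(languages).items()
        if len(names) > 1
    }


for ext, names in sorted(shared_extensions(SAMPLE).items()):
    print(ext, "->", " & ".join(names))
```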

@wooorm
Contributor

wooorm commented Mar 22, 2023

I looked at duplicates in that file, to see what else you could use the same “hack” for. Here’s the result.

Giant table!

| Extension | With `.`, maps to | Without `.`, maps to | Alternatives |
| --- | --- | --- | --- |
| `.1` | Roff | n/a | Roff & Roff Manpage |
| `.1in` | Roff | n/a | Roff & Roff Manpage |
| `.1m` | Roff | n/a | Roff & Roff Manpage |
| `.1x` | Roff | n/a | Roff & Roff Manpage |
| `.2` | Roff | n/a | Roff & Roff Manpage |
| `.3` | Roff | n/a | Roff & Roff Manpage |
| `.3in` | Roff | n/a | Roff & Roff Manpage |
| `.3m` | Roff | n/a | Roff & Roff Manpage |
| `.3p` | Roff | n/a | Roff & Roff Manpage |
| `.3pm` | Roff | n/a | Roff & Roff Manpage |
| `.3qt` | Roff | n/a | Roff & Roff Manpage |
| `.3x` | Roff | n/a | Roff & Roff Manpage |
| `.4` | Roff | n/a | Roff & Roff Manpage |
| `.5` | Roff | n/a | Roff & Roff Manpage |
| `.6` | Roff | n/a | Roff & Roff Manpage |
| `.7` | Roff | n/a | Roff & Roff Manpage |
| `.8` | Roff | n/a | Roff & Roff Manpage |
| `.9` | Roff | n/a | Roff & Roff Manpage |
| `.al` | AL | AL | AL & Perl |
| `.as` | ActionScript | n/a | ActionScript & AngelScript |
| `.asc` | AGS Script | n/a | AGS Script, AsciiDoc, & Public Key |
| `.asm` | Assembly | Assembly | Assembly & Motorola 68K Assembly |
| `.asy` | Asymptote | n/a | Asymptote & LTspice Symbol |
| `.b` | Brainfuck | n/a | Brainfuck & Limbo |
| `.bas` | BASIC | n/a | BASIC, FreeBasic, & VBA |
| `.bb` | BitBake | n/a | BitBake, BlitzBasic, & Clojure |
| `.bf` | Beef | n/a | Beef, Befunge, Brainfuck, & HyPhy |
| `.brd` | Eagle | n/a | Eagle & KiCad Legacy Layout |
| `.bs` | Bikeshed | n/a | Bikeshed & BrighterScript |
| `.cake` | C# | C# | C# & CoffeeScript |
| `.cfg` | HAProxy | n/a | HAProxy & INI |
| `.cgi` | Perl | n/a | Perl, Python, & Shell |
| `.ch` | Charity | n/a | Charity & xBase |
| `.cl` | Common Lisp | n/a | Common Lisp, Cool, & OpenCL |
| `.cls` | Apex | n/a | Apex, ObjectScript, OpenEdge ABL, TeX, VBA, & Visual Basic 6.0 |
| `.cp` | C++ | n/a | C++ & Component Pascal |
| `.cs` | C# | n/a | C# & Smalltalk |
| `.csl` | Kusto | n/a | Kusto & XML |
| `.cue` | CUE | CUE | CUE & Cue Sheet |
| `.d` | D | D | D, DTrace, & Makefile |
| `.ddl` | PLSQL | n/a | PLSQL & SQL |
| `.dsc` | Debian Package Control File | n/a | Debian Package Control File & DenizenScript |
| `.dsp` | Faust | n/a | Faust & Microsoft Developer Studio Project |
| `.e` | E | E | E, Eiffel, & Euphoria |
| `.ecl` | ECL | ECL | ECL & ECLiPSe |
| `.es` | Erlang | n/a | Erlang & JavaScript |
| `.ex` | Elixir | n/a | Elixir & Euphoria |
| `.f` | Filebench WML | n/a | Filebench WML, Forth, & Fortran |
| `.fcgi` | Lua | n/a | Lua, PHP, Perl, Python, Ruby, & Shell |
| `.for` | Formatted | n/a | Formatted, Forth, & Fortran |
| `.fr` | Forth | n/a | Forth, Frege, & Text |
| `.frag` | GLSL | n/a | GLSL & JavaScript |
| `.frm` | VBA | n/a | VBA & Visual Basic 6.0 |
| `.fs` | F# | n/a | F#, Filterscript, Forth, & GLSL |
| `.ftl` | Fluent | FreeMarker | Fluent & FreeMarker |
| `.fx` | FLUX | n/a | FLUX & HLSL |
| `.g` | G-code | n/a | G-code & GAP |
| `.gd` | GAP | n/a | GAP & GDScript |
| `.gml` | Game Maker Language | n/a | Game Maker Language, Gerber Image, Graph Modeling Language, & XML |
| `.gs` | GLSL | n/a | GLSL, Genie, Gosu, & JavaScript |
| `.gst` | Gosu | n/a | Gosu & XML |
| `.h` | C | n/a | C, C++, & Objective-C |
| `.hh` | C++ | n/a | C++ & Hack |
| `.html` | Ecmarkup | HTML | Ecmarkup & HTML |
| `.i` | Assembly | n/a | Assembly, Motorola 68K Assembly, & SWIG |
| `.ice` | JSON | n/a | JSON & Slice |
| `.inc` | Assembly | PHP | Assembly, C++, HTML, Motorola 68K Assembly, NASL, PHP, POV-Ray SDL, Pascal, Pawn, SQL, & SourcePawn |
| `.j` | Jasmin | J | Jasmin & Objective-J |
| `.jq` | JSONiq | jq | JSONiq & jq |
| `.json` | JSON | JSON | JSON, OASv2-json, & OASv3-json |
| `.ks` | KerboScript | n/a | KerboScript & Kickstart |
| `.l` | Common Lisp | n/a | Common Lisp, Lex, PicoLisp, & Roff |
| `.lisp` | Common Lisp | Common Lisp | Common Lisp & NewLisp |
| `.ls` | LiveScript | LiveScript | LiveScript & LoomScript |
| `.lsp` | Common Lisp | n/a | Common Lisp & NewLisp |
| `.m` | Limbo | M | Limbo, M, MATLAB, MUF, Mathematica, Mercury, & Objective-C |
| `.m4` | M4 | M4 | M4 & M4Sugar |
| `.man` | Roff | Roff | Roff & Roff Manpage |
| `.mask` | Mask | Mask | Mask & Unity3D Asset |
| `.mc` | M4 | n/a | M4, Monkey C, & Win32 Message File |
| `.md` | GCC Machine Description | n/a | GCC Machine Description & Markdown |
| `.mdoc` | Roff | Roff | Roff & Roff Manpage |
| `.ml` | OCaml | n/a | OCaml & Standard ML |
| `.mm` | Objective-C++ | n/a | Objective-C++ & XML |
| `.mo` | Modelica | n/a | Modelica & Motoko |
| `.mod` | AMPL | n/a | AMPL, Linux Kernel Module, Modula-2, & XML |
| `.moo` | Mercury | n/a | Mercury & Moocode |
| `.mqh` | MQL4 | n/a | MQL4 & MQL5 |
| `.ms` | MAXScript | n/a | MAXScript, Roff, & Unix Assembly |
| `.n` | Nemerle | n/a | Nemerle & Roff |
| `.nas` | Assembly | n/a | Assembly & Nasal |
| `.nb` | Mathematica | n/a | Mathematica & Text |
| `.ncl` | Gerber Image | NCL | Gerber Image, NCL, Text, & XML |
| `.nl` | NL | NL | NL & NewLisp |
| `.odin` | Object Data Instance Notation | Odin | Object Data Instance Notation & Odin |
| `.p` | Gnuplot | n/a | Gnuplot & OpenEdge ABL |
| `.pbt` | PowerBuilder | n/a | PowerBuilder & Protocol Buffer Text Format |
| `.php` | Hack | PHP | Hack & PHP |
| `.pl` | Perl | n/a | Perl, Prolog, & Raku |
| `.plist` | OpenStep Property List | n/a | OpenStep Property List & XML Property List |
| `.plt` | Gnuplot | n/a | Gnuplot & Prolog |
| `.pluginspec` | Ruby | n/a | Ruby & XML |
| `.pm` | Perl | n/a | Perl, Raku, & X PixMap |
| `.pod` | Pod | Pod | Pod & Pod 6 |
| `.pp` | Pascal | n/a | Pascal & Puppet |
| `.prc` | PLSQL | n/a | PLSQL & SQL |
| `.pro` | IDL | n/a | IDL, INI, Proguard, Prolog, & QMake |
| `.properties` | INI | n/a | INI & Java Properties |
| `.q` | HiveQL | q | HiveQL & q |
| `.qs` | Q# | n/a | Q# & Qt Script |
| `.r` | R | R | R & Rebol |
| `.re` | C++ | n/a | C++ & Reason |
| `.res` | ReScript | n/a | ReScript & XML |
| `.rno` | RUNOFF | n/a | RUNOFF & Roff |
| `.rpy` | Python | n/a | Python & Ren'Py |
| `.rs` | RenderScript | Rust | RenderScript, Rust, & XML |
| `.rsc` | Rascal | n/a | Rascal & RouterOS Script |
| `.s` | Motorola 68K Assembly | n/a | Motorola 68K Assembly & Unix Assembly |
| `.sc` | Scala | n/a | Scala & SuperCollider |
| `.scd` | Markdown | n/a | Markdown & SuperCollider |
| `.sch` | Eagle | n/a | Eagle, KiCad Schematic, Scheme, & XML |
| `.shader` | GLSL | n/a | GLSL & ShaderLab |
| `.sls` | SaltStack | n/a | SaltStack & Scheme |
| `.sol` | Gerber Image | n/a | Gerber Image & Solidity |
| `.spec` | Python | n/a | Python, RPM Spec, & Ruby |
| `.sql` | PLSQL | SQL | PLSQL, PLpgSQL, SQL, SQLPL, & TSQL |
| `.srt` | SRecode Template | n/a | SRecode Template & SubRip Text |
| `.st` | Smalltalk | n/a | Smalltalk & StringTemplate |
| `.star` | STAR | STAR | STAR & Starlark |
| `.sw` | Sway | n/a | Sway & XML |
| `.t` | Perl | n/a | Perl, Raku, Terra, & Turing |
| `.toc` | TeX | n/a | TeX & World of Warcraft Addon Data |
| `.ts` | TypeScript | TypeScript | TypeScript & XML |
| `.tst` | GAP | n/a | GAP & Scilab |
| `.tsx` | TSX | TSX | TSX & XML |
| `.txt` | Adblock Filter List | n/a | Adblock Filter List, Text, & Vim Help File |
| `.v` | Coq | V | Coq, V, & Verilog |
| `.vba` | VBA | VBA | VBA & Vim Script |
| `.vhost` | ApacheConf | n/a | ApacheConf & Nginx |
| `.w` | CWeb | n/a | CWeb & OpenEdge ABL |
| `.workflow` | HCL | n/a | HCL & XML |
| `.x` | DirectX 3D File | n/a | DirectX 3D File, Linker Script, Logos, & RPC |
| `.yaml` | MiniYAML | YAML | MiniYAML, OASv2-yaml, OASv3-yaml, & YAML |
| `.yml` | MiniYAML | YAML | MiniYAML, OASv2-yaml, OASv3-yaml, & YAML |
| `.yy` | JSON | n/a | JSON & Yacc |

Some explanation:
First and last columns are probs pretty obvious.
Second and third, with and without dot, are more complex.

To illustrate, let’s look at .inc, which has a bunch of different potential matches.
It’s marked as Assembly if with a dot, and as PHP when without:

` ```.inc `:

```.inc
mov	[con_handle],STD_OUTPUT_HANDLE ; assembly?
class PullRequest { static $_TSPEC; } // php?
```

` ```inc `:

```inc
mov	[con_handle],STD_OUTPUT_HANDLE ; assembly?
class PullRequest { static $_TSPEC; } // php?
```

For reference, ` ```asm `:

```asm
mov	[con_handle],STD_OUTPUT_HANDLE ; assembly?
class PullRequest { static $_TSPEC; } // php?
```

For reference, ` ```php `:

```php
mov	[con_handle],STD_OUTPUT_HANDLE ; assembly?
class PullRequest { static $_TSPEC; } // php?
```

Note that probably no one knows that you can use a dot in an info string like this! But it’s interesting to show the difference.

It’s also important to understand that when an extension is listed but not classified as a name (so say .mdx prior to my recent PR), it is possible to use it without a dot too.
That’s what most folks likely do use: ```mdx to match MDX, even though no name is specified.

Now, what you show in this PR is that an extension which would otherwise be classified wrongly can be added, without its dot, as a name.
When the third column is n/a (meaning a name is still available), and the second column is wrong, you can add the extension name (without a dot) to one of the alternatives.

To illustrate, .es is currently classified as Erlang instead of JavaScript.
Assuming after some investigation it turns out most folks use ```es to signal JavaScript instead (doubtful), the name alias ES could be added for JavaScript, fixing the code blocks!
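
A small Python sketch of that rule, under stated assumptions: the `LANGUAGES` data is an illustrative, trimmed stand-in for `languages.yml`, and `claimed_as_name` and `can_add_alias` are hypothetical helpers, not Linguist APIs. It checks that the bare extension is not yet taken as a name or alias (the "n/a" in the third column) and that the target language really lists the extension.

```python
# Illustrative data only; the alias and extension lists are trimmed (assumption).
LANGUAGES = {
    "Erlang": {"aliases": [], "extensions": [".erl", ".es"]},
    "JavaScript": {"aliases": ["js", "node"], "extensions": [".js", ".es"]},
}


def claimed_as_name(token, languages):
    """Return the language that owns `token` as a name or alias, if any."""
    token = token.lower()
    for name in sorted(languages):
        known = [name.lower()] + [a.lower() for a in languages[name]["aliases"]]
        if token in known:
            return name
    return None


def can_add_alias(ext, target, languages):
    """True if the bare extension is still free and `target` declares `ext`."""
    bare = ext.lstrip(".")
    return (
        claimed_as_name(bare, languages) is None
        and ext in languages[target]["extensions"]
    )


print(can_add_alias(".es", "JavaScript", LANGUAGES))  # True

# Apply the hypothetical fix, then the name is taken.
LANGUAGES["JavaScript"]["aliases"].append("es")
print(claimed_as_name("es", LANGUAGES))           # JavaScript
print(can_add_alias(".es", "Erlang", LANGUAGES))  # False
```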

@lildude
Member Author

lildude commented Mar 22, 2023

Thanks @wooorm. That's way more than you really needed to do 🙇

Let's keep things simple and start with just this md/markdown case and consider the others as and when they come up.

@Alhadis Alhadis changed the title Add md as an alias for Markdown Add md as an alias for Markdown Mar 22, 2023
@wooorm
Contributor

wooorm commented Mar 22, 2023

Hah, yeah.
Just trying to match GitHub as closely as I can. Will be quite useful for the markdown grammar for example, as that should show people when embedded languages will or won’t highlight. Thought I’d share ;)

Also, TIL, you can put filenames and entire paths in there?! I never knew.

Input:

```example.js
console.log(1)
```

```folder/to/index.tsx
export function huh() {
  return <div />
}
```

Output:

console.log(1)
export function huh() {
  return <div />
}
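
How GitHub actually handles such info strings is not shown in this thread. The sketch below is only one plausible model, written in Python: it takes the first word of the info string and, if that word looks like a file name or path, also offers its extension as a lookup key. `info_candidates` is a hypothetical helper name.

```python
import posixpath


def info_candidates(info):
    """Return lookup keys a highlighter might try for an info string (a guess)."""
    words = info.split()
    if not words:
        return []
    word = words[0]  # this sketch only looks at the first word
    base = posixpath.basename(word)
    _, ext = posixpath.splitext(base)
    candidates = [word.lower()]
    if ext and ext.lower() != word.lower():
        candidates.append(ext.lower())  # e.g. ".tsx" for "folder/to/index.tsx"
    return candidates


print(info_candidates("example.js"))           # ['example.js', '.js']
print(info_candidates("folder/to/index.tsx"))  # ['folder/to/index.tsx', '.tsx']
print(info_candidates("md"))                   # ['md']
```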

@Alhadis
Collaborator

Alhadis commented Mar 22, 2023

@wooorm Wanna know another cool trick? In markup formats other than Markdown (Wikitext, reStructuredText, AsciiDoc, etc), you can achieve highlighted code-blocks using specially prepared HTML. Similar hacks exist for footnotes, diagrams, and equations.

Collaborator

@Alhadis Alhadis left a comment

Seems like a fairly harmless addition to me.

@wooorm
Contributor

wooorm commented Mar 23, 2023

Yeah, weird stuff in the depths of GitHub! I don’t know much about how the other languages work, but I did know you can embed things in markdown! I know of ```mermaid, ```geojson, ```topojson, and ```stl. Also knew about the <pre lang="stuff">, but didn’t know about <a href="...mp3"> turning into a video, nor that you can turn <sup>s and <section>s into footnotes (which makes sense)! Thanks :)

examples of mermaid, geojson, topojson, stl
```mermaid
graph TD;
    A-->B;
    A-->C;
    B-->D;
    C-->D;
```

```geojson
{
    "type": "Point",
    "coordinates": [-122.43, 37.77],
    "properties": {"name": "SF", "marker-color": "ff6347"}
}
```

```topojson
{
  "type": "Topology",
  "objects": {
    "example": {
      "type": "GeometryCollection",
      "geometries": [
        {
          "type": "Point",
          "coordinates": [-122.43, 37.77],
          "properties": {"name": "SF", "marker-color": "ff6347"}
        }
      ]
    }
  }
}
```

```stl
solid square
   facet normal -1 0 0
      outer loop
         vertex 0 100 100
         vertex 0 100 0
         vertex 0 0 100
      endloop
   endfacet
   facet normal -1 0 0
      outer loop
         vertex 0 0 100
         vertex 0 100 0
         vertex 0 0 0
      endloop
   endfacet
   facet normal 0 0 1
      outer loop
         vertex 100 100 100
         vertex 0 100 100
         vertex 100 0 100
      endloop
   endfacet
   facet normal 0 0 1
      outer loop
         vertex 100 0 100
         vertex 0 100 100
         vertex 0 0 100
      endloop
   endfacet
   facet normal 1 0 0
      outer loop
         vertex 100 100 0
         vertex 100 100 100
         vertex 100 0 0
      endloop
   endfacet
   facet normal 1 0 0
      outer loop
         vertex 100 0 0
         vertex 100 100 100
         vertex 100 0 100
      endloop
   endfacet
   facet normal 0 0 -1
      outer loop
         vertex 0 100 0
         vertex 100 100 0
         vertex 0 0 0
      endloop
   endfacet
   facet normal 0 0 -1
      outer loop
         vertex 0 0 0
         vertex 100 100 0
         vertex 100 0 0
      endloop
   endfacet
   facet normal 0 1 0
      outer loop
         vertex 100 100 100
         vertex 100 100 0
         vertex 0 100 100
      endloop
   endfacet
   facet normal 0 1 0
      outer loop
         vertex 0 100 100
         vertex 100 100 0
         vertex 0 100 0
      endloop
   endfacet
   facet normal 0 -1 0
      outer loop
         vertex 100 0 0
         vertex 100 0 100
         vertex 0 0 0
      endloop
   endfacet
   facet normal 0 -1 0
      outer loop
         vertex 0 0 0
         vertex 100 0 100
         vertex 0 0 100
      endloop
   endfacet
endsolid
```

wooorm added a commit to wooorm/starry-night that referenced this pull request Mar 24, 2023
GitHub includes several languages that have the same extensions.
For example, `.cake` is used for CoffeeScript and for C#.
But which one is used for ` ```cake `?
The answer is complex.
But this commit solves it, it removes duplicate extensions from
languages that will not get highlighted as such if you use them.
Practically, that means `.cake` is no longer present here as an
extension for CoffeeScript.
Previously, which language was chosen was decided based on in which
order languages were loaded: later languages “won”.

There are some cases where a common extension, such as `.md`,
is *not* used to highlight as markdown, which one might expect.
It’s instead highlighted as Lisp.
That’s unfortunate, but it’s better to match how GitHub works, and
these cases can be fixed upstream in `github/linguist`.

Finally, an almost never used feature of fenced code blocks on GitHub,
is that you can write explicit extensions (` ```.js `).
Sometimes, when there is a dot, it maps to different languages.
This project now matches GitHub in what values without a dot, and with a
dot, map to.

More info in:
<github-linguist/linguist#6338>.
@Alhadis
Collaborator

Alhadis commented Mar 27, 2023

I know of ```mermaid, ```geojson, ```topojson, and ```stl

Those aren't specific to Markdown, GitHub-flavoured or otherwise. Watch:

Input:

$ cmark-gfm --github-pre-lang <<'MARKDOWN'
```mermaid
graph TD;
	A-->B;
	A-->C;
	B-->D;
	C-->D;
```
MARKDOWN

Output (HTML):

<pre lang="mermaid"><code>graph TD;
	A--&gt;B;
	A--&gt;C;
	B--&gt;D;
	C--&gt;D;
</code></pre>

That HTML output could've been prepared by any number of lightweight markup languages that GitHub supports, as the markup rendering pipeline essentially goes like:

flowchart LR
	Markup --> html1
	html1[Raw HTML] --> html2
	html2[Sanitised HTML] --> html3
	html3[Code-blocks processed] --> html4
	html4[Other stuff added:<br/>tasklists, emoji, etc]
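
To illustrate why the source markup does not matter at the code-block stage, here is a minimal Python sketch that scans already-rendered HTML for `<pre lang="...">` elements. It is an assumption about what such a stage could look like, not GitHub's implementation; `PreLangCollector` is a hypothetical class built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class PreLangCollector(HTMLParser):
    """Collect (lang, text) pairs for every <pre lang="..."> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._lang = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._lang = dict(attrs).get("lang")
            self._text = []

    def handle_data(self, data):
        if self._lang is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._lang is not None:
            self.blocks.append((self._lang, "".join(self._text)))
            self._lang = None


# Sample input modelled on the cmark-gfm output shown earlier in this comment.
HTML = '<pre lang="mermaid"><code>graph TD;\n\tA--&gt;B;\n</code></pre>'

collector = PreLangCollector()
collector.feed(HTML)
collector.close()
print(collector.blocks)
```

Any markup format that can emit this HTML would reach such a stage with the same `lang` value, which is the point made above.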

@wooorm
Contributor

wooorm commented Mar 27, 2023

Good to know, just, my focus is markdown, and GH is a big player there :)

@lildude lildude requested a review from a team as a code owner May 30, 2023 09:16
@lildude lildude enabled auto-merge May 30, 2023 09:39
@lildude lildude disabled auto-merge May 30, 2023 09:59
@lildude lildude merged commit 9a04901 into master May 30, 2023
5 checks passed
@lildude lildude deleted the lildude/add-md-alias branch May 30, 2023 10:00
@github-linguist github-linguist deleted a comment from XoL1507 Aug 12, 2023
@github-linguist github-linguist deleted a comment from aguinaldok4 Mar 21, 2024
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024